Issue 1912303002: OpenItem now posts its task with CONTINUE_ON_SHUTDOWN semantic.

Patrick Monette

pmonette@chromium.org changed reviewers: + manzagop@chromium.org

4 years, 8 months ago (2016-04-22 23:01:36 UTC) #1

Patrick Monette

This CL depends on crrev.com/1905293003 Please take a look.

4 years, 8 months ago (2016-04-22 23:01:36 UTC) #2

manzagop (departed)

Cool stuff! lgtm From a quick look, the clients of this code should be ok ...

4 years, 8 months ago (2016-04-25 14:28:27 UTC) #3

Patrick Monette

On 2016/04/25 14:28:27, manzagop wrote: > Cool stuff! lgtm > > From a quick look, ...

4 years, 7 months ago (2016-04-28 20:18:12 UTC) #4

Patrick Monette

pmonette@chromium.org changed reviewers: + gab@chromium.org

4 years, 7 months ago (2016-04-28 20:22:08 UTC) #5

Patrick Monette

gab@ Small Q for you. https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_util.cc File chrome/browser/platform_util.cc (right): https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_util.cc#newcode42 chrome/browser/platform_util.cc:42: BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, On 2016/04/25 ...

4 years, 7 months ago (2016-04-28 20:22:09 UTC) #6

gab

self to CC https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_util.cc File chrome/browser/platform_util.cc (right): https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_util.cc#newcode42 chrome/browser/platform_util.cc:42: BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, On 2016/04/28 20:22:09, Patrick ...

4 years, 7 months ago (2016-04-28 22:36:25 UTC) #8

Patrick Monette

pmonette@chromium.org changed reviewers: + sky@chromium.org

4 years, 7 months ago (2016-04-29 15:50:51 UTC) #9

sky

Why are we putting this behind an experiment? Also, do we have test coverage for ...

4 years, 7 months ago (2016-04-29 17:32:33 UTC) #11

Patrick Monette

Description was changed from ========== OpenItem now posts its task with CONTINUE_ON_SHUTDOWN semantic. This task ...

4 years, 7 months ago (2016-04-29 17:44:13 UTC) #12

Patrick Monette

On 2016/04/29 17:32:33, sky wrote: > Why are we putting this behind an experiment? We've ...

4 years, 7 months ago (2016-04-29 20:18:23 UTC) #13

sky

On Fri, Apr 29, 2016 at 1:18 PM, <pmonette@chromium.org> wrote: > On 2016/04/29 17:32:33, sky ...

4 years, 7 months ago (2016-05-02 15:14:17 UTC) #14

Patrick Monette

On 2016/05/02 15:14:17, sky wrote: > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> ...

4 years, 7 months ago (2016-05-02 19:36:51 UTC) #15

On 2016/05/02 15:14:17, sky wrote:
> On Fri, Apr 29, 2016 at 1:18 PM,  <mailto:pmonette@chromium.org> wrote:
> > On 2016/04/29 17:32:33, sky wrote:
> >> Why are we putting this behind an experiment?
> >
> > We've identified the shutdown behavior for this task as a probable cause of
> > unclean shutdowns and want to validate that hypothesis through an
> > experiment.
> > Simply looking at CPM wouldn't allow validation due to high version to
> > version
> > noise.
> 
> This seems like an awful way to draw conclusions about a particular
> code path. If we think open item is the culprit why can't we
> investigate that code path to figure out what it may be doing?
>
It's because we are not trying to validate if the fix is correct, we are trying
to validate the impact of that change. ("We" here means manzagop, siggi and I.)

As part of our investigation on browser hangs, we're received a large number of
hang
reports and we are interested in correlating the number of report received to a
CPM
number. e.g. A bucket (bucketed by magic signature) of X% of hang reports
directly
translate to Y CPM.

With this correlation, we could more easily convince other people that their
code is responsible for
a number of hangs (with a precise number) and that they have to fix it.

The end goal for our project would be to have hang crash reports to be triaged
the same way a normal crash is, as fixing all hangs is too big of a problem for
one team to fix.
Our reports are not sufficiently actionable for this yet.
> 
> >> Also, do we have test coverage for both ways?
> >
> > We don't and I'm not sure there is a good way to test for that. The current
> > unit
> > tests doesn't test the shutdown behavior (always expect the task to run
> > fully)
> > so running a parameterized test on the value of shutdown_mode will not
> > execute
> > any different code path.
> 
> Why can't we right a test for this? Perhaps a browser test is needed
> to get the right coverage?
>
The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is already
tested
in sequenced_worker_pool_unittests.cc. I think we should trust that this is
doing
the right thing already.
> 
>   -Scott
> 
> -- 
> You received this message because you are subscribed to the Google Groups
> "Chromium-reviews" group.
> To unsubscribe from this group and stop receiving emails from it, send an
email
> to mailto:chromium-reviews+unsubscribe@chromium.org.
> 

Let me know if you still have more questions.

sky

On 2016/05/02 19:36:51, Patrick Monette wrote: > On 2016/05/02 15:14:17, sky wrote: > > On ...

4 years, 7 months ago (2016-05-02 20:23:51 UTC) #16

On 2016/05/02 19:36:51, Patrick Monette wrote:
> On 2016/05/02 15:14:17, sky wrote:
> > On Fri, Apr 29, 2016 at 1:18 PM,  <mailto:pmonette@chromium.org> wrote:
> > > On 2016/04/29 17:32:33, sky wrote:
> > >> Why are we putting this behind an experiment?
> > >
> > > We've identified the shutdown behavior for this task as a probable cause
of
> > > unclean shutdowns and want to validate that hypothesis through an
> > > experiment.
> > > Simply looking at CPM wouldn't allow validation due to high version to
> > > version
> > > noise.
> > 
> > This seems like an awful way to draw conclusions about a particular
> > code path. If we think open item is the culprit why can't we
> > investigate that code path to figure out what it may be doing?
> >
> It's because we are not trying to validate if the fix is correct, we are
trying
> to validate the impact of that change. ("We" here means manzagop, siggi and
I.)
> 
> As part of our investigation on browser hangs, we're received a large number
of
> hang
> reports and we are interested in correlating the number of report received to
a
> CPM
> number. e.g. A bucket (bucketed by magic signature) of X% of hang reports
> directly
> translate to Y CPM.
> 
> With this correlation, we could more easily convince other people that their
> code is responsible for
> a number of hangs (with a precise number) and that they have to fix it.
> 
> The end goal for our project would be to have hang crash reports to be triaged
> the same way a normal crash is, as fixing all hangs is too big of a problem
for
> one team to fix.
> Our reports are not sufficiently actionable for this yet.
> > 
> > >> Also, do we have test coverage for both ways?
> > >
> > > We don't and I'm not sure there is a good way to test for that. The
current
> > > unit
> > > tests doesn't test the shutdown behavior (always expect the task to run
> > > fully)
> > > so running a parameterized test on the value of shutdown_mode will not
> > > execute
> > > any different code path.
> > 
> > Why can't we right a test for this? Perhaps a browser test is needed
> > to get the right coverage?
> >
> The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is already
> tested
> in sequenced_worker_pool_unittests.cc. I think we should trust that this is
> doing
> the right thing already.

I'm asking if we have coverage of OpenItem with the different shutdown behavior.

> > 
> >   -Scott
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups
> > "Chromium-reviews" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> email
> > to mailto:chromium-reviews+unsubscribe@chromium.org.
> > 
> 
> Let me know if you still have more questions.

sky

On 2016/05/02 20:23:51, sky wrote: > On 2016/05/02 19:36:51, Patrick Monette wrote: > > On ...

4 years, 7 months ago (2016-05-02 20:30:53 UTC) #17

On 2016/05/02 20:23:51, sky wrote:
> On 2016/05/02 19:36:51, Patrick Monette wrote:
> > On 2016/05/02 15:14:17, sky wrote:
> > > On Fri, Apr 29, 2016 at 1:18 PM,  <mailto:pmonette@chromium.org> wrote:
> > > > On 2016/04/29 17:32:33, sky wrote:
> > > >> Why are we putting this behind an experiment?
> > > >
> > > > We've identified the shutdown behavior for this task as a probable cause
> of
> > > > unclean shutdowns and want to validate that hypothesis through an
> > > > experiment.
> > > > Simply looking at CPM wouldn't allow validation due to high version to
> > > > version
> > > > noise.
> > > 
> > > This seems like an awful way to draw conclusions about a particular
> > > code path. If we think open item is the culprit why can't we
> > > investigate that code path to figure out what it may be doing?
> > >
> > It's because we are not trying to validate if the fix is correct, we are
> trying
> > to validate the impact of that change. ("We" here means manzagop, siggi and
> I.)
> > 
> > As part of our investigation on browser hangs, we're received a large number
> of
> > hang
> > reports and we are interested in correlating the number of report received
to
> a
> > CPM
> > number. e.g. A bucket (bucketed by magic signature) of X% of hang reports
> > directly
> > translate to Y CPM.
> > 
> > With this correlation, we could more easily convince other people that their
> > code is responsible for
> > a number of hangs (with a precise number) and that they have to fix it.
> > 
> > The end goal for our project would be to have hang crash reports to be
triaged
> > the same way a normal crash is, as fixing all hangs is too big of a problem
> for
> > one team to fix.
> > Our reports are not sufficiently actionable for this yet.
> > > 
> > > >> Also, do we have test coverage for both ways?
> > > >
> > > > We don't and I'm not sure there is a good way to test for that. The
> current
> > > > unit
> > > > tests doesn't test the shutdown behavior (always expect the task to run
> > > > fully)
> > > > so running a parameterized test on the value of shutdown_mode will not
> > > > execute
> > > > any different code path.
> > > 
> > > Why can't we right a test for this? Perhaps a browser test is needed
> > > to get the right coverage?
> > >
> > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is already
> > tested
> > in sequenced_worker_pool_unittests.cc. I think we should trust that this is
> > doing
> > the right thing already.
> 
> I'm asking if we have coverage of OpenItem with the different shutdown
behavior.
> 
> > > 
> > >   -Scott
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google Groups
> > > "Chromium-reviews" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an
> > email
> > > to mailto:chromium-reviews+unsubscribe@chromium.org.
> > > 
> > 
> > Let me know if you still have more questions.

And again, we shouldn't be doing an experiment for this. We should understand
what is going wrong and fix the bug.

grt (UTC plus 2)

On 2016/05/02 20:30:53, sky wrote: > On 2016/05/02 20:23:51, sky wrote: > > On 2016/05/02 ...

4 years, 7 months ago (2016-05-02 20:49:47 UTC) #18

On 2016/05/02 20:30:53, sky wrote:
> On 2016/05/02 20:23:51, sky wrote:
> > On 2016/05/02 19:36:51, Patrick Monette wrote:
> > > On 2016/05/02 15:14:17, sky wrote:
> > > > On Fri, Apr 29, 2016 at 1:18 PM,  <mailto:pmonette@chromium.org> wrote:
> > > > > On 2016/04/29 17:32:33, sky wrote:
> > > > >> Why are we putting this behind an experiment?
> > > > >
> > > > > We've identified the shutdown behavior for this task as a probable
cause
> > of
> > > > > unclean shutdowns and want to validate that hypothesis through an
> > > > > experiment.
> > > > > Simply looking at CPM wouldn't allow validation due to high version to
> > > > > version
> > > > > noise.
> > > > 
> > > > This seems like an awful way to draw conclusions about a particular
> > > > code path. If we think open item is the culprit why can't we
> > > > investigate that code path to figure out what it may be doing?
> > > >
> > > It's because we are not trying to validate if the fix is correct, we are
> > trying
> > > to validate the impact of that change. ("We" here means manzagop, siggi
and
> > I.)
> > > 
> > > As part of our investigation on browser hangs, we're received a large
number
> > of
> > > hang
> > > reports and we are interested in correlating the number of report received
> to
> > a
> > > CPM
> > > number. e.g. A bucket (bucketed by magic signature) of X% of hang reports
> > > directly
> > > translate to Y CPM.
> > > 
> > > With this correlation, we could more easily convince other people that
their
> > > code is responsible for
> > > a number of hangs (with a precise number) and that they have to fix it.
> > > 
> > > The end goal for our project would be to have hang crash reports to be
> triaged
> > > the same way a normal crash is, as fixing all hangs is too big of a
problem
> > for
> > > one team to fix.
> > > Our reports are not sufficiently actionable for this yet.
> > > > 
> > > > >> Also, do we have test coverage for both ways?
> > > > >
> > > > > We don't and I'm not sure there is a good way to test for that. The
> > current
> > > > > unit
> > > > > tests doesn't test the shutdown behavior (always expect the task to
run
> > > > > fully)
> > > > > so running a parameterized test on the value of shutdown_mode will not
> > > > > execute
> > > > > any different code path.
> > > > 
> > > > Why can't we right a test for this? Perhaps a browser test is needed
> > > > to get the right coverage?
> > > >
> > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is
already
> > > tested
> > > in sequenced_worker_pool_unittests.cc. I think we should trust that this
is
> > > doing
> > > the right thing already.
> > 
> > I'm asking if we have coverage of OpenItem with the different shutdown
> behavior.
> > 
> > > > 
> > > >   -Scott
> > > > 
> > > > -- 
> > > > You received this message because you are subscribed to the Google
Groups
> > > > "Chromium-reviews" group.
> > > > To unsubscribe from this group and stop receiving emails from it, send
an
> > > email
> > > > to mailto:chromium-reviews+unsubscribe@chromium.org.
> > > > 
> > > 
> > > Let me know if you still have more questions.
> 
> And again, we shouldn't be doing an experiment for this. We should understand
> what is going wrong and fix the bug.

Hi Scott. I've been sitting near these guys, so I hope I can add a little color
to help explain why they're doing what they're doing. The goal of the experiment
is not to figure out how to fix this bug, but rather to see how CPM changes when
a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A have
collected hang reports demonstrating that this task (among others) is blocking
shutdown in certain cases. If I had to take a wild guess, I'd say ShellExecuteEx
blocks while the user interacts with one of the various Windows security dialogs
(e.g., "this came from the interwebs, are you sure you want to run it?"). This
isn't what this particular CL is addressing, though. This CL is for a temporary
experiment on canary and dev to answer the question "how much does CPM move when
a group of shutdown blockers don't block shutdown." In the long run, this
experiment will be removed and we'll find and test a proper fix for the bug.

sky

On 2016/05/02 20:49:47, grt wrote: > On 2016/05/02 20:30:53, sky wrote: > > On 2016/05/02 ...

4 years, 7 months ago (2016-05-02 22:53:36 UTC) #19

On 2016/05/02 20:49:47, grt wrote:
> On 2016/05/02 20:30:53, sky wrote:
> > On 2016/05/02 20:23:51, sky wrote:
> > > On 2016/05/02 19:36:51, Patrick Monette wrote:
> > > > On 2016/05/02 15:14:17, sky wrote:
> > > > > On Fri, Apr 29, 2016 at 1:18 PM,  <mailto:pmonette@chromium.org>
wrote:
> > > > > > On 2016/04/29 17:32:33, sky wrote:
> > > > > >> Why are we putting this behind an experiment?
> > > > > >
> > > > > > We've identified the shutdown behavior for this task as a probable
> cause
> > > of
> > > > > > unclean shutdowns and want to validate that hypothesis through an
> > > > > > experiment.
> > > > > > Simply looking at CPM wouldn't allow validation due to high version
to
> > > > > > version
> > > > > > noise.
> > > > > 
> > > > > This seems like an awful way to draw conclusions about a particular
> > > > > code path. If we think open item is the culprit why can't we
> > > > > investigate that code path to figure out what it may be doing?
> > > > >
> > > > It's because we are not trying to validate if the fix is correct, we are
> > > trying
> > > > to validate the impact of that change. ("We" here means manzagop, siggi
> and
> > > I.)
> > > > 
> > > > As part of our investigation on browser hangs, we're received a large
> number
> > > of
> > > > hang
> > > > reports and we are interested in correlating the number of report
received
> > to
> > > a
> > > > CPM
> > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang
reports
> > > > directly
> > > > translate to Y CPM.
> > > > 
> > > > With this correlation, we could more easily convince other people that
> their
> > > > code is responsible for
> > > > a number of hangs (with a precise number) and that they have to fix it.
> > > > 
> > > > The end goal for our project would be to have hang crash reports to be
> > triaged
> > > > the same way a normal crash is, as fixing all hangs is too big of a
> problem
> > > for
> > > > one team to fix.
> > > > Our reports are not sufficiently actionable for this yet.
> > > > > 
> > > > > >> Also, do we have test coverage for both ways?
> > > > > >
> > > > > > We don't and I'm not sure there is a good way to test for that. The
> > > current
> > > > > > unit
> > > > > > tests doesn't test the shutdown behavior (always expect the task to
> run
> > > > > > fully)
> > > > > > so running a parameterized test on the value of shutdown_mode will
not
> > > > > > execute
> > > > > > any different code path.
> > > > > 
> > > > > Why can't we right a test for this? Perhaps a browser test is needed
> > > > > to get the right coverage?
> > > > >
> > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is
> already
> > > > tested
> > > > in sequenced_worker_pool_unittests.cc. I think we should trust that this
> is
> > > > doing
> > > > the right thing already.
> > > 
> > > I'm asking if we have coverage of OpenItem with the different shutdown
> > behavior.
> > > 
> > > > > 
> > > > >   -Scott
> > > > > 
> > > > > -- 
> > > > > You received this message because you are subscribed to the Google
> Groups
> > > > > "Chromium-reviews" group.
> > > > > To unsubscribe from this group and stop receiving emails from it, send
> an
> > > > email
> > > > > to mailto:chromium-reviews+unsubscribe@chromium.org.
> > > > > 
> > > > 
> > > > Let me know if you still have more questions.
> > 
> > And again, we shouldn't be doing an experiment for this. We should
understand
> > what is going wrong and fix the bug.
> 
> Hi Scott. I've been sitting near these guys, so I hope I can add a little
color
> to help explain why they're doing what they're doing. The goal of the
experiment
> is not to figure out how to fix this bug, but rather to see how CPM changes
when
> a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A
have
> collected hang reports demonstrating that this task (among others) is blocking
> shutdown in certain cases. If I had to take a wild guess, I'd say
ShellExecuteEx
> blocks while the user interacts with one of the various Windows security
dialogs
> (e.g., "this came from the interwebs, are you sure you want to run it?"). This
> isn't what this particular CL is addressing, though. This CL is for a
temporary
> experiment on canary and dev to answer the question "how much does CPM move
when
> a group of shutdown blockers don't block shutdown." In the long run, this
> experiment will be removed and we'll find and test a proper fix for the bug.

I don't think experiments were intended for this. None-the-less why don't we
change the default behavior always with no experiment? You even say this in the
bug.

jwd

On 2016/05/02 22:53:36, sky wrote: > On 2016/05/02 20:49:47, grt wrote: > > On 2016/05/02 ...

4 years, 7 months ago (2016-05-03 14:51:32 UTC) #20

On 2016/05/02 22:53:36, sky wrote:
> On 2016/05/02 20:49:47, grt wrote:
> > On 2016/05/02 20:30:53, sky wrote:
> > > On 2016/05/02 20:23:51, sky wrote:
> > > > On 2016/05/02 19:36:51, Patrick Monette wrote:
> > > > > On 2016/05/02 15:14:17, sky wrote:
> > > > > > On Fri, Apr 29, 2016 at 1:18 PM,  <mailto:pmonette@chromium.org>
> wrote:
> > > > > > > On 2016/04/29 17:32:33, sky wrote:
> > > > > > >> Why are we putting this behind an experiment?
> > > > > > >
> > > > > > > We've identified the shutdown behavior for this task as a probable
> > cause
> > > > of
> > > > > > > unclean shutdowns and want to validate that hypothesis through an
> > > > > > > experiment.
> > > > > > > Simply looking at CPM wouldn't allow validation due to high
version
> to
> > > > > > > version
> > > > > > > noise.
> > > > > > 
> > > > > > This seems like an awful way to draw conclusions about a particular
> > > > > > code path. If we think open item is the culprit why can't we
> > > > > > investigate that code path to figure out what it may be doing?
> > > > > >
> > > > > It's because we are not trying to validate if the fix is correct, we
are
> > > > trying
> > > > > to validate the impact of that change. ("We" here means manzagop,
siggi
> > and
> > > > I.)
> > > > > 
> > > > > As part of our investigation on browser hangs, we're received a large
> > number
> > > > of
> > > > > hang
> > > > > reports and we are interested in correlating the number of report
> received
> > > to
> > > > a
> > > > > CPM
> > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang
> reports
> > > > > directly
> > > > > translate to Y CPM.
> > > > > 
> > > > > With this correlation, we could more easily convince other people that
> > their
> > > > > code is responsible for
> > > > > a number of hangs (with a precise number) and that they have to fix
it.
> > > > > 
> > > > > The end goal for our project would be to have hang crash reports to be
> > > triaged
> > > > > the same way a normal crash is, as fixing all hangs is too big of a
> > problem
> > > > for
> > > > > one team to fix.
> > > > > Our reports are not sufficiently actionable for this yet.
> > > > > > 
> > > > > > >> Also, do we have test coverage for both ways?
> > > > > > >
> > > > > > > We don't and I'm not sure there is a good way to test for that.
The
> > > > current
> > > > > > > unit
> > > > > > > tests doesn't test the shutdown behavior (always expect the task
to
> > run
> > > > > > > fully)
> > > > > > > so running a parameterized test on the value of shutdown_mode will
> not
> > > > > > > execute
> > > > > > > any different code path.
> > > > > > 
> > > > > > Why can't we right a test for this? Perhaps a browser test is needed
> > > > > > to get the right coverage?
> > > > > >
> > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is
> > already
> > > > > tested
> > > > > in sequenced_worker_pool_unittests.cc. I think we should trust that
this
> > is
> > > > > doing
> > > > > the right thing already.
> > > > 
> > > > I'm asking if we have coverage of OpenItem with the different shutdown
> > > behavior.
> > > > 
> > > > > > 
> > > > > >   -Scott
> > > > > > 
> > > > > > -- 
> > > > > > You received this message because you are subscribed to the Google
> > Groups
> > > > > > "Chromium-reviews" group.
> > > > > > To unsubscribe from this group and stop receiving emails from it,
send
> > an
> > > > > email
> > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org.
> > > > > > 
> > > > > 
> > > > > Let me know if you still have more questions.
> > > 
> > > And again, we shouldn't be doing an experiment for this. We should
> understand
> > > what is going wrong and fix the bug.
> > 
> > Hi Scott. I've been sitting near these guys, so I hope I can add a little
> color
> > to help explain why they're doing what they're doing. The goal of the
> experiment
> > is not to figure out how to fix this bug, but rather to see how CPM changes
> when
> > a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A
> have
> > collected hang reports demonstrating that this task (among others) is
blocking
> > shutdown in certain cases. If I had to take a wild guess, I'd say
> ShellExecuteEx
> > blocks while the user interacts with one of the various Windows security
> dialogs
> > (e.g., "this came from the interwebs, are you sure you want to run it?").
This
> > isn't what this particular CL is addressing, though. This CL is for a
> temporary
> > experiment on canary and dev to answer the question "how much does CPM move
> when
> > a group of shutdown blockers don't block shutdown." In the long run, this
> > experiment will be removed and we'll find and test a proper fix for the bug.
> 
> I don't think experiments were intended for this. None-the-less why don't we
> change the default behavior always with no experiment? You even say this in
the
> bug.

I disagree with this not being what experiments were intended for. Experiments
are for when a change is being made and we want to see the effects by
controlling as many other variables as possible. This seems like it's exactly
what's being done here. I don't think a meaningful comparison can be made across
canary versions, there's too many other moving parts. I think the benefits from
the experimental data for guiding future efforts is more valuable than the cost
of not immediately pushing a beneficial change to all users.

grt (UTC plus 2)

On 2016/05/02 22:53:36, sky wrote: > On 2016/05/02 20:49:47, grt wrote: > > On 2016/05/02 ...

4 years, 7 months ago (2016-05-03 14:59:07 UTC) #21

On 2016/05/02 22:53:36, sky wrote:
> On 2016/05/02 20:49:47, grt wrote:
> > On 2016/05/02 20:30:53, sky wrote:
> > > On 2016/05/02 20:23:51, sky wrote:
> > > > On 2016/05/02 19:36:51, Patrick Monette wrote:
> > > > > On 2016/05/02 15:14:17, sky wrote:
> > > > > > On Fri, Apr 29, 2016 at 1:18 PM,  <mailto:pmonette@chromium.org>
> wrote:
> > > > > > > On 2016/04/29 17:32:33, sky wrote:
> > > > > > >> Why are we putting this behind an experiment?
> > > > > > >
> > > > > > > We've identified the shutdown behavior for this task as a probable
> > cause
> > > > of
> > > > > > > unclean shutdowns and want to validate that hypothesis through an
> > > > > > > experiment.
> > > > > > > Simply looking at CPM wouldn't allow validation due to high
version
> to
> > > > > > > version
> > > > > > > noise.
> > > > > > 
> > > > > > This seems like an awful way to draw conclusions about a particular
> > > > > > code path. If we think open item is the culprit why can't we
> > > > > > investigate that code path to figure out what it may be doing?
> > > > > >
> > > > > It's because we are not trying to validate if the fix is correct, we
are
> > > > trying
> > > > > to validate the impact of that change. ("We" here means manzagop,
siggi
> > and
> > > > I.)
> > > > > 
> > > > > As part of our investigation on browser hangs, we're received a large
> > number
> > > > of
> > > > > hang
> > > > > reports and we are interested in correlating the number of report
> received
> > > to
> > > > a
> > > > > CPM
> > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang
> reports
> > > > > directly
> > > > > translate to Y CPM.
> > > > > 
> > > > > With this correlation, we could more easily convince other people that
> > their
> > > > > code is responsible for
> > > > > a number of hangs (with a precise number) and that they have to fix
it.
> > > > > 
> > > > > The end goal for our project would be to have hang crash reports to be
> > > triaged
> > > > > the same way a normal crash is, as fixing all hangs is too big of a
> > problem
> > > > for
> > > > > one team to fix.
> > > > > Our reports are not sufficiently actionable for this yet.
> > > > > > 
> > > > > > >> Also, do we have test coverage for both ways?
> > > > > > >
> > > > > > > We don't and I'm not sure there is a good way to test for that.
The
> > > > current
> > > > > > > unit
> > > > > > > tests doesn't test the shutdown behavior (always expect the task
to
> > run
> > > > > > > fully)
> > > > > > > so running a parameterized test on the value of shutdown_mode will
> not
> > > > > > > execute
> > > > > > > any different code path.
> > > > > > 
> > > > > > Why can't we right a test for this? Perhaps a browser test is needed
> > > > > > to get the right coverage?
> > > > > >
> > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is
> > already
> > > > > tested
> > > > > in sequenced_worker_pool_unittests.cc. I think we should trust that
this
> > is
> > > > > doing
> > > > > the right thing already.
> > > > 
> > > > I'm asking if we have coverage of OpenItem with the different shutdown
> > > behavior.
> > > > 
> > > > > > 
> > > > > >   -Scott
> > > > > > 
> > > > > > -- 
> > > > > > You received this message because you are subscribed to the Google
> > Groups
> > > > > > "Chromium-reviews" group.
> > > > > > To unsubscribe from this group and stop receiving emails from it,
send
> > an
> > > > > email
> > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org.
> > > > > > 
> > > > > 
> > > > > Let me know if you still have more questions.
> > > 
> > > And again, we shouldn't be doing an experiment for this. We should
> understand
> > > what is going wrong and fix the bug.
> > 
> > Hi Scott. I've been sitting near these guys, so I hope I can add a little
> color
> > to help explain why they're doing what they're doing. The goal of the
> experiment
> > is not to figure out how to fix this bug, but rather to see how CPM changes
> when
> > a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A
> have
> > collected hang reports demonstrating that this task (among others) is
blocking
> > shutdown in certain cases. If I had to take a wild guess, I'd say
> ShellExecuteEx
> > blocks while the user interacts with one of the various Windows security
> dialogs
> > (e.g., "this came from the interwebs, are you sure you want to run it?").
This
> > isn't what this particular CL is addressing, though. This CL is for a
> temporary
> > experiment on canary and dev to answer the question "how much does CPM move
> when
> > a group of shutdown blockers don't block shutdown." In the long run, this
> > experiment will be removed and we'll find and test a proper fix for the bug.
> 
> I don't think experiments were intended for this. None-the-less why don't we
> change the default behavior always with no experiment?

The reason to not change the behavior without the experiment is because that
will prevent us from knowing how much of a difference the fix made. We think
it's valuable to have solid numbers so that we can properly prioritize other
hang-related fixes.

As for the change itself, I've looked at some hang stacks with Patrick and am
now less convinced that we should make this change. ;-)  Some of the stacks make
it clear that CONTINUE_ON_SHUTDOWN will lead to a sub-optimal user experience.
I've updated the bug accordingly.

sky

Given grt's comment can we close this out now? On Tue, May 3, 2016 at ...

4 years, 7 months ago (2016-05-03 16:41:43 UTC) #22

Given grt's comment can we close this out now?

On Tue, May 3, 2016 at 7:59 AM,  <grt@chromium.org> wrote:
> On 2016/05/02 22:53:36, sky wrote:
>> On 2016/05/02 20:49:47, grt wrote:
>> > On 2016/05/02 20:30:53, sky wrote:
>> > > On 2016/05/02 20:23:51, sky wrote:
>> > > > On 2016/05/02 19:36:51, Patrick Monette wrote:
>> > > > > On 2016/05/02 15:14:17, sky wrote:
>> > > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org>
>> wrote:
>> > > > > > > On 2016/04/29 17:32:33, sky wrote:
>> > > > > > >> Why are we putting this behind an experiment?
>> > > > > > >
>> > > > > > > We've identified the shutdown behavior for this task as a
>> > > > > > > probable
>> > cause
>> > > > of
>> > > > > > > unclean shutdowns and want to validate that hypothesis through
>> > > > > > > an
>> > > > > > > experiment.
>> > > > > > > Simply looking at CPM wouldn't allow validation due to high
> version
>> to
>> > > > > > > version
>> > > > > > > noise.
>> > > > > >
>> > > > > > This seems like an awful way to draw conclusions about a
>> > > > > > particular
>> > > > > > code path. If we think open item is the culprit why can't we
>> > > > > > investigate that code path to figure out what it may be doing?
>> > > > > >
>> > > > > It's because we are not trying to validate if the fix is correct,
>> > > > > we
> are
>> > > > trying
>> > > > > to validate the impact of that change. ("We" here means manzagop,
> siggi
>> > and
>> > > > I.)
>> > > > >
>> > > > > As part of our investigation on browser hangs, we're received a
>> > > > > large
>> > number
>> > > > of
>> > > > > hang
>> > > > > reports and we are interested in correlating the number of report
>> received
>> > > to
>> > > > a
>> > > > > CPM
>> > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang
>> reports
>> > > > > directly
>> > > > > translate to Y CPM.
>> > > > >
>> > > > > With this correlation, we could more easily convince other people
>> > > > > that
>> > their
>> > > > > code is responsible for
>> > > > > a number of hangs (with a precise number) and that they have to
>> > > > > fix
> it.
>> > > > >
>> > > > > The end goal for our project would be to have hang crash reports
>> > > > > to be
>> > > triaged
>> > > > > the same way a normal crash is, as fixing all hangs is too big of
>> > > > > a
>> > problem
>> > > > for
>> > > > > one team to fix.
>> > > > > Our reports are not sufficiently actionable for this yet.
>> > > > > >
>> > > > > > >> Also, do we have test coverage for both ways?
>> > > > > > >
>> > > > > > > We don't and I'm not sure there is a good way to test for
>> > > > > > > that.
> The
>> > > > current
>> > > > > > > unit
>> > > > > > > tests doesn't test the shutdown behavior (always expect the
>> > > > > > > task
> to
>> > run
>> > > > > > > fully)
>> > > > > > > so running a parameterized test on the value of shutdown_mode
>> > > > > > > will
>> not
>> > > > > > > execute
>> > > > > > > any different code path.
>> > > > > >
>> > > > > > Why can't we right a test for this? Perhaps a browser test is
>> > > > > > needed
>> > > > > > to get the right coverage?
>> > > > > >
>> > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN
>> > > > > is
>> > already
>> > > > > tested
>> > > > > in sequenced_worker_pool_unittests.cc. I think we should trust
>> > > > > that
> this
>> > is
>> > > > > doing
>> > > > > the right thing already.
>> > > >
>> > > > I'm asking if we have coverage of OpenItem with the different
>> > > > shutdown
>> > > behavior.
>> > > >
>> > > > > >
>> > > > > > -Scott
>> > > > > >
>> > > > > > --
>> > > > > > You received this message because you are subscribed to the
>> > > > > > Google
>> > Groups
>> > > > > > "Chromium-reviews" group.
>> > > > > > To unsubscribe from this group and stop receiving emails from
>> > > > > > it,
> send
>> > an
>> > > > > email
>> > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org.
>> > > > > >
>> > > > >
>> > > > > Let me know if you still have more questions.
>> > >
>> > > And again, we shouldn't be doing an experiment for this. We should
>> understand
>> > > what is going wrong and fix the bug.
>> >
>> > Hi Scott. I've been sitting near these guys, so I hope I can add a
>> > little
>> color
>> > to help explain why they're doing what they're doing. The goal of the
>> experiment
>> > is not to figure out how to fix this bug, but rather to see how CPM
>> > changes
>> when
>> > a collection of shutdown blockers aren't blocking shutdown. Patrick and
>> > P-A
>> have
>> > collected hang reports demonstrating that this task (among others) is
> blocking
>> > shutdown in certain cases. If I had to take a wild guess, I'd say
>> ShellExecuteEx
>> > blocks while the user interacts with one of the various Windows security
>> dialogs
>> > (e.g., "this came from the interwebs, are you sure you want to run
>> > it?").
> This
>> > isn't what this particular CL is addressing, though. This CL is for a
>> temporary
>> > experiment on canary and dev to answer the question "how much does CPM
>> > move
>> when
>> > a group of shutdown blockers don't block shutdown." In the long run,
>> > this
>> > experiment will be removed and we'll find and test a proper fix for the
>> > bug.
>>
>> I don't think experiments were intended for this. None-the-less why don't
>> we
>> change the default behavior always with no experiment?
>
> The reason to not change the behavior without the experiment is because that
> will prevent us from knowing how much of a difference the fix made. We think
> it's valuable to have solid numbers so that we can properly prioritize other
> hang-related fixes.
>
> As for the change itself, I've looked at some hang stacks with Patrick and
> am
> now less convinced that we should make this change. ;-) Some of the stacks
> make
> it clear that CONTINUE_ON_SHUTDOWN will lead to a sub-optimal user
> experience.
> I've updated the bug accordingly.
>
> https://codereview.chromium.org/1912303002/

-- 
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.

Patrick Monette

4 years, 7 months ago (2016-05-03 16:45:13 UTC) #23

Message was sent while issue was closed.

On 2016/05/03 16:41:43, sky wrote:
> Given grt's comment can we close this out now?
> 
> On Tue, May 3, 2016 at 7:59 AM,  <mailto:grt@chromium.org> wrote:
> > On 2016/05/02 22:53:36, sky wrote:
> >> On 2016/05/02 20:49:47, grt wrote:
> >> > On 2016/05/02 20:30:53, sky wrote:
> >> > > On 2016/05/02 20:23:51, sky wrote:
> >> > > > On 2016/05/02 19:36:51, Patrick Monette wrote:
> >> > > > > On 2016/05/02 15:14:17, sky wrote:
> >> > > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org>
> >> wrote:
> >> > > > > > > On 2016/04/29 17:32:33, sky wrote:
> >> > > > > > >> Why are we putting this behind an experiment?
> >> > > > > > >
> >> > > > > > > We've identified the shutdown behavior for this task as a
> >> > > > > > > probable
> >> > cause
> >> > > > of
> >> > > > > > > unclean shutdowns and want to validate that hypothesis through
> >> > > > > > > an
> >> > > > > > > experiment.
> >> > > > > > > Simply looking at CPM wouldn't allow validation due to high
> > version
> >> to
> >> > > > > > > version
> >> > > > > > > noise.
> >> > > > > >
> >> > > > > > This seems like an awful way to draw conclusions about a
> >> > > > > > particular
> >> > > > > > code path. If we think open item is the culprit why can't we
> >> > > > > > investigate that code path to figure out what it may be doing?
> >> > > > > >
> >> > > > > It's because we are not trying to validate if the fix is correct,
> >> > > > > we
> > are
> >> > > > trying
> >> > > > > to validate the impact of that change. ("We" here means manzagop,
> > siggi
> >> > and
> >> > > > I.)
> >> > > > >
> >> > > > > As part of our investigation on browser hangs, we're received a
> >> > > > > large
> >> > number
> >> > > > of
> >> > > > > hang
> >> > > > > reports and we are interested in correlating the number of report
> >> received
> >> > > to
> >> > > > a
> >> > > > > CPM
> >> > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang
> >> reports
> >> > > > > directly
> >> > > > > translate to Y CPM.
> >> > > > >
> >> > > > > With this correlation, we could more easily convince other people
> >> > > > > that
> >> > their
> >> > > > > code is responsible for
> >> > > > > a number of hangs (with a precise number) and that they have to
> >> > > > > fix
> > it.
> >> > > > >
> >> > > > > The end goal for our project would be to have hang crash reports
> >> > > > > to be
> >> > > triaged
> >> > > > > the same way a normal crash is, as fixing all hangs is too big of
> >> > > > > a
> >> > problem
> >> > > > for
> >> > > > > one team to fix.
> >> > > > > Our reports are not sufficiently actionable for this yet.
> >> > > > > >
> >> > > > > > >> Also, do we have test coverage for both ways?
> >> > > > > > >
> >> > > > > > > We don't and I'm not sure there is a good way to test for
> >> > > > > > > that.
> > The
> >> > > > current
> >> > > > > > > unit
> >> > > > > > > tests doesn't test the shutdown behavior (always expect the
> >> > > > > > > task
> > to
> >> > run
> >> > > > > > > fully)
> >> > > > > > > so running a parameterized test on the value of shutdown_mode
> >> > > > > > > will
> >> not
> >> > > > > > > execute
> >> > > > > > > any different code path.
> >> > > > > >
> >> > > > > > Why can't we right a test for this? Perhaps a browser test is
> >> > > > > > needed
> >> > > > > > to get the right coverage?
> >> > > > > >
> >> > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN
> >> > > > > is
> >> > already
> >> > > > > tested
> >> > > > > in sequenced_worker_pool_unittests.cc. I think we should trust
> >> > > > > that
> > this
> >> > is
> >> > > > > doing
> >> > > > > the right thing already.
> >> > > >
> >> > > > I'm asking if we have coverage of OpenItem with the different
> >> > > > shutdown
> >> > > behavior.
> >> > > >
> >> > > > > >
> >> > > > > > -Scott
> >> > > > > >
> >> > > > > > --
> >> > > > > > You received this message because you are subscribed to the
> >> > > > > > Google
> >> > Groups
> >> > > > > > "Chromium-reviews" group.
> >> > > > > > To unsubscribe from this group and stop receiving emails from
> >> > > > > > it,
> > send
> >> > an
> >> > > > > email
> >> > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org.
> >> > > > > >
> >> > > > >
> >> > > > > Let me know if you still have more questions.
> >> > >
> >> > > And again, we shouldn't be doing an experiment for this. We should
> >> understand
> >> > > what is going wrong and fix the bug.
> >> >
> >> > Hi Scott. I've been sitting near these guys, so I hope I can add a
> >> > little
> >> color
> >> > to help explain why they're doing what they're doing. The goal of the
> >> experiment
> >> > is not to figure out how to fix this bug, but rather to see how CPM
> >> > changes
> >> when
> >> > a collection of shutdown blockers aren't blocking shutdown. Patrick and
> >> > P-A
> >> have
> >> > collected hang reports demonstrating that this task (among others) is
> > blocking
> >> > shutdown in certain cases. If I had to take a wild guess, I'd say
> >> ShellExecuteEx
> >> > blocks while the user interacts with one of the various Windows security
> >> dialogs
> >> > (e.g., "this came from the interwebs, are you sure you want to run
> >> > it?").
> > This
> >> > isn't what this particular CL is addressing, though. This CL is for a
> >> temporary
> >> > experiment on canary and dev to answer the question "how much does CPM
> >> > move
> >> when
> >> > a group of shutdown blockers don't block shutdown." In the long run,
> >> > this
> >> > experiment will be removed and we'll find and test a proper fix for the
> >> > bug.
> >>
> >> I don't think experiments were intended for this. None-the-less why don't
> >> we
> >> change the default behavior always with no experiment?
> >
> > The reason to not change the behavior without the experiment is because that
> > will prevent us from knowing how much of a difference the fix made. We think
> > it's valuable to have solid numbers so that we can properly prioritize other
> > hang-related fixes.
> >
> > As for the change itself, I've looked at some hang stacks with Patrick and
> > am
> > now less convinced that we should make this change. ;-) Some of the stacks
> > make
> > it clear that CONTINUE_ON_SHUTDOWN will lead to a sub-optimal user
> > experience.
> > I've updated the bug accordingly.
> >
> > https://codereview.chromium.org/1912303002/
> 
> -- 
> You received this message because you are subscribed to the Google Groups
> "Chromium-reviews" group.
> To unsubscribe from this group and stop receiving emails from it, send an
email
> to mailto:chromium-reviews+unsubscribe@chromium.org.
> 

Done.

Issue 1912303002: OpenItem now posts its task with CONTINUE_ON_SHUTDOWN semantic. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Rebased #

Messages