|
|
Created:
4 years, 8 months ago by Patrick Monette Modified:
4 years, 7 months ago CC:
chromium-reviews, gab Base URL:
https://chromium.googlesource.com/chromium/src.git@fix_delayed_analysis Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionOpenItem now posts its task with CONTINUE_ON_SHUTDOWN semantic.
This task is known for causing some shutdown hangs.
The change is behind an experiment called BrowserHangFixesExperiment.
BUG=478209, 592739
Patch Set 1 #
Total comments: 3
Patch Set 2 : Rebased #
Depends on Patchset: Messages
Total messages: 23 (5 generated)
pmonette@chromium.org changed reviewers: + manzagop@chromium.org
This CL depends on crrev.com/1905293003 Please take a look.
Cool stuff! lgtm From a quick look, the clients of this code should be ok with the new behavior (empty callbacks), but might be a good idea to minimally CC the owners of the client code (platform_util_linux.cc, downloads_api.cc, chrome_download_manager_delegate.cc, md_downloads_dom_handler.cc , etc). https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... File chrome/browser/platform_util.cc (right): https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... chrome/browser/platform_util.cc:42: BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, Is it an issue to post a task from a continue on shutdown task?
On 2016/04/25 14:28:27, manzagop wrote: > Cool stuff! lgtm > > From a quick look, the clients of this code should be ok with the new behavior > (empty callbacks), but might be a good idea to minimally CC the owners of the > client code (platform_util_linux.cc, downloads_api.cc, > chrome_download_manager_delegate.cc, md_downloads_dom_handler.cc > , etc). I don't see any problem because with SKIP_ON_SHUTDOWN, it was already possible for the task to never execute. > > https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... > File chrome/browser/platform_util.cc (right): > > https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... > chrome/browser/platform_util.cc:42: BrowserThread::PostTask(BrowserThread::UI, > FROM_HERE, > Is it an issue to post a task from a continue on shutdown task?
pmonette@chromium.org changed reviewers: + gab@chromium.org
gab@ Small Q for you. https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... File chrome/browser/platform_util.cc (right): https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... chrome/browser/platform_util.cc:42: BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, On 2016/04/25 14:28:26, manzagop wrote: > Is it an issue to post a task from a continue on shutdown task? Lets ask gab@!
gab@chromium.org changed reviewers: - gab@chromium.org
self to CC https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... File chrome/browser/platform_util.cc (right): https://codereview.chromium.org/1912303002/diff/1/chrome/browser/platform_uti... chrome/browser/platform_util.cc:42: BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, On 2016/04/28 20:22:09, Patrick Monette wrote: > On 2016/04/25 14:28:26, manzagop wrote: > > Is it an issue to post a task from a continue on shutdown task? > > Lets ask gab@! It shouldn't be: - for BrowserThread MessageLoops either the loop is up or it's not (if it's not PostTask will simply return false and the task will never run; if it is the task will be posted and will most likely block the join of that thread). - for the SequencedWorkerPool I'm not sure (but I'd assume PostTask also returns false, unless shutdown isn't complete and the task is BLOCK_SHUTDOWN but that'd be reeeaaaaalllly weird to do!).
pmonette@chromium.org changed reviewers: + sky@chromium.org
Sky@ PTAL for owners review.
Why are we putting this behind an experiment? Also, do we have test coverage for both ways?
Description was changed from ========== OpenItem now posts its task with CONTINUE_ON_SHUTDOWN semantic. This task is known for causing some shutdown hangs. The change is behind an experiment called BrowserHangsExperiment. BUG=478209, 592739 ========== to ========== OpenItem now posts its task with CONTINUE_ON_SHUTDOWN semantic. This task is known for causing some shutdown hangs. The change is behind an experiment called BrowserHangFixesExperiment. BUG=478209, 592739 ==========
On 2016/04/29 17:32:33, sky wrote: > Why are we putting this behind an experiment? We've identified the shutdown behavior for this task as a probable cause of unclean shutdowns and want to validate that hypothesis through an experiment. Simply looking at CPM wouldn't allow validation due to high version to version noise. > Also, do we have test coverage for both ways? We don't and I'm not sure there is a good way to test for that. The current unit tests doesn't test the shutdown behavior (always expect the task to run fully) so running a parameterized test on the value of shutdown_mode will not execute any different code path.
On Fri, Apr 29, 2016 at 1:18 PM, <pmonette@chromium.org> wrote: > On 2016/04/29 17:32:33, sky wrote: >> Why are we putting this behind an experiment? > > We've identified the shutdown behavior for this task as a probable cause of > unclean shutdowns and want to validate that hypothesis through an > experiment. > Simply looking at CPM wouldn't allow validation due to high version to > version > noise. This seems like an awful way to draw conclusions about a particular code path. If we think open item is the culprit why can't we investigate that code path to figure out what it may be doing? >> Also, do we have test coverage for both ways? > > We don't and I'm not sure there is a good way to test for that. The current > unit > tests doesn't test the shutdown behavior (always expect the task to run > fully) > so running a parameterized test on the value of shutdown_mode will not > execute > any different code path. Why can't we right a test for this? Perhaps a browser test is needed to get the right coverage? -Scott -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2016/05/02 15:14:17, sky wrote: > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> wrote: > > On 2016/04/29 17:32:33, sky wrote: > >> Why are we putting this behind an experiment? > > > > We've identified the shutdown behavior for this task as a probable cause of > > unclean shutdowns and want to validate that hypothesis through an > > experiment. > > Simply looking at CPM wouldn't allow validation due to high version to > > version > > noise. > > This seems like an awful way to draw conclusions about a particular > code path. If we think open item is the culprit why can't we > investigate that code path to figure out what it may be doing? > It's because we are not trying to validate if the fix is correct, we are trying to validate the impact of that change. ("We" here means manzagop, siggi and I.) As part of our investigation on browser hangs, we're received a large number of hang reports and we are interested in correlating the number of report received to a CPM number. e.g. A bucket (bucketed by magic signature) of X% of hang reports directly translate to Y CPM. With this correlation, we could more easily convince other people that their code is responsible for a number of hangs (with a precise number) and that they have to fix it. The end goal for our project would be to have hang crash reports to be triaged the same way a normal crash is, as fixing all hangs is too big of a problem for one team to fix. Our reports are not sufficiently actionable for this yet. > > >> Also, do we have test coverage for both ways? > > > > We don't and I'm not sure there is a good way to test for that. The current > > unit > > tests doesn't test the shutdown behavior (always expect the task to run > > fully) > > so running a parameterized test on the value of shutdown_mode will not > > execute > > any different code path. > > Why can't we right a test for this? Perhaps a browser test is needed > to get the right coverage? > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is already tested in sequenced_worker_pool_unittests.cc. I think we should trust that this is doing the right thing already. > > -Scott > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. > Let me know if you still have more questions.
On 2016/05/02 19:36:51, Patrick Monette wrote: > On 2016/05/02 15:14:17, sky wrote: > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> wrote: > > > On 2016/04/29 17:32:33, sky wrote: > > >> Why are we putting this behind an experiment? > > > > > > We've identified the shutdown behavior for this task as a probable cause of > > > unclean shutdowns and want to validate that hypothesis through an > > > experiment. > > > Simply looking at CPM wouldn't allow validation due to high version to > > > version > > > noise. > > > > This seems like an awful way to draw conclusions about a particular > > code path. If we think open item is the culprit why can't we > > investigate that code path to figure out what it may be doing? > > > It's because we are not trying to validate if the fix is correct, we are trying > to validate the impact of that change. ("We" here means manzagop, siggi and I.) > > As part of our investigation on browser hangs, we're received a large number of > hang > reports and we are interested in correlating the number of report received to a > CPM > number. e.g. A bucket (bucketed by magic signature) of X% of hang reports > directly > translate to Y CPM. > > With this correlation, we could more easily convince other people that their > code is responsible for > a number of hangs (with a precise number) and that they have to fix it. > > The end goal for our project would be to have hang crash reports to be triaged > the same way a normal crash is, as fixing all hangs is too big of a problem for > one team to fix. > Our reports are not sufficiently actionable for this yet. > > > > >> Also, do we have test coverage for both ways? > > > > > > We don't and I'm not sure there is a good way to test for that. The current > > > unit > > > tests doesn't test the shutdown behavior (always expect the task to run > > > fully) > > > so running a parameterized test on the value of shutdown_mode will not > > > execute > > > any different code path. > > > > Why can't we right a test for this? Perhaps a browser test is needed > > to get the right coverage? > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is already > tested > in sequenced_worker_pool_unittests.cc. I think we should trust that this is > doing > the right thing already. I'm asking if we have coverage of OpenItem with the different shutdown behavior. > > > > -Scott > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Chromium-reviews" group. > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > Let me know if you still have more questions.
On 2016/05/02 20:23:51, sky wrote: > On 2016/05/02 19:36:51, Patrick Monette wrote: > > On 2016/05/02 15:14:17, sky wrote: > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> wrote: > > > > On 2016/04/29 17:32:33, sky wrote: > > > >> Why are we putting this behind an experiment? > > > > > > > > We've identified the shutdown behavior for this task as a probable cause > of > > > > unclean shutdowns and want to validate that hypothesis through an > > > > experiment. > > > > Simply looking at CPM wouldn't allow validation due to high version to > > > > version > > > > noise. > > > > > > This seems like an awful way to draw conclusions about a particular > > > code path. If we think open item is the culprit why can't we > > > investigate that code path to figure out what it may be doing? > > > > > It's because we are not trying to validate if the fix is correct, we are > trying > > to validate the impact of that change. ("We" here means manzagop, siggi and > I.) > > > > As part of our investigation on browser hangs, we're received a large number > of > > hang > > reports and we are interested in correlating the number of report received to > a > > CPM > > number. e.g. A bucket (bucketed by magic signature) of X% of hang reports > > directly > > translate to Y CPM. > > > > With this correlation, we could more easily convince other people that their > > code is responsible for > > a number of hangs (with a precise number) and that they have to fix it. > > > > The end goal for our project would be to have hang crash reports to be triaged > > the same way a normal crash is, as fixing all hangs is too big of a problem > for > > one team to fix. > > Our reports are not sufficiently actionable for this yet. > > > > > > >> Also, do we have test coverage for both ways? > > > > > > > > We don't and I'm not sure there is a good way to test for that. The > current > > > > unit > > > > tests doesn't test the shutdown behavior (always expect the task to run > > > > fully) > > > > so running a parameterized test on the value of shutdown_mode will not > > > > execute > > > > any different code path. > > > > > > Why can't we right a test for this? Perhaps a browser test is needed > > > to get the right coverage? > > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is already > > tested > > in sequenced_worker_pool_unittests.cc. I think we should trust that this is > > doing > > the right thing already. > > I'm asking if we have coverage of OpenItem with the different shutdown behavior. > > > > > > > -Scott > > > > > > -- > > > You received this message because you are subscribed to the Google Groups > > > "Chromium-reviews" group. > > > To unsubscribe from this group and stop receiving emails from it, send an > > email > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > > > > Let me know if you still have more questions. And again, we shouldn't be doing an experiment for this. We should understand what is going wrong and fix the bug.
On 2016/05/02 20:30:53, sky wrote: > On 2016/05/02 20:23:51, sky wrote: > > On 2016/05/02 19:36:51, Patrick Monette wrote: > > > On 2016/05/02 15:14:17, sky wrote: > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> wrote: > > > > > On 2016/04/29 17:32:33, sky wrote: > > > > >> Why are we putting this behind an experiment? > > > > > > > > > > We've identified the shutdown behavior for this task as a probable cause > > of > > > > > unclean shutdowns and want to validate that hypothesis through an > > > > > experiment. > > > > > Simply looking at CPM wouldn't allow validation due to high version to > > > > > version > > > > > noise. > > > > > > > > This seems like an awful way to draw conclusions about a particular > > > > code path. If we think open item is the culprit why can't we > > > > investigate that code path to figure out what it may be doing? > > > > > > > It's because we are not trying to validate if the fix is correct, we are > > trying > > > to validate the impact of that change. ("We" here means manzagop, siggi and > > I.) > > > > > > As part of our investigation on browser hangs, we're received a large number > > of > > > hang > > > reports and we are interested in correlating the number of report received > to > > a > > > CPM > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang reports > > > directly > > > translate to Y CPM. > > > > > > With this correlation, we could more easily convince other people that their > > > code is responsible for > > > a number of hangs (with a precise number) and that they have to fix it. > > > > > > The end goal for our project would be to have hang crash reports to be > triaged > > > the same way a normal crash is, as fixing all hangs is too big of a problem > > for > > > one team to fix. > > > Our reports are not sufficiently actionable for this yet. > > > > > > > > >> Also, do we have test coverage for both ways? > > > > > > > > > > We don't and I'm not sure there is a good way to test for that. The > > current > > > > > unit > > > > > tests doesn't test the shutdown behavior (always expect the task to run > > > > > fully) > > > > > so running a parameterized test on the value of shutdown_mode will not > > > > > execute > > > > > any different code path. > > > > > > > > Why can't we right a test for this? Perhaps a browser test is needed > > > > to get the right coverage? > > > > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is already > > > tested > > > in sequenced_worker_pool_unittests.cc. I think we should trust that this is > > > doing > > > the right thing already. > > > > I'm asking if we have coverage of OpenItem with the different shutdown > behavior. > > > > > > > > > > -Scott > > > > > > > > -- > > > > You received this message because you are subscribed to the Google Groups > > > > "Chromium-reviews" group. > > > > To unsubscribe from this group and stop receiving emails from it, send an > > > email > > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > > > > > > > Let me know if you still have more questions. > > And again, we shouldn't be doing an experiment for this. We should understand > what is going wrong and fix the bug. Hi Scott. I've been sitting near these guys, so I hope I can add a little color to help explain why they're doing what they're doing. The goal of the experiment is not to figure out how to fix this bug, but rather to see how CPM changes when a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A have collected hang reports demonstrating that this task (among others) is blocking shutdown in certain cases. If I had to take a wild guess, I'd say ShellExecuteEx blocks while the user interacts with one of the various Windows security dialogs (e.g., "this came from the interwebs, are you sure you want to run it?"). This isn't what this particular CL is addressing, though. This CL is for a temporary experiment on canary and dev to answer the question "how much does CPM move when a group of shutdown blockers don't block shutdown." In the long run, this experiment will be removed and we'll find and test a proper fix for the bug.
On 2016/05/02 20:49:47, grt wrote: > On 2016/05/02 20:30:53, sky wrote: > > On 2016/05/02 20:23:51, sky wrote: > > > On 2016/05/02 19:36:51, Patrick Monette wrote: > > > > On 2016/05/02 15:14:17, sky wrote: > > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> wrote: > > > > > > On 2016/04/29 17:32:33, sky wrote: > > > > > >> Why are we putting this behind an experiment? > > > > > > > > > > > > We've identified the shutdown behavior for this task as a probable > cause > > > of > > > > > > unclean shutdowns and want to validate that hypothesis through an > > > > > > experiment. > > > > > > Simply looking at CPM wouldn't allow validation due to high version to > > > > > > version > > > > > > noise. > > > > > > > > > > This seems like an awful way to draw conclusions about a particular > > > > > code path. If we think open item is the culprit why can't we > > > > > investigate that code path to figure out what it may be doing? > > > > > > > > > It's because we are not trying to validate if the fix is correct, we are > > > trying > > > > to validate the impact of that change. ("We" here means manzagop, siggi > and > > > I.) > > > > > > > > As part of our investigation on browser hangs, we're received a large > number > > > of > > > > hang > > > > reports and we are interested in correlating the number of report received > > to > > > a > > > > CPM > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang reports > > > > directly > > > > translate to Y CPM. > > > > > > > > With this correlation, we could more easily convince other people that > their > > > > code is responsible for > > > > a number of hangs (with a precise number) and that they have to fix it. > > > > > > > > The end goal for our project would be to have hang crash reports to be > > triaged > > > > the same way a normal crash is, as fixing all hangs is too big of a > problem > > > for > > > > one team to fix. > > > > Our reports are not sufficiently actionable for this yet. > > > > > > > > > > >> Also, do we have test coverage for both ways? > > > > > > > > > > > > We don't and I'm not sure there is a good way to test for that. The > > > current > > > > > > unit > > > > > > tests doesn't test the shutdown behavior (always expect the task to > run > > > > > > fully) > > > > > > so running a parameterized test on the value of shutdown_mode will not > > > > > > execute > > > > > > any different code path. > > > > > > > > > > Why can't we right a test for this? Perhaps a browser test is needed > > > > > to get the right coverage? > > > > > > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is > already > > > > tested > > > > in sequenced_worker_pool_unittests.cc. I think we should trust that this > is > > > > doing > > > > the right thing already. > > > > > > I'm asking if we have coverage of OpenItem with the different shutdown > > behavior. > > > > > > > > > > > > > -Scott > > > > > > > > > > -- > > > > > You received this message because you are subscribed to the Google > Groups > > > > > "Chromium-reviews" group. > > > > > To unsubscribe from this group and stop receiving emails from it, send > an > > > > email > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > > > > > > > > > > Let me know if you still have more questions. > > > > And again, we shouldn't be doing an experiment for this. We should understand > > what is going wrong and fix the bug. > > Hi Scott. I've been sitting near these guys, so I hope I can add a little color > to help explain why they're doing what they're doing. The goal of the experiment > is not to figure out how to fix this bug, but rather to see how CPM changes when > a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A have > collected hang reports demonstrating that this task (among others) is blocking > shutdown in certain cases. If I had to take a wild guess, I'd say ShellExecuteEx > blocks while the user interacts with one of the various Windows security dialogs > (e.g., "this came from the interwebs, are you sure you want to run it?"). This > isn't what this particular CL is addressing, though. This CL is for a temporary > experiment on canary and dev to answer the question "how much does CPM move when > a group of shutdown blockers don't block shutdown." In the long run, this > experiment will be removed and we'll find and test a proper fix for the bug. I don't think experiments were intended for this. None-the-less why don't we change the default behavior always with no experiment? You even say this in the bug.
On 2016/05/02 22:53:36, sky wrote: > On 2016/05/02 20:49:47, grt wrote: > > On 2016/05/02 20:30:53, sky wrote: > > > On 2016/05/02 20:23:51, sky wrote: > > > > On 2016/05/02 19:36:51, Patrick Monette wrote: > > > > > On 2016/05/02 15:14:17, sky wrote: > > > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> > wrote: > > > > > > > On 2016/04/29 17:32:33, sky wrote: > > > > > > >> Why are we putting this behind an experiment? > > > > > > > > > > > > > > We've identified the shutdown behavior for this task as a probable > > cause > > > > of > > > > > > > unclean shutdowns and want to validate that hypothesis through an > > > > > > > experiment. > > > > > > > Simply looking at CPM wouldn't allow validation due to high version > to > > > > > > > version > > > > > > > noise. > > > > > > > > > > > > This seems like an awful way to draw conclusions about a particular > > > > > > code path. If we think open item is the culprit why can't we > > > > > > investigate that code path to figure out what it may be doing? > > > > > > > > > > > It's because we are not trying to validate if the fix is correct, we are > > > > trying > > > > > to validate the impact of that change. ("We" here means manzagop, siggi > > and > > > > I.) > > > > > > > > > > As part of our investigation on browser hangs, we're received a large > > number > > > > of > > > > > hang > > > > > reports and we are interested in correlating the number of report > received > > > to > > > > a > > > > > CPM > > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang > reports > > > > > directly > > > > > translate to Y CPM. > > > > > > > > > > With this correlation, we could more easily convince other people that > > their > > > > > code is responsible for > > > > > a number of hangs (with a precise number) and that they have to fix it. > > > > > > > > > > The end goal for our project would be to have hang crash reports to be > > > triaged > > > > > the same way a normal crash is, as fixing all hangs is too big of a > > problem > > > > for > > > > > one team to fix. > > > > > Our reports are not sufficiently actionable for this yet. > > > > > > > > > > > > >> Also, do we have test coverage for both ways? > > > > > > > > > > > > > > We don't and I'm not sure there is a good way to test for that. The > > > > current > > > > > > > unit > > > > > > > tests doesn't test the shutdown behavior (always expect the task to > > run > > > > > > > fully) > > > > > > > so running a parameterized test on the value of shutdown_mode will > not > > > > > > > execute > > > > > > > any different code path. > > > > > > > > > > > > Why can't we right a test for this? Perhaps a browser test is needed > > > > > > to get the right coverage? > > > > > > > > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is > > already > > > > > tested > > > > > in sequenced_worker_pool_unittests.cc. I think we should trust that this > > is > > > > > doing > > > > > the right thing already. > > > > > > > > I'm asking if we have coverage of OpenItem with the different shutdown > > > behavior. > > > > > > > > > > > > > > > > -Scott > > > > > > > > > > > > -- > > > > > > You received this message because you are subscribed to the Google > > Groups > > > > > > "Chromium-reviews" group. > > > > > > To unsubscribe from this group and stop receiving emails from it, send > > an > > > > > email > > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > > > > > > > > > > > > > Let me know if you still have more questions. > > > > > > And again, we shouldn't be doing an experiment for this. We should > understand > > > what is going wrong and fix the bug. > > > > Hi Scott. I've been sitting near these guys, so I hope I can add a little > color > > to help explain why they're doing what they're doing. The goal of the > experiment > > is not to figure out how to fix this bug, but rather to see how CPM changes > when > > a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A > have > > collected hang reports demonstrating that this task (among others) is blocking > > shutdown in certain cases. If I had to take a wild guess, I'd say > ShellExecuteEx > > blocks while the user interacts with one of the various Windows security > dialogs > > (e.g., "this came from the interwebs, are you sure you want to run it?"). This > > isn't what this particular CL is addressing, though. This CL is for a > temporary > > experiment on canary and dev to answer the question "how much does CPM move > when > > a group of shutdown blockers don't block shutdown." In the long run, this > > experiment will be removed and we'll find and test a proper fix for the bug. > > I don't think experiments were intended for this. None-the-less why don't we > change the default behavior always with no experiment? You even say this in the > bug. I disagree with this not being what experiments were intended for. Experiments are for when a change is being made and we want to see the effects by controlling as many other variables as possible. This seems like it's exactly what's being done here. I don't think a meaningful comparison can be made across canary versions, there's too many other moving parts. I think the benefits from the experimental data for guiding future efforts is more valuable than the cost of not immediately pushing a beneficial change to all users.
On 2016/05/02 22:53:36, sky wrote: > On 2016/05/02 20:49:47, grt wrote: > > On 2016/05/02 20:30:53, sky wrote: > > > On 2016/05/02 20:23:51, sky wrote: > > > > On 2016/05/02 19:36:51, Patrick Monette wrote: > > > > > On 2016/05/02 15:14:17, sky wrote: > > > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> > wrote: > > > > > > > On 2016/04/29 17:32:33, sky wrote: > > > > > > >> Why are we putting this behind an experiment? > > > > > > > > > > > > > > We've identified the shutdown behavior for this task as a probable > > cause > > > > of > > > > > > > unclean shutdowns and want to validate that hypothesis through an > > > > > > > experiment. > > > > > > > Simply looking at CPM wouldn't allow validation due to high version > to > > > > > > > version > > > > > > > noise. > > > > > > > > > > > > This seems like an awful way to draw conclusions about a particular > > > > > > code path. If we think open item is the culprit why can't we > > > > > > investigate that code path to figure out what it may be doing? > > > > > > > > > > > It's because we are not trying to validate if the fix is correct, we are > > > > trying > > > > > to validate the impact of that change. ("We" here means manzagop, siggi > > and > > > > I.) > > > > > > > > > > As part of our investigation on browser hangs, we're received a large > > number > > > > of > > > > > hang > > > > > reports and we are interested in correlating the number of report > received > > > to > > > > a > > > > > CPM > > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang > reports > > > > > directly > > > > > translate to Y CPM. > > > > > > > > > > With this correlation, we could more easily convince other people that > > their > > > > > code is responsible for > > > > > a number of hangs (with a precise number) and that they have to fix it. > > > > > > > > > > The end goal for our project would be to have hang crash reports to be > > > triaged > > > > > the same way a normal crash is, as fixing all hangs is too big of a > > problem > > > > for > > > > > one team to fix. > > > > > Our reports are not sufficiently actionable for this yet. > > > > > > > > > > > > >> Also, do we have test coverage for both ways? > > > > > > > > > > > > > > We don't and I'm not sure there is a good way to test for that. The > > > > current > > > > > > > unit > > > > > > > tests doesn't test the shutdown behavior (always expect the task to > > run > > > > > > > fully) > > > > > > > so running a parameterized test on the value of shutdown_mode will > not > > > > > > > execute > > > > > > > any different code path. > > > > > > > > > > > > Why can't we right a test for this? Perhaps a browser test is needed > > > > > > to get the right coverage? > > > > > > > > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN is > > already > > > > > tested > > > > > in sequenced_worker_pool_unittests.cc. I think we should trust that this > > is > > > > > doing > > > > > the right thing already. > > > > > > > > I'm asking if we have coverage of OpenItem with the different shutdown > > > behavior. > > > > > > > > > > > > > > > > -Scott > > > > > > > > > > > > -- > > > > > > You received this message because you are subscribed to the Google > > Groups > > > > > > "Chromium-reviews" group. > > > > > > To unsubscribe from this group and stop receiving emails from it, send > > an > > > > > email > > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > > > > > > > > > > > > > Let me know if you still have more questions. > > > > > > And again, we shouldn't be doing an experiment for this. We should > understand > > > what is going wrong and fix the bug. > > > > Hi Scott. I've been sitting near these guys, so I hope I can add a little > color > > to help explain why they're doing what they're doing. The goal of the > experiment > > is not to figure out how to fix this bug, but rather to see how CPM changes > when > > a collection of shutdown blockers aren't blocking shutdown. Patrick and P-A > have > > collected hang reports demonstrating that this task (among others) is blocking > > shutdown in certain cases. If I had to take a wild guess, I'd say > ShellExecuteEx > > blocks while the user interacts with one of the various Windows security > dialogs > > (e.g., "this came from the interwebs, are you sure you want to run it?"). This > > isn't what this particular CL is addressing, though. This CL is for a > temporary > > experiment on canary and dev to answer the question "how much does CPM move > when > > a group of shutdown blockers don't block shutdown." In the long run, this > > experiment will be removed and we'll find and test a proper fix for the bug. > > I don't think experiments were intended for this. None-the-less why don't we > change the default behavior always with no experiment? The reason to not change the behavior without the experiment is because that will prevent us from knowing how much of a difference the fix made. We think it's valuable to have solid numbers so that we can properly prioritize other hang-related fixes. As for the change itself, I've looked at some hang stacks with Patrick and am now less convinced that we should make this change. ;-) Some of the stacks make it clear that CONTINUE_ON_SHUTDOWN will lead to a sub-optimal user experience. I've updated the bug accordingly.
Given grt's comment can we close this out now? On Tue, May 3, 2016 at 7:59 AM, <grt@chromium.org> wrote: > On 2016/05/02 22:53:36, sky wrote: >> On 2016/05/02 20:49:47, grt wrote: >> > On 2016/05/02 20:30:53, sky wrote: >> > > On 2016/05/02 20:23:51, sky wrote: >> > > > On 2016/05/02 19:36:51, Patrick Monette wrote: >> > > > > On 2016/05/02 15:14:17, sky wrote: >> > > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> >> wrote: >> > > > > > > On 2016/04/29 17:32:33, sky wrote: >> > > > > > >> Why are we putting this behind an experiment? >> > > > > > > >> > > > > > > We've identified the shutdown behavior for this task as a >> > > > > > > probable >> > cause >> > > > of >> > > > > > > unclean shutdowns and want to validate that hypothesis through >> > > > > > > an >> > > > > > > experiment. >> > > > > > > Simply looking at CPM wouldn't allow validation due to high > version >> to >> > > > > > > version >> > > > > > > noise. >> > > > > > >> > > > > > This seems like an awful way to draw conclusions about a >> > > > > > particular >> > > > > > code path. If we think open item is the culprit why can't we >> > > > > > investigate that code path to figure out what it may be doing? >> > > > > > >> > > > > It's because we are not trying to validate if the fix is correct, >> > > > > we > are >> > > > trying >> > > > > to validate the impact of that change. ("We" here means manzagop, > siggi >> > and >> > > > I.) >> > > > > >> > > > > As part of our investigation on browser hangs, we're received a >> > > > > large >> > number >> > > > of >> > > > > hang >> > > > > reports and we are interested in correlating the number of report >> received >> > > to >> > > > a >> > > > > CPM >> > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang >> reports >> > > > > directly >> > > > > translate to Y CPM. >> > > > > >> > > > > With this correlation, we could more easily convince other people >> > > > > that >> > their >> > > > > code is responsible for >> > > > > a number of hangs (with a precise number) and that they have to >> > > > > fix > it. >> > > > > >> > > > > The end goal for our project would be to have hang crash reports >> > > > > to be >> > > triaged >> > > > > the same way a normal crash is, as fixing all hangs is too big of >> > > > > a >> > problem >> > > > for >> > > > > one team to fix. >> > > > > Our reports are not sufficiently actionable for this yet. >> > > > > > >> > > > > > >> Also, do we have test coverage for both ways? >> > > > > > > >> > > > > > > We don't and I'm not sure there is a good way to test for >> > > > > > > that. > The >> > > > current >> > > > > > > unit >> > > > > > > tests doesn't test the shutdown behavior (always expect the >> > > > > > > task > to >> > run >> > > > > > > fully) >> > > > > > > so running a parameterized test on the value of shutdown_mode >> > > > > > > will >> not >> > > > > > > execute >> > > > > > > any different code path. >> > > > > > >> > > > > > Why can't we right a test for this? Perhaps a browser test is >> > > > > > needed >> > > > > > to get the right coverage? >> > > > > > >> > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN >> > > > > is >> > already >> > > > > tested >> > > > > in sequenced_worker_pool_unittests.cc. I think we should trust >> > > > > that > this >> > is >> > > > > doing >> > > > > the right thing already. >> > > > >> > > > I'm asking if we have coverage of OpenItem with the different >> > > > shutdown >> > > behavior. >> > > > >> > > > > > >> > > > > > -Scott >> > > > > > >> > > > > > -- >> > > > > > You received this message because you are subscribed to the >> > > > > > Google >> > Groups >> > > > > > "Chromium-reviews" group. >> > > > > > To unsubscribe from this group and stop receiving emails from >> > > > > > it, > send >> > an >> > > > > email >> > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org. >> > > > > > >> > > > > >> > > > > Let me know if you still have more questions. >> > > >> > > And again, we shouldn't be doing an experiment for this. We should >> understand >> > > what is going wrong and fix the bug. >> > >> > Hi Scott. I've been sitting near these guys, so I hope I can add a >> > little >> color >> > to help explain why they're doing what they're doing. The goal of the >> experiment >> > is not to figure out how to fix this bug, but rather to see how CPM >> > changes >> when >> > a collection of shutdown blockers aren't blocking shutdown. Patrick and >> > P-A >> have >> > collected hang reports demonstrating that this task (among others) is > blocking >> > shutdown in certain cases. If I had to take a wild guess, I'd say >> ShellExecuteEx >> > blocks while the user interacts with one of the various Windows security >> dialogs >> > (e.g., "this came from the interwebs, are you sure you want to run >> > it?"). > This >> > isn't what this particular CL is addressing, though. This CL is for a >> temporary >> > experiment on canary and dev to answer the question "how much does CPM >> > move >> when >> > a group of shutdown blockers don't block shutdown." In the long run, >> > this >> > experiment will be removed and we'll find and test a proper fix for the >> > bug. >> >> I don't think experiments were intended for this. None-the-less why don't >> we >> change the default behavior always with no experiment? > > The reason to not change the behavior without the experiment is because that > will prevent us from knowing how much of a difference the fix made. We think > it's valuable to have solid numbers so that we can properly prioritize other > hang-related fixes. > > As for the change itself, I've looked at some hang stacks with Patrick and > am > now less convinced that we should make this change. ;-) Some of the stacks > make > it clear that CONTINUE_ON_SHUTDOWN will lead to a sub-optimal user > experience. > I've updated the bug accordingly. > > https://codereview.chromium.org/1912303002/ -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Message was sent while issue was closed.
On 2016/05/03 16:41:43, sky wrote: > Given grt's comment can we close this out now? > > On Tue, May 3, 2016 at 7:59 AM, <mailto:grt@chromium.org> wrote: > > On 2016/05/02 22:53:36, sky wrote: > >> On 2016/05/02 20:49:47, grt wrote: > >> > On 2016/05/02 20:30:53, sky wrote: > >> > > On 2016/05/02 20:23:51, sky wrote: > >> > > > On 2016/05/02 19:36:51, Patrick Monette wrote: > >> > > > > On 2016/05/02 15:14:17, sky wrote: > >> > > > > > On Fri, Apr 29, 2016 at 1:18 PM, <mailto:pmonette@chromium.org> > >> wrote: > >> > > > > > > On 2016/04/29 17:32:33, sky wrote: > >> > > > > > >> Why are we putting this behind an experiment? > >> > > > > > > > >> > > > > > > We've identified the shutdown behavior for this task as a > >> > > > > > > probable > >> > cause > >> > > > of > >> > > > > > > unclean shutdowns and want to validate that hypothesis through > >> > > > > > > an > >> > > > > > > experiment. > >> > > > > > > Simply looking at CPM wouldn't allow validation due to high > > version > >> to > >> > > > > > > version > >> > > > > > > noise. > >> > > > > > > >> > > > > > This seems like an awful way to draw conclusions about a > >> > > > > > particular > >> > > > > > code path. If we think open item is the culprit why can't we > >> > > > > > investigate that code path to figure out what it may be doing? > >> > > > > > > >> > > > > It's because we are not trying to validate if the fix is correct, > >> > > > > we > > are > >> > > > trying > >> > > > > to validate the impact of that change. ("We" here means manzagop, > > siggi > >> > and > >> > > > I.) > >> > > > > > >> > > > > As part of our investigation on browser hangs, we're received a > >> > > > > large > >> > number > >> > > > of > >> > > > > hang > >> > > > > reports and we are interested in correlating the number of report > >> received > >> > > to > >> > > > a > >> > > > > CPM > >> > > > > number. e.g. A bucket (bucketed by magic signature) of X% of hang > >> reports > >> > > > > directly > >> > > > > translate to Y CPM. > >> > > > > > >> > > > > With this correlation, we could more easily convince other people > >> > > > > that > >> > their > >> > > > > code is responsible for > >> > > > > a number of hangs (with a precise number) and that they have to > >> > > > > fix > > it. > >> > > > > > >> > > > > The end goal for our project would be to have hang crash reports > >> > > > > to be > >> > > triaged > >> > > > > the same way a normal crash is, as fixing all hangs is too big of > >> > > > > a > >> > problem > >> > > > for > >> > > > > one team to fix. > >> > > > > Our reports are not sufficiently actionable for this yet. > >> > > > > > > >> > > > > > >> Also, do we have test coverage for both ways? > >> > > > > > > > >> > > > > > > We don't and I'm not sure there is a good way to test for > >> > > > > > > that. > > The > >> > > > current > >> > > > > > > unit > >> > > > > > > tests doesn't test the shutdown behavior (always expect the > >> > > > > > > task > > to > >> > run > >> > > > > > > fully) > >> > > > > > > so running a parameterized test on the value of shutdown_mode > >> > > > > > > will > >> not > >> > > > > > > execute > >> > > > > > > any different code path. > >> > > > > > > >> > > > > > Why can't we right a test for this? Perhaps a browser test is > >> > > > > > needed > >> > > > > > to get the right coverage? > >> > > > > > > >> > > > > The difference between CONTINUE_ON_SHUTDOWN and SKIP_ON_SHUTDOWN > >> > > > > is > >> > already > >> > > > > tested > >> > > > > in sequenced_worker_pool_unittests.cc. I think we should trust > >> > > > > that > > this > >> > is > >> > > > > doing > >> > > > > the right thing already. > >> > > > > >> > > > I'm asking if we have coverage of OpenItem with the different > >> > > > shutdown > >> > > behavior. > >> > > > > >> > > > > > > >> > > > > > -Scott > >> > > > > > > >> > > > > > -- > >> > > > > > You received this message because you are subscribed to the > >> > > > > > Google > >> > Groups > >> > > > > > "Chromium-reviews" group. > >> > > > > > To unsubscribe from this group and stop receiving emails from > >> > > > > > it, > > send > >> > an > >> > > > > email > >> > > > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > >> > > > > > > >> > > > > > >> > > > > Let me know if you still have more questions. > >> > > > >> > > And again, we shouldn't be doing an experiment for this. We should > >> understand > >> > > what is going wrong and fix the bug. > >> > > >> > Hi Scott. I've been sitting near these guys, so I hope I can add a > >> > little > >> color > >> > to help explain why they're doing what they're doing. The goal of the > >> experiment > >> > is not to figure out how to fix this bug, but rather to see how CPM > >> > changes > >> when > >> > a collection of shutdown blockers aren't blocking shutdown. Patrick and > >> > P-A > >> have > >> > collected hang reports demonstrating that this task (among others) is > > blocking > >> > shutdown in certain cases. If I had to take a wild guess, I'd say > >> ShellExecuteEx > >> > blocks while the user interacts with one of the various Windows security > >> dialogs > >> > (e.g., "this came from the interwebs, are you sure you want to run > >> > it?"). > > This > >> > isn't what this particular CL is addressing, though. This CL is for a > >> temporary > >> > experiment on canary and dev to answer the question "how much does CPM > >> > move > >> when > >> > a group of shutdown blockers don't block shutdown." In the long run, > >> > this > >> > experiment will be removed and we'll find and test a proper fix for the > >> > bug. > >> > >> I don't think experiments were intended for this. None-the-less why don't > >> we > >> change the default behavior always with no experiment? > > > > The reason to not change the behavior without the experiment is because that > > will prevent us from knowing how much of a difference the fix made. We think > > it's valuable to have solid numbers so that we can properly prioritize other > > hang-related fixes. > > > > As for the change itself, I've looked at some hang stacks with Patrick and > > am > > now less convinced that we should make this change. ;-) Some of the stacks > > make > > it clear that CONTINUE_ON_SHUTDOWN will lead to a sub-optimal user > > experience. > > I've updated the bug accordingly. > > > > https://codereview.chromium.org/1912303002/ > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. > Done. |