|
|
Created:
8 years, 8 months ago by Ilya Sherman Modified:
8 years, 8 months ago Reviewers:
jar (doing other things) CC:
chromium-reviews, MAD, Ilya Sherman, jar (doing other things) Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionFix flakiness in MetricsServiceTest.CrashRenderers
The test had a race condition because both the test code and the code being tested were listening for the same notification. The results of the test depended on the order in which these notifications were processed.
BUG=18738
TEST=none
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=131534
Patch Set 1 #
Total comments: 4
Patch Set 2 : Loop to avoid the race condition #Messages
Total messages: 8 (0 generated)
https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); This pattern always scares me... as it is most commonly used to run a nested message loop :-(.... which is evil. In this case, I *think* you are not nested, so I'm less scared... but.. In this case, you've indicated there is probably a race for two threads to post into the message loop, and one caused the message loop to terminate, presumably before the other one got its (notification) message posted. Sadly, this RunAllPendingInMessageLoop() *might* execute before the delayed notification message arrives, if your race scenario is correct. <sigh> How about looping, calling RunAllPendingInMessageLoop() again and again, until we get the expected kStabilitityLaunchCount (if that is the one we are expecting to be slow to grow? If we printed out each time with cycled, then we'd only fail if we never got the notification... and the we'd fail in a specific test, with a LOT of output (meaning it would make it to the log ;-) ). WDYT? Am I missing something??
https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); On 2012/04/07 02:46:48, jar wrote: > This pattern always scares me... as it is most commonly used to run a nested > message loop :-(.... which is evil. In this case, I *think* you are not nested, > so I'm less scared... but.. > > In this case, you've indicated there is probably a race for two threads to post > into the message loop, and one caused the message loop to terminate, presumably > before the other one got its (notification) message posted. > > Sadly, this RunAllPendingInMessageLoop() *might* execute before the delayed > notification message arrives, if your race scenario is correct. <sigh> > > How about looping, calling RunAllPendingInMessageLoop() again and again, until > we get the expected kStabilitityLaunchCount (if that is the one we are expecting > to be slow to grow? If we printed out each time with cycled, then we'd only > fail if we never got the notification... and the we'd fail in a specific test, > with a LOT of output (meaning it would make it to the log ;-) ). > > WDYT? > > Am I missing something?? I think you're absolutely right. Added the loop. I don't think adding extra logging is very helpful though... if the crash count is never incremented, the test will eventually time out, and we'll know that this hasn't fully fixed the race. I'm not sure what we'd gain from knowing how many times we looped before timing out.
https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); The reason I was after outputting something is that IF we get killed by the infrastructure for "taking too long" we might not get to tell anyone that it was *this* test that hung. So long as you're sure we can tell it was *this* test that fails (since we're designing it to fail via a multi-minute infrastructure kill), I'm fine with it as is. I just hate when I get some funky "browser test failed and timed out" but I don't know what caused it :-(. On 2012/04/10 01:00:25, Ilya Sherman wrote: > On 2012/04/07 02:46:48, jar wrote: > > This pattern always scares me... as it is most commonly used to run a nested > > message loop :-(.... which is evil. In this case, I *think* you are not > nested, > > so I'm less scared... but.. > > > > In this case, you've indicated there is probably a race for two threads to > post > > into the message loop, and one caused the message loop to terminate, > presumably > > before the other one got its (notification) message posted. > > > > Sadly, this RunAllPendingInMessageLoop() *might* execute before the delayed > > notification message arrives, if your race scenario is correct. <sigh> > > > > How about looping, calling RunAllPendingInMessageLoop() again and again, until > > we get the expected kStabilitityLaunchCount (if that is the one we are > expecting > > to be slow to grow? If we printed out each time with cycled, then we'd only > > fail if we never got the notification... and the we'd fail in a specific test, > > with a LOT of output (meaning it would make it to the log ;-) ). > > > > WDYT? > > > > Am I missing something?? > > I think you're absolutely right. Added the loop. I don't think adding extra > logging is very helpful though... if the crash count is never incremented, the > test will eventually time out, and we'll know that this hasn't fully fixed the > race. I'm not sure what we'd gain from knowing how many times we looped before > timing out.
https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics... chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); On 2012/04/10 01:32:22, jar wrote: > The reason I was after outputting something is that IF we get killed by the > infrastructure for "taking too long" we might not get to tell anyone that it was > *this* test that hung. So long as you're sure we can tell it was *this* test > that fails (since we're designing it to fail via a multi-minute infrastructure > kill), I'm fine with it as is. I just hate when I get some funky "browser test > failed and timed out" but I don't know what caused it :-(. Yes, the test logs definitely clearly indicate which test it was that timed out. That information doesn't always successfully make it into the failure summary, but if you look at the logs, it's quite clear :) > On 2012/04/10 01:00:25, Ilya Sherman wrote: > > On 2012/04/07 02:46:48, jar wrote: > > > This pattern always scares me... as it is most commonly used to run a nested > > > message loop :-(.... which is evil. In this case, I *think* you are not > > nested, > > > so I'm less scared... but.. > > > > > > In this case, you've indicated there is probably a race for two threads to > > post > > > into the message loop, and one caused the message loop to terminate, > > presumably > > > before the other one got its (notification) message posted. > > > > > > Sadly, this RunAllPendingInMessageLoop() *might* execute before the delayed > > > notification message arrives, if your race scenario is correct. <sigh> > > > > > > How about looping, calling RunAllPendingInMessageLoop() again and again, > until > > > we get the expected kStabilitityLaunchCount (if that is the one we are > > expecting > > > to be slow to grow? If we printed out each time with cycled, then we'd only > > > fail if we never got the notification... and the we'd fail in a specific > test, > > > with a LOT of output (meaning it would make it to the log ;-) ). > > > > > > WDYT? > > > > > > Am I missing something?? > > > > I think you're absolutely right. Added the loop. I don't think adding extra > > logging is very helpful though... if the crash count is never incremented, the > > test will eventually time out, and we'll know that this hasn't fully fixed the > > race. I'm not sure what we'd gain from knowing how many times we looped > before > > timing out. >
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/isherman@chromium.org/10022008/8001
Change committed as 131534 |