Issue 10022008: Fix flakiness in MetricsServiceTest.CrashRenderers

Ilya Sherman

8 years, 8 months ago (2012-04-07 01:22:09 UTC) #1

jar (doing other things)

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc#newcode88 chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); This pattern always scares me... as it is ...

8 years, 8 months ago (2012-04-07 02:46:48 UTC) #2

Ilya Sherman

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc#newcode88 chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); On 2012/04/07 02:46:48, jar wrote: > This pattern ...

8 years, 8 months ago (2012-04-10 01:00:25 UTC) #3

jar (doing other things)

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc#newcode88 chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); The reason I was after outputting something is ...

8 years, 8 months ago (2012-04-10 01:32:21 UTC) #4

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics...
File chrome/browser/metrics/metrics_service_browsertest.cc (right):

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics...
chrome/browser/metrics/metrics_service_browsertest.cc:88:
ui_test_utils::RunAllPendingInMessageLoop();
The reason I was after outputting something is that IF we get killed by the
infrastructure for "taking too long" we might not get to tell anyone that it was
*this* test that hung.  So long as you're sure we can tell it was *this* test
that fails (since we're designing it to fail via a multi-minute infrastructure
kill), I'm fine with it as is.  I just hate when I get some funky "browser test
failed and timed out" but I don't know what caused it :-(.

On 2012/04/10 01:00:25, Ilya Sherman wrote:
> On 2012/04/07 02:46:48, jar wrote:
> > This pattern always scares me... as it is most commonly used to run a nested
> > message loop :-(.... which is evil.  In this case, I *think* you are not
> nested,
> > so I'm less scared... but..
> > 
> > In this case, you've indicated there is probably a race for two threads to
> post
> > into the message loop, and one caused the message loop to terminate,
> presumably
> > before the other one got its (notification) message posted.
> > 
> > Sadly, this RunAllPendingInMessageLoop() *might* execute before the delayed
> > notification message arrives, if your race scenario is correct.  <sigh>
> > 
> > How about looping, calling RunAllPendingInMessageLoop() again and again,
until
> > we get the expected kStabilitityLaunchCount (if that is the one we are
> expecting
> > to be slow to grow?  If we printed out each time with cycled, then we'd only
> > fail if we never got the notification... and the we'd fail in a specific
test,
> > with a LOT of output (meaning it would make it to the log ;-) ).
> > 
> > WDYT?
> > 
> > Am I missing something??
> 
> I think you're absolutely right.  Added the loop.  I don't think adding extra
> logging is very helpful though... if the crash count is never incremented, the
> test will eventually time out, and we'll know that this hasn't fully fixed the
> race.  I'm not sure what we'd gain from knowing how many times we looped
before
> timing out.

Ilya Sherman

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc File chrome/browser/metrics/metrics_service_browsertest.cc (right): https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics/metrics_service_browsertest.cc#newcode88 chrome/browser/metrics/metrics_service_browsertest.cc:88: ui_test_utils::RunAllPendingInMessageLoop(); On 2012/04/10 01:32:22, jar wrote: > The reason ...

8 years, 8 months ago (2012-04-10 01:33:53 UTC) #5

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics...
File chrome/browser/metrics/metrics_service_browsertest.cc (right):

https://chromiumcodereview.appspot.com/10022008/diff/1/chrome/browser/metrics...
chrome/browser/metrics/metrics_service_browsertest.cc:88:
ui_test_utils::RunAllPendingInMessageLoop();
On 2012/04/10 01:32:22, jar wrote:
> The reason I was after outputting something is that IF we get killed by the
> infrastructure for "taking too long" we might not get to tell anyone that it
was
> *this* test that hung.  So long as you're sure we can tell it was *this* test
> that fails (since we're designing it to fail via a multi-minute infrastructure
> kill), I'm fine with it as is.  I just hate when I get some funky "browser
test
> failed and timed out" but I don't know what caused it :-(.

Yes, the test logs definitely clearly indicate which test it was that timed out.
 That information doesn't always successfully make it into the failure summary,
but if you look at the logs, it's quite clear :)

> On 2012/04/10 01:00:25, Ilya Sherman wrote:
> > On 2012/04/07 02:46:48, jar wrote:
> > > This pattern always scares me... as it is most commonly used to run a
nested
> > > message loop :-(.... which is evil.  In this case, I *think* you are not
> > nested,
> > > so I'm less scared... but..
> > > 
> > > In this case, you've indicated there is probably a race for two threads to
> > post
> > > into the message loop, and one caused the message loop to terminate,
> > presumably
> > > before the other one got its (notification) message posted.
> > > 
> > > Sadly, this RunAllPendingInMessageLoop() *might* execute before the
delayed
> > > notification message arrives, if your race scenario is correct.  <sigh>
> > > 
> > > How about looping, calling RunAllPendingInMessageLoop() again and again,
> until
> > > we get the expected kStabilitityLaunchCount (if that is the one we are
> > expecting
> > > to be slow to grow?  If we printed out each time with cycled, then we'd
only
> > > fail if we never got the notification... and the we'd fail in a specific
> test,
> > > with a LOT of output (meaning it would make it to the log ;-) ).
> > > 
> > > WDYT?
> > > 
> > > Am I missing something??
> > 
> > I think you're absolutely right.  Added the loop.  I don't think adding
extra
> > logging is very helpful though... if the crash count is never incremented,
the
> > test will eventually time out, and we'll know that this hasn't fully fixed
the
> > race.  I'm not sure what we'd gain from knowing how many times we looped
> before
> > timing out.
>

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/isherman@chromium.org/10022008/8001

8 years, 8 months ago (2012-04-10 01:52:15 UTC) #7

Change committed as 131534

Issue 10022008: Fix flakiness in MetricsServiceTest.CrashRenderers (Closed)

Description

Patch Set 1 #

Patch Set 2 : Loop to avoid the race condition #

Messages