|
|
Created:
6 years, 5 months ago by Sergey Berezin Modified:
6 years, 5 months ago CC:
chromium-reviews, erikwright+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Project:
chromium Visibility:
Public. |
DescriptionTemporarily increase retry limit in test_launcher
In bot mode, test_launcher would normally retry tests up to 3 times.
This is sufficient to keep large test suites like browser_tests under
1% flakiness rate if the flakiness of individual tests is also within
1%. However, some tests especially in browser_tests are up to 33%
flaky, making the browser_tests suite 40% flaky.
Subsequent top-level retries effectively double the runtime of all
bots running these tests, and we need a solution to shorten the try
job runtime.
A longer-term solution is the ignorer bot (see the bug), but until
it's deployed, a quick hack is to increase the number of retries so
statistically we should see about 1% flakiness of browser_tests at the
expense of a slight runtime increase.
R=sergiyb@chromium.org
BUG=390600, 395189
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=284650
Patch Set 1 #
Messages
Total messages: 21 (0 generated)
PTAL. I'll run a few bots with browser_tests to see how this behaves before trying to commit.
lgtm
+agable for rubber stamp, please
On 2014/07/17 18:17:54, Sergey Berezin wrote: > +agable for rubber stamp, please As part of the code yellow, we're trying to get rid of retries, not put even more of them in place. I realize that this reduces the likelihood of having to retry the entire browser_tests binary, but it still feels like a step in the wrong direction. Does this CL have buy-in from phajdan@, or people leading aspects of the code yellow effort? At the very least, I wouldn't approve this CL unless the number of times a given test ran is part of the information reported to the flakiness dashboard. (I don't know if it is or not -- please inform me.) Either way, this is going to make truly failing tests take a heck of a lot longer to fail :(
On 2014/07/17 23:37:03, agable wrote: > On 2014/07/17 18:17:54, Sergey Berezin wrote: > > +agable for rubber stamp, please > > As part of the code yellow, we're trying to get rid of retries, not put even > more of them in place. I realize that this reduces the likelihood of having to > retry the entire browser_tests binary, but it still feels like a step in the > wrong direction. Does this CL have buy-in from phajdan@, or people leading > aspects of the code yellow effort? > > At the very least, I wouldn't approve this CL unless the number of times a given > test ran is part of the information reported to the flakiness dashboard. (I > don't know if it is or not -- please inform me.) > > Either way, this is going to make truly failing tests take a heck of a lot > longer to fail :( phajdan.jr is OOO till Monday. Regardless, I'm basing this CL on the data we've been collecting for the past few weeks. Some facts: 1. If more than 10% of tests fail in a suite / shard, they are not retried, and the entire suite / shard fails. 2. Each test normally runs under 2 sec, only a handful are about 10 sec. Assuming the worst, we'll retry 10% of tests 12 times, resulting in additional 120% runtime increase. That is, a failing browser_test step will take 80-100 minutes instead of 40 min in the worst case. My preliminary data suggests that less than 10% of CLs will fail because of browser_tests, and most of those failures will not hit the worst case degradation. 3. Currently, browser_tests flakiness is about 40%, and bots like win_chromium_rel are at ~20% flakiness total. This means 40% of good CLs now take 2x time retrying the entire suite. This change should bring browser_tests flakiness down to ~1%, reducing latency for these 40%. So, it's 40% of good CLs taking 2x time vs. <10% of failing CLs taking slightly longer, and only in the worst case 2x. On top of it, the ignorer bot will be online in a week or two, this change will be reverted and we'll have the same benefit without any extra penalty.
On 2014/07/18 00:05:22, Sergey Berezin wrote: > On 2014/07/17 23:37:03, agable wrote: > > On 2014/07/17 18:17:54, Sergey Berezin wrote: > > > +agable for rubber stamp, please > > > > As part of the code yellow, we're trying to get rid of retries, not put even > > more of them in place. I realize that this reduces the likelihood of having to > > retry the entire browser_tests binary, but it still feels like a step in the > > wrong direction. Does this CL have buy-in from phajdan@, or people leading > > aspects of the code yellow effort? > > > > At the very least, I wouldn't approve this CL unless the number of times a > given > > test ran is part of the information reported to the flakiness dashboard. (I > > don't know if it is or not -- please inform me.) > > > > Either way, this is going to make truly failing tests take a heck of a lot > > longer to fail :( > > phajdan.jr is OOO till Monday. Regardless, I'm basing this CL on the data we've > been collecting for the past few weeks. Some facts: > > 1. If more than 10% of tests fail in a suite / shard, they are not retried, and > the entire suite / shard fails. > 2. Each test normally runs under 2 sec, only a handful are about 10 sec. > Assuming the worst, we'll retry 10% of tests 12 times, resulting in additional > 120% runtime increase. That is, a failing browser_test step will take 80-100 > minutes instead of 40 min in the worst case. My preliminary data suggests that > less than 10% of CLs will fail because of browser_tests, and most of those > failures will not hit the worst case degradation. > 3. Currently, browser_tests flakiness is about 40%, and bots like > win_chromium_rel are at ~20% flakiness total. This means 40% of good CLs now > take 2x time retrying the entire suite. This change should bring browser_tests > flakiness down to ~1%, reducing latency for these 40%. > > So, it's 40% of good CLs taking 2x time vs. <10% of failing CLs taking slightly > longer, and only in the worst case 2x. > > On top of it, the ignorer bot will be online in a week or two, this change will > be reverted and we'll have the same benefit without any extra penalty. LGTM as a temporary measure. Please link to a bug specifically to revert this CL, so that we're tracking the fact that this is temporary with more than just a TODO.
The bug to revert it is http://crbug.com/395189 On 2014/07/18 15:28:07, agable wrote: > LGTM as a temporary measure. Please link to a bug specifically to revert this > CL, so that we're tracking the fact that this is temporary with more than just a > TODO.
The CQ bit was checked by sergeyberezin@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sergeyberezin@chromium.org/393283006/1
+phajdan.jr for OWNERS stamp.
FYI, CQ is re-trying this CL (attempt #1). The failing builders are: chromium_presubmit on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/chromium_presubmit/bu...)
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/chromium_presubmit/bu...)
The CQ bit was checked by sergiyb@chromium.org
The CQ bit was unchecked by sergiyb@chromium.org
LGTM
The CQ bit was checked by phajdan.jr@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sergeyberezin@chromium.org/393283006/1
FYI, CQ is re-trying this CL (attempt #1). The failing builders are: win_gpu_triggered_tests on tryserver.chromium.gpu (http://build.chromium.org/p/tryserver.chromium.gpu/builders/win_gpu_triggered...)
Message was sent while issue was closed.
Change committed as 284650
Message was sent while issue was closed.
On 2014/07/18 00:05:22, Sergey Berezin wrote: > On 2014/07/17 23:37:03, agable wrote: > > On 2014/07/17 18:17:54, Sergey Berezin wrote: > > > +agable for rubber stamp, please > > > > As part of the code yellow, we're trying to get rid of retries, not put even > > more of them in place. I realize that this reduces the likelihood of having to > > retry the entire browser_tests binary, but it still feels like a step in the > > wrong direction. Does this CL have buy-in from phajdan@, or people leading > > aspects of the code yellow effort? > > > > At the very least, I wouldn't approve this CL unless the number of times a > given > > test ran is part of the information reported to the flakiness dashboard. (I > > don't know if it is or not -- please inform me.) > > > > Either way, this is going to make truly failing tests take a heck of a lot > > longer to fail :( > > phajdan.jr is OOO till Monday. Regardless, I'm basing this CL on the data we've > been collecting for the past few weeks. Some facts: > > 1. If more than 10% of tests fail in a suite / shard, they are not retried, and > the entire suite / shard fails. > 2. Each test normally runs under 2 sec, only a handful are about 10 sec. > Assuming the worst, we'll retry 10% of tests 12 times, resulting in additional > 120% runtime increase. That is, a failing browser_test step will take 80-100 > minutes instead of 40 min in the worst case. My preliminary data suggests that > less than 10% of CLs will fail because of browser_tests, and most of those > failures will not hit the worst case degradation. > 3. Currently, browser_tests flakiness is about 40%, Where is this number from? I haven't seen that before, and I've been looking at the try bots output for a while. I could see that this number is referring to the number of green browser_tests runs where a test failed in its first run but passed in one of the three retries. > and bots like > win_chromium_rel are at ~20% flakiness total. This means 40% of good CLs now > take 2x time retrying the entire suite. ditto, I've never seen rates that high. I think that's what we would have if browser_tests never retried, but that isn't the case now. > This change should bring browser_tests > flakiness down to ~1%, reducing latency for these 40%. > > So, it's 40% of good CLs taking 2x time vs. <10% of failing CLs taking slightly > longer, and only in the worst case 2x. > > On top of it, the ignorer bot will be online in a week or two, this change will > be reverted and we'll have the same benefit without any extra penalty. |