Created: 6 years, 3 months ago by Sergiy Byelozyorov
Modified: 6 years, 3 months ago
CC: chromium-reviews
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Project: chromium
Visibility: Public.
Description
Implemented a flaky test runner for auto-bisect bot.
BUG=387410
R=qyearsley@chromium.org
Committed: https://crrev.com/923f4e6a68e610a4d83fda194815f760461aa7dd
Cr-Commit-Position: refs/heads/master@{#294858}
Patch Set 1
Patch Set 2: Replaced optparse with argparse
Patch Set 3: There was more to it than just changing optparse to argparse :-)
Patch Set 4: Removed debug statement
Patch Set 5
Patch Set 6
Patch Set 7
Total comments: 11
Patch Set 8: Review
Total comments: 4
Patch Set 9
Total comments: 6
Messages
Total messages: 14 (2 generated)
PTAL
Nice :-)

I just tried this out with a script that contained:

  #!/usr/bin/env python
  """Randomly exits with a non-zero return code some percentage of the time."""

  import random
  import sys

  def main():
    return 1 if random.random() > 0.5 else 0

  if __name__ == '__main__':
    sys.exit(main())

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:2: # Copyright (c) 2011 The Chromium Authors. All rights reserved.
Copyright 2014

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:7: if the flakiness is higher than the specified threshold."""
Maybe "failure rate" seems like a better-defined term than "flakiness"?

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:15: parser = argparse.ArgumentParser(description=sys.modules['__main__'].__doc__)
What's the difference between `sys.modules['__main__'].__doc__` and just `__doc__`?

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:60: flakiness = 0
So 100% failure is defined as having a flakiness of 0, but 1 failure out of 1000 is a flakiness of 0.999?

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / options.retries
num_failed and options.retries are both ints here, so integer division is used.
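[Editor's note: a minimal standalone sketch of the integer-division pitfall raised in the last comment above. The variable names mirror the script under review, and Python 2 semantics are assumed.]

  # Python 2: '/' applied to two ints performs integer (floor) division.
  num_failed = 7
  retries = 100
  print num_failed / retries         # 0 -- the fractional part is lost
  print num_failed / float(retries)  # 0.07 -- coerce one operand to float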
https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:60: flakiness = 0
On 2014/09/12 16:42:59, qyearsley wrote:
> So 100% failure is defined as having a flakiness of 0, but 1 failure out of 1000
> is a flakiness of 0.999?

I meant to write "999 failures out of 1000 has a flakiness of 0.999"...
https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:2: # Copyright (c) 2011 The Chromium Authors. All rights reserved.
On 2014/09/12 16:42:59, qyearsley wrote:
> Copyright 2014
Done.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:7: if the flakiness is higher than the specified threshold."""
On 2014/09/12 16:42:59, qyearsley wrote:
> Maybe "failure rate" seems like a better-defined term than "flakiness"?
Done.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:15: parser = argparse.ArgumentParser(description=sys.modules['__main__'].__doc__)
On 2014/09/12 16:42:59, qyearsley wrote:
> What's the difference between `sys.modules['__main__'].__doc__` and just
> `__doc__`?
Done.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:60: flakiness = 0
On 2014/09/12 16:44:26, qyearsley wrote:
> On 2014/09/12 16:42:59, qyearsley wrote:
> > So 100% failure is defined as having a flakiness of 0, but 1 failure out of
> 1000
> > is a flakiness of 0.999?
>
> I meant to write "999 failures out of 1000 has a flakiness of 0.999"...

Yes, if a test is failing all the time, then it is a failing test, not a flaky test. During bisect we should skip over commits that have broken the test. For example, these are the numbers of failures (out of 1000 runs) that we may get for a sequence of commits:

  10 14 12 30 1000 1000 12 21 45 312 231 343 342

Clearly the test was broken at the 5th commit and fixed at the 7th. However, at the 10th commit it became very flaky (>5%), and bisect should look for this commit. Another example is when a test was broken and then fixed, but badly:

  10 14 12 30 1000 1000 312 231 343 342

In this case the script will correctly identify the 7th commit, rather than the 5th, as the one that introduced flakiness. A failure-based bisect, however, would find the 5th commit.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / options.retries
On 2014/09/12 16:42:59, qyearsley wrote:
> num_failed and options.retries are both ints here, so integer division is used.
Nice catch. Done.
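[Editor's note: a sketch of the bisect-skip behavior described above. The threshold values, names, and classification function are illustrative assumptions, not the actual bisect code.]

  # Illustrative only: classify each commit by its observed failure rate.
  RETRIES = 1000
  FLAKY_THRESHOLD = 0.05   # >5% failures => flaky; bisect should hunt for this

  def classify(num_failed):
    rate = num_failed / float(RETRIES)
    if rate >= 1.0:
      return 'broken'  # fails every run; skip over during a flakiness bisect
    return 'flaky' if rate > FLAKY_THRESHOLD else 'good'

  failures = [10, 14, 12, 30, 1000, 1000, 12, 21, 45, 312, 231, 343, 342]
  print [classify(n) for n in failures]
  # ['good', 'good', 'good', 'good', 'broken', 'broken', 'good', 'good',
  #  'good', 'flaky', 'flaky', 'flaky', 'flaky']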
lgtm

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:29: running[:] = [p for p in running if p.poll() is None]
What's the difference between using slice assignment here and not? i.e. what's the difference between the following two lines?

  running[:] = [p for p in running if p.poll() is None]
  running = [p for p in running if p.poll() is None]

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / (options.retries * 1.0)
Equivalently, float(options.retries) would do the same thing.
https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:29: running[:] = [p for p in running if p.poll() is None]
On 2014/09/15 16:02:28, qyearsley wrote:
> What's the difference between using slice assignment here and not? i.e. what's
> the difference between the following two lines?
>
> running[:] = [p for p in running if p.poll() is None]
> running = [p for p in running if p.poll() is None]

Slice assignment modifies the list outside of the function too. A plain assignment would just rebind the parameter name to a different object, while in main the list of running jobs would still contain jobs that are finished (and thus the loop would never quit).

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / (options.retries * 1.0)
On 2014/09/15 16:02:28, qyearsley wrote:
> Equivalently, float(options.retries) would do the same thing.
Done.
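[Editor's note: a minimal standalone demonstration of the slice-assignment point above, using plain Python with made-up names.]

  def prune_in_place(jobs):
    jobs[:] = [j for j in jobs if j != 'done']  # mutates the caller's list

  def prune_rebound(jobs):
    jobs = [j for j in jobs if j != 'done']     # rebinds the local name only

  running = ['done', 'active']
  prune_rebound(running)
  print running  # ['done', 'active'] -- caller's list is unchanged
  prune_in_place(running)
  print running  # ['active'] -- caller sees the pruned list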
The CQ bit was checked by sergiyb@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patchset/563243002/160001
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as 6dddbe7fe09c030015e3b2f3f1d2851cf08544b7
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/923f4e6a68e610a4d83fda194815f760461aa7dd Cr-Commit-Position: refs/heads/master@{#294858}
Message was sent while issue was closed.
ojan@chromium.org changed reviewers: + ojan@chromium.org
Message was sent while issue was closed.
Please add tests. All new code should have tests. That's harder to ask for buildbot code, but this code is all new and doesn't depend on any libraries. It should have 100% test coverage.

Also, please CC me on patches related to auto-bisect so I can do these reviews before the patch lands.

Lastly, can we focus our efforts on getting non-flaky bisect working first? I'm worried this is distracting from that effort and this work won't be useful until we have the non-flaky bisect working.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:23: parser.add_argument('command', nargs='+', help='Command to run test.')
This is a good start. Eventually we're going to need something more complicated for tests that are flaky only when run in conjunction with other tests. But we don't need to worry about that until we have at least the basic system in place.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:45: running.append(subprocess.Popen(options.command, stdout=subprocess.PIPE,
Use multiprocessing.Pool. That will replace a good chunk of the code in this file.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:50: num_passed, num_failed = process_finished(running, num_passed, num_failed)
I think this code would be easier to follow if you didn't pass num_passed/num_failed in. Instead just return num_passed/num_failed from process_finished and add to the local num_passed/num_failed here.
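[Editor's note: a rough sketch of what the suggested Pool-based rewrite might look like. The shape and names here are assumptions, not the code that actually landed in the follow-up CL.]

  import multiprocessing
  import subprocess

  def run_once(command):
    # Returns True when one invocation of the test command succeeds.
    return subprocess.call(command) == 0

  def failure_rate(command, retries, jobs):
    # Pool.map replaces the hand-rolled Popen bookkeeping: it runs the
    # command |retries| times across |jobs| worker processes.
    pool = multiprocessing.Pool(processes=jobs)
    results = pool.map(run_once, [command] * retries)
    pool.close()
    pool.join()
    return results.count(False) / float(retries)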
Message was sent while issue was closed.
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> Please add tests. All new code should have tests. That's harder to ask for
> buildbot code, but this code is all new and doesn't depend on any libraries. It
> should have 100% test coverage.
>
> Also, please CC me on patches related to auto-bisect so I can do these reviews
> before the patch lands.
>
> Lastly, can we focus our efforts on getting non-flaky bisect working first? I'm
> worried this is distracting from that effort and this work won't be useful until
> we have the non-flaky bisect working.
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> File tools/flakiness/is_flaky_test.py (right):
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> tools/flakiness/is_flaky_test.py:23: parser.add_argument('command', nargs='+',
> help='Command to run test.')
> This is a good start. Eventually we're going to need something more complicated
> for tests that are flaky only when run in conjunction with other tests. But we
> don't need to worry about that until we have at least the basic system in place.
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> tools/flakiness/is_flaky_test.py:45:
> running.append(subprocess.Popen(options.command, stdout=subprocess.PIPE,
> Use multiprocessing.Pool. That will replace a good chunk of the code in this
> file.
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> tools/flakiness/is_flaky_test.py:50: num_passed, num_failed =
> process_finished(running, num_passed, num_failed)
> I think this code would be easier to follow if you didn't pass
> num_passed/num_failed in. Instead just return num_passed/num_failed from
> process_finished and add to the local num_passed/num_failed here.

Good ideas -- since this CL has been submitted, I guess now Sergiy can add a follow-up CL that adds the unit test and makes other changes?
Message was sent while issue was closed.
Added tests and rewrote using multiprocessing. See https://codereview.chromium.org/560123004.

On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> Lastly, can we focus our efforts on getting non-flaky bisect working first? I'm
> worried this is distracting from that effort and this work won't be useful until
> we have the non-flaky bisect working.

According to qyearsley@, the auto-bisect project is sufficiently staffed - there are no major pieces of work to be done that are not already being done by someone else. Apart from that, for CQ we need to build tools to reduce flakiness, and this is high priority as it blocks other CY pillars.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:23: parser.add_argument('command', nargs='+', help='Command to run test.')
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> This is a good start. Eventually we're going to need something more complicated
> for tests that are flaky only when run in conjunction with other tests. But we
> don't need to worry about that until we have at least the basic system in place.
Acknowledged.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:45: running.append(subprocess.Popen(options.command, stdout=subprocess.PIPE,
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> Use multiprocessing.Pool. That will replace a good chunk of the code in this
> file.
Done.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:50: num_passed, num_failed = process_finished(running, num_passed, num_failed)
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> I think this code would be easier to follow if you didn't pass
> num_passed/num_failed in. Instead just return num_passed/num_failed from
> process_finished and add to the local num_passed/num_failed here.
Acknowledged.
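[Editor's note: as a closing illustration of the last suggestion, the process_finished refactor could look roughly like the sketch below. The names are assumptions carried over from the review comments, not the actual follow-up CL.]

  def process_finished(running):
    # Prunes completed jobs from |running| in place and returns the
    # (num_passed, num_failed) counts for just those jobs.
    still_running, finished = [], []
    for p in running:
      (finished if p.poll() is not None else still_running).append(p)
    running[:] = still_running
    num_passed = sum(1 for p in finished if p.returncode == 0)
    return num_passed, len(finished) - num_passed

  # The caller then accumulates locally instead of threading counters through:
  #   delta_passed, delta_failed = process_finished(running)
  #   num_passed += delta_passed
  #   num_failed += delta_failed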