Created: 6 years, 3 months ago by Sergiy Byelozyorov
Modified: 6 years, 3 months ago
CC: chromium-reviews
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Project: chromium
Visibility: Public.
Description
Implemented a flaky test runner for auto-bisect bot.
BUG=387410
R=qyearsley@chromium.org
Committed: https://crrev.com/923f4e6a68e610a4d83fda194815f760461aa7dd
Cr-Commit-Position: refs/heads/master@{#294858}
Patch Set 1
Patch Set 2: Replaced optparse with argparse
Patch Set 3: There was more to it than just changing optparse to argparse :-)
Patch Set 4: Removed debug statement
Patch Set 5
Patch Set 6
Patch Set 7
Total comments: 11
Patch Set 8: Review
Total comments: 4
Patch Set 9
Total comments: 6
Messages
Total messages: 14 (2 generated)
PTAL
Nice :-)

I just tried this out with a script that contained:

  #!/usr/bin/env python
  """Randomly exits with a non-zero return code some percentage of the time."""

  import random
  import sys

  def main():
    return 1 if random.random() > 0.5 else 0

  if __name__ == '__main__':
    sys.exit(main())

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:2: # Copyright (c) 2011 The Chromium Authors. All rights reserved.
Copyright 2014

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:7: if the flakiness is higher than the specified threshold."""
Maybe "failure rate" seems like a better-defined term than "flakiness"?

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:15: parser = argparse.ArgumentParser(description=sys.modules['__main__'].__doc__)
What's the difference between `sys.modules['__main__'].__doc__` and just `__doc__`?

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:60: flakiness = 0
So 100% failure is defined as having a flakiness of 0, but 1 failure out of 1000 is a flakiness of 0.999?

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / options.retries
num_failed and options.retries are both ints here, so integer division is used.
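[Editor's note: a minimal standalone sketch of the integer-division pitfall raised in the last comment above. The variable names mirror the script under review, and Python 2 semantics are assumed.]

  # Python 2: '/' applied to two ints performs integer (floor) division.
  num_failed = 7
  retries = 100
  print num_failed / retries         # 0 -- the fractional part is lost
  print num_failed / float(retries)  # 0.07 -- coerce one operand to float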
https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:60: flakiness = 0
On 2014/09/12 16:42:59, qyearsley wrote:
> So 100% failure is defined as having a flakiness of 0, but 1 failure out of 1000
> is a flakiness of 0.999?

I meant to write "999 failures out of 1000 has a flakiness of 0.999"...
https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:2: # Copyright (c) 2011 The Chromium Authors. All rights reserved.
On 2014/09/12 16:42:59, qyearsley wrote:
> Copyright 2014
Done.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:7: if the flakiness is higher than the specified threshold."""
On 2014/09/12 16:42:59, qyearsley wrote:
> Maybe "failure rate" seems like a better-defined term than "flakiness"?
Done.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:15: parser = argparse.ArgumentParser(description=sys.modules['__main__'].__doc__)
On 2014/09/12 16:42:59, qyearsley wrote:
> What's the difference between `sys.modules['__main__'].__doc__` and just
> `__doc__`?
Done.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:60: flakiness = 0
On 2014/09/12 16:44:26, qyearsley wrote:
> On 2014/09/12 16:42:59, qyearsley wrote:
> > So 100% failure is defined as having a flakiness of 0, but 1 failure out of
> 1000
> > is a flakiness of 0.999?
>
> I meant to write "999 failures out of 1000 has a flakiness of 0.999"...

Yes, if a test is failing all the time, then it is a failing test, not a flaky test. During bisect we should skip over commits that have broken the test. For example, these are the numbers of failures (out of 1000 runs) that we may get for a sequence of commits:

  10 14 12 30 1000 1000 12 21 45 312 231 343 342

Clearly the test was broken at the 5th commit and fixed at the 7th. However, at the 10th commit it became very flaky (>5%), and bisect should look for this commit. Another example is when a test was broken and then fixed, but badly:

  10 14 12 30 1000 1000 312 231 343 342

In this case the script will correctly identify the 7th commit, rather than the 5th, as the one that introduced flakiness. A failure-based bisect, however, would find the 5th commit.

https://codereview.chromium.org/563243002/diff/120001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / options.retries
On 2014/09/12 16:42:59, qyearsley wrote:
> num_failed and options.retries are both ints here, so integer division is used.
Nice catch. Done.
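[Editor's note: a sketch of the bisect-skip behavior described above. The threshold values, names, and classification function are illustrative assumptions, not the actual bisect code.]

  # Illustrative only: classify each commit by its observed failure rate.
  RETRIES = 1000
  FLAKY_THRESHOLD = 0.05   # >5% failures => flaky; bisect should hunt for this

  def classify(num_failed):
    rate = num_failed / float(RETRIES)
    if rate >= 1.0:
      return 'broken'  # fails every run; skip over during a flakiness bisect
    return 'flaky' if rate > FLAKY_THRESHOLD else 'good'

  failures = [10, 14, 12, 30, 1000, 1000, 12, 21, 45, 312, 231, 343, 342]
  print [classify(n) for n in failures]
  # ['good', 'good', 'good', 'good', 'broken', 'broken', 'good', 'good',
  #  'good', 'flaky', 'flaky', 'flaky', 'flaky']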
lgtm

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:29: running[:] = [p for p in running if p.poll() is None]
What's the difference between using slice assignment here and not? i.e. what's the difference between the following two lines?

  running[:] = [p for p in running if p.poll() is None]
  running = [p for p in running if p.poll() is None]

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / (options.retries * 1.0)
Equivalently, float(options.retries) would do the same thing.
https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:29: running[:] = [p for p in running if p.poll() is None]
On 2014/09/15 16:02:28, qyearsley wrote:
> What's the difference between using slice assignment here and not? i.e. what's
> the difference between the following two lines?
>
> running[:] = [p for p in running if p.poll() is None]
> running = [p for p in running if p.poll() is None]

Slice assignment modifies the list outside of the function too. A plain assignment would just rebind the parameter name to a different object, while in main the list of running jobs would still contain jobs that are finished (and thus the loop would never quit).

https://codereview.chromium.org/563243002/diff/140001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:62: flakiness = num_failed / (options.retries * 1.0)
On 2014/09/15 16:02:28, qyearsley wrote:
> Equivalently, float(options.retries) would do the same thing.
Done.
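[Editor's note: a minimal standalone demonstration of the slice-assignment point above, using plain Python with made-up names.]

  def prune_in_place(jobs):
    jobs[:] = [j for j in jobs if j != 'done']  # mutates the caller's list

  def prune_rebound(jobs):
    jobs = [j for j in jobs if j != 'done']     # rebinds the local name only

  running = ['done', 'active']
  prune_rebound(running)
  print running  # ['done', 'active'] -- caller's list is unchanged
  prune_in_place(running)
  print running  # ['active'] -- caller sees the pruned list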
The CQ bit was checked by sergiyb@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patchset/563243002/160001
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as 6dddbe7fe09c030015e3b2f3f1d2851cf08544b7
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/923f4e6a68e610a4d83fda194815f760461aa7dd Cr-Commit-Position: refs/heads/master@{#294858}
Message was sent while issue was closed.
ojan@chromium.org changed reviewers: + ojan@chromium.org
Message was sent while issue was closed.
Please add tests. All new code should have tests. That's harder to ask for buildbot code, but this code is all new and doesn't depend on any libraries. It should have 100% test coverage.

Also, please CC me on patches related to auto-bisect so I can do these reviews before the patch lands.

Lastly, can we focus our efforts on getting non-flaky bisect working first? I'm worried this is distracting from that effort and this work won't be useful until we have the non-flaky bisect working.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:23: parser.add_argument('command', nargs='+', help='Command to run test.')
This is a good start. Eventually we're going to need something more complicated for tests that are flaky only when run in conjunction with other tests. But we don't need to worry about that until we have at least the basic system in place.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:45: running.append(subprocess.Popen(options.command, stdout=subprocess.PIPE,
Use multiprocessing.Pool. That will replace a good chunk of the code in this file.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:50: num_passed, num_failed = process_finished(running, num_passed, num_failed)
I think this code would be easier to follow if you didn't pass num_passed/num_failed in. Instead just return num_passed/num_failed from process_finished and add to the local num_passed/num_failed here.
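[Editor's note: a rough sketch of what the suggested Pool-based rewrite might look like. The shape and names here are assumptions, not the code that actually landed in the follow-up CL.]

  import multiprocessing
  import subprocess

  def run_once(command):
    # Returns True when one invocation of the test command succeeds.
    return subprocess.call(command) == 0

  def failure_rate(command, retries, jobs):
    # Pool.map replaces the hand-rolled Popen bookkeeping: it runs the
    # command |retries| times across |jobs| worker processes.
    pool = multiprocessing.Pool(processes=jobs)
    results = pool.map(run_once, [command] * retries)
    pool.close()
    pool.join()
    return results.count(False) / float(retries)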
Message was sent while issue was closed.
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> Please add tests. All new code should have tests. That's harder to ask for
> buildbot code, but this code is all new and doesn't depend on any libraries. It
> should have 100% test coverage.
>
> Also, please CC me on patches related to auto-bisect so I can do these reviews
> before the patch lands.
>
> Lastly, can we focus our efforts on getting non-flaky bisect working first? I'm
> worried this is distracting from that effort and this work won't be useful until
> we have the non-flaky bisect working.
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> File tools/flakiness/is_flaky_test.py (right):
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> tools/flakiness/is_flaky_test.py:23: parser.add_argument('command', nargs='+',
> help='Command to run test.')
> This is a good start. Eventually we're going to need something more complicated
> for tests that are flaky only when run in conjunction with other tests. But we
> don't need to worry about that until we have at least the basic system in place.
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> tools/flakiness/is_flaky_test.py:45:
> running.append(subprocess.Popen(options.command, stdout=subprocess.PIPE,
> Use multiprocessing.Pool. That will replace a good chunk of the code in this
> file.
>
> https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
> tools/flakiness/is_flaky_test.py:50: num_passed, num_failed =
> process_finished(running, num_passed, num_failed)
> I think this code would be easier to follow if you didn't pass
> num_passed/num_failed in. Instead just return num_passed/num_failed from
> process_finished and add to the local num_passed/num_failed here.

Good ideas -- since this CL has been submitted, I guess now Sergiy can add a follow-up CL that adds the unit test and makes other changes?
Message was sent while issue was closed.
Added tests and rewrote using multiprocessing. See https://codereview.chromium.org/560123004.

On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> Lastly, can we focus our efforts on getting non-flaky bisect working first? I'm
> worried this is distracting from that effort and this work won't be useful until
> we have the non-flaky bisect working.

According to qyearsley@, the auto-bisect project is sufficiently staffed - there are no major pieces of work to be done that are not already being done by someone else. Apart from that, for CQ we need to build tools to reduce flakiness, and this is high priority as it blocks other CY pillars.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
File tools/flakiness/is_flaky_test.py (right):

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:23: parser.add_argument('command', nargs='+', help='Command to run test.')
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> This is a good start. Eventually we're going to need something more complicated
> for tests that are flaky only when run in conjunction with other tests. But we
> don't need to worry about that until we have at least the basic system in place.
Acknowledged.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:45: running.append(subprocess.Popen(options.command, stdout=subprocess.PIPE,
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> Use multiprocessing.Pool. That will replace a good chunk of the code in this
> file.
Done.

https://codereview.chromium.org/563243002/diff/160001/tools/flakiness/is_flak...
tools/flakiness/is_flaky_test.py:50: num_passed, num_failed = process_finished(running, num_passed, num_failed)
On 2014/09/15 19:59:53, ojan-only-code-yellow-reviews wrote:
> I think this code would be easier to follow if you didn't pass
> num_passed/num_failed in. Instead just return num_passed/num_failed from
> process_finished and add to the local num_passed/num_failed here.
Acknowledged.
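[Editor's note: as a closing illustration of the last suggestion, the process_finished refactor could look roughly like the sketch below. The names are assumptions carried over from the review comments, not the actual follow-up CL.]

  def process_finished(running):
    # Prunes completed jobs from |running| in place and returns the
    # (num_passed, num_failed) counts for just those jobs.
    still_running, finished = [], []
    for p in running:
      (finished if p.poll() is not None else still_running).append(p)
    running[:] = still_running
    num_passed = sum(1 for p in finished if p.returncode == 0)
    return num_passed, len(finished) - num_passed

  # The caller then accumulates locally instead of threading counters through:
  #   delta_passed, delta_failed = process_finished(running)
  #   num_passed += delta_passed
  #   num_failed += delta_failed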