OLD | NEW |
(Empty) | |
| 1 |
| 2 # Fixing layout test flakiness |
| 3 |
| 4 We'd like to stamp out all the tests that have ordering dependencies. This helps |
| 5 make the tests more reliable and, eventually, will make it so we can run tests |
| 6 in a random order and avoid new ordering dependencies being introduced. To get |
| 7 there, we need to weed out and fix all the existing ordering dependencies. |
| 8 |
| 9 ## Diagnosing test ordering flakiness |
| 10 |
| 11 These are steps for diagnosing ordering flakiness once you have a test that you |
| 12 believe depends on an earlier test running. |
| 13 |
| 14 ### Bisect test ordering |
| 15 |
| 16 1. Run the tests such that the test in question fails. |
| 17 2. Run `./Tools/Scripts/print-test-ordering` and save the output to a file. This |
| 18 outputs the tests run in the order they were run on each content_shell |
| 19 instance. |
| 20 3. Create a file that contains only the tests run on that worker in the same |
| 21 order as in your saved output file. The last line in the file should be the |
| 22 failing test. |
| 23 4. Run |
| 24 `./Tools/Scripts/bisect-test-ordering --test-list=path/to/file/from/step/3` |
| 25 |
| 26 The `bisect-test-ordering` script should finish by printing the list of tests that |
| 27 cause the test to fail. |
| 28 |
| 29 *** promo |
| 30 At the moment bisect-test-ordering only allows you to find tests that fail due |
| 31 to a previous test running. It's a small change to the script to make it work |
| 32 for tests that pass due to a previous test running (i.e. to figure out which |
| 33 test it depends on running before it). Contact ojan@chromium if you're |
| 34 interested in adding that feature to the script. |
| 35 *** |
| 36 |
| 37 ### Manual bisect |
| 38 |
| 39 Instead of running `bisect-test-ordering`, you can manually do the work of step |
| 40 4 above. |
| 41 |
| 42 1. `run-webkit-tests --child-processes=1 --order=none --test-list=path/to/file/from/step/3` |
| 43 2. If the test doesn't fail here, then the test itself is probably just flaky. |
| 44 If it does, remove some lines from the file and repeat step 1. Continue |
| 45 repeating until you've found the dependency. If the test fails when run by |
| 46 itself, but passes on the bots, that means that it depends on another test to |
| 47 pass. In this case, you need to generate the list of tests run by |
| 48 `run-webkit-tests --order=natural` and repeat this process to find which test |
| 49 causes the test in question to *pass* (e.g. |
| 50 [crbug.com/262793](https://crbug.com/262793)). |
| 51 3. File a bug and give it the |
| 52 [LayoutTestOrdering](https://crbug.com/?q=label:LayoutTestOrdering) label, |
| 53 e.g. [crbug.com/262787](https://crbug.com/262787) or |
| 54 [crbug.com/262791](https://crbug.com/262791). |
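
The "remove some lines and repeat" loop in step 2 can be organized as a binary search. The sketch below is one way to do that, not the algorithm used by `bisect-test-ordering`. It assumes a single earlier test is responsible, and it takes a hypothetical `fails` callback that you would implement by running the given tests followed by the test in question (for example with the `run-webkit-tests` flags from step 1) and reporting whether it failed.

```python
def find_culprit(earlier_tests, fails):
    """Bisect earlier_tests down to one test that makes the target fail.

    fails(subset) must return True when running `subset` followed by the
    target test makes the target fail. Assumes a single culprit test.
    """
    if not fails(earlier_tests):
        # The target passes even after all earlier tests: probably just flaky.
        return None
    candidates = list(earlier_tests)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half still reproduces the failure.
        candidates = first if fails(first) else second
    return candidates[0]


# Stand-in for real test runs: pretend fast/b.html leaks state.
def fake_fails(subset):
    return "fast/b.html" in subset


earlier = ["fast/a.html", "fast/b.html", "fast/c.html", "fast/d.html"]
print(find_culprit(earlier, fake_fails))
```

If the failure needs two or more earlier tests in combination, halving can discard one of them, so fall back to removing a few lines at a time as described above.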
| 55 |
| 56 ### Finding test ordering flakiness |
| 57 |
| 58 #### Run tests in a random order and diagnose failures |
| 59 |
| 60 1. Run `run-webkit-tests --order=random --no-retry` |
| 61 2. Run `./Tools/Scripts/print-test-ordering` and save the output to a file. This |
| 62 outputs the tests run in the order they were run on each content_shell |
| 63 instance. |
| 64 3. Run the diagnosing steps from above to figure out which tests cause the failures. |
| 65 |
| 66 Run `run-webkit-tests --run-singly --no-retry`. This starts up a new |
| 67 content_shell instance for each test. Tests that fail when run in isolation but |
| 68 pass when run as part of the full test suite represent some state that we're not |
| 69 properly resetting between test runs or some state that we're not properly |
| 70 setting when starting up content_shell. You might want to run with |
| 71 `--time-out-ms=60000` to weed out tests that time out only because they are waiting on |
| 72 content_shell startup. |
| 73 |
| 74 #### Diagnose especially flaky tests |
| 75 |
| 76 1. Load |
| 77 https://test-results.appspot.com/dashboards/overview.html#group=%40ToT%20Blink&flipCount=12 |
| 78 2. Tweak the flakiness threshold to the desired level of flakiness. |
| 79 3. Click on *webkit_tests* to get the list of flaky tests. |
| 80 4. Diagnose the source of flakiness for each test. |