OLD | NEW |
(Empty) | |
| 1 # Layout Test Expectations and Baselines |
| 2 |
| 3 |
| 4 The primary function of the LayoutTests is as a regression test suite; this |
| 5 means that, while we care about whether a page is being rendered correctly, we |
| 6 care more about whether the page is being rendered the way we expect it to. In |
| 7 other words, we look more for changes in behavior than we do for correctness. |
| 8 |
| 9 [TOC] |
| 10 |
| 11 All layout tests have "expected results", or "baselines", which may be one of |
| 12 several forms. The test may produce one or more of: |
| 13 |
| 14 * A text file containing JavaScript log messages. |
| 15 * A text rendering of the Render Tree. |
| 16 * A screen capture of the rendered page as a PNG file. |
| 17 * WAV files of the audio output, for WebAudio tests. |
| 18 |
| 19 For any of these types of tests, there are files checked into the LayoutTests |
| 20 directory named `-expected.{txt,png,wav}`. Lastly, we also support the concept |
| 21 of "reference tests", which check that two pages are rendered identically |
| 22 (pixel-by-pixel). As long as the output of the two pages matches, the test passes. For |
| 23 more on reference tests, see |
| 24 [Writing ref tests](https://trac.webkit.org/wiki/Writing%20Reftests). |
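As a minimal sketch of the naming convention above, the following Python helper derives candidate baseline file names for a test. The function name `baseline_names` is hypothetical, and the sketch assumes the baseline shares the test's path and stem; it does not model platform-specific baseline locations.

```python
import os

# Extensions named in the text above: text, image, and audio baselines.
BASELINE_EXTENSIONS = ("txt", "png", "wav")


def baseline_names(test_path):
    """Return candidate baseline names for a test (hypothetical helper)."""
    # Drop the test's own extension, then append "-expected.<ext>".
    root, _ = os.path.splitext(test_path)
    return ["%s-expected.%s" % (root, ext) for ext in BASELINE_EXTENSIONS]


if __name__ == "__main__":
    for name in baseline_names("fast/html/keygen.html"):
        print(name)
```

A given test normally produces only some of these files, so the list is a set of candidates rather than files that must all exist.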
| 25 |
| 26 ## Failing tests |
| 27 |
| 28 When the output doesn't match, there are two potential reasons for it: |
| 29 |
| 30 * The port is performing "correctly", but the output simply won't match the |
| 31 generic version. The usual reason for this is for things like form controls, |
| 32 which are rendered differently on each platform. |
| 33 * The port is performing "incorrectly" (i.e., the test is failing). |
| 34 |
| 35 In both cases, the convention is to check in a new baseline (aka rebaseline), |
| 36 even though that file may be codifying errors. This helps us maintain test |
| 37 coverage for all the other things the test is testing while we resolve the bug. |
| 38 |
| 39 *** promo |
| 40 If a test can be rebaselined, it should always be rebaselined instead of adding |
| 41 lines to TestExpectations. |
| 42 *** |
| 43 |
| 44 Bugs at [crbug.com](https://crbug.com) should track fixing incorrect behavior, |
| 45 not lines in |
| 46 [TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations). If a |
| 47 test is never supposed to pass (e.g. it's testing Windows-specific behavior, so |
| 48 can't ever pass on Linux/Mac), move it to the |
| 49 [NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests) file. That |
| 50 gets it out of the way of the rest of the project. |
| 51 |
| 52 There are some cases where you can't rebaseline and, unfortunately, we don't |
| 53 have a better solution than either: |
| 54 |
| 55 1. Reverting the patch that caused the failure, or |
| 56 2. Adding a line to TestExpectations and fixing the bug later. |
| 57 |
| 58 In this case, **reverting the patch is strongly preferred**. |
| 59 |
| 60 These are the cases where you can't rebaseline: |
| 61 |
| 62 * The test is a reference test. |
| 63 * The test gives different output in release and debug; in this case, generate a |
| 64 baseline with the release build, and mark the debug build as expected to fail. |
| 65 * The test is flaky, crashes or times out. |
| 66 * The test is for a feature that hasn't shipped on some platforms yet, but |
| 67 will shortly. |
| 68 |
| 69 ## Handling flaky tests |
| 70 |
| 71 The |
| 72 [flakiness dashboard](https://test-results.appspot.com/dashboards/flakiness_dashboard.html) |
| 73 is a tool for understanding a test’s behavior over time. |
| 74 Originally designed for managing flaky tests, the dashboard shows a timeline |
| 75 view of each test’s results from run to run. The tool may be overwhelming at first, |
| 76 but |
| 77 [the documentation](https://dev.chromium.org/developers/testing/flakiness-dashboard) |
| 78 should help. Once you decide that a test is truly flaky, you can suppress it |
| 79 using the TestExpectations file, as described below. |
| 80 |
| 81 We do not generally expect Chromium sheriffs to spend time trying to address |
| 82 flakiness, though. |
| 83 |
| 84 ## How to rebaseline |
| 85 |
| 86 Since baselines themselves are often platform-specific, updating baselines in |
| 87 general requires fetching new test results after running the test on multiple |
| 88 platforms. |
| 89 |
| 90 ### Rebaselining using try jobs |
| 91 |
| 92 The recommended way to rebaseline for a currently-in-progress CL is to use |
| 93 results from try jobs. To do this: |
| 94 |
| 95 1. Upload a CL with changes in Blink source code or layout tests. |
| 96 2. Trigger Blink try jobs. The bots to use are the release builders on |
| 97 [tryserver.blink](https://build.chromium.org/p/tryserver.blink/builders). |
| 98 This can be done via the code review Web UI or via `git cl try`. |
| 99 3. Wait for all try jobs to finish. |
| 100 4. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl` to fetch |
| 101 new baselines. |
| 102 5. Commit the new baselines and upload a new patch. |
| 103 |
| 104 This way, the new baselines can be reviewed along with the changes, which helps |
| 105 the reviewer verify that the new baselines are correct. It also means that there |
| 106 is no period of time when the layout test results are ignored. |
| 107 |
| 108 The tests which `webkit-patch rebaseline-cl` tries to download new baselines for |
| 109 depend on its arguments. |
| 110 |
| 111 * By default, it tries to download all baselines for tests that failed in the |
| 112 try jobs. |
| 113 * If you pass `--only-changed-tests`, then only tests modified in the CL will be |
| 114 considered. |
| 115 * You can also explicitly pass a list of test names, and then just those tests |
| 116 will be rebaselined. |
| 117 |
| 118 ### Rebaselining with rebaseline-o-matic |
| 119 |
| 120 If the test is not already listed in |
| 121 [TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations), you |
| 122 can mark it as `[ NeedsRebaseline ]`. The |
| 123 [rebaseline-o-matic bot](https://build.chromium.org/p/chromium.infra.cron/builders/rebaseline-o-matic) |
| 124 will automatically detect when the bots have cycled (by looking at the blame on |
| 125 the file) and do the rebaseline for you. As long as the test doesn't time out or |
| 126 crash, it won't turn the bots red if it has a `NeedsRebaseline` expectation. |
| 127 When all of the continuous builders on the waterfall have cycled, the |
| 128 rebaseline-o-matic bot will commit a patch which includes the new baselines and |
| 129 removes the `[ NeedsRebaseline ]` entry from TestExpectations. |
| 130 |
| 131 ### Rebaselining manually |
| 132 |
| 133 1. If the test is already listed in TestExpectations as flaky, mark the test |
| 134 `NeedsManualRebaseline` and comment out the flaky line so that your patch can |
| 135 land without turning the tree red. If the test is not in TestExpectations, |
| 136 you can add a `[ Rebaseline ]` line to TestExpectations. |
| 137 2. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-expectations` |
| 138 3. Post the patch created in step 2 for review. |
| 139 |
| 140 ## Kinds of expectations files |
| 141 |
| 142 * [TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations): The |
| 143 main test failure suppression file. In theory, this should be used for flaky |
| 144 lines and `NeedsRebaseline`/`NeedsManualRebaseline` lines. |
| 145 * [ASANExpectations](../../third_party/WebKit/LayoutTests/ASANExpectations): |
| 146 Tests that fail under ASAN. |
| 147 * [LeakExpectations](../../third_party/WebKit/LayoutTests/LeakExpectations): |
| 148 Tests that have memory leaks under the leak checker. |
| 149 * [MSANExpectations](../../third_party/WebKit/LayoutTests/MSANExpectations): |
| 150 Tests that fail under MSAN. |
| 151 * [NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests): Tests |
| 152 that we never intend to fix (e.g. a test for Windows-specific behavior will |
| 153 never be fixed on Linux/Mac). Tests that will never pass on any platform |
| 154 should just be deleted, though. |
| 155 * [SlowTests](../../third_party/WebKit/LayoutTests/SlowTests): Tests that take |
| 156 longer than the usual timeout to run. Slow tests are given 5x the usual |
| 157 timeout. |
| 158 * [SmokeTests](../../third_party/WebKit/LayoutTests/SmokeTests): A small subset |
| 159 of tests that we run on the Android bot. |
| 160 * [StaleTestExpectations](../../third_party/WebKit/LayoutTests/StaleTestExpectations): |
| 161 Platform-specific lines that have been in TestExpectations for many months. |
| 162 They're moved here to get them out of the way of people doing rebaselines |
| 163 since they're clearly not getting fixed anytime soon. |
| 164 * [W3CImportExpectations](../../third_party/WebKit/LayoutTests/W3CImportExpectations): |
| 165 A record of which W3C tests should be imported or skipped. |
| 166 * [WPTServeExpectations](../../third_party/WebKit/LayoutTests/WPTServeExpectations): |
| 167 Expectations for tests that fail differently when run under the W3C's wptserve |
| 168 HTTP server with the `--enable-wptserve` flag. This is an experimental feature |
| 169 at this time. |
| 170 |
| 171 |
| 172 ### Flag-specific expectations files |
| 173 |
| 174 It is possible to handle tests that only fail when run with a particular flag |
| 175 being passed to `content_shell`. See |
| 176 [LayoutTests/FlagExpectations/README.txt](../../third_party/WebKit/LayoutTests/FlagExpectations/README.txt) |
| 177 for more. |
| 178 |
| 179 ## Updating the expectations files |
| 180 |
| 181 ### Ordering |
| 182 |
| 183 The file is not ordered. If you put new changes somewhere in the middle of the |
| 184 file, this will reduce the chance of merge conflicts when landing your patch. |
| 185 |
| 186 ### Syntax |
| 187 |
| 188 The syntax of the file is roughly one expectation per line. An expectation can |
| 189 apply to either a directory of tests, or a specific test. Lines prefixed with |
| 190 `# ` are treated as comments, and blank lines are allowed as well. |
| 191 |
| 192 The syntax of a line is roughly: |
| 193 |
| 194 ``` |
| 195 [ bugs ] [ "[" modifiers "]" ] test_name [ "[" expectations "]" ] |
| 196 ``` |
| 197 |
| 198 * Tokens are separated by whitespace. |
| 199 * **The brackets delimiting the modifiers and expectations from the bugs and the |
| 200 test_name are not optional**; however the modifiers component is optional. In |
| 201 other words, if you want to specify modifiers or expectations, you must |
| 202 enclose them in brackets. |
| 203 * Lines are expected to have one or more bug identifiers, and the linter will |
| 204 complain about lines missing them. Bug identifiers are of the form |
| 205 `crbug.com/12345`, `code.google.com/p/v8/issues/detail?id=12345` or |
| 206 `Bug(username)`. |
| 207 * If no modifiers are specified, the expectation applies to all of the configurations |
| 208 applicable to that file. |
| 209 * Modifiers can be one or more of `Mac`, `Mac10.9`, `Mac10.10`, `Mac10.11`, |
| 210 `Retina`, `Win`, `Win7`, `Win10`, `Linux`, `Linux32`, `Precise`, `Trusty`, |
| 211 `Android`, `Release`, `Debug`. |
| 212 * Some modifiers are meta keywords, e.g. `Win` represents both `Win7` and |
| 213 `Win10`. See the `CONFIGURATION_SPECIFIER_MACROS` dictionary in |
| 214 [third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py](../../third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py) |
| 215 for the meta keywords and which modifiers they represent. |
| 216 * Expectations can be one or more of `Crash`, `Failure`, `Pass`, `Rebaseline`, |
| 217 `Slow`, `Skip`, `Timeout`, `WontFix`, `Missing`, `NeedsRebaseline`, |
| 218 `NeedsManualRebaseline`. If multiple expectations are listed, the test is |
| 219 considered "flaky" and any of those results will be considered as expected. |
| 220 |
| 221 For example: |
| 222 |
| 223 ``` |
| 224 crbug.com/12345 [ Win Debug ] fast/html/keygen.html [ Crash ] |
| 225 ``` |
| 226 |
| 227 which indicates that the "fast/html/keygen.html" test file is expected to crash |
| 228 when run in the Debug configuration on Windows, and the tracking bug for this |
| 229 crash is bug \#12345 in the [Chromium issue tracker](https://crbug.com). Note |
| 230 that the test will still be run, so that we can notice if it doesn't actually |
| 231 crash. |
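To make the line grammar concrete, here is a rough Python sketch that splits one expectation line into its four parts. It is not the real parser; the function names `is_bug` and `parse_expectation_line` are hypothetical, and the sketch assumes brackets are whitespace-separated tokens, as in the examples above.

```python
# Bug identifier prefixes listed in the syntax description above.
BUG_PREFIXES = ("crbug.com/", "code.google.com/p/v8/issues/detail?id=", "Bug(")


def is_bug(token):
    """True if the token looks like one of the documented bug identifier forms."""
    return token.startswith(BUG_PREFIXES)


def parse_expectation_line(line):
    """Split a line into bugs, modifiers, test name and expectations (sketch)."""
    tokens = line.split()
    i = 0

    # Leading bug identifiers.
    bugs = []
    while i < len(tokens) and is_bug(tokens[i]):
        bugs.append(tokens[i])
        i += 1

    # Optional bracketed modifiers.
    modifiers = []
    if i < len(tokens) and tokens[i] == "[":
        end = tokens.index("]", i)
        modifiers = tokens[i + 1:end]
        i = end + 1

    # The test name (a file or a directory) is required.
    if i >= len(tokens):
        raise ValueError("missing test name: %r" % line)
    test_name = tokens[i]
    i += 1

    # Optional bracketed expectations.
    expectations = []
    if i < len(tokens) and tokens[i] == "[":
        end = tokens.index("]", i)
        expectations = tokens[i + 1:end]

    return {"bugs": bugs, "modifiers": modifiers,
            "test": test_name, "expectations": expectations}


if __name__ == "__main__":
    print(parse_expectation_line(
        "crbug.com/12345 [ Win Debug ] fast/html/keygen.html [ Crash ]"))
```

An unbalanced bracket raises `ValueError` from `list.index`, which is enough for a sketch; the real linter reports such problems in its own way.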
| 232 |
| 233 Assuming you're running a debug build on Mac 10.9, the following lines are all |
| 234 equivalent (in terms of whether the test is performed and its expected outcome): |
| 235 |
| 236 ``` |
| 237 fast/html/keygen.html [ Skip ] |
| 238 fast/html/keygen.html [ WontFix ] |
| 239 Bug(darin) [ Mac10.9 Debug ] fast/html/keygen.html [ Skip ] |
| 240 ``` |
| 241 |
| 242 ### Semantics |
| 243 |
| 244 * `WontFix` implies `Skip` and also indicates that we don't have any plans to |
| 245 make the test pass. |
| 246 * `WontFix` lines always go in the |
| 247 [NeverFixTests file](../../third_party/WebKit/LayoutTests/NeverFixTests) as |
| 248 we never intend to fix them. These are just for tests that only apply to some |
| 249 subset of the platforms we support. |
| 250 * `WontFix` and `Skip` must be used by themselves and cannot be specified |
| 251 alongside `Crash` or another expectation keyword. |
| 252 * `Slow` causes the test runner to give the test 5x the usual time limit to run. |
| 253 `Slow` lines go in the |
| 254 [SlowTests file](../../third_party/WebKit/LayoutTests/SlowTests). A given |
| 255 line cannot have both `Slow` and `Timeout`. |
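The keyword rules above can be sketched as a small check in Python. This is an illustration only, not the logic of `lint-test-expectations`; the names `check_expectations` and `effective_timeout` are hypothetical, and the timeout unit is an arbitrary assumption.

```python
def check_expectations(expectations):
    """Return a list of rule violations for one line's expectation keywords."""
    found = set(expectations)
    errors = []

    # WontFix and Skip must be used by themselves.
    for keyword in ("WontFix", "Skip"):
        if keyword in found and len(found) > 1:
            errors.append("%s cannot be combined with other expectations" % keyword)

    # A line cannot have both Slow and Timeout.
    if "Slow" in found and "Timeout" in found:
        errors.append("Slow and Timeout cannot appear on the same line")

    return errors


def effective_timeout(base_timeout, expectations):
    """Slow tests get 5x the usual time limit; other tests get the base value."""
    return base_timeout * 5 if "Slow" in expectations else base_timeout


if __name__ == "__main__":
    print(check_expectations(["Skip", "Crash"]))
    print(effective_timeout(6, ["Slow"]))
```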
| 256 |
| 257 Also, when parsing the file, we use two rules to figure out if an expectation |
| 258 line applies to the current run: |
| 259 |
| 260 1. If the configuration parameters don't match the configuration of the current |
| 261 run, the expectation is ignored. |
| 262 2. Expectations that match more of a test name are used before expectations that |
| 263 match less of a test name. |
| 264 |
| 265 For example, if you had the following lines in your file, and you were running a |
| 266 debug build on `Mac10.10`: |
| 267 |
| 268 ``` |
| 269 crbug.com/12345 [ Mac10.10 ] fast/html [ Failure ] |
| 270 crbug.com/12345 [ Mac10.10 ] fast/html/keygen.html [ Pass ] |
| 271 crbug.com/12345 [ Win7 ] fast/forms/submit.html [ Failure ] |
| 272 crbug.com/12345 fast/html/section-element.html [ Failure Crash ] |
| 273 ``` |
| 274 |
| 275 You would expect: |
| 276 |
| 277 * `fast/html/article-element.html` to fail with a text diff (since it is in the |
| 278 fast/html directory). |
| 279 * `fast/html/keygen.html` to pass (since the exact match on the test name). |
| 280 * `fast/html/submit.html` to pass (since the configuration parameters don't |
| 281 match). |
| 282 * `fast/html/section-element.html` to either crash or produce a text (or image |
| 283 and text) failure, but not time out or pass. |
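The two matching rules and the example above can be sketched in Python as follows. This is a simplified model, not the webkitpy implementation: the name `expected_results` is hypothetical, meta keywords such as `Win` are not expanded, and a line is treated as applicable only when every listed modifier is in the run's configuration set.

```python
# (modifiers, test path or directory, expectations), taken from the example above.
LINES = [
    (["Mac10.10"], "fast/html", ["Failure"]),
    (["Mac10.10"], "fast/html/keygen.html", ["Pass"]),
    (["Win7"], "fast/forms/submit.html", ["Failure"]),
    ([], "fast/html/section-element.html", ["Failure", "Crash"]),
]


def expected_results(lines, test, run_config):
    """Return the expectations that apply to `test` for this run (sketch)."""
    best_path, best_expectations = None, ["Pass"]  # unmatched tests should pass
    for modifiers, path, expectations in lines:
        # Rule 1: ignore lines whose configuration does not match the run.
        if not set(modifiers) <= run_config:
            continue
        # A line applies to an exact test name or to a containing directory.
        if test != path and not test.startswith(path.rstrip("/") + "/"):
            continue
        # Rule 2: prefer the line that matches more of the test name.
        if best_path is None or len(path) > len(best_path):
            best_path, best_expectations = path, expectations
    return best_expectations


if __name__ == "__main__":
    run = {"Mac10.10", "Debug"}
    for name in ("fast/html/article-element.html", "fast/html/keygen.html",
                 "fast/forms/submit.html", "fast/html/section-element.html"):
        print(name, expected_results(LINES, name, run))
```

Comparing path lengths is one simple way to express "matches more of a test name"; any ordering that prefers the more specific path would serve equally well here.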
| 284 |
| 285 *** promo |
| 286 Duplicate expectations are not allowed within the file and will generate |
| 287 warnings. |
| 288 *** |
| 289 |
| 290 You can verify that any changes you've made to an expectations file are correct |
| 291 by running: |
| 292 |
| 293 ```bash |
| 294 third_party/WebKit/Tools/Scripts/lint-test-expectations |
| 295 ``` |
| 296 |
| 297 which will cycle through all of the possible combinations of configurations |
| 298 looking for problems. |