docs/testing/layout_test_expectations.md - Issue 2488463004: Move sub-pages of "Layout Tests" from Google Sites to Markdown.

Unified Diff: docs/testing/layout_test_expectations.md

Issue 2488463004: Move sub-pages of "Layout Tests" from Google Sites to Markdown. (Closed)

Patch Set: Created 4 years, 1 month ago


Index: docs/testing/layout_test_expectations.md

diff --git a/docs/testing/layout_test_expectations.md b/docs/testing/layout_test_expectations.md

new file mode 100644

index 0000000000000000000000000000000000000000..746cadbc784538e42fd5fcd26bbecd5a35af2ed0

--- /dev/null

+++ b/docs/testing/layout_test_expectations.md

@@ -0,0 +1,298 @@

+# Layout Test Expectations and Baselines

+The primary function of the LayoutTests is as a regression test suite; this

+means that, while we care about whether a page is being rendered correctly, we

+care more about whether the page is being rendered the way we expect it to. In

+other words, we look more for changes in behavior than we do for correctness.

+[TOC]

+All layout tests have "expected results", or "baselines", which may be one of

+several forms. The test may produce one or more of:

+* A text file containing JavaScript log messages.

+* A text rendering of the Render Tree.

+* A screen capture of the rendered page as a PNG file.

+* WAV files of the audio output, for WebAudio tests.

+For any of these types of tests, there are files checked into the LayoutTests

+directory named `*-expected.{txt,png,wav}`. Lastly, we also support the concept

+of "reference tests", which check that two pages are rendered identically

+(pixel-by-pixel). As long as the two pages' outputs match, the test passes. For

+more on reference tests, see

+[Writing ref tests](https://trac.webkit.org/wiki/Writing%20Reftests).
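The naming convention can be illustrated with a small Python sketch. It assumes that a generic baseline sits next to its test, with the test's extension replaced by `-expected.<kind>`; platform-specific baseline locations are not modeled, and `baseline_paths` is a hypothetical helper rather than a webkitpy function.

```python
import posixpath


def baseline_paths(test_name, kinds=("txt", "png", "wav")):
    """Return candidate generic baseline paths for a layout test.

    Sketch only: maps `<dir>/<stem>.<ext>` to `<dir>/<stem>-expected.<kind>`
    and ignores platform-specific baseline directories.
    """
    stem, _ext = posixpath.splitext(test_name)
    return [f"{stem}-expected.{kind}" for kind in kinds]


print(baseline_paths("fast/html/keygen.html"))
# ['fast/html/keygen-expected.txt', 'fast/html/keygen-expected.png', 'fast/html/keygen-expected.wav']
```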

+## Failing tests

+When the output doesn't match, there are two potential reasons for it:

+* The port is performing "correctly", but the output simply won't match the

+ generic version. This usually happens with things like form controls, which

+ are rendered differently on each platform.

+* The port is performing "incorrectly" (i.e., the test is failing).

+In both cases, the convention is to check in a new baseline (aka rebaseline),

+even though that file may be codifying errors. This helps us maintain test

+coverage for all the other things the test is testing while we resolve the bug.

+*** promo

+If a test can be rebaselined, it should always be rebaselined instead of adding

+lines to TestExpectations.

+***

+Bugs at [crbug.com](https://crbug.com) should track fixing incorrect behavior,

+not lines in

+[TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations). If a

+test is never supposed to pass (e.g. it's testing Windows-specific behavior, so

+it can never pass on Linux/Mac), move it to the

+[NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests) file. That

+gets it out of the way of the rest of the project.

+There are some cases where you can't rebaseline and, unfortunately, we don't

+have a better solution than either:

+1. Reverting the patch that caused the failure, or

+2. Adding a line to TestExpectations and fixing the bug later.

+In this case, **reverting the patch is strongly preferred**.

+These are the cases where you can't rebaseline:

+* The test is a reference test.

+* The test gives different output in release and debug; in this case, generate a

+ baseline with the release build, and mark the debug build as expected to fail.

+* The test is flaky, crashes or times out.

+* The test is for a feature that hasn't shipped on some platforms yet, but

+ will shortly.
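The decision described above can be summarized as a small Python sketch. The `FailingTest` fields and the `next_step` helper are hypothetical names that simply mirror the four cases listed; they are not part of any Blink tool.

```python
from dataclasses import dataclass


@dataclass
class FailingTest:
    """Hypothetical summary of a failing test; one flag per case listed above."""
    is_reftest: bool = False
    release_and_debug_differ: bool = False
    flaky_crashes_or_times_out: bool = False
    feature_not_shipped_everywhere: bool = False


def can_rebaseline(test: FailingTest) -> bool:
    """True unless one of the cases that block rebaselining applies.

    Release/debug differences still get a release baseline, but the debug
    build also needs an expectations line, so they count as blocking here.
    """
    return not (
        test.is_reftest
        or test.release_and_debug_differ
        or test.flaky_crashes_or_times_out
        or test.feature_not_shipped_everywhere
    )


def next_step(test: FailingTest) -> str:
    if can_rebaseline(test):
        return "rebaseline"
    # Reverting the culprit patch is strongly preferred over suppression.
    return "revert the patch, or add a TestExpectations line"
```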

+## Handling flaky tests

+The

+[flakiness dashboard](https://test-results.appspot.com/dashboards/flakiness_dashboard.html)

+is a tool for understanding a test’s behavior over time.

+Originally designed for managing flaky tests, the dashboard shows a timeline

+view of the test’s results. The tool may be overwhelming at first,

+but

+[the documentation](https://dev.chromium.org/developers/testing/flakiness-dashboard)

+should help. Once you decide that a test is truly flaky, you can suppress it

+using the TestExpectations file, as described below.

+We do not generally expect Chromium sheriffs to spend time trying to address

+flakiness, though.
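Once a test has been judged flaky, its suppression line lists every result it was seen to produce, since listing multiple expectations marks a test as flaky (see Syntax below). The Python sketch below builds such a line; `flaky_expectation_line` is hypothetical, and the observed results are assumed to be collected by hand, for example from the flakiness dashboard.

```python
def flaky_expectation_line(bug, test_name, observed, modifiers=()):
    """Build a TestExpectations line that lists every observed result.

    `observed` is an iterable of result keywords such as "Pass" or "Timeout".
    """
    results = sorted(set(observed))
    if len(results) < 2:
        raise ValueError("a single observed result is not flakiness")
    parts = [bug]
    if modifiers:
        parts.append("[ " + " ".join(modifiers) + " ]")
    parts.append(test_name)
    parts.append("[ " + " ".join(results) + " ]")
    return " ".join(parts)


print(flaky_expectation_line("crbug.com/12345", "fast/html/keygen.html",
                             ["Pass", "Timeout", "Pass"], modifiers=("Linux",)))
# crbug.com/12345 [ Linux ] fast/html/keygen.html [ Pass Timeout ]
```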

+## How to rebaseline

+Since baselines themselves are often platform-specific, updating baselines in

+general requires fetching new test results after running the test on multiple

+platforms.

+### Rebaselining using try jobs

+The recommended way to rebaseline for a currently-in-progress CL is to use

+results from try jobs. To do this:

+1. Upload a CL with changes in Blink source code or layout tests.

+2. Trigger Blink try jobs. The bots to use are the release builders on

+ [tryserver.blink](https://build.chromium.org/p/tryserver.blink/builders).

+ This can be done via the code review Web UI or via `git cl try`.

+3. Wait for all try jobs to finish.

+4. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl` to fetch

+ new baselines.

+5. Commit the new baselines and upload a new patch.

+This way, the new baselines can be reviewed along with the changes, which helps

+the reviewer verify that the new baselines are correct. It also means that there

+is no period of time when the layout test results are ignored.

+The set of tests for which `webkit-patch rebaseline-cl` tries to download new

+baselines depends on its arguments.

+* By default, it tries to download all baselines for tests that failed in the

+ try jobs.

+* If you pass `--only-changed-tests`, then only tests modified in the CL will be

+ considered.

+* You can also explicitly pass a list of test names, and then just those tests

+ will be rebaselined.

+### Rebaselining with rebaseline-o-matic

+If the test is not already listed in

+[TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations), you

+can mark it as `[ NeedsRebaseline ]`. The

+[rebaseline-o-matic bot](https://build.chromium.org/p/chromium.infra.cron/builders/rebaseline-o-matic)

+will automatically detect when the bots have cycled (by looking at the blame on

+the file) and do the rebaseline for you. As long as the test doesn't time out or

+crash, it won't turn the bots red if it has a `NeedsRebaseline` expectation.

+When all of the continuous builders on the waterfall have cycled, the

+rebaseline-o-matic bot will commit a patch which includes the new baselines and

+removes the `[ NeedsRebaseline ]` entry from TestExpectations.
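The final cleanup step, dropping `[ NeedsRebaseline ]` lines once new baselines are committed, might look roughly like this. It is a hypothetical helper, not the bot's actual code, and it assumes one expectation per line with whitespace-separated brackets.

```python
import re

# Matches a line ending in "[ NeedsRebaseline ]"; `head` is everything before it.
_NEEDS_REBASELINE = re.compile(r"^(?P<head>.*\S)\s+\[\s*NeedsRebaseline\s*\]\s*$")


def drop_needs_rebaseline(lines, rebaselined):
    """Remove NeedsRebaseline lines for tests whose baselines were just committed."""
    kept = []
    for line in lines:
        match = _NEEDS_REBASELINE.match(line)
        # The test name is the last token before the expectations bracket.
        if match and match.group("head").split()[-1] in rebaselined:
            continue  # new baseline landed, so the suppression is no longer needed
        kept.append(line)
    return kept
```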

+### Rebaselining manually

+1. If the test is already listed in TestExpectations as flaky, mark the test

+ `NeedsManualRebaseline` and comment out the flaky line so that your patch can

+ land without turning the tree red. If the test is not in TestExpectations,

+ you can add a `[ Rebaseline ]` line to TestExpectations.

+2. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-expectations`

+3. Post the patch created in step 2 for review.

+## Kinds of expectations files

+* [TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations): The

+ main test failure suppression file. In theory, this should be used for flaky

+ lines and `NeedsRebaseline`/`NeedsManualRebaseline` lines.

+* [ASANExpectations](../../third_party/WebKit/LayoutTests/ASANExpectations):

+ Tests that fail under ASAN.

+* [LeakExpectations](../../third_party/WebKit/LayoutTests/LeakExpectations):

+ Tests that have memory leaks under the leak checker.

+* [MSANExpectations](../../third_party/WebKit/LayoutTests/MSANExpectations):

+ Tests that fail under MSAN.

+* [NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests): Tests

+ that we never intend to fix (e.g. a test for Windows-specific behavior will

+ never be fixed on Linux/Mac). Tests that will never pass on any platform

+ should just be deleted, though.

+* [SlowTests](../../third_party/WebKit/LayoutTests/SlowTests): Tests that take

+ longer than the usual timeout to run. Slow tests are given 5x the usual

+ timeout.

+* [SmokeTests](../../third_party/WebKit/LayoutTests/SmokeTests): A small subset

+ of tests that we run on the Android bot.

+* [StaleTestExpectations](../../third_party/WebKit/LayoutTests/StaleTestExpectations):

+ Platform-specific lines that have been in TestExpectations for many months.

+ They're moved here to get them out of the way of people doing rebaselines

+ since they're clearly not getting fixed anytime soon.

+* [W3CImportExpectations](../../third_party/WebKit/LayoutTests/W3CImportExpectations):

+ A record of which W3C tests should be imported or skipped.

+* [WPTServeExpectations](../../third_party/WebKit/LayoutTests/WPTServeExpectations):

+ Expectations for tests that fail differently when run under the W3C's wptserve

+ HTTP server with the `--enable-wptserve` flag. This is an experimental feature

+ at this time.

+### Flag-specific expectations files

+It is possible to handle tests that only fail when run with a particular flag

+being passed to `content_shell`. See

+[LayoutTests/FlagExpectations/README.txt](../../third_party/WebKit/LayoutTests/FlagExpectations/README.txt)

+for more.

+## Updating the expectations files

+### Ordering

+The file is not ordered. Adding new lines somewhere in the middle of the file,

+rather than at the end, reduces the chance of merge conflicts when landing your

+patch.

+### Syntax

+The syntax of the file is roughly one expectation per line. An expectation can

+apply to either a directory of tests or a specific test. Lines prefixed with

+`# ` are treated as comments, and blank lines are allowed as well.

+The syntax of a line is roughly:

+```

+[ bugs ] [ "[" modifiers "]" ] test_name [ "[" expectations "]" ]

+```

+* Tokens are separated by whitespace.

+* **The brackets delimiting the modifiers and expectations from the bugs and the

+ test_name are not optional**; however, the modifiers component itself is optional. In

+ other words, if you want to specify modifiers or expectations, you must

+ enclose them in brackets.

+* Lines are expected to have one or more bug identifiers, and the linter will

+ complain about lines missing them. Bug identifiers are of the form

+ `crbug.com/12345`, `code.google.com/p/v8/issues/detail?id=12345` or

+ `Bug(username)`.

+* If no modifiers are specified, the expectation applies to all of the configurations

+ applicable to that file.

+* Modifiers can be one or more of `Mac`, `Mac10.9`, `Mac10.10`, `Mac10.11`,

+ `Retina`, `Win`, `Win7`, `Win10`, `Linux`, `Linux32`, `Precise`, `Trusty`,

+ `Android`, `Release`, `Debug`.

+* Some modifiers are meta keywords, e.g. `Win` represents both `Win7` and

+ `Win10`. See the `CONFIGURATION_SPECIFIER_MACROS` dictionary in

+ [third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py](../../third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py)

+ for the meta keywords and which modifiers they represent.

+* Expectations can be one or more of `Crash`, `Failure`, `Pass`, `Rebaseline`,

+ `Slow`, `Skip`, `Timeout`, `WontFix`, `Missing`, `NeedsRebaseline`,

+ `NeedsManualRebaseline`. If multiple expectations are listed, the test is

+ considered "flaky" and any of those results will be considered as expected.
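A minimal Python parser for the grammar above might look like this. It assumes whole-line `#` comments only, brackets separated from other tokens by whitespace, and bug identifiers of exactly the three forms listed; `parse_line` and `Expectation` are illustrative names, not the parser used by the test runner.

```python
import re
from dataclasses import dataclass

# The three bug-identifier forms listed above.
BUG_RE = re.compile(r"^(crbug\.com/\d+|code\.google\.com/p/\S+|Bug\(\w+\))$")


@dataclass
class Expectation:
    bugs: list
    modifiers: list
    test: str
    expectations: list


def _bracketed(tokens, start):
    """Return the tokens inside the '[ ... ]' group at `start`, and the index after it."""
    end = tokens.index("]", start)  # ValueError if the group is never closed
    return tokens[start + 1:end], end + 1


def parse_line(line):
    """Parse one TestExpectations line; return None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    tokens = line.split()  # tokens, including brackets, are whitespace-separated
    i = 0
    bugs = []
    while i < len(tokens) and BUG_RE.match(tokens[i]):
        bugs.append(tokens[i])
        i += 1
    modifiers = []
    if i < len(tokens) and tokens[i] == "[":
        modifiers, i = _bracketed(tokens, i)
    if i >= len(tokens):
        raise ValueError("missing test name")
    test = tokens[i]
    i += 1
    expectations = []
    if i < len(tokens) and tokens[i] == "[":
        expectations, i = _bracketed(tokens, i)
    if i != len(tokens):
        raise ValueError("unexpected tokens after the expectations")
    return Expectation(bugs, modifiers, test, expectations)
```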

+For example:

+```

+crbug.com/12345 [ Win Debug ] fast/html/keygen.html [ Crash ]

+```

+which indicates that the "fast/html/keygen.html" test file is expected to crash

+when run in the Debug configuration on Windows, and the tracking bug for this

+crash is bug \#12345 in the [Chromium issue tracker](https://crbug.com). Note

+that the test will still be run, so that we can notice if it doesn't actually

+crash.

+Assuming you're running a debug build on Mac 10.9, the following lines are all

+equivalent (in terms of whether the test is performed and its expected outcome):

+```

+fast/html/keygen.html [ Skip ]

+fast/html/keygen.html [ WontFix ]

+Bug(darin) [ Mac10.9 Debug ] fast/html/keygen.html [ Skip ]

+```
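The first and third lines above are equivalent on that machine because a line with no modifiers applies everywhere, while `[ Mac10.9 Debug ]` happens to match the current run. The sketch below assumes modifiers fall into a build-type group (`Release`, `Debug`) and a platform group, each of which constrains the run only when the line mentions it. Only the `Win` entry of its macro table is stated in this document; the `Mac` entry is an assumed illustration, and `CONFIGURATION_SPECIFIER_MACROS` in `base.py` remains the authoritative source.

```python
# Meta keywords. Only the Win entry is stated in this document; the Mac entry
# is an assumed illustration. See CONFIGURATION_SPECIFIER_MACROS in base.py.
MACROS = {
    "Win": {"Win7", "Win10"},
    "Mac": {"Mac10.9", "Mac10.10", "Mac10.11"},
}
BUILD_TYPES = {"Release", "Debug"}


def expand(modifiers):
    """Replace each meta keyword with the concrete modifiers it represents."""
    concrete = set()
    for mod in modifiers:
        concrete |= MACROS.get(mod, {mod})
    return concrete


def applies(modifiers, version, build_type):
    """Return True if a line with `modifiers` applies to this run.

    Assumption: a group (build types, or everything else) constrains the run
    only when the line mentions it, so a line with no modifiers always applies.
    """
    concrete = expand(modifiers)
    builds = concrete & BUILD_TYPES
    platforms = concrete - BUILD_TYPES
    if builds and build_type not in builds:
        return False
    if platforms and version not in platforms:
        return False
    return True


# On a Mac10.9 debug build, the first and third lines above both apply.
print(applies([], "Mac10.9", "Debug"), applies(["Mac10.9", "Debug"], "Mac10.9", "Debug"))  # True True
```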

+### Semantics

+* `WontFix` implies `Skip` and also indicates that we don't have any plans to

+ make the test pass.

+* `WontFix` lines always go in the

+ [NeverFixTests file](../../third_party/WebKit/LayoutTests/NeverFixTests) as

+ we never intend to fix them. These are just for tests that only apply to some

+ subset of the platforms we support.

+* `WontFix` and `Skip` must be used by themselves and cannot be specified

+ alongside `Crash` or another expectation keyword.

+* `Slow` causes the test runner to give the test 5x the usual time limit to run.

+ `Slow` lines go in the

+ [SlowTests file](../../third_party/WebKit/LayoutTests/SlowTests). A given

+ line cannot have both `Slow` and `Timeout`.
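The keyword constraints above can be checked mechanically. This sketch covers only the rules stated in this section; `check_expectations` and `time_limit` are hypothetical helpers, and the base timeout is passed in because its value is not given here.

```python
def check_expectations(expectations):
    """Return the rule violations for one line's expectation keywords."""
    problems = []
    exp = set(expectations)
    # WontFix and Skip must each appear alone on a line.
    for solo in ("WontFix", "Skip"):
        if solo in exp and len(exp) > 1:
            problems.append(f"{solo} must be used by itself")
    # Slow and Timeout are mutually exclusive.
    if {"Slow", "Timeout"} <= exp:
        problems.append("a line cannot have both Slow and Timeout")
    return problems


def time_limit(base_seconds, expectations):
    """Slow tests get 5x the usual limit; `base_seconds` is an assumed input."""
    return base_seconds * 5 if "Slow" in expectations else base_seconds
```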

+Also, when parsing the file, we use two rules to figure out if an expectation

+line applies to the current run:

+1. If the configuration parameters don't match the configuration of the current

+ run, the expectation is ignored.

+2. Expectations that match more of a test name are used before expectations that

+ match less of a test name.

+For example, if you had the following lines in your file, and you were running a

+debug build on `Mac10.10`:

+```

+crbug.com/12345 [ Mac10.10 ] fast/html [ Failure ]

+crbug.com/12345 [ Mac10.10 ] fast/html/keygen.html [ Pass ]

+crbug.com/12345 [ Win7 ] fast/forms/submit.html [ Failure ]

+crbug.com/12345 fast/html/section-element.html [ Failure Crash ]

+```

+You would expect:

+* `fast/html/article-element.html` to fail with a text diff (since it is in the

+ fast/html directory).

+* `fast/html/keygen.html` to pass (since the exact match on the test name takes

+ precedence over the `fast/html` directory match).

+* `fast/forms/submit.html` to pass (since the configuration parameters don't

+ match).

+* `fast/html/section-element.html` to either crash or produce a text (or image

+ and text) failure, but not time out or pass.
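The two rules can be expressed as a short resolver that reproduces the outcomes for the example lines above. It is a simplified sketch: every listed modifier must be present in the run's configuration (so alternatives such as `[ Win7 Win10 ]` and meta keywords are not handled), a test with no matching line is assumed to be expected to pass, and `Line` and `expected_results` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    modifiers: frozenset  # empty: applies to every configuration
    path: str             # a test file or a directory of tests
    expectations: tuple


def covers(path, test):
    """True if `path` is the test itself or a directory that contains it."""
    return test == path or test.startswith(path.rstrip("/") + "/")


def expected_results(lines, test, config, default=("Pass",)):
    """Pick the expectations for `test` on a run described by `config`."""
    # Rule 1: ignore lines whose modifiers don't match this run.
    candidates = [line for line in lines
                  if line.modifiers <= config and covers(line.path, test)]
    if not candidates:
        return default  # assumption: unlisted tests are expected to pass
    # Rule 2: the line matching more of the test name (longest path) wins.
    return max(candidates, key=lambda line: len(line.path)).expectations


LINES = [
    Line(frozenset({"Mac10.10"}), "fast/html", ("Failure",)),
    Line(frozenset({"Mac10.10"}), "fast/html/keygen.html", ("Pass",)),
    Line(frozenset({"Win7"}), "fast/forms/submit.html", ("Failure",)),
    Line(frozenset(), "fast/html/section-element.html", ("Failure", "Crash")),
]
RUN = {"Mac10.10", "Debug"}

for name in ("fast/html/article-element.html", "fast/html/keygen.html",
             "fast/forms/submit.html", "fast/html/section-element.html"):
    print(name, expected_results(LINES, name, RUN))
```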

+*** promo

+Duplicate expectations are not allowed within the file and will generate

+warnings.

+***
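A simplified duplicate check might compare lines by test name and set of modifiers. The helper below is hypothetical and catches only exact duplicates (with modifier order ignored); the real linter's notion of overlapping expectations is likely broader.

```python
from collections import Counter


def duplicate_entries(entries):
    """Return (sorted modifiers, test) pairs that appear more than once.

    `entries` is a list of (modifiers, test_name) pairs; modifiers are compared
    as sets, so "[ Win Debug ]" and "[ Debug Win ]" count as duplicates.
    """
    counts = Counter((frozenset(mods), test) for mods, test in entries)
    return sorted((sorted(mods), test) for (mods, test), n in counts.items() if n > 1)
```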

+You can verify that any changes you've made to an expectations file are correct

+by running:

+```bash

+third_party/WebKit/Tools/Scripts/lint-test-expectations

+```

+which will cycle through all of the possible combinations of configurations

+looking for problems.
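Cycling through every configuration amounts to taking the cross product of platform versions and build types. The version list below is an assumed subset drawn from the modifiers listed under Syntax, not necessarily the set `lint-test-expectations` actually uses.

```python
from itertools import product

# Assumed subset of concrete platform versions, taken from the modifier list
# under Syntax; the actual tool's list may differ.
VERSIONS = ["Mac10.9", "Mac10.10", "Mac10.11", "Win7", "Win10", "Precise", "Trusty"]
BUILD_TYPES = ["Release", "Debug"]


def all_configurations(versions=VERSIONS, build_types=BUILD_TYPES):
    """Return every (version, build type) pair that a lint pass would check."""
    return list(product(versions, build_types))


print(len(all_configurations()))  # 14
```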
