| Index: docs/testing/layout_test_expectations.md
|
| diff --git a/docs/testing/layout_test_expectations.md b/docs/testing/layout_test_expectations.md
|
| new file mode 100644
|
| index 0000000000000000000000000000000000000000..746cadbc784538e42fd5fcd26bbecd5a35af2ed0
|
| --- /dev/null
|
| +++ b/docs/testing/layout_test_expectations.md
|
| @@ -0,0 +1,298 @@
|
| +# Layout Test Expectations and Baselines
|
| +
|
| +
|
| +The primary function of the LayoutTests is as a regression test suite; this
|
| +means that, while we care about whether a page is being rendered correctly, we
|
| +care more about whether the page is being rendered the way we expect it to. In
|
| +other words, we look more for changes in behavior than we do for correctness.
|
| +
|
| +[TOC]
|
| +
|
| +All layout tests have "expected results", or "baselines", which may be one of
|
| +several forms. The test may produce one or more of:
|
| +
|
| +* A text file containing JavaScript log messages.
|
| +* A text rendering of the Render Tree.
|
| +* A screen capture of the rendered page as a PNG file.
|
| +* WAV files of the audio output, for WebAudio tests.
|
| +
|
| +For any of these types of tests, there are files checked into the LayoutTests
|
| +directory named `<test-name>-expected.{txt,png,wav}` (for example,
|
| +`fast/html/keygen-expected.txt` for `fast/html/keygen.html`). Lastly, we also
|
| +support the concept of "reference tests", which check that two pages are
|
| +rendered identically (pixel-by-pixel). As long as the two pages' output matches, the test passes. For
|
| +more on reference tests, see
|
| +[Writing ref tests](https://trac.webkit.org/wiki/Writing%20Reftests).
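The naming convention for baselines can be expressed as a small helper. This is a minimal sketch, assuming the generic baseline sits next to the test with the test's extension replaced by `-expected.<ext>`; `baseline_paths` is a hypothetical name, and platform-specific baselines are not modeled.

```python
from pathlib import PurePosixPath

# Extensions for the baseline kinds listed above: text, image, audio.
BASELINE_EXTENSIONS = ("txt", "png", "wav")


def baseline_paths(test_name: str) -> list[str]:
    """Return the candidate generic baseline paths for one test file."""
    stem = PurePosixPath(test_name).with_suffix("")  # drop '.html', '.svg', ...
    return [f"{stem}-expected.{ext}" for ext in BASELINE_EXTENSIONS]


print(baseline_paths("fast/html/keygen.html"))
# ['fast/html/keygen-expected.txt', 'fast/html/keygen-expected.png',
#  'fast/html/keygen-expected.wav']
```

Which of these files actually exist depends on the kind of output the test produces.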
|
| +
|
| +## Failing tests
|
| +
|
| +When the output doesn't match, there are two potential reasons for it:
|
| +
|
| +* The port is performing "correctly", but the output simply won't match the
|
| + generic version. This usually happens with things like form controls, which
|
| + are rendered differently on each platform.
|
| +* The port is performing "incorrectly" (i.e., the test is failing).
|
| +
|
| +In both cases, the convention is to check in a new baseline (aka rebaseline),
|
| +even though that file may be codifying errors. This helps us maintain test
|
| +coverage for all the other things the test is testing while we resolve the bug.
|
| +
|
| +*** promo
|
| +If a test can be rebaselined, it should always be rebaselined instead of adding
|
| +lines to TestExpectations.
|
| +***
|
| +
|
| +Bugs at [crbug.com](https://crbug.com) should track fixing incorrect behavior,
|
| +not lines in
|
| +[TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations). If a
|
| +test is never supposed to pass (e.g. it's testing Windows-specific behavior, so
|
| +can't ever pass on Linux/Mac), move it to the
|
| +[NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests) file. That
|
| +gets it out of the way of the rest of the project.
|
| +
|
| +There are some cases where you can't rebaseline and, unfortunately, we don't
|
| +have a better solution than either:
|
| +
|
| +1. Reverting the patch that caused the failure, or
|
| +2. Adding a line to TestExpectations and fixing the bug later.
|
| +
|
| +In this case, **reverting the patch is strongly preferred**.
|
| +
|
| +These are the cases where you can't rebaseline:
|
| +
|
| +* The test is a reference test.
|
| +* The test gives different output in release and debug; in this case, generate a
|
| + baseline with the release build, and mark the debug build as expected to fail.
|
| +* The test is flaky, crashes or times out.
|
| +* The test is for a feature that hasn't shipped on some platforms yet, but
|
| + will shortly.
|
| +
|
| +## Handling flaky tests
|
| +
|
| +The
|
| +[flakiness dashboard](https://test-results.appspot.com/dashboards/flakiness_dashboard.html)
|
| +is a tool for understanding a test’s behavior over time.
|
| +Originally designed for managing flaky tests, the dashboard shows a timeline
|
| +view of the test’s results across many builds. The tool may be overwhelming at first,
|
| +but
|
| +[the documentation](https://dev.chromium.org/developers/testing/flakiness-dashboard)
|
| +should help. Once you decide that a test is truly flaky, you can suppress it
|
| +using the TestExpectations file, as described below.
|
| +
|
| +We do not generally expect Chromium sheriffs to spend time trying to address
|
| +flakiness, though.
|
| +
|
| +## How to rebaseline
|
| +
|
| +Since baselines themselves are often platform-specific, updating baselines in
|
| +general requires fetching new test results after running the test on multiple
|
| +platforms.
|
| +
|
| +### Rebaselining using try jobs
|
| +
|
| +The recommended way to rebaseline for a currently-in-progress CL is to use
|
| +results from try jobs. To do this:
|
| +
|
| +1. Upload a CL with changes in Blink source code or layout tests.
|
| +2. Trigger Blink try jobs. The bots to use are the release builders on
|
| + [tryserver.blink](https://build.chromium.org/p/tryserver.blink/builders).
|
| + This can be done via the code review Web UI or via `git cl try`.
|
| +3. Wait for all try jobs to finish.
|
| +4. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl` to fetch
|
| + new baselines.
|
| +5. Commit the new baselines and upload a new patch.
|
| +
|
| +This way, the new baselines can be reviewed along with the changes, which helps
|
| +the reviewer verify that the new baselines are correct. It also means that there
|
| +is no period of time when the layout test results are ignored.
|
| +
|
| +The set of tests for which `webkit-patch rebaseline-cl` downloads new baselines
|
| +depends on its arguments.
|
| +
|
| +* By default, it tries to download all baselines for tests that failed in the
|
| + try jobs.
|
| +* If you pass `--only-changed-tests`, then only tests modified in the CL will be
|
| + considered.
|
| +* You can also explicitly pass a list of test names, and then just those tests
|
| + will be rebaselined.
|
| +
|
| +### Rebaselining with rebaseline-o-matic
|
| +
|
| +If the test is not already listed in
|
| +[TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations), you
|
| +can mark it as `[ NeedsRebaseline ]`. The
|
| +[rebaseline-o-matic bot](https://build.chromium.org/p/chromium.infra.cron/builders/rebaseline-o-matic)
|
| +will automatically detect when the bots have cycled (by looking at the blame on
|
| +the file) and do the rebaseline for you. As long as the test doesn't time out or
|
| +crash, it won't turn the bots red if it has a `NeedsRebaseline` expectation.
|
| +When all of the continuous builders on the waterfall have cycled, the
|
| +rebaseline-o-matic bot will commit a patch which includes the new baselines and
|
| +removes the `[ NeedsRebaseline ]` entry from TestExpectations.
|
| +
|
| +### Rebaselining manually
|
| +
|
| +1. If the test is already listed in TestExpectations as flaky, mark the test
|
| + `NeedsManualRebaseline` and comment out the flaky line so that your patch can
|
| + land without turning the tree red. If the test is not in TestExpectations,
|
| + you can add a `[ Rebaseline ]` line to TestExpectations.
|
| +2. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-expectations`.
|
| +3. Post the patch created in step 2 for review.
|
| +
|
| +## Kinds of expectations files
|
| +
|
| +* [TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations): The
|
| + main test failure suppression file. Ideally, it should only contain lines for
|
| + flaky tests and `NeedsRebaseline`/`NeedsManualRebaseline` lines.
|
| +* [ASANExpectations](../../third_party/WebKit/LayoutTests/ASANExpectations):
|
| + Tests that fail under ASAN.
|
| +* [LeakExpectations](../../third_party/WebKit/LayoutTests/LeakExpectations):
|
| + Tests that have memory leaks under the leak checker.
|
| +* [MSANExpectations](../../third_party/WebKit/LayoutTests/MSANExpectations):
|
| + Tests that fail under MSAN.
|
| +* [NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests): Tests
|
| + that we never intend to fix (e.g. a test for Windows-specific behavior will
|
| + never be fixed on Linux/Mac). Tests that will never pass on any platform
|
| + should just be deleted, though.
|
| +* [SlowTests](../../third_party/WebKit/LayoutTests/SlowTests): Tests that take
|
| + longer than the usual timeout to run. Slow tests are given 5x the usual
|
| + timeout.
|
| +* [SmokeTests](../../third_party/WebKit/LayoutTests/SmokeTests): A small subset
|
| + of tests that we run on the Android bot.
|
| +* [StaleTestExpectations](../../third_party/WebKit/LayoutTests/StaleTestExpectations):
|
| + Platform-specific lines that have been in TestExpectations for many months.
|
| + They're moved here to get them out of the way of people doing rebaselines
|
| + since they're clearly not getting fixed anytime soon.
|
| +* [W3CImportExpectations](../../third_party/WebKit/LayoutTests/W3CImportExpectations):
|
| + A record of which W3C tests should be imported or skipped.
|
| +* [WPTServeExpectations](../../third_party/WebKit/LayoutTests/WPTServeExpectations):
|
| + Expectations for tests that fail differently when run under the W3C's wptserve
|
| + HTTP server with the `--enable-wptserve` flag. This is an experimental feature
|
| + at this time.
|
| +
|
| +
|
| +### Flag-specific expectations files
|
| +
|
| +It is possible to handle tests that only fail when run with a particular flag
|
| +being passed to `content_shell`. See
|
| +[LayoutTests/FlagExpectations/README.txt](../../third_party/WebKit/LayoutTests/FlagExpectations/README.txt)
|
| +for more.
|
| +
|
| +## Updating the expectations files
|
| +
|
| +### Ordering
|
| +
|
| +The file is not ordered. Adding new lines somewhere in the middle of the file,
|
| +rather than at the end, reduces the chance of merge conflicts when landing your patch.
|
| +
|
| +### Syntax
|
| +
|
| +The syntax of the file is roughly one expectation per line. An expectation can
|
| +apply to either a directory of tests, or a specific test. Lines prefixed with
|
| +`# ` are treated as comments, and blank lines are allowed as well.
|
| +
|
| +The syntax of a line is roughly:
|
| +
|
| +```
|
| +[ bugs ] [ "[" modifiers "]" ] test_name [ "[" expectations "]" ]
|
| +```
|
| +
|
| +* Tokens are separated by whitespace.
|
| +* **The brackets delimiting the modifiers and expectations from the bugs and the
|
| + test_name are not optional**; however, the modifiers and expectations
|
| + components themselves are optional. In other words, if you want to specify
|
| + modifiers or expectations, you must enclose them in brackets.
|
| +* Lines are expected to have one or more bug identifiers, and the linter will
|
| + complain about lines missing them. Bug identifiers are of the form
|
| + `crbug.com/12345`, `code.google.com/p/v8/issues/detail?id=12345` or
|
| + `Bug(username)`.
|
| +* If no modifiers are specified, the expectation applies to all of the
|
| + configurations applicable to that file.
|
| +* Modifiers can be one or more of `Mac`, `Mac10.9`, `Mac10.10`, `Mac10.11`,
|
| + `Retina`, `Win`, `Win7`, `Win10`, `Linux`, `Linux32`, `Precise`, `Trusty`,
|
| + `Android`, `Release`, `Debug`.
|
| +* Some modifiers are meta keywords, e.g. `Win` represents both `Win7` and
|
| + `Win10`. See the `CONFIGURATION_SPECIFIER_MACROS` dictionary in
|
| + [third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py](../../third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py)
|
| + for the meta keywords and which modifiers they represent.
|
| +* Expectations can be one or more of `Crash`, `Failure`, `Pass`, `Rebaseline`,
|
| + `Slow`, `Skip`, `Timeout`, `WontFix`, `Missing`, `NeedsRebaseline`,
|
| + `NeedsManualRebaseline`. If multiple expectations are listed, the test is
|
| + considered "flaky" and any of those results will be considered as expected.
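The "any listed result counts as expected" rule for flaky lines can be sketched as a one-line membership test. This is a minimal sketch, assuming actual results are reported with the same keywords used in the file; `is_expected` is a hypothetical helper, and it models only result keywords (non-result keywords such as `Skip`, `Slow`, or `NeedsRebaseline` have their own handling, described elsewhere in this document).

```python
from __future__ import annotations

# Result keywords a run can produce (assumption for this sketch).
RESULT_KEYWORDS = {"Pass", "Failure", "Crash", "Timeout", "Missing"}


def is_expected(actual: str, expectations: list[str]) -> bool:
    """True if `actual` is one of the results the line allows."""
    allowed = set(expectations) & RESULT_KEYWORDS
    if not allowed:
        allowed = {"Pass"}  # no result keyword listed: the test should pass
    return actual in allowed


print(is_expected("Crash", ["Failure", "Crash"]))  # True: flaky, either is fine
print(is_expected("Pass", ["Failure", "Crash"]))   # False: passing is unexpected
```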
|
| +
|
| +For example:
|
| +
|
| +```
|
| +crbug.com/12345 [ Win Debug ] fast/html/keygen.html [ Crash ]
|
| +```
|
| +
|
| +which indicates that the "fast/html/keygen.html" test file is expected to crash
|
| +when run in the Debug configuration on Windows, and the tracking bug for this
|
| +crash is bug \#12345 in the [Chromium issue tracker](https://crbug.com). Note
|
| +that the test will still be run, so that we can notice if it doesn't actually
|
| +crash.
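To make the grammar concrete, here is a minimal tokenizing parser for a single line. It is a sketch, not the webkitpy parser: `ExpectationLine` and `parse_line` are hypothetical names, it assumes the three bug-identifier forms listed above, and it neither handles trailing comments nor reports lint errors such as a missing bug.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Bug identifiers in the forms listed above.
BUG_RE = re.compile(
    r"^(crbug\.com/\d+|code\.google\.com/p/v8/issues/detail\?id=\d+|Bug\(\w+\))$"
)


@dataclass
class ExpectationLine:  # hypothetical container for one parsed line
    bugs: list[str] = field(default_factory=list)
    modifiers: list[str] = field(default_factory=list)
    test_name: str = ""
    expectations: list[str] = field(default_factory=list)


def parse_line(line: str) -> ExpectationLine | None:
    """Parse one line; return None for comments and blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = stripped.split()  # tokens are whitespace-separated
    result = ExpectationLine()
    i = 0
    # 1. Leading bug identifiers.
    while i < len(tokens) and BUG_RE.match(tokens[i]):
        result.bugs.append(tokens[i])
        i += 1
    # 2. Optional bracketed modifiers.
    if i < len(tokens) and tokens[i] == "[":
        end = tokens.index("]", i)  # raises ValueError if unclosed
        result.modifiers = tokens[i + 1:end]
        i = end + 1
    # 3. The test name (a file or a directory).
    if i >= len(tokens):
        raise ValueError(f"missing test name: {line!r}")
    result.test_name = tokens[i]
    i += 1
    # 4. Optional bracketed expectations, which must close the line.
    if i < len(tokens):
        if tokens[i] != "[" or tokens[-1] != "]":
            raise ValueError(f"expectations must be bracketed: {line!r}")
        result.expectations = tokens[i + 1:-1]
    return result


print(parse_line("crbug.com/12345 [ Win Debug ] fast/html/keygen.html [ Crash ]"))
```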
|
| +
|
| +Assuming you're running a debug build on Mac 10.9, the following lines are all
|
| +equivalent (in terms of whether the test is performed and its expected outcome):
|
| +
|
| +```
|
| +fast/html/keygen.html [ Skip ]
|
| +fast/html/keygen.html [ WontFix ]
|
| +Bug(darin) [ Mac10.9 Debug ] fast/html/keygen.html [ Skip ]
|
| +```
|
| +
|
| +### Semantics
|
| +
|
| +* `WontFix` implies `Skip` and also indicates that we don't have any plans to
|
| + make the test pass.
|
| +* `WontFix` lines always go in the
|
| + [NeverFixTests file](../../third_party/WebKit/LayoutTests/NeverFixTests) as
|
| + we never intend to fix them. These are just for tests that only apply to some
|
| + subset of the platforms we support.
|
| +* `WontFix` and `Skip` must be used by themselves and cannot be specified
|
| + alongside `Crash` or another expectation keyword.
|
| +* `Slow` causes the test runner to give the test 5x the usual time limit to run.
|
| + `Slow` lines go in the
|
| + [SlowTests file](../../third_party/WebKit/LayoutTests/SlowTests). A given
|
| + line cannot have both Slow and Timeout.
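These rules are mechanical enough to check in code. The following is a minimal sketch, assuming a line's expectations are already split into keywords and the name of the file holding the line is known; `check_expectations` is a hypothetical helper, not part of webkitpy.

```python
from __future__ import annotations

# Keywords that must appear alone on a line.
EXCLUSIVE_KEYWORDS = {"WontFix", "Skip"}


def check_expectations(expectations: list[str], filename: str) -> list[str]:
    """Return problems with one line's expectations under the rules above.

    `filename` is the expectations file the line lives in, e.g. "SlowTests".
    """
    problems = []
    found = set(expectations)
    for keyword in sorted(EXCLUSIVE_KEYWORDS & found):
        if len(found) > 1:
            problems.append(f"{keyword} must be used by itself")
    if {"Slow", "Timeout"} <= found:
        problems.append("a line cannot have both Slow and Timeout")
    if "WontFix" in found and filename != "NeverFixTests":
        problems.append("WontFix lines belong in NeverFixTests")
    if "Slow" in found and filename != "SlowTests":
        problems.append("Slow lines belong in SlowTests")
    return problems


print(check_expectations(["Skip", "Crash"], "TestExpectations"))
# ['Skip must be used by itself']
```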
|
| +
|
| +Also, when parsing the file, we use two rules to figure out if an expectation
|
| +line applies to the current run:
|
| +
|
| +1. If the configuration parameters don't match the configuration of the current
|
| + run, the expectation is ignored.
|
| +2. Expectations that match more of a test name are used before expectations that
|
| + match less of a test name.
|
| +
|
| +For example, if you had the following lines in your file, and you were running a
|
| +debug build on `Mac10.10`:
|
| +
|
| +```
|
| +crbug.com/12345 [ Mac10.10 ] fast/html [ Failure ]
|
| +crbug.com/12345 [ Mac10.10 ] fast/html/keygen.html [ Pass ]
|
| +crbug.com/12345 [ Win7 ] fast/forms/submit.html [ Failure ]
|
| +crbug.com/12345 fast/html/section-element.html [ Failure Crash ]
|
| +```
|
| +
|
| +You would expect:
|
| +
|
| +* `fast/html/article-element.html` to fail with a text diff (since it is in the
|
| + fast/html directory).
|
| +* `fast/html/keygen.html` to pass (since there is an exact match on the test name).
|
| +* `fast/forms/submit.html` to pass (since the configuration parameters don't
|
| + match).
|
| +* `fast/html/section-element.html` to either crash or produce a text (or image
|
| + and text) failure, but not time out or pass.
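The two matching rules can be sketched as a small resolver that reproduces the example above. This is a minimal sketch under stated assumptions: `applies` and `resolve` are hypothetical helpers, only OS-version and build-type modifiers are modeled, and `MACROS` holds just the `Win` meta keyword described earlier (the authoritative mapping is `CONFIGURATION_SPECIFIER_MACROS` in `base.py`).

```python
from __future__ import annotations

# Hypothetical subset of the meta keywords.
MACROS = {"Win": {"Win7", "Win10"}}
BUILD_TYPES = {"Release", "Debug"}


def applies(modifiers: list[str], os_version: str, build_type: str) -> bool:
    """Rule 1: ignore lines whose configuration doesn't match the current run."""
    mods: set[str] = set()
    for m in modifiers:
        mods |= MACROS.get(m, {m})  # expand meta keywords such as Win
    builds = mods & BUILD_TYPES
    versions = mods - BUILD_TYPES
    return (not versions or os_version in versions) and (
        not builds or build_type in builds
    )


def resolve(lines, test: str, os_version: str, build_type: str) -> list[str]:
    """Return the expectations for `test`; lines are (modifiers, name, expectations)."""
    best_name, best = "", ["Pass"]  # no matching line: the test should pass
    for modifiers, name, expectations in lines:
        if not applies(modifiers, os_version, build_type):
            continue
        name = name.rstrip("/")
        if test != name and not test.startswith(name + "/"):
            continue  # line names neither this file nor a parent directory
        if len(name) > len(best_name):
            best_name, best = name, expectations  # rule 2: longer match wins
    return best


EXAMPLE = [
    (["Mac10.10"], "fast/html", ["Failure"]),
    (["Mac10.10"], "fast/html/keygen.html", ["Pass"]),
    (["Win7"], "fast/forms/submit.html", ["Failure"]),
    ([], "fast/html/section-element.html", ["Failure", "Crash"]),
]

for t in ("fast/html/article-element.html", "fast/html/keygen.html",
          "fast/forms/submit.html", "fast/html/section-element.html"):
    print(t, resolve(EXAMPLE, t, "Mac10.10", "Debug"))
```

Run on the example lines for a debug `Mac10.10` build, this yields `Failure`, `Pass`, `Pass`, and `Failure Crash` respectively, matching the list above.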
|
| +
|
| +*** promo
|
| +Duplicate expectations are not allowed within the file and will generate
|
| +warnings.
|
| +***
|
| +
|
| +You can verify that any changes you've made to an expectations file are correct
|
| +by running:
|
| +
|
| +```bash
|
| +third_party/WebKit/Tools/Scripts/lint-test-expectations
|
| +```
|
| +
|
| +which will cycle through all of the possible combinations of configurations
|
| +looking for problems.
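The idea of cycling through configurations can be illustrated with a small duplicate checker. This is a minimal sketch, not the real linter: `lint` and `applies` are hypothetical helpers, the configuration space below is a hypothetical subset drawn from the modifier list above, `MACROS` contains only the `Win` meta keyword, and the only problem it detects is two lines naming the same test in the same configuration.

```python
from __future__ import annotations

from itertools import product

# Hypothetical configuration space; the real tool derives it from the ports.
OS_VERSIONS = ["Mac10.9", "Mac10.10", "Mac10.11", "Win7", "Win10", "Precise", "Trusty"]
BUILD_TYPES = ["Release", "Debug"]
MACROS = {"Win": {"Win7", "Win10"}}  # subset of CONFIGURATION_SPECIFIER_MACROS


def applies(modifiers: list[str], os_version: str, build_type: str) -> bool:
    """True if a line with these modifiers applies to the given configuration."""
    mods: set[str] = set()
    for m in modifiers:
        mods |= MACROS.get(m, {m})
    builds = mods & set(BUILD_TYPES)
    versions = mods - set(BUILD_TYPES)
    return (not versions or os_version in versions) and (
        not builds or build_type in builds
    )


def lint(lines) -> list[str]:
    """Report tests covered by more than one line in the same configuration.

    `lines` holds (modifiers, test_name, expectations) tuples.
    """
    problems = []
    for os_version, build_type in product(OS_VERSIONS, BUILD_TYPES):
        seen: set[str] = set()
        for modifiers, name, _ in lines:
            if not applies(modifiers, os_version, build_type):
                continue
            if name in seen:
                problems.append(
                    f"duplicate expectation for {name} on {os_version} {build_type}"
                )
            seen.add(name)
    return problems


print(lint([(["Win"], "a.html", ["Failure"]), (["Win7", "Debug"], "a.html", ["Crash"])]))
# ['duplicate expectation for a.html on Win7 Debug']
```

Checking every configuration explicitly catches overlaps that are not visible from the text alone, such as a `Win` line colliding with a `Win7 Debug` line.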
|
|
|