Chromium Code Reviews

Side by Side Diff: tools/perf/docs/perf_bot_sheriffing.md

Issue 2611183005: Add "Useful Logs and Debugging Info" section to perf sheriff doc. (Closed)
Patch Set: Created 3 years, 11 months ago
OLD NEW
1 # Perf Bot Sheriffing 1 # Perf Bot Sheriffing
2 2
3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf 3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf
4 waterfall up and running, and triaging performance test failures and flakes. 4 waterfall up and running, and triaging performance test failures and flakes.
5 5
6 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)** 6 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)**
7 7
8 ## Key Responsibilities 8 ## Key Responsibilities
9 9
10 * [Handle Device and Bot Failures](#Handle-Device-and-Bot-Failures) 10 * [Handle Device and Bot Failures](#Handle-Device-and-Bot-Failures)
(...skipping 188 matching lines...)
199 * The revision range the test occurred on. 199 * The revision range the test occurred on.
200 * A list of all platforms the test fails on. 200 * A list of all platforms the test fails on.
201 2. Disable the failing test if it is failing more than one out of five runs 201 2. Disable the failing test if it is failing more than one out of five runs
202 (see below for instructions on telemetry and other types of tests). Make 202 (see below for instructions on telemetry and other types of tests). Make
203 sure your disable CL includes a BUG= line with the bug from step 1 and that the 203 sure your disable CL includes a BUG= line with the bug from step 1 and that the
204 test owner is cc-ed on the bug. 204 test owner is cc-ed on the bug.
205 3. After the disable CL lands, you can downgrade the priority to Pri-2 and 205 3. After the disable CL lands, you can downgrade the priority to Pri-2 and
206 ensure that the bug title reflects something like "Fix and re-enable 206 ensure that the bug title reflects something like "Fix and re-enable
207 testname". 207 testname".
208 4. Investigate the failure. Some tips for investigating: 208 4. Investigate the failure. Some tips for investigating:
209 * When viewing buildbot step logs, **use the** <font color="blue">[stdout]</font> **link to view logs!**
210 This links to logdog logs, which do not expire. Whenever possible, do not use or link
211 to the logs found through the <font color="blue">stdio</font> link,
212 as these logs will expire.
213 * When investigating Android, look for the logcat which is uploaded to
214 Google Storage at the end of the run. logcat will contain much more
215 detailed Android device and crash info than will be found in
216 Telemetry logs.
217 * If it's a non-flaky failure, identify the first failed 209 * If it's a non-flaky failure, identify the first failed
218 build so you can narrow down the range of CLs that causes the failure. 210 build so you can narrow down the range of CLs that causes the failure.
219 You can use the 211 You can use the
220 [diagnose_test_failure](https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/diagnose_test_failure) 212 [diagnose_test_failure](https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/diagnose_test_failure)
221 script to automatically find the first failed build and the good & bad 213 script to automatically find the first failed build and the good & bad
222 revisions (which can also be used for return code bisect). 214 revisions (which can also be used for return code bisect).
223 * If you suspect a specific CL in the range, you can revert it locally and 215 * If you suspect a specific CL in the range, you can revert it locally and
224 run the test on the 216 run the test on the
225 [perf trybots](https://www.chromium.org/developers/telemetry/performance-try-bots). 217 [perf trybots](https://www.chromium.org/developers/telemetry/performance-try-bots).
226 * You can run a return code bisect to narrow down the culprit CL: 218 * You can run a return code bisect to narrow down the culprit CL:
227 1. Open up the graph in the [perf dashboard](https://chromeperf.appspot.com/report) 219 1. Open up the graph in the [perf dashboard](https://chromeperf.appspot.com/report)
228 on one of the failing platforms. 220 on one of the failing platforms.
229 2. Hover over a data point and click the "Bisect" button on the 221 2. Hover over a data point and click the "Bisect" button on the
230 tooltip. 222 tooltip.
231 3. Enter the **Bug ID** from step 1, set **Good Revision** to the last 223 3. Enter the **Bug ID** from step 1, set **Good Revision** to the last
232 commit position that data was received from, set **Bad Revision** to the 224 commit position that data was received from, set **Bad Revision** to the
233 last commit position, and set **Bisect mode** to `return_code`. 225 last commit position, and set **Bisect mode** to `return_code`.
234 * [Debugging telemetry failures](https://www.chromium.org/developers/telemetry/diagnosing-test-failures) 226 * [Debugging telemetry failures](https://www.chromium.org/developers/telemetry/diagnosing-test-failures)
235 * On Android and Mac, you can view platform-level screenshots of the 227 * On Android and Mac, you can view platform-level screenshots of the
236 device screen for failing tests, links to which are printed in the logs. 228 device screen for failing tests, links to which are printed in the logs.
237 Often this will immediately reveal failure causes that are opaque from 229 Often this will immediately reveal failure causes that are opaque from
238 the logs alone. On other platforms, Devtools will produce tab 230 the logs alone. On other platforms, Devtools will produce tab
239 screenshots as long as the tab did not crash. 231 screenshots as long as the tab did not crash.
240 232
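As a rough illustration of two rules in the steps above (disable a test that fails more than one out of five runs, and find the first failed build to get a good/bad revision pair), here is a minimal Python sketch. The function names and the input shapes are hypothetical and chosen only for this example; this is not the `diagnose_test_failure` script and it does not talk to buildbot.

```python
from typing import List, Optional, Tuple


def should_disable(run_passed: List[bool]) -> bool:
    """Hypothetical helper: True if more than one out of five runs failed."""
    if not run_passed:
        return False
    failures = sum(1 for passed in run_passed if not passed)
    # "More than one out of five" means a failure rate strictly above 20%.
    return failures * 5 > len(run_passed)


def first_failed_build(
    builds: List[Tuple[int, bool]],
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: find the (good_revision, bad_revision) pair.

    `builds` is a list of (revision, passed) pairs sorted oldest first.
    Returns None if the newest build passed, or if every known build
    failed and there is no good revision to report.
    """
    if not builds or builds[-1][1]:
        return None
    # Walk backwards from the newest build while the builds keep failing.
    index = len(builds) - 1
    while index > 0 and not builds[index - 1][1]:
        index -= 1
    if index == 0:
        return None
    return builds[index - 1][0], builds[index][0]
```

The returned pair is the kind of input that a return code bisect expects as its good and bad revisions.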
233 ### Useful Logs and Debugging Info
234
235 1. **Telemetry test runner logs**
236
237 **_Useful Content:_** Best place to start. These logs contain all of the
238 python logging information from the telemetry test runner scripts.
239
240 **_Where to find:_** These logs can be found from the buildbot build page.
241 Click the _"[stdout]"_ link under any of the telemetry test buildbot steps
242 to view the logs. Do not use the "stdio" link, which shows similar
243 information but expires earlier and is slower to load.
244
245 2. **Android Logcat (Android)**
246
247 **_Useful Content:_** This file contains all Android device logs. All
248 Android apps and the Android system will log information to logcat. This is a good
249 place to look if you believe an issue is device related
250 (an Android out-of-memory problem, for example). Information about native
251 crashes is also often logged here.
252
253 **_Where to find:_** These logs can be found from the buildbot status page.
254 Click the _"logcat dump"_ link under one of the _"gsutil upload"_ steps.
255
256 3. **Test Trace (Android)**
257
258 **_Useful Content:_** These logs graphically depict the start/end times for
259 all telemetry tests on all of the devices. This can help determine if test
260 failures were caused by an environmental issue
261 (see [Cross-Device Failures](#Android-Cross-Device-Failures)).
262
263 **_Where to find:_** These logs can be found from the buildbot status page.
264 Click the _"Test Trace"_ link under one of the
265 _"gsutil Upload Test Trace"_ steps.
266
267 4. **Symbolized Stack Traces (Android)**
268
269 **_Useful Content:_** Contains symbolized stack traces of any Chrome or
270 Android crashes.
271
272 **_Where to find:_** These logs can be found from the buildbot status page.
273 The symbolized stack traces can be found under several steps. Click the link
274 under the _"symbolized breakpad crashes"_ step to see symbolized Chrome crashes.
275 Click the link under the _"stack tool with logcat dump"_ step to see symbolized Android
276 crashes.
277
241 ### Disabling Telemetry Tests 278 ### Disabling Telemetry Tests
242 279
243 If the test is a telemetry test, its name will have a '.' in it, such as 280 If the test is a telemetry test, its name will have a '.' in it, such as
244 `thread_times.key_mobile_sites` or `page_cycler.top_10`. The part before the 281 `thread_times.key_mobile_sites` or `page_cycler.top_10`. The part before the
245 first dot will be a python file in [tools/perf/benchmarks](https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/benchmarks/). 282 first dot will be a python file in [tools/perf/benchmarks](https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/benchmarks/).
246 283
247 If a telemetry test is failing and there is no clear culprit to revert 284 If a telemetry test is failing and there is no clear culprit to revert
248 immediately, disable the test. You can do this with the `@benchmark.Disabled` 285 immediately, disable the test. You can do this with the `@benchmark.Disabled`
249 decorator. **Always add a comment next to your decorator with the bug id which 286 decorator. **Always add a comment next to your decorator with the bug id which
250 has background on why the test was disabled, and also include a BUG= line in 287 has background on why the test was disabled, and also include a BUG= line in
(...skipping 52 matching lines...)
303 340
304 **[Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)** 341 **[Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)**
305 are for disabled tests. These should be pinged weekly, and work towards fixing 342 are for disabled tests. These should be pinged weekly, and work towards fixing
306 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the 343 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the
307 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified). 344 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified).
308 345
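The weekly-ping rule for Pri-2 bugs amounts to a simple date filter. The sketch below shows that logic on data you already have in hand; the `stale_bugs` name and the dictionary input are assumptions made for illustration, and the code does not query the issue tracker.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_bugs(
    last_modified: Dict[int, datetime],
    now: datetime,
    max_age_days: int = 7,
) -> List[int]:
    """Hypothetical helper: bug ids not modified in the last max_age_days.

    Results are ordered by modification time, oldest first, so the bug
    that has waited longest for a ping comes first.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = [bug for bug, modified in last_modified.items() if modified < cutoff]
    return sorted(stale, key=lambda bug: last_modified[bug])
```

For example, calling `stale_bugs(bugs, datetime.now())` with a mapping of bug id to last-modified time returns the ids that are due for a ping.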
309 <!-- Unresolved issues: 346 <!-- Unresolved issues:
310 1. Do perf sheriffs watch the bisect waterfall? 347 1. Do perf sheriffs watch the bisect waterfall?
311 2. Do perf sheriffs watch the internal clank waterfall? 348 2. Do perf sheriffs watch the internal clank waterfall?
312 --> 349 -->