Chromium Code Reviews

Side by Side Diff: tools/perf/docs/perf_regression_sheriffing.md

Issue 2712383003: Add updated documentation for debugging data stoppage alerts. (Closed)
Patch Set: Actually add documentation Created 3 years, 9 months ago
OLD NEW
1 # Perf Regression Sheriffing (go/perfregression-sheriff) 1 # Perf Regression Sheriffing (go/perfregression-sheriff)
2 2
3 The perf regression sheriff tracks performance regressions in Chrome's 3 The perf regression sheriff tracks performance regressions in Chrome's
4 continuous integration tests. Note that a [new rotation](perf_bot_sheriffing.md) 4 continuous integration tests. Note that a [new rotation](perf_bot_sheriffing.md)
5 has been created to ensure the builds and tests stay green, so the perf 5 has been created to ensure the builds and tests stay green, so the perf
6 regression sheriff role is now entirely focused on performance. 6 regression sheriff role is now entirely focused on performance.
7 7
8 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)** 8 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)**
9 9
10 ## Key Responsibilities 10 ## Key Responsibilities
11 11
12 * [Triage Regressions on the Perf Dashboard](#Triage-Regressions-on-the-Perf-Dashboard) 12 * [Triage Regressions on the Perf Dashboard](#Triage-Regressions-on-the-Perf-Dashboard)
13 * [Triaging Data Stoppage Alerts](#Triaging-Data-Stoppage-Alerts) 13 * [Triaging Data Stoppage Alerts](#Triaging-Data-Stoppage-Alerts)
14 * [Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions) 14 * [Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions)
15 * [Give Feedback on our Infrastructure](#Give-Feedback-on-our-Infrastructure) 15 * [Give Feedback on our Infrastructure](#Give-Feedback-on-our-Infrastructure)
16 16
17 ## Triage Regressions on the Perf Dashboard 17 ## Triage Regressions on the Perf Dashboard
18 18
19 Open the perf dashboard [alerts page](https://chromeperf.appspot.com/alerts). 19 Open the perf dashboard [alerts page](https://chromeperf.appspot.com/alerts).
20 20
21 In the upper right corner, **sign in with your Chromium account**. Signing in is 21 In the upper right corner, **sign in with your Chromium account**. Signing in is
22 important in order to be able to kick off bisect jobs, and see data from 22 important in order to be able to kick off bisect jobs, and see data from
23 internal waterfalls. 23 internal waterfalls.
24 24
25 Pick up **Chromium Perf Sheriff** from "Select an item ▼" drop down menu. There 25 Pick up **Chromium Perf Sheriff** from "Select an item ▼" drop down menu. There
26 are two tables of alerts that may be shown: 26 are two tables of alerts that may be shown:
27 27
28 * "Performance Alerts", which you should triage, and 28 * "Performance Alerts"
29 * "Data Stoppage Alerts", which you can ignore. 29 * "Data Stoppage Alerts"
30 30
31 For either type of alert, if there are no currently pending alerts, then the 31 For either type of alert, if there are no currently pending alerts, then the
32 table won't be shown. 32 table won't be shown.
33 33
34 The list can be sorted by clicking on the column header. When you click on the 34 The list can be sorted by clicking on the column header. When you click on the
35 checkbox next to an alert, all the other alerts that occurred in the same 35 checkbox next to an alert, all the other alerts that occurred in the same
36 revision range will be highlighted. 36 revision range will be highlighted.
37 37
38 Check the boxes next to the alerts you want to take a look at, and click the 38 Check the boxes next to the alerts you want to take a look at, and click the
39 "Graph" button. You'll be taken to a page with a table at the top listing all 39 "Graph" button. You'll be taken to a page with a table at the top listing all
(...skipping 39 matching lines...)
79 to see a broader revision range feel free to click on the alert on that graph 79 to see a broader revision range feel free to click on the alert on that graph
80 and kick off a bisect for it. There should be capacity to kick off as many 80 and kick off a bisect for it. There should be capacity to kick off as many
81 bisects as you feel are necessary to investigate; [give feedback](#feedback) 81 bisects as you feel are necessary to investigate; [give feedback](#feedback)
82 below if you feel that is not the case. 82 below if you feel that is not the case.
83 83
84 ## Triaging Data Stoppage Alerts 84 ## Triaging Data Stoppage Alerts
85 85
86 Data stoppage alerts are listed on the 86 Data stoppage alerts are listed on the
87 [perf dashboard alerts page](https://chromeperf.appspot.com/alerts). Whenever 87 [perf dashboard alerts page](https://chromeperf.appspot.com/alerts). Whenever
88 the dashboard is monitoring a metric, and that metric stops sending data, an 88 the dashboard is monitoring a metric, and that metric stops sending data, an
89 alert is fired. Some of these alerts are expected: 89 alert is fired. See
90 90 [triaging data stoppage alerts](triaging_data_stoppage_alerts.md) for more
91 * When a telemetry benchmark is disabled, we get a data stoppage alert. 91 details.
92 Check the [code for the benchmark](https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/benchmarks/)
93 to see if it has been disabled, and if so associate the alert with the
94 bug for the disable.
95 * When a bot has been turned down. These should be announced to
96 perf-sheriffs@chromium.org, but if you can't find the bot on
97 [the waterfall](https://uberchromegw.corp.google.com/i/chromium.perf/) and
98 you didn't see the announcement, double check in the speed infra chat.
99 Ideally these will be associated with the bug for the bot turndown, but
100 it's okay to mark them invalid if you can't find the bug.
101 You can check the
102 [recipe](https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py)
103 to find a corresponding bot name for waterfall with one for dashboard.
104
105 If there doesn't seem to be a valid reason for the alert, file a bug on it
106 using the perf dashboard, and cc [the owner](http://go/perf-owners). Then do
107 some diagnosis:
108
109 * Look at the perf dashboard graph to see the last revision we got data for,
110 and note that in the bug. Click on the `buildbot stdio` link in the tooltip
111 to find the buildbot status page for the last good build, and increment
112 the build number to get the first build with no data, and note that in the
113 bug as well. Check for any changes to the test in the revision range.
114 * Go to the buildbot status page of the bot which should be running the test.
115 Is it running the test? If not, note that in the bug.
116 * If it is running the test and the test is failing, diagnose as a test
117 failure.
118 * If it is running the test and the test is passing, check the `json.output`
119 link on the buildbot status page for the test. This is the data the test
120 sent to the perf dashboard. Are there null values? Sometimes it lists a
121 reason as well. Please put your finding in the bug.
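
Editorial note on the removed `json.output` step above: the check for null values can be done by hand, but a small script makes it easier to paste findings into the bug. The following is a minimal sketch only. The helper name `find_null_paths` and the sample payload are hypothetical, and the shape of the real `json.output` data is an assumption; only the idea of looking for null values comes from the text.

```python
import json


def find_null_paths(value, path="$"):
    """Yield a path string for every null found in a decoded JSON value."""
    if value is None:
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_null_paths(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_null_paths(child, f"{path}[{index}]")


# Hypothetical payload for illustration; real test output will look different.
sample = json.loads(
    '{"charts": {"warm": {"value": null, "units": "ms"},'
    ' "cold": {"values": [1.5, null]}}}'
)

null_paths = list(find_null_paths(sample))
for null_path in null_paths:
    print("null value at", null_path)
```

To use it on real data, save the contents of the `json.output` link to a file, load it with `json.load`, and pass the result to `find_null_paths`. Any reason string the output lists next to a null still has to be read by hand.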
122 92
123 ## Follow up on Performance Regressions 93 ## Follow up on Performance Regressions
124 94
125 During your shift, you should try to follow up on each of the bugs you filed. 95 During your shift, you should try to follow up on each of the bugs you filed.
126 Once you've triaged all the alerts, check to see if the bisects have come back, 96 Once you've triaged all the alerts, check to see if the bisects have come back,
127 or if they failed. If the results came back, and a culprit was found, follow up 97 or if they failed. If the results came back, and a culprit was found, follow up
128 with the CL author. If the bisects failed to update the bug with results, please 98 with the CL author. If the bisects failed to update the bug with results, please
129 file a bug on it (see [feedback](#feedback) links below). 99 file a bug on it (see [feedback](#feedback) links below).
130 100
131 Also during your shift, please spend any spare time driving down bugs from the 101 Also during your shift, please spend any spare time driving down bugs from the
(...skipping 33 matching lines...)
165 [go/bad-bisects](https://docs.google.com/spreadsheets/d/13PYIlRGE8eZzsrSocA3SR2LEHdzc8n9ORUoOE2vtO6I/edit#gid=0). 135 [go/bad-bisects](https://docs.google.com/spreadsheets/d/13PYIlRGE8eZzsrSocA3SR2LEHdzc8n9ORUoOE2vtO6I/edit#gid=0).
166 The team triages these regularly. If you spot a really clear bug (bisect 136 The team triages these regularly. If you spot a really clear bug (bisect
167 job red, bugs not being updated with bisect results) please file it in 137 job red, bugs not being updated with bisect results) please file it in
168 crbug with component `Tests>AutoBisect`. If a bisect problem is blocking a 138 crbug with component `Tests>AutoBisect`. If a bisect problem is blocking a
169 perf regression bug triage, **please file a new bug with component 139 perf regression bug triage, **please file a new bug with component
170 `Tests>AutoBisect` and block the regression bug on the bisect bug**. This 140 `Tests>AutoBisect` and block the regression bug on the bisect bug**. This
171 makes it much easier for the team to triage, dupe, and close bugs on the 141 makes it much easier for the team to triage, dupe, and close bugs on the
172 infrastructure without affecting the state of the perf regression bugs. 142 infrastructure without affecting the state of the perf regression bugs.
173 * **Noisy Tests**: Please file a bug in crbug with component `Tests>Telemetry` 143 * **Noisy Tests**: Please file a bug in crbug with component `Tests>Telemetry`
174 and [cc the owner](http://go/perf-owners). 144 and [cc the owner](http://go/perf-owners).