tools/perf/docs/perf_regression_sheriffing.md - Issue 2712383003: Add updated documentation for debugging data stoppage alerts.

Side by Side Diff: tools/perf/docs/perf_regression_sheriffing.md

Issue 2712383003: Add updated documentation for debugging data stoppage alerts. (Closed)

Patch Set: Actually add documentation Created 3 years, 9 months ago

OLD	NEW
1 # Perf Regression Sheriffing (go/perfregression-sheriff)	1 # Perf Regression Sheriffing (go/perfregression-sheriff)

2	2

3 The perf regression sheriff tracks performance regressions in Chrome's	3 The perf regression sheriff tracks performance regressions in Chrome's

4 continuous integration tests. Note that a [new rotation](perf_bot_sheriffing.md)	4 continuous integration tests. Note that a [new rotation](perf_bot_sheriffing.md)

5 has been created to ensure the builds and tests stay green, so the perf	5 has been created to ensure the builds and tests stay green, so the perf

6 regression sheriff role is now entirely focused on performance.	6 regression sheriff role is now entirely focused on performance.

7	7

8 [Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)	8 [Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)

9	9

10 ## Key Responsibilities	10 ## Key Responsibilities

11	11

12 * [Triage Regressions on the Perf Dashboard](#Triage-Regressions-on-the-Perf-Dashboard)	12 * [Triage Regressions on the Perf Dashboard](#Triage-Regressions-on-the-Perf-Dashboard)

13 * [Triaging Data Stoppage Alerts](#Triaging-Data-Stoppage-Alerts)	13 * [Triaging Data Stoppage Alerts](#Triaging-Data-Stoppage-Alerts)

14 * [Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions)	14 * [Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions)

15 * [Give Feedback on our Infrastructure](#Give-Feedback-on-our-Infrastructure)	15 * [Give Feedback on our Infrastructure](#Give-Feedback-on-our-Infrastructure)

16	16

17 ## Triage Regressions on the Perf Dashboard	17 ## Triage Regressions on the Perf Dashboard

18	18

19 Open the perf dashboard [alerts page](https://chromeperf.appspot.com/alerts).	19 Open the perf dashboard [alerts page](https://chromeperf.appspot.com/alerts).

20	20

21 In the upper right corner, sign in with your Chromium account. Signing in is	21 In the upper right corner, sign in with your Chromium account. Signing in is

22 important in order to be able to kick off bisect jobs, and see data from	22 important in order to be able to kick off bisect jobs, and see data from

23 internal waterfalls.	23 internal waterfalls.

24	24

25 Pick up Chromium Perf Sheriff from "Select an item ▼" drop down menu. There	25 Pick up Chromium Perf Sheriff from "Select an item ▼" drop down menu. There

26 are two tables of alerts that may be shown:	26 are two tables of alerts that may be shown:

27	27

28 * "Performance Alerts", which you should triage, and	28 * "Performance Alerts"

29 * "Data Stoppage Alerts", which you can ignore.	29 * "Data Stoppage Alerts"

30	30

31 For either type of alert, if there are no currently pending alerts, then the	31 For either type of alert, if there are no currently pending alerts, then the

32 table won't be shown.	32 table won't be shown.

33	33

34 The list can be sorted by clicking on the column header. When you click on the	34 The list can be sorted by clicking on the column header. When you click on the

35 checkbox next to an alert, all the other alerts that occurred in the same	35 checkbox next to an alert, all the other alerts that occurred in the same

36 revision range will be highlighted.	36 revision range will be highlighted.

37	37

38 Check the boxes next to the alerts you want to take a look at, and click the	38 Check the boxes next to the alerts you want to take a look at, and click the

39 "Graph" button. You'll be taken to a page with a table at the top listing all	39 "Graph" button. You'll be taken to a page with a table at the top listing all

(...skipping 39 matching lines...)
79 to see a broader revision range feel free to click on the alert on that graph	79 to see a broader revision range feel free to click on the alert on that graph

80 and kick off a bisect for it. There should be capacity to kick off as many	80 and kick off a bisect for it. There should be capacity to kick off as many

81 bisects as you feel are necessary to investigate; [give feedback](#feedback)	81 bisects as you feel are necessary to investigate; [give feedback](#feedback)

82 below if you feel that is not the case.	82 below if you feel that is not the case.

83	83

84 ## Triaging Data Stoppage Alerts	84 ## Triaging Data Stoppage Alerts

85	85

86 Data stoppage alerts are listed on the	86 Data stoppage alerts are listed on the

87 [perf dashboard alerts page](https://chromeperf.appspot.com/alerts). Whenever	87 [perf dashboard alerts page](https://chromeperf.appspot.com/alerts). Whenever

88 the dashboard is monitoring a metric, and that metric stops sending data, an	88 the dashboard is monitoring a metric, and that metric stops sending data, an

89 alert is fired. Some of these alerts are expected:	89 alert is fired. See

90	90 [triaging data stoppage alerts](triaging_data_stoppage_alerts.md) for more

91 * When a telemetry benchmark is disabled, we get a data stoppage alert.	91 details.

92 Check the [code for the benchmark](https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/benchmarks/)

93 to see if it has been disabled, and if so associate the alert with the

94 bug for the disable.

95 * When a bot has been turned down. These should be announced to

96 perf-sheriffs@chromium.org, but if you can't find the bot on

97 [the waterfall](https://uberchromegw.corp.google.com/i/chromium.perf/) and

98 you didn't see the announcement, double check in the speed infra chat.

99 Ideally these will be associated with the bug for the bot turndown, but

100 it's okay to mark them invalid if you can't find the bug.

101 You can check the

102 [recipe](https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py)

103 to find a corresponding bot name for waterfall with one for dashboard.

104

105 If there doesn't seem to be a valid reason for the alert, file a bug on it

106 using the perf dashboard, and cc [the owner](http://go/perf-owners). Then do

107 some diagnosis:

108

109 * Look at the perf dashboard graph to see the last revision we got data for,

110 and note that in the bug. Click on the `buildbot stdio` link in the tooltip

111 to find the buildbot status page for the last good build, and increment

112 the build number to get the first build with no data, and note that in the

113 bug as well. Check for any changes to the test in the revision range.

114 * Go to the buildbot status page of the bot which should be running the test.

115 Is it running the test? If not, note that in the bug.

116 * If it is running the test and the test is failing, diagnose as a test

117 failure.

118 * If it is running the test and the test is passing, check the `json.output`

119 link on the buildbot status page for the test. This is the data the test

120 sent to the perf dashboard. Are there null values? Sometimes it lists a

121 reason as well. Please put your finding in the bug.

122	92

123 ## Follow up on Performance Regressions	93 ## Follow up on Performance Regressions

124	94

125 During your shift, you should try to follow up on each of the bugs you filed.	95 During your shift, you should try to follow up on each of the bugs you filed.

126 Once you've triaged all the alerts, check to see if the bisects have come back,	96 Once you've triaged all the alerts, check to see if the bisects have come back,

127 or if they failed. If the results came back, and a culprit was found, follow up	97 or if they failed. If the results came back, and a culprit was found, follow up

128 with the CL author. If the bisects failed to update the bug with results, please	98 with the CL author. If the bisects failed to update the bug with results, please

129 file a bug on it (see [feedback](#feedback) links below).	99 file a bug on it (see [feedback](#feedback) links below).

130	100

131 Also during your shift, please spend any spare time driving down bugs from the	101 Also during your shift, please spend any spare time driving down bugs from the

(...skipping 33 matching lines...)
165 [go/bad-bisects](https://docs.google.com/spreadsheets/d/13PYIlRGE8eZzsrSocA3SR2LEHdzc8n9ORUoOE2vtO6I/edit#gid=0).	135 [go/bad-bisects](https://docs.google.com/spreadsheets/d/13PYIlRGE8eZzsrSocA3SR2LEHdzc8n9ORUoOE2vtO6I/edit#gid=0).

166 The team triages these regularly. If you spot a really clear bug (bisect	136 The team triages these regularly. If you spot a really clear bug (bisect

167 job red, bugs not being updated with bisect results) please file it in	137 job red, bugs not being updated with bisect results) please file it in

168 crbug with component `Tests>AutoBisect`. If a bisect problem is blocking a	138 crbug with component `Tests>AutoBisect`. If a bisect problem is blocking a

169 perf regression bug triage, **please file a new bug with component	139 perf regression bug triage, **please file a new bug with component

170 `Tests>AutoBisect` and block the regression bug on the bisect bug**. This	140 `Tests>AutoBisect` and block the regression bug on the bisect bug**. This

171 makes it much easier for the team to triage, dupe, and close bugs on the	141 makes it much easier for the team to triage, dupe, and close bugs on the

172 infrastructure without affecting the state of the perf regression bugs.	142 infrastructure without affecting the state of the perf regression bugs.

173 * Noisy Tests: Please file a bug in crbug with component `Tests>Telemetry`	143 * Noisy Tests: Please file a bug in crbug with component `Tests>Telemetry`

174 and [cc the owner](http://go/perf-owners).	144 and [cc the owner](http://go/perf-owners).
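
The `json.output` check described in old lines 118-121 (look for null values in the data a test sent to the perf dashboard) can be approximated with a short script. This is a minimal sketch and not part of the patch: the schema of `json.output` is not shown here, so `find_null_paths` is a hypothetical helper that walks any parsed JSON value, and the `sample` payload, including its `reason` field, is invented for illustration.

```python
import json
from typing import Any, Iterator


def find_null_paths(value: Any, path: str = "$") -> Iterator[str]:
    """Yield a JSONPath-like string for every null in a parsed JSON value.

    Hypothetical helper: it makes no assumption about the real schema and
    simply recurses through dicts and lists.
    """
    if value is None:
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_null_paths(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_null_paths(child, f"{path}[{index}]")


# Invented payload for illustration only; the real json.output layout may differ.
sample = json.loads(
    '{"charts": {"load_time": {"summary": {"value": null, "reason": "timeout"}},'
    ' "fps": {"summary": {"value": 60}}}}'
)

null_paths = list(find_null_paths(sample))
print(null_paths)
```

Any path this reports points at a metric that was uploaded without a value; per the old text, the finding (and any reason listed next to it) belongs in the bug.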
