# Perf Regression Sheriffing

The perf regression sheriff tracks performance regressions in Chrome's
continuous integration tests. Note that a [new rotation](perf_bot_sheriffing.md)
has been created to ensure the builds and tests stay green, so the perf
regression sheriff role is now entirely focused on performance.

## Key Responsibilities

* [Triage Regressions on the Perf Dashboard](#triage)
* [Follow up on Performance Regressions](#followup)
* [Give Feedback on our Infrastructure](#feedback)

### <a name="triage"></a> Triage Regressions on the Perf Dashboard

Open the perf dashboard [alerts page](https://chromeperf.appspot.com/alerts).

In the upper right corner, **sign in with your Google account**; you need to
be signed in to triage alerts. The page shows
two lists; you are responsible for triaging **Performance Alerts**. The list
can be sorted by clicking on a column header. When you check the box next to
an alert, all the other alerts that occurred in the same revision range are
highlighted.

Check the boxes next to the alerts you want to look at, and click the "Graph"
button. You'll be taken to a page with a table at the top listing all the
alerts whose revision ranges overlap the one you chose; below it, the
dashboard shows graphs of all the alerts checked in that table.

1. **Look at the graph**.
   * If the alert appears to be **within the noise**, click on the red
     exclamation point icon for it in the graph and hit the "Invalid" button.
     (See the sketch after this list for one rough way to think about noise.)
   * If the alert is **placed a few points to the left or right** of the
     actual jump in the graph, click on it and use the "nudge" menu to move
     it into place.
   * If there is a line labeled "ref" on the graph, that is the reference
     build. It's an older version of Chrome, used to help us sort out whether
     a change to the bot or test might have caused the graph to jump, rather
     than a real performance regression. If **the ref build moved at the same
     time as the alert**, click on the alert and hit the "Invalid" button.
2. **Look at the other alerts** in the table to see if any should be grouped
   together. Note that the bisect bot will automatically mark bugs as
   duplicates if it finds they have the same culprit, so you don't need to be
   too aggressive about grouping alerts that might not be related. Some signs
   alerts should be grouped together:
   * If they're all in the same test suite
   * If they all regressed the same metric (a lot of commonality in the Test
     column)
3. **Triage the group of alerts**. Check all the alerts you believe are
   related, and press the triage button.
   * If one of the alerts already has a bug id, click "existing bug" and use
     that bug id.
   * Otherwise click "new bug". Be sure to cc the
     [test owner](http://go/perf-owners) on the bug.
4. **Look at the revision range** for the regression. You can see it in the
   tooltip on the graph. If you see any likely culprits, cc the authors on
   the bug.
5. **Optionally, bisect**. The perf dashboard will automatically kick off a
   bisect for each bug you file, so there is usually no need to start one
   manually. But if you think the regression is much clearer on one platform,
   or on a specific page of a page set, feel free to click on the alert on
   that graph and kick off a bisect for it.
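
Judging whether an alert is "within the noise" (step 1) is ultimately a
judgment call, but the underlying question is whether the step at the alert's
revision range is large compared to the normal variation of the series. The
Python sketch below is a minimal illustration of that intuition only; it is
not the dashboard's actual anomaly-detection logic, and the sample data,
window sizes, and 2x-noise threshold are arbitrary assumptions.

```python
# Rough, illustrative heuristic: is the step bigger than the noise?
# This is NOT how the perf dashboard detects anomalies; the threshold
# and sample values are made up for the example.
import statistics

def looks_like_real_step(before, after, threshold=2.0):
    """Compare the shift between two windows of points to the noise level.

    'before' and 'after' are lists of measurements sampled on either
    side of the alert's revision range.
    """
    noise = statistics.stdev(before)
    step = abs(statistics.median(after) - statistics.median(before))
    if noise == 0:
        return step > 0  # any shift stands out on a perfectly flat series
    return step > threshold * noise

# Example: a ~10-unit jump against ~2 units of run-to-run noise.
before = [102, 99, 101, 100, 103, 98]
after = [110, 112, 109, 111, 108, 113]
print(looks_like_real_step(before, after))  # True: likely a real change
```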

### <a name="followup"></a> Follow up on Performance Regressions

After your shift, please try to follow up weekly on the bugs you filed. Kick
off new bisects if the previous ones failed, and if a bisect picks a likely
culprit, follow up to ensure the CL author addresses the problem. If you are
certain that a specific CL caused a performance regression, do not hesitate
to revert it.

### <a name="feedback"></a> Give Feedback on our Infrastructure

Perf regression sheriffs have their eyes on the perf dashboard and bisects
more than anyone else, and their feedback is invaluable for keeping these
tools accurate and for improving them. Please file bugs and feature requests
as you see them:

* **Perf Dashboard**: Please use the red "Report Issue" link in the navbar.
* **Perf Bisect/Trybots**: If a bisect identifies the wrong CL as the
  culprit, misses a clear culprit, or fails to reproduce what appears to be
  a clear regression, please link the comment the bisect bot posted on the
  bug at
  [go/bad-bisects](https://docs.google.com/spreadsheets/d/13PYIlRGE8eZzsrSocA3SR2LEHdzc8n9ORUoOE2vtO6I/edit#gid=0).
  The team triages these regularly. If you spot a really clear bug (bisect
  job red, bugs not being updated with bisect results), please file it in
  crbug with the label `Cr-Tests-AutoBisect`.
* **Noisy Tests**: Please file a bug in crbug with the label
  `Cr-Tests-Telemetry` and [cc the owner](http://go/perf-owners).
<!-- Unresolved issues:
1. Are perf sheriffs responsible for static initializer failures?
-->