tools/perf/docs/perf_bot_sheriffing.md - Issue 2380913002: Update perfbot sheriffing docs.

Side by Side Diff: tools/perf/docs/perf_bot_sheriffing.md

Issue 2380913002: Update perfbot sheriffing docs. (Closed)

Patch Set: Rebase and comments Created 4 years, 2 months ago

OLD	NEW
1 # Perf Bot Sheriffing	1 # Perf Bot Sheriffing

2	2

3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf	3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf

4 waterfall up and running, and triaging performance test failures and flakes.	4 waterfall up and running, and triaging performance test failures and flakes.

5	5

6 [Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)	6 [Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)

7	7

8 ## Key Responsibilities	8 ## Key Responsibilities

9	9

10 * [Handle Device and Bot Failures](#botfailures)	10 * [Handle Device and Bot Failures](#botfailures)

(...skipping 20 matching lines...)
31 it easier to see a summary.	31 it easier to see a summary.

32 2. [Waterfall view](https://uberchromegw.corp.google.com/i/chromium.perf/waterfall)	32 2. [Waterfall view](https://uberchromegw.corp.google.com/i/chromium.perf/waterfall)

33 shows more details, including recent changes.	33 shows more details, including recent changes.

34 3. [Firefighter](https://chromiumperfstats.appspot.com/) shows traces of	34 3. [Firefighter](https://chromiumperfstats.appspot.com/) shows traces of

35 recent builds. It takes url parameter arguments:	35 recent builds. It takes url parameter arguments:

36 * master can be chromium.perf, tryserver.chromium.perf	36 * master can be chromium.perf, tryserver.chromium.perf

37 * builder can be a builder or tester name, like	37 * builder can be a builder or tester name, like

38 "Android Nexus5 Perf (2)"	38 "Android Nexus5 Perf (2)"

39 * start_time is seconds since the epoch.	39 * start_time is seconds since the epoch.

40	40
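Reviewer-style sketch of the Firefighter parameters listed above. It assumes the three arguments are passed as an ordinary query string on the root URL, which the doc does not spell out; `firefighter_url` and `FIREFIGHTER_BASE` are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the doc; attaching the arguments as a plain
# query string on this URL is an assumption.
FIREFIGHTER_BASE = "https://chromiumperfstats.appspot.com/"


def firefighter_url(master, builder, start_time=None):
    """Build a Firefighter link from the documented url parameters.

    master: e.g. "chromium.perf" or "tryserver.chromium.perf".
    builder: a builder or tester name, e.g. "Android Nexus5 Perf (2)".
    start_time: seconds since the epoch (optional).
    """
    params = {"master": master, "builder": builder}
    if start_time is not None:
        params["start_time"] = int(start_time)
    # urlencode escapes spaces and parentheses in the builder name.
    return FIREFIGHTER_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(firefighter_url("chromium.perf", "Android Nexus5 Perf (2)", 1475000000))
```

Escaping matters here because builder names such as "Android Nexus5 Perf (2)" contain spaces and parentheses.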

	41

	42 There is also [milo](https://luci-milo.appspot.com), which has the same data as

	43 buildbot, but mirrored in a different datastore. It is generally faster than

	44 buildbot, and links to it will not break, as the data is kept around for much

	45 longer.

	46

41 In addition to watching the waterfall directly,	47 In addition to watching the waterfall directly,

42 [Sheriff-O-Matic](https://sheriff-o-matic.appspot.com/chromium.perf) may	48 [Sheriff-O-Matic](https://sheriff-o-matic.appspot.com/chromium.perf) may

43 optionally be used to easily track the different issues and associate	49 optionally be used to easily track the different issues and associate

44 them with specific bugs.	50 them with specific bugs. It also attempts to group together similar failures

	51 across different builders, so it can help to see a higher level perspective on

	52 what is happening on the perf waterfall.

45	53

46 You can see a list of all previously filed bugs using the	54 You can see a list of all previously filed bugs using the

47 [Performance-Sheriff-BotHealth](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth)	55 [Performance-Sheriff-BotHealth](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth)

48 label in crbug.	56 label in crbug.

49	57

50 Please also check the recent	58 Please also check the recent

51 [perf-sheriffs@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/perf-sheriffs)	59 [perf-sheriffs@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/perf-sheriffs)

52 postings for important announcements about bot turndowns and other known issues.	60 postings for important announcements about bot turndowns and other known issues.

53	61

54 ## Handle Device and Bot Failures	62 ## Handle Device and Bot Failures

55	63

56 ### Offline Buildslaves	64 ### Offline Buildslaves

57	65

58 Some build configurations, in particular the perf builders and trybots, have	66 Some build configurations, in particular the perf builders and trybots, have

59 multiple machines attached. If one or more of the machines go down, there are	67 multiple machines attached. If one or more of the machines go down, there are

60 still other machines running, so the console or waterfall view will still show	68 still other machines running, so the console or waterfall view will still show

61 green, but those configs will run at reduced throughput. At least once during	69 green, but those configs will run at reduced throughput. At least once during

62 your shift, you should check the lists of buildslaves and ensure they're all	70 your shift, you should check the lists of buildslaves and ensure they're all

63 running.	71 running.

64	72

65 * [chromium.perf buildslaves](https://build.chromium.org/p/chromium.perf/buildslaves)	73 * [chromium.perf buildslaves](https://build.chromium.org/p/chromium.perf/buildslaves)

66 * [tryserver.chromium.perf buildslaves](https://build.chromium.org/p/tryserver.chromium.perf/buildslaves)	74 * [tryserver.chromium.perf buildslaves](https://build.chromium.org/p/tryserver.chromium.perf/buildslaves)

67	75

68 The machines restart between test runs, so just looking for "Status: Not	76 The machines restart between test runs, so just looking for "Status: Not

69 connected" is not enough to indicate a problem. For each disconnected machine,	77 connected" is not enough to indicate a problem. For each disconnected machine,

70 you can also check the "Last heard from" column to ensure that it's been gone	78 you can also check the "Last heard from" column to ensure that it's been gone

71 for at least an hour. To get it running again,	79 for at least an hour. To get it running again,

72 [file a bug](https://bugs.chromium.org/p/chromium/issues/entry?labels=Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?&comment=Hostname:&summary=Buildslave+offline+on+chromium.perf)	80 [file a bug](https://bugs.chromium.org/p/chromium/issues/entry?labels=Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?&comment=Hostname:&summary=Buildslave+offline+on+chromium.perf)

73 against the current trooper and read [go/bug-a-trooper](http://go/bug-a-trooper) for contacting troopers.	81 against the current trooper and read [go/bug-a-trooper](http://go/bug-a-trooper)

	82 for contacting troopers.

	83
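Reviewer-style sketch of the offline check described above: a slave that is merely "Not connected" may just be restarting, so it only counts as a problem once "Last heard from" is at least an hour old. The function and constant names are hypothetical, and the inputs are assumed to have been read off the buildslaves page by hand or by some other means; nothing here talks to buildbot.

```python
from datetime import datetime, timedelta

# The doc's rule of thumb: gone for at least an hour.
OFFLINE_THRESHOLD = timedelta(hours=1)


def needs_trooper_bug(connected, last_heard_from, now):
    """Return True if a buildslave looks offline rather than restarting.

    connected: False when the page shows "Status: Not connected".
    last_heard_from: datetime from the "Last heard from" column.
    now: current time, in the same timezone as last_heard_from.
    """
    if connected:
        return False
    return now - last_heard_from >= OFFLINE_THRESHOLD


if __name__ == "__main__":
    now = datetime(2016, 10, 1, 12, 0)
    # Disconnected for ten minutes: probably just a restart between runs.
    print(needs_trooper_bug(False, now - timedelta(minutes=10), now))
    # Disconnected for two hours: worth filing a bug against the trooper.
    print(needs_trooper_bug(False, now - timedelta(hours=2), now))
```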

	84 The chrome infrastructure team also maintains a set of dashboards you can use to

	85 view some debugging information about our systems. This is available at

	86 [vi/chrome_infra](http://vi/chrome_infra). To debug offline buildslaves,

	87 you can look at the "Individual machine" dashboard, (at

	88 [vi/chrome_infra/Machines/per_machine](http://vi/chrome_infra/Machines/per_machine)

	89 under the "Machines" section, which can show some useful information about the

	90 machine in question.

74	91

75 ### Purple bots	92 ### Purple bots

76	93

77 When a bot goes purple, it's usually because of an infrastructure failure	94 When a bot goes purple, it's usually because of an infrastructure failure

78 outside of the tests. But you should first check the logs of a purple bot to	95 outside of the tests. But you should first check the logs of a purple bot to

79 try to better understand the problem. Sometimes a telemetry test failure can	96 try to better understand the problem. Sometimes a telemetry test failure can

80 turn the bot purple, for example.	97 turn the bot purple, for example.

81	98

82 If the bot goes purple and you believe it's an infrastructure issue, file a bug	99 If the bot goes purple and you believe it's an infrastructure issue, file a bug

83 with	100 with

(...skipping 176 matching lines...)
260	277

261 [Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)	278 [Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)

262 are for disabled tests. These should be pinged weekly, and work towards fixing	279 are for disabled tests. These should be pinged weekly, and work towards fixing

263 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the	280 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the

264 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified).	281 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified).

265	282
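Reviewer-style sketch of how the "not pinged in a week" link above is put together, in case a sheriff wants a different priority or window. The query terms and URL parameters are copied from the link in the doc; `stale_bugs_url` and `ISSUE_LIST` are hypothetical names, and varying the priority or day count is an assumption about what the tracker accepts.

```python
from urllib.parse import quote, urlencode

# Issue list endpoint as it appears in the doc's links.
ISSUE_LIST = "https://bugs.chromium.org/p/chromium/issues/list"


def stale_bugs_url(priority="Pri-2", days=7):
    """Build the bot-health query for bugs not modified in `days` days."""
    terms = [
        "label:Performance-Sheriff-BotHealth",
        "label:%s" % priority,
        "modified-before:today-%d" % days,
    ]
    params = {"can": 2, "q": " ".join(terms), "sort": "modified"}
    # Keep ":" unescaped and encode spaces as %20 to match the doc's link.
    return ISSUE_LIST + "?" + urlencode(params, safe=":", quote_via=quote)


if __name__ == "__main__":
    print(stale_bugs_url())
```

With the defaults, the result is the same URL as the link in the doc.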

266 <!-- Unresolved issues:	283 <!-- Unresolved issues:

267 1. Do perf sheriffs watch the bisect waterfall?	284 1. Do perf sheriffs watch the bisect waterfall?

268 2. Do perf sheriffs watch the internal clank waterfall?	285 2. Do perf sheriffs watch the internal clank waterfall?

269 -->	286 -->
