Side by Side Diff: tools/perf/docs/perf_bot_sheriffing.md

Issue 2380913002: Update perfbot sheriffing docs. (Closed)
Patch Set: Rebase and comments Created 4 years, 2 months ago
OLDNEW
1 # Perf Bot Sheriffing 1 # Perf Bot Sheriffing
2 2
3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf 3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf
4 waterfall up and running, and triaging performance test failures and flakes. 4 waterfall up and running, and triaging performance test failures and flakes.
5 5
6 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)** 6 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)**
7 7
8 ## Key Responsibilities 8 ## Key Responsibilities
9 9
10 * [Handle Device and Bot Failures](#botfailures) 10 * [Handle Device and Bot Failures](#botfailures)
(...skipping 20 matching lines...)
31 it easier to see a summary. 31 it easier to see a summary.
32 2. [Waterfall view](https://uberchromegw.corp.google.com/i/chromium.perf/waterfall) 32 2. [Waterfall view](https://uberchromegw.corp.google.com/i/chromium.perf/waterfall)
33 shows more details, including recent changes. 33 shows more details, including recent changes.
34 3. [Firefighter](https://chromiumperfstats.appspot.com/) shows traces of 34 3. [Firefighter](https://chromiumperfstats.appspot.com/) shows traces of
35 recent builds. It takes url parameter arguments: 35 recent builds. It takes url parameter arguments:
36 * **master** can be chromium.perf, tryserver.chromium.perf 36 * **master** can be chromium.perf, tryserver.chromium.perf
37 * **builder** can be a builder or tester name, like 37 * **builder** can be a builder or tester name, like
38 "Android Nexus5 Perf (2)" 38 "Android Nexus5 Perf (2)"
39 * **start_time** is seconds since the epoch. 39 * **start_time** is seconds since the epoch.
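
The three Firefighter parameters listed above can be combined into a link programmatically. The following is a minimal sketch, assuming the parameters are passed as an ordinary query string on the base URL given in the doc; the helper name `firefighter_url` is hypothetical and not part of any Chromium tool.

```python
import time
from urllib.parse import urlencode

# Base URL taken from the doc; passing the parameters as a plain query
# string on this URL is an assumption.
FIREFIGHTER_BASE = "https://chromiumperfstats.appspot.com/"


def firefighter_url(master, builder, start_time=None):
    """Build a Firefighter link (hypothetical helper).

    master:     e.g. "chromium.perf" or "tryserver.chromium.perf"
    builder:    a builder or tester name, e.g. "Android Nexus5 Perf (2)"
    start_time: seconds since the epoch (optional)
    """
    params = {"master": master, "builder": builder}
    if start_time is not None:
        params["start_time"] = int(start_time)
    # urlencode escapes spaces and parentheses in the builder name.
    return FIREFIGHTER_BASE + "?" + urlencode(params)


# Example: traces for one builder starting 24 hours ago.
one_day_ago = time.time() - 24 * 60 * 60
print(firefighter_url("chromium.perf", "Android Nexus5 Perf (2)", one_day_ago))
```

Builder names contain spaces and parentheses, so encoding them with `urlencode` rather than pasting them into the URL by hand avoids broken links.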
40 40
41
42 There is also [milo](https://luci-milo.appspot.com), which has the same data as
43 buildbot, but mirrored in a different datastore. It is generally faster than
44 buildbot, and links to it will not break, as the data is kept around for much
45 longer.
46
41 In addition to watching the waterfall directly, 47 In addition to watching the waterfall directly,
42 [Sheriff-O-Matic](https://sheriff-o-matic.appspot.com/chromium.perf) may 48 [Sheriff-O-Matic](https://sheriff-o-matic.appspot.com/chromium.perf) may
43 optionally be used to easily track the different issues and associate 49 optionally be used to easily track the different issues and associate
44 them with specific bugs. 50 them with specific bugs. It also attempts to group together similar failures
51 across different builders, so it can help to see a higher level perspective on
52 what is happening on the perf waterfall.
45 53
46 You can see a list of all previously filed bugs using the 54 You can see a list of all previously filed bugs using the
47 **[Performance-Sheriff-BotHealth](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth)** 55 **[Performance-Sheriff-BotHealth](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth)**
48 label in crbug. 56 label in crbug.
49 57
50 Please also check the recent 58 Please also check the recent
51 **[perf-sheriffs@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/perf-sheriffs)** 59 **[perf-sheriffs@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/perf-sheriffs)**
52 postings for important announcements about bot turndowns and other known issues. 60 postings for important announcements about bot turndowns and other known issues.
53 61
54 ## Handle Device and Bot Failures 62 ## Handle Device and Bot Failures
55 63
56 ### Offline Buildslaves 64 ### Offline Buildslaves
57 65
58 Some build configurations, in particular the perf builders and trybots, have 66 Some build configurations, in particular the perf builders and trybots, have
59 multiple machines attached. If one or more of the machines go down, there are 67 multiple machines attached. If one or more of the machines go down, there are
60 still other machines running, so the console or waterfall view will still show 68 still other machines running, so the console or waterfall view will still show
61 green, but those configs will run at reduced throughput. At least once during 69 green, but those configs will run at reduced throughput. At least once during
62 your shift, you should check the lists of buildslaves and ensure they're all 70 your shift, you should check the lists of buildslaves and ensure they're all
63 running. 71 running.
64 72
65 * [chromium.perf buildslaves](https://build.chromium.org/p/chromium.perf/buildslaves) 73 * [chromium.perf buildslaves](https://build.chromium.org/p/chromium.perf/buildslaves)
66 * [tryserver.chromium.perf buildslaves](https://build.chromium.org/p/tryserver.chromium.perf/buildslaves) 74 * [tryserver.chromium.perf buildslaves](https://build.chromium.org/p/tryserver.chromium.perf/buildslaves)
67 75
68 The machines restart between test runs, so just looking for "Status: Not 76 The machines restart between test runs, so just looking for "Status: Not
69 connected" is not enough to indicate a problem. For each disconnected machine, 77 connected" is not enough to indicate a problem. For each disconnected machine,
70 you can also check the "Last heard from" column to ensure that it's been gone 78 you can also check the "Last heard from" column to ensure that it's been gone
71 for at least an hour. To get it running again, 79 for at least an hour. To get it running again,
72 [file a bug](https://bugs.chromium.org/p/chromium/issues/entry?labels=Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?&comment=Hostname:&summary=Buildslave+offline+on+chromium.perf) 80 [file a bug](https://bugs.chromium.org/p/chromium/issues/entry?labels=Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?&comment=Hostname:&summary=Buildslave+offline+on+chromium.perf)
73 against the current trooper and read [go/bug-a-trooper](http://go/bug-a-trooper) for contacting troopers. 81 against the current trooper and read [go/bug-a-trooper](http://go/bug-a-trooper)
82 for contacting troopers.
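
The offline check described above (disconnected, and last heard from at least an hour ago) can be expressed as a small predicate. This is a sketch only: the function name and its inputs are hypothetical, and it assumes you have already read the connection status and "Last heard from" time off the buildslaves page by some other means.

```python
from datetime import datetime, timedelta, timezone

# The doc's threshold: a machine must have been gone for at least an hour,
# because machines restart between test runs and briefly show as
# "Not connected".
OFFLINE_THRESHOLD = timedelta(hours=1)


def needs_trooper_bug(connected, last_heard_from, now=None):
    """Return True if a buildslave looks genuinely offline (hypothetical helper).

    connected:       False when the page shows "Status: Not connected"
    last_heard_from: timezone-aware datetime from the "Last heard from" column
    now:             current time; defaults to the current UTC time
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if connected:
        return False
    return now - last_heard_from >= OFFLINE_THRESHOLD


# Example: a machine last seen 90 minutes ago warrants a bug.
now = datetime.now(timezone.utc)
print(needs_trooper_bug(False, now - timedelta(minutes=90), now))
```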
83
84 The chrome infrastructure team also maintains a set of dashboards you can use to
85 view some debugging information about our systems. This is available at
86 [vi/chrome_infra](http://vi/chrome_infra). To debug offline buildslaves,
87 you can look at the "Individual machine" dashboard, (at
88 [vi/chrome_infra/Machines/per_machine](http://vi/chrome_infra/Machines/per_machine))
89 under the "Machines" section, which can show some useful information about the
90 machine in question.
74 91
75 ### Purple bots 92 ### Purple bots
76 93
77 When a bot goes purple, it's usually because of an infrastructure failure 94 When a bot goes purple, it's usually because of an infrastructure failure
78 outside of the tests. But you should first check the logs of a purple bot to 95 outside of the tests. But you should first check the logs of a purple bot to
79 try to better understand the problem. Sometimes a telemetry test failure can 96 try to better understand the problem. Sometimes a telemetry test failure can
80 turn the bot purple, for example. 97 turn the bot purple, for example.
81 98
82 If the bot goes purple and you believe it's an infrastructure issue, file a bug 99 If the bot goes purple and you believe it's an infrastructure issue, file a bug
83 with 100 with
(...skipping 176 matching lines...)
260 277
261 **[Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)** 278 **[Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)**
262 are for disabled tests. These should be pinged weekly, and work towards fixing 279 are for disabled tests. These should be pinged weekly, and work towards fixing
263 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the 280 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the
264 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified). 281 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified).
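
The stale-bug link above is an issue-tracker query with three terms: the sheriff label, the priority label, and `modified-before:today-7`. A rough sketch of building the same link with a configurable staleness window follows; the helper name `stale_bugs_url` is hypothetical, and using a day count other than 7 is an assumption about how the `today-N` term generalizes.

```python
from urllib.parse import quote, urlencode

# Issue list endpoint as it appears in the doc's link.
ISSUE_LIST = "https://bugs.chromium.org/p/chromium/issues/list"


def stale_bugs_url(priority="Pri-2", days=7):
    """Build a query link for bot-health bugs not modified in `days` days
    (hypothetical helper)."""
    query = " ".join([
        "label:Performance-Sheriff-BotHealth",
        f"label:{priority}",
        f"modified-before:today-{days}",
    ])
    params = {"can": 2, "q": query, "sort": "modified"}
    # quote_via=quote encodes spaces as %20; ":" is left unescaped so the
    # result matches the form of the link in the doc.
    return ISSUE_LIST + "?" + urlencode(params, quote_via=quote, safe=":")


print(stale_bugs_url())
```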
265 282
266 <!-- Unresolved issues: 283 <!-- Unresolved issues:
267 1. Do perf sheriffs watch the bisect waterfall? 284 1. Do perf sheriffs watch the bisect waterfall?
268 2. Do perf sheriffs watch the internal clank waterfall? 285 2. Do perf sheriffs watch the internal clank waterfall?
269 --> 286 -->