| OLD | NEW |
| 1 # Perf Bot Sheriffing | 1 # Perf Bot Sheriffing |
| 2 | 2 |
| 3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf | 3 The perf bot sheriff is responsible for keeping the bots on the chromium.perf |
| 4 waterfall up and running, and triaging performance test failures and flakes. | 4 waterfall up and running, and triaging performance test failures and flakes. |
| 5 | 5 |
| 6 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)** | 6 **[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)** |
| 7 | 7 |
| 8 ## Key Responsibilities | 8 ## Key Responsibilities |
| 9 | 9 |
| 10 * [Handle Device and Bot Failures](#botfailures) | 10 * [Handle Device and Bot Failures](#botfailures) |
| (...skipping 20 matching lines...) |
| 31 it easier to see a summary. | 31 it easier to see a summary. |
| 32 2. [Waterfall view](https://uberchromegw.corp.google.com/i/chromium.perf/waterfall) | 32 2. [Waterfall view](https://uberchromegw.corp.google.com/i/chromium.perf/waterfall) |
| 33 shows more details, including recent changes. | 33 shows more details, including recent changes. |
| 34 3. [Firefighter](https://chromiumperfstats.appspot.com/) shows traces of | 34 3. [Firefighter](https://chromiumperfstats.appspot.com/) shows traces of |
| 35 recent builds. It takes url parameter arguments: | 35 recent builds. It takes url parameter arguments: |
| 36 * **master** can be chromium.perf, tryserver.chromium.perf | 36 * **master** can be chromium.perf, tryserver.chromium.perf |
| 37 * **builder** can be a builder or tester name, like | 37 * **builder** can be a builder or tester name, like |
| 38 "Android Nexus5 Perf (2)" | 38 "Android Nexus5 Perf (2)" |
| 39 * **start_time** is seconds since the epoch. | 39 * **start_time** is seconds since the epoch. |
| 40 | 40 |
| 41 |
| 42 There is also [milo](https://luci-milo.appspot.com), which has the same data as |
| 43 buildbot, but mirrored in a different datastore. It is generally faster than |
| 44 buildbot, and links to it will not break, as the data is kept around for much |
| 45 longer. |
| 46 |
| 41 In addition to watching the waterfall directly, | 47 In addition to watching the waterfall directly, |
| 42 [Sheriff-O-Matic](https://sheriff-o-matic.appspot.com/chromium.perf) may | 48 [Sheriff-O-Matic](https://sheriff-o-matic.appspot.com/chromium.perf) may |
| 43 optionally be used to easily track the different issues and associate | 49 optionally be used to easily track the different issues and associate |
| 44 them with specific bugs. | 50 them with specific bugs. It also attempts to group together similar failures |
| 51 across different builders, so it can help to see a higher level perspective on |
| 52 what is happening on the perf waterfall. |
| 45 | 53 |
| 46 You can see a list of all previously filed bugs using the | 54 You can see a list of all previously filed bugs using the |
| 47 **[Performance-Sheriff-BotHealth](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth)** | 55 **[Performance-Sheriff-BotHealth](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth)** |
| 48 label in crbug. | 56 label in crbug. |
| 49 | 57 |
| 50 Please also check the recent | 58 Please also check the recent |
| 51 **[perf-sheriffs@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/perf-sheriffs)** | 59 **[perf-sheriffs@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/perf-sheriffs)** |
| 52 postings for important announcements about bot turndowns and other known issues. | 60 postings for important announcements about bot turndowns and other known issues. |
| 53 | 61 |
| 54 ## Handle Device and Bot Failures | 62 ## Handle Device and Bot Failures |
| 55 | 63 |
| 56 ### Offline Buildslaves | 64 ### Offline Buildslaves |
| 57 | 65 |
| 58 Some build configurations, in particular the perf builders and trybots, have | 66 Some build configurations, in particular the perf builders and trybots, have |
| 59 multiple machines attached. If one or more of the machines go down, there are | 67 multiple machines attached. If one or more of the machines go down, there are |
| 60 still other machines running, so the console or waterfall view will still show | 68 still other machines running, so the console or waterfall view will still show |
| 61 green, but those configs will run at reduced throughput. At least once during | 69 green, but those configs will run at reduced throughput. At least once during |
| 62 your shift, you should check the lists of buildslaves and ensure they're all | 70 your shift, you should check the lists of buildslaves and ensure they're all |
| 63 running. | 71 running. |
| 64 | 72 |
| 65 * [chromium.perf buildslaves](https://build.chromium.org/p/chromium.perf/buildslaves) | 73 * [chromium.perf buildslaves](https://build.chromium.org/p/chromium.perf/buildslaves) |
| 66 * [tryserver.chromium.perf buildslaves](https://build.chromium.org/p/tryserver.chromium.perf/buildslaves) | 74 * [tryserver.chromium.perf buildslaves](https://build.chromium.org/p/tryserver.chromium.perf/buildslaves) |
| 67 | 75 |
| 68 The machines restart between test runs, so just looking for "Status: Not | 76 The machines restart between test runs, so just looking for "Status: Not |
| 69 connected" is not enough to indicate a problem. For each disconnected machine, | 77 connected" is not enough to indicate a problem. For each disconnected machine, |
| 70 you can also check the "Last heard from" column to ensure that it's been gone | 78 you can also check the "Last heard from" column to ensure that it's been gone |
| 71 for at least an hour. To get it running again, | 79 for at least an hour. To get it running again, |
| 72 [file a bug](https://bugs.chromium.org/p/chromium/issues/entry?labels=Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?&comment=Hostname:&summary=Buildslave+offline+on+chromium.perf) | 80 [file a bug](https://bugs.chromium.org/p/chromium/issues/entry?labels=Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?&comment=Hostname:&summary=Buildslave+offline+on+chromium.perf) |
| 73 against the current trooper and read [go/bug-a-trooper](http://go/bug-a-trooper) for contacting troopers. | 81 against the current trooper and read [go/bug-a-trooper](http://go/bug-a-trooper) |
| 82 for contacting troopers. |
| 83 |
| 84 The chrome infrastructure team also maintains a set of dashboards you can use to |
| 85 view some debugging information about our systems. This is available at |
| 86 [vi/chrome_infra](http://vi/chrome_infra). To debug offline buildslaves, |
| 87 you can look at the "Individual machine" dashboard (at |
| 88 [vi/chrome_infra/Machines/per_machine](http://vi/chrome_infra/Machines/per_machine), |
| 89 under the "Machines" section), which can show some useful information about the |
| 90 machine in question. |
| 74 | 91 |
| 75 ### Purple bots | 92 ### Purple bots |
| 76 | 93 |
| 77 When a bot goes purple, it's usually because of an infrastructure failure | 94 When a bot goes purple, it's usually because of an infrastructure failure |
| 78 outside of the tests. But you should first check the logs of a purple bot to | 95 outside of the tests. But you should first check the logs of a purple bot to |
| 79 try to better understand the problem. Sometimes a telemetry test failure can | 96 try to better understand the problem. Sometimes a telemetry test failure can |
| 80 turn the bot purple, for example. | 97 turn the bot purple, for example. |
| 81 | 98 |
| 82 If the bot goes purple and you believe it's an infrastructure issue, file a bug | 99 If the bot goes purple and you believe it's an infrastructure issue, file a bug |
| 83 with | 100 with |
| (...skipping 176 matching lines...) |
| 260 | 277 |
| 261 **[Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)** | 278 **[Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)** |
| 262 are for disabled tests. These should be pinged weekly, and work towards fixing | 279 are for disabled tests. These should be pinged weekly, and work towards fixing |
| 263 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the | 280 should be ongoing when the sheriff is not working on a Pri-1 issue. Here is the |
| 264 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified). | 281 [list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified). |
| 265 | 282 |
| 266 <!-- Unresolved issues: | 283 <!-- Unresolved issues: |
| 267 1. Do perf sheriffs watch the bisect waterfall? | 284 1. Do perf sheriffs watch the bisect waterfall? |
| 268 2. Do perf sheriffs watch the internal clank waterfall? | 285 2. Do perf sheriffs watch the internal clank waterfall? |
| 269 --> | 286 --> |
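
The doc in this diff says that Firefighter takes **master**, **builder**, and **start_time** (seconds since the epoch) as URL parameters. A minimal sketch of building such a link follows. It assumes the parameters are passed as an ordinary query string on the Firefighter root URL, which the doc does not spell out, and the helper name `firefighter_url` is hypothetical.

```python
import time
from urllib.parse import urlencode

# Base URL taken from the doc. Appending the parameters as a plain query
# string is an assumption, not something the doc confirms.
FIREFIGHTER_URL = "https://chromiumperfstats.appspot.com/"


def firefighter_url(master, builder, start_time=None, base=FIREFIGHTER_URL):
    """Build a Firefighter link (hypothetical helper, for illustration only).

    master:     e.g. "chromium.perf" or "tryserver.chromium.perf"
    builder:    a builder or tester name, e.g. "Android Nexus5 Perf (2)"
    start_time: seconds since the epoch; omitted from the URL if None
    """
    params = {"master": master, "builder": builder}
    if start_time is not None:
        params["start_time"] = int(start_time)
    # urlencode escapes spaces and parentheses in builder names.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    one_day_ago = time.time() - 24 * 60 * 60
    print(firefighter_url("chromium.perf", "Android Nexus5 Perf (2)", one_day_ago))
```

Builder names on this waterfall contain spaces and parentheses, so escaping them with `urlencode` is safer than concatenating strings by hand.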