# Perf Bot Sheriffing

The perf bot sheriff is responsible for keeping the bots on the chromium.perf
waterfall up and running, and triaging performance test failures and flakes.

**[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)**

## Key Responsibilities

* [Handle Device and Bot Failures](#botfailures)
(...)
    it easier to see a summary.
2. [Waterfall view](https://uberchromegw.corp.google.com/i/chromium.perf/waterfall)
    shows more details, including recent changes.
3. [Firefighter](https://chromiumperfstats.appspot.com/) shows traces of
    recent builds. It takes the following URL parameters:
    * **master** can be `chromium.perf` or `tryserver.chromium.perf`.
    * **builder** can be a builder or tester name, like
      "Android Nexus5 Perf (2)".
    * **start_time** is a time in seconds since the epoch.
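
Those three parameters can be combined into a Firefighter link with a short script. This is a minimal sketch. It assumes the parameters go in an ordinary query string on the root URL above, which this page does not spell out. The helper name `firefighter_url` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Root URL from the link above. Sending master/builder/start_time as a plain
# query string is an assumption, not documented Firefighter behavior.
FIREFIGHTER_URL = "https://chromiumperfstats.appspot.com/"


def firefighter_url(master, builder, start):
    """Return a hypothetical Firefighter link for `builder` on `master`.

    `start` must be a timezone-aware datetime; it is converted to whole
    seconds since the epoch for the start_time parameter.
    """
    params = {
        "master": master,  # e.g. "chromium.perf" or "tryserver.chromium.perf"
        "builder": builder,  # builder or tester name
        "start_time": int(start.timestamp()),
    }
    # urlencode escapes the spaces and parentheses in builder names.
    return FIREFIGHTER_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Traces for one tester over roughly the last 24 hours.
    since = datetime.now(timezone.utc) - timedelta(days=1)
    print(firefighter_url("chromium.perf", "Android Nexus5 Perf (2)", since))
```

Letting `urlencode` do the escaping avoids hand-encoding builder names such as "Android Nexus5 Perf (2)".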

There is also [milo](https://luci-milo.appspot.com), which has the same data as
buildbot, mirrored in a different datastore. It is generally faster than
buildbot, and links to it will not break, since the data is kept for much
longer.

In addition to watching the waterfall directly,
[Sheriff-O-Matic](https://sheriff-o-matic.appspot.com/chromium.perf) can
optionally be used to track the different issues and associate them with
specific bugs. It also attempts to group similar failures across different
builders, which gives a higher-level view of what is happening on the perf
waterfall.
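
The sketch below is a toy illustration of the grouping idea. It collapses failures that share a step name into one entry that lists every builder affected. The `Failure` record, the choice of step name as the grouping key, and the builder and step names are all hypothetical. This is not how Sheriff-O-Matic actually computes its groups.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Hypothetical record of one failing step on one builder."""
    builder: str
    step: str


def group_by_step(failures):
    """Group failures by step name.

    Returns a dict mapping each failing step to the sorted list of builders
    it failed on, so a step that breaks on many builders appears only once.
    """
    builders_by_step = defaultdict(set)
    for failure in failures:
        builders_by_step[failure.step].add(failure.builder)
    return {step: sorted(builders) for step, builders in builders_by_step.items()}


if __name__ == "__main__":
    # Hypothetical failures on two builders.
    failures = [
        Failure("builder-1", "benchmark_a"),
        Failure("builder-2", "benchmark_a"),
        Failure("builder-1", "benchmark_b"),
    ]
    for step, builders in group_by_step(failures).items():
        print(step, builders)
```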

You can see a list of all previously filed bugs using the
**[Performance-Sheriff-BotHealth](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth)**
label in crbug.

Please also check the recent
**[perf-sheriffs@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/perf-sheriffs)**
postings for important announcements about bot turndowns and other known issues.

## Handle Device and Bot Failures

### Offline Buildslaves

Some build configurations, in particular the perf builders and trybots, have
multiple machines attached. If one or more of the machines go down, there are
still other machines running, so the console or waterfall view will still show
green, but those configs will run at reduced throughput. At least once during
your shift, you should check the lists of buildslaves and ensure they're all
running.

* [chromium.perf buildslaves](https://build.chromium.org/p/chromium.perf/buildslaves)
* [tryserver.chromium.perf buildslaves](https://build.chromium.org/p/tryserver.chromium.perf/buildslaves)

The machines restart between test runs, so a machine showing "Status: Not
connected" is not necessarily a problem. For each disconnected machine, also
check the "Last heard from" column to confirm that it has been gone for at least
an hour. To get it running again,
[file a bug](https://bugs.chromium.org/p/chromium/issues/entry?labels=Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?&comment=Hostname:&summary=Buildslave+offline+on+chromium.perf)
against the current trooper, and see [go/bug-a-trooper](http://go/bug-a-trooper)
for how to contact troopers.
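
The one-hour rule above can be expressed as a small filter. This is a minimal sketch. It assumes you have already copied each machine's hostname, connection status, and "Last heard from" time off the buildslaves page into the hypothetical `Slave` record. The helper `bug_url` pre-fills the bug template linked above. Putting the hostname after `Hostname:` in the comment field is an assumption about how the template is meant to be filled in. `OS-?` is left as the same placeholder the template uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Only flag machines that have been silent for at least an hour.
OFFLINE_THRESHOLD = timedelta(hours=1)

BUG_ENTRY_URL = "https://bugs.chromium.org/p/chromium/issues/entry"


@dataclass
class Slave:
    """Hypothetical record of one row from the buildslaves page."""
    hostname: str
    connected: bool
    last_heard_from: datetime  # timezone-aware


def offline_slaves(slaves, now):
    """Return machines that are disconnected and silent for >= OFFLINE_THRESHOLD."""
    return [
        s for s in slaves
        if not s.connected and now - s.last_heard_from >= OFFLINE_THRESHOLD
    ]


def bug_url(hostname):
    """Pre-fill the 'Buildslave offline' bug template with a hostname."""
    params = {
        # OS-? is kept as the placeholder from the template link.
        "labels": "Pri-1,Performance-Sheriff-BotHealth,Infra-Troopers,OS-?",
        "comment": "Hostname: " + hostname,
        "summary": "Buildslave offline on chromium.perf",
    }
    return BUG_ENTRY_URL + "?" + urlencode(params)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical rows: only the first has been gone long enough to flag.
    slaves = [
        Slave("build1-a1", connected=False, last_heard_from=now - timedelta(hours=3)),
        Slave("build2-a1", connected=False, last_heard_from=now - timedelta(minutes=5)),
    ]
    for slave in offline_slaves(slaves, now):
        print(slave.hostname, bug_url(slave.hostname))
```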

The Chrome infrastructure team also maintains a set of dashboards with
debugging information about our systems, available at
[vi/chrome_infra](http://vi/chrome_infra). To debug offline buildslaves, look at
the "Individual machine" dashboard (at
[vi/chrome_infra/Machines/per_machine](http://vi/chrome_infra/Machines/per_machine),
under the "Machines" section), which can show useful information about the
machine in question.

### Purple bots

When a bot goes purple, it's usually because of an infrastructure failure
outside of the tests. But you should first check the logs of a purple bot to
try to better understand the problem. For example, a Telemetry test failure
can sometimes turn the bot purple.

If the bot goes purple and you believe it's an infrastructure issue, file a bug
with
(...)

**[Pri-2 bugs](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label%3APerformance-Sheriff-BotHealth+label%3APri-2)**
are for disabled tests. These should be pinged weekly, and work toward fixing
them should continue whenever the sheriff is not working on a Pri-1 issue. Here
is the
[list of Pri-2 bugs that have not been pinged in a week](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth%20label:Pri-2%20modified-before:today-7&sort=modified).
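
The "not pinged in a week" query above filters on `modified-before:today-7`, sorted by modification date. If you are working from an exported list of bugs instead, the same check is a date comparison. This is a minimal sketch under stated assumptions. The `Bug` record is hypothetical, and "last modified" stands in for "last pinged", just as it does in the tracker query.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Bug:
    """Hypothetical record of a Pri-2 Performance-Sheriff-BotHealth bug."""
    bug_id: int
    last_modified: date


def needs_ping(bugs, today, max_age_days=7):
    """Return IDs of bugs last modified before `today - max_age_days`.

    Results are ordered oldest first, approximating the tracker query
    `modified-before:today-7&sort=modified`.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = [b for b in bugs if b.last_modified < cutoff]
    return [b.bug_id for b in sorted(stale, key=lambda b: b.last_modified)]


if __name__ == "__main__":
    # Hypothetical bugs: only the first is older than a week.
    bugs = [Bug(111, date.today() - timedelta(days=30)), Bug(222, date.today())]
    print(needs_ping(bugs, date.today()))
```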

<!-- Unresolved issues:
1. Do perf sheriffs watch the bisect waterfall?
2. Do perf sheriffs watch the internal clank waterfall?
-->