Index: site/dev/sheriffing/trooper.md
diff --git a/site/dev/sheriffing/trooper.md b/site/dev/sheriffing/trooper.md
index 4840f6376b9c7a074d81e1cb08cccb84c3fafe94..a73aaf100f640d78c1ad8daca42e604204d873d5 100644
--- a/site/dev/sheriffing/trooper.md
+++ b/site/dev/sheriffing/trooper.md
@@ -37,4 +37,78 @@ If you need to swap shifts with someone (because you are out sick or on vacation
 <a name="tips"></a>
 Tips for troopers
 -----------------
-Add your tips here!
+
+- Make sure you are a member of
+  [MDB group chrome-skia-ninja](https://ganpati.corp.google.com/#Group_Info?name=chrome-skia-ninja@prod.google.com).
+  Access to Valentine passwords and to the Chrome Golo is granted based on
+  membership in this group.
+
+- These alerts generally auto-dismiss once the conditions that triggered them
+  are no longer met:
+  - Monitoring alerts, including prober, collectd, and others
+  - Disconnected build slaves
+
+- These alerts generally do not auto-dismiss
+  ([issue 4292](https://code.google.com/p/skia/issues/detail?id=4292)):
+  - Build slaves that failed a step
+  - Disconnected devices (these are detected as a failing "wait for device"
+    step)
+
+- All "Failed to execute query" failures are lumped into a single alert, so the
+  query it shows may not be the one that is currently failing: the query that
+  originally triggered the alert may have recovered while another query keeps
+  the alert active. Dismiss the alert to get a new alert showing the query that
+  is actually failing.
+
+- Where machines are located:
+  - Machine name like "skia-vm-NNN" -> GCE
+  - Machine name ends with "a3", "a4", or "m3" -> Chrome Golo
+  - Machine name starts with "skiabot-" -> Chapel Hill lab
+  - Machine name starts with "win8" -> Chapel Hill lab (Windows machine
+    names can't be very long, so the "skiabot-shuttle-" prefix is dropped.)
+  - slave11-c3 is a Chrome infra GCE machine (not to be confused with the Skia
+    Buildbots GCE, which we refer to as simply "GCE")
+
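+  As a sketch, the naming rules above can be encoded in a small shell function.
+  `machine_location` is a hypothetical helper (not part of any existing tool),
+  and it assumes the name patterns listed here are the only ones in use:
+
+  ```shell
+  # Hypothetical helper: prints where a bot machine lives, based on the
+  # naming rules above. Patterns are tried in order; slave11-c3 is listed
+  # explicitly because it does not follow the other patterns.
+  machine_location() {
+    case "$1" in
+      slave11-c3)      echo "Chrome infra GCE" ;;
+      skia-vm-*)       echo "GCE" ;;
+      *a3|*a4|*m3)     echo "Chrome Golo" ;;
+      skiabot-*|win8*) echo "Chapel Hill lab" ;;
+      *)               echo "unknown" ;;
+    esac
+  }
+  ```
+
+  For example, `machine_location build3-a3` prints `Chrome Golo`.
+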
+- The [chrome-infra IRC channel](https://comlink.googleplex.com/chrome-infra) is
+  useful for asking questions about bots managed by the Chrome Infra team and
+  for learning about upstream failures that cause problems for us.
+
+- To log in to a Linux buildbot in GCE, use
+  `gcloud compute ssh default@<machine name>`. Choose the zone listed for the
+  [GCE VM](https://pantheon.corp.google.com/project/31977622648/compute/instances)
+  when prompted (or specify it using the `--zone` command-line flag).
+
+- To log in to a Windows buildbot in GCE, use the
+  [Chrome RDP extension](https://chrome.google.com/webstore/detail/chrome-rdp/cbkkbcmdlboombapidmoeolnmdacpkch?hl=en-US)
+  to connect to the
+  [IP address of the GCE VM](https://pantheon.corp.google.com/project/31977622648/compute/instances)
+  shown on the [host info page](https://status.skia.org/hosts) for that bot. The
+  username is chrome-bot and the password can be found on
+  [Valentine](https://valentine.corp.google.com/) as "chrome-bot (Win GCE)".
+
+- If there is a problem with a bot in the Chrome Golo or the Chrome infra GCE,
+  the best course of action is to
+  [file a bug](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure)
+  with the Chrome infra team. But if you know what you're doing:
+  - To access bots in the Chrome Golo,
+    [follow these instructions](https://chrome-internal.googlesource.com/infra/infra_internal/+/master/doc/ssh.md).
+  - Machine name ends with "a3" or "a4" -> the ssh command looks like
+    `ssh build3-a3.chrome`
+  - Machine name ends with "m3" -> the ssh command looks like
+    `ssh build5-m3.golo`
+  - For Mac and Windows bots, you will be prompted for a password, which is
+    stored on [Valentine](https://valentine.corp.google.com/) as "Chrome Golo,
+    Perf, GPU bots - chrome-bot".
+  - To access bots in the Chrome infra GCE, the command looks like
+    `gcutil --project=google.com:chromecompute ssh --ssh_user=default slave11-c3`
+    (or use the ccompute ssh script from the infra_internal repo).
+
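+  The ssh targets in the Golo examples above follow a pattern that can be
+  sketched as a hypothetical helper. It assumes every "a3"/"a4" machine is
+  reached as `<name>.chrome` and every "m3" machine as `<name>.golo`, which is
+  generalized from the two examples rather than documented:
+
+  ```shell
+  # Hypothetical helper: prints the ssh target for a Chrome Golo bot, assuming
+  # the host suffixes from the examples above apply to every machine with
+  # that name suffix. Fails for names that are not Golo machines.
+  golo_ssh_host() {
+    case "$1" in
+      *a3|*a4) echo "$1.chrome" ;;
+      *m3)     echo "$1.golo" ;;
+      *)       echo "not a Chrome Golo machine: $1" >&2; return 1 ;;
+    esac
+  }
+  ```
+
+  With this helper, `ssh "$(golo_ssh_host build3-a3)"` runs
+  `ssh build3-a3.chrome`.
+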
+- Read over the [SkiaLab documentation](../testing/skialab) for more detail on
+  dealing with device alerts.
+
+- To stop a buildslave for a device, log in to the host for that device and run
+  `cd ~/buildbot/<slave name>/build/slave; make stop`. To start it again, run
+  `TESTING_SLAVENAME=<slave name> make start` from the same directory.
+
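+  A minimal sketch of the same steps as reusable shell functions. The names
+  `slave_dir`, `stop_slave`, and `start_slave` are hypothetical, and they
+  assume the `~/buildbot/<slave name>/build/slave` layout described above:
+
+  ```shell
+  # Hypothetical helpers wrapping the documented stop/start commands.
+  # Run them on the host that the device is attached to.
+  slave_dir() {
+    echo "${HOME}/buildbot/$1/build/slave"
+  }
+
+  stop_slave() {
+    # The subshell keeps the caller's working directory unchanged.
+    (cd "$(slave_dir "$1")" && make stop)
+  }
+
+  start_slave() {
+    (cd "$(slave_dir "$1")" && TESTING_SLAVENAME="$1" make start)
+  }
+  ```
+
+  Usage: `stop_slave <slave name>`, then `start_slave <slave name>` to bring it
+  back.
+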
+- Buildslaves can be slow to come up after a reboot, but if a buildslave remains
+  disconnected, you may need to start it manually. On Mac and Linux, use
+  `ps aux | grep python` to check that neither buildbot nor gclient is running,
+  then run `~/skiabot-slave-start-on-boot.sh`.
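+
+  The manual check above could be scripted roughly as follows. This is a
+  sketch for Mac and Linux, assuming `ps aux` output and the
+  `~/skiabot-slave-start-on-boot.sh` script mentioned above; the function
+  names are hypothetical:
+
+  ```shell
+  # Prints `ps aux` lines for python processes that mention buildbot or
+  # gclient. The "[p]ython" pattern keeps grep from matching its own process.
+  slave_python_procs() {
+    ps aux | grep '[p]ython' | grep -E 'buildbot|gclient'
+  }
+
+  # Starts the buildslave only if no buildbot or gclient process is running.
+  start_slave_if_idle() {
+    if slave_python_procs; then
+      echo "buildbot or gclient is still running; not starting" >&2
+      return 1
+    fi
+    "${HOME}/skiabot-slave-start-on-boot.sh"
+  }
+  ```
+
+  Printing the matching processes first lets you see what is still running
+  before deciding whether to wait or to stop it.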