Chromium Code Reviews

Unified Diff: site/dev/sheriffing/trooper.md

Issue 1308253005: Add some useful information to trooper doc. (Closed) Base URL: https://skia.googlesource.com/skia.git@master
Patch Set: Minor edits. Created 5 years, 3 months ago
Index: site/dev/sheriffing/trooper.md
diff --git a/site/dev/sheriffing/trooper.md b/site/dev/sheriffing/trooper.md
index 4840f6376b9c7a074d81e1cb08cccb84c3fafe94..a73aaf100f640d78c1ad8daca42e604204d873d5 100644
--- a/site/dev/sheriffing/trooper.md
+++ b/site/dev/sheriffing/trooper.md
@@ -37,4 +37,78 @@ If you need to swap shifts with someone (because you are out sick or on vacation
<a name="tips"></a>
Tips for troopers
-----------------
-Add your tips here!
+
+- Make sure you are a member of
+ [MDB group chrome-skia-ninja](https://ganpati.corp.google.com/#Group_Info?name=chrome-skia-ninja@prod.google.com).
+ Valentine passwords and Chrome Golo access are based on membership in this
+ group.
+
+- These alerts generally auto-dismiss once the criteria for the alert are no
+ longer met:
+ - Monitoring alerts, including prober, collectd, and others
+ - Disconnected build slaves
+
+- These alerts generally do not auto-dismiss ([issue here](https://code.google.com/p/skia/issues/detail?id=4292)):
+ - Build slaves that failed a step
+ - Disconnected devices (these are detected as the "wait for device" step failing)
+
+- A "Failed to execute query" alert may show a different query than the one
+ currently failing; dismiss the alert to get a new alert showing the query
+ that is actually failing. (All "failed to execute query" alerts are lumped
+ into a single alert, so the query that initially triggered it may have
+ recovered while the alert stays active because another query is still
+ failing.)
+
+- Where machines are located:
+ - Machine name like "skia-vm-NNN" -> GCE
+ - Machine name ends with "a3", "a4", or "m3" -> Chrome Golo
+ - Machine name starts with "skiabot-" -> Chapel Hill lab
+ - Machine name starts with "win8" -> Chapel Hill lab (Windows machine
+ names can't be very long, so the "skiabot-shuttle-" prefix is dropped.)
+ - slave11-c3 is a Chrome infra GCE machine (not to be confused with the Skia
+ Buildbots GCE, which we refer to as simply "GCE")
+
+- The [chrome-infra IRC channel](https://comlink.googleplex.com/chrome-infra) is
+ useful for questions regarding bots managed by the Chrome Infra team and to
+ get visibility into upstream failures that cause problems for us.
+
+- To log in to a Linux buildbot in GCE, use
+ `gcloud compute ssh default@<machine name>`. Choose the zone listed for the
+ [GCE VM](https://pantheon.corp.google.com/project/31977622648/compute/instances)
+ (or specify it using the `--zone` command-line flag).
+
+- To log in to a Windows buildbot in GCE, use the
+ [Chrome RDP Extension](https://chrome.google.com/webstore/detail/chrome-rdp/cbkkbcmdlboombapidmoeolnmdacpkch?hl=en-US)
+ with the
+ [IP address of the GCE VM](https://pantheon.corp.google.com/project/31977622648/compute/instances)
+ shown on the [host info page](https://status.skia.org/hosts) for that bot. The
+ username is chrome-bot and the password can be found on
+ [Valentine](https://valentine.corp.google.com/) as "chrome-bot (Win GCE)".
+
+- If there is a problem with a bot in the Chrome Golo or Chrome infra GCE, the
+ best course of action is to
+ [file a bug](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure)
+ with the Chrome infra team. But if you know what you're doing:
+ - To access bots in the Chrome Golo,
+ [follow these instructions](https://chrome-internal.googlesource.com/infra/infra_internal/+/master/doc/ssh.md).
+ - Machine name ends with "a3" or "a4" -> ssh command looks like `ssh
+ build3-a3.chrome`
+ - Machine name ends with "m3" -> ssh command looks like `ssh build5-m3.golo`
+ - For MacOS and Windows bots, you will be prompted for a password, which is
+ stored on [Valentine](https://valentine.corp.google.com/) as "Chrome Golo,
+ Perf, GPU bots - chrome-bot".
+ - To access bots in the Chrome infra GCE, the command looks like `gcutil
+ --project=google.com:chromecompute ssh --ssh_user=default slave11-c3` (or
+ use the ccompute ssh script from the infra_internal repo).
+
+- Read over the [SkiaLab documentation](../testing/skialab) for more detail on
+ dealing with device alerts.
+
+- To stop a buildslave for a device, log in to the host for that device and run
+ `cd ~/buildbot/<slave name>/build/slave; make stop`. To start it again, run
+ `TESTING_SLAVENAME=<slave name> make start`.
+
+- Buildslaves can be slow to come up after reboot, but if the buildslave remains
+ disconnected, you may need to start it manually. On Mac and Linux, check using
+ `ps aux | grep python` that neither buildbot nor gclient is running, then run
+ `~/skiabot-slave-start-on-boot.sh`.
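
Note on the "Where machines are located" tip: the naming rules in the patch can be read as a small lookup, sketched below in Python. This is only an illustration of the rules as written, not a tool that exists in the Skia repo. The function names `locate_machine` and `golo_ssh_host` are hypothetical, "NNN" is assumed to mean digits, and the order in which the rules are checked is an assumption because the doc does not say what happens if a name matches more than one rule.

```python
import re


def locate_machine(name: str) -> str:
    """Map a bot machine name to its location, per the trooper tips.

    Rule order is an assumption; the doc lists the rules without precedence.
    """
    # slave11-c3 is called out explicitly as a Chrome infra GCE machine.
    if name == "slave11-c3":
        return "Chrome infra GCE"
    # 'skia-vm-NNN' -> Skia Buildbots GCE (assuming NNN is a run of digits).
    if re.fullmatch(r"skia-vm-\d+", name):
        return "GCE"
    if name.endswith(("a3", "a4", "m3")):
        return "Chrome Golo"
    # 'win8' names are lab machines with the 'skiabot-shuttle-' prefix dropped.
    if name.startswith(("skiabot-", "win8")):
        return "Chapel Hill lab"
    return "unknown"


def golo_ssh_host(name: str) -> str:
    """Return the ssh host for a Chrome Golo bot, following the doc's examples."""
    if name.endswith(("a3", "a4")):
        return name + ".chrome"
    if name.endswith("m3"):
        return name + ".golo"
    raise ValueError(f"{name!r} does not look like a Chrome Golo machine")


if __name__ == "__main__":
    for bot in ("skia-vm-001", "build3-a3", "build5-m3", "slave11-c3"):
        print(bot, "->", locate_machine(bot))
```

With the doc's own examples, `golo_ssh_host("build3-a3")` gives `build3-a3.chrome` and `golo_ssh_host("build5-m3")` gives `build5-m3.golo`, matching the ssh commands shown in the patch.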