site/dev/sheriffing/trooper.md - Issue 1308253005: Add some useful information to trooper doc.

Unified Diff: site/dev/sheriffing/trooper.md

Issue 1308253005: Add some useful information to trooper doc. (Closed) Base URL: https://skia.googlesource.com/skia.git@master

Patch Set: Minor edits. Created 5 years, 3 months ago

Index: site/dev/sheriffing/trooper.md

diff --git a/site/dev/sheriffing/trooper.md b/site/dev/sheriffing/trooper.md

index 4840f6376b9c7a074d81e1cb08cccb84c3fafe94..a73aaf100f640d78c1ad8daca42e604204d873d5 100644

--- a/site/dev/sheriffing/trooper.md

+++ b/site/dev/sheriffing/trooper.md

@@ -37,4 +37,78 @@ If you need to swap shifts with someone (because you are out sick or on vacation

Tips for troopers

-----------------

-Add your tips here!

+- Make sure you are a member of

+ [MDB group chrome-skia-ninja](https://ganpati.corp.google.com/#Group_Info?name=chrome-skia-ninja@prod.google.com).

+ Valentine passwords and Chrome Golo access are based on membership in this

+ group.

+- These alerts generally auto-dismiss once the criteria for the alert are no

+ longer met:

+ - Monitoring alerts, including prober, collectd, and others

+ - Disconnected build slaves

+- These alerts generally do not auto-dismiss ([issue here](https://code.google.com/p/skia/issues/detail?id=4292)):

+ - Build slaves that failed a step

+ - Disconnected devices (these are detected as the "wait for device" step failing)

+- A "Failed to execute query" alert may show a different query than the one that

+ is currently failing; dismiss the alert to get a new alert showing the query

+ that is actually failing. (All "failed to execute query" alerts are lumped

+ into a single alert, so the query that initially triggered the alert may no

+ longer be failing while the alert stays active because another query is

+ failing.)

+- Where machines are located:

+ - Machine name like "skia-vm-NNN" -> GCE

+ - Machine name ends with "a3", "a4", "m3" -> Chrome Golo

+ - Machine name starts with "skiabot-" -> Chapel Hill lab

+ - Machine name starts with "win8" -> Chapel Hill lab (Windows machine

+ names can't be very long, so the "skiabot-shuttle-" prefix is dropped.)

+ - slave11-c3 is a Chrome infra GCE machine (not to be confused with the Skia

+ Buildbots GCE, which we refer to as simply "GCE")

+- The [chrome-infra IRC channel](https://comlink.googleplex.com/chrome-infra) is

+ useful for questions regarding bots managed by the Chrome Infra team and to

+ get visibility into upstream failures that cause problems for us.

+- To log in to a Linux buildbot in GCE, use `gcloud compute ssh default@<machine

+ name>`. Choose the zone listed for the

+ [GCE VM](https://pantheon.corp.google.com/project/31977622648/compute/instances)

+ (or specify it using the `--zone` command-line flag).

+- To log in to a Windows buildbot in GCE, use

+ [Chrome RDP Extension](https://chrome.google.com/webstore/detail/chrome-rdp/cbkkbcmdlboombapidmoeolnmdacpkch?hl=en-US)

+ with the

+ [IP address of the GCE VM](https://pantheon.corp.google.com/project/31977622648/compute/instances)

+ shown on the [host info page](https://status.skia.org/hosts) for that bot. The

+ username is chrome-bot and the password can be found on

+ [Valentine](https://valentine.corp.google.com/) as "chrome-bot (Win GCE)".

+- If there is a problem with a bot in the Chrome Golo or Chrome infra GCE, the

+ best course of action is to

+ [file a bug](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure)

+ with the Chrome infra team. But if you know what you're doing:

+ - To access bots in the Chrome Golo,

+ [follow these instructions](https://chrome-internal.googlesource.com/infra/infra_internal/+/master/doc/ssh.md).

+ - Machine name ends with "a3" or "a4" -> ssh command looks like `ssh

+ build3-a3.chrome`

+ - Machine name ends with "m3" -> ssh command looks like `ssh build5-m3.golo`

+ - For MacOS and Windows bots, you will be prompted for a password, which is

+ stored on [Valentine](https://valentine.corp.google.com/) as "Chrome Golo,

+ Perf, GPU bots - chrome-bot".

+ - To access bots in the Chrome infra GCE, the command looks like `gcutil

+ --project=google.com:chromecompute ssh --ssh_user=default slave11-c3` (or

+ use the ccompute ssh script from the infra_internal repo).

+- Read over the [SkiaLab documentation](../testing/skialab) for more detail on

+ dealing with device alerts.

+- To stop a buildslave for a device, log in to the host for that device and run

+ `cd ~/buildbot/<slave name>/build/slave; make stop`. To start it again, run

+ `TESTING_SLAVENAME=<slave name> make start`.

+- Buildslaves can be slow to come up after reboot, but if the buildslave remains

+ disconnected, you may need to start it manually. On Mac and Linux, check using

+ `ps aux | grep python` that neither buildbot nor gclient is running, then run

+ `~/skiabot-slave-start-on-boot.sh`.
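
The machine-name rules in the "Where machines are located" tip, together with the Golo ssh hostname patterns, can be sketched as a small lookup. This is only an illustration and not part of the patch: the function names `machine_location` and `golo_ssh_host` are hypothetical, "NNN" is assumed to mean one or more digits, and the order in which the rules are checked is an assumption, since the doc does not say what to do if a name matches more than one rule.

```python
import re
from typing import Optional


def machine_location(name: str) -> str:
    """Guess where a bot lives from its machine name (rules taken from the tips)."""
    # The doc names slave11-c3 specifically as a Chrome infra GCE machine.
    if name == "slave11-c3":
        return "Chrome infra GCE"
    # "skia-vm-NNN" -> Skia Buildbots GCE (assumes NNN is all digits).
    if re.fullmatch(r"skia-vm-\d+", name):
        return "GCE"
    # Names ending in a3, a4, or m3 -> Chrome Golo.
    if name.endswith(("a3", "a4", "m3")):
        return "Chrome Golo"
    # "skiabot-" prefix, or the shortened "win8" prefix -> Chapel Hill lab.
    if name.startswith(("skiabot-", "win8")):
        return "Chapel Hill lab"
    return "unknown"


def golo_ssh_host(name: str) -> Optional[str]:
    """Return the ssh hostname for a Golo bot, or None if the name is not a Golo name."""
    if name.endswith(("a3", "a4")):
        return name + ".chrome"
    if name.endswith("m3"):
        return name + ".golo"
    return None


if __name__ == "__main__":
    for bot in ["skia-vm-001", "build3-a3", "build5-m3", "slave11-c3"]:
        print(bot, "->", machine_location(bot), "/", golo_ssh_host(bot))
```

A helper like this would only tell a trooper where to look first; for Golo and Chrome infra GCE bots the tips still recommend filing a bug with the Chrome infra team as the default course of action.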
