Chromium Code Reviews
Unified Diff: site/dev/testing/skialab.md

Issue 1154623006: Add documentation for SkiaLab (Closed) Base URL: https://skia.googlesource.com/skia.git@master
Patch Set: Moar fix Created 5 years, 6 months ago
Index: site/dev/testing/skialab.md
diff --git a/site/dev/testing/skialab.md b/site/dev/testing/skialab.md
new file mode 100644
index 0000000000000000000000000000000000000000..cdb2449c4fad676c790adbbe36b17424e7f02c8b
--- /dev/null
+++ b/site/dev/testing/skialab.md
@@ -0,0 +1,217 @@
+SkiaLab
+=======
+
+Overview
+--------
+
+Skia's buildbots are hosted in three places:
+
+* Google Compute Engine. This is the preferred location for bots which don't
+ need to run on physical hardware, i.e. anything that doesn't require a GPU,
+ stable performance numbers, or a specific hardware configuration. Most of our
+ compile bots live here, along with some non-GPU test bots on Linux and
+ Windows.
+* Chrome Golo. This is the preferred location for bots which require specific
+ hardware or OS configurations that are not supported by GCE. We have several
+ Mac, Linux, and Windows bots in the Golo.
+* The local SkiaLab in Chapel Hill. Anything we can't get in GCE or the Golo
+ lives here. This includes newer or uncommon GPUs and all Android, ChromeOS,
+ and iOS devices.
+
+This page covers the local SkiaLab in Chapel Hill.
+
+
+Layout
+------
+
+The SkiaLab consists of three wireframe racks which hold machines connected to
+two KVM switches. Each KVM switch has a monitor, mouse, and keyboard and is the
+primary mode of access to the lab machines. In general, the machines are on the
+same rack as the KVM switch used to access them. The switch nearest the door
+(labeled "DOOR") is connected to machines on its own rack as well as a smaller
+rack closer to the door.
+
+Each machine is labeled with its hostname and the number or letter used to
+access it on the KVM switch. Android devices are located on the rack nearest
+the interior of the office (the KVM switch is labeled "OFFICE"). They are
+labeled with their serial number and the name of the buildslave they are
+associated with. Each device connects to a host machine, either directly or
+by way of a powered USB hub.
+
+**Disclaimer: Please ONLY make changes on a lab machine as a last resort, as it
+is disruptive to the running bots and can leave the machines in a dirty state.
+If you must make changes, such as cloning a copy of Skia to run tests and debug
+failures, be sure to clean up after yourself. If a permanent change needs to be
+made on the machine (such as a driver update), please contact an infra team
+member.**
+
+
+Common Tasks
+------------
+
+### Locating the host machine for a failing bot
+
+Some failures can only be reproduced on a particular hardware
+configuration. In these cases, you may need to log in to the host machine
+where the failing bot is running in order to debug the failure.
+
+From the [Status](https://status.skia.org/) page:
+
+1. Click on the box associated with a failed build.
+2. A popup will appear with some information about the build, including the
+ builder and buildslave. Click the "Lookup" link next to "Host machine". This
+ will bring you to the [SkiaLab Hosts](https://status.skia.org/hosts) page,
+ which contains information about the machines in the lab, pre-filtered to
+ select the machine which runs the buildslave in question.
+3. The information box will display the hostname of the machine as well as the
+ KVM switch and number used to access the machine, if the machine is in the
+ SkiaLab.
+4. Walk over to the lab. While standing at the KVM switch indicated by the host
+ information page, double-tap \<ctrl\> and then press the number or letter
+ from the information page. It may be necessary to move or click the mouse to
+ wake the machine up.
+5. Log in to the machine if necessary. The password is stored in
+ [Valentine](https://valentine/).
+
+### Rebooting a problematic Android device
+
+Follow the same process as above, with some slight changes:
+
+1. On the [Status](https://status.skia.org/) page, click the box for the failed
+ build.
+2. Click the "Lookup" link for the host machine. Remember the name of the
+ buildslave which ran the build.
+3. The hosts page will display the information used to access the host machine
+ for the device as well as the serial number for the device next to the name
+ of its buildslave.
+4. Walk over to the lab and find the Android device with the serial number from
+ the hosts page. Hold the power and volume-up buttons until the device
+ reboots.
+5. Access the host machine for the device, per the above instructions. Use the
+ `which_devices.py` script to verify that the device has re-attached. From
+ the home directory:
+
+ $ python buildbot/scripts/which_devices.py
+
+
+Maintenance Tasks
+-----------------
+
+### Bringing up a new buildbot host machine
+
+This assumes that we're just adding a host machine for a new buildbot slave,
+and doesn't cover how to make changes to the buildbot code to change the
+behavior of the builder itself.
+
+1. Obtain the machine itself and place it on the racks in the lab. Connect
+ power, ethernet, and KVM cables.
+2. If we already have a disk image appropriate for this machine, follow the
+ instructions for flashing a disk image to a machine below. Otherwise, follow
+ the instructions for bringing up a new machine from scratch.
+3. Set the hostname for the machine.
+4. Add the new slave to the slaves.cfg file on the appropriate master, e.g.
+ https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.client.skia/slaves.cfg,
+ and upload the change for code review.
+5. Add an entry for the new host machine to the slave_hosts_cfg.py file in the
+ Skia infra repo: https://skia.googlesource.com/buildbot/+/master/site_config/slave_hosts_cfg.py,
+ and upload it for review.
+6. Commit the change to add the slave to the master. Once it lands, commit the
+ slave_hosts_cfg.py change immediately afterward.
+7. Restart the build master. Either ask borenet@ to do this or file a
+ [ticket](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure&labels=Infra-Labs,Restrict-View-Google,Infra-Troopers&summary=Restart%20request%20for%20[%20name%20]&comment=Please%20provide%20the%20reason%20for%20restart.%0A%0ASet%20to%20Pri-0%20if%20immediate%20restarted%20is%20required,%20otherwise%20please%20set%20to%20Pri-1%20and%20the%20restart%20will%20happen%20when%20the%20trooper%20gets%20a%20free%20moment.) for a trooper to do it.
+8. Reboot the machine and monitor the build master to ensure that it connects.
+
+
+### Bringing up a new Android bot
+
+1. Locate or add a host machine. We generally want to keep the number of
+ devices attached to each host below 5 or so. If a new host machine is
+ required, follow the above instructions for bringing up a new buildbot
+ host machine, with the exception that the slave corresponds to the Android
+ device, not the host machine itself.
+2. Ensure that the buildslave is not yet running:
+
+ $ killall python
+
+3. Connect the device to the host machine, either through a powered USB hub or
+ directly to the machine.
+4. Make sure that the device is in developer mode and that USB debugging is
+ enabled.
+5. Authorize the device for USB debugging on the host machine by checking the
+ "always allow" box on the dialog box which appears on the Android device
+ after plugging it into the host.
+6. Ensure that the device appears as "connected" when you run the
+ `which_devices.py` script:
+
+ $ python buildbot/scripts/which_devices.py
+
+7. Reboot the machine to start the buildslave.
+
+
+### Bringing up a new machine from scratch
+
+TODO(borenet): Migrate from Google Docs.
+
+OS-specific instructions are available in a
+[Google Doc](https://docs.google.com/document/d/1X7Hvsj33AlBmj-KEWfFbmdCArUJJAICLkB7ipDcxRV8/edit).
+
+
+### Flashing a disk image to a machine
+
+1. Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the
+ machine.
+2. Turn on the machine and load the boot menu. For Shuttle machines, press
+ \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and
+ press the \<option\> key at boot. Boot from the USB key. It's typically UEFI
+ and named something like "FlashBlu" or "Kanguru".
+3. At the Clonezilla menu, choose the "to RAM" option.
+4. Choose your preferred language.
+5. "Don't touch keymap".
+6. "Start Clonezilla".
+7. "device-image".
+8. "local_dev".
+9. Unplug the flash drive and plug in the external hard drive labeled "Disk
+ images." Wait for the "Attached Enclosure device" message to appear, then
+ hit \<enter\>.
+10. Select the external drive to use for /home/partimag, something like
+ "1000GB_ntfs_My_Passport".
+11. Select the bot_img directory.
+12. Hit \<enter\> to continue.
+13. "Beginner".
+14. "restoredisk".
+15. Select the image to use. Make sure that it's compatible with this machine.
+16. Choose the hard drive in the machine. It should be the only option.
+17. "y" and "y".
+18. Choose "reboot" after flashing the image to the machine.
+19. Set the hostname of the machine so that it doesn't conflict with any
+ existing machines.
+
+### Capturing a disk image
+
+1. Make sure that the machine is in a clean state: no pre-existing buildslave
+ checkouts, extra software, etc.
+2. Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the
+ machine.
+3. Turn on the machine and load the boot menu. For Shuttle machines, press
+ \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and
+ press the \<option\> key at boot. Boot from the USB key. It's typically UEFI
+ and named something like "FlashBlu" or "Kanguru".
+4. At the Clonezilla menu, choose the "to RAM" option.
+5. Choose your preferred language.
+6. "Don't touch keymap".
+7. "Start Clonezilla".
+8. "device-image".
+9. "local_dev".
+10. Unplug the flash drive and plug in the external hard drive labeled "Disk
+ images." Wait for the "Attached Enclosure device" message to appear, then
+ hit \<enter\>.
+11. Select the external drive to use for /home/partimag, something like
+ "1000GB_ntfs_My_Passport".
+12. Select the bot_img directory.
+13. "Beginner".
+14. "savedisk".
+15. Choose a name for the disk image. The convention is:
+ `skiabot-<hardware type>-<OS>-<disk image revision #>`
+16. Choose the hard drive in the machine. It should be the only option.
+17. "y".
+18. Choose "reboot" or "shut down" when finished.
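
Note on the disk image naming convention in "Capturing a disk image": the pattern `skiabot-<hardware type>-<OS>-<disk image revision #>` could be built and checked with a small helper like the sketch below. This is not part of the patch. It assumes that the hardware type and OS fields contain no hyphens and that the revision is a plain non-negative integer; the document specifies neither. The function names and the example values (`shuttle`, `ubuntu12`) are hypothetical.

```python
import re

# Assumption: each field uses only letters, digits, underscores, and dots,
# so the hyphen-separated name can be split unambiguously.
_FIELD = re.compile(r"[A-Za-z0-9_.]+")
_NAME = re.compile(r"skiabot-([A-Za-z0-9_.]+)-([A-Za-z0-9_.]+)-([0-9]+)")


def disk_image_name(hardware_type, os_name, revision):
    """Build a name of the form skiabot-<hardware type>-<OS>-<revision>."""
    for label, value in (("hardware type", hardware_type), ("OS", os_name)):
        if not _FIELD.fullmatch(value):
            raise ValueError("invalid %s: %r" % (label, value))
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(revision, bool) or not isinstance(revision, int) or revision < 0:
        raise ValueError("revision must be a non-negative integer: %r" % (revision,))
    return "skiabot-%s-%s-%d" % (hardware_type, os_name, revision)


def parse_disk_image_name(name):
    """Return (hardware_type, os_name, revision); raise ValueError if malformed."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError("not a skiabot disk image name: %r" % (name,))
    return match.group(1), match.group(2), int(match.group(3))


if __name__ == "__main__":
    # Illustrative values only; real hardware and OS labels may differ.
    print(disk_image_name("shuttle", "ubuntu12", 3))
```

Parsing an existing name before restoring it is one way to confirm that the image was named for the intended hardware and OS, which is the compatibility check called for in "Flashing a disk image to a machine".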