Index: site/dev/testing/skialab.md |
diff --git a/site/dev/testing/skialab.md b/site/dev/testing/skialab.md |
new file mode 100644 |
index 0000000000000000000000000000000000000000..cdb2449c4fad676c790adbbe36b17424e7f02c8b |
--- /dev/null |
+++ b/site/dev/testing/skialab.md |
@@ -0,0 +1,217 @@ |
+SkiaLab |
+======= |
+ |
+Overview |
+-------- |
+ |
+Skia's buildbots are hosted in three places: |
+ |
+* Google Compute Engine. This is the preferred location for bots which don't |
+ need to run on physical hardware, ie. anything that doesn't require a GPU, |
+ stable performance numbers, or a specific hardware configuration. Most of our |
+ compile bots live here, along with some non-GPU test bots on Linux and |
+ Windows. |
+* Chrome Golo. This is the preferred location for bots which require specific |
+ hardware or OS configurations that are not supported by GCE. We have several |
+ Mac, Linux, and Windows bots in the Golo. |
+* The local SkiaLab in Chapel Hill. Anything we can't get in GCE or the Golo |
+ lives here. This includes newer or uncommon GPUs and all Android, ChromeOS, |
+ and iOS devices. |
+ |
+This page covers the local SkiaLab in Chapel Hill. |
+ |
+ |
+Layout |
+------ |
+ |
+The SkiaLab consists of three wireframe racks which hold machines connected to |
+two KVM switches. Each KVM switch has a monitor, mouse, and keyboard and is the |
+primary mode of access to the lab machines. In general, the machines are on the |
+same rack as the KVM switch used to access them. The switch nearest the door |
+(labeled "DOOR"), is connected to machines on its own rack as well as a smaller |
+rack closer to the door. |
+ |
+Each machine is labeled with its hostname and the number or letter used to |
+access it on the KVM switch. Android devices are located on the rack nearest |
+the interior of the office (the KVM switch is labeled "OFFICE"). They are |
+labeled with their serial number and the name of the buildslave they are |
+associated with. Each device connects to a host machine, either directly or |
+by way of a powered USB hub. |
+ |
+**Disclaimer: Please ONLY make changes on a lab machine as a last resort, as it |
+is disruptive to the running bots and can leave the machines in a dirty state. |
+If you must make changes, such as cloning a copy of Skia to run tests and debug |
+failures, be sure to clean up after yourself. If a permanent change needs to be |
+made on the machine (such as a driver update), please contact an infra team |
+member.** |
+ |
+ |
+Common Tasks |
+------------ |
+ |
+### Locating the host machine for a failing bot |
+ |
+Sometimes failures can only be reproduced on a particular hardware |
+configuration. In these cases, it is sometimes necessary to log into the host |
+machine where a failing bot is running in order to debug the failure. |
+ |
+From the [Status](https://status.skia.org/) page: |
+ |
+1. Click on the box associated with a failed build. |
+2. A popup will appear with some information about the build, including the |
+ builder and buildslave. Click the "Lookup" link next to "Host machine". This |
+ will bring you to the [SkiaLab Hosts](https://status.skia.org/hosts) page, |
+ which contains information about the machines in the lab, pre-filtered to |
+ select the machine which runs the buildslave in question. |
+3. The information box will display the hostname of the machine as well as the |
+ KVM switch and number used to access the machine, if the machine is in the |
+ SkiaLab. |
+4. Walk over to the lab. While standing at the KVM switch indicated by the host |
+ information page, double tab <ctrl> and then press the number or letter from |
+ the information page. It may be necessary to move or click the mouse to wake |
+ the machine up. |
+5. Log in to the machine if necessary. The password is stored in |
+ [Valentine](https://valentine/). |
+ |
+### Rebooting a problematic Android device |
+ |
+Follow the same process as above, with some slight changes: |
+ |
+1. On the [Status](https://status.skia.org/) page, click the box for the failed |
+ build. |
+2. Click the "Lookup" link for the host machine. Remember the name of the |
+ buildslave which ran the build. |
+3. The hosts page will display the information used to access the host machine |
+ for the device as well as the serial number for the device next to the name |
+ of its buildsave. |
+4. Walk over to the lab and find the Android device with the serial number from |
+ the hosts page. Hold the power and volume-up buttons until the device |
+ reboots. |
+5. Access the host machine for the device, per the above instructions. Use the |
+ `which_devices.py` script to verify that the device has re-attached. From |
+ the home directory: |
+ |
+ $ python buildbot/scripts/which_devices.py |
+ |
+ |
+Maintenance Tasks |
+----------------- |
+ |
+### Bringing up a new buildbot host machine |
+ |
+This assumes that we're just adding a host machine for a new buildbot slave, |
+and doesn't cover how to make changes to the buildbot code to change the |
+behavior of the builder itself. |
+ |
+1. Obtain the machine itself and place it on the racks in the lab. Connect |
+ power, ethernet, and KVM cables. |
+2. If we already have a disk image appropriate for this machine, follow the |
+ instructions for flashing a disk image to a machine below. Otherwise, follow |
+ the instructions for bringing up a new machine from scratch. |
+3. Set the hostname for the machine. |
+4. Add the new slave to the slaves.cfg file on the appropriate master, eg. |
+ https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.client.skia/slaves.cfg, |
+ and upload the change for code review. |
+5. Add an entry for the new host machine to the slave_hosts_cfg.py file in the |
+ Skia infra repo: https://skia.googlesource.com/buildbot/+/master/site_config/slave_hosts_cfg.py, |
+ and upload it for review. |
+6. Commit the change to add the slave to the master. Once it lands, commit the |
+ slave_hosts_cfg.py change immediately afterward. |
+7. Restart the build master. Either ask borenet@ to do this or file a |
+ [ticket](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure&labels=Infra-Labs,Restrict-View-Google,Infra-Troopers&summary=Restart%20request%20for%20[%20name%20]&comment=Please%20provide%20the%20reason%20for%20restart.%0A%0ASet%20to%20Pri-0%20if%20immediate%20restarted%20is%20required,%20otherwise%20please%20set%20to%20Pri-1%20and%20the%20restart%20will%20happen%20when%20the%20trooper%20gets%20a%20free%20moment.) for a trooper to do it. |
+8. Reboot the machine and monitor the build master to ensure that it connects. |
+ |
+ |
+### Bringing up a new Android bot |
+ |
+1. Locate or add a host machine. We generally want to keep the number of |
+ devices attached to each host below 5 or so. If a new host machine is |
+ required, follow the above instructions for bringing up a new buildbot |
+ host machine, with the exception that the slave corresponds to the Android |
+ device, not the host machine itself. |
+2. Ensure that the buildslave is not yet running: |
+ |
+ $ killall python |
+ |
+3. Connect the device to the host machine, either through a powered USB hub or |
+ directly to the machine. |
+4. Make sure that the device is in developer mode and that USB debugging is |
+ enabled. |
+5. Authorize the device for USB debugging on the host machine by checking the |
+ "always allow" box on dialog box which appears on the Android device after |
+ plugging it into the host. |
+6. Ensure that the device appears as "connected" when you run the |
+ `which_devices.py` script: |
+ |
+ $ python buildbot/scripts/which_devices.py |
+ |
+7. Reboot the machine to start the buildslave. |
+ |
+ |
+### Bringing up a new machine from scratch |
+ |
+TODO(borenet): Migrate from Google Docs. |
+ |
+OS-specific instructions are available in a |
+[Google Doc](https://docs.google.com/document/d/1X7Hvsj33AlBmj-KEWfFbmdCArUJJAICLkB7ipDcxRV8/edit) |
+ |
+ |
+### Flashing a disk image to a machine |
+ |
+1. Find the USB key labeled, "Clonezilla" in the SkiaLab and insert it into the |
+ machine. |
+2. Turn on the machine and load the boot menu. For Shuttle machines, press |
+ \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and |
+ press the \<option\> key at boot. Boot from the USB key. It's typically UEFI |
+ and named something like "FlashBlu" or "Kanguru". |
+3. At the Clonezilla menu, choose the "to RAM" option. |
+4. Choose your preferred language. |
+5. "Don't touch keymap". |
+6. "Start Clonezilla". |
+7. "device-image". |
+8. "local_dev". |
+9. Unplug the flash drive and plug in the external hard drive labeled, "Disk |
+ images." Wait for the "Attached Enclosure device" message to appear, then |
+ hit \<enter\>. |
+10. Select the external drive to use for /home/partimag, something like, |
+ "1000GB_ntfs_My_Passport". |
+11. Select the bot_img directory. |
+12. Hit \<enter\> to continue. |
+13. "Beginner" |
+14. "restoredisk" |
+15. Select the image to use. Make sure that it's compatible with this machine. |
+16. Choose the hard drive in the machine. It should be the only option. |
+17. "y" and "y" |
+18. Choose "reboot" after flashing the image to the machine. |
+19. Set the hostname of the machine so that it doesn't conflict with any |
+ existing machines. |
+ |
+### Capturing a disk image |
+ |
+1. Make sure that the machine is in a clean state: no pre-existing buildslave |
+ checkouts, extra software, etc. |
+2. Find the USB key labeled, "Clonezilla" in the SkiaLab and insert it into the |
+ machine. |
+3. Turn on the machine and load the boot menu. For Shuttle machines, press |
+ \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and |
+ press the \<option\> key at boot. Boot from the USB key. It's typically UEFI |
+ and named something like "FlashBlu" or "Kanguru". |
+4. At the Clonezilla menu, choose the "to RAM" option. |
+5. Choose your preferred language. |
+6. "Don't touch keymap". |
+7. "Start Clonezilla". |
+8. "device-image". |
+9. "local_dev" |
+10. Unplug the flash drive and plug in the external hard drive labeled, "Disk |
+ images." Wait for the "Attached Enclosure device" message to appear, then |
+ hit \<enter\>. |
+11. Select the external drive to use for /home/partimag, something like, |
+ "1000GB_ntfs_My_Passport". |
+12. Select the bot_img directory. |
+13. "Beginner" |
+14. "savedisk" |
+15. Choose a name for the disk image. The convention is: |
+ `skiabot-<hardware type>-<OS>-<disk image revision #>` |
+12. Choose the hard drive in the machine. It should be the only option. |
+13. "y" |
+14. Choose "reboot" or "shut down" when finished. |