OLD | NEW |
(Empty) | |
| 1 SkiaLab |
| 2 ======= |
| 3 |
| 4 Overview |
| 5 -------- |
| 6 |
| 7 Skia's buildbots are hosted in three places: |
| 8 |
| 9 * Google Compute Engine. This is the preferred location for bots which don't |
| 10 need to run on physical hardware, i.e. anything that doesn't require a GPU, |
| 11 stable performance numbers, or a specific hardware configuration. Most of our |
| 12 compile bots live here, along with some non-GPU test bots on Linux and |
| 13 Windows. |
| 14 * Chrome Golo. This is the preferred location for bots which require specific |
| 15 hardware or OS configurations that are not supported by GCE. We have several |
| 16 Mac, Linux, and Windows bots in the Golo. |
| 17 * The local SkiaLab in Chapel Hill. Anything we can't get in GCE or the Golo |
| 18 lives here. This includes newer or uncommon GPUs and all Android, ChromeOS, |
| 19 and iOS devices. |
| 20 |
| 21 This page covers the local SkiaLab in Chapel Hill. |
| 22 |
| 23 |
| 24 Layout |
| 25 ------ |
| 26 |
| 27 The SkiaLab consists of three wireframe racks which hold machines connected to |
| 28 two KVM switches. Each KVM switch has a monitor, mouse, and keyboard and is the |
| 29 primary mode of access to the lab machines. In general, the machines are on the |
| 30 same rack as the KVM switch used to access them. The switch nearest the door |
| 31 (labeled "DOOR") is connected to machines on its own rack as well as a smaller |
| 32 rack closer to the door. |
| 33 |
| 34 Each machine is labeled with its hostname and the number or letter used to |
| 35 access it on the KVM switch. Android devices are located on the rack nearest |
| 36 the interior of the office (the KVM switch is labeled "OFFICE"). They are |
| 37 labeled with their serial number and the name of the buildslave they are |
| 38 associated with. Each device connects to a host machine, either directly or |
| 39 by way of a powered USB hub. |
| 40 |
| 41 **Disclaimer: Please ONLY make changes on a lab machine as a last resort, as it |
| 42 is disruptive to the running bots and can leave the machines in a dirty state. |
| 43 If you must make changes, such as cloning a copy of Skia to run tests and debug |
| 44 failures, be sure to clean up after yourself. If a permanent change needs to be |
| 45 made on the machine (such as a driver update), please contact an infra team |
| 46 member.** |
| 47 |
| 48 |
| 49 Common Tasks |
| 50 ------------ |
| 51 |
| 52 ### Locating the host machine for a failing bot |
| 53 |
| 54 Some failures can only be reproduced on a particular hardware configuration. |
| 55 In these cases, it may be necessary to log into the host machine where the |
| 56 failing bot is running in order to debug the failure. |
| 57 |
| 58 From the [Status](https://status.skia.org/) page: |
| 59 |
| 60 1. Click on the box associated with a failed build. |
| 61 2. A popup will appear with some information about the build, including the |
| 62 builder and buildslave. Click the "Lookup" link next to "Host machine". This |
| 63 will bring you to the [SkiaLab Hosts](https://status.skia.org/hosts) page, |
| 64 which contains information about the machines in the lab, pre-filtered to |
| 65 select the machine which runs the buildslave in question. |
| 66 3. The information box will display the hostname of the machine as well as the |
| 67 KVM switch and number used to access the machine, if the machine is in the |
| 68 SkiaLab. |
| 69 4. Walk over to the lab. While standing at the KVM switch indicated by the host |
| 70 information page, double-tap <ctrl> and then press the number or letter from |
| 71 the information page. It may be necessary to move or click the mouse to wake |
| 72 the machine up. |
| 73 5. Log in to the machine if necessary. The password is stored in |
| 74 [Valentine](https://valentine/). |
| 75 |
| 76 ### Rebooting a problematic Android device |
| 77 |
| 78 Follow the same process as above, with some slight changes: |
| 79 |
| 80 1. On the [Status](https://status.skia.org/) page, click the box for the failed |
| 81 build. |
| 82 2. Click the "Lookup" link for the host machine. Remember the name of the |
| 83 buildslave which ran the build. |
| 84 3. The hosts page will display the information used to access the host machine |
| 85 for the device as well as the serial number for the device next to the name |
| 86 of its buildslave. |
| 87 4. Walk over to the lab and find the Android device with the serial number from |
| 88 the hosts page. Hold the power and volume-up buttons until the device |
| 89 reboots. |
| 90 5. Access the host machine for the device, per the above instructions. Use the |
| 91 `which_devices.py` script to verify that the device has re-attached. From |
| 92 the home directory: |
| 93 |
| 94 $ python buildbot/scripts/which_devices.py |
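
If `which_devices.py` is unavailable, a rough equivalent check can be sketched around `adb devices`. This is not the implementation of `which_devices.py`; it assumes only that `adb` is on the PATH of the host machine, and the function names are hypothetical.

```python
import subprocess


def parse_adb_devices(output):
    """Parse the text printed by `adb devices` into {serial: state}."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and adb daemon startup messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            # First column is the serial number, second is the state
            # (for example "device", "offline", or "unauthorized").
            devices[parts[0]] = parts[1]
    return devices


def attached_devices():
    """Run `adb devices` and return the parsed result (assumes adb is on PATH)."""
    output = subprocess.check_output(["adb", "devices"]).decode("utf-8")
    return parse_adb_devices(output)
```

A device that has re-attached should show up under the serial number from the hosts page with the state `device`.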
| 95 |
| 96 |
| 97 Maintenance Tasks |
| 98 ----------------- |
| 99 |
| 100 ### Bringing up a new buildbot host machine |
| 101 |
| 102 This assumes that we're just adding a host machine for a new buildbot slave, |
| 103 and doesn't cover how to make changes to the buildbot code to change the |
| 104 behavior of the builder itself. |
| 105 |
| 106 1. Obtain the machine itself and place it on the racks in the lab. Connect |
| 107 power, ethernet, and KVM cables. |
| 108 2. If we already have a disk image appropriate for this machine, follow the |
| 109 instructions for flashing a disk image to a machine below. Otherwise, follow |
| 110 the instructions for bringing up a new machine from scratch. |
| 111 3. Set the hostname for the machine. |
| 112 4. Add the new slave to the slaves.cfg file on the appropriate master, e.g. |
| 113 https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.client.skia/slaves.cfg, |
| 114 and upload the change for code review. |
| 115 5. Add an entry for the new host machine to the slave_hosts_cfg.py file in the |
| 116 Skia infra repo: https://skia.googlesource.com/buildbot/+/master/site_config/slave_hosts_cfg.py, |
| 117 and upload it for review. |
| 118 6. Commit the change to add the slave to the master. Once it lands, commit the |
| 119 slave_hosts_cfg.py change immediately afterward. |
| 120 7. Restart the build master. Either ask borenet@ to do this or file a |
| 121 [ticket](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure&labels=Infra-Labs,Restrict-View-Google,Infra-Troopers&summary=Restart%20request%20for%20[%20name%20]&comment=Please%20provide%20the%20reason%20for%20restart.%0A%0ASet%20to%20Pri-0%20if%20immediate%20restarted%20is%20required,%20otherwise%20please%20set%20to%20Pri-1%20and%20the%20restart%20will%20happen%20when%20the%20trooper%20gets%20a%20free%20moment.) for a trooper to do it. |
| 122 8. Reboot the machine and monitor the build master to ensure that it connects. |
| 123 |
| 124 |
| 125 ### Bringing up a new Android bot |
| 126 |
| 127 1. Locate or add a host machine. We generally want to keep the number of |
| 128 devices attached to each host below 5 or so. If a new host machine is |
| 129 required, follow the above instructions for bringing up a new buildbot |
| 130 host machine, with the exception that the slave corresponds to the Android |
| 131 device, not the host machine itself. |
| 132 2. Ensure that the buildslave is not yet running: |
| 133 |
| 134 $ killall python |
| 135 |
| 136 3. Connect the device to the host machine, either through a powered USB hub or |
| 137 directly to the machine. |
| 138 4. Make sure that the device is in developer mode and that USB debugging is |
| 139 enabled. |
| 140 5. Authorize the device for USB debugging on the host machine by checking the |
| 141 "always allow" box on the dialog box which appears on the Android device after |
| 142 plugging it into the host. |
| 143 6. Ensure that the device appears as "connected" when you run the |
| 144 `which_devices.py` script: |
| 145 |
| 146 $ python buildbot/scripts/which_devices.py |
| 147 |
| 148 7. Reboot the machine to start the buildslave. |
| 149 |
| 150 |
| 151 ### Bringing up a new machine from scratch |
| 152 |
| 153 TODO(borenet): Migrate from Google Docs. |
| 154 |
| 155 OS-specific instructions are available in a |
| 156 [Google Doc](https://docs.google.com/document/d/1X7Hvsj33AlBmj-KEWfFbmdCArUJJAICLkB7ipDcxRV8/edit). |
| 157 |
| 158 |
| 159 ### Flashing a disk image to a machine |
| 160 |
| 161 1. Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the |
| 162 machine. |
| 163 2. Turn on the machine and load the boot menu. For Shuttle machines, press |
| 164 \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and |
| 165 press the \<option\> key at boot. Boot from the USB key. It's typically UEFI |
| 166 and named something like "FlashBlu" or "Kanguru". |
| 167 3. At the Clonezilla menu, choose the "to RAM" option. |
| 168 4. Choose your preferred language. |
| 169 5. "Don't touch keymap". |
| 170 6. "Start Clonezilla". |
| 171 7. "device-image". |
| 172 8. "local_dev". |
| 173 9. Unplug the flash drive and plug in the external hard drive labeled "Disk |
| 174 images." Wait for the "Attached Enclosure device" message to appear, then |
| 175 hit \<enter\>. |
| 176 10. Select the external drive to use for /home/partimag, something like |
| 177 "1000GB_ntfs_My_Passport". |
| 178 11. Select the bot_img directory. |
| 179 12. Hit \<enter\> to continue. |
| 180 13. "Beginner" |
| 181 14. "restoredisk" |
| 182 15. Select the image to use. Make sure that it's compatible with this machine. |
| 183 16. Choose the hard drive in the machine. It should be the only option. |
| 184 17. "y" and "y" |
| 185 18. Choose "reboot" after flashing the image to the machine. |
| 186 19. Set the hostname of the machine so that it doesn't conflict with any |
| 187 existing machines. |
| 188 |
| 189 ### Capturing a disk image |
| 190 |
| 191 1. Make sure that the machine is in a clean state: no pre-existing buildslave |
| 192 checkouts, extra software, etc. |
| 193 2. Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the |
| 194 machine. |
| 195 3. Turn on the machine and load the boot menu. For Shuttle machines, press |
| 196 \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and |
| 197 press the \<option\> key at boot. Boot from the USB key. It's typically UEFI |
| 198 and named something like "FlashBlu" or "Kanguru". |
| 199 4. At the Clonezilla menu, choose the "to RAM" option. |
| 200 5. Choose your preferred language. |
| 201 6. "Don't touch keymap". |
| 202 7. "Start Clonezilla". |
| 203 8. "device-image". |
| 204 9. "local_dev" |
| 205 10. Unplug the flash drive and plug in the external hard drive labeled "Disk |
| 206 images." Wait for the "Attached Enclosure device" message to appear, then |
| 207 hit \<enter\>. |
| 208 11. Select the external drive to use for /home/partimag, something like |
| 209 "1000GB_ntfs_My_Passport". |
| 210 12. Select the bot_img directory. |
| 211 13. "Beginner" |
| 212 14. "savedisk" |
| 213 15. Choose a name for the disk image. The convention is: |
| 214 `skiabot-<hardware type>-<OS>-<disk image revision #>` |
| 215 16. Choose the hard drive in the machine. It should be the only option. |
| 216 17. "y" |
| 217 18. Choose "reboot" or "shut down" when finished. |
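
The naming convention for disk images can be sketched as a small helper. This is only an illustration: the function name and the example values are hypothetical, and it assumes that the hardware type and OS fields contain no hyphens and that the revision is an integer.

```python
import re

# skiabot-<hardware type>-<OS>-<disk image revision #>
# Assumption: hardware type and OS contain no hyphens; revision is an integer.
IMAGE_NAME_RE = re.compile(
    r"^skiabot-(?P<hardware>[^-]+)-(?P<os>[^-]+)-(?P<revision>\d+)$")


def make_image_name(hardware, os_name, revision):
    """Build a disk image name and check that it matches the convention."""
    name = "skiabot-%s-%s-%d" % (hardware, os_name, revision)
    if not IMAGE_NAME_RE.match(name):
        raise ValueError("name does not match the convention: %r" % name)
    return name


# Hypothetical example values.
print(make_image_name("shuttle", "ubuntu12", 3))
```

Rejecting hyphens inside a field keeps the name unambiguous to split later; relax the pattern if existing images in the bot_img directory use a different scheme.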