|
|
Created:
7 years, 2 months ago by navabi Modified:
7 years, 1 month ago CC:
chromium-reviews, craigdh+watch_chromium.org, bulach+watch_chromium.org, yfriedman+watch_chromium.org, ilevy-cc_chromium.org, klundberg+watch_chromium.org, frankf+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAdd sleep after restarting usb ports and before checking for devices.
Need to wait a couple seconds after restarting usb before doing 'adb devices',
or the devices do not show up.
E.g. https://chromegw.corp.google.com/i/clank/builders/s3-sharded-official-perf-
noflash-clankium/builds/3572/steps/device_status_check/logs/stdio
BUG=299891
TBR=ilevy@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=231310
Patch Set 1 #
Total comments: 7
Patch Set 2 : Poll for missing devices. #Patch Set 3 : Rebase. #Patch Set 4 : Rebase (KillAdb before RestartUsb). #Messages
Total messages: 23 (0 generated)
https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) I think we want adb wait-for-device instead of time.sleep.
lgtm but wait for tonyg https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) On 2013/10/24 01:10:59, tonyg wrote: > I think we want adb wait-for-device instead of time.sleep. I think this actually does require sleep in worst case. wait-for-device will return after a single device comes back, I believe. This is kind of a hack (more efficient would be to poll adb devices until all devices were back) but it's OK w/ me.
On 2013/10/24 01:20:36, Isaac (use chromium) wrote: > lgtm but wait for tonyg > > https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... > File build/android/buildbot/bb_device_status_check.py (right): > > https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... > build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) > On 2013/10/24 01:10:59, tonyg wrote: > > I think we want adb wait-for-device instead of time.sleep. > > I think this actually does require sleep in worst case. wait-for-device will > return after a single device comes back, I believe. This is kind of a hack > (more efficient would be to poll adb devices until all devices were back) but > it's OK w/ me. I always forget to consider the multi-device world. Could we poll adb devices instead then? I'm really worried that the sleep is just a ticking time bomb of flakiness.
https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) wait-for-device works for particular devices (using -s). Probably the cleanest option is to add a method to android_commands.py and catch the timeout exception here. On 2013/10/24 01:20:37, Isaac (use chromium) wrote: > On 2013/10/24 01:10:59, tonyg wrote: > > I think we want adb wait-for-device instead of time.sleep. > > I think this actually does require sleep in worst case. wait-for-device will > return after a single device comes back, I believe. This is kind of a hack > (more efficient would be to poll adb devices until all devices were back) but > it's OK w/ me.
https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) I talked to Frank about this, and I think sleep is the best solution here. As Isaac pointed out there are multiple devices to wait for. Some devices may come back and some may not, so we don't want to poll on all 'expected' devices. Say one device on the bot has turned off. Restarting USB doesn't help in that case and we will poll forever waiting for it to come back. We can sleep a little bit longer to make sure the device has a chance to come back. Other than that, I'm not sure what a polling solution would look like.
On 2013/10/24 02:42:34, navabi wrote: > https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... > File build/android/buildbot/bb_device_status_check.py (right): > > https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... > build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) > I talked to Frank about this, and I think sleep is the best solution here. As > Isaac pointed out there are multiple devices to wait for. > > Some devices may come back and some may not, so we don't want to poll on all > 'expected' devices. Say one device on the bot has turned off. Restarting USB > doesn't help in that case and we will poll forever waiting for it to come back. > > We can sleep a little bit longer to make sure the device has a chance to come > back. Other than that, I'm not sure what a polling solution would look like. What if we poll on expected devices with a timeout? Say 5s.
> > What if we poll on expected devices with a timeout? Say 5s. We could do that, but I'm not sure even the little bit of extra logic is worth it. I.e. Is polling on expected devices with a 5s timeout preferred to just sleeping for 5s.
On 2013/10/24 04:09:05, navabi wrote: > > > > What if we poll on expected devices with a timeout? Say 5s. > > We could do that, but I'm not sure even the little bit of extra logic is worth > it. > I.e. Is polling on expected devices with a 5s timeout preferred to just sleeping > for 5s. The only tradeoff there is cycle time, not reliability, so I guess if you raise the sleep then this lgtm. I still feel a little bit dirty saying that because sleeping is practically never the right answer.
https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) On 2013/10/24 02:42:35, navabi wrote: > I talked to Frank about this, and I think sleep is the best solution here. As > Isaac pointed out there are multiple devices to wait for. > > Some devices may come back and some may not, so we don't want to poll on all > 'expected' devices. Say one device on the bot has turned off. Restarting USB > doesn't help in that case and we will poll forever waiting for it to come back. > > We can sleep a little bit longer to make sure the device has a chance to come > back. Other than that, I'm not sure what a polling solution would look like. FYI, one issue I saw was that there was a lingering adb process that would not recover after the RestartUSB()... I have this ready for your comments :) https://codereview.chromium.org/39833003/
https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) On 2013/10/24 13:01:14, bulach wrote: > On 2013/10/24 02:42:35, navabi wrote: > > I talked to Frank about this, and I think sleep is the best solution here. As > > Isaac pointed out there are multiple devices to wait for. > > > > Some devices may come back and some may not, so we don't want to poll on all > > 'expected' devices. Say one device on the bot has turned off. Restarting USB > > doesn't help in that case and we will poll forever waiting for it to come > back. > > > > We can sleep a little bit longer to make sure the device has a chance to come > > back. Other than that, I'm not sure what a polling solution would look like. > > > > FYI, one issue I saw was that there was a lingering adb process that would not > recover after the RestartUSB()... > I have this ready for your comments :) > https://codereview.chromium.org/39833003/ btw, dominik just tested and apparently 1s is not enough but 2 is.. :) so perhaps we could poll in 1s interval up to say, 5s, and stop as soon as _anything_ appears in GetAttachedDevices() ?
https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) I don't think we want to stop as soon as anything appears. That would mean we would not wait for devices to come back if the bot already had a device attached. I'm fine polling for the expected devices, and waiting up to 5s. I'm a little worried that we will end up waiting 5s most of the time because there will be at least 1 missing device, but as long as all of them come back we will save about 3s per build vs. just sleeping 5s.
On 2013/10/24 18:56:23, navabi wrote: > https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... > File build/android/buildbot/bb_device_status_check.py (right): > > https://codereview.chromium.org/38483002/diff/1/build/android/buildbot/bb_dev... > build/android/buildbot/bb_device_status_check.py:264: time.sleep(2) > I don't think we want to stop as soon as anything appears. That would mean we > would not wait for devices to come back if the bot already had a device > attached. > > I'm fine polling for the expected devices, and waiting up to 5s. I'm a little > worried that we will end up waiting 5s most of the time because there will be at > least 1 missing device, but as long as all of them come back we will save about > 3s per build vs. just sleeping 5s. PTAL.
lgtm, I'm CQing on your behalf to get the downstream bots going.. thanks!!
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/38483002/100001
Retried try job too often on chromium_presubmit for step(s) presubmit http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=chromium_p...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/38483002/100001
Failed to apply patch for build/android/buildbot/bb_device_status_check.py: While running patch -p1 --forward --force --no-backup-if-mismatch; patching file build/android/buildbot/bb_device_status_check.py Hunk #1 FAILED at 8. Hunk #2 succeeded at 111 (offset 2 lines). Hunk #3 succeeded at 147 (offset 2 lines). Hunk #4 succeeded at 156 (offset 2 lines). Hunk #5 FAILED at 266. 2 out of 5 hunks FAILED -- saving rejects to file build/android/buildbot/bb_device_status_check.py.rej Patch: build/android/buildbot/bb_device_status_check.py Index: build/android/buildbot/bb_device_status_check.py diff --git a/build/android/buildbot/bb_device_status_check.py b/build/android/buildbot/bb_device_status_check.py index e279b5095c32ee9b33f1f17d0927139d602f1cb9..bda95c7312cfe5258b8ad260f2d6a37f45ef1e1c 100755 --- a/build/android/buildbot/bb_device_status_check.py +++ b/build/android/buildbot/bb_device_status_check.py @@ -8,10 +8,11 @@ import logging import optparse import os +import re import smtplib import subprocess import sys -import re +import time import urllib import bb_annotations @@ -109,6 +110,26 @@ def DeviceInfo(serial, options): return device_type, device_build, battery_level, full_report, errors, True +def GetLastDevices(out_dir): + """Returns a list of devices that have been seen on the bot. + + Args: + options: out_dir parameter of options argument is used as the base + directory to load and update the cache file. + + Returns: List of device serial numbers that were on the bot. + """ + devices_path = os.path.join(out_dir, '.last_devices') + devices = [] + try: + with open(devices_path) as f: + devices = f.read().splitlines() + except IOError: + # Ignore error, file might not exist + pass + return devices + + def CheckForMissingDevices(options, adb_online_devs): """Uses file of previous online devices to detect broken phones. @@ -125,17 +146,6 @@ def CheckForMissingDevices(options, adb_online_devs): out_dir = os.path.abspath(options.out_dir) - def ReadDeviceList(file_name): - devices_path = os.path.join(out_dir, file_name) - devices = [] - try: - with open(devices_path) as f: - devices = f.read().splitlines() - except IOError: - # Ignore error, file might not exist - pass - return devices - def WriteDeviceList(file_name, device_list): path = os.path.join(out_dir, file_name) if not os.path.exists(out_dir): @@ -145,7 +155,7 @@ def CheckForMissingDevices(options, adb_online_devs): f.write('\n'.join(set(device_list))) last_devices_path = os.path.join(out_dir, '.last_devices') - last_devices = ReadDeviceList('.last_devices') + last_devices = GetLastDevices(out_dir) missing_devs = list(set(last_devices) - set(adb_online_devs)) all_known_devices = list(set(adb_online_devs) | set(last_devices)) @@ -257,9 +267,19 @@ def main(): parser.error('Unknown options %s' % args) if options.restart_usb: - rc = RestartUsb() - if rc: - return 1 + expected_devices = GetLastDevices(os.path.abspath(options.out_dir)) + devices = android_commands.GetAttachedDevices() + # Only restart usb if devices are missing + if set(expected_devices) != set(devices): + if RestartUsb(): + return 1 + retries = 5 + while retries: + time.sleep(1) + devices = android_commands.GetAttachedDevices() + if set(expected_devices) == set(devices): + break + retries -= 1 devices = android_commands.GetAttachedDevices() # TODO(navabi): Test to make sure this fails and then fix call
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/38483002/250001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/38483002/250001
Step "update" is always a major failure. Look at the try server FAQ for more details. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win7_aura&...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/38483002/250001
Message was sent while issue was closed.
Change committed as 231310 |