|
|
Created:
7 years, 2 months ago by navabi Modified:
7 years, 2 months ago CC:
chromium-reviews, craigdh+watch_chromium.org, bulach+watch_chromium.org, yfriedman+watch_chromium.org, ilevy-cc_chromium.org, klundberg+watch_chromium.org, frankf+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAdd option to restart usb on device status check before performing check.
Restart USB has been known to bring back offline devices. Add option to device
status check to restart usb ports before performing the check. This requires
restart_usb to be installed (see
https://code.google.com/p/chromium/issues/detail?id=305769). Prints warning if
the utility has not been installed on the host.
We will add this first to perf bots. Restarting usb has fixed device issues on these bots. If we find it works and does not have unseen consequences, we will deploy on all bots (i.e. make default of --restart-usb to True).
BUG=299891
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=229553
Patch Set 1 #
Total comments: 8
Patch Set 2 : Fix style nits. #Patch Set 3 : Add restart-usb option for bots to pass to device status check. #
Total comments: 15
Patch Set 4 : Make lsusb fail step. #Patch Set 5 : Make failed restart_usb.py fail on error. #
Total comments: 1
Patch Set 6 : Remove default flunk_on_failure=True. #Patch Set 7 : Sleep after restarting usb before getting attached devices. #
Messages
Total messages: 28 (0 generated)
lgtm % style nits. Somebody who knows the clank bb setup should approve https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:17: import bb_utils nit: alphabetize below bb_annotations https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:208: def restart_usb(): style nit: RestartUsb() https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:209: # Expects restart_usb to be installed (BUG=305769) This comment doesn't add value, I'd drop it. https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:231: proc = bb_utils.SpawnCmd(['/usr/bin/restart_usb', bus, dev]) Do you want RunCmd instead of SpawnCmd+communicate here?
https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:17: import bb_utils On 2013/10/10 01:56:11, tonyg wrote: > nit: alphabetize below bb_annotations Done. https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:208: def restart_usb(): On 2013/10/10 01:56:11, tonyg wrote: > style nit: RestartUsb() Done. https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:209: # Expects restart_usb to be installed (BUG=305769) On 2013/10/10 01:56:11, tonyg wrote: > This comment doesn't add value, I'd drop it. Done. https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... build/android/buildbot/bb_device_status_check.py:231: proc = bb_utils.SpawnCmd(['/usr/bin/restart_usb', bus, dev]) On 2013/10/10 01:56:11, tonyg wrote: > Do you want RunCmd instead of SpawnCmd+communicate here? Done.
On 2013/10/10 02:10:20, navabi wrote: > https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... > File build/android/buildbot/bb_device_status_check.py (right): > > https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... > build/android/buildbot/bb_device_status_check.py:17: import bb_utils > On 2013/10/10 01:56:11, tonyg wrote: > > nit: alphabetize below bb_annotations > > Done. > > https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... > build/android/buildbot/bb_device_status_check.py:208: def restart_usb(): > On 2013/10/10 01:56:11, tonyg wrote: > > style nit: RestartUsb() > > Done. > > https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... > build/android/buildbot/bb_device_status_check.py:209: # Expects restart_usb to > be installed (BUG=305769) > On 2013/10/10 01:56:11, tonyg wrote: > > This comment doesn't add value, I'd drop it. > > Done. > > https://codereview.chromium.org/26747004/diff/1/build/android/buildbot/bb_dev... > build/android/buildbot/bb_device_status_check.py:231: proc = > bb_utils.SpawnCmd(['/usr/bin/restart_usb', bus, dev]) > On 2013/10/10 01:56:11, tonyg wrote: > > Do you want RunCmd instead of SpawnCmd+communicate here? > > Done. PTAL.
Hi Armand, some comments below. On first glance, I wasn't sure if you were sending this review to me, since there were so many others listed in the reviewer field, but makes sense now that I see what files are affected. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:208: def RestartUsb(): I'd rather this return error than warning. This check also needs a guard for testing mode. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:216: if lsusb_proc.returncode: Please return error. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:227: for (bus, dev) in reversed(sorted(usb_devices)): Is this possible to restrict to only android devices, and skip hubs, keyboards, etc? https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:231: flunk_on_failure=False) flunk on failure.
https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:208: def RestartUsb(): On 2013/10/11 18:10:08, Isaac wrote: > I'd rather this return error than warning. I disagree. The failure of restart_usb.py should not make device status check fail, as it is not a necessary step, just one that will hopefully make devices more reliable. If we see a bot is not being more reliable, we can check this step for the warning message to see if it is doing the restart usb. > This check also needs a guard for testing mode. Ok. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:227: for (bus, dev) in reversed(sorted(usb_devices)): On 2013/10/11 18:10:08, Isaac wrote: > Is this possible to restrict to only android devices, and skip hubs, keyboards, > etc? It is, but I don't think that's what we want. I think it makes sense to restart the USB hub ports too. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:231: flunk_on_failure=False) On 2013/10/11 18:10:08, Isaac wrote: > flunk on failure. See first comment. I don't think this should flunk on failure.
https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:208: def RestartUsb(): If you're not failing here, what's your plan to ensure that missetup machines don't go undetected? https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:227: for (bus, dev) in reversed(sorted(usb_devices)): On 2013/10/11 18:39:02, navabi wrote: > On 2013/10/11 18:10:08, Isaac wrote: > > Is this possible to restrict to only android devices, and skip hubs, > keyboards, > > etc? > > It is, but I don't think that's what we want. I think it makes sense to restart > the USB hub ports too. I'd be surprised if a USB hub needed to be reset. I'm more inclined to have this be surgical -- have you seen reseting a hub to be beneficial? https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:231: flunk_on_failure=False) On 2013/10/11 18:39:02, navabi wrote: > On 2013/10/11 18:10:08, Isaac wrote: > > flunk on failure. > > See first comment. I don't think this should flunk on failure. Do you expect this binary to occasionally fail?
Had an offline chat with Armand. I would prefer we not add more knobs that can silently fall into a less ideal state. In the case of this CL, we know that restart_usb makes some phones more reliable. Printing a message (or even turning the step yellow) is not a reliable feedback mechanism. The options I proposed: 1) Fail the build if restart_usb is missing or if the command fails. 2) Email the chrome android alerts list on these conditions, and turn step yellow.
On 2013/10/11 19:39:07, Isaac wrote: > Had an offline chat with Armand. I would prefer we not add more knobs that can > silently fall into a less ideal state. In the case of this CL, we know that > restart_usb makes some phones more reliable. Printing a message (or even > turning the step yellow) is not a reliable feedback mechanism. The options I > proposed: > > 1) Fail the build if restart_usb is missing or if the command fails. > 2) Email the chrome android alerts list on these conditions, and turn step > yellow. Thanks Isaac for the summary. Here is how I see it: I strongly feel it should not fail the bots, and would prefer not sending out an email notifications. At this point, I want to see this CL land with as little interruption to the bots as possible, to see if it improves the device reliability, particularly on the Android perf bots. If restart_usb failed the build, we would stop getting (already lacking) perf results if something went wrong with restart_usb (e.g. restart_usb was not installed on the host or installed with the correct permission). Note that in general, the device could still be fine and able to produce useful perf results, but we would be blocking it because of restart_usb. I would need to immediately revert, so we don't stop getting perf results all together, and then reapply a new patch. It just doesn't seem efficient. I even think email notifications is overkill at this point. Instead, I'd prefer to land this as is with a warning message being printed in the device status check step. Then I can monitor if it is successfully restarting the usb ports (i.e. no warning messages) and what bots, if any, it helps. We expect to help downstream perf bots a lot, we are less sure about upstream perf bots. Later, if we find this has no negative side effects, and improves device reliatiblity, I would support making restart_usb stop the build on failure. I'm fine with making lsusb flunk on failure as that is a Linux utility (but I don't want to make restart_usb fail). Please weight in, as I know there are many people interested in fixing the device reliability issues on the perf bots.
It sounds like the lsusb part of the discussion is addressed and will flunk on failure. That's good. For this part, I agree that landing this may end up disrupting more things right away if we return error when we don't yet have the necessary coverage to know how well it will work. I also agree that if we don't return error, failures will go overlooked. I can see how we could enable this without returning error as a first step to see how well this works/would work, and then shortly afterwards (like 2-3 days later) and, if it's working well, change this to return error. What's our plan if it turns out we see it would not be possible to return error later without a disruption to our build results? Would we revert this CL completely?
> What's our plan if it turns out we see it would not be possible to return error > later without a disruption to our build results? Would we revert this CL > completely? That depends on what the disruption is, but I don't think so. I see landing this without error will allow us to verify restart_usb is working as intended without disrupting the bots. After that, I don't foresee returning error to cause disruption. Should it cause disruption (e.g. restart_usb is flaky and sometimes fails even though the bot could otherwise continue), then I think instead of failing we would send an email notification (as ilevy suggested). An advantage of putting this on first without either failure or email notification, is that it will provide us with some sense of how stable restart_usb is allowing us to make a better decision about failing vs. email notification.
https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:208: def RestartUsb(): On 2013/10/11 19:04:10, Isaac wrote: > If you're not failing here, what's your plan to ensure that missetup machines > don't go undetected? The plan is to add error (or email notification) once 1) we find that restart_usb is helpful, 2) we've found that restart_usb works on the select bots we try it on first and 3) we believe all bots are set up correctly with restart_usb For 2) we can rely on the logs. The plan is to test first on upstream and downstream perf bots first. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:208: def RestartUsb(): > This check also needs a guard for testing mode. bb_device_status_check does not run in testing mode. Only bb_run_bot has testing mode, and in that mode it will say that it is going to call bb_device_status_check, but does not call it. If we want to call bb_device_status_check in testing mode, then the option needs to be added and all the steps need to have a testing mode condition. This is probably best as another change. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:216: if lsusb_proc.returncode: On 2013/10/11 18:10:08, Isaac wrote: > Please return error. Done. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:227: for (bus, dev) in reversed(sorted(usb_devices)): > I'd be surprised if a USB hub needed to be reset. I'm more inclined to have > this be surgical -- have you seen reseting a hub to be beneficial? The script has always restarted all usb ports (including hubs), and has been shown to fix device issues. https://codereview.chromium.org/26747004/diff/9001/build/android/buildbot/bb_... build/android/buildbot/bb_device_status_check.py:231: flunk_on_failure=False) > Do you expect this binary to occasionally fail? Let's land it and see. We don't know yet as currently it is only run manually by a few people that know about it.
PTAL. While I would rather not error on restart_usb.py failure, in the interest of not letting this CL get blocked and making progress, I made the change to exit on error. I will first land this on a few bots (perf bots), and make sure it lands cleanly and make the appropriate fixes, so that if/when we deploy on all bots it won't break.
lgtm, looking forward to see the results, thanks!!
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/26747004/22001
On 2013/10/17 04:07:57, I haz the power (commit-bot) wrote: > CQ is trying da patch. Follow status at > https://chromium-status.appspot.com/cq/navabi%40google.com/26747004/22001 Need LGTM from OWNER.
Retried try job too often on chromium_presubmit for step(s) presubmit http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=chromium_p...
I think you're getting presubmit failures too. but lgtm https://codereview.chromium.org/26747004/diff/22001/build/android/buildbot/bb... File build/android/buildbot/bb_device_status_check.py (right): https://codereview.chromium.org/26747004/diff/22001/build/android/buildbot/bb... build/android/buildbot/bb_device_status_check.py:232: flunk_on_failure=True) this is the default, line not needed.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/26747004/33001
On 2013/10/17 22:08:20, I haz the power (commit-bot) wrote: > CQ is trying da patch. Follow status at > https://chromium-status.appspot.com/cq/navabi%40google.com/26747004/33001 Yeah. I do have presubmit failures, though I'm not sure why. I ahve no presubmit errors locally. On the bot, I get the following: Error running the bot script with id="webrtc-tests" No adb found in $PATH, fallback to checked in binary. Traceback (most recent call last): File "build/android/buildbot/bb_device_steps.py", line 601, in <module> sys.exit(main(sys.argv)) File "build/android/buildbot/bb_device_steps.py", line 597, in main MainTestWrapper(options) File "build/android/buildbot/bb_device_steps.py", line 542, in MainTestWrapper LogcatDump(options) File "build/android/buildbot/bb_device_steps.py", line 500, in LogcatDump with open(logcat_file, 'w') as f: IOError: [Errno 2] No such file or directory: '/mnt/scratch0/b_used/build/slave/linux/build/src/out/Debug/full_log' Any idea why I am getting that?
On 2013/10/17 22:21:07, navabi wrote: > On 2013/10/17 22:08:20, I haz the power (commit-bot) wrote: > > CQ is trying da patch. Follow status at > > https://chromium-status.appspot.com/cq/navabi%2540google.com/26747004/33001 > > Yeah. I do have presubmit failures, though I'm not sure why. I ahve no presubmit > errors locally. On the bot, I get the following: > > Error running the bot script with id="webrtc-tests" No adb found in $PATH, > fallback to checked in binary. > Traceback (most recent call last): > File "build/android/buildbot/bb_device_steps.py", line 601, in <module> > sys.exit(main(sys.argv)) > File "build/android/buildbot/bb_device_steps.py", line 597, in main > MainTestWrapper(options) > File "build/android/buildbot/bb_device_steps.py", line 542, in MainTestWrapper > LogcatDump(options) > File "build/android/buildbot/bb_device_steps.py", line 500, in LogcatDump > with open(logcat_file, 'w') as f: > IOError: [Errno 2] No such file or directory: > '/mnt/scratch0/b_used/build/slave/linux/build/src/out/Debug/full_log' > > Any idea why I am getting that? Try removing your out/Debug/full_log file and run presubmit again?
On 2013/10/17 22:59:54, Isaac wrote: > > > > I have no presubmit errors locally. On the bot, I get the following: > > Try removing your out/Debug/full_log file and run presubmit again? Remove my out/Debug/full_log? Not sure how that would help as I have no presubmit errors locally (see above). On the bot there is no /mnt/scratch0/b_used/build/slave/linux/build/src/out/ dir. So when it tries to create the out/Debug/full_log file it fails. Not sure why that directory is not there. Is it created by run hooks? Something should create it, as it is not checked out, but out/Debug should exist.
On 2013/10/17 23:13:04, navabi wrote: > On 2013/10/17 22:59:54, Isaac wrote: > > > > > > I have no presubmit errors locally. On the bot, I get the following: > > > > Try removing your out/Debug/full_log file and run presubmit again? > > Remove my out/Debug/full_log? Not sure how that would help as I have no > presubmit errors locally (see above). > On the bot there is no /mnt/scratch0/b_used/build/slave/linux/build/src/out/ > dir. > So when it tries to create the out/Debug/full_log file it fails. Not sure why > that directory is not there. > > Is it created by run hooks? Something should create it, as it is not checked > out, but out/Debug should exist. Why is the logger running during this test? it should not be.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/26747004/33001
Retried try job too often on chromium_presubmit for step(s) presubmit http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=chromium_p...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/navabi@google.com/26747004/33001
Message was sent while issue was closed.
Change committed as 229553 |