|
|
Created:
7 years, 1 month ago by Primiano Tucci (use gerrit) Modified:
7 years ago CC:
chromium-reviews, craigdh+watch_chromium.org, chrome-speed-team+watch_google.com, bulach+watch_chromium.org, yfriedman+watch_chromium.org, ilevy-cc_chromium.org, klundberg+watch_chromium.org, telemetry+watch_chromium.org, frankf+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
Description[Telemetry] Refactor common Android test harness functions.
Refactoring common cleanup code required by both
android_browser_finder.py and perf/setup.py and moving it into
pyblib/utils/test_environment.py.
Also, this change avoids killing twice the adb server when running
in a bot environment, where the adb restart is already invoked by
the bot steps.
BUG=268450
R=bulach@chromium.org,tonyg@chromium.org
NOTRY=true
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=237667
Patch Set 1 : #
Total comments: 6
Patch Set 2 : Always adb root + docstring #Patch Set 3 : Rebase + deflake RestartAdbdOnDevice #
Total comments: 14
Patch Set 4 : Rename + move wait-for-device + rebase #
Messages
Total messages: 20 (0 generated)
lgtm, but please wait for marcus https://codereview.chromium.org/62953024/diff/50001/build/android/pylib/utils... File build/android/pylib/utils/test_environment.py (right): https://codereview.chromium.org/62953024/diff/50001/build/android/pylib/utils... build/android/pylib/utils/test_environment.py:24: def CleanupPendingProcesses(restart_adb_as_root): Please add a docstring https://codereview.chromium.org/62953024/diff/50001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py (right): https://codereview.chromium.org/62953024/diff/50001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py:214: test_environment.CleanupPendingProcesses(restart_adb_as_root=False) What happens if adb can't be put into root mode? If the failure is non-fatal, I'd like to try to upgrade to root here.
lgtm, thanks! I'm looking forward to all our "workarounds" to be happening at this single place across all our trees, yay! :) https://codereview.chromium.org/62953024/diff/50001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py (right): https://codereview.chromium.org/62953024/diff/50001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py:209: if (not finder_options.keep_test_server_ports and note to myself: this option is no longer needed, I'll cleanup from everywhere.. https://codereview.chromium.org/62953024/diff/50001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py:214: test_environment.CleanupPendingProcesses(restart_adb_as_root=False) On 2013/11/15 05:45:10, tonyg wrote: > What happens if adb can't be put into root mode? If the failure is non-fatal, > I'd like to try to upgrade to root here. +1!
Thanks for the comments, see the reply inline. Unfortunately I'm OOO in MTV right now and can't test the change (reverse adb port forwarding doesn't really like the adb server to get killed). I'll wait to be back before ticking the commit box...unless someone is willing to try the change locally in the meanwhile! https://codereview.chromium.org/62953024/diff/50001/build/android/pylib/utils... File build/android/pylib/utils/test_environment.py (right): https://codereview.chromium.org/62953024/diff/50001/build/android/pylib/utils... build/android/pylib/utils/test_environment.py:24: def CleanupPendingProcesses(restart_adb_as_root): On 2013/11/15 05:45:10, tonyg wrote: > Please add a docstring Done. https://codereview.chromium.org/62953024/diff/50001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py (right): https://codereview.chromium.org/62953024/diff/50001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py:214: test_environment.CleanupPendingProcesses(restart_adb_as_root=False) On 2013/11/15 05:45:10, tonyg wrote: > What happens if adb can't be put into root mode? If the failure is non-fatal, > I'd like to try to upgrade to root here. From adb_interface.py it look like that, unless SetAbortOnError is called on run_command.py (which is not the case), that should just log an error message and continue.
Question: I got a bit lost after last recent changes. Do we still want this CL? If yes, I just rebased it, PTAL. note: I found a flakiness in android_commands.py which is fixed in this CL (see the most recent patchset).
Thanks for getting back to this. I really want this change! Just a few more questions/comments first though. https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... File build/android/pylib/android_commands.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... build/android/pylib/android_commands.py:468: self._adb.SendCommand('wait-for-device') While I love the fix, it seems misplaced to me. This method doesn't do anything special before this that would require wait-for-device. So that tells me that the wait-for-device belongs elsewhere (e.g. at the end of RestartAdbServer). Otherwise, you could foresee a wait-for-device call at the beginning of every method in this file. Also, does this fix http://crbug.com/322828 ? https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/perf... File build/android/pylib/perf/setup.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/perf... build/android/pylib/perf/setup.py:39: logging.info('Restarting adb and waiting for device.') Is this necessary given that the methods themselves do some logging? https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... File build/android/pylib/utils/test_environment.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:13: for retry in range(5): Style nit: use xrange when it is constant. https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:24: def CleanupPendingProcesses(): Would CleanupLeftoverProcesses() be a better name? https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:28: for device in android_commands.GetAttachedDevices(): Is this a problem for sharding? Could we kill an adb that is in use by another shard?
bulach: Can you please take a look to Tony's question about sharding? https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... File build/android/pylib/android_commands.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... build/android/pylib/android_commands.py:468: self._adb.SendCommand('wait-for-device') Right, I should have added more context, my bad. The call stack involved here is. android_commands.py::RunShellCommand -> adb_interface.py::SendShellCommand -> adb_interface.py::SendCommand -> run_command.py::RunCommand Now, run_command.py::RunCommand is not robust enough to tolerate adb server restarts. If you look to its code it has a retry_count. However the retry logic kicks in only in case of a command timeout, and not in the case in which the device temporary disappear (e.g., while adb server is respawning). I think (but I might be wrong, I still miss some adb details) that the main problem is that at the time RestartAdbdOnDevice is called, there is no guarantee that adbd is already running, thus KillAll fails. I think that in my specific case, this happens because of the call to RestartAdbServer() which precedes this call in CleanupPendingProcesses. Looking better at the code, it looks like waiting for the adb (host) daemon in StartAdbServer with psgrep is not enough and does not guarantee that the device will be visible in the next call to adb whatever. Ideally we'd move wait-for-device in StartAdbServer. However, the adb server (host side) is only one, while the devices can be many, and there is no knowledge about them in StartAdbServer. That's why I placed the wait-for-device here (does it make sense till this point?). The only other place in which it could make sense IMHO is in CleanupPendingProcesses() itself. > Also, does this fix http://crbug.com/322828 ? I suspect something similar is happening. But I miss a mental picture of the current adb stop-start flow in the clank bots. Is it possible that, in that bug, the adb server was restarted just before the failure point? https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... File build/android/pylib/utils/test_environment.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:24: def CleanupPendingProcesses(): On 2013/11/25 15:46:33, tonyg wrote: > Would CleanupLeftoverProcesses() be a better name? Why not, sounds good to me. https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:28: for device in android_commands.GetAttachedDevices(): On 2013/11/25 15:46:33, tonyg wrote: > Is this a problem for sharding? Could we kill an adb that is in use by another > shard? Hmm, to be honest I have no idea how sharding is handled. I wrote this piece of code following Marcus' suggestions. Therefore, I invoke the supreme help of the sharding master to answer this question! :)
lgtm, thanks for bringing up the StartAdbServer issue! one suggestion below and a few other clarifications. https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... File build/android/pylib/android_commands.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... build/android/pylib/android_commands.py:468: self._adb.SendCommand('wait-for-device') On 2013/11/25 17:26:51, Primiano Tucci wrote: > Right, I should have added more context, my bad. > The call stack involved here is. > android_commands.py::RunShellCommand > -> adb_interface.py::SendShellCommand > -> adb_interface.py::SendCommand > -> run_command.py::RunCommand > > Now, run_command.py::RunCommand is not robust enough to tolerate adb server > restarts. If you look to its code it has a retry_count. However the retry logic > kicks in only in case of a command timeout, and not in the case in which the > device temporary disappear (e.g., while adb server is respawning). > > I think (but I might be wrong, I still miss some adb details) that the main > problem is that at the time RestartAdbdOnDevice is called, there is no guarantee > that adbd is already running, thus KillAll fails. I think that in my specific > case, this happens because of the call to RestartAdbServer() which precedes this > call in CleanupPendingProcesses. > > Looking better at the code, it looks like waiting for the adb (host) daemon in > StartAdbServer with psgrep is not enough and does not guarantee that the device > will be visible in the next call to adb whatever. > > Ideally we'd move wait-for-device in StartAdbServer. > However, the adb server (host side) is only one, while the devices can be many, > and there is no knowledge about them in StartAdbServer. That's why I placed the > wait-for-device here (does it make sense till this point?). > The only other place in which it could make sense IMHO is in > CleanupPendingProcesses() itself. > > > > Also, does this fix http://crbug.com/322828 ? > I suspect something similar is happening. But I miss a mental picture of the > current adb stop-start flow in the clank bots. Is it possible that, in that bug, > the adb server was restarted just before the failure point? I agree this is misplaced and StartAdbServer's pgrep condition is insufficient :) I think it should be enough do "adb devices" in line 506 before declaring it has succeeded? at that point, pgrep would have said there's an adb running, so running any command that would involve talking to the adb daemon in the host should be enough? wdyt? https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... File build/android/pylib/utils/test_environment.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:13: for retry in range(5): On 2013/11/25 15:46:33, tonyg wrote: > Style nit: use xrange when it is constant. just in case :) python3 has removed xrange, there's only range there.. not that we'd change anytime soon, but anyways... :) https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:28: for device in android_commands.GetAttachedDevices(): On 2013/11/25 17:26:51, Primiano Tucci wrote: > On 2013/11/25 15:46:33, tonyg wrote: > > Is this a problem for sharding? Could we kill an adb that is in use by another > > shard? > > Hmm, to be honest I have no idea how sharding is handled. I wrote this piece of > code following Marcus' suggestions. Therefore, I invoke the supreme help of the > sharding master to answer this question! :) tl;dr; it's not a problem as it is right now :) long version: - for bots, build/android/pylib/perf/setup.py kicks off at the "shard master" process, before any actual "shard slave" has started, and calls this.. nothing else should call this method. - for developers, only tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py calls this... it's guarded there so when running in buildbot, the shards won't mess up with each other.. it's a bit brittle since it relies on the caller, but anyways...
https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... File build/android/pylib/android_commands.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr... build/android/pylib/android_commands.py:468: self._adb.SendCommand('wait-for-device') > at that point, pgrep would have said there's an adb running, so running any > command that would involve talking to the adb daemon in the host should be > enough? wdyt? Oh right, I got lost in the many layers. makes perfect sense. Very funny that now I can't re-reproduce the issue locally anymore (but I'm sure I've seen it failing yesterday exactly in that point). Maybe let's add a note to the crbug.com/322828 and see if the flakiness goes away back after this cl. https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/perf... File build/android/pylib/perf/setup.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/perf... build/android/pylib/perf/setup.py:39: logging.info('Restarting adb and waiting for device.') On 2013/11/25 15:46:33, tonyg wrote: > Is this necessary given that the methods themselves do some logging? Done. https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... File build/android/pylib/utils/test_environment.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util... build/android/pylib/utils/test_environment.py:13: for retry in range(5): On 2013/11/27 12:10:53, bulach wrote: > On 2013/11/25 15:46:33, tonyg wrote: > > Style nit: use xrange when it is constant. > > just in case :) python3 has removed xrange, there's only range there.. > not that we'd change anytime soon, but anyways... :) Ah right (both). It looks, however, that there is enough xrange precedent in chromium, so someone at some point will have to remove the x when we'll move to Py3. On the other side we won't spend 700 nanoseconds just for a missing 'x' (http://pastebin.com/j7Pa17vL, I did some quick measurements)
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
Retried try job too often on win_rel for step(s) content_unittests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
Retried try job too often on win_rel for step(s) telemetry_unittests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/62953024/420001
Message was sent while issue was closed.
Change committed as 237667 |