Issue 62953024: [Telemetry] Refactor common Android cleanup functions.

Issue 62953024: [Telemetry] Refactor common Android cleanup functions. (Closed)

Created:
7 years, 1 month ago by Primiano Tucci (use gerrit)

Modified:
7 years ago

Reviewers:
bulach, tonyg

CC:
chromium-reviews, craigdh+watch_chromium.org, chrome-speed-team+watch_google.com, bulach+watch_chromium.org, yfriedman+watch_chromium.org, ilevy-cc_chromium.org, klundberg+watch_chromium.org, telemetry+watch_chromium.org, frankf+watch_chromium.org

Base URL:
svn://svn.chromium.org/chrome/trunk/src

Visibility:
Public.

More Reviews

Description

[Telemetry] Refactor common Android test harness functions. Refactoring common cleanup code required by both android_browser_finder.py and perf/setup.py and moving it into pyblib/utils/test_environment.py. Also, this change avoids killing twice the adb server when running in a bot environment, where the adb restart is already invoked by the bot steps. BUG=268450 R=bulach@chromium.org,tonyg@chromium.org NOTRY=true Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=237667

Patch Set 1 : #

Total comments: 6

Patch Set 2 : Always adb root + docstring #

Patch Set 3 : Rebase + deflake RestartAdbdOnDevice #

Total comments: 14

Patch Set 4 : Rename + move wait-for-device + rebase #

Created: 7 years ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+46 lines, -30 lines)			Patch
M	build/android/pylib/android_commands.py	View	1 2 3	1 chunk	+3 lines, -1 line	0 comments	Download
M	build/android/pylib/perf/setup.py	View	1 2 3	2 chunks	+3 lines, -28 lines	0 comments	Download
A	build/android/pylib/utils/test_environment.py	View	1 2 3	1 chunk	+36 lines, -0 lines	0 comments	Download
M	tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py	View	1 2 3	2 chunks	+4 lines, -1 line	0 comments	Download

Messages

Total messages: 20 (0 generated)

Expand Messages | Collapse Messages

tonyg

lgtm, but please wait for marcus https://codereview.chromium.org/62953024/diff/50001/build/android/pylib/utils/test_environment.py File build/android/pylib/utils/test_environment.py (right): https://codereview.chromium.org/62953024/diff/50001/build/android/pylib/utils/test_environment.py#newcode24 build/android/pylib/utils/test_environment.py:24: def CleanupPendingProcesses(restart_adb_as_root): Please ...

7 years, 1 month ago (2013-11-15 05:45:10 UTC) #2

bulach

lgtm, thanks! I'm looking forward to all our "workarounds" to be happening at this single ...

7 years, 1 month ago (2013-11-15 09:36:07 UTC) #3

Primiano Tucci (use gerrit)

Thanks for the comments, see the reply inline. Unfortunately I'm OOO in MTV right now ...

7 years, 1 month ago (2013-11-19 18:13:02 UTC) #4

Primiano Tucci (use gerrit)

Question: I got a bit lost after last recent changes. Do we still want this ...

7 years ago (2013-11-25 12:52:50 UTC) #5

tonyg

Thanks for getting back to this. I really want this change! Just a few more ...

7 years ago (2013-11-25 15:46:33 UTC) #6

Primiano Tucci (use gerrit)

bulach: Can you please take a look to Tony's question about sharding? https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/android_commands.py File build/android/pylib/android_commands.py ...

7 years ago (2013-11-25 17:26:51 UTC) #7

bulach: Can you please take a look to Tony's question about sharding?

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr...
File build/android/pylib/android_commands.py (right):

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr...
build/android/pylib/android_commands.py:468:
self._adb.SendCommand('wait-for-device')
Right, I should have added more context, my bad.
The call stack involved here is.
android_commands.py::RunShellCommand
 -> adb_interface.py::SendShellCommand
   -> adb_interface.py::SendCommand
     -> run_command.py::RunCommand

Now, run_command.py::RunCommand is not robust enough to tolerate adb server
restarts. If you look to its code it has a retry_count. However the retry logic
kicks in only in case of a command timeout, and not in the case in which the
device temporary disappear (e.g., while adb server is respawning).

I think (but I might be wrong, I still miss some adb details) that the main
problem is that at the time RestartAdbdOnDevice is called, there is no guarantee
that adbd is already running, thus KillAll fails. I think that in my specific
case, this happens because of the call to RestartAdbServer() which precedes this
call in CleanupPendingProcesses.

Looking better at the code, it looks like waiting for the adb (host) daemon in
StartAdbServer with psgrep is not enough and does not guarantee that the device
will be visible in the next call to adb whatever.

Ideally we'd move wait-for-device in StartAdbServer.
However, the adb server (host side) is only one, while the devices can be many,
and there is no knowledge about them in StartAdbServer. That's why I placed the
wait-for-device here (does it make sense till this point?).
The only other place in which it could make sense IMHO is in
CleanupPendingProcesses() itself.

> Also, does this fix http://crbug.com/322828 ?
I suspect something similar is happening. But I miss a mental picture of the
current adb stop-start flow in the clank bots. Is it possible that, in that bug,
the adb server was restarted just before the failure point?

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
File build/android/pylib/utils/test_environment.py (right):

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
build/android/pylib/utils/test_environment.py:24: def CleanupPendingProcesses():
On 2013/11/25 15:46:33, tonyg wrote:
> Would CleanupLeftoverProcesses() be a better name?

Why not, sounds good to me.

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
build/android/pylib/utils/test_environment.py:28: for device in
android_commands.GetAttachedDevices():
On 2013/11/25 15:46:33, tonyg wrote:
> Is this a problem for sharding? Could we kill an adb that is in use by another
> shard?

Hmm, to be honest I have no idea how sharding is  handled. I wrote this piece of
code following Marcus' suggestions. Therefore, I invoke the supreme help of the
sharding master to answer this question! :)

bulach

lgtm, thanks for bringing up the StartAdbServer issue! one suggestion below and a few other ...

7 years ago (2013-11-27 12:10:53 UTC) #8

lgtm, thanks for bringing up the StartAdbServer issue! one suggestion below and
a few other clarifications.

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr...
File build/android/pylib/android_commands.py (right):

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr...
build/android/pylib/android_commands.py:468:
self._adb.SendCommand('wait-for-device')
On 2013/11/25 17:26:51, Primiano Tucci wrote:
> Right, I should have added more context, my bad.
> The call stack involved here is.
> android_commands.py::RunShellCommand
>  -> adb_interface.py::SendShellCommand
>    -> adb_interface.py::SendCommand
>      -> run_command.py::RunCommand
> 
> Now, run_command.py::RunCommand is not robust enough to tolerate adb server
> restarts. If you look to its code it has a retry_count. However the retry
logic
> kicks in only in case of a command timeout, and not in the case in which the
> device temporary disappear (e.g., while adb server is respawning).
> 
> I think (but I might be wrong, I still miss some adb details) that the main
> problem is that at the time RestartAdbdOnDevice is called, there is no
guarantee
> that adbd is already running, thus KillAll fails. I think that in my specific
> case, this happens because of the call to RestartAdbServer() which precedes
this
> call in CleanupPendingProcesses.
> 
> Looking better at the code, it looks like waiting for the adb (host) daemon in
> StartAdbServer with psgrep is not enough and does not guarantee that the
device
> will be visible in the next call to adb whatever.
> 
> Ideally we'd move wait-for-device in StartAdbServer.
> However, the adb server (host side) is only one, while the devices can be
many,
> and there is no knowledge about them in StartAdbServer. That's why I placed
the
> wait-for-device here (does it make sense till this point?).
> The only other place in which it could make sense IMHO is in
> CleanupPendingProcesses() itself.
> 
> 
> > Also, does this fix http://crbug.com/322828 ?
> I suspect something similar is happening. But I miss a mental picture of the
> current adb stop-start flow in the clank bots. Is it possible that, in that
bug,
> the adb server was restarted just before the failure point?

I agree this is misplaced and StartAdbServer's pgrep condition is insufficient
:)
I think it should be enough do "adb devices" in line 506 before declaring it has
succeeded?
at that point, pgrep would have said there's an adb running, so running any
command that would involve talking to the adb daemon in the host should be
enough? wdyt?

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
File build/android/pylib/utils/test_environment.py (right):

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
build/android/pylib/utils/test_environment.py:13: for retry in range(5):
On 2013/11/25 15:46:33, tonyg wrote:
> Style nit: use xrange when it is constant.

just in case :) python3 has removed xrange, there's only range there..
not that we'd change anytime soon, but anyways... :)

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
build/android/pylib/utils/test_environment.py:28: for device in
android_commands.GetAttachedDevices():
On 2013/11/25 17:26:51, Primiano Tucci wrote:
> On 2013/11/25 15:46:33, tonyg wrote:
> > Is this a problem for sharding? Could we kill an adb that is in use by
another
> > shard?
> 
> Hmm, to be honest I have no idea how sharding is  handled. I wrote this piece
of
> code following Marcus' suggestions. Therefore, I invoke the supreme help of
the
> sharding master to answer this question! :)

tl;dr; it's not a problem as it is right now :)

long version:
- for bots, build/android/pylib/perf/setup.py kicks off at the "shard master"
process, before any actual "shard slave" has started, and calls this.. nothing
else should call this method.

- for developers, only
tools/telemetry/telemetry/core/backends/chrome/android_browser_finder.py calls
this... it's guarded there so when running in buildbot, the shards won't mess up
with each other..

it's a bit brittle since it relies on the caller, but anyways...

Primiano Tucci (use gerrit)

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/android_commands.py File build/android/pylib/android_commands.py (right): https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/android_commands.py#newcode468 build/android/pylib/android_commands.py:468: self._adb.SendCommand('wait-for-device') > at that point, pgrep would have said ...

7 years ago (2013-11-27 15:14:41 UTC) #9

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr...
File build/android/pylib/android_commands.py (right):

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/andr...
build/android/pylib/android_commands.py:468:
self._adb.SendCommand('wait-for-device')
> at that point, pgrep would have said there's an adb running, so running any
> command that would involve talking to the adb daemon in the host should be
> enough? wdyt?
Oh right, I got lost in the many layers. makes perfect sense.
Very funny that now I can't re-reproduce the issue locally anymore (but I'm sure
I've seen it failing yesterday exactly in that point).
Maybe let's add a note to the crbug.com/322828 and see if the flakiness goes
away back after this cl.

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/perf...
File build/android/pylib/perf/setup.py (right):

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/perf...
build/android/pylib/perf/setup.py:39: logging.info('Restarting adb and waiting
for device.')
On 2013/11/25 15:46:33, tonyg wrote:
> Is this necessary given that the methods themselves do some logging?

Done.

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
File build/android/pylib/utils/test_environment.py (right):

https://codereview.chromium.org/62953024/diff/220002/build/android/pylib/util...
build/android/pylib/utils/test_environment.py:13: for retry in range(5):
On 2013/11/27 12:10:53, bulach wrote:
> On 2013/11/25 15:46:33, tonyg wrote:
> > Style nit: use xrange when it is constant.
> 
> just in case :) python3 has removed xrange, there's only range there..
> not that we'd change anytime soon, but anyways... :)

Ah right (both). It looks, however, that there is enough xrange precedent in
chromium, so someone at some point will have to remove the x when we'll move to
Py3.
On the other side we won't spend 700 nanoseconds just for a missing 'x'
(http://pastebin.com/j7Pa17vL, I did some quick measurements)