|
|
DescriptionCurrently, the most time consuming operation when creating a new browser
during Telemetry tests consists of sending an "adb force-stop" command
to the device, in order to force kill any previous browsers which did
not close as expected.
Sending a kill -9 to the process is, however, more efficient.
This CL also adds some efficiency improvements to DeviceUtils.KillAll.
The change amounts to a 6-7% speedup when running all of telemetry
unittests.
BUG=379378
Committed: https://crrev.com/1fd28e9eb7414c43c3811f684832296a85289b2a
Cr-Commit-Position: refs/heads/master@{#294368}
Patch Set 1 #
Total comments: 2
Patch Set 2 : Using GetPids to check if package is running #
Total comments: 2
Patch Set 3 : Using KillAll instead of ForceClose #Patch Set 4 : Using KillAll with retries=0 #
Total comments: 1
Patch Set 5 : Renamed _CloseBrowser to _KillBrowser #Patch Set 6 : Rebase from master #Patch Set 7 : git cl upload master #Patch Set 8 : resolved patch conflict #Patch Set 9 : apply change that caused conflict #
Messages
Total messages: 45 (12 generated)
perezju@chromium.org changed reviewers: + jbudorick@chromium.org, nduca@chromium.org
John, Nat: do you think it is reasonable to expect grep to be available on all devices where android Telemetry tests are meant to be run?
How long does the ForceStop call actually take? https://codereview.chromium.org/536343003/diff/1/tools/telemetry/telemetry/co... File tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py (right): https://codereview.chromium.org/536343003/diff/1/tools/telemetry/telemetry/co... tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py:201: if self._adb.device().RunShellCommand( To answer your immediate question, yes, I expect that devices will have grep. However, DeviceUtils.GetPids already does what you're trying to do, i.e., this should be if self._adb.device().GetPids(self._backend_settings.package):
The aggregated time, when opening 85 browser instances, is around 52.501 seconds to do the ForceStop, with the patch this is reduced to 11.980 seconds.
https://codereview.chromium.org/536343003/diff/1/tools/telemetry/telemetry/co... File tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py (right): https://codereview.chromium.org/536343003/diff/1/tools/telemetry/telemetry/co... tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py:201: if self._adb.device().RunShellCommand( On 2014/09/05 14:01:41, jbudorick wrote: > To answer your immediate question, yes, I expect that devices will have grep. > However, DeviceUtils.GetPids already does what you're trying to do, i.e., this > should be > > if self._adb.device().GetPids(self._backend_settings.package): Done.
The CQ bit was checked by perezju@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/perezju@chromium.org/536343003/20001
The CQ bit was unchecked by commit-bot@chromium.org
No LGTM from a valid reviewer yet. Only full committers are accepted. Even if an LGTM may have been provided, it was from a non-committer or a provisional committer, _not_ a full super star committer. See http://www.chromium.org/getting-involved/become-a-committer Note that this has nothing to do with OWNERS files.
lgtm
On 2014/09/09 13:55:32, jbudorick wrote: > lgtm but you probably need OWNERS approval, too
tonyg@chromium.org changed reviewers: + tonyg@chromium.org
https://codereview.chromium.org/536343003/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py (right): https://codereview.chromium.org/536343003/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py:201: if self._adb.device().GetPids(self._backend_settings.package): Excellent numbers -- I'm excited to get the benefit of this patch. But this shortcut seems like an implementation detail of ForceStop that is likely to change. Could we instead perform the test inside of the ForceStop() method itself? If I'm understanding correctly, the semantics of the method would remain unchanged and it would be faster for all callers.
https://codereview.chromium.org/536343003/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py (right): https://codereview.chromium.org/536343003/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py:201: if self._adb.device().GetPids(self._backend_settings.package): On 2014/09/09 14:39:07, tonyg wrote: > Excellent numbers -- I'm excited to get the benefit of this patch. > > But this shortcut seems like an implementation detail of ForceStop that is > likely to change. Could we instead perform the test inside of the ForceStop() > method itself? If I'm understanding correctly, the semantics of the method would > remain unchanged and it would be faster for all callers. I guess I'd be ok with moving this check into ForceStop. It's intended as a simple wrapper around 'adb shell am force-stop', but given the startup costs of am, it's probably ok to do the check all the time. We'll have to split DeviceUtils.GetPids into a GetPids and a _GetPidsImpl, though.
BTW, do we know if some form of kill -9 could get us the same semantics without using am?
On 2014/09/09 15:08:31, tonyg wrote: > BTW, do we know if some form of kill -9 could get us the same semantics without > using am? I would imagine that there's a difference in how a process shuts down between force-stop and kill, but I'm not sure. If we don't care about that difference, DeviceUtils.KillAll does exactly that.
On 2014/09/09 15:46:19, jbudorick wrote: > On 2014/09/09 15:08:31, tonyg wrote: > > BTW, do we know if some form of kill -9 could get us the same semantics > without > > using am? > > I would imagine that there's a difference in how a process shuts down between > force-stop and kill, but I'm not sure. If we don't care about that difference, > DeviceUtils.KillAll does exactly that. Maybe just switching to KillAll is a simpler approach for this patch then?
On 2014/09/09 16:33:46, tonyg wrote: > On 2014/09/09 15:46:19, jbudorick wrote: > > On 2014/09/09 15:08:31, tonyg wrote: > > > BTW, do we know if some form of kill -9 could get us the same semantics > > without > > > using am? > > > > I would imagine that there's a difference in how a process shuts down between > > force-stop and kill, but I'm not sure. If we don't care about that difference, > > DeviceUtils.KillAll does exactly that. > > Maybe just switching to KillAll is a simpler approach for this patch then? I've just done some quick tests, running all of `tools/telemetry/run_tests --browser android-chrome`, which opens and closes 73 browser instances: - Moving the GetPids check into ForceStop (compared to the current version with no checks at all) reduces the time spent in ForceStop during browser creation an 81%, but *increases* the time spent closing the browser by 16% (presumably because most of the times the check wasn't actually needed). Overall, still, running all tests we save about 2-3% of the time. - Using `kill -9` *only* on the pid of the package in question, reduces ForceStop's time during browser creation also by 81%, while *also reducing* the time spent closing the browser a 74% (presumably because kill is much faster than am force-stop). Overall this adds up to a speedup of about 7-8% (currently a bit over a whole minute). I did not saw any problems with this approach, in particular not any more or less test cases failing. Note that DeviceUtils.KillAll as currently implemented: (1) kills *all* process that have `package` as a substring (this might or might not be what we want), and (2) raises an exception when the process is not currently running (so I might need to catch and dismiss this exception, or provide an alternate implementation that doesn't raise the exception).
On 2014/09/09 16:42:10, perezju wrote: > On 2014/09/09 16:33:46, tonyg wrote: > > On 2014/09/09 15:46:19, jbudorick wrote: > > > On 2014/09/09 15:08:31, tonyg wrote: > > > > BTW, do we know if some form of kill -9 could get us the same semantics > > > without > > > > using am? > > > > > > I would imagine that there's a difference in how a process shuts down > between > > > force-stop and kill, but I'm not sure. If we don't care about that > difference, > > > DeviceUtils.KillAll does exactly that. > > > > Maybe just switching to KillAll is a simpler approach for this patch then? > > I've just done some quick tests, running all of `tools/telemetry/run_tests > --browser android-chrome`, which opens and closes 73 browser instances: > > - Moving the GetPids check into ForceStop (compared to the current version with > no checks at all) reduces the time spent in ForceStop during browser creation an > 81%, but *increases* the time spent closing the browser by 16% (presumably > because most of the times the check wasn't actually needed). Overall, still, > running all tests we save about 2-3% of the time. > > - Using `kill -9` *only* on the pid of the package in question, reduces > ForceStop's time during browser creation also by 81%, while *also reducing* the > time spent closing the browser a 74% (presumably because kill is much faster > than am force-stop). Overall this adds up to a speedup of about 7-8% (currently > a bit over a whole minute). I did not saw any problems with this approach, in > particular not any more or less test cases failing. > > Note that DeviceUtils.KillAll as currently implemented: (1) kills *all* process > that have `package` as a substring (this might or might not be what we want), > and (2) raises an exception when the process is not currently running (so I > might need to catch and dismiss this exception, or provide an alternate > implementation that doesn't raise the exception). Sounds to me like #2 + exception ignoring is the way to go.
On 2014/09/09 16:42:10, perezju wrote: > On 2014/09/09 16:33:46, tonyg wrote: > > On 2014/09/09 15:46:19, jbudorick wrote: > > > On 2014/09/09 15:08:31, tonyg wrote: > > > > BTW, do we know if some form of kill -9 could get us the same semantics > > > without > > > > using am? > > > > > > I would imagine that there's a difference in how a process shuts down > between > > > force-stop and kill, but I'm not sure. If we don't care about that > difference, > > > DeviceUtils.KillAll does exactly that. > > > > Maybe just switching to KillAll is a simpler approach for this patch then? > I would be ok with this. > I've just done some quick tests, running all of `tools/telemetry/run_tests > --browser android-chrome`, which opens and closes 73 browser instances: > > - Moving the GetPids check into ForceStop (compared to the current version with > no checks at all) reduces the time spent in ForceStop during browser creation an > 81%, but *increases* the time spent closing the browser by 16% (presumably > because most of the times the check wasn't actually needed). Overall, still, > running all tests we save about 2-3% of the time. > > - Using `kill -9` *only* on the pid of the package in question, reduces > ForceStop's time during browser creation also by 81%, while *also reducing* the > time spent closing the browser a 74% (presumably because kill is much faster > than am force-stop). Overall this adds up to a speedup of about 7-8% (currently > a bit over a whole minute). I did not saw any problems with this approach, in > particular not any more or less test cases failing. > > Note that DeviceUtils.KillAll as currently implemented: (1) kills *all* process > that have `package` as a substring (this might or might not be what we want), This is how we kill processes spawned by the main process, e.g. foo.bar foo.bar:sandboxed_process0 foo.bar:sandboxed_process1 # etc > and (2) raises an exception when the process is not currently running (so I > might need to catch and dismiss this exception, or provide an alternate > implementation that doesn't raise the exception) This is on purpose. If you don't care if a process was killed or not, silently catching the exception is fine.
Patchset #3 (id:40001) has been deleted
I've just added a new patch set with my new proposal using KillAll rather than ForceStop. You will note that, however, I did have to change a bit the implementation of KillAll, and add a _GetPidsImpl. The problem with the current implementation of KillAll is that it ends up calling `ps` at least twice, and thus negates any benefits of switching from KillAll to ForceStop. I also added an option `must_kill` which defaults to True, but when set to False will not raise an exception if no processes to kill were found; the alternative (throwing, catching and ignoring the exception) was also a bit slower. As a short summary: - Using KillAll as is (instead of ForceStop) *increases* total time by about 3% - A better implementation of KillAll + ignoring exception has a speedup of 5% - A better implementation of KillAll + not throwing the exception has a speedup of 7-8% Thoughts?
> - A better implementation of KillAll + ignoring exception has a speedup of 5% > - A better implementation of KillAll + not throwing the exception has a speedup > of 7-8% Can you explain why the perf of these two methods differs significantly?
On 2014/09/10 15:05:49, perezju wrote: > I've just added a new patch set with my new proposal using KillAll rather than > ForceStop. > > You will note that, however, I did have to change a bit the implementation of > KillAll, and add a _GetPidsImpl. The problem with the current implementation of > KillAll is that it ends up calling `ps` at least twice, and thus negates any Hah, the problem with the current implementation of KillAll (along with most of the functions in DeviceUtils) is that it's still dependent on AndroidCommands. > benefits of switching from KillAll to ForceStop. I also added an option > `must_kill` which defaults to True, but when set to False will not raise an > exception if no processes to kill were found; the alternative (throwing, > catching and ignoring the exception) was also a bit slower. > > As a short summary: > - Using KillAll as is (instead of ForceStop) *increases* total time by about 3% > - A better implementation of KillAll + ignoring exception has a speedup of 5% > - A better implementation of KillAll + not throwing the exception has a speedup > of 7-8% > > Thoughts? I'm also curious about the perf difference between the exception version vs the non-exception version.
On 2014/09/10 15:42:11, tonyg wrote: > > - A better implementation of KillAll + ignoring exception has a speedup of 5% > > - A better implementation of KillAll + not throwing the exception has a > speedup > > of 7-8% > > Can you explain why the perf of these two methods differs significantly? Hmmm.... hmmmm.... aha! I think I know why. When raising an exception, functions in DeviceUtils will by default retry 3 times before giving up. Which does not make any sense in this case, we keep calling ps, but a non existing process will not just magically pop-up to be killed. I'm re-running again with retries=0 to confirm. This is actually another issue I wanted to bring up later but, as it has arisen now, I'm not sure whether the 3-retries default makes sense for some (most) of the functions in DeviceUtils. It maybe makes sense to retry because of failed communication with the device, but not because a file or process was not found, or because we attempted to do some protected operation without privileges.
On 2014/09/10 16:35:03, perezju wrote: > On 2014/09/10 15:42:11, tonyg wrote: > > > - A better implementation of KillAll + ignoring exception has a speedup of > 5% > > > - A better implementation of KillAll + not throwing the exception has a > > speedup > > > of 7-8% > > > > Can you explain why the perf of these two methods differs significantly? > > Hmmm.... hmmmm.... aha! I think I know why. When raising an exception, functions > in DeviceUtils will by default retry 3 times before giving up. Which does not > make any sense in this case, we keep calling ps, but a non existing process will > not just magically pop-up to be killed. I'm re-running again with retries=0 to > confirm. > Ah, I hadn't thought about that, but it makes sense. > This is actually another issue I wanted to bring up later but, as it has arisen > now, I'm not sure whether the 3-retries default makes sense for some (most) of > the functions in DeviceUtils. It maybe makes sense to retry because of failed > communication with the device, but not because a file or process was not found, > or because we attempted to do some protected operation without privileges. This is getting out of the scope of this CL, but perhaps we should have timeout_retry support selective retries on a given set of exception types.
On 2014/09/10 16:41:41, jbudorick wrote: > On 2014/09/10 16:35:03, perezju wrote: > > I'm re-running again with retries=0 to confirm. Confirmed. With retries=0 I get similar speedups. I've uploaded a new patch set, please have a look. > This is getting out of the scope of this CL, but perhaps we should have > timeout_retry support selective retries on a given set of exception types. Agreed. After finishing some of my open CL's I would probably open a new bug to discuss this.
lgtm++ Thanks for sticking with this one, it looks really nice now :) Please wait for John before landing. https://codereview.chromium.org/536343003/diff/80001/tools/telemetry/telemetr... File tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py (right): https://codereview.chromium.org/536343003/diff/80001/tools/telemetry/telemetr... tools/telemetry/telemetry/core/backends/chrome/android_browser_backend.py:225: def _CloseBrowser(self): Let's call this _KillBrowser. That more accurately describes what it does an differentiates it from the existing Browser.Close() method.
lgtm cc me on the ticket for selective retries.
The CQ bit was checked by perezju@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patchset/536343003/100001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios_rel_device_ninja on tryserver.chromium.mac (http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_rel_device_ni...)
The CQ bit was checked by perezju@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patchset/536343003/120001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_gpu on tryserver.chromium.gpu (http://build.chromium.org/p/tryserver.chromium.gpu/builders/win_gpu/builds/59499) chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...) ios_rel_device_ninja on tryserver.chromium.mac (http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_rel_device_ni...) mac_chromium_compile_dbg on tryserver.chromium.mac (http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...) mac_chromium_rel_swarming on tryserver.chromium.mac (http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...) win8_chromium_rel on tryserver.chromium.win (http://build.chromium.org/p/tryserver.chromium.win/builders/win8_chromium_rel...)
The CQ bit was checked by perezju@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patchset/536343003/140001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: mac_gpu on tryserver.chromium.gpu (http://build.chromium.org/p/tryserver.chromium.gpu/builders/mac_gpu/builds/54565)
The CQ bit was checked by perezju@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patchset/536343003/180001
Message was sent while issue was closed.
Committed patchset #9 (id:180001) as 0fda08cb3c59763ce01d21340c5003b7412bbac5
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/1fd28e9eb7414c43c3811f684832296a85289b2a Cr-Commit-Position: refs/heads/master@{#294368} |