|
|
Created:
6 years, 6 months ago by bulach Modified:
6 years, 6 months ago CC:
chromium-reviews, klundberg+watch_chromium.org, bulach+watch_chromium.org, yfriedman+watch_chromium.org, ilevy-cc_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAndroid: adds device affinity for perf tests.
Step (1) on bringing device affinity for perf tests,
which will minimize some of the noise we're seeing
and allow chrome.perf to scale a bit better.
BUG=378862
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=274670
Patch Set 1 #
Total comments: 7
Patch Set 2 : Comments #
Total comments: 15
Patch Set 3 : #
Messages
Total messages: 20 (0 generated)
ptal :) https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/tes... File build/android/pylib/perf/test_runner.py (right): https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/tes... build/android/pylib/perf/test_runner.py:35: } tony: this is the information this script needs. it doesn't matter where it lives, or how it's built: either a static file in $SRC/, or dynamically maintained by the buildbot and shared with other platforms, it doesn't matter much... downstream, we're keeping checked in under build/sharded_perf_tests.json, so we can tweak the commands there without messing up with bot configs or anything... the format itself is fairly flexible, happy to change if you want. once we agree and this patch lands, then I'll update the json files downstream...
lgtm THANKS!!
I'm not crazy about reintroducing a test-type-specific sharding method, but I can see why it'd be useful for perf tests to run on the same devices. https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/set... File build/android/pylib/perf/setup.py (right): https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/set... build/android/pylib/perf/setup.py:22: devices_path = os.path.join(out_dir, '.last_devices') Where does the .last_devices file come from? https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/set... build/android/pylib/perf/setup.py:26: devices = f.read().splitlines() I'm concerned about how the tests will behave if the devices change. What if a device is offline? What if a device falls offline during a run? https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/tes... File build/android/pylib/perf/test_runner.py (right): https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/tes... build/android/pylib/perf/test_runner.py:27: "device_affinity": int, This might get into how we generate the json, and so it may not be in the scope of this review. Why is device_affinity a 'shard_index' and not just the device serial? The former seems like an obfuscated way of indicating the latter. If a device disappears from last_devices, we could just clear the affinity for all tests that have affinity for it.
thanks! +navabi, since I had to change bb_device_status_check in order to reuse some bits and pieces in the perf test runner. john, some clarifications below. https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/set... File build/android/pylib/perf/setup.py (right): https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/set... build/android/pylib/perf/setup.py:22: devices_path = os.path.join(out_dir, '.last_devices') On 2014/05/29 19:45:25, jbudorick wrote: > Where does the .last_devices file come from? good point! I did some changes in buildbot/bb_device_status_check.py, so there's now only one place to get this. https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/set... build/android/pylib/perf/setup.py:26: devices = f.read().splitlines() On 2014/05/29 19:45:25, jbudorick wrote: > I'm concerned about how the tests will behave if the devices change. What if a > device is offline? What if a device falls offline during a run? sorry, my bad! :) I should've given more context here... up until now, we were optimizing for throughput and reliability: whenever a device finished its test, it'd pop the next one, and if a device went bad, we'd just keep going with the remaining ones... however, this is suboptimal for perf tests: devices have slightly different characteristics, and the final result was noisy and not as deterministic as we need. this change is intentional :) we'll be trading off more determinism, for less throughput/reliability. tony can certainly provide far more details here. https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/tes... File build/android/pylib/perf/test_runner.py (right): https://codereview.chromium.org/301183004/diff/1/build/android/pylib/perf/tes... build/android/pylib/perf/test_runner.py:27: "device_affinity": int, On 2014/05/29 19:45:25, jbudorick wrote: > This might get into how we generate the json, and so it may not be in the scope > of this review. > > Why is device_affinity a 'shard_index' and not just the device serial? The > former seems like an obfuscated way of indicating the latter. > > If a device disappears from last_devices, we could just clear the affinity for > all tests that have affinity for it. it's complicated :) as above, we do not want to run tests on devices that nave no affinity. the index versus serial: it'll make it easier to maintain by just using the index, due to: 1) ultimately, we don't care about a specific device serial, just that a given test X always run in the given device in that bot. 2) by using index rather than serial, we'll have less configuration to maintain.. by using serial, we'd potentially need to hardcode the mapping for each individual bot, say, Nexus7 Perf, Nexus4 Perf, etc... the config would end up something like: "on Nexus7 Perf Bot, run Octane on device serialfoobar". by using just an index, it's a bit more generic: "on all bots, run Octane on the 3rd device" it'll also allow it to be more elastic... for instance, say Nexus7 Perf only have 2 devices.. by saying "run on the 3rd", we can then "shard % total_devices" and still keep the affinity.
The bb_device_status_steps.py changes lgtm
thanks, armand! john, is it ok to go ahead with this?
Sorry about the delay. I'm still a little hung up on the implementation of device affinity. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... File build/android/pylib/device/device_list.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_list.py:5: """A module to keep track of devices across builds.""" I'm not sure that this module goes far enough - it'd be nice to have the persistent device list logic more self-contained - but for now, it's ok. Something to think about down the line, though. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_list.py:13: def GetDeviceList(file_name): This should be renamed to avoid confusion with AdbWrapper.GetDevices / android_commands.GetAttachedDevices. GetPersistentDeviceList, GetLastDevices, ...? Something that indicates that the list of devices being retrieved is not a list of the currently attached devices. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_list.py:27: pass This should log something. Not an error, obviously, but maybe a warning. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... File build/android/pylib/perf/setup.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/setup.py:27: logging.error('Unable to find %s [%s]', devices_path, e) Your GetDeviceList function eats all of the IOErrors, right? You'll never hit this exception handler, and if you can't retrieve the device list in devices_path, you'll just return an empty list. I'm not sure that that's what you want to do... https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... File build/android/pylib/perf/test_runner.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/test_runner.py:114: total_time += result['total_time'] nit: you don't really need a separate total_time any more. this can now just be done with sum(device_total_time.values()) https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/test_runner.py:193: affinity = (self._tests['steps'][test_name]['device_affinity'] % My point with the shard index vs the serial was that we can ensure that tests that have run on a particular device run on that same device, not that we care which device a particular test runs on. adb devices doesn't necessarily maintain a particular order, and it appears to me that, if the list of devices changes in some way, you're liable to hit false positives for device affinity and run tests on devices that don't actually have affinity for them. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/test_runner.py:252: self.device_serial) Sneaking in ahead of me here, I see. (In the future, I'm going to be removing BaseTestRunner.device_serial and just using str(self.device)) https://codereview.chromium.org/301183004/diff/20001/build/android/test_runne... File build/android/test_runner.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/test_runne... build/android/test_runner.py:649: results, _ = test_dispatcher.RunTests( FTR, I am ok with the perf-specific affinity sharding.
thanks john! another look please? https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... File build/android/pylib/device/device_list.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_list.py:13: def GetDeviceList(file_name): On 2014/06/03 13:52:52, jbudorick wrote: > This should be renamed to avoid confusion with AdbWrapper.GetDevices / > android_commands.GetAttachedDevices. GetPersistentDeviceList, GetLastDevices, > ...? Something that indicates that the list of devices being retrieved is not a > list of the currently attached devices. good point! Get|WritePersistentDeviceList. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_list.py:27: pass On 2014/06/03 13:52:52, jbudorick wrote: > This should log something. Not an error, obviously, but maybe a warning. good catch.. the exception shouldn't be handled here, but rather by the callers. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... File build/android/pylib/perf/setup.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/setup.py:27: logging.error('Unable to find %s [%s]', devices_path, e) On 2014/06/03 13:52:52, jbudorick wrote: > Your GetDeviceList function eats all of the IOErrors, right? You'll never hit > this exception handler, and if you can't retrieve the device list in > devices_path, you'll just return an empty list. I'm not sure that that's what > you want to do... good catch! yeah, the exception shouldn't be handled in device_list, bubbling up and handling it here. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... File build/android/pylib/perf/test_runner.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/test_runner.py:114: total_time += result['total_time'] On 2014/06/03 13:52:52, jbudorick wrote: > nit: you don't really need a separate total_time any more. this can now just be > done with sum(device_total_time.values()) Done. https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/test_runner.py:193: affinity = (self._tests['steps'][test_name]['device_affinity'] % On 2014/06/03 13:52:52, jbudorick wrote: > My point with the shard index vs the serial was that we can ensure that tests > that have run on a particular device run on that same device, not that we care > which device a particular test runs on. > > adb devices doesn't necessarily maintain a particular order, and it appears to > me that, if the list of devices changes in some way, you're liable to hit false > positives for device affinity and run tests on devices that don't actually have > affinity for them. let's split this: 1) this is using the persistent list of devices, not adb devices. also, the list is sorted at setup.py:_GetAllDevices()... 2) again, the thing is that if we use serial, something/someone somewhere will have to maintain a map: ... 'serial1': [tests], 'serial2': [tests], ... this will explode quite quickly... whilst I fully agree there's a risk of tests running on devices without affinity, right now it'd only happen when the .last_devices file is wiped and generated with a different list of devices... in which case, we'd also need to change the map with serials.. right now, we have no affinity at all :) I suppose we should go with this, analyse, and if it still don't provide affinity enough, then we change to use hardcoded serials. wdyt? https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... build/android/pylib/perf/test_runner.py:252: self.device_serial) On 2014/06/03 13:52:52, jbudorick wrote: > Sneaking in ahead of me here, I see. > > (In the future, I'm going to be removing BaseTestRunner.device_serial and just > using str(self.device)) want me to keep the old way? happy either way :) https://codereview.chromium.org/301183004/diff/20001/build/android/test_runne... File build/android/test_runner.py (right): https://codereview.chromium.org/301183004/diff/20001/build/android/test_runne... build/android/test_runner.py:649: results, _ = test_dispatcher.RunTests( On 2014/06/03 13:52:52, jbudorick wrote: > FTR, I am ok with the perf-specific affinity sharding. thanks! I'm very happy that all the sharding building blocks already support this mode, and this change is only confined to Perf :)
On 2014/06/03 14:25:51, bulach wrote: > thanks john! another look please? > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... > File build/android/pylib/device/device_list.py (right): > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... > build/android/pylib/device/device_list.py:13: def GetDeviceList(file_name): > On 2014/06/03 13:52:52, jbudorick wrote: > > This should be renamed to avoid confusion with AdbWrapper.GetDevices / > > android_commands.GetAttachedDevices. GetPersistentDeviceList, GetLastDevices, > > ...? Something that indicates that the list of devices being retrieved is not > a > > list of the currently attached devices. > > good point! Get|WritePersistentDeviceList. > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/devi... > build/android/pylib/device/device_list.py:27: pass > On 2014/06/03 13:52:52, jbudorick wrote: > > This should log something. Not an error, obviously, but maybe a warning. > > good catch.. the exception shouldn't be handled here, but rather by the callers. > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... > File build/android/pylib/perf/setup.py (right): > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... > build/android/pylib/perf/setup.py:27: logging.error('Unable to find %s [%s]', > devices_path, e) > On 2014/06/03 13:52:52, jbudorick wrote: > > Your GetDeviceList function eats all of the IOErrors, right? You'll never hit > > this exception handler, and if you can't retrieve the device list in > > devices_path, you'll just return an empty list. I'm not sure that that's what > > you want to do... > > good catch! yeah, the exception shouldn't be handled in device_list, bubbling up > and handling it here. > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... > File build/android/pylib/perf/test_runner.py (right): > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... > build/android/pylib/perf/test_runner.py:114: total_time += result['total_time'] > On 2014/06/03 13:52:52, jbudorick wrote: > > nit: you don't really need a separate total_time any more. this can now just > be > > done with sum(device_total_time.values()) > > Done. > > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... > build/android/pylib/perf/test_runner.py:193: affinity = > (self._tests['steps'][test_name]['device_affinity'] % > On 2014/06/03 13:52:52, jbudorick wrote: > > My point with the shard index vs the serial was that we can ensure that tests > > that have run on a particular device run on that same device, not that we care > > which device a particular test runs on. > > > > adb devices doesn't necessarily maintain a particular order, and it appears to > > me that, if the list of devices changes in some way, you're liable to hit > false > > positives for device affinity and run tests on devices that don't actually > have > > affinity for them. > > let's split this: > 1) this is using the persistent list of devices, not adb devices. also, the list > is sorted at setup.py:_GetAllDevices()... > > 2) again, the thing is that if we use serial, something/someone somewhere will > have to maintain a map: > ... > 'serial1': [tests], > 'serial2': [tests], > ... > this will explode quite quickly... > whilst I fully agree there's a risk of tests running on devices without > affinity, right now it'd only happen when the .last_devices file is wiped and > generated with a different list of devices... in which case, we'd also need to > change the map with serials.. > > right now, we have no affinity at all :) > I suppose we should go with this, analyse, and if it still don't provide > affinity enough, then we change to use hardcoded serials. wdyt? > That sounds reasonable. > https://codereview.chromium.org/301183004/diff/20001/build/android/pylib/perf... > build/android/pylib/perf/test_runner.py:252: self.device_serial) > On 2014/06/03 13:52:52, jbudorick wrote: > > Sneaking in ahead of me here, I see. > > > > (In the future, I'm going to be removing BaseTestRunner.device_serial and just > > using str(self.device)) > > want me to keep the old way? happy either way :) nope, this is fine, too. > > https://codereview.chromium.org/301183004/diff/20001/build/android/test_runne... > File build/android/test_runner.py (right): > > https://codereview.chromium.org/301183004/diff/20001/build/android/test_runne... > build/android/test_runner.py:649: results, _ = test_dispatcher.RunTests( > On 2014/06/03 13:52:52, jbudorick wrote: > > FTR, I am ok with the perf-specific affinity sharding. > > thanks! > I'm very happy that all the sharding building blocks already support this mode, > and this change is only confined to Perf :) lgtm
The CQ bit was checked by bulach@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bulach@chromium.org/301183004/40001
FYI, CQ is re-trying this CL (attempt #1). The failing builders are: android_dbg_triggered_tests on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/android_dbg_triggered...) chromium_presubmit on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/chromium_presubmit/bu...)
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/chromium_presubmit/bu...)
The CQ bit was checked by tonyg@chromium.org
On 2014/06/03 21:46:09, I haz the power (commit-bot) wrote: > Try jobs failed on following builders: > chromium_presubmit on tryserver.chromium > (http://build.chromium.org/p/tryserver.chromium/builders/chromium_presubmit/bu...) Armand lgtm'd from his @google, but his @chromium is in the OWNERS. I'm going to TBR him on the patch so the CQ will be happy.
The CQ bit was unchecked by tonyg@chromium.org
The CQ bit was checked by tonyg@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bulach@chromium.org/301183004/40001
Message was sent while issue was closed.
Change committed as 274670 |