|
|
Description[Android] Extract MD5sum logic.
BUG=428729, 433101
Committed: https://crrev.com/08a5b328c0adcdf952eb77564aae6693645f836f
Cr-Commit-Position: refs/heads/master@{#305100}
Patch Set 1 #
Total comments: 24
Patch Set 2 : #Patch Set 3 : #
Messages
Total messages: 22 (6 generated)
jbudorick@chromium.org changed reviewers: + klundberg@chromium.org, perezju@chromium.org
Note that this is slightly slower than the existing implementation, presumably because of the shell script. We could also modify tools/android/md5sum/md5sum.cc to skip missing files and files other than regular files and directories.
What is slightly slower? How much of a slowdown do you think we will see on the CQ?
On 2014/11/20 15:37:41, klundberg wrote: > What is slightly slower? How much of a slowdown do you think we will see on the > CQ? The slowdown comes from the transition from running md5sum on an entire directory to running it on a bunch of specific files, checking for their existence and file type ahead of time. The total gain in time appears to be around 1m (although it's hard to say for certain because of the variance we get otherwise). That's distributed across all steps that push files. I haven't yet experimented with modifying md5sum.cc and seeing how that performs.
On 2014/11/20 15:47:36, jbudorick wrote: > On 2014/11/20 15:37:41, klundberg wrote: > > What is slightly slower? How much of a slowdown do you think we will see on > the > > CQ? > > The slowdown comes from the transition from running md5sum on an entire > directory to running it on a bunch of specific files, checking for their > existence and file type ahead of time. > > The total gain in time appears to be around 1m (although it's hard to say for > certain because of the variance we get otherwise). That's distributed across all > steps that push files. > > I haven't yet experimented with modifying md5sum.cc and seeing how that > performs. ok, I'm fine with getting this in. Just please keep an eye on the cycle time for the CQ afterwards. LGTM
The CQ bit was checked by jbudorick@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/738413002/1
https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... File build/android/pylib/utils/md5sum.py (right): https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:21: 'test -f {path} -o -d {path} ' why not just 'test -e {path}' ? https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:22: '&& LD_LIBRARY_PATH={md5sum_lib} {md5sum_bin} {path}') could we send the LD_LIBRARY_PATH using the RunShellCommand env? https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:34: paths = [paths] I would check 'if isinstance(paths, basestr)' so that, rather than just lists, we can also pass any kind of sequence. Something like: cmd = [path_to_md5bin_thing] if isinstance(paths, basestr): cmd.append(paths) else: cmd.extend(paths) https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:36: [os.path.join(constants.GetOutDirectory(), 'md5sum_bin_host')] + paths) if GetOutDirectory() is a constant, can we move this join to the top of the module so we only do it once? https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:51: paths = [paths] Same here, we could just write: if isinstance(paths, basestr): paths = [paths] note, there is no need to turn the generator into a list, since we'll only iterate over it anyway. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:53: if not device.FileExists(MD5SUM_DEVICE_BIN_PATH): Probably not something to do now, but maybe DeviceUtils should own these sorts of checks, so that they can be cached there? (i.e. some "Is Device Tool Installed"?, or "Install Device Tool if Needed"?) https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:55: os.path.join(constants.GetOutDirectory(), 'md5sum_dist'), Same thing about the constant, though. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:69: out = device.RunShellCommand(['sh', md5sum_device_script_file.name]) Why don't we just run the script on the device rather than pushing it through temp files? btw, I also thought about this when doing the for each cpu loop, maybe we want a well written and well tested "RepeatShellCommand" that does the loop for us in the device?
The CQ bit was unchecked by jbudorick@chromium.org
https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... File build/android/pylib/utils/md5sum.py (right): https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:69: out = device.RunShellCommand(['sh', md5sum_device_script_file.name]) On 2014/11/20 17:04:53, perezju wrote: > Why don't we just run the script on the device rather than pushing it through > temp files? > > btw, I also thought about this when doing the for each cpu loop, maybe we want a > well written and well tested "RepeatShellCommand" that does the loop for us in > the device? Another btw: looking at md5sum.cc, it also seems that we can pass a list of files as arguments to the tool.
Addressed comments + added tests. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... File build/android/pylib/utils/md5sum.py (right): https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:21: 'test -f {path} -o -d {path} ' On 2014/11/20 17:04:53, perezju wrote: > why not just 'test -e {path}' ? This avoids file types we don't want to md5sum, e.g. named pipes. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:22: '&& LD_LIBRARY_PATH={md5sum_lib} {md5sum_bin} {path}') On 2014/11/20 17:04:53, perezju wrote: > could we send the LD_LIBRARY_PATH using the RunShellCommand env? No, not with the way this script is rolled. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:34: paths = [paths] On 2014/11/20 17:04:53, perezju wrote: > I would check 'if isinstance(paths, basestr)' so that, rather than just lists, > we can also pass any kind of sequence. Something like: > > cmd = [path_to_md5bin_thing] > if isinstance(paths, basestr): > cmd.append(paths) > else: > cmd.extend(paths) With the list addition below, we actually do want this to be a list. I switched this check to be the same as the one below (i.e., it now checks for GeneratorType, too) https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:36: [os.path.join(constants.GetOutDirectory(), 'md5sum_bin_host')] + paths) On 2014/11/20 17:04:53, perezju wrote: > if GetOutDirectory() is a constant, can we move this join to the top of the > module so we only do it once? It's one of the not-quite-constant constants. I was talking to mikecase@ about those recently; we should talk about them this week. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:51: paths = [paths] On 2014/11/20 17:04:53, perezju wrote: > Same here, we could just write: > > if isinstance(paths, basestr): > paths = [paths] > > note, there is no need to turn the generator into a list, since we'll only > iterate over it anyway. Sticking with this for consistency with the above, although here, you're correct w.r.t generator -> list conversion since we're only using it once. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:53: if not device.FileExists(MD5SUM_DEVICE_BIN_PATH): On 2014/11/20 17:04:53, perezju wrote: > Probably not something to do now, but maybe DeviceUtils should own these sorts > of checks, so that they can be cached there? (i.e. some "Is Device Tool > Installed"?, or "Install Device Tool if Needed"?) Definitely not something to do now, but not a bad idea. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:55: os.path.join(constants.GetOutDirectory(), 'md5sum_dist'), On 2014/11/20 17:04:53, perezju wrote: > Same thing about the constant, though. same response. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:69: out = device.RunShellCommand(['sh', md5sum_device_script_file.name]) On 2014/11/20 17:04:53, perezju wrote: > Why don't we just run the script on the device rather than pushing it through > temp files? > Because the script is pretty likely to be longer than 1024 characters, which means adb would choke on it. > btw, I also thought about this when doing the for each cpu loop, maybe we want a > well written and well tested "RepeatShellCommand" that does the loop for us in > the device? I was considering a RunShellScript that handles the script file logic. I'm not sure that we do either enough at the moment to warrant a new function in DeviceUtils. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:69: out = device.RunShellCommand(['sh', md5sum_device_script_file.name]) On 2014/11/20 17:16:13, perezju wrote: > On 2014/11/20 17:04:53, perezju wrote: > > Why don't we just run the script on the device rather than pushing it through > > temp files? > > > > btw, I also thought about this when doing the for each cpu loop, maybe we want > a > > well written and well tested "RepeatShellCommand" that does the loop for us in > > the device? > > Another btw: looking at md5sum.cc, it also seems that we can pass a list of > files as arguments to the tool. We can. I tried doing so (https://codereview.chromium.org/627083002/#ps550001), but: - it's noticeably slower if we prune the list of files beforehand to only those that exist, and - it doesn't really cooperate if we don't.
https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... File build/android/pylib/utils/md5sum.py (right): https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:21: 'test -f {path} -o -d {path} ' On 2014/11/20 19:38:07, jbudorick wrote: > On 2014/11/20 17:04:53, perezju wrote: > > why not just 'test -e {path}' ? > > This avoids file types we don't want to md5sum, e.g. named pipes. Ok! https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:34: paths = [paths] On 2014/11/20 19:38:07, jbudorick wrote: > On 2014/11/20 17:04:53, perezju wrote: > > I would check 'if isinstance(paths, basestr)' so that, rather than just lists, > > we can also pass any kind of sequence. Something like: > > > > cmd = [path_to_md5bin_thing] > > if isinstance(paths, basestr): > > cmd.append(paths) > > else: > > cmd.extend(paths) > > With the list addition below, we actually do want this to be a list. I switched > this check to be the same as the one below (i.e., it now checks for > GeneratorType, too) But what if I want to pass a tuple? or some other fancy object that just implements the sequence interface? (granted, we will probably never do that, but I think my suggestion was somewhat cleaner and more general, any thoughts on that?) https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:36: [os.path.join(constants.GetOutDirectory(), 'md5sum_bin_host')] + paths) On 2014/11/20 19:38:07, jbudorick wrote: > On 2014/11/20 17:04:53, perezju wrote: > > if GetOutDirectory() is a constant, can we move this join to the top of the > > module so we only do it once? > > It's one of the not-quite-constant constants. I was talking to mikecase@ about > those recently; we should talk about them this week. Ok! https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:69: out = device.RunShellCommand(['sh', md5sum_device_script_file.name]) On 2014/11/20 19:38:08, jbudorick wrote: > On 2014/11/20 17:16:13, perezju wrote: > > On 2014/11/20 17:04:53, perezju wrote: > > > Why don't we just run the script on the device rather than pushing it > through > > > temp files? > > > > > > btw, I also thought about this when doing the for each cpu loop, maybe we > want > > a > > > well written and well tested "RepeatShellCommand" that does the loop for us > in > > > the device? > > > > Another btw: looking at md5sum.cc, it also seems that we can pass a list of > > files as arguments to the tool. > > We can. I tried doing so (https://codereview.chromium.org/627083002/#ps550001), > but: > - it's noticeably slower if we prune the list of files beforehand to only those > that exist, and > - it doesn't really cooperate if we don't. Yeah, I thought about all those problems just after hitting send. Another idea (for another CL, and when WriteFile is ready), just make RunShellCommand write the file and run it if the script is too big.
https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... File build/android/pylib/utils/md5sum.py (right): https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:34: paths = [paths] On 2014/11/20 20:38:42, perezju wrote: > On 2014/11/20 19:38:07, jbudorick wrote: > > On 2014/11/20 17:04:53, perezju wrote: > > > I would check 'if isinstance(paths, basestr)' so that, rather than just > lists, > > > we can also pass any kind of sequence. Something like: > > > > > > cmd = [path_to_md5bin_thing] > > > if isinstance(paths, basestr): > > > cmd.append(paths) > > > else: > > > cmd.extend(paths) > > > > With the list addition below, we actually do want this to be a list. I > switched > > this check to be the same as the one below (i.e., it now checks for > > GeneratorType, too) > > But what if I want to pass a tuple? or some other fancy object that just Good point. > implements the sequence interface? (granted, we will probably never do that, (this one seems unlikely, though) > but I think my suggestion was somewhat cleaner and more general, any thoughts on > that?) Yeah, I guess that makes sense. Switched. https://codereview.chromium.org/738413002/diff/1/build/android/pylib/utils/md... build/android/pylib/utils/md5sum.py:51: paths = [paths] On 2014/11/20 19:38:07, jbudorick wrote: > On 2014/11/20 17:04:53, perezju wrote: > > Same here, we could just write: > > > > if isinstance(paths, basestr): > > paths = [paths] > > > > note, there is no need to turn the generator into a list, since we'll only > > iterate over it anyway. > > Sticking with this for consistency with the above, although here, you're correct > w.r.t generator -> list conversion since we're only using it once. (also switched)
lgtm!
The CQ bit was checked by jbudorick@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/738413002/40001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by jbudorick@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/738413002/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/08a5b328c0adcdf952eb77564aae6693645f836f Cr-Commit-Position: refs/heads/master@{#305100} |