|
|
DescriptionImprovements of TimeoutRetryThread so that it is possible to stop threads which have already timed out (instead of keeping them like zombie threads running in the background).
Provides a timeout_retry.WaitFor to wait until a condition becomes true.
Also migrates some DeviceUtils methods to use the new adb_wrapper:
- WaitUntilFullyBooted
- GetProp/SetProp
- GetApplicationPath
BUG=267773
Committed: https://crrev.com/8851b7f06044d506745f0a0547e247cb6660f1d9
Cr-Commit-Position: refs/heads/master@{#302250}
Patch Set 1 #
Total comments: 20
Patch Set 2 : some fixes, tests ready #
Total comments: 22
Patch Set 3 : moved check for timed out thread into adb_wrapper #
Total comments: 10
Patch Set 4 : improved TimeoutRetryThread and timeouts in adb_wrapper #Patch Set 5 : do not catch CommandFailedError in _WaitFor #
Total comments: 23
Patch Set 6 : more improvements on TimeoutRetryThread #
Total comments: 2
Patch Set 7 : fixed nit #
Messages
Total messages: 19 (2 generated)
perezju@chromium.org changed reviewers: + jbudorick@chromium.org
John, this is very much a work in progress (tests are still missing), but wanted to show you the direction in which I am trying to go. This starts to take care of the problem of timeouts not stopping threads (at least on the host side).
While I generally like this approach, I'm not sure I like the stop mechanism as implemented. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:249: r = False nit: similarly here w.r.t False v None https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:253: logging.warning('%s: condition %r met (%1.2f) at %s', str(self), nit: should be info or debug, not warning https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:259: return False nit: I think this should return None, as False implies (to me at least) that we're otherwise returning true. We're returning the result of the condition, which isn't necessarily True/False https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:930: def GetProp(self, property_name, cache=False, timeout=None, retries=None): We should make sure we test the timeout/retries part of this pretty well. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:947: if cache and cache_name in self._cache: I think that, for now, we should limit property caching to what we were doing before (ro.boot.*, ro.build.*, and ro.product.*). We can look at expanding that list separately. See b/a/pylib/system_properties.py. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:959: def SetProp(self, property_name, value, check=False, timeout=None, What's the motivation behind check? https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:974: self.RunShellCommand(['setprop', property_name, value]) If we expand caching beyond read-only properties, we should update the cache here. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... File build/android/pylib/utils/reraiser_thread.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... build/android/pylib/utils/reraiser_thread.py:73: def StopThread(self): I don't think we want to provide this in reraiser_thread unless we can actually make reraiser_thread stop the thread. Relying on self._func to check Stopped seems problematic. I don't have any great suggestions for implementing this otherwise, but I'll keep thinking about how we could. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... build/android/pylib/utils/reraiser_thread.py:149: watcher=watchdog_timer.WatchdogTimer(None) nice catch. nit: spaces around = https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/ti... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/ti... build/android/pylib/utils/timeout_retry.py:45: name = 'TimeoutThread-%d-for-%s' % (retries, Good idea.
https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:253: logging.warning('%s: condition %r met (%1.2f) at %s', str(self), On 2014/10/24 17:30:51, jbudorick wrote: > nit: should be info or debug, not warning Yes, I agree these should be info. At the moment this was just a temporary thing for me to debug while developing. The problem was tht setting the logger level to info already dumps too much to the screen because of cmd_helper logging every single shell command run on the host. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:259: return False On 2014/10/24 17:30:51, jbudorick wrote: > nit: I think this should return None, as False implies (to me at least) that > we're otherwise returning true. We're returning the result of the condition, > which isn't necessarily True/False Agree. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:930: def GetProp(self, property_name, cache=False, timeout=None, retries=None): On 2014/10/24 17:30:51, jbudorick wrote: > We should make sure we test the timeout/retries part of this pretty well. Acknowledged. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:947: if cache and cache_name in self._cache: On 2014/10/24 17:30:51, jbudorick wrote: > I think that, for now, we should limit property caching to what we were doing > before (ro.boot.*, ro.build.*, and ro.product.*). We can look at expanding that > list separately. See b/a/pylib/system_properties.py. An idea I had was to provide the class with some properties such as: build_id, build_type, product_name, etc; which are implemented as calls to GetProp with cache=True. Thoughts? I'm undecided about this. Maybe the best behavior is indeed for GetProp to already "know" which properties should be cached as currently done? https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:959: def SetProp(self, property_name, value, check=False, timeout=None, On 2014/10/24 17:30:51, jbudorick wrote: > What's the motivation behind check? The TODO here https://cs.corp.google.com/#clankium/src/build/android/pylib/system_propertie... Without root on the device, trying to setprop just fails silently (no error messages on the output nor non-zero exit code). I actually think this shouldn't be an option, and the check should always be made. But also didn't want to add a second shell call all the time. (Or maybe I should make a small shell script that sets and checks?) https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:974: self.RunShellCommand(['setprop', property_name, value]) On 2014/10/24 17:30:51, jbudorick wrote: > If we expand caching beyond read-only properties, we should update the cache > here. This happens "auto-magically" when check=True. Maybe we should just always do the check? (The alternative is that in some cases we might end up storing something in the cache which didn't actually made it into the device) https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... File build/android/pylib/utils/reraiser_thread.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... build/android/pylib/utils/reraiser_thread.py:73: def StopThread(self): On 2014/10/24 17:30:51, jbudorick wrote: > I don't think we want to provide this in reraiser_thread unless we can actually > make reraiser_thread stop the thread. Relying on self._func to check Stopped > seems problematic. > > I don't have any great suggestions for implementing this otherwise, but I'll > keep thinking about how we could. Looking around, e.g. http://stackoverflow.com/questions/323972/is-there-any-way-to-kill-a-thread-i..., asking a thread to stop is perhaps the least problematic approach. (And definitely better than what we now do, which is... nothing and keep running 2 or 3 threads at the same time during retries). There seems to be a hack using PyThreadState_SetAsyncExc to make one thread raise an exception on another, but besides risky also seems quite useless if the following is true: "If the thread is busy in a system call (time.sleep(), socket.accept(), ...), the exception is simply ignored.".
A new stab at _WaitFor, which makes sure to stop threads that have timed out. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:171: return self.build_type == 'user' This would be a property of the class, see below. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:249: thread.CheckTimeout() This would raise an exception if the current thread should have timed out by now. Should be added after any potentially lengthy operation. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:279: def Reboot(self, block=True, timeout=None, retries=None): This brings in some of the logic from android_commands.Reboot. I did not include, however, the check of whether we're running on HIVE [1] (apparently a now defunct project?) [1]: https://cs.corp.google.com/#clankium/src/build/android/pylib/android_commands... https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:941: return self.GetProp('ro.build.type', cache=True) This is the sort of thing that I propose to both: cache properties which make sense to cache, and provide clients with simple access to them. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:992: % (property_name, value), str(self)) Still wondering whether to always make the check, and pack these two shell calls in a single script? https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/util... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:30: Moved the changes from reraiser_thread into here. Maybe a bit better? https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:62: threading.current_thread().name)) With, say, 3 retries, thread numbers now count from 1, 2, 3, 4.
https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/device/d... build/android/pylib/device/device_utils.py:959: def SetProp(self, property_name, value, check=False, timeout=None, On 2014/10/27 11:07:07, perezju wrote: > On 2014/10/24 17:30:51, jbudorick wrote: > > What's the motivation behind check? > > The TODO here > https://cs.corp.google.com/#clankium/src/build/android/pylib/system_propertie... > Use the public cs for upstream reviews: https://code.google.com/p/chromium/codesearch#chromium/src/build/android/pyli... > Without root on the device, trying to setprop just fails silently (no error > messages on the output nor non-zero exit code). > > I actually think this shouldn't be an option, and the check should always be > made. But also didn't want to add a second shell call all the time. (Or maybe I > should make a small shell script that sets and checks?) That makes sense to me, but we should leave it as a default off option in this CL. Add a TODO to remove the option, make the check mandatory, and rework it as a shell script that does setprop & getprop. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... File build/android/pylib/utils/reraiser_thread.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... build/android/pylib/utils/reraiser_thread.py:73: def StopThread(self): On 2014/10/27 11:07:07, perezju wrote: > On 2014/10/24 17:30:51, jbudorick wrote: > > I don't think we want to provide this in reraiser_thread unless we can > actually > > make reraiser_thread stop the thread. Relying on self._func to check Stopped > > seems problematic. > > > > I don't have any great suggestions for implementing this otherwise, but I'll > > keep thinking about how we could. > > Looking around, e.g. > http://stackoverflow.com/questions/323972/is-there-any-way-to-kill-a-thread-i..., > asking a thread to stop is perhaps the least problematic approach. (And > definitely better than what we now do, which is... nothing and keep running 2 or > 3 threads at the same time during retries). > I'm not sure this is better than what we do now. Is there some way that we can either guarantee that reraiser threads check to see if they should stop or only make this interface available if the thread supports stopping? If not, this just seems liable to deceive clients. We should also rename this to make it clear that this merely asks the thread to stop. > There seems to be a hack using PyThreadState_SetAsyncExc to make one thread > raise an exception on another, but besides risky also seems quite useless if the > following is true: "If the thread is busy in a system call (time.sleep(), > socket.accept(), ...), the exception is simply ignored.". Yeah, I don't like that here. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:249: thread.CheckTimeout() On 2014/10/27 17:43:34, perezju wrote: > This would raise an exception if the current thread should have timed out by > now. Should be added after any potentially lengthy operation. I really don't like delegating this responsibility to the client implementation -- see my comment in reraiser_thread. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:279: def Reboot(self, block=True, timeout=None, retries=None): On 2014/10/27 17:43:34, perezju wrote: > This brings in some of the logic from android_commands.Reboot. I did not > include, however, the check of whether we're running on HIVE [1] (apparently a > now defunct project?) > > [1]: > https://cs.corp.google.com/#clankium/src/build/android/pylib/android_commands... public cs: https://code.google.com/p/chromium/codesearch#chromium/src/build/android/pyli... I'm guessing you asked torne about this? https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:291: def device_offline(): nit: for something this simple, just use a lambda inline. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:296: self._WaitFor(device_offline, wait_period=1, max_tries=5) With the lambda, this becomes: self._WaitFor(lambda: not self.IsOnline(), ...) https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:941: return self.GetProp('ro.build.type', cache=True) On 2014/10/27 17:43:34, perezju wrote: > This is the sort of thing that I propose to both: cache properties which make > sense to cache, and provide clients with simple access to them. I'll have to think about this. It bloats the interface some, but clients don't have to worry about the name of the particular android property they're interested in. Leaning yes at the moment, though. In general, what do you think would be the threshold for switching from GetProp to a property? https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:968: if cache or cache_name in self._cache: The second clause here is interesting. Should we update the cache or remove the item from the cache? https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:992: % (property_name, value), str(self)) On 2014/10/27 17:43:34, perezju wrote: > Still wondering whether to always make the check, and pack these two shell calls > in a single script? Not yet. In a separate CL, definitely. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/util... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:30: On 2014/10/27 17:43:34, perezju wrote: > Moved the changes from reraiser_thread into here. Maybe a bit better? Definitely, but I'm still not a fan of making the client figure out whether it has timed out or not.
Ok, another try at improving the TimeoutRetry mechanism. This time moved the checks into adb_wrapper, so that clients do not have to worry (too much) about that. https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... File build/android/pylib/utils/reraiser_thread.py (right): https://codereview.chromium.org/636273004/diff/1/build/android/pylib/utils/re... build/android/pylib/utils/reraiser_thread.py:73: def StopThread(self): On 2014/10/27 19:15:36, jbudorick wrote: > I'm not sure this is better than what we do now. Just for fun, trying to reboot with a very low timeout. This is what the TimeoutRetry wrapper currently does: https://drive.google.com/a/chromium.org/file/d/0B7JfGcdoP-YzTnAzOEJRUXFhNmM/v... This is what I think it should do: https://drive.google.com/a/chromium.org/file/d/0B7JfGcdoP-YzQlZIWnEyLTZ1TXM/v... https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:279: def Reboot(self, block=True, timeout=None, retries=None): On 2014/10/27 19:15:36, jbudorick wrote: > I'm guessing you asked torne about this? Just checked with him. HIVE is not used anymore, so it's OK to remove anything that deals with it. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:296: self._WaitFor(device_offline, wait_period=1, max_tries=5) On 2014/10/27 19:15:36, jbudorick wrote: > With the lambda, this becomes: > > self._WaitFor(lambda: not self.IsOnline(), ...) Yes, but if we use a named function then _WaitFor can also use the more descriptive label "device_offline" in log messages. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:941: return self.GetProp('ro.build.type', cache=True) On 2014/10/27 19:15:36, jbudorick wrote: > On 2014/10/27 17:43:34, perezju wrote: > > This is the sort of thing that I propose to both: cache properties which make > > sense to cache, and provide clients with simple access to them. > > I'll have to think about this. It bloats the interface some, but clients don't > have to worry about the name of the particular android property they're > interested in. Leaning yes at the moment, though. > > In general, what do you think would be the threshold for switching from GetProp > to a property? I say we could start with properties that (1) are read-only (so makes sense to cache them) and (2) clients actually want to access them. Currently this means (grepping through uses of GetProp and system_properties): ro.build.description ro.build.fingerprint ro.build.id ro.build.product ro.build.type ro.build.version.sdk ro.product.cpu.abi ro.product.model ro.product.name ro.setupwizard.mode https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:968: if cache or cache_name in self._cache: On 2014/10/27 19:15:36, jbudorick wrote: > The second clause here is interesting. Should we update the cache or remove the > item from the cache? I say we update. From the point of view of the caller, cache=False means "I want to get a real value directly from the device", while cache=True means "I'm happy to get a cached value from a previous call, if any". Updating here seems less wasteful and keeps both types of callers happy. https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... File build/android/pylib/device/adb_wrapper.py (right): https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... build/android/pylib/device/adb_wrapper.py:74: threading.current_thread().CheckTimeout() doing the check here at least we: (1) make sure that it's always called after (almost all) "potentially lengthy" operations, and (2) do not have to rely on clients of adb_wrapper remembering to make this check. https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:284: r = None clients don't need to check whether their thread had timed-out! https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:294: self.adb.Wait(wait_period) well, almost... time.sleep needs would need to be replaced with adb.Wait https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:462: return '' this is what "pm path" returns for non-existent packages, so thought it's better to return an empty string rather than to raise a command error which might get confused with adb or other device problems. I also made sure that this change also makes sense for all other callers with single_line=true (thankfully not that many yet!). https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... File build/android/pylib/device/device_utils_test.py (right): https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... build/android/pylib/device/device_utils_test.py:560: self.adb.WaitForDevice.assert_called_once_with() I've got an idea for an "assertAdbCallSequence" to allow mixing calls to different adb.Methods, but that will have to wait for another CL. https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/util... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:25: self._timeout_expired = True Have tried several names now, maybe RequestThreadTimeout would work better?
I like this better, and naming it "RequestThreadTimeout" (or similar) is definitely a good step. That said, the onus is still on the client to behave. The difference is that we've moved that client behavior up into AdbWrapper, which, while definitely an improvement (maybe event to the point that this is good enough), isn't quite what I'd like to see. (Maybe what I'd like to see isn't possible?) Ideally, I think we should either: - Have the TimeoutRetryThread handle timeouts directly - Have the TimeoutRetryThread wrap functions it's invoking s.t. they get timeout checking "for free" - Have clients compatible with the request explicitly declare themselves as such (via decorator, type, or something along those lines) and _only_ provide the Request... interface for those clients. Basically, I'm scared of clients silently ignoring the Request. (Admittedly, what you have now nicely solves this for everything using AdbWrapper.) https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:296: self._WaitFor(device_offline, wait_period=1, max_tries=5) On 2014/10/28 14:31:17, perezju wrote: > On 2014/10/27 19:15:36, jbudorick wrote: > > With the lambda, this becomes: > > > > self._WaitFor(lambda: not self.IsOnline(), ...) > > Yes, but if we use a named function then _WaitFor can also use the more > descriptive label "device_offline" in log messages. Ah. Stick with device_offline, then. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:941: return self.GetProp('ro.build.type', cache=True) On 2014/10/28 14:31:18, perezju wrote: > On 2014/10/27 19:15:36, jbudorick wrote: > > On 2014/10/27 17:43:34, perezju wrote: > > > This is the sort of thing that I propose to both: cache properties which > make > > > sense to cache, and provide clients with simple access to them. > > > > I'll have to think about this. It bloats the interface some, but clients don't > > have to worry about the name of the particular android property they're > > interested in. Leaning yes at the moment, though. > > > > In general, what do you think would be the threshold for switching from > GetProp > > to a property? > > I say we could start with properties that (1) are read-only (so makes sense to > cache them) and (2) clients actually want to access them. > > Currently this means (grepping through uses of GetProp and system_properties): > > ro.build.description > ro.build.fingerprint > ro.build.id > ro.build.product > ro.build.type > ro.build.version.sdk > ro.product.cpu.abi > ro.product.model > ro.product.name > ro.setupwizard.mode Sounds good to me. https://codereview.chromium.org/636273004/diff/20001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:968: if cache or cache_name in self._cache: On 2014/10/28 14:31:18, perezju wrote: > On 2014/10/27 19:15:36, jbudorick wrote: > > The second clause here is interesting. Should we update the cache or remove > the > > item from the cache? > > I say we update. From the point of view of the caller, cache=False means "I want > to get a real value directly from the device", while cache=True means "I'm happy > to get a cached value from a previous call, if any". Updating here seems less > wasteful and keeps both types of callers happy. Makes sense. https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:462: return '' On 2014/10/28 14:31:18, perezju wrote: > this is what "pm path" returns for non-existent packages, so thought it's better > to return an empty string rather than to raise a command error which might get > confused with adb or other device problems. > > I also made sure that this change also makes sense for all other callers with > single_line=true (thankfully not that many yet!). sgtm https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/util... File build/android/pylib/utils/reraiser_thread.py (right): https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/util... build/android/pylib/utils/reraiser_thread.py:120: raise TimeoutError('Timed out waiting for %d of %d threads.' % One other thing to consider (which you may already have considered): can we do something here with the threads still in alive_threads? https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/util... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:25: self._timeout_expired = True On 2014/10/28 14:31:18, perezju wrote: > Have tried several names now, maybe RequestThreadTimeout would work better? Yeah, I like that better. That implies that we're asking the thread to do something, not telling it something, which seems more accurate. https://codereview.chromium.org/636273004/diff/40001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:69: child_thread.SetTimeoutAsExpired() I think we only want to do this if we catch a timeout exception.
On 2014/10/28 17:28:46, jbudorick wrote: > Ideally, I think we should either: > - Have the TimeoutRetryThread handle timeouts directly > - Have the TimeoutRetryThread wrap functions it's invoking s.t. they get > timeout checking "for free" > - Have clients compatible with the request explicitly declare themselves as > such (via decorator, type, or something along those lines) and _only_ provide > the Request... interface for those clients. I think that the first two are not possible because, once client code starts running on a thread, there is no (sensible) way to stop it or kill it from outside. And I don't see much different between "clients silently ignoring the Request" and what is currently happening anyway: clients don't "ignore" the request just because they don't get it, so they just keep running after the timeout anyway. Client code already tag themselves with @WithTimeoutAndRetries under the (false) assumption that their execution will be stopped with a timeout if they run for too long. So @WithTimeoutAndRetries is broken and we need to fix it. I wrote a lengthier discussion about this and a proposed solution on our shared DeviceUtils implementation notes https://docs.google.com/a/google.com/document/d/19Tv7Wbx0YipbJ3BDfxyKb5RUEvff...
On 2014/10/29 12:48:40, perezju wrote: > On 2014/10/28 17:28:46, jbudorick wrote: > > Ideally, I think we should either: > > - Have the TimeoutRetryThread handle timeouts directly > > - Have the TimeoutRetryThread wrap functions it's invoking s.t. they get > > timeout checking "for free" > > - Have clients compatible with the request explicitly declare themselves as > > such (via decorator, type, or something along those lines) and _only_ provide > > the Request... interface for those clients. > > I think that the first two are not possible because, once client code starts > running on a thread, there is no (sensible) way to stop it or kill it from > outside. > > And I don't see much different between "clients silently ignoring the Request" > and what is currently happening anyway: clients don't "ignore" the request just > because they don't get it, so they just keep running after the timeout anyway. Commented on the doc in more detail, but the general point is that this makes sense. > Client code already tag themselves with @WithTimeoutAndRetries under the > (false) assumption that their execution will be stopped with a timeout if they > run for too long. So @WithTimeoutAndRetries is broken and we need to fix it. > > I wrote a lengthier discussion about this and a proposed solution on our shared > DeviceUtils implementation notes > https://docs.google.com/a/google.com/document/d/19Tv7Wbx0YipbJ3BDfxyKb5RUEvff...
As discussed, the new approach to stop a TimeoutRetryThread. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:243: single_line=True, check_return=True) Note that this might throw a CommandFailedError if GetExternalStoragePath fails (the path is not set). Maybe this is what we want, because it doesn't make much sense to "wait" for this to change? (Hmm, actually not sure about what happens when the sd card is not ready, I've never seen this happen) https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:249: return False We need to catch CommandFailedError here because this is what we get when the pm is not ready (something like "Error. Is the package manager running?) https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:49: reraiser_thread.LogThreadStack(self) Note, we need to log here because the thread is going to kill itself and wont be "alive" for the reraiser_thread to even notice that a timeout has occurred (as far as the reraiser_thread is concerned, this thread has probably finished "in time"). There is small race condition (specially if a thread mixes the new "TimeoutThread" aware methods with direct low-level IO calls). I'm not entirely happy about this, but the worst that can happen is that we get the log printed twice (both from inside and outside the thread). I'm afraid a real solution would involve some reworking of how reraiser_thread works, but that's already too much for this CL. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:100: thread_group.JoinAll(child_thread.GetWatcher()) This is so that both the thread_group and the timeout_thread use the same stopwatch. Otherwise we have even more chances of funny race conditions.
Can't sign off on this yet, unfortunately. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... File build/android/pylib/device/adb_wrapper.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/adb_wrapper.py:51: """Wait for the specified ammount of seconds. nit: ammount -> amount https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/adb_wrapper.py:64: # no need to wait, we would time out anyway nit: Add a log message here (probably at warning) about timing out early. This is liable to confuse some people otherwise. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:113: return state == 'device' return self.adb.getState() == 'device' ? https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:243: single_line=True, check_return=True) On 2014/10/29 18:16:50, perezju wrote: > Note that this might throw a CommandFailedError if GetExternalStoragePath fails > (the path is not set). Maybe this is what we want, because it doesn't make much > sense to "wait" for this to change? (Hmm, actually not sure about what happens > when the sd card is not ready, I've never seen this happen) I've never seen this either, so I have no idea. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:289: str(self), condition.__name__, thread.ElapsedTime(), thread.name) no need for the thread name, we already print it when we log. If you look at the provision_devices stdio from your trybot run, you'll see these: I 44.300s TimeoutThread-1-for-ProvisionDevice(009eabcbcada0bae) (device: 009eabcbcada0bae) condition 'pm_ready' not yet met (42.99s) at TimeoutThread-1-for-ProvisionDevice(009eabcbcada0bae) https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:293: str(self), condition.__name__, thread.ElapsedTime(), thread.name) ditto https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:462: return '' What's up with this? If a caller requests a single line but gets no lines, why shouldn't it be an error...? https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:14: class TimeoutRetryThread(reraiser_thread.ReraiserThread): This interface feels bloated. Why can't we just narrow it down to CheckTimeout()? Everything else seems ... unnecessary. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:53: def GetTimeoutThread(): Nit: rename this, it sounds like it's creating a timeout thread. Maybe "CurrentTimeoutThread" to mirror current_thread? (fine with the functionality, though) https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:91: num_try = 1 I'm guessing you're trying to "correct" the retries=3 retrying 3 times for a total of 4 tries. Leave this the way it is. Even if we do want to do this, this CL is probably doing too much already. We can talk about this next week. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... File build/android/pylib/utils/watchdog_timer.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/watchdog_timer.py:23: self._start_time = None This should still set self._start_time to time.time(). Otherwise, users _have_ to call Reset() before use, which seems counterintuitive.
John, thanks for pushing back on the review yesterday. It did give me time to think about the implementation quite a bit better. In the end I decided to promote DeviceUtils._WaitFor to a more generic timeout_retry.WaitFor. Hope you don't hate me for it :P, I think it keeps things a bit more contained, and reduces some of the bloat. Let me know what you think. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... File build/android/pylib/device/adb_wrapper.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/adb_wrapper.py:64: # no need to wait, we would time out anyway On 2014/10/30 02:27:02, jbudorick wrote: > nit: Add a log message here (probably at warning) about timing out early. This > is liable to confuse some people otherwise. This is much better now in my new proposal. An exception will be raised with a message reading something like: "CommandTimeoutError: Early timeout waiting for 'pm_ready', wait of 5.0 secs required but only 2.5 secs left" https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... File build/android/pylib/device/device_utils.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:113: return state == 'device' On 2014/10/30 02:27:03, jbudorick wrote: > return self.adb.getState() == 'device' > > ? :S .. probably a left-over from some (now missing) debug code. Fixed now. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:289: str(self), condition.__name__, thread.ElapsedTime(), thread.name) On 2014/10/30 02:27:03, jbudorick wrote: > no need for the thread name, we already print it when we log. If you look at the > provision_devices stdio from your trybot run, you'll see these: > > I 44.300s TimeoutThread-1-for-ProvisionDevice(009eabcbcada0bae) (device: > 009eabcbcada0bae) condition 'pm_ready' not yet met (42.99s) at > TimeoutThread-1-for-ProvisionDevice(009eabcbcada0bae) Looking at that message, we don't need to include the device serial either, which was essentially the only use of "self" here. This is what lead me to think about a more general function better placed at timeout_retry.WaitFor. Have a look at it on my new proposal. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/devi... build/android/pylib/device/device_utils.py:462: return '' On 2014/10/30 02:27:03, jbudorick wrote: > What's up with this? If a caller requests a single line but gets no lines, why > shouldn't it be an error...? I explained this on a previous review of the code, but I had another thought about it again anyway. The issue is that some commands, like "pm path <package>", return with no output if the package is not installed, and clients of GetApplicationPath expect a return value of None in this case (which makes a lot of sense). If we raise an error, then callers have a hard time figuring out whether the error was raised due to empty output, or because the output consisted, e.g., of a large dump of data due to some other error (which should genuinely raise all sorts of error alarms!). So I think that returning an empty string when the output is empty is the most useful approach. Many callers have to check, anyway, whether the value returned by the command is or not empty (e.g. in GetExternalStoragePath). And it further seems perfectly acceptable to return an empty string if the output was an empty line (i.e. "\n", like "getprop ro.wifi.channels" does) without raising errors. This behavior is also consistent with the idea that both an output of "hello" and "hello\n" return the string "hello". Which again seems like a better compromise than raising an exception if the command forgot to put the endline in the output. I guess that if a client *really* wants to distinguish between "" and "\n" in the output, they might as well just call with single_line=False. In the new patch I've updated the doc-string of this method, hopefully making it more clear of what is the expected behavior. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:14: class TimeoutRetryThread(reraiser_thread.ReraiserThread): On 2014/10/30 02:27:03, jbudorick wrote: > This interface feels bloated. Why can't we just narrow it down to > CheckTimeout()? Everything else seems ... unnecessary. Yeah ..., have a look at the new one. I hope it's better. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:53: def GetTimeoutThread(): On 2014/10/30 02:27:03, jbudorick wrote: > Nit: rename this, it sounds like it's creating a timeout thread. Maybe > "CurrentTimeoutThread" to mirror current_thread? > > (fine with the functionality, though) Done. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/timeout_retry.py:91: num_try = 1 On 2014/10/30 02:27:03, jbudorick wrote: > I'm guessing you're trying to "correct" the retries=3 retrying 3 times for a > total of 4 tries. Leave this the way it is. > > Even if we do want to do this, this CL is probably doing too much already. We > can talk about this next week. Mmm, nope, this was also the original behavior. This change is only to count threads like 1, 2, 3, 4; rather than 3, 2, 1, 0. Because I've also added the number to the name of the thread, I think this makes more sense when seen in the logs. https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... File build/android/pylib/utils/watchdog_timer.py (right): https://codereview.chromium.org/636273004/diff/80001/build/android/pylib/util... build/android/pylib/utils/watchdog_timer.py:23: self._start_time = None On 2014/10/30 02:27:03, jbudorick wrote: > This should still set self._start_time to time.time(). Otherwise, users _have_ > to call Reset() before use, which seems counterintuitive. Yeah, I scrapped much of this in the new version. https://codereview.chromium.org/636273004/diff/100001/build/android/pylib/uti... File build/android/pylib/utils/timeout_retry.py (right): https://codereview.chromium.org/636273004/diff/100001/build/android/pylib/uti... build/android/pylib/utils/timeout_retry.py:164: child_thread.LogTimeoutException() Note that this fixes the "double log" problem I mentioned yesterday. Now either rereaiser_thread logs a stack dump if the thread kept running after the timeout or, otherwise, we catch the thread timing out itself here. Both log calls are "outside" on the main thread, so the chances of having a race condition become essentially nil.
lgtm w/ nit Let's try to keep the CLs a little smaller next time :) https://codereview.chromium.org/636273004/diff/100001/build/android/pylib/uti... File build/android/pylib/utils/watchdog_timer.py (right): https://codereview.chromium.org/636273004/diff/100001/build/android/pylib/uti... build/android/pylib/utils/watchdog_timer.py:36: return self._start_time + self._timeout - time.time() nit: self._timeout - self.GetElapsed()?
On 2014/10/30 19:00:14, jbudorick wrote: > lgtm w/ nit > > Let's try to keep the CLs a little smaller next time :) > "smaller" -> "more focused". Ideally, this would have been: - the child thread timeout issue in timeout_retry - the conversion in device_utils More focused CLs -> easier to review -> quicker to approve -> :) > https://codereview.chromium.org/636273004/diff/100001/build/android/pylib/uti... > File build/android/pylib/utils/watchdog_timer.py (right): > > https://codereview.chromium.org/636273004/diff/100001/build/android/pylib/uti... > build/android/pylib/utils/watchdog_timer.py:36: return self._start_time + > self._timeout - time.time() > nit: self._timeout - self.GetElapsed()?
The CQ bit was checked by perezju@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/636273004/120001
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/8851b7f06044d506745f0a0547e247cb6660f1d9 Cr-Commit-Position: refs/heads/master@{#302250} |