|
|
Created:
7 years, 5 months ago by Philippe Modified:
7 years, 5 months ago CC:
chromium-reviews, craigdh+watch_chromium.org, kkania, chrome-speed-team+watch_google.com, bulach+watch_chromium.org, yfriedman+watch_chromium.org, ilevy-cc_chromium.org, klundberg+watch_chromium.org, telemetry+watch_chromium.org, frankf+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionMove Python setup/tear down logic into Forwarder itself.
Forwarder used to be a pain to setup/tear down across all the various
harnesses.
This CL should hopefully solve these issues by hiding these implementation
details. The host daemon is now killed once the first time that the Forwarder
class is used and the daemon running on the devices is also killed the first
time a port is forwarded for a specific device.
BUG=242846
R=bulach@chromium.org, frankf@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=212020
Patch Set 1 #Patch Set 2 : #
Total comments: 13
Patch Set 3 : Address Frank's comments + use multiprocess lock #Patch Set 4 : Use file lock #
Total comments: 6
Patch Set 5 : Implement Marcus' idea #
Total comments: 34
Patch Set 6 : Address Marcus' and Frank's comments #Patch Set 7 : Do not share lock instance #Patch Set 8 : Update outdated comment #
Total comments: 2
Patch Set 9 : Address Frank's comments #
Total comments: 2
Patch Set 10 : Address Marcus' comment #Messages
Total messages: 32 (0 generated)
Thanks pliard, this is much cleaner. https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/base/b... File build/android/pylib/base/base_test_runner.py (right): https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/base/b... build/android/pylib/base/base_test_runner.py:146: Forwarder.Map(port_pairs, self.adb, self.build_type, self.tool) Just inline ForwardPorts, UnmapPorts, ForwardPortsForHttpServer? https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:87: def UnmapDevicePort(device_port, adb): Is it too dangerous to add a convenience method to unmap all the ports? It's a common use case. https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:94: Forwarder._GetInstanceLocked(None, None)._UnmapDevicePortInternalLocked( What happens if Unmap is called before Map? https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:110: def __init__(self, build_type, tool): This is private and should be called once, maybe do: assert not Forwarder._instance https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:124: self._build_type = 'Release' if self._build_type == 'Debug' else 'Debug' Why don't we know the definite path to the binary? https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:131: def _InitDeviceOnceLocked(self, adb, tool): Can you add brief docstrings for all methods in this module?
I also moved to multiprocessing.Lock which should address the multiprocess synchronization issue. ipcs didn't show any named IPCs for the python processes. I believe multiprocessing.Lock uses an unnamed semaphore on Linux which should not be too flaky. I tried to kill the test suite aggressively and didn't see any leftovers or any issue with the next run. https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/base/b... File build/android/pylib/base/base_test_runner.py (right): https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/base/b... build/android/pylib/base/base_test_runner.py:146: Forwarder.Map(port_pairs, self.adb, self.build_type, self.tool) On 2013/07/10 18:15:38, frankf wrote: > Just inline ForwardPorts, UnmapPorts, ForwardPortsForHttpServer? These methods are exposed to clients and proved to be quite valuable (as an intermediate layer). For instance we have some code downstream using them (or StartForwarder() below to be exact) rather than Forwarder directly which makes any change in Forwarder painless. What do you think? https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:87: def UnmapDevicePort(device_port, adb): On 2013/07/10 18:15:38, frankf wrote: > Is it too dangerous to add a convenience method to unmap all the ports? It's a > common use case. Good idea. https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:94: Forwarder._GetInstanceLocked(None, None)._UnmapDevicePortInternalLocked( On 2013/07/10 18:15:38, frankf wrote: > What happens if Unmap is called before Map? An error would be logged but no exception would be thrown. We used to throw an exception but it proved to be too aggressive. https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:110: def __init__(self, build_type, tool): On 2013/07/10 18:15:38, frankf wrote: > This is private and should be called once, maybe do: > > assert not Forwarder._instance Good idea. https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:124: self._build_type = 'Release' if self._build_type == 'Debug' else 'Debug' On 2013/07/10 18:15:38, frankf wrote: > Why don't we know the definite path to the binary? In theory we do if the client provides the build_type. If he doesn't we default to 'Debug' and fallback to 'Release' if the host_forwarder binary wasn't found. A bunch of clients don't have the build type around. I found this a reasonable compromise. What do you think? https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:131: def _InitDeviceOnceLocked(self, adb, tool): On 2013/07/10 18:15:38, frankf wrote: > Can you add brief docstrings for all methods in this module? Done. https://codereview.chromium.org/18086004/diff/6001/build/android/pylib/forwar... build/android/pylib/forwarder.py:147: def _UnmapDevicePortInternalLocked(self, device_port, adb): FYI, I inlined this method in the latest patch set since it now has a single caller.
sorry, I should've seen this yesterday, my bad!! for the reasons explained in the other patch, this won't work as is (multiprocess forks, doesn't use threads..) So let me try to take a step back. there are two scenarios: 1) forwarder started from a non-sharded run: kill everything! 2) forwarder started from a sharded run: only the first one should kill everything. There's no easy way to do synchronization using multiprocess itself, we'd need to pass a bunch of params through many layers, at which point the "cleanup before sharding" becomes cleaner... So here's my proposal: 1) upon forwarder.Run, flock os.getppid() (note: parent!) 1.1) if it succeeds, it means this is either the first process in a shard, or a non-sharded runner: kill all forwarders! 1.2) if it fails, fair enough, no need to kill anything. 2) should the process die unexpectedly, the flock would be automatically removed, then subsequent runs will do the cleanup.. would that work?
On 2013/07/11 09:36:02, bulach wrote: > sorry, I should've seen this yesterday, my bad!! > > for the reasons explained in the other patch, this won't work as is > (multiprocess forks, doesn't use threads..) > > So let me try to take a step back. there are two scenarios: > 1) forwarder started from a non-sharded run: kill everything! > 2) forwarder started from a sharded run: only the first one should kill > everything. > > There's no easy way to do synchronization using multiprocess itself, we'd need > to pass a bunch of params through many layers, at which point the "cleanup > before sharding" becomes cleaner... > > So here's my proposal: > 1) upon forwarder.Run, flock os.getppid() (note: parent!) > 1.1) if it succeeds, it means this is either the first process in a shard, or a > non-sharded runner: kill all forwarders! > 1.2) if it fails, fair enough, no need to kill anything. > 2) should the process die unexpectedly, the flock would be automatically > removed, then subsequent runs will do the cleanup.. > > would that work? No problem :) I clearly see the issue with a mutex but I'm not sure I see why a semaphore (multiprocessing.Lock) wouldn't work? Can you elaborate? :)
actually, we just need a known file name :) something like: import os f = '/tmp/forwarder' a = open(f, 'w') l = os.system('flock -w 0 -e ' + str(a.fileno())) print l does it look promising? :)
I think multiprocessing.Lock has to be passed down from the parent process, right? that is, all the sharders "parent" would need to create the lock, then pass all the way down to forwarders, at which point we're back to the same point :) the advantage of "flock" is that it can be done entirely at the forwarder level, I suppose.
I have just uploaded a new patch set using a file lock. As Marcus said before we can't share a single multiprocessing.Lock instance across processes without passing it around.
On 2013/07/11 13:58:48, Philippe wrote: > I have just uploaded a new patch set using a file lock. As Marcus said before we > can't share a single multiprocessing.Lock instance across processes without > passing it around. And I have just realized that this would not work either :) So please hold on.
On 2013/07/11 14:00:55, Philippe wrote: > On 2013/07/11 13:58:48, Philippe wrote: > > I have just uploaded a new patch set using a file lock. As Marcus said before > we > > can't share a single multiprocessing.Lock instance across processes without > > passing it around. > > And I have just realized that this would not work either :) So please hold on. So we've moved to multi-threading in base/shard.py, and host-driven tests will soon be refactored as well. If there are not other multi-process callers (telemetry?) then this is not something we have to worry about long term. If this the case, maybe just hack host-driven tests, until the refactoring CLs land (couple of weeks).
telemetry and all perf tests uses multiprocess: http://src.chromium.org/viewvc/chrome/trunk/src/build/android/bb_run_sharded_... it'd be nice not to, but I don't see this happening any time soon unfortunately :(
sorry, I should clarify: all perf tests downstream.. upstream they are not at all sharded, there's some discussion going on in terms of the buildbot, but I presume it'll use roughly the same mechanism...
On 2013/07/11 16:24:06, bulach wrote: > sorry, I should clarify: all perf tests downstream.. > > upstream they are not at all sharded, there's some discussion going on in terms > of the buildbot, but I presume it'll use roughly the same mechanism... While it would be easier here not to use multiprocessing I expect that it could be a lot of work to run multiple instances of telemetry (e.g. on the perf bots downstream) in a same process (in different threads) if telemetry/pylib and other dependencies were not designed with thread-safety in mind. The problem here in this CL is that we don't want do end up killing the device_forwarder daemon twice on a same device nor the host_process daemon. The fact that we don't share any state across the processes makes this harder. As Marcus pointed out offline there is no risk of killing the device_forwarder daemon twice on a same device since there is exactly one shard for each device. So the only risk is to kill the host_forwarder daemon twice. Marcus suggested that we use the locked file's content to store the PPID of the process which initialized (i.e. killed) the host forwarder with its start time (to workaround the corner case where a process reuses a previous PPID). This sounds like the best option here. I'm just concerned about the corner case where the parent process uses the forwarder before one of its children does. This should not happen in practice I believe. The parent process should do only a very limited amount of work. I would not expect it to use the forwarder. What do you guys think?
thanks phillippe! I like moving all this logic into the forwarder, certainly hides all the details... yes: device forwarders can be freely killed (we never run more than one thing on one device), but I think there's still some more refinement here for the host case... afaict, this would make the shards run serially, but I may be missing something.. more inline: https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... build/android/pylib/forwarder.py:33: fcntl.flock(self._file, fcntl.LOCK_EX) hmm, I think this needs: fcntl.LOCK_EX | fcntl.LOCK_NB the scenario I'm thinking is: 1) first shard starts, acquires the ex-lock. 2) second shard starts, blocks until first shard ends. I guess it works fine, but entirely breaks the idea of sharding :) unfortunately, I think we need to: - shard N comes along, tries to acquire the lock. -- if it does NOT acquire the lock, life is good: a live-shard has already initialized, happily proceed! -- it it DOES acquire the lock ---- check the content of the file. ---- if it's NOT the same PPID, proceed with the first-time initialization. ---- if it IS the same PPID, stat the file and the ppid, and check: -------- file AFTER ppid, life is good: a sibling shard has initialized, just proceed. -------- file BEFORE ppid: unluckiest case! but doing first-time initialization here should be safe.
https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... build/android/pylib/forwarder.py:33: fcntl.flock(self._file, fcntl.LOCK_EX) Are you suggesting the first process holds the lock during its entire lifetime? Why can't you do a blocking exclusive lock, but then release it after you update the file? On 2013/07/12 11:48:26, bulach wrote: > hmm, I think this needs: > fcntl.LOCK_EX | fcntl.LOCK_NB > > the scenario I'm thinking is: > 1) first shard starts, acquires the ex-lock. > 2) second shard starts, blocks until first shard ends. > I guess it works fine, but entirely breaks the idea of sharding :) > > unfortunately, I think we need to: > - shard N comes along, tries to acquire the lock. > -- if it does NOT acquire the lock, life is good: a live-shard has already > initialized, happily proceed! > -- it it DOES acquire the lock > ---- check the content of the file. > ---- if it's NOT the same PPID, proceed with the first-time initialization. > ---- if it IS the same PPID, stat the file and the ppid, and check: > -------- file AFTER ppid, life is good: a sibling shard has initialized, just > proceed. > -------- file BEFORE ppid: unluckiest case! but doing first-time initialization > here should be safe.
https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... build/android/pylib/forwarder.py:33: fcntl.flock(self._file, fcntl.LOCK_EX) On 2013/07/12 21:56:56, frankf wrote: > Are you suggesting the first process holds the lock during its entire lifetime? > Why can't you do a blocking exclusive lock, but then release it after you update > the file? > > On 2013/07/12 11:48:26, bulach wrote: > > hmm, I think this needs: > > fcntl.LOCK_EX | fcntl.LOCK_NB > > > > the scenario I'm thinking is: > > 1) first shard starts, acquires the ex-lock. > > 2) second shard starts, blocks until first shard ends. > > I guess it works fine, but entirely breaks the idea of sharding :) > > > > unfortunately, I think we need to: > > - shard N comes along, tries to acquire the lock. > > -- if it does NOT acquire the lock, life is good: a live-shard has already > > initialized, happily proceed! > > -- it it DOES acquire the lock > > ---- check the content of the file. > > ---- if it's NOT the same PPID, proceed with the first-time initialization. > > ---- if it IS the same PPID, stat the file and the ppid, and check: > > -------- file AFTER ppid, life is good: a sibling shard has initialized, just > > proceed. > > -------- file BEFORE ppid: unluckiest case! but doing first-time > initialization > > here should be safe. > afaict, 48 below does hold the EX lock during the entire lifetime, no? because it's blocking, it serializes the shards. :) but yeah, holding an EX lock for shorter is indeed much better: on the cases I mentioned above, the very first one would fail if the first shard is alive and has acquired the lock but didn't fully initialize yet :( So something like: - acquire the EX lock -- read the ppid -- if empty, or ppid not the same, or older than ppid: ---- proceed with first-time initialization -- write the ppid - release the EX lock This would protect the shards from: 1) first shard initializes correctly, even if previous runs have failed. 2) other shards won't reset anything (even if the first one terminates earlier). 3) two shards will be serialized only during forwarder initialization. would that work?
https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... build/android/pylib/forwarder.py:33: fcntl.flock(self._file, fcntl.LOCK_EX) On 2013/07/15 09:32:30, bulach wrote: > On 2013/07/12 21:56:56, frankf wrote: > > Are you suggesting the first process holds the lock during its entire > lifetime? > > Why can't you do a blocking exclusive lock, but then release it after you > update > > the file? > > > > On 2013/07/12 11:48:26, bulach wrote: > > > hmm, I think this needs: > > > fcntl.LOCK_EX | fcntl.LOCK_NB > > > > > > the scenario I'm thinking is: > > > 1) first shard starts, acquires the ex-lock. > > > 2) second shard starts, blocks until first shard ends. > > > I guess it works fine, but entirely breaks the idea of sharding :) > > > > > > unfortunately, I think we need to: > > > - shard N comes along, tries to acquire the lock. > > > -- if it does NOT acquire the lock, life is good: a live-shard has already > > > initialized, happily proceed! > > > -- it it DOES acquire the lock > > > ---- check the content of the file. > > > ---- if it's NOT the same PPID, proceed with the first-time initialization. > > > ---- if it IS the same PPID, stat the file and the ppid, and check: > > > -------- file AFTER ppid, life is good: a sibling shard has initialized, > just > > > proceed. > > > -------- file BEFORE ppid: unluckiest case! but doing first-time > > initialization > > > here should be safe. > > > > afaict, 48 below does hold the EX lock during the entire lifetime, no? because > it's blocking, it serializes the shards. :) > > but yeah, holding an EX lock for shorter is indeed much better: on the cases I > mentioned above, the very first one would fail if the first shard is alive and > has acquired the lock but didn't fully initialize yet :( > > So something like: > - acquire the EX lock > -- read the ppid > -- if empty, or ppid not the same, or older than ppid: > ---- proceed with first-time initialization > -- write the ppid > - release the EX lock > > This would protect the shards from: > 1) first shard initializes correctly, even if previous runs have failed. > 2) other shards won't reset anything (even if the first one terminates earlier). > 3) two shards will be serialized only during forwarder initialization. > > > would that work? Wouldn't the lock be held only in a with statement? Line 48 should only open the file (that was my intention at least :)). The critical sections should be fast and I would not expect any contention. Therefore I don't think this would impact sharding performance especially if we don't forward thousands of ports every second :) What do you think?
https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... build/android/pylib/forwarder.py:33: fcntl.flock(self._file, fcntl.LOCK_EX) On 2013/07/15 09:44:29, pliard wrote: > On 2013/07/15 09:32:30, bulach wrote: > > On 2013/07/12 21:56:56, frankf wrote: > > > Are you suggesting the first process holds the lock during its entire > > lifetime? > > > Why can't you do a blocking exclusive lock, but then release it after you > > update > > > the file? > > > > > > On 2013/07/12 11:48:26, bulach wrote: > > > > hmm, I think this needs: > > > > fcntl.LOCK_EX | fcntl.LOCK_NB > > > > > > > > the scenario I'm thinking is: > > > > 1) first shard starts, acquires the ex-lock. > > > > 2) second shard starts, blocks until first shard ends. > > > > I guess it works fine, but entirely breaks the idea of sharding :) > > > > > > > > unfortunately, I think we need to: > > > > - shard N comes along, tries to acquire the lock. > > > > -- if it does NOT acquire the lock, life is good: a live-shard has already > > > > initialized, happily proceed! > > > > -- it it DOES acquire the lock > > > > ---- check the content of the file. > > > > ---- if it's NOT the same PPID, proceed with the first-time > initialization. > > > > ---- if it IS the same PPID, stat the file and the ppid, and check: > > > > -------- file AFTER ppid, life is good: a sibling shard has initialized, > > just > > > > proceed. > > > > -------- file BEFORE ppid: unluckiest case! but doing first-time > > > initialization > > > > here should be safe. > > > > > > > afaict, 48 below does hold the EX lock during the entire lifetime, no? because > > it's blocking, it serializes the shards. :) > > > > but yeah, holding an EX lock for shorter is indeed much better: on the cases I > > mentioned above, the very first one would fail if the first shard is alive and > > has acquired the lock but didn't fully initialize yet :( > > > > So something like: > > - acquire the EX lock > > -- read the ppid > > -- if empty, or ppid not the same, or older than ppid: > > ---- proceed with first-time initialization > > -- write the ppid > > - release the EX lock > > > > This would protect the shards from: > > 1) first shard initializes correctly, even if previous runs have failed. > > 2) other shards won't reset anything (even if the first one terminates > earlier). > > 3) two shards will be serialized only during forwarder initialization. > > > > > > would that work? > > Wouldn't the lock be held only in a with statement? Line 48 should only open the > file (that was my intention at least :)). The critical sections should be fast > and I would not expect any contention. Therefore I don't think this would impact > sharding performance especially if we don't forward thousands of ports every > second :) What do you think? of course! you're right, the lock would only take place on "with".. my bad, sorry... :) however... what happens once the Nth shard comes along after the first one (that acquired the lock and killed the host) has terminated? wouldn't it kill the forwarder, despite it still being potentially in use for [1, N-1] other shards?
https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/24001/build/android/pylib/forwa... build/android/pylib/forwarder.py:33: fcntl.flock(self._file, fcntl.LOCK_EX) On 2013/07/15 10:20:03, bulach wrote: > On 2013/07/15 09:44:29, pliard wrote: > > On 2013/07/15 09:32:30, bulach wrote: > > > On 2013/07/12 21:56:56, frankf wrote: > > > > Are you suggesting the first process holds the lock during its entire > > > lifetime? > > > > Why can't you do a blocking exclusive lock, but then release it after you > > > update > > > > the file? > > > > > > > > On 2013/07/12 11:48:26, bulach wrote: > > > > > hmm, I think this needs: > > > > > fcntl.LOCK_EX | fcntl.LOCK_NB > > > > > > > > > > the scenario I'm thinking is: > > > > > 1) first shard starts, acquires the ex-lock. > > > > > 2) second shard starts, blocks until first shard ends. > > > > > I guess it works fine, but entirely breaks the idea of sharding :) > > > > > > > > > > unfortunately, I think we need to: > > > > > - shard N comes along, tries to acquire the lock. > > > > > -- if it does NOT acquire the lock, life is good: a live-shard has > already > > > > > initialized, happily proceed! > > > > > -- it it DOES acquire the lock > > > > > ---- check the content of the file. > > > > > ---- if it's NOT the same PPID, proceed with the first-time > > initialization. > > > > > ---- if it IS the same PPID, stat the file and the ppid, and check: > > > > > -------- file AFTER ppid, life is good: a sibling shard has initialized, > > > just > > > > > proceed. > > > > > -------- file BEFORE ppid: unluckiest case! but doing first-time > > > > initialization > > > > > here should be safe. > > > > > > > > > > afaict, 48 below does hold the EX lock during the entire lifetime, no? > because > > > it's blocking, it serializes the shards. :) > > > > > > but yeah, holding an EX lock for shorter is indeed much better: on the cases > I > > > mentioned above, the very first one would fail if the first shard is alive > and > > > has acquired the lock but didn't fully initialize yet :( > > > > > > So something like: > > > - acquire the EX lock > > > -- read the ppid > > > -- if empty, or ppid not the same, or older than ppid: > > > ---- proceed with first-time initialization > > > -- write the ppid > > > - release the EX lock > > > > > > This would protect the shards from: > > > 1) first shard initializes correctly, even if previous runs have failed. > > > 2) other shards won't reset anything (even if the first one terminates > > earlier). > > > 3) two shards will be serialized only during forwarder initialization. > > > > > > > > > would that work? > > > > Wouldn't the lock be held only in a with statement? Line 48 should only open > the > > file (that was my intention at least :)). The critical sections should be fast > > and I would not expect any contention. Therefore I don't think this would > impact > > sharding performance especially if we don't forward thousands of ports every > > second :) What do you think? > > of course! you're right, the lock would only take place on "with".. my bad, > sorry... :) > > however... what happens once the Nth shard comes along after the first one (that > acquired the lock and killed the host) has terminated? wouldn't it kill the > forwarder, despite it still being potentially in use for [1, N-1] other shards? > If we use the locked file just as you said (i.e. including the ppid and the process start time), we would be covered, right? :) To be clear, the current patch set doesn't contain yet a working version (since it doesn't deal with ppid...) but this is planned :)
got, thanks! :) we'll need the ppid checking otherwise it'd break the sharded telemetry tests downstream.. but thankfully this will now be self-contained in forwarder.py itself!
On 2013/07/15 10:54:40, bulach wrote: > got, thanks! :) we'll need the ppid checking otherwise it'd break the sharded > telemetry tests downstream.. but thankfully this will now be self-contained in > forwarder.py itself! I have just realized that in case we don't use multi-processing for sharding (e.g. test_tunner) the PPID will be the same (=shell PID) for a while in a non-buildbot environment (e.g. on a developer's workstation). On buildbot I assume that there is a bunch of forks before the bot ends up running the test suite. Ideally we should have a way to detect programmatically if multi-processing sharding is used. My ugly suggestion is that we could set a global variable or an environment variable in bb_run_sharded_steps.py (I told you it was ugly :)) In that case we would use the PID rather than the PPID. What do you think?
sigh, good point! :( I think the python-driven is about to move away of multi-process: https://codereview.chromium.org/18444004/ the perf sharder (bb_run_sharded_steps.py) already forcefully pass --keep_test_server_ports option to all of its shards, so it wouldn't be too problematic to set an env variable there too..
PTAL :)
lgtm, thanks! just some nits and suggestions below, it looks this will be robust for all cases we need! https://codereview.chromium.org/18086004/diff/51001/build/android/bb_run_shar... File build/android/bb_run_sharded_steps.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/bb_run_shar... build/android/bb_run_sharded_steps.py:203: forwarder.Forwarder.use_multiprocessing() nit: UseMultiprocessing(), but more below! https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:189: self._InitHostOnceLocked() nit: maybe move 189 below, i.e., do all member initialization then start calling functions? https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:194: def _GetPid(): nit: maybe GetPidForLock() ? https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:214: current_pid = str(Forwarder._GetPid()) nit: maybe pid_for_lock? (current_pid is a big confusing..) https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:215: with open(Forwarder._LOCK_PATH, 'r') as pid_file: w+ ? https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:229: self._host_forwarder_path), 'Please build forwarder2' nit: I suppose 222-229 could be moved to the ctor itself? it'd be simpler to read this.. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:231: pid_file.close() pid_file.seek(0), and avoid re-opening it below. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:235: pid_file.flush() doesn't "with" take care of flushing?
https://codereview.chromium.org/18086004/diff/51001/build/android/adb_reverse... File build/android/adb_reverse_forwarder.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/adb_reverse... build/android/adb_reverse_forwarder.py:61: Forwarder.UnmapDevicePort(device_port, adb) You can use UnmapPorts() https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/base/... File build/android/pylib/base/base_test_runner.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/base/... build/android/pylib/base/base_test_runner.py:162: def ForwardPortsForHttpServer(self): So none of these methods should be public right? Some are protected and are used in the PerfTestRunner-derived classes downstream. Please follow the naming convention: https://www.corp.google.com/eng/doc/pyguide.xml#Naming https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:24: return open('/proc/%s/stat' % pid, 'r').readline().split(' ')[21] Can you use psutil: https://code.google.com/p/psutil/wiki/Documentation https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:140: def UnmapPorts(port_pairs, adb): To clarify my earlier comment: In many cases, you just want to unmap all the ports, so it makes sense to have a method that doesn't take the port as a parmater, and just uses the _device_to_host_port_map dict. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:205: def _InitHostOnceLocked(self): To be consistent remove the 'Once' from method names. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:218: (pid, process_start_time) = pid_with_start_time.split(':') where's process_start_time used? https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... File chrome/test/chromedriver/run_py_tests.py (left): https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... chrome/test/chromedriver/run_py_tests.py:146: ChromeDriverTest._forwarder = forwarder.Forwarder(ChromeDriverTest._adb, You need to rebase, KillHost was added here. https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... File chrome/test/chromedriver/test_environment.py (right): https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... chrome/test/chromedriver/test_environment.py:97: forwarder.Forwarder.Map( Same here. https://codereview.chromium.org/18086004/diff/51001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/chrome/adb_commands.py (right): https://codereview.chromium.org/18086004/diff/51001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/chrome/adb_commands.py:156: class Forwarder(object): Is this only used by telemetry? Why an extra wrapper?
Thanks guys! In addition to addressing your comments, I also slightly modified the device to host and host to device port maps in the Forwarder class. They were not using the device serial which can be problematic if we forward the same device port on different devices. https://codereview.chromium.org/18086004/diff/51001/build/android/adb_reverse... File build/android/adb_reverse_forwarder.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/adb_reverse... build/android/adb_reverse_forwarder.py:61: Forwarder.UnmapDevicePort(device_port, adb) On 2013/07/15 22:23:22, frankf wrote: > You can use UnmapPorts() Good point. https://codereview.chromium.org/18086004/diff/51001/build/android/bb_run_shar... File build/android/bb_run_sharded_steps.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/bb_run_shar... build/android/bb_run_sharded_steps.py:203: forwarder.Forwarder.use_multiprocessing() On 2013/07/15 17:51:43, bulach wrote: > nit: UseMultiprocessing(), but more below! Oops :) https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/base/... File build/android/pylib/base/base_test_runner.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/base/... build/android/pylib/base/base_test_runner.py:162: def ForwardPortsForHttpServer(self): On 2013/07/15 22:23:22, frankf wrote: > So none of these methods should be public right? Some are protected and are used > in the PerfTestRunner-derived classes downstream. Please follow the naming > convention: > > https://www.corp.google.com/eng/doc/pyguide.xml#Naming It seems that we are not completely following the guidelines by prefixing private methods with '_' rather than '__' (double underscore). This means that we have no way for distinguishing private from protected methods, right? I prefixed them with '_'. Let me know if this is not what you expected. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:24: return open('/proc/%s/stat' % pid, 'r').readline().split(' ')[21] On 2013/07/15 22:23:22, frankf wrote: > Can you use psutil: https://code.google.com/p/psutil/wiki/Documentation Yes, thanks. I thought that this was a third-party library. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:140: def UnmapPorts(port_pairs, adb): On 2013/07/15 22:23:22, frankf wrote: > To clarify my earlier comment: In many cases, you just want to unmap all the > ports, so it makes sense to have a method that doesn't take the port as a > parmater, and just uses the _device_to_host_port_map dict. Done. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:189: self._InitHostOnceLocked() On 2013/07/15 17:51:43, bulach wrote: > nit: maybe move 189 below, i.e., do all member initialization then start calling > functions? Good idea. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:194: def _GetPid(): On 2013/07/15 17:51:43, bulach wrote: > nit: maybe GetPidForLock() ? Done. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:205: def _InitHostOnceLocked(self): On 2013/07/15 22:23:22, frankf wrote: > To be consistent remove the 'Once' from method names. Done. Note that I meant 'only one time' by 'once' rather than 'after the lock was acquired' :) But I removed it anyway since it was ambiguous. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:214: current_pid = str(Forwarder._GetPid()) On 2013/07/15 17:51:43, bulach wrote: > nit: maybe pid_for_lock? (current_pid is a big confusing..) Done. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:215: with open(Forwarder._LOCK_PATH, 'r') as pid_file: On 2013/07/15 17:51:43, bulach wrote: > w+ ? I had a fun debugging session with these file modes :) 'w+' would overwrite the file if it already exists which would remove its content. I used the open syscall directly with explicit flags and then passed the file descriptor to fdopen. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:218: (pid, process_start_time) = pid_with_start_time.split(':') On 2013/07/15 22:23:22, frankf wrote: > where's process_start_time used? Great catch! This was a bug. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:229: self._host_forwarder_path), 'Please build forwarder2' On 2013/07/15 17:51:43, bulach wrote: > nit: I suppose 222-229 could be moved to the ctor itself? it'd be simpler to > read this.. Yes, good idea. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:231: pid_file.close() On 2013/07/15 17:51:43, bulach wrote: > pid_file.seek(0), and avoid re-opening it below. Done. https://codereview.chromium.org/18086004/diff/51001/build/android/pylib/forwa... build/android/pylib/forwarder.py:235: pid_file.flush() On 2013/07/15 17:51:43, bulach wrote: > doesn't "with" take care of flushing? Yeah, indeed. I used not to have this 'with' statement. https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... File chrome/test/chromedriver/run_py_tests.py (left): https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... chrome/test/chromedriver/run_py_tests.py:146: ChromeDriverTest._forwarder = forwarder.Forwarder(ChromeDriverTest._adb, On 2013/07/15 22:23:22, frankf wrote: > You need to rebase, KillHost was added here. Indeed :) https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... File chrome/test/chromedriver/test_environment.py (right): https://codereview.chromium.org/18086004/diff/51001/chrome/test/chromedriver/... chrome/test/chromedriver/test_environment.py:97: forwarder.Forwarder.Map( On 2013/07/15 22:23:22, frankf wrote: > Same here. Done. https://codereview.chromium.org/18086004/diff/51001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/chrome/adb_commands.py (right): https://codereview.chromium.org/18086004/diff/51001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/chrome/adb_commands.py:156: class Forwarder(object): On 2013/07/15 22:23:22, frankf wrote: > Is this only used by telemetry? Why an extra wrapper? I believe this is implementing an implicit interface. Some platforms like ChromeOS also implement port forwarding.
lgtm w/ nits. Thanks. https://codereview.chromium.org/18086004/diff/78003/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/78003/build/android/pylib/forwa... build/android/pylib/forwarder.py:143: if adb_serial == device_serial: Perhaps UnmapDevicePort should also do this check. https://codereview.chromium.org/18086004/diff/78003/build/android/pylib/forwa... build/android/pylib/forwarder.py:251: '%s:%s\n' % (pid_for_lock, str(_GetProcessStartTime(pid_for_lock)))) Make sure you don't get newline as part of process_start_time
lgtm, thanks! tiny nit: https://codereview.chromium.org/18086004/diff/92013/build/android/pylib/forwa... File build/android/pylib/forwarder.py (right): https://codereview.chromium.org/18086004/diff/92013/build/android/pylib/forwa... build/android/pylib/forwarder.py:207: if not serial_with_port in instance._device_to_host_port_map: nit: this shouldn't happen, so perhaps a logging.error here too?
Thanks Marcus! https://chromiumcodereview.appspot.com/18086004/diff/92013/build/android/pyli... File build/android/pylib/forwarder.py (right): https://chromiumcodereview.appspot.com/18086004/diff/92013/build/android/pyli... build/android/pylib/forwarder.py:207: if not serial_with_port in instance._device_to_host_port_map: On 2013/07/17 10:20:19, bulach wrote: > nit: this shouldn't happen, so perhaps a logging.error here too? Done.
Message was sent while issue was closed.
Committed patchset #10 manually as r212020 (presubmit successful).
This seems to be flaky :/ I will revert it and try locally with the same conditions (6 devices being sharded). On Wed, Jul 17, 2013 at 3:43 PM, <pliard@chromium.org> wrote: > Committed patchset #10 manually as r212020 (presubmit successful). > > https://codereview.chromium.**org/18086004/<https://codereview.chromium.org/1... >
The issue doesn't seem to be caused by excessive daemon kills (i.e. there is only a single host forwarder kill and one kill for each device as expected). So it would rather be a problem caused by premature/excessive port unmappings. On Wed, Jul 17, 2013 at 5:49 PM, Philippe Liard <pliard@google.com> wrote: > This seems to be flaky :/ I will revert it and try locally with the same > conditions (6 devices being sharded). > > > On Wed, Jul 17, 2013 at 3:43 PM, <pliard@chromium.org> wrote: > >> Committed patchset #10 manually as r212020 (presubmit successful). >> >> https://codereview.chromium.**org/18086004/<https://codereview.chromium.org/1... >> > > |