|
|
Created:
3 years, 7 months ago by Alexey Seren Modified:
3 years, 6 months ago CC:
chromium-reviews Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionFix process_singleton_posix handling of hostname changes not deleting a stale
lockfile or not killing a frozen browser.
First issue:
1. Chrome is run.
2. Hostname is changed.
3. Chrome crashes, leaving the lock file with the old hostname.
4. A different chrome runs with a different user-data-dir and just happens to
get the same pid as from step #1.
5. Chrome is run again with the original user-data-dir. It sees there is a lock
file with a different hostname, but sees that there is a chrome process with the
same pid, so it can reach the KillProcessByLockPath in the connect retry loop.
In this case, lockfile is not deleted. We need to delete lock file without killing anything.
Second issue:
1. Chrome is run.
2. Hostname is changed.
3. Chrome freezes, it's still listening on the socket with the old hostname, but
not responding to any messages.
4. Chrome is run again. It can connect to the socket, but times out waiting for
a response, hitting one of read timeout or write timeout KillProcessByLockPath
calls.
In this case, we'd actually want to kill the frozen chrome in addition to deleting the lockfile, which the code currently doesn't do.
CL fixes both of these issues.
R=jochen@chromium.org, mattm@chromium.org
BUG=722749
Review-Url: https://codereview.chromium.org/2880333004
Cr-Commit-Position: refs/heads/master@{#477342}
Committed: https://chromium.googlesource.com/chromium/src/+/67095448eb212b9f8e09f071515669d9618e7534
Patch Set 1 #Patch Set 2 : Added appropriate histograms check + rebase #Patch Set 3 : Fix issue with hung browser after hostname change #
Total comments: 4
Patch Set 4 : Renamed KillProcessByLockPath() parameter to is_connected_to_socket #
Messages
Total messages: 43 (21 generated)
will stamp once the CL looks good to gab@
On 2017/05/16 11:17:14, jochen wrote: > will stamp once the CL looks good to gab@ Oh boy, I really know nothing about POSIX's ProcessSingleton impl... Can you find a better reviewer from history of this file plz? Thanks!
Description was changed from ========== Unlink SingletonLock (created from another host) if user opted to unlock profile Currently following steps leads to unlinked SingletonLock (that can be a problem on farther process start): - profile is locked by process on another host - we start new instance of browser - browser finds and connects to singleton socket to send notification - for any reason browser failed to read/write to singleton socket - browser reads SingletonLock and finds that locking process is on another host - browser decides to create own ProcessSingleton This CL unlinks SingletonLock in this case R=jochen@chromium.org, gab@chromium.org BUG=722749 ========== to ========== Unlink SingletonLock (created from another host) if user opted to unlock profile Currently following steps leads to unlinked SingletonLock (that can be a problem on farther process start): - profile is locked by process on another host - we start new instance of browser - browser finds and connects to singleton socket to send notification - for any reason browser failed to read/write to singleton socket - browser reads SingletonLock and finds that locking process is on another host - browser decides to create own ProcessSingleton This CL unlinks SingletonLock in this case R=jochen@chromium.org, gab@chromium.org BUG=722749 ==========
aseren@yandex-team.ru changed reviewers: + mattm@chromium.org
aseren@yandex-team.ru changed reviewers: + eroman@chromium.org
PTAL
On 2017/05/17 20:03:08, Alexey Seren wrote: > PTAL Is this a theoretical fix, or are you actually able to trigger this condition? The CL description doesn't seem possible. If the profile is locked by instance on another host, connecting the socket would not succeed. (Not to say there shouldn't be an Unlink there, just want to understand what's happening fully.)
On 2017/05/18 02:26:07, mattm wrote: > On 2017/05/17 20:03:08, Alexey Seren wrote: > > PTAL > > Is this a theoretical fix, or are you actually able to trigger this condition? > The CL description doesn't seem possible. If the profile is locked by instance > on another host, connecting the socket would not succeed. (Not to say there > shouldn't be an Unlink there, just want to understand what's happening fully.) Hey mattm, Thank you for reviewing this CL! I haven't triggered this issue on real profile so it is rather theoretical fix. I thought that connection to such socket is ok, but write fails. See comments for process_singleton_posix.cc. // If writing to the socket fails, the hostname in the lock is checked to see if // another instance is running a different host using a shared filesystem (nfs, // etc.) If the hostname differs an error is displayed and the second process // exits. Otherwise the first process (if any) is killed and the second process // starts as normal.
On 2017/05/18 18:58:40, Alexey Seren wrote: > On 2017/05/18 02:26:07, mattm wrote: > > On 2017/05/17 20:03:08, Alexey Seren wrote: > > > PTAL > > > > Is this a theoretical fix, or are you actually able to trigger this condition? > > The CL description doesn't seem possible. If the profile is locked by instance > > on another host, connecting the socket would not succeed. (Not to say there > > shouldn't be an Unlink there, just want to understand what's happening fully.) > > Hey mattm, > Thank you for reviewing this CL! > > I haven't triggered this issue on real profile so it is rather theoretical fix. > I thought that connection to such socket is ok, but write fails. See comments > for process_singleton_posix.cc. > > // If writing to the socket fails, the hostname in the lock is checked to see if > // another instance is running a different host using a shared filesystem (nfs, > // etc.) If the hostname differs an error is displayed and the second process > // exits. Otherwise the first process (if any) is killed and the second process > // starts as normal. Hm, I think that comment is just being a bit loose with terminology. Connecting to a domain socket shouldn't succeed if nothing is listening. In any case, I think there is perhaps some pretty rare ways this could be reached, without actually involving a second host. (This is just from code examination, I haven't actually tested these.) First way: 1. Chrome is run. 2. Hostname is changed. 3. Chrome crashes, leaving the lock file with the old hostname. 4. A different chrome runs with a different user-data-dir and just happens to get the same pid as from step #1. 5. Chrome is run again with the original user-data-dir. It sees there is a lock file with a different hostname, but sees that there is a chrome process with the same pid, so it can reach the KillProcessByLockPath in the connect retry loop. In this case, deleting the lockfile without killing anything would be correct. Second way: 1. Chrome is run. 2. Hostname is changed. 3. Chrome freezes, it's still listening on the socket with the old hostname, but not responding to any messages. 4. Chrome is run again. It can connect to the socket, but times out waiting for a response, hitting one of read timeout or write timeout KillProcessByLockPath calls. In this case, we'd actually want to kill the frozen chrome in addition to deleting the lockfile, which the code currently doesn't do. Unfortunately these are kind of conflicting cases. I guess, at least on Linux, it would be possible to see which user-data-dir the other chrome process is using by checking some /proc files. But this may be rare enough to not be worth the effort. So I think this CL is probably fine if you update the description.
Description was changed from ========== Unlink SingletonLock (created from another host) if user opted to unlock profile Currently following steps leads to unlinked SingletonLock (that can be a problem on farther process start): - profile is locked by process on another host - we start new instance of browser - browser finds and connects to singleton socket to send notification - for any reason browser failed to read/write to singleton socket - browser reads SingletonLock and finds that locking process is on another host - browser decides to create own ProcessSingleton This CL unlinks SingletonLock in this case R=jochen@chromium.org, gab@chromium.org BUG=722749 ========== to ========== Current implementation of ProcessSingleton has several issues on hostname changes. First issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome crashes, leaving the lock file with the old hostname. 4. A different chrome runs with a different user-data-dir and just happens to get the same pid as from step #1. 5. Chrome is run again with the original user-data-dir. It sees there is a lock file with a different hostname, but sees that there is a chrome process with the same pid, so it can reach the KillProcessByLockPath in the connect retry loop. In this case, lockfile is not deleted. We need to delete lock file without killing anything. Second issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome freezes, it's still listening on the socket with the old hostname, but not responding to any messages. 4. Chrome is run again. It can connect to the socket, but times out waiting for a response, hitting one of read timeout or write timeout KillProcessByLockPath calls. In this case, we'd actually want to kill the frozen chrome in addition to deleting the lockfile, which the code currently doesn't do. CL fixes both of these issues. R=jochen@chromium.org, mattm@chromium.org BUG=722749 ==========
On 2017/05/23 17:15:48, mattm wrote: > Hm, I think that comment is just being a bit loose with terminology. Connecting > to a domain socket shouldn't succeed if nothing is listening. > > In any case, I think there is perhaps some pretty rare ways this could be > reached, without actually involving a second host. > (This is just from code examination, I haven't actually tested these.) > > First way: > > 1. Chrome is run. > 2. Hostname is changed. > 3. Chrome crashes, leaving the lock file with the old hostname. > 4. A different chrome runs with a different user-data-dir and just happens to > get the same pid as from step #1. > 5. Chrome is run again with the original user-data-dir. It sees there is a lock > file with a different hostname, but sees that there is a chrome process with the > same pid, so it can reach the KillProcessByLockPath in the connect retry loop. > > In this case, deleting the lockfile without killing anything would be correct. > > > Second way: > > 1. Chrome is run. > 2. Hostname is changed. > 3. Chrome freezes, it's still listening on the socket with the old hostname, but > not responding to any messages. > 4. Chrome is run again. It can connect to the socket, but times out waiting for > a response, hitting one of read timeout or write timeout KillProcessByLockPath > calls. > > In this case, we'd actually want to kill the frozen chrome in addition to > deleting the lockfile, which the code currently doesn't do. > > > Unfortunately these are kind of conflicting cases. I guess, at least on Linux, > it would be possible to see which user-data-dir the other chrome process is > using by checking some /proc files. But this may be rare enough to not be worth > the effort. > > > So I think this CL is probably fine if you update the description. Hey mattm, Thank you for detailed explanation of possible issues! I have found
Hey mattm, Thank you for detailed explanation of possible issues! I have found the way to address both of these issues. If we are able to connect to socket then we are definitely use the same profile and we can safely kill remote browser process.
https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... File chrome/browser/process_singleton.h (right): https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... chrome/browser/process_singleton.h:196: bool KillProcessByLockPath(bool is_connected); is_connected indicates whether we connected to socket or not https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... File chrome/browser/process_singleton_posix.cc (right): https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... chrome/browser/process_singleton_posix.cc:1099: if (!hostname.empty() && hostname != net::GetHostName() && !is_connected) { if is_connected is true then we have connected successfully so we are using the same profile with remote hung process. So we can safely kill it.
PTAL
thanks! Just one nit about the cl description. The description overall is good, but the first line of the description could maybe be a bit more descriptive about what is being changed. This CL is a bit hard to summarize in a single sentence, but maybe something like: "Fix process_singleton_posix handling of hostname changes not deleting a stale lockfile or not killing a frozen browser." It's a little on the long side but I think that's ok.
Description was changed from ========== Current implementation of ProcessSingleton has several issues on hostname changes. First issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome crashes, leaving the lock file with the old hostname. 4. A different chrome runs with a different user-data-dir and just happens to get the same pid as from step #1. 5. Chrome is run again with the original user-data-dir. It sees there is a lock file with a different hostname, but sees that there is a chrome process with the same pid, so it can reach the KillProcessByLockPath in the connect retry loop. In this case, lockfile is not deleted. We need to delete lock file without killing anything. Second issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome freezes, it's still listening on the socket with the old hostname, but not responding to any messages. 4. Chrome is run again. It can connect to the socket, but times out waiting for a response, hitting one of read timeout or write timeout KillProcessByLockPath calls. In this case, we'd actually want to kill the frozen chrome in addition to deleting the lockfile, which the code currently doesn't do. CL fixes both of these issues. R=jochen@chromium.org, mattm@chromium.org BUG=722749 ========== to ========== Fix process_singleton_posix handling of hostname changes not deleting a stale lockfile or not killing a frozen browser. First issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome crashes, leaving the lock file with the old hostname. 4. A different chrome runs with a different user-data-dir and just happens to get the same pid as from step #1. 5. Chrome is run again with the original user-data-dir. It sees there is a lock file with a different hostname, but sees that there is a chrome process with the same pid, so it can reach the KillProcessByLockPath in the connect retry loop. In this case, lockfile is not deleted. We need to delete lock file without killing anything. Second issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome freezes, it's still listening on the socket with the old hostname, but not responding to any messages. 4. Chrome is run again. It can connect to the socket, but times out waiting for a response, hitting one of read timeout or write timeout KillProcessByLockPath calls. In this case, we'd actually want to kill the frozen chrome in addition to deleting the lockfile, which the code currently doesn't do. CL fixes both of these issues. R=jochen@chromium.org, mattm@chromium.org BUG=722749 ==========
On 2017/06/02 00:21:06, mattm wrote: > thanks! Just one nit about the cl description. The description overall is good, > but the first line of the description could maybe be a bit more descriptive > about what is being changed. > > This CL is a bit hard to summarize in a single sentence, but maybe something > like: > "Fix process_singleton_posix handling of hostname changes not deleting a stale > lockfile or not killing a frozen browser." > > It's a little on the long side but I think that's ok. Done. Thank you!
lgtm
The CQ bit was checked by aseren@yandex-team.ru
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by aseren@yandex-team.ru to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by aseren@yandex-team.ru to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by aseren@yandex-team.ru
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
lgtm with nit https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... File chrome/browser/process_singleton.h (right): https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... chrome/browser/process_singleton.h:196: bool KillProcessByLockPath(bool is_connected); On 2017/05/29 at 11:33:07, Alexey Seren wrote: > is_connected indicates whether we connected to socket or not can you rename the parameter to is_connect_to_socket or something?
Done. Thank you! https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... File chrome/browser/process_singleton.h (right): https://codereview.chromium.org/2880333004/diff/40001/chrome/browser/process_... chrome/browser/process_singleton.h:196: bool KillProcessByLockPath(bool is_connected); On 2017/06/06 15:00:44, jochen wrote: > On 2017/05/29 at 11:33:07, Alexey Seren wrote: > > is_connected indicates whether we connected to socket or not > > can you rename the parameter to is_connect_to_socket or something? Done. Renamed to is_connected_to_socket.
The CQ bit was checked by aseren@yandex-team.ru
The patchset sent to the CQ was uploaded after l-g-t-m from jochen@chromium.org, mattm@chromium.org Link to the patchset: https://codereview.chromium.org/2880333004/#ps60001 (title: "Renamed KillProcessByLockPath() parameter to is_connected_to_socket")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 60001, "attempt_start_ts": 1496766846793790, "parent_rev": "0ff947cea6682a0cc4b8e7578b4d43534140e843", "commit_rev": "67095448eb212b9f8e09f071515669d9618e7534"}
Message was sent while issue was closed.
Description was changed from ========== Fix process_singleton_posix handling of hostname changes not deleting a stale lockfile or not killing a frozen browser. First issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome crashes, leaving the lock file with the old hostname. 4. A different chrome runs with a different user-data-dir and just happens to get the same pid as from step #1. 5. Chrome is run again with the original user-data-dir. It sees there is a lock file with a different hostname, but sees that there is a chrome process with the same pid, so it can reach the KillProcessByLockPath in the connect retry loop. In this case, lockfile is not deleted. We need to delete lock file without killing anything. Second issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome freezes, it's still listening on the socket with the old hostname, but not responding to any messages. 4. Chrome is run again. It can connect to the socket, but times out waiting for a response, hitting one of read timeout or write timeout KillProcessByLockPath calls. In this case, we'd actually want to kill the frozen chrome in addition to deleting the lockfile, which the code currently doesn't do. CL fixes both of these issues. R=jochen@chromium.org, mattm@chromium.org BUG=722749 ========== to ========== Fix process_singleton_posix handling of hostname changes not deleting a stale lockfile or not killing a frozen browser. First issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome crashes, leaving the lock file with the old hostname. 4. A different chrome runs with a different user-data-dir and just happens to get the same pid as from step #1. 5. Chrome is run again with the original user-data-dir. It sees there is a lock file with a different hostname, but sees that there is a chrome process with the same pid, so it can reach the KillProcessByLockPath in the connect retry loop. In this case, lockfile is not deleted. We need to delete lock file without killing anything. Second issue: 1. Chrome is run. 2. Hostname is changed. 3. Chrome freezes, it's still listening on the socket with the old hostname, but not responding to any messages. 4. Chrome is run again. It can connect to the socket, but times out waiting for a response, hitting one of read timeout or write timeout KillProcessByLockPath calls. In this case, we'd actually want to kill the frozen chrome in addition to deleting the lockfile, which the code currently doesn't do. CL fixes both of these issues. R=jochen@chromium.org, mattm@chromium.org BUG=722749 Review-Url: https://codereview.chromium.org/2880333004 Cr-Commit-Position: refs/heads/master@{#477342} Committed: https://chromium.googlesource.com/chromium/src/+/67095448eb212b9f8e09f0715156... ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as https://chromium.googlesource.com/chromium/src/+/67095448eb212b9f8e09f0715156... |