|
|
DescriptionAdd message from grandchild to parent to indicate successful host launch.
Previously, the grandparent process would assume the host started successfully
if the grandchild closed the pipe. However, this happens regardless of the
exit code of the grandchild process. This CL allows commands to be sent between
the processes in addition to log messages, and defines an explicit command to
indicate successful start-up; if the grandparent does not receive this command,
it indicates failure via a non-zero exit code.
BUG=332930
TEST=chmod -x remoting_me2me_host; linux_me2me_host.py --start; echo $? (should give 1)
Committed: https://crrev.com/4dc4509a5cf240192dfeaafebf1986c0753814f7
Cr-Commit-Position: refs/heads/master@{#375680}
Patch Set 1 #
Total comments: 6
Patch Set 2 : Reviewer feedback. #
Total comments: 1
Patch Set 3 : Remove timeout. #
Total comments: 3
Patch Set 4 : Reinstate timeout and increase 120s to match heartbeat_sender.cc #Messages
Total messages: 20 (7 generated)
Description was changed from ========== Add message from grandchild to parent to indicate successful host launch. Previously, the grandparent process would assume the host started successfully if the grandchild closed the pipe. However, this happens regardless of the exit code of the grandchild process. This CL allows commands to be sent between the processes in addition to log messages, and defines an explicit command to indicate successful start-up; if the grandparent does not receive this command, it indicates failure via a non-zero exit code. BUG=332930 TEST=chmod -x remoting_me2me_host; run linux_me2me_host.py --start; echo $? (should give 1) ========== to ========== Add message from grandchild to parent to indicate successful host launch. Previously, the grandparent process would assume the host started successfully if the grandchild closed the pipe. However, this happens regardless of the exit code of the grandchild process. This CL allows commands to be sent between the processes in addition to log messages, and defines an explicit command to indicate successful start-up; if the grandparent does not receive this command, it indicates failure via a non-zero exit code. BUG=332930 TEST=chmod -x remoting_me2me_host; linux_me2me_host.py --start; echo $? (should give 1) ==========
jamiewalch@chromium.org changed reviewers: + lambroslambrou@chromium.org
ptal
https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... File remoting/host/linux/linux_me2me_host.py (right): https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... remoting/host/linux/linux_me2me_host.py:716: Must be called by the parent process. Document the return value (maybe in a "Returns:" section). https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... remoting/host/linux/linux_me2me_host.py:817: os._exit(1) # pylint: disable=W0212 What will happen in the case of timeout after 30 seconds? I don't know if we fail and deregister the host if the timeout is exceeded? It has been known to sometimes take more than 30 seconds to start the desktop session and host process and successfully make an XMPP connection. I think we "fixed" this by increasing the DaemonControllerDelegateLinux timeout to 60s. But now DaemonControllerDelegateLinux no longer applies any timeout, so we could probably get away with increasing the timeout here from 30 to 60 seconds?
ptal https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... File remoting/host/linux/linux_me2me_host.py (right): https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... remoting/host/linux/linux_me2me_host.py:716: Must be called by the parent process. On 2016/02/10 19:43:32, Lambros wrote: > Document the return value (maybe in a "Returns:" section). Done. https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... remoting/host/linux/linux_me2me_host.py:817: os._exit(1) # pylint: disable=W0212 On 2016/02/10 19:43:32, Lambros wrote: > What will happen in the case of timeout after 30 seconds? > I don't know if we fail and deregister the host if the timeout is exceeded? According to the comment on line 733, the read loop will exit with EINTR, which I think propagates an exception to the top level, which will then exit with a non-zero code. I believe the web-app is responsible for unregistering the host in that case. > It has been known to sometimes take more than 30 seconds to start the desktop > session and host process and successfully make an XMPP connection. > I think we "fixed" this by increasing the DaemonControllerDelegateLinux timeout > to 60s. But now DaemonControllerDelegateLinux no longer applies any timeout, so > we could probably get away with increasing the timeout here from 30 to 60 > seconds? That feels like a separate bug. If you think that's the right call, I'm happy to do a follow-up CL for it. However, I'm also considering removing the wait logic from this script altogether, and adding code to the web-app to wait for a heartbeat before reporting a host as on-line. That will be cross platform, but will require polling, so it's less elegant than the solution we have right now for Linux. https://codereview.chromium.org/1685793003/diff/20001/remoting/host/linux/lin... File remoting/host/linux/linux_me2me_host.py (right): https://codereview.chromium.org/1685793003/diff/20001/remoting/host/linux/lin... remoting/host/linux/linux_me2me_host.py:708: if success: Unrelated to your comments--I forgot to include this last-minute change in the original CL.
lgtm, assuming you'll address my concern here or in a followup CL. https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... File remoting/host/linux/linux_me2me_host.py (right): https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... remoting/host/linux/linux_me2me_host.py:817: os._exit(1) # pylint: disable=W0212 On 2016/02/11 00:00:22, Jamie wrote: > On 2016/02/10 19:43:32, Lambros wrote: > > What will happen in the case of timeout after 30 seconds? > > I don't know if we fail and deregister the host if the timeout is exceeded? > > According to the comment on line 733, the read loop will exit with EINTR, which > I think propagates an exception to the top level, which will then exit with a > non-zero code. I believe the web-app is responsible for unregistering the host > in that case. > > > It has been known to sometimes take more than 30 seconds to start the desktop > > session and host process and successfully make an XMPP connection. > > I think we "fixed" this by increasing the DaemonControllerDelegateLinux > timeout > > to 60s. But now DaemonControllerDelegateLinux no longer applies any timeout, > so > > we could probably get away with increasing the timeout here from 30 to 60 > > seconds? > > That feels like a separate bug. If you think that's the right call, I'm happy to > do a follow-up CL for it. However, I'm also considering removing the wait logic > from this script altogether, and adding code to the web-app to wait for a > heartbeat before reporting a host as on-line. That will be cross platform, but > will require polling, so it's less elegant than the solution we have right now > for Linux. My reading of the old code is that, after the 30s timeout, the loop would exit with EINTR, and the exception is swallowed (if-statement on line 737 of original file). Then the process exits with code 0 and so the web-app assumes the host is registered, and therefore doesn't delete the registration. Now you're returning host_ready which is False in the case of timeout, you're exiting the process with code 1 and the web-app will now delete the host registration. So this CL introduces an extra chance of incorrect de-registration.
jamiewalch@chromium.org changed reviewers: + sergeyu@chromium.org
+Sergey to comment on the timeout change. https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... File remoting/host/linux/linux_me2me_host.py (right): https://codereview.chromium.org/1685793003/diff/1/remoting/host/linux/linux_m... remoting/host/linux/linux_me2me_host.py:817: os._exit(1) # pylint: disable=W0212 On 2016/02/11 21:46:02, Lambros wrote: > On 2016/02/11 00:00:22, Jamie wrote: > > On 2016/02/10 19:43:32, Lambros wrote: > > > What will happen in the case of timeout after 30 seconds? > > > I don't know if we fail and deregister the host if the timeout is exceeded? > > > > According to the comment on line 733, the read loop will exit with EINTR, > which > > I think propagates an exception to the top level, which will then exit with a > > non-zero code. I believe the web-app is responsible for unregistering the host > > in that case. > > > > > It has been known to sometimes take more than 30 seconds to start the > desktop > > > session and host process and successfully make an XMPP connection. > > > I think we "fixed" this by increasing the DaemonControllerDelegateLinux > > timeout > > > to 60s. But now DaemonControllerDelegateLinux no longer applies any timeout, > > so > > > we could probably get away with increasing the timeout here from 30 to 60 > > > seconds? > > > > That feels like a separate bug. If you think that's the right call, I'm happy > to > > do a follow-up CL for it. However, I'm also considering removing the wait > logic > > from this script altogether, and adding code to the web-app to wait for a > > heartbeat before reporting a host as on-line. That will be cross platform, but > > will require polling, so it's less elegant than the solution we have right now > > for Linux. > > My reading of the old code is that, after the 30s timeout, the loop would exit > with EINTR, and the exception is swallowed (if-statement on line 737 of original > file). Then the process exits with code 0 and so the web-app assumes the host is > registered, and therefore doesn't delete the registration. > > Now you're returning host_ready which is False in the case of timeout, you're > exiting the process with code 1 and the web-app will now delete the host > registration. So this CL introduces an extra chance of incorrect > de-registration. > Good point. I've removed the timeout from this script. https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... File remoting/host/linux/linux_me2me_host.py (left): https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... remoting/host/linux/linux_me2me_host.py:727: signal.alarm(30) Sergey, do you agree that this change makes sense? There's already a 2m timeout in heartbeat_sender.cc, so imposing a 30s one here seems wrong. The heartbeat_sender.cc timeout is better because it terminates the host process if it times out, which (with the rest of this CL) means that the user is notified in the web-app that something went wrong. This timeout, by contrast, just logs something that a typical user won't see and the web-app will suggest that all is well.
https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... File remoting/host/linux/linux_me2me_host.py (left): https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... remoting/host/linux/linux_me2me_host.py:727: signal.alarm(30) On 2016/02/12 18:32:12, Jamie wrote: > Sergey, do you agree that this change makes sense? There's already a 2m timeout > in heartbeat_sender.cc, so imposing a 30s one here seems wrong. The > heartbeat_sender.cc timeout is better because it terminates the host process if > it times out, which (with the rest of this CL) means that the user is notified > in the web-app that something went wrong. This timeout, by contrast, just logs > something that a typical user won't see and the web-app will suggest that all is > well. This timeout may still be useful when the host process misbehaves. If the host process never closes the pipe then wait_for_logs() will never exit. E.g. this is particularly problematic when booting the system, as the startup script will never quit and it may block the system from booting properly.
ptal https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... File remoting/host/linux/linux_me2me_host.py (left): https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... remoting/host/linux/linux_me2me_host.py:727: signal.alarm(30) On 2016/02/12 22:42:25, Sergey Ulanov wrote: > On 2016/02/12 18:32:12, Jamie wrote: > > Sergey, do you agree that this change makes sense? There's already a 2m > timeout > > in heartbeat_sender.cc, so imposing a 30s one here seems wrong. The > > heartbeat_sender.cc timeout is better because it terminates the host process > if > > it times out, which (with the rest of this CL) means that the user is notified > > in the web-app that something went wrong. This timeout, by contrast, just logs > > something that a typical user won't see and the web-app will suggest that all > is > > well. > > This timeout may still be useful when the host process misbehaves. If the host > process never closes the pipe then wait_for_logs() will never exit. E.g. this is > particularly problematic when booting the system, as the startup script will > never quit and it may block the system from booting properly. I've reinstated it but increased it to 120s to match the timeout in heartbeat_sender.cc. WDYT?
On 2016/02/12 23:03:04, Jamie wrote: > ptal > > https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... > File remoting/host/linux/linux_me2me_host.py (left): > > https://codereview.chromium.org/1685793003/diff/40001/remoting/host/linux/lin... > remoting/host/linux/linux_me2me_host.py:727: signal.alarm(30) > On 2016/02/12 22:42:25, Sergey Ulanov wrote: > > On 2016/02/12 18:32:12, Jamie wrote: > > > Sergey, do you agree that this change makes sense? There's already a 2m > > timeout > > > in heartbeat_sender.cc, so imposing a 30s one here seems wrong. The > > > heartbeat_sender.cc timeout is better because it terminates the host process > > if > > > it times out, which (with the rest of this CL) means that the user is > notified > > > in the web-app that something went wrong. This timeout, by contrast, just > logs > > > something that a typical user won't see and the web-app will suggest that > all > > is > > > well. > > > > This timeout may still be useful when the host process misbehaves. If the host > > process never closes the pipe then wait_for_logs() will never exit. E.g. this > is > > particularly problematic when booting the system, as the startup script will > > never quit and it may block the system from booting properly. > > I've reinstated it but increased it to 120s to match the timeout in > heartbeat_sender.cc. WDYT? Another option is to remove this wait-for-heartbeat logic (since we only have it on Linux) and move it to the web-app instead. It will need to be polling-based there, but would work cross-platform.
On 2016/02/12 23:08:15, Jamie wrote: > Another option is to remove this wait-for-heartbeat logic (since we only have it > on Linux) and move it to the web-app instead. It will need to be polling-based > there, but would work cross-platform. I think it's still useful on linux for headless setup.
lgtm
The CQ bit was checked by jamiewalch@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from lambroslambrou@chromium.org Link to the patchset: https://codereview.chromium.org/1685793003/#ps60001 (title: "Reinstate timeout and increase 120s to match heartbeat_sender.cc")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685793003/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685793003/60001
Message was sent while issue was closed.
Description was changed from ========== Add message from grandchild to parent to indicate successful host launch. Previously, the grandparent process would assume the host started successfully if the grandchild closed the pipe. However, this happens regardless of the exit code of the grandchild process. This CL allows commands to be sent between the processes in addition to log messages, and defines an explicit command to indicate successful start-up; if the grandparent does not receive this command, it indicates failure via a non-zero exit code. BUG=332930 TEST=chmod -x remoting_me2me_host; linux_me2me_host.py --start; echo $? (should give 1) ========== to ========== Add message from grandchild to parent to indicate successful host launch. Previously, the grandparent process would assume the host started successfully if the grandchild closed the pipe. However, this happens regardless of the exit code of the grandchild process. This CL allows commands to be sent between the processes in addition to log messages, and defines an explicit command to indicate successful start-up; if the grandparent does not receive this command, it indicates failure via a non-zero exit code. BUG=332930 TEST=chmod -x remoting_me2me_host; linux_me2me_host.py --start; echo $? (should give 1) ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Description was changed from ========== Add message from grandchild to parent to indicate successful host launch. Previously, the grandparent process would assume the host started successfully if the grandchild closed the pipe. However, this happens regardless of the exit code of the grandchild process. This CL allows commands to be sent between the processes in addition to log messages, and defines an explicit command to indicate successful start-up; if the grandparent does not receive this command, it indicates failure via a non-zero exit code. BUG=332930 TEST=chmod -x remoting_me2me_host; linux_me2me_host.py --start; echo $? (should give 1) ========== to ========== Add message from grandchild to parent to indicate successful host launch. Previously, the grandparent process would assume the host started successfully if the grandchild closed the pipe. However, this happens regardless of the exit code of the grandchild process. This CL allows commands to be sent between the processes in addition to log messages, and defines an explicit command to indicate successful start-up; if the grandparent does not receive this command, it indicates failure via a non-zero exit code. BUG=332930 TEST=chmod -x remoting_me2me_host; linux_me2me_host.py --start; echo $? (should give 1) Committed: https://crrev.com/4dc4509a5cf240192dfeaafebf1986c0753814f7 Cr-Commit-Position: refs/heads/master@{#375680} ==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/4dc4509a5cf240192dfeaafebf1986c0753814f7 Cr-Commit-Position: refs/heads/master@{#375680} |