|
|
Created:
4 years, 1 month ago by Marijn Kruisselbrink Modified:
4 years ago Reviewers:
falken CC:
blink-worker-reviews_chromium.org, chromium-reviews, darin-cc_chromium.org, horo+watch_chromium.org, jam, jsbell+serviceworker_chromium.org, kinuko+watch, kinuko+serviceworker, michaeln, nhiroki, serviceworker-reviews, shimazu+serviceworker_chromium.org, tzik Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionPrevent a service worker from keeping itself alive by self postMessage.
This changes the timeout for message events when the message was sent from
a service worker to be the curent timeout of that service worker. So while
a service worker can still post message to itself, this won't allow it to
stay alive any longer than a single event would.
BUG=647943, 648836
Committed: https://crrev.com/f88b52ec1e4aebd7c1f28b13a58bfad92d04ba94
Cr-Commit-Position: refs/heads/master@{#436393}
Patch Set 1 #Patch Set 2 : fix unit test #
Total comments: 12
Patch Set 3 : Get rid of browser test. #Patch Set 4 : address further comments #Patch Set 5 : cleanup #Patch Set 6 : rebase #
Total comments: 1
Dependent Patchsets: Messages
Total messages: 33 (24 generated)
The CQ bit was checked by mek@chromium.org to run a CQ dry run
Patchset #1 (id:1) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
mek@chromium.org changed reviewers: + falken@chromium.org
My initial attempt at implementing linking the timeout of message events to the timeout of the service worker that send the message (when a message was send by a service worker). Not sure I'm entirely happy with this approach: - using the max-of-all-requests-that-were-ever-dispatched as timeout seems a bit odd, but the reason I'm doing it that way is that (especially with mojo) the FinishRequest of whatever event was being processed when a message was send can be handled before a postMessage call is processed, and we then still need to figure out what timeout to use for the message. - the CONTINUE_ON_TIMEOUT behavior doesn't seem entirely sensible intuitively. But that's probably more a general question about what exactly the behavior should be around timeouts and killing service workers. If a message (or background sync) even with a shorter timeout is the only event being processed by a service worker we probably still want to kill the worker as soon as the event expires. Currently we'd wait for an additional idle timeout to expire after the events expire though (and I think it might even be possible to end up in a situation with all events expired and no idle_time_ set). Maybe just changing the behavior of CONTINUE_ON_TIMEOUT to still kill the worker on timeout if no other events are outstanding would be enough, or maybe some more complicated logic is needed. - Not entirely happy with the test I wrote. Although it seems to work fine, and does test the behavior reasonably well. So maybe it's good enough. (As an aside, would it maybe make sense to split up service_worker_browsertest.cc in a couple of files? It currently seems to be a dumping ground for all service worker related browser tests).
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by mek@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== Prevent a service worker from keeping itself alive by self postMessage. This changes the timeout for message events when the message was sent from a service worker to be the curent timeout of that service worker. So while a service worker can still post message to itself, this won't allow it to stay alive any longer than a single event would. BUG=647943 ========== to ========== Prevent a service worker from keeping itself alive by self postMessage. This changes the timeout for message events when the message was sent from a service worker to be the curent timeout of that service worker. So while a service worker can still post message to itself, this won't allow it to stay alive any longer than a single event would. BUG=647943,648836 ==========
I think this is a fine approach. I suppose the next steps will be to use a similar timeout clamping for foreign fetch? Agree that CONTINUE_ON_TIMEOUT didn't seem intuitive but it should be OK as long as the idle timeout is working. I'm wondering about: > and I think it might even be possible to end up in a situation with all events expired and no idle_time_ set) That seems really bad, and would defeat this mechanism. But if it's working as intended, the idle_time_ is always set when the worker starts, and never gets reset to null until the worker stops. Do you see something else? Overall the entire service worker timeout code seems like it should be simpler, that's out of the scope of this patch of course. > As an aside, would it maybe make sense to split up service_worker_browsertest.cc in a couple of files? Yes, I think there's just too much friction to add a new file so everyone piles tests into the same file. https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_browsertest.cc (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_browsertest.cc:1261: const base::string16 messageMsg = base::ASCIIToUTF16("MESSAGE"); nit: activateMsg isn't used, and naming style should be kMessageMsg https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_browsertest.cc:1319: It'd be slightly nicer if this could verify that the worker STOPPED and isn't starting back up again, since STOPPING could still mean a StartWorker is queued up and waiting to restart. Do you know if the test can do that? https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_dispatcher_host.cc (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_dispatcher_host.cc:926: case SERVICE_WORKER_PROVIDER_FOR_CONTROLLER: { nit: maybe add some comment like "Clamp timeout to the sending worker's, to prevent postMessage from keeping workers alive forever." https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_dispatcher_host.cc:1104: sent_message_ports, source_info, callback, SERVICE_WORKER_ERROR_FAILED); Hm, TIMEOUT seems like a better error code here, since if we did barely have enough time and the worker took longer than 100 ms, we'd get a TIMEOUT error. https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_dispatcher_host.h (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_dispatcher_host.h:193: const base::Optional<base::TimeDelta>& timeout, super nit: would it be better for timeout to come before |callback| for consistency with DispatchExtendableMessageEventAfterStartWorker? https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_version.cc (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_version.cc:574: max_request_expiration_time_ = expiration_time; It seems we never reset max to an earlier time (e.g., when this requests finishes), so this is really an upper bound on the expiration time of all outstanding requests. Maybe the comment suggested above should be more precise then. https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_version.h (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_version.h:807: base::TimeTicks max_request_expiration_time_; nit: All these times are hard to keep track off so I'd prefer a comment like "the expiration time of the request with the latest expiration time".
The CQ bit was checked by mek@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/11/21 at 08:01:48, falken wrote: > I suppose the next steps will be to use a similar timeout clamping for foreign fetch? Yeah, working on that in https://codereview.chromium.org/2518523003 > Agree that CONTINUE_ON_TIMEOUT didn't seem intuitive but it should be OK as long as the idle timeout is working. I'm wondering about: One problem with CONTINUE_ON_TIMEOUT and IPC based events (such as currently the message event), is the code in EmbeddedWorkerRegistry::OnMessageReceived, which currently "Assume an unhandled message for a stopping worker is because the message was timed out and its handler removed prior to stopping.". With CONTINUE_ON_TIMEOUT it suddenly becomes reasonable to get messages for timed out requests where the worker isn't stopping... Not sure how to best deal with that. > > and I think it might even be possible to end up in a situation with all events expired and no idle_time_ set) > > That seems really bad, and would defeat this mechanism. But if it's working as intended, the idle_time_ is always set when the worker starts, and never gets reset to null until the worker stops. Do you see something else? Actually I was wrong. I was misreading some of the code, and misinterpreting test flakiness I was seeing in my attempt at adding a browsertest for this (where sometimes the worker wouldn't actually stop). So yeah, I'm pretty sure idle_time_ should be working correctly indeed. > Overall the entire service worker timeout code seems like it should be simpler, that's out of the scope of this patch of course. Agreed. https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_browsertest.cc (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_browsertest.cc:1319: On 2016/11/21 at 08:01:48, falken ooo on Dec 1 wrote: > It'd be slightly nicer if this could verify that the worker STOPPED and isn't starting back up again, since STOPPING could still mean a StartWorker is queued up and waiting to restart. Do you know if the test can do that? Unfortunately I couldn't even figure out how to make the existing assertions not flakily fail (the test as written seemed to fail about 1% of the time) due to timing issues with how far the renderer progresses while the browser is doing things here... After trying all kinds of things I've just deleted this entire browser test, although it would be nice to have some kind of more end-to-end test than the unit tests... Just not sure how to reliably write one.. https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_dispatcher_host.cc (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_dispatcher_host.cc:926: case SERVICE_WORKER_PROVIDER_FOR_CONTROLLER: { On 2016/11/21 at 08:01:48, falken ooo on Dec 1 wrote: > nit: maybe add some comment like "Clamp timeout to the sending worker's, to prevent postMessage from keeping workers alive forever." Done https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_dispatcher_host.cc:1104: sent_message_ports, source_info, callback, SERVICE_WORKER_ERROR_FAILED); On 2016/11/21 at 08:01:48, falken ooo on Dec 1 wrote: > Hm, TIMEOUT seems like a better error code here, since if we did barely have enough time and the worker took longer than 100 ms, we'd get a TIMEOUT error. Good point, done. https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_dispatcher_host.h (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_dispatcher_host.h:193: const base::Optional<base::TimeDelta>& timeout, On 2016/11/21 at 08:01:48, falken ooo on Dec 1 wrote: > super nit: would it be better for timeout to come before |callback| for consistency with DispatchExtendableMessageEventAfterStartWorker? Done https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... File content/browser/service_worker/service_worker_version.h (right): https://codereview.chromium.org/2506263005/diff/40001/content/browser/service... content/browser/service_worker/service_worker_version.h:807: base::TimeTicks max_request_expiration_time_; On 2016/11/21 at 08:01:48, falken ooo on Dec 1 wrote: > nit: All these times are hard to keep track off so I'd prefer a comment like "the expiration time of the request with the latest expiration time". Done
The CQ bit was checked by mek@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by mek@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
This patch lgtm. "One problem with CONTINUE_ON_TIMEOUT and IPC based events (such as currently the message event), is the code in EmbeddedWorkerRegistry::OnMessageReceived, which currently "Assume an unhandled message for a stopping worker is because the message was timed out and its handler removed prior to stopping.". With CONTINUE_ON_TIMEOUT it suddenly becomes reasonable to get messages for timed out requests where the worker isn't stopping... Not sure how to best deal with that." Good point. I wondering about the value and correctness of the whole BadMessageReceived in ServiceWorkerDispatcherHost::OnMessageReceived now. The contract of OnMessageReceived is to return false when you can't handle the message, so I don't understand why it's correct to BadMessageReceive when we get an IPC message we can't handle. Presumably there can be multiple filters and some other filter might handle the message. It might make sense to just delete that BadMessageReceived code? https://codereview.chromium.org/2506263005/diff/120001/content/browser/servic... File content/browser/service_worker/service_worker_version.h (right): https://codereview.chromium.org/2506263005/diff/120001/content/browser/servic... content/browser/service_worker/service_worker_version.h:799: // expiration times of finished requests. Thanks for the precise comment.
OK I now learned about ServiceWorkerDisaptcherHost::kFilteredMessageClasses, so the BadMessageReceived is not necessarily incorrect. It still might not be worth keeping it since it's difficult to keep track of all timed out requests. If we get many SWDH_NOT_HANDLED kills after this patch, we might disable the BadMessageReceived. Maybe for now we could add a comment to the BadMessageReceived call saying this can legitimately be triggered for timed out requests with CONTINUE_ON_TIMEOUT behavior.
On 2016/12/05 at 03:32:22, falken wrote: > OK I now learned about ServiceWorkerDisaptcherHost::kFilteredMessageClasses, so the BadMessageReceived is not necessarily incorrect. It still might not be worth keeping it since it's difficult to keep track of all timed out requests. If we get many SWDH_NOT_HANDLED kills after this patch, we might disable the BadMessageReceived. Maybe for now we could add a comment to the BadMessageReceived call saying this can legitimately be triggered for timed out requests with CONTINUE_ON_TIMEOUT behavior. I think I'll just leave the BadMessageReceived alone for now. I believe the CL to change message events to use mojo is pretty close to landing, so for this particular case the problem should go away soon. And eventually it'll all go away entirely, so probably indeed not worth the effort to try to fix "properly".
The CQ bit was checked by mek@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 120001, "attempt_start_ts": 1480962920875860, "parent_rev": "a857ef03e84feccbebd9b7b95945b613f8e7efad", "commit_rev": "1c3c0932c53a559b80c8f661b80c8b2f64ae970a"}
Message was sent while issue was closed.
Committed patchset #6 (id:120001)
Message was sent while issue was closed.
Description was changed from ========== Prevent a service worker from keeping itself alive by self postMessage. This changes the timeout for message events when the message was sent from a service worker to be the curent timeout of that service worker. So while a service worker can still post message to itself, this won't allow it to stay alive any longer than a single event would. BUG=647943,648836 ========== to ========== Prevent a service worker from keeping itself alive by self postMessage. This changes the timeout for message events when the message was sent from a service worker to be the curent timeout of that service worker. So while a service worker can still post message to itself, this won't allow it to stay alive any longer than a single event would. BUG=647943,648836 Committed: https://crrev.com/f88b52ec1e4aebd7c1f28b13a58bfad92d04ba94 Cr-Commit-Position: refs/heads/master@{#436393} ==========
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/f88b52ec1e4aebd7c1f28b13a58bfad92d04ba94 Cr-Commit-Position: refs/heads/master@{#436393} |