|
|
Chromium Code Reviews|
Created:
4 years, 1 month ago by Charlie Harrison Modified:
4 years, 1 month ago CC:
chromium-reviews, jam, nasko+codewatch_chromium.org, darin-cc_chromium.org, creis+watch_chromium.org, site-isolation-reviews_chromium.org Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionKill vestigial transfer requests
When requests are marked as transferring, we mark them as protected in
the ResourceDispatcherHostImpl, so when their RFH goes away we won't
lose the request.
However, there are cases where the transfer can fail, and the navigation
abandoned, while the request on the IO thread is still marked as
protected.
This patch catches that case by checking for transferring navigation
handles in ~RFH, and cancelling them there.
Browser test written by lukasza@chromium.org
BUG=657195
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
Committed: https://crrev.com/f2466b80b0cd892d31fdf69a397905eaacbfb3b5
Cr-Commit-Position: refs/heads/master@{#433430}
Patch Set 1 #
Total comments: 1
Patch Set 2 : PostTask in ~NavigationHandle #Patch Set 3 : make is_transfering_ consistent #Patch Set 4 : test from lukasza #
Total comments: 10
Patch Set 5 : comments #
Messages
Total messages: 43 (29 generated)
Description was changed from ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. BUG=657195 ========== to ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. BUG=657195 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation ==========
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
csharrison@chromium.org changed reviewers: + creis@chromium.org
Hey Charlie, Here is a provisional solution for the thumbs up extensions problem. I'm struggling a bit to write a browsertest to catch this though. I'm thinking possibly of blackholing navigation requests from the renderer, so we can kill the RFH before it receives a navigation request. I'm not quite sure if that's possible though.
creis@chromium.org changed reviewers: + lukasza@chromium.org
[+lukasza] Many thanks for looking into this! This seems plausible to me. There's probably other ways for NavigationHandles to be deleted than RFH destruction, but hopefully they won't take a transferring request with them in the other cases? If that's right, LGTM once we add a test. On 2016/11/18 03:55:19, Charlie Harrison wrote: > Hey Charlie, > Here is a provisional solution for the thumbs up extensions problem. I'm > struggling a bit to write a browsertest to catch this though. I know lukasza@ was also trying to write a test for this-- maybe he can share his progress? > I'm thinking possibly of blackholing navigation requests from the renderer, so > we can kill the RFH before it receives a navigation request. I'm not quite sure > if that's possible though. Yeah, TestNavigationManager is the closest thing I can think of, and that's not quite the same thing. https://codereview.chromium.org/2508973006/diff/1/content/browser/frame_host/... File content/browser/frame_host/render_frame_host_impl.cc (right): https://codereview.chromium.org/2508973006/diff/1/content/browser/frame_host/... content/browser/frame_host/render_frame_host_impl.cc:207: if (transfer_request_id.child_id != -1) { Let's add a comment saying that we need to clean up any outstanding transfers since they don't otherwise get deleted in CancelRequestsForRoute when the frame goes away.
On 2016/11/18 04:52:24, Charlie Reis (away) wrote: > [+lukasza] > > Many thanks for looking into this! > > This seems plausible to me. There's probably other ways for NavigationHandles > to be deleted than RFH destruction, but hopefully they won't take a transferring > request with them in the other cases? > > If that's right, LGTM once we add a test. I'm fairly sure but it's hard to be 100%. Maybe the PostTask should be in ~NavigationHandleImpl? > > On 2016/11/18 03:55:19, Charlie Harrison wrote: > > Hey Charlie, > > Here is a provisional solution for the thumbs up extensions problem. I'm > > struggling a bit to write a browsertest to catch this though. > > I know lukasza@ was also trying to write a test for this-- maybe he can share > his progress? > > > I'm thinking possibly of blackholing navigation requests from the renderer, so > > we can kill the RFH before it receives a navigation request. I'm not quite > sure > > if that's possible though. > > Yeah, TestNavigationManager is the closest thing I can think of, and that's not > quite the same thing. Yeah we have to pause the request before we get the navigate message entirely. all the navigation throttles come after the request is picked up by the c/b/l stack. > > https://codereview.chromium.org/2508973006/diff/1/content/browser/frame_host/... > File content/browser/frame_host/render_frame_host_impl.cc (right): > > https://codereview.chromium.org/2508973006/diff/1/content/browser/frame_host/... > content/browser/frame_host/render_frame_host_impl.cc:207: if > (transfer_request_id.child_id != -1) { > Let's add a comment saying that we need to clean up any outstanding transfers > since they don't otherwise get deleted in CancelRequestsForRoute when the frame > goes away. Ack
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
Thank you very much for digging into this! On 2016/11/18 14:54:26, Charlie Harrison wrote: > On 2016/11/18 04:52:24, Charlie Reis (away) wrote: > > [+lukasza] > > > > Many thanks for looking into this! > > > > This seems plausible to me. There's probably other ways for NavigationHandles > > to be deleted than RFH destruction, but hopefully they won't take a > transferring > > request with them in the other cases? > > > > If that's right, LGTM once we add a test. > I'm fairly sure but it's hard to be 100%. Maybe the PostTask should be in > ~NavigationHandleImpl? +1 - cleaning up from within navigation handle's destructor sounds like a good idea to me. > > On 2016/11/18 03:55:19, Charlie Harrison wrote: > > > Hey Charlie, > > > Here is a provisional solution for the thumbs up extensions problem. I'm > > > struggling a bit to write a browsertest to catch this though. > > > > I know lukasza@ was also trying to write a test for this-- maybe he can share > > his progress? > > > > > I'm thinking possibly of blackholing navigation requests from the renderer, > so > > > we can kill the RFH before it receives a navigation request. I'm not quite > > sure > > > if that's possible though. > > > > Yeah, TestNavigationManager is the closest thing I can think of, and that's > not > > quite the same thing. > Yeah we have to pause the request before we get the navigate message entirely. > all > the navigation throttles come after the request is picked up by the c/b/l stack. > > So - what I was missing in my attempts to come up with a repro test was the requirement to destroy the navigation handle *while* is_transferring_ is true. I've talked with nasko@ and I think we have a promising plan for coming up with a browser test. I'll try to work on that and reply with status update later today. The plan is more or less to: 1. load a page that will trigger a transfer via a subframe 2. use TestNavigationManager to wait for WillProcessResponse and pause the navigation at that point 3. *post* a task that will destroy the frame (or the whole frame tree) 4. use TestNavigationManager to resume the navigation and wait for it to complete - at that point NavigationHandle will flip is_transferring_ to true and will start waiting for OnDidStartProvisionalLoad to flip it back to false. - Waiting in TestNavigationManager will spin up a message loop which should execute the task we posted in step3 and hopefully repro the problem. 5. do another navigation to the same URI and observe whether a 20 seconds delay happens > > > https://codereview.chromium.org/2508973006/diff/1/content/browser/frame_host/... > > File content/browser/frame_host/render_frame_host_impl.cc (right): > > > > > https://codereview.chromium.org/2508973006/diff/1/content/browser/frame_host/... > > content/browser/frame_host/render_frame_host_impl.cc:207: if > > (transfer_request_id.child_id != -1) { > > Let's add a comment saying that we need to clean up any outstanding transfers > > since they don't otherwise get deleted in CancelRequestsForRoute when the > frame > > goes away. > Ack
Thanks! I don't have too much time to work on the test today (I am network triager today and monday), so I appreciate your help there. I am working on figuring out why tests are failing currently with the post task in ~NavigationHandleImpl. It seems like in browser tests we can get the handle in the DID_COMMIT state and still have is_transfering_ be true. This seems wrong so I'll try to see why it's happening, as branching on state_ in the destructor feels like it shouldn't be necessary.
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. BUG=657195 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation ========== to ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. Browser test written by lukasza@chromium.org BUG=657195 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation ==========
Looks like bots are reasonably happy. lukasza@, I'll wait for your l-g-t-m before shipping.
For the record, updated fix still LGTM. Thanks for the test, Łukasz! One question about a flakiness concern, but it may be ok to land (depending on the answer). https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... File content/browser/site_per_process_browsertest.cc (right): https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8600: // hopefully before the transfer completes (i.e. before we receive "Hopefully" sounds flaky to me. :) If so, is it just that it will always succeed with the fix in place and usually (but not always) fail without the fix? Or is it not actually flaky?
nasko@chromium.org changed reviewers: + nasko@chromium.org
A couple of drive-by comments on the test. https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... File content/browser/site_per_process_browsertest.cc (right): https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8571: FrameTreeDestroyedInMiddleOfTransfer) { Since this test is specifically targeting transfers, might be good to return early when running in PlzNavigate mode, as there are no transfers in that mode. https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8614: // hit the 20 seconds delay before https://crbug.com/657195 was fixed. Why not check actually the time it takes? What if the machine is slow and the test timeout is actually set to more than 20 seconds?
lukasza@chromium.org changed reviewers: - nasko@chromium.org
LGTM It took me a while to understand why the calls to set_is_transferring(false) are needed in the 2 new places, but I understand them now and they indeed look necessary and correct. https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... File content/browser/site_per_process_browsertest.cc (right): https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8600: // hopefully before the transfer completes (i.e. before we receive On 2016/11/19 00:03:14, Charlie Reis (away) wrote: > "Hopefully" sounds flaky to me. :) If so, is it just that it will always > succeed with the fix in place and usually (but not always) fail without the fix? > Or is it not actually flaky? Maybe "hopefully" is the wrong way here. The way things work today, we are guaranteed the right ordering (the ordering that used to trigger the bug): 1. Before we resume the navigation (on line 8610), we have not yet initiated the transfer / we have not yet sent an IPC to the new renderer. 2. Because of 1, the task we post on line 8602 is guaranteed to be queued by the message loop *before* the IPC response for DidStartProvisionalLoad.
https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... File content/browser/site_per_process_browsertest.cc (right): https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8571: FrameTreeDestroyedInMiddleOfTransfer) { On 2016/11/19 00:16:21, nasko wrote: > Since this test is specifically targeting transfers, might be good to return > early when running in PlzNavigate mode, as there are no transfers in that mode. That might be a good idea. csharrison@ - WDYT? Would you be able to make that change? https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8600: // hopefully before the transfer completes (i.e. before we receive On 2016/11/19 00:16:38, Łukasz Anforowicz wrote: > On 2016/11/19 00:03:14, Charlie Reis (away) wrote: > > "Hopefully" sounds flaky to me. :) If so, is it just that it will always > > succeed with the fix in place and usually (but not always) fail without the > fix? > > Or is it not actually flaky? > > Maybe "hopefully" is the wrong way here. I meant s/wrong way/wrong word/ > The way things work today, we are > guaranteed the right ordering (the ordering that used to trigger the bug): > > 1. Before we resume the navigation (on line 8610), we have not yet initiated the > transfer / we have not yet sent an IPC to the new renderer. > > 2. Because of 1, the task we post on line 8602 is guaranteed to be queued by the > message loop *before* the IPC response for DidStartProvisionalLoad. https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8614: // hit the 20 seconds delay before https://crbug.com/657195 was fixed. On 2016/11/19 00:16:21, nasko wrote: > Why not check actually the time it takes? What if the machine is slow and the > test timeout is actually set to more than 20 seconds? Hmmmm... good point. OTOH, I am a bit wary of making the test depend on timing. FWIW, in that case the problem would still be caught by a DCHECK (see https://crbug.com/657195#c31).
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks all https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... File content/browser/site_per_process_browsertest.cc (right): https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8571: FrameTreeDestroyedInMiddleOfTransfer) { On 2016/11/19 00:24:59, Łukasz Anforowicz wrote: > On 2016/11/19 00:16:21, nasko wrote: > > Since this test is specifically targeting transfers, might be good to return > > early when running in PlzNavigate mode, as there are no transfers in that > mode. > > That might be a good idea. csharrison@ - WDYT? Would you be able to make that > change? Done. https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8600: // hopefully before the transfer completes (i.e. before we receive On 2016/11/19 00:24:59, Łukasz Anforowicz wrote: > On 2016/11/19 00:16:38, Łukasz Anforowicz wrote: > > On 2016/11/19 00:03:14, Charlie Reis (away) wrote: > > > "Hopefully" sounds flaky to me. :) If so, is it just that it will always > > > succeed with the fix in place and usually (but not always) fail without the > > fix? > > > Or is it not actually flaky? > > > > Maybe "hopefully" is the wrong way here. > I meant s/wrong way/wrong word/ > > > The way things work today, we are > > guaranteed the right ordering (the ordering that used to trigger the bug): > > > > 1. Before we resume the navigation (on line 8610), we have not yet initiated > the > > transfer / we have not yet sent an IPC to the new renderer. > > > > 2. Because of 1, the task we post on line 8602 is guaranteed to be queued by > the > > message loop *before* the IPC response for DidStartProvisionalLoad. > Updated the comment to be a bit more specific. https://codereview.chromium.org/2508973006/diff/60001/content/browser/site_pe... content/browser/site_per_process_browsertest.cc:8614: // hit the 20 seconds delay before https://crbug.com/657195 was fixed. On 2016/11/19 00:24:59, Łukasz Anforowicz wrote: > On 2016/11/19 00:16:21, nasko wrote: > > Why not check actually the time it takes? What if the machine is slow and the > > test timeout is actually set to more than 20 seconds? > > Hmmmm... good point. OTOH, I am a bit wary of making the test depend on timing. > FWIW, in that case the problem would still be caught by a DCHECK (see > https://crbug.com/657195#c31). I agree, I don't want to add timing to this suite. I confirmed that we AssertNoURLRequests() if this test fails but does not time out, so I will just add that as a comment.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by csharrison@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from lukasza@chromium.org, creis@chromium.org Link to the patchset: https://codereview.chromium.org/2508973006/#ps80001 (title: "comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 80001, "attempt_start_ts": 1479581356329980,
"parent_rev": ["1a094197d3018ce32c8fa6ad75c0192a0c1db28e", null], "commit_rev":
["fe97853c0d16b21c9e9864401c19f50058cc8f81", null]}
Message was sent while issue was closed.
Description was changed from ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. Browser test written by lukasza@chromium.org BUG=657195 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation ========== to ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. Browser test written by lukasza@chromium.org BUG=657195 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001)
Message was sent while issue was closed.
Description was changed from ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. Browser test written by lukasza@chromium.org BUG=657195 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation ========== to ========== Kill vestigial transfer requests When requests are marked as transferring, we mark them as protected in the ResourceDispatcherHostImpl, so when their RFH goes away we won't lose the request. However, there are cases where the transfer can fail, and the navigation abandoned, while the request on the IO thread is still marked as protected. This patch catches that case by checking for transferring navigation handles in ~RFH, and cancelling them there. Browser test written by lukasza@chromium.org BUG=657195 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation Committed: https://crrev.com/f2466b80b0cd892d31fdf69a397905eaacbfb3b5 Cr-Commit-Position: refs/heads/master@{#433430} ==========
Message was sent while issue was closed.
Patchset 5 (id:??) landed as https://crrev.com/f2466b80b0cd892d31fdf69a397905eaacbfb3b5 Cr-Commit-Position: refs/heads/master@{#433430} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
