|
|
Chromium Code Reviews
DescriptionClean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete
Background: When a server advertises alternative services support,
JobController creates two Jobs (|main_job_| and |alternative_job_|).
The controller blocks |main_job_| when connecting the
|alternative_job_|. It resumes the |main_job_| when |alternative_job_|
fails or when the timeout happens. When both jobs are destroyed, the
JobController goes away.
Bug: We currently leak JobController in the situation where
|alternative_job_| succeeds and |main_job_| is blocked on alt job.
OnRequestComplete() will be called which then destroys the alt job but
not the main job. This causes MaybeNotifyFactoryOfCompletion() to not
clean up the controller.
Fix: This CL fixes this leak by resetting |main_job_| if it is blocked
(i.e. hasn't started connecting) when |alternative_job_| succeeds. This
CL adds a regression test for this.
Note that there is no leak when orphaned job completes, because
"job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a
|!request_| check so it's obvious that we always clean up controller in
the orphaned case. This CL adds a test to confirm that we did clean up
controller when orphaned job completes (which passes with and without
the change).
BUG=678768
Review-Url: https://codereview.chromium.org/2619583002
Cr-Commit-Position: refs/heads/master@{#442556}
Committed: https://chromium.googlesource.com/chromium/src/+/e368398a3a5357a8742f3829ef08cc45080fb808
Patch Set 1 #
Total comments: 3
Patch Set 2 : Attempt 2. Need to write tests #Patch Set 3 : self review and need tests #Patch Set 4 : self review and need tests #Patch Set 5 : Add one regression test and one functional test for the orphaned case #Patch Set 6 : self review #
Total comments: 12
Patch Set 7 : Matt's comments #
Total comments: 2
Patch Set 8 : one more find and replace #
Total comments: 8
Patch Set 9 : address wez@ comment #
Messages
Total messages: 56 (31 generated)
xunjieli@chromium.org changed reviewers: + mmenke@chromium.org, rch@chromium.org, zhongyi@chromium.org
I am not sure if this is the right way to solve the problem. It will at least fix the problem that I am seeing locally where there are a big number of blocked main jobs in Chrome's memory. PTAL. Thanks!
The CQ bit was checked by xunjieli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2619583002/diff/1/net/http/http_stream_factor... File net/http/http_stream_factory_impl_job_controller_unittest.cc (right): https://codereview.chromium.org/2619583002/diff/1/net/http/http_stream_factor... net/http/http_stream_factory_impl_job_controller_unittest.cc:426: resolver.pending_jobs()[0]->CompleteNow(net::OK); Note: The existing test fixture is a bit weird. QUIC proxy doesn't work yet, and here we use QUIC to speak to a non-QUIC proxy, which is prohibited. Therefore we will end up getting ERR_NO_SUPPORTED_PROXIES in HttpStreamFactoryImpl::Job::DoResolveProxyComplete for the alt job, but we forcefully call job_controller_->OnStreamReady() for the alt job in the test.. If we run the message loop, we will later get OnStreamFailed() for the alt job because of ERR_NO_SUPPORTED_PROXIES. I find this setup weird, but I don't know a way to work around it.
wez@chromium.org changed reviewers: + wez@chromium.org
https://codereview.chromium.org/2619583002/diff/1/net/http/http_stream_factor... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/1/net/http/http_stream_factor... net/http/http_stream_factory_impl_job_controller.cc:149: main_job_.reset(); So IIUC the issue was that MaybeNotifyFactoryOfCompletion() would never notify the factory if the alternative job completed, because the main_job is still non-null, hence the JobController and Job, etc would be leaked? Can we make that clearer in the CL description?
I believe the old behavior here was deliberate, since we wanted to let the QUIC job continue, at least (Not sure if we also want to let the non-QUIC job continue). So if we want to keep the orphan job behavior, seems like we only want to keep around jobs that have actually started connecting. If we just do that, though, looks to me like there's another way to leak this: JobController::OnRequestComplete deletes the bound job and clears bound_job_. It looks like most JobController methods don't correctly clean things up when the non-bound job calls back into it in this case. Looking at the main one, JobController::OnStreamReady - it checks if "job_bound_ && bound_job_ != job" and if so, calls OnOrphanedJobComplete, but once bound_job_ has been cleared, it misses that path. It does call job->ReleaseStream(), so the stream gets destroyed, but I don't see anything that destroys the job or the JobController.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2619583002/diff/1/net/http/http_stream_factor... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/1/net/http/http_stream_factor... net/http/http_stream_factory_impl_job_controller.cc:149: main_job_.reset(); On 2017/01/06 18:21:44, Wez wrote: > So IIUC the issue was that MaybeNotifyFactoryOfCompletion() would never notify > the factory if the alternative job completed, because the main_job is still > non-null, hence the JobController and Job, etc would be leaked? > > Can we make that clearer in the CL description? A big +1 to that. A longer CL description that covers the bug, how it manifested, and what edge cases are being resolved would be a great idea :)
Description was changed from ========== Reset Main and Alt Jobs in HttpStreamiImpl::JobController::OnRequestComplete() This CL fixes a leak where the alternative job succeeds and the main job is still sitting around because the main job is blocked by the alternative job. This CL adds a regression test. BUG=678768 ========== to ========== Reset Main and Alt Jobs in HttpStreamiImpl::JobController::OnRequestComplete() This CL fixes a leak where the alternative job succeeds and the main job is still sitting around because the main job is blocked by the alternative job. FIXME: WRITE BETTER DESCRIPTION This CL adds a regression test. BUG=678768 ==========
Description was changed from ========== Reset Main and Alt Jobs in HttpStreamiImpl::JobController::OnRequestComplete() This CL fixes a leak where the alternative job succeeds and the main job is still sitting around because the main job is blocked by the alternative job. FIXME: WRITE BETTER DESCRIPTION This CL adds a regression test. BUG=678768 ========== to ========== Destroy HttpStreamFactoryImpl::JobController when HttpStreamFactoryImpl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|. The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ==========
Description was changed from ========== Destroy HttpStreamFactoryImpl::JobController when HttpStreamFactoryImpl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|. The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ========== to ========== Destroy HttpStreamFactoryImpl::JobController when HttpStreamFactoryImpl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|. The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ==========
Description was changed from ========== Destroy HttpStreamFactoryImpl::JobController when HttpStreamFactoryImpl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|. The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ========== to ========== Cleanup HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|. The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ==========
Description was changed from ========== Cleanup HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|. The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ========== to ========== Cleanup HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ==========
Description was changed from ========== Cleanup HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ==========
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When servers advertise alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ==========
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. There are two branches where we can leak JobController: (1) When |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This CL checks to see if the |main_job_| is blocked, if so, resets it since it hasn't started connecting. (2) When orphaned job completes after OnRequestComplete(), |bound_job_| was cleared, so we won't call OnOrphanedJobComplete() which is supposed to clean up the controller. This CL modifies the condition with which we call OnOrphanedJobComplete() to make sure that controller is cleaned up if |request_| has completed. This CL adds regression tests. BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so the condition is obvious that there is no leak. This CL adds a test to confirm that we did clean up controller when orphaned job completes. BUG=678768 ==========
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so the condition is obvious that there is no leak. This CL adds a test to confirm that we did clean up controller when orphaned job completes. BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so the condition is obvious that there is no leak. This CL adds a test to confirm that we did clean up controller when orphaned job completes. BUG=678768 ==========
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so the condition is obvious that there is no leak. This CL adds a test to confirm that we did clean up controller when orphaned job completes. BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case.This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ==========
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case.This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case.This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ==========
On 2017/01/06 18:43:58, mmenke wrote: > I believe the old behavior here was deliberate, since we wanted to let the QUIC > job continue, at least (Not sure if we also want to let the non-QUIC job > continue). > > So if we want to keep the orphan job behavior, seems like we only want to keep > around jobs that have actually started connecting. If we just do that, though, > looks to me like there's another way to leak this: > JobController::OnRequestComplete deletes the bound job and clears bound_job_. > It looks like most JobController methods don't correctly clean things up when > the non-bound job calls back into it in this case. Looking at the main one, > JobController::OnStreamReady - it checks if "job_bound_ && bound_job_ != job" > and if so, calls OnOrphanedJobComplete, but once bound_job_ has been cleared, it > misses that path. It does call job->ReleaseStream(), so the stream gets > destroyed, but I don't see anything that destroys the job or the JobController. Thanks everyone for the review and Matt especially for the ideas and helpful discussion! I have uploaded a new patch (#5) with a regression test. I also edited CL description. Matt: "job_bound_ && bound_job_ != job" will evaluate to true in the orphaned case because nullptr != job and |job_bound_| is never set back to false. I have modified the condition so it's clear and I added a test to confirm the behavior. PTAL. Thanks!
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. There is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case.This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ==========
The CQ bit was checked by xunjieli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on it. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on alt job. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ==========
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on alt job. OnRequestComplete() will be called which then destroys alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on alt job. OnRequestComplete() will be called which then destroys the alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ==========
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on alt job. OnRequestComplete() will be called which then destroys the alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. Bug: We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on alt job. OnRequestComplete() will be called which then destroys the alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. Fix: This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The non-test changes LGTM. I'm hoping rch or zhongyi will chime in and review the tests, so I don't have to figure out the test fixture. https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) Should this be: if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero()))? If main_job_wait_time_ is not zero, it means the job hasn't been started yet (It's either waiting to be resumed, or still in the proxy resolution phase)...Assuming there's no case where we set it to some non-zero value after a job's been started. https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:368: if (!request_ || (job_bound_ && bound_job_ != job)) { Suggest a helper method here, IsJobOrphaned(job) const, and using it in place of all these checks.
https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 18:37:20, mmenke wrote: > Should this be: > > if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero()))? > > If main_job_wait_time_ is not zero, it means the job hasn't been started yet > (It's either waiting to be resumed, or still in the proxy resolution > phase)...Assuming there's no case where we set it to some non-zero value after a > job's been started. In HttpStreamFactoryImpl::Job::DoInitConnectionImpl(), it's possible that we set main_job_wait_time_ to zero or some non-zero value. Is |main_job_is_blocked_| check not enough? It translates to the same idea right (job not started or waiting for resumed)?
https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 18:52:31, xunjieli wrote: > On 2017/01/09 18:37:20, mmenke wrote: > > Should this be: > > > > if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero()))? > > > > If main_job_wait_time_ is not zero, it means the job hasn't been started yet > > (It's either waiting to be resumed, or still in the proxy resolution > > phase)...Assuming there's no case where we set it to some non-zero value after > a > > job's been started. > > In HttpStreamFactoryImpl::Job::DoInitConnectionImpl(), it's possible that we set > main_job_wait_time_ to zero or some non-zero value. Is |main_job_is_blocked_| > check not enough? It translates to the same idea right (job not started or > waiting for resumed)? It's not the same thing. We set main_job_is_blocked_ to false when we start the timer to actually start the blocked job. It seems weird to delay the start of a Job in favor of just letting the QUIC job complete, and when the QUIC job completes, starting the other job... If we're going to start the other job, regardless, why bother delaying it? It is enough to fix the leak, but I think the behavior is very weird and unexpected.
Thanks! https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 18:56:19, mmenke wrote: > On 2017/01/09 18:52:31, xunjieli wrote: > > On 2017/01/09 18:37:20, mmenke wrote: > > > Should this be: > > > > > > if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero()))? > > > > > > If main_job_wait_time_ is not zero, it means the job hasn't been started yet > > > (It's either waiting to be resumed, or still in the proxy resolution > > > phase)...Assuming there's no case where we set it to some non-zero value > after > > a > > > job's been started. > > > > In HttpStreamFactoryImpl::Job::DoInitConnectionImpl(), it's possible that we > set > > main_job_wait_time_ to zero or some non-zero value. Is |main_job_is_blocked_| > > check not enough? It translates to the same idea right (job not started or > > waiting for resumed)? > > It's not the same thing. We set main_job_is_blocked_ to false when we start the > timer to actually start the blocked job. It seems weird to delay the start of a > Job in favor of just letting the QUIC job complete, and when the QUIC job > completes, starting the other job... If we're going to start the other job, > regardless, why bother delaying it? > > It is enough to fix the leak, but I think the behavior is very weird and > unexpected. Done. Got it, Thanks! https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:368: if (!request_ || (job_bound_ && bound_job_ != job)) { On 2017/01/09 18:37:20, mmenke wrote: > Suggest a helper method here, IsJobOrphaned(job) const, and using it in place of > all these checks. Done.
https://codereview.chromium.org/2619583002/diff/120001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/120001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:305: if (!request_ || (job_bound_ && bound_job_ != job)) { IsJobOrphaned(job)
https://codereview.chromium.org/2619583002/diff/120001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/120001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:305: if (!request_ || (job_bound_ && bound_job_ != job)) { On 2017/01/09 19:20:22, mmenke wrote: > IsJobOrphaned(job) Done. Sorry about missing that one.
Thanks for helping fix the bug! https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 18:37:20, mmenke wrote: > Should this be: > > if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero()))? > > If main_job_wait_time_ is not zero, it means the job hasn't been started yet > (It's either waiting to be resumed, or still in the proxy resolution > phase)...Assuming there's no case where we set it to some non-zero value after a > job's been started. Umm, main_job_wait_time is only used to figure out how much time the main job should wait so that when resuming the main job, the timer is set appropriately, the wait time is usually set when QUIC job figures it out in InitConnection. |main_job_is_blocked_| is what we are looking at when we decide whether or not to block the main job if you refer to JobController::ShouldWait(). The case where |main_job_is_blocked_| = false, while we have non-zero wait time is that the JobController::ResumeMainJob() is posted but not executed due to the delayed task. I don't think we should reset main_job here as when ResumeMainJob is executed, the main_job_->Resume() will fire unless you canceled the posted task when resetting the main_job_. https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller_unittest.cc (right): https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller_unittest.cc:376: TEST_F(HttpStreamFactoryImplJobControllerTest, Cool! Thanks for those regression tests!
https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 20:09:45, Zhongyi Shi wrote: > On 2017/01/09 18:37:20, mmenke wrote: > > Should this be: > > > > if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero()))? > > > > If main_job_wait_time_ is not zero, it means the job hasn't been started yet > > (It's either waiting to be resumed, or still in the proxy resolution > > phase)...Assuming there's no case where we set it to some non-zero value after > a > > job's been started. > > Umm, main_job_wait_time is only used to figure out how much time the main job > should wait so that when resuming the main job, the timer is set appropriately, > the wait time is usually set when QUIC job figures it out in InitConnection. > |main_job_is_blocked_| is what we are looking at when we decide whether or not > to block the main job if you refer to JobController::ShouldWait(). > > The case where |main_job_is_blocked_| = false, while we have non-zero wait time > is that the JobController::ResumeMainJob() is posted but not executed due to the > delayed task. I don't think we should reset main_job here as when ResumeMainJob > is executed, the main_job_->Resume() will fire unless you canceled the posted > task when resetting the main_job_. That strikes me as unexpected - why are we delaying starting the job, if not because we won't need the job if the QUIC job succeeds?
/me drives-by. https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:150: // Do not reset |alternative_job_| here and let it run for completion. nit: As written this means "do not (reset job and let it run)" - might be clearer to say "Allow |alternative_job_| to run to completion, rather than resetting it." https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:160: if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero())) nit: Could we wrap the condition after the && up into a simpler getter, e.g. is_resume_main_job_posted(), to make this more readable? https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:436: bool is_job_orphaned = IsJobOrphaned(job); nit: Do you still need the local variable, since you just check it twice below, or could you replace those checks with IsJobOrphaned calls? Or is this also a value that needs to be cached in case the Job is deleted during |request_| notification?
https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 20:12:27, mmenke wrote: > On 2017/01/09 20:09:45, Zhongyi Shi wrote: > > On 2017/01/09 18:37:20, mmenke wrote: > > > Should this be: > > > > > > if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero()))? > > > > > > If main_job_wait_time_ is not zero, it means the job hasn't been started yet > > > (It's either waiting to be resumed, or still in the proxy resolution > > > phase)...Assuming there's no case where we set it to some non-zero value > after > > a > > > job's been started. > > > > Umm, main_job_wait_time is only used to figure out how much time the main job > > should wait so that when resuming the main job, the timer is set > appropriately, > > the wait time is usually set when QUIC job figures it out in InitConnection. > > |main_job_is_blocked_| is what we are looking at when we decide whether or not > > to block the main job if you refer to JobController::ShouldWait(). > > > > The case where |main_job_is_blocked_| = false, while we have non-zero wait > time > > is that the JobController::ResumeMainJob() is posted but not executed due to > the > > delayed task. I don't think we should reset main_job here as when > ResumeMainJob > > is executed, the main_job_->Resume() will fire unless you canceled the posted > > task when resetting the main_job_. > > That strikes me as unexpected - why are we delaying starting the job, if not > because we won't need the job if the QUIC job succeeds? We are delaying the tcp job because we want to give priority to QUIC job in the race so that QUIC job can finish the handshake so that QUIC could succeed. https://cs.chromium.org/chromium/src/net/quic/chromium/quic_stream_factory.cc... xunjie@: reading at the code, OnOrphanedJobComplete is called when the orphaned job completes. In the case where we set the main_job_is_blocked_ to false and post ResumeMainJob but executing yet, I am more concerned that if we reset the main_job_, when ResumeMainJob is executed, main_job_->Resume() will fire since the main_job is destroyed.
https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 23:18:49, Zhongyi Shi wrote: > On 2017/01/09 20:12:27, mmenke wrote: > > On 2017/01/09 20:09:45, Zhongyi Shi wrote: > > > On 2017/01/09 18:37:20, mmenke wrote: > > > > Should this be: > > > > > > > > if (main_job_ && (main_job_is_blocked_ || > !main_job_wait_time_.is_zero()))? > > > > > > > > If main_job_wait_time_ is not zero, it means the job hasn't been started > yet > > > > (It's either waiting to be resumed, or still in the proxy resolution > > > > phase)...Assuming there's no case where we set it to some non-zero value > > after > > > a > > > > job's been started. > > > > > > Umm, main_job_wait_time is only used to figure out how much time the main > job > > > should wait so that when resuming the main job, the timer is set > > appropriately, > > > the wait time is usually set when QUIC job figures it out in InitConnection. > > > |main_job_is_blocked_| is what we are looking at when we decide whether or > not > > > to block the main job if you refer to JobController::ShouldWait(). > > > > > > The case where |main_job_is_blocked_| = false, while we have non-zero wait > > time > > > is that the JobController::ResumeMainJob() is posted but not executed due to > > the > > > delayed task. I don't think we should reset main_job here as when > > ResumeMainJob > > > is executed, the main_job_->Resume() will fire unless you canceled the > posted > > > task when resetting the main_job_. > > > > That strikes me as unexpected - why are we delaying starting the job, if not > > because we won't need the job if the QUIC job succeeds? > > We are delaying the tcp job because we want to give priority to QUIC job in the > race so that QUIC job can finish the handshake so that QUIC could succeed. > > https://cs.chromium.org/chromium/src/net/quic/chromium/quic_stream_factory.cc... So what is fundamentally different between cases where we've started the timer and cases we have not?
https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 23:25:12, mmenke wrote: > On 2017/01/09 23:18:49, Zhongyi Shi wrote: > > On 2017/01/09 20:12:27, mmenke wrote: > > > On 2017/01/09 20:09:45, Zhongyi Shi wrote: > > > > On 2017/01/09 18:37:20, mmenke wrote: > > > > > Should this be: > > > > > > > > > > if (main_job_ && (main_job_is_blocked_ || > > !main_job_wait_time_.is_zero()))? > > > > > > > > > > If main_job_wait_time_ is not zero, it means the job hasn't been started > > yet > > > > > (It's either waiting to be resumed, or still in the proxy resolution > > > > > phase)...Assuming there's no case where we set it to some non-zero value > > > after > > > > a > > > > > job's been started. > > > > > > > > Umm, main_job_wait_time is only used to figure out how much time the main > > job > > > > should wait so that when resuming the main job, the timer is set > > > appropriately, > > > > the wait time is usually set when QUIC job figures it out in > InitConnection. > > > > |main_job_is_blocked_| is what we are looking at when we decide whether or > > not > > > > to block the main job if you refer to JobController::ShouldWait(). > > > > > > > > The case where |main_job_is_blocked_| = false, while we have non-zero wait > > > time > > > > is that the JobController::ResumeMainJob() is posted but not executed due > to > > > the > > > > delayed task. I don't think we should reset main_job here as when > > > ResumeMainJob > > > > is executed, the main_job_->Resume() will fire unless you canceled the > > posted > > > > task when resetting the main_job_. > > > > > > That strikes me as unexpected - why are we delaying starting the job, if not > > > because we won't need the job if the QUIC job succeeds? > > > > We are delaying the tcp job because we want to give priority to QUIC job in > the > > race so that QUIC job can finish the handshake so that QUIC could succeed. > > > > > https://cs.chromium.org/chromium/src/net/quic/chromium/quic_stream_factory.cc... > > So what is fundamentally different between cases where we've started the timer > and cases we have not? My understanding is: if we've started the timer, then the main job will start to connect if QUIC job doesn't complete in the grace period, and it becomes racing then. If we never started the timer, the QUIC job can run forward until it fails or pending and won't be interrupted.
LGTM! Thanks a lot for fixing this and adding new regression tests! https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:159: if (main_job_ && main_job_is_blocked_) On 2017/01/09 23:39:46, Zhongyi Shi wrote: > On 2017/01/09 23:25:12, mmenke wrote: > > On 2017/01/09 23:18:49, Zhongyi Shi wrote: > > > On 2017/01/09 20:12:27, mmenke wrote: > > > > On 2017/01/09 20:09:45, Zhongyi Shi wrote: > > > > > On 2017/01/09 18:37:20, mmenke wrote: > > > > > > Should this be: > > > > > > > > > > > > if (main_job_ && (main_job_is_blocked_ || > > > !main_job_wait_time_.is_zero()))? > > > > > > > > > > > > If main_job_wait_time_ is not zero, it means the job hasn't been > started > > > yet > > > > > > (It's either waiting to be resumed, or still in the proxy resolution > > > > > > phase)...Assuming there's no case where we set it to some non-zero > value > > > > after > > > > > a > > > > > > job's been started. > > > > > > > > > > Umm, main_job_wait_time is only used to figure out how much time the > main > > > job > > > > > should wait so that when resuming the main job, the timer is set > > > > appropriately, > > > > > the wait time is usually set when QUIC job figures it out in > > InitConnection. > > > > > |main_job_is_blocked_| is what we are looking at when we decide whether > or > > > not > > > > > to block the main job if you refer to JobController::ShouldWait(). > > > > > > > > > > The case where |main_job_is_blocked_| = false, while we have non-zero > wait > > > > time > > > > > is that the JobController::ResumeMainJob() is posted but not executed > due > > to > > > > the > > > > > delayed task. I don't think we should reset main_job here as when > > > > ResumeMainJob > > > > > is executed, the main_job_->Resume() will fire unless you canceled the > > > posted > > > > > task when resetting the main_job_. > > > > > > > > That strikes me as unexpected - why are we delaying starting the job, if > not > > > > because we won't need the job if the QUIC job succeeds? > > > > > > We are delaying the tcp job because we want to give priority to QUIC job in > > the > > > race so that QUIC job can finish the handshake so that QUIC could succeed. > > > > > > > > > https://cs.chromium.org/chromium/src/net/quic/chromium/quic_stream_factory.cc... > > > > So what is fundamentally different between cases where we've started the timer > > and cases we have not? > > My understanding is: if we've started the timer, then the main job will start to > connect if QUIC job doesn't complete in the grace period, and it becomes racing > then. If we never started the timer, the QUIC job can run forward until it fails > or pending and won't be interrupted. Looks like both jobs and request will be reset here which means the JobController will then be destroyed. Since we used WeakPtr when post ResumeMainJob. This should have no problem. Sorry for introducing confusions.
Thanks everyone. I will check in this now. Please feel free to leave comments and I will address those as needed in follow-up CLs. https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:150: // Do not reset |alternative_job_| here and let it run for completion. On 2017/01/09 22:21:54, Wez wrote: > nit: As written this means "do not (reset job and let it run)" - might be > clearer to say "Allow |alternative_job_| to run to completion, rather than > resetting it." Done. https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:160: if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero())) On 2017/01/09 22:21:54, Wez wrote: > nit: Could we wrap the condition after the && up into a simpler getter, e.g. > is_resume_main_job_posted(), to make this more readable? The condition is used in exactly one place. I'll keep it as it is if you don't strongly about making a wrapper method. https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:436: bool is_job_orphaned = IsJobOrphaned(job); On 2017/01/09 22:21:54, Wez wrote: > nit: Do you still need the local variable, since you just check it twice below, > or could you replace those checks with IsJobOrphaned calls? Or is this also a > value that needs to be cached in case the Job is deleted during |request_| > notification? I think it's okay not to cache the check. Since I am no expert in this code, I don't want to introduce any unrelated behavior change accidentally. Let's keep it as it is.
The CQ bit was checked by xunjieli@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mmenke@chromium.org, zhongyi@chromium.org Link to the patchset: https://codereview.chromium.org/2619583002/#ps160001 (title: "address wez@ comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... File net/http/http_stream_factory_impl_job_controller.cc (right): https://codereview.chromium.org/2619583002/diff/140001/net/http/http_stream_f... net/http/http_stream_factory_impl_job_controller.cc:160: if (main_job_ && (main_job_is_blocked_ || !main_job_wait_time_.is_zero())) On 2017/01/10 01:31:54, xunjieli wrote: > On 2017/01/09 22:21:54, Wez wrote: > > nit: Could we wrap the condition after the && up into a simpler getter, e.g. > > is_resume_main_job_posted(), to make this more readable? > > The condition is used in exactly one place. I'll keep it as it is if you don't > strongly about making a wrapper method. My suggestion was for readability; in general where you're inferring a condition X through a complex conditional depending on Y and Z, it's helpful to wrap the condition in a getter, or assign the result to a suitably-named local variable, for clarity. Up to you, though.
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
The CQ bit was checked by xunjieli@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 160001, "attempt_start_ts": 1484050264623800,
"parent_rev": "a3905548d975433047cc719565fe42393d6b022c", "commit_rev":
"e368398a3a5357a8742f3829ef08cc45080fb808"}
Message was sent while issue was closed.
Description was changed from ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. Bug: We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on alt job. OnRequestComplete() will be called which then destroys the alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. Fix: This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 ========== to ========== Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away. Bug: We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on alt job. OnRequestComplete() will be called which then destroys the alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller. Fix: This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up controller in the orphaned case. This CL adds a test to confirm that we did clean up controller when orphaned job completes (which passes with and without the change). BUG=678768 Review-Url: https://codereview.chromium.org/2619583002 Cr-Commit-Position: refs/heads/master@{#442556} Committed: https://chromium.googlesource.com/chromium/src/+/e368398a3a5357a8742f3829ef08... ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as https://chromium.googlesource.com/chromium/src/+/e368398a3a5357a8742f3829ef08... |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
