|
|
DescriptionFix two ContentHashFetcherTest.* tests' flakiness.
ContentHashFetcherJob::url_fetcher_ is always created on
ContentHashFetcherJob::creation_thread_ but it was possible for
it to be destroyed on blocking pool with the destruction of
ContentHashFetcherJob. This causes test flakiness, and this CL
makes sure we always destroy url_fetcher_ on correct thread.
As an optimization for the most common case, also destroy
ContentHashFetcherJob::url_fetcher_ once we're done with it (earlier
than above). However, note that because jobs can be cancelled, we
still need to make sure to destroy url_fetcher_ on correct thread from
ContentHashFetcherJob destructor.
BUG=702300
Test=No visible change. Fixing test flakiness and potential ref issue in
content verification code.
To repro the flake, I have put my machine on heavy load (compiling
chromium without goma and -j 1024), then ran these tests repeatedly
with --gtest_repeat flag. Tests will be flaky without this change, but
they always pass with this CL.
Review-Url: https://codereview.chromium.org/2815243002
Cr-Commit-Position: refs/heads/master@{#465468}
Committed: https://chromium.googlesource.com/chromium/src/+/825099a68908767ecd35d79b0888887c4ffccd0b
Patch Set 1 #
Total comments: 5
Patch Set 2 : REWORK #
Total comments: 1
Messages
Total messages: 24 (15 generated)
Description was changed from ========== Fix two ContentHashFetcherTest.* test flakiness. It turns out that there was a ref to ContentVerifyJob in its URLFetcher instance. Clean that ref by deleting URLFetcher when ContentVerifyJob is done, specifically when either of 1. ContentVerifyJob is cancelled. 2. ContentVerifyJob is complete and it will call CompletionCallback. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. ========== to ========== Fix two ContentHashFetcherTest.* tests' flakiness. It turns out that there was a ref to ContentVerifyJob in its URLFetcher instance. Clean that ref by deleting URLFetcher when ContentVerifyJob is done, specifically when either of 1. ContentVerifyJob is cancelled. 2. ContentVerifyJob is complete and it will call CompletionCallback. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. ==========
lazyboy@chromium.org changed reviewers: + rdevlin.cronin@chromium.org
Hi Devlin, this ref detaching seems to make the two tests always pass now.
The CQ bit was checked by lazyboy@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... File extensions/browser/content_hash_fetcher.cc (right): https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... extensions/browser/content_hash_fetcher.cc:448: url_fetcher_.reset(); btw, this is the reset that's necessary for the test to pass.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... File extensions/browser/content_hash_fetcher.cc (right): https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... extensions/browser/content_hash_fetcher.cc:211: url_fetcher_.reset(); nit: We can get rid of cancelled_ now and use !url_fetcher_ instead. https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... extensions/browser/content_hash_fetcher.cc:448: url_fetcher_.reset(); On 2017/04/13 02:06:35, lazyboy wrote: > btw, this is the reset that's necessary for the test to pass. Where is the reference to ContentHashFetcherJob in URLFetcher? I only see it passed in as the URLFetcherDelegate (as a raw ptr, which wouldn't add a reference), and don't see any callbacks. What am I missing?
Thanks for checking. Reply below, I'll address the other comment once we settle on this... https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... File extensions/browser/content_hash_fetcher.cc (right): https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... extensions/browser/content_hash_fetcher.cc:448: url_fetcher_.reset(); On 2017/04/13 15:12:18, Devlin wrote: > On 2017/04/13 02:06:35, lazyboy wrote: > > btw, this is the reset that's necessary for the test to pass. > > Where is the reference to ContentHashFetcherJob in URLFetcher? I only see it > passed in as the URLFetcherDelegate (as a raw ptr, which wouldn't add a > reference), and don't see any callbacks. What am I missing? You're right, I got too excited yesterday and didn't properly check that the URLFetcherDelegate ptr is raw ptr :) The fix here occurs not because of ref, but because we kill url_fetcher_ early, when the task runner check here https://cs.chromium.org/chromium/src/net/url_request/url_fetcher_core.cc?rcl=... holds true: """ void URLFetcherCore::Stop() { if (delegate_task_runner_) // May be NULL in tests. DCHECK(delegate_task_runner_->RunsTasksOnCurrentThread()); ... """ This is the check that fails when the test fails. So, I need to reword the CL description... WDYT?
https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... File extensions/browser/content_hash_fetcher.cc (right): https://codereview.chromium.org/2815243002/diff/1/extensions/browser/content_... extensions/browser/content_hash_fetcher.cc:448: url_fetcher_.reset(); On 2017/04/13 16:06:24, lazyboy wrote: > On 2017/04/13 15:12:18, Devlin wrote: > > On 2017/04/13 02:06:35, lazyboy wrote: > > > btw, this is the reset that's necessary for the test to pass. > > > > Where is the reference to ContentHashFetcherJob in URLFetcher? I only see it > > passed in as the URLFetcherDelegate (as a raw ptr, which wouldn't add a > > reference), and don't see any callbacks. What am I missing? > > You're right, I got too excited yesterday and didn't properly check that the > URLFetcherDelegate ptr is raw ptr :) > The fix here occurs not because of ref, but because we kill url_fetcher_ early, > when the task runner check here > https://cs.chromium.org/chromium/src/net/url_request/url_fetcher_core.cc?rcl=... > holds true: > """ > void URLFetcherCore::Stop() { > if (delegate_task_runner_) // May be NULL in tests. > DCHECK(delegate_task_runner_->RunsTasksOnCurrentThread()); > ... > """ > This is the check that fails when the test fails. > So, I need to reword the CL description... WDYT? Hmm... It makes sense why that would solve the problem, but then the issue is that ContentHashFetcherJob is still potentially leaking. It *looks* like the only refs should be held by the callbacks it creates internally (which should all be cleaned up by the time it finishes) and by the ContentHashFetcher itself (which should release the ref in JobFinished()). Two thoughts: 1. If that's right, the ContentHashFetcherJob should be released, and is only not destructed immediately because of the DeleteSoon() call (as opposed to `delete`). If this is the case, using a base::RunLoop().RunUntilIdle() call at the end of the test (or even right after the job finishes) should solve the problem. It would be good to check that that's the case, since otherwise we may have a leak on our hands (which could be bad for a few reasons). 2. Assuming that the problem is just that job is living a bit too long, then we're left with two possible fixes: this, and the base::RunLoop().RunUntilIdle(). Unfortunately, I don't love either one, since they both seem targeted at fixing the test, rather than a potential problem in production (though I'm not sure this can ever actually happen in production? Perhaps not). This one deletes an object early because a thread is destructed in the test tear down, and the other ensures this whole object is deleted before tear down. I'm fine with either one, but we should probably add a comment explaining whichever path we choose.
Description was changed from ========== Fix two ContentHashFetcherTest.* tests' flakiness. It turns out that there was a ref to ContentVerifyJob in its URLFetcher instance. Clean that ref by deleting URLFetcher when ContentVerifyJob is done, specifically when either of 1. ContentVerifyJob is cancelled. 2. ContentVerifyJob is complete and it will call CompletionCallback. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. ========== to ========== Fix two ContentHashFetcherTest.* tests' flakiness. ContentHashFetcherJob::url_fetcher_ is always created on ContentHashFetcherJob::creation_thread_ but it was possible for it to be destroyed on blocking pool with the destruction of ContentHashFetcherJob. This causes test flakiness, and this CL makes sure we always destroy url_fetcher_ on correct thread. As an optimization for the most common case, also destroy ContentHashFetcherJob::url_fetcher_ once we're done with it (earlier than above). However, note that because jobs can be cancelled, we still need to make sure to destroy url_fetcher_ on correct thread from ContentHashFetcherJob destructor. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. ==========
Description was changed from ========== Fix two ContentHashFetcherTest.* tests' flakiness. ContentHashFetcherJob::url_fetcher_ is always created on ContentHashFetcherJob::creation_thread_ but it was possible for it to be destroyed on blocking pool with the destruction of ContentHashFetcherJob. This causes test flakiness, and this CL makes sure we always destroy url_fetcher_ on correct thread. As an optimization for the most common case, also destroy ContentHashFetcherJob::url_fetcher_ once we're done with it (earlier than above). However, note that because jobs can be cancelled, we still need to make sure to destroy url_fetcher_ on correct thread from ContentHashFetcherJob destructor. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. ========== to ========== Fix two ContentHashFetcherTest.* tests' flakiness. ContentHashFetcherJob::url_fetcher_ is always created on ContentHashFetcherJob::creation_thread_ but it was possible for it to be destroyed on blocking pool with the destruction of ContentHashFetcherJob. This causes test flakiness, and this CL makes sure we always destroy url_fetcher_ on correct thread. As an optimization for the most common case, also destroy ContentHashFetcherJob::url_fetcher_ once we're done with it (earlier than above). However, note that because jobs can be cancelled, we still need to make sure to destroy url_fetcher_ on correct thread from ContentHashFetcherJob destructor. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. ==========
CL updated to destroy URLFetcher correctly, a little bit different than what we exactly discussed, but should be good to review. Thanks. https://codereview.chromium.org/2815243002/diff/20001/extensions/browser/cont... File extensions/browser/content_hash_fetcher.cc (right): https://codereview.chromium.org/2815243002/diff/20001/extensions/browser/cont... extensions/browser/content_hash_fetcher.cc:303: // Delete |url_fetcher_| once we no longer need it. As we discussed on vc, this is the deletion. However, I realized this isn't enough, because ContentHashFetcherJob can be Cancel()ed from any thread. Therefore I've also added DeleteSoon to cover all the cases. This change isn't a requirement anymore, but I thought it would be good to do. I've also thought of making ContentHashFetcherJob DeleteOnUI at first, but that wouldn't always work because we create ContentHashFetcherJob on dynamic creation_thread_, which can or cannot be UI I think.
The CQ bit was checked by lazyboy@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm, thanks for the great investigative work here!
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by lazyboy@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 20001, "attempt_start_ts": 1492567672658490, "parent_rev": "c961656d2eb17624ee36f3bde2617aa73fb6f246", "commit_rev": "825099a68908767ecd35d79b0888887c4ffccd0b"}
Message was sent while issue was closed.
Description was changed from ========== Fix two ContentHashFetcherTest.* tests' flakiness. ContentHashFetcherJob::url_fetcher_ is always created on ContentHashFetcherJob::creation_thread_ but it was possible for it to be destroyed on blocking pool with the destruction of ContentHashFetcherJob. This causes test flakiness, and this CL makes sure we always destroy url_fetcher_ on correct thread. As an optimization for the most common case, also destroy ContentHashFetcherJob::url_fetcher_ once we're done with it (earlier than above). However, note that because jobs can be cancelled, we still need to make sure to destroy url_fetcher_ on correct thread from ContentHashFetcherJob destructor. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. ========== to ========== Fix two ContentHashFetcherTest.* tests' flakiness. ContentHashFetcherJob::url_fetcher_ is always created on ContentHashFetcherJob::creation_thread_ but it was possible for it to be destroyed on blocking pool with the destruction of ContentHashFetcherJob. This causes test flakiness, and this CL makes sure we always destroy url_fetcher_ on correct thread. As an optimization for the most common case, also destroy ContentHashFetcherJob::url_fetcher_ once we're done with it (earlier than above). However, note that because jobs can be cancelled, we still need to make sure to destroy url_fetcher_ on correct thread from ContentHashFetcherJob destructor. BUG=702300 Test=No visible change. Fixing test flakiness and potential ref issue in content verification code. To repro the flake, I have put my machine on heavy load (compiling chromium without goma and -j 1024), then ran these tests repeatedly with --gtest_repeat flag. Tests will be flaky without this change, but they always pass with this CL. Review-Url: https://codereview.chromium.org/2815243002 Cr-Commit-Position: refs/heads/master@{#465468} Committed: https://chromium.googlesource.com/chromium/src/+/825099a68908767ecd35d79b0888... ==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://chromium.googlesource.com/chromium/src/+/825099a68908767ecd35d79b0888... |