|
|
Created:
10 years, 1 month ago by Zachary Kuznia Modified:
9 years, 7 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, darin-cc_chromium.org, Alexander Potapenko, pam+watch_chromium.org, Paweł Hajdan Jr., stuartmorgan+watch_chromium.org, Timur Iskhodzhanov Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionUpdate CertVerifier to watch for the origin loop's destruction, so that
it doesn't crash if the SSLClientSocket is leaked.
BUG=chromium-os:8179
BUG=62314
TEST=None
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=65349
Patch Set 1 #Patch Set 2 : Code Review #
Total comments: 2
Patch Set 3 : Add a missing comment #Patch Set 4 : Remove the observer on cleanup #
Total comments: 8
Patch Set 5 : Code review fixes #Patch Set 6 : Remove braces. #
Total comments: 4
Patch Set 7 : Code review fixes #Patch Set 8 : Address valgrind issue #Patch Set 9 : Remove net_unittests gtest exclusions. #
Total comments: 8
Patch Set 10 : Code review fixes #
Total comments: 5
Patch Set 11 : Fix scoping on a lock #Patch Set 12 : Proper scoping of lock, static cast. #
Total comments: 2
Patch Set 13 : Remove null check #
Total comments: 1
Patch Set 14 : To mutable #
Messages
Total messages: 25 (0 generated)
The change looks good tome, but please wait for wtc@'s reviews. http://codereview.chromium.org/4299001/diff/2001/3001 File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/2001/3001#newcode99 net/base/cert_verifier.cc:99: virtual void WillDestroyCurrentMessageLoop() { Would be nice to add a comment like: // MessageLoop::DestructionObserver override.
http://codereview.chromium.org/4299001/diff/2001/3001 File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/2001/3001#newcode99 net/base/cert_verifier.cc:99: virtual void WillDestroyCurrentMessageLoop() { On 2010/11/02 06:57:02, satorux1 wrote: > Would be nice to add a comment like: > > // MessageLoop::DestructionObserver override. > Done.
Can we explicitly kill the SSLClientSocket on sync thread shutdown rather than letting it leak? Or at the least, use a MessageLoopProxy instead of a DestructionObserver? The problem with DestructionObserver is that there is no good way to express destruction observer ordering. If you're trying to do cleanup on shutdown, you can't clean things up that are supposed to be destroyed in a DestructionObserver, since there's no way to say "Make sure _my_ destruction observer which deletes Object X (which is used by Object A and B) is run _after_ the destruction observers for Objects A and B". The only choice is then to leak everything, which we choose not to do in Chrome since it gives valgrind a headache. For the IO thread, the ResourceDispatcherHost explicitly kills all its URLRequests on shutdown, and IOThread::CleanUp() kills URLFetchers, OCSP's URLRequests, and ProxyScriptFetcher's URLRequests. And after these are all dead, we will kill the URLRequestContext (not quite done yet, but the gone is moving in that direction). On Tue, Nov 2, 2010 at 7:03 AM, <zork@chromium.org> wrote: > > http://codereview.chromium.org/4299001/diff/2001/3001 > File net/base/cert_verifier.cc (right): > > http://codereview.chromium.org/4299001/diff/2001/3001#newcode99 > net/base/cert_verifier.cc:99: virtual void > WillDestroyCurrentMessageLoop() { > On 2010/11/02 06:57:02, satorux1 wrote: > >> Would be nice to add a comment like: >> > > // MessageLoop::DestructionObserver override. >> > > > Done. > > > http://codereview.chromium.org/4299001/show >
zork: I have some nits and a question about your change to the destructor below. Please get willchan's review as he knows MessageLoop::DestructionObserver much better than I do. willchan: I've added you as a reviewer for this CL. Thanks! http://codereview.chromium.org/4299001/diff/9001/10001 File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/9001/10001#newcode37 net/base/cert_verifier.cc:37: if (origin_loop_) { Nit: omit the curly braces {} for one-liner "if" bodies. http://codereview.chromium.org/4299001/diff/9001/10001#newcode96 net/base/cert_verifier.cc:96: origin_loop_ = NULL; Nit: move origin_loop_ = NULL; into the "if" statement. Make the same change to ~Request() at line 114 below. http://codereview.chromium.org/4299001/diff/9001/10001#newcode109 net/base/cert_verifier.cc:109: ~Request() { Why is this change necessary? The destructor's code is very similar to the Cancel() method now. It would be tempting to just have ~Request() call Cancel(). But why doesn't ~Request() call Cancel() in the original code? There must be a reason for that... Perhaps the caller is supposed to Cancel a request before deleting it?
I should clarify that as a crash fix, using DestructionObserver (or better yet, MessageLoopProxy) is a reasonable temporary measure. But it doesn't change the fact that this code is just trying to make the leak not crash, and the leak itself needs to be fixed. On 2010/11/02 17:56:14, wtc wrote: > zork: I have some nits and a question about your change to > the destructor below. Please get willchan's review as he > knows MessageLoop::DestructionObserver much better than I do. > > willchan: I've added you as a reviewer for this CL. Thanks! > > http://codereview.chromium.org/4299001/diff/9001/10001 > File net/base/cert_verifier.cc (right): > > http://codereview.chromium.org/4299001/diff/9001/10001#newcode37 > net/base/cert_verifier.cc:37: if (origin_loop_) { > Nit: omit the curly braces {} for one-liner "if" bodies. > > http://codereview.chromium.org/4299001/diff/9001/10001#newcode96 > net/base/cert_verifier.cc:96: origin_loop_ = NULL; > Nit: move > origin_loop_ = NULL; > into the "if" statement. Make the same change to ~Request() > at line 114 below. > > http://codereview.chromium.org/4299001/diff/9001/10001#newcode109 > net/base/cert_verifier.cc:109: ~Request() { > Why is this change necessary? > > The destructor's code is very similar to the Cancel() method > now. It would be tempting to just have ~Request() call > Cancel(). But why doesn't ~Request() call Cancel() in the > original code? There must be a reason for that... Perhaps > the caller is supposed to Cancel a request before deleting > it?
willchan: It doesn't look like we can use MessageLoopProxy, since it's created from a Chrome thread, and we have uses of SSLClientSocket in libjingle that are on non-Chrome threads. I'd also rather avoid trying to delete the SSLClientSocket on thread destruction, due to the ordering issues you mentioned. Since I'm only trying to prevent a leak from causing a crash, I don't want to add unneeded complexity to shutdown. http://codereview.chromium.org/4299001/diff/9001/10001 File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/9001/10001#newcode37 net/base/cert_verifier.cc:37: if (origin_loop_) { On 2010/11/02 17:56:14, wtc wrote: > Nit: omit the curly braces {} for one-liner "if" bodies. Done. http://codereview.chromium.org/4299001/diff/9001/10001#newcode96 net/base/cert_verifier.cc:96: origin_loop_ = NULL; On 2010/11/02 17:56:14, wtc wrote: > Nit: move > origin_loop_ = NULL; > into the "if" statement. Make the same change to ~Request() > at line 114 below. Done. http://codereview.chromium.org/4299001/diff/9001/10001#newcode109 net/base/cert_verifier.cc:109: ~Request() { On 2010/11/02 17:56:14, wtc wrote: > Why is this change necessary? > > The destructor's code is very similar to the Cancel() method > now. It would be tempting to just have ~Request() call > Cancel(). But why doesn't ~Request() call Cancel() in the > original code? There must be a reason for that... Perhaps > the caller is supposed to Cancel a request before deleting > it? I considered just calling Cancel, but I avoided it because it changes the semantics, which I don't want to do. The reason we need to remove the observer is because there are times that the request isn't cancelled before its deleted, so this remains an observer of the thread destruction unless we remove it here.
As discussed offline, I'm fine with this as a temporary solution to prevent a leak from causing a crash. But we shouldn't be leaking the SSLClientSocket in the first place. That's the main bug. When that gets fixed, I'd prefer to remove this change to make it more obvious (shutdown crash reports) we're leaking stuff again so we know to fix it. The bug is either in the core part of the network stack (since the main external interface is via URLRequest, and we don't see URLRequest leaks that I know of) or in sync which uses the SSLClientSocket directly instead of via URLRequest.
LGTM. This code is very subtle, so I had to spend some time to remind myself how it works. http://codereview.chromium.org/4299001/diff/9001/10001 File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/9001/10001#newcode109 net/base/cert_verifier.cc:109: ~Request() { On 2010/11/04 05:07:08, Zachary Kuznia wrote: > > I considered just calling Cancel, but I avoided it because it changes the > semantics, which I don't want to do. But the only difference between Cancel() and the destructor is that Cancel() also has verifier_ = NULL; It should be harmless for the destructor to set verifier_ to NULL. http://codereview.chromium.org/4299001/diff/18001/19001#newcode37 net/base/cert_verifier.cc:37: if (origin_loop_) { Is it possible for MessageLoop::current() to return NULL? http://codereview.chromium.org/4299001/diff/18001/19001#newcode112 net/base/cert_verifier.cc:112: origin_loop_->RemoveDestructionObserver(this); Nit: it's not necessary to set origin_loop_ to NULL in the destructor.
http://codereview.chromium.org/4299001/diff/9001/10001 File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/9001/10001#newcode109 net/base/cert_verifier.cc:109: ~Request() { On 2010/11/04 22:34:11, wtc wrote: > On 2010/11/04 05:07:08, Zachary Kuznia wrote: > > > > I considered just calling Cancel, but I avoided it because it changes the > > semantics, which I don't want to do. > > But the only difference between Cancel() and the destructor > is that Cancel() also has > verifier_ = NULL; > It should be harmless for the destructor to set verifier_ > to NULL. Done. http://codereview.chromium.org/4299001/diff/18001/19001#newcode37 net/base/cert_verifier.cc:37: if (origin_loop_) { On 2010/11/04 22:34:11, wtc wrote: > Is it possible for MessageLoop::current() to return NULL? It can, if we're on a thread that doesn't have a message loop. http://codereview.chromium.org/4299001/diff/18001/19001#newcode112 net/base/cert_verifier.cc:112: origin_loop_->RemoveDestructionObserver(this); On 2010/11/04 22:34:11, wtc wrote: > Nit: it's not necessary to set origin_loop_ to NULL in the > destructor. Replaced with Cancel();
SSLClientSocketTest.Connect has started failing under Valgrind and ThreadSanitizer after this commit; see http://crbug.com/62314 Please watch the memory waterfall after commiting On 2010/11/05 08:15:15, Zachary Kuznia wrote: > http://codereview.chromium.org/4299001/diff/9001/10001 > File net/base/cert_verifier.cc (right): > > http://codereview.chromium.org/4299001/diff/9001/10001#newcode109 > net/base/cert_verifier.cc:109: ~Request() { > On 2010/11/04 22:34:11, wtc wrote: > > On 2010/11/04 05:07:08, Zachary Kuznia wrote: > > > > > > I considered just calling Cancel, but I avoided it because it changes the > > > semantics, which I don't want to do. > > > > But the only difference between Cancel() and the destructor > > is that Cancel() also has > > verifier_ = NULL; > > It should be harmless for the destructor to set verifier_ > > to NULL. > > Done. > > http://codereview.chromium.org/4299001/diff/18001/19001#newcode37 > net/base/cert_verifier.cc:37: if (origin_loop_) { > On 2010/11/04 22:34:11, wtc wrote: > > Is it possible for MessageLoop::current() to return NULL? > > It can, if we're on a thread that doesn't have a message loop. > > http://codereview.chromium.org/4299001/diff/18001/19001#newcode112 > net/base/cert_verifier.cc:112: origin_loop_->RemoveDestructionObserver(this); > On 2010/11/04 22:34:11, wtc wrote: > > Nit: it's not necessary to set origin_loop_ to NULL in the > > destructor. > > Replaced with Cancel();
My original patch DCHECKed on valgrind due to removing the observer on the wrong thread, and was reverted. I added a small change to ensure the observer is removed on origin_loop_'s thread. Please have another look. Thanks!
Regarding the net/base/ part - leaving the review to wtc. Please also remove the gtest exclude filters from tools/valgrind/gtest_exclude/net_unittests.gtest_mac.txt in this patch On 2010/11/09 08:04:22, Zachary Kuznia wrote: > My original patch DCHECKed on valgrind due to removing the observer on the wrong > thread, and was reverted. I added a small change to ensure the observer is > removed on origin_loop_'s thread. > > Please have another look. > > Thanks!
On 2010/11/09 09:17:55, Timur Iskhodzhanov wrote: > Regarding the net/base/ part - leaving the review to wtc. > > Please also remove the gtest exclude filters from > tools/valgrind/gtest_exclude/net_unittests.gtest_mac.txt > in this patch Done.
tools/valgrind part - LGTM please add the BUG=62314 line to the description On 2010/11/09 10:52:11, Zachary Kuznia wrote: > On 2010/11/09 09:17:55, Timur Iskhodzhanov wrote: > > Regarding the net/base/ part - leaving the review to wtc. > > > > Please also remove the gtest exclude filters from > > tools/valgrind/gtest_exclude/net_unittests.gtest_mac.txt > > in this patch > > Done.
zork: this is the first time I read code that uses the Traits of RefCountedThreadSafe, but after studying the CL carefully, I think I understand it correctly. Please check my comments below to verify my understanding. I believe it is possible to define Cancel() in the old way because Cancel() is only called on the origin thread. See my first comment below. http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:62: if (origin_loop_ && verifier_) { BUG: Is this change necessary? DoCallback will return immediately if verifier_ is NULL (see lines 76-77 below). verifier_ can only be accessed on the origin thread (see lines 126-127 below). I guess you're trying to avoid the PostTask call after the Request has been Canceled (because Cancel no longer sets origin_loop_ to NULL)? If so, we should fix this problem by changing Cancel() back to what it was in an earlier version of this CL: void Cancel() { verifier_ = NULL; AutoLock locked(origin_loop_lock_); if (origin_loop_) { origin_loop_->RemoveDestructionObserver(this); origin_loop_ = NULL; } } http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:105: ~Request() { Please document that if origin_loop_ is not NULL, Request must be destroyed on the origin thread, and RequestTraits/OnDestruct ensures that. Did I understand this point correctly? http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:107: if (origin_loop_) { Nit: omit the curly braces.
http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:112: void OnDestruct() const { I am concerned about this. DeleteSoon() will post a DeleteTask. The problem is, if this is happening in shutdown (which is the original bug, a shutdown crash), then |origin_loop_| may still be alive, but is stopping. So the task may end up in the loop, but it may not be run. In this case, the DeleteTask will get deleted, but it won't call delete on the Request, so it will get leaked and we will get valgrind errors. That's better than crashing, but it'll cause memory bots to go red :( Actually, there's a more serious bug here. |origin_loop_| is not locked. The DeleteSoon() call may crash as you may be calling this on a deleted object. You need to lock this call. Also, this OnDestruct() as written will post to |origin_loop_| even if we're already on the thread with the |origin_loop_|. That seems unnecessary and seems more likely to cause the race that leads to the pending DeleteTask being deleted that I explained above.
http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:62: if (origin_loop_ && verifier_) { On 2010/11/10 22:18:20, wtc wrote: > BUG: Is this change necessary? > > DoCallback will return immediately if verifier_ is NULL > (see lines 76-77 below). verifier_ can only be accessed on > the origin thread (see lines 126-127 below). > > I guess you're trying to avoid the PostTask call after > the Request has been Canceled (because Cancel no longer > sets origin_loop_ to NULL)? > > If so, we should fix this problem by changing Cancel() > back to what it was in an earlier version of this CL: > > void Cancel() { > verifier_ = NULL; > > AutoLock locked(origin_loop_lock_); > if (origin_loop_) { > origin_loop_->RemoveDestructionObserver(this); > origin_loop_ = NULL; > } > } I was trying to avoid a DCHECK if Cancel was called on a different thread, but if we don't have to worry about that, the original version is fine. Fixed. http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:105: ~Request() { On 2010/11/10 22:18:20, wtc wrote: > Please document that if origin_loop_ is not NULL, Request > must be destroyed on the origin thread, and > RequestTraits/OnDestruct ensures that. Did I understand this > point correctly? That's correct. Added a comment to OnDestruct. http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:107: if (origin_loop_) { On 2010/11/10 22:18:20, wtc wrote: > Nit: omit the curly braces. Moved back to Cancel(). http://codereview.chromium.org/4299001/diff/34001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:112: void OnDestruct() const { On 2010/11/10 22:33:17, willchan wrote: > I am concerned about this. DeleteSoon() will post a DeleteTask. The problem > is, if this is happening in shutdown (which is the original bug, a shutdown > crash), then |origin_loop_| may still be alive, but is stopping. So the task > may end up in the loop, but it may not be run. In this case, the DeleteTask > will get deleted, but it won't call delete on the Request, so it will get leaked > and we will get valgrind errors. That's better than crashing, but it'll cause > memory bots to go red :( > > Actually, there's a more serious bug here. |origin_loop_| is not locked. The > DeleteSoon() call may crash as you may be calling this on a deleted object. You > need to lock this call. > > Also, this OnDestruct() as written will post to |origin_loop_| even if we're > already on the thread with the |origin_loop_|. That seems unnecessary and seems > more likely to cause the race that leads to the pending DeleteTask being deleted > that I explained above. Added the lock, and changed the if() so that it's deleted immediately if called on the origin thread.
LGTM. Would be nice to get willchan's OK, but you can check this in first as I think you've addressed willchan's concern. http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:115: // thread. Using OnDestruct is called by RequestTraits to ensure this Nit: remove "Using".
http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:118: AutoLock locked(origin_loop_lock_); Won't this crash? On line 122, you delete this. When we exit the scope, |origin_loop_lock_| will already be deleted. So AutoLock will try to unlock a deleted lock. That's broken.
http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:118: AutoLock locked(origin_loop_lock_); willchan: you're right. Good catch. By the way, can a 'const' method delete 'this'? zork: please fix this before you check this in. You can try: void OnDestruct() const { { AutoLock locked(origin_loop_lock_); if (origin_loop_ && origin_loop_ != MessageLoop::current()) { origin_loop_->DeleteSoon(FROM_HERE, this); return; } } delete this; } Or you can call Lock() and Unlock() on origin_loop_lock_ manually.
http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:115: // thread. Using OnDestruct is called by RequestTraits to ensure this On 2010/11/11 22:24:11, wtc wrote: > Nit: remove "Using". Done. http://codereview.chromium.org/4299001/diff/42001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:118: AutoLock locked(origin_loop_lock_); On 2010/11/12 00:44:00, wtc wrote: > willchan: you're right. Good catch. By the way, can a > 'const' method delete 'this'? > > zork: please fix this before you check this in. You can > try: > void OnDestruct() const { > { > AutoLock locked(origin_loop_lock_); > if (origin_loop_ && origin_loop_ != MessageLoop::current()) { > origin_loop_->DeleteSoon(FROM_HERE, this); > return; > } > } > delete this; > } > > Or you can call Lock() and Unlock() on origin_loop_lock_ > manually. Fixed. You can actually delete 'this' in a const method, but you can't access the lock. I seem to have uploaded a stale file last time, I actually needed to add a const_cast to the Destruct() method below.
LGTM. willchan: could you do the honor of signing off on this CL? http://codereview.chromium.org/4299001/diff/51001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/51001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:38: if (origin_loop_) Let's remove this null pointer check for origin_loop_. If origin_loop_ is NULL, CertVerifier won't work because the origin thread won't be able to receive the result.
http://codereview.chromium.org/4299001/diff/51001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/51001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:38: if (origin_loop_) On 2010/11/15 23:57:15, wtc wrote: > Let's remove this null pointer check for origin_loop_. > If origin_loop_ is NULL, CertVerifier won't work because > the origin thread won't be able to receive the result. Done.
LGTM http://codereview.chromium.org/4299001/diff/57001/net/base/cert_verifier.cc File net/base/cert_verifier.cc (right): http://codereview.chromium.org/4299001/diff/57001/net/base/cert_verifier.cc#n... net/base/cert_verifier.cc:116: void OnDestruct() { Why is this non-const? Please make |origin_loop_lock_| mutable instead and remove the const_cast<>. |