|
|
Chromium Code Reviews
DescriptionTry to fix valgrind error in nss_ocsp.cc
I think valgrind error would happen when IO thread was terminated
before cert verification timed out.
So, observe IO thread destruction and once IO thread destructed,
don't post task to cancel the cert verication.
BUG=23437
TEST=none
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=32150
Patch Set 1 #
Total comments: 6
Patch Set 2 : fix wtc's comment #Messages
Total messages: 10 (0 generated)
Reviewers: wtc, Description: Try to fix valgrind error in nss_ocsp.cc I think valgrind error would happen when IO thread was terminated before cert verification timed out. So, observe IO thread destruction and once IO thread destructed, don't post task to cancel the cert verication. BUG=23437 TEST=none Please review this at http://codereview.chromium.org/390011 Affected files: M net/ocsp/nss_ocsp.cc
ukai: sorry about the delay in review. I've been busy this week. I need more time to review this CL because I'm not familiar with MessageLoop::DestructionObserver. I'm also asking eroman to help review this CL. eroman: I believe this is another instance of issue 22719 where CertVerifier is crashing in posting a task to the IO thread's message loop. We believe these bugs are caused by URLRequest leaks. I wonder if we should fix the root cause (URLRequest leaks), or add defensive measures like ukai's proposal. If you like ukai's approach, we should use it on CertVerifier, too. On Linux, net::CertVerifier::Request::DoVerify() calls the code in nss_ocsp.cc in the middle of cerificate verification. http://codereview.chromium.org/390011/diff/1/2 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/390011/diff/1/2#newcode128 Line 128: if (io_loop_) Style nit: please add curly braces {} around the io_loop_->PostTask call. Although it is a single statement, it spans three lines.
The code is easier to understand than I expected. I have two comments below. I'd also like to hear eroman's opinion. http://codereview.chromium.org/390011/diff/1/2 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/390011/diff/1/2#newcode118 Line 118: io_loop_->PostTask( Should we have a NULL check for io_loop_ here, too? http://codereview.chromium.org/390011/diff/1/2#newcode278 Line 278: io_loop_->RemoveDestructionObserver(this); I think we should do lines 278-279 after cancelling request_. Perhaps this is why you had to add a NULL check for io_loop_ to Cancel() at line 128 above because you set io_loop_ to NULL to early (at line 279)? Actually, it seems even better to move lines 278-279 to the destructor..
I don't think we should do this. This is just masking the bigger problem, which is that outstanding requests aren't being cancelled. As wtc points out, there has been a history of the browser leaking request contexts, due to holding references as part of urlrequest, urlfetcher, resourcemessagefilter, etc.. One thing I did notice while glancing the code, is during shutdown we are calling net::SetURLRequestContextForOCSP(NULL) on the UI thread. Even if this isn't a race, we should still be calling this from the IO thread for consistency. You can move it into ~ChromeURLRequestContext(), see how we do that for AppCacheService which has the same sort of cyclical relationship. If we need to issue cancels, we can do it in response to set_request_context(NULL).
REMOVER ----- Original Message ----- From: <eroman@chromium.org> To: <ukai@chromium.org>; <wtc@chromium.org> Cc: <chromium-reviews@googlegroups.com> Sent: Friday, November 13, 2009 10:12 PM Subject: [chromium-reviews] Re: Try to fix valgrind error in nss_ocsp.cc >I don't think we should do this. > > This is just masking the bigger problem, which is that outstanding > requests > aren't being cancelled. > > As wtc points out, there has been a history of the browser leaking request > contexts, due to holding references as part of urlrequest, urlfetcher, > resourcemessagefilter, etc.. > > > One thing I did notice while glancing the code, is during shutdown we are > calling net::SetURLRequestContextForOCSP(NULL) on the UI thread. > > Even if this isn't a race, we should still be calling this from the IO > thread > for consistency. You can move it into ~ChromeURLRequestContext(), see how > we do > that for AppCacheService which has the same sort of cyclical relationship. > > If we need to issue cancels, we can do it in response to > set_request_context(NULL). > > http://codereview.chromium.org/390011
> One thing I did notice while glancing the code, is during shutdown we are > calling net::SetURLRequestContextForOCSP(NULL) on the UI thread. For this, I am suggesting the following small change: http://codereview.chromium.org/391062.
http://codereview.chromium.org/390011/diff/1/2 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/390011/diff/1/2#newcode118 Line 118: io_loop_->PostTask( On 2009/11/13 23:42:05, wtc wrote: > Should we have a NULL check for io_loop_ here, too? I thought it would be with DCHECK(io_loop_) here, but changed to check as Cancel(). http://codereview.chromium.org/390011/diff/1/2#newcode128 Line 128: if (io_loop_) On 2009/11/13 23:34:27, wtc wrote: > Style nit: please add curly braces {} around the > io_loop_->PostTask call. Although it is a single statement, > it spans three lines. Done. http://codereview.chromium.org/390011/diff/1/2#newcode278 Line 278: io_loop_->RemoveDestructionObserver(this); On 2009/11/13 23:42:05, wtc wrote: > I think we should do lines 278-279 after cancelling request_. > Perhaps this is why you had to add a NULL check for io_loop_ > to Cancel() at line 128 above because you set io_loop_ to > NULL to early (at line 279)? Well, the reason to add a NULL check for io_loop_ to Cancel() at line 128 is io_loop_ may be NULL when IO loop is destructed while OCSPRequestSession is alive, that is, io_loop_ may be NULL in WillDestroyCurrentMessageLoop(). I think valgrind error in crbug.com/23437 catched the case OCSPRequestSession is freed, which calls OCSPRequestSession::Cancel() after IO loop is destructed, so it would touch some memory area in already destructed IO loop. > Actually, it seems even better to move lines 278-279 to > the destructor.. Done.
On 2009/11/14 00:12:57, eroman wrote: > I don't think we should do this. > > This is just masking the bigger problem, which is that outstanding requests > aren't being cancelled. > > As wtc points out, there has been a history of the browser leaking request > contexts, due to holding references as part of urlrequest, urlfetcher, > resourcemessagefilter, etc.. > > > One thing I did notice while glancing the code, is during shutdown we are > calling net::SetURLRequestContextForOCSP(NULL) on the UI thread. > > Even if this isn't a race, we should still be calling this from the IO thread > for consistency. You can move it into ~ChromeURLRequestContext(), see how we do > that for AppCacheService which has the same sort of cyclical relationship. > > If we need to issue cancels, we can do it in response to > set_request_context(NULL). Thanks for catching this. Yes, it should be fixed also. But I think original valgrind error is not related with request_context. I suppose crbug.com/23437 caused when it frees OCSP session after IO loop finished.
LGTM. Initially I was skeptical about this, but I have warmed up to the idea of registering a destruction observer. In fact, I would like to do the same sort of thing to some of our other network classes now :) (Trying to make sure consumers delete the network objects before destroying the IO thread has proved a losing battle). |
