|
|
Chromium Code Reviews|
Created:
11 years, 4 months ago by ukai Modified:
9 years, 7 months ago CC:
chromium-reviews_googlegroups.com, willchan no longer on Chromium Base URL:
svn://chrome-svn/chrome/trunk/src/ Visibility:
Public. |
DescriptionTry to fix crash in OCSP handlers.
Make sure OCSPRequestSession::Core is cancelled when OCSPRequestSession
is deleted as URLFetcher does.
Revert http://src.chromium.org/viewvc/chrome?view=rev&revision=23575
BUG=18907, 10911
TEST=none
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=23696
Patch Set 1 #
Total comments: 2
Patch Set 2 : '' #
Total comments: 2
Patch Set 3 : '' #
Total comments: 4
Patch Set 4 : '' #
Total comments: 4
Patch Set 5 : '' #
Total comments: 2
Patch Set 6 : '' #Messages
Total messages: 14 (0 generated)
http://codereview.chromium.org/165362/diff/1/2 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/165362/diff/1/2#newcode254 Line 254: delete request_; is it necessary to delete the URLRequest while holding this lock? can you instead just set request_ to null while holding the lock? it is better to do less work while holding locks. for example, it is nice to avoid calling into the heap and doing expensive things like ~URLRequest inside a lock if possible.
Thanks for review. http://codereview.chromium.org/165362/diff/1/2 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/165362/diff/1/2#newcode254 Line 254: delete request_; On 2009/08/12 06:48:41, darin wrote: > is it necessary to delete the URLRequest while holding this lock? can you > instead just set request_ to null while holding the lock? > > it is better to do less work while holding locks. for example, it is nice to > avoid calling into the heap and doing expensive things like ~URLRequest inside a > lock if possible. I see. just set request_ to null while holding the lock.
Fumitoshi, I don't understand why this fix works. Could you explain why using a lock helps? Note that in some places you're still accessing request_ without locking. For example, see OnReadCompleted. The two main methods that you added locking to both contain DCHEKs to assert that they can only be called by the IO thread. (You also added locking to the Started method, but that method is trivial.) So I don't understand why locking matters when the code is executing on the same thread. I suspect that the problem may be: delete request_; request_ = NULL; Perhaps the URLRequest destructor calls back to its delegate. I wonder if we can fix this crash by simply reversing those two lines: URLRequest* request = request_; request_ = NULL; delete request; http://codereview.chromium.org/165362/diff/1001/1002 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/165362/diff/1001/1002#newcode258 Line 258: delete request; Nit: it seems nicer to call cv_.Signal() as soon as we exit the critical section, so I suggest reordering lines 258 and 259. http://codereview.chromium.org/165362/diff/1001/1002#newcode321 Line 321: // |lock_| protects |finished_| and |request_|. I just moving the request_ member down, right above finished_, to make it clear that it is under the protection of lock_.
Thanks for review. I'm back from vacation. On 2009/08/13 00:04:33, wtc wrote: > Fumitoshi, > > I don't understand why this fix works. Could you explain > why using a lock helps? Note that in some places you're > still accessing request_ without locking. For example, > see OnReadCompleted. > > The two main methods that you added locking to both > contain DCHEKs to assert that they can only be called > by the IO thread. (You also added locking to the > Started method, but that method is trivial.) So I > don't understand why locking matters when the code is > executing on the same thread. Ah, I see. From stack trace at http://code.google.com/p/chromium/issues/detail?id=18907, it looks URLRequest has been changed between #3 0x08889283 in OnReadCompleted (this=0xf1a2d180, request=0xf1a2d280, bytes_read=0) at /usr/local/google/home/wtc/chrome2/src/net/ocsp/nss_ocsp.cc:255 and #2 0x088b42e2 in ~URLRequest (this=0xbb56140) at /usr/local/google/home/wtc/chrome2/src/net/url_request/url_request.cc:72 0xf1a2d280 -> 0xbb56140 Do you have any idea what is happening? Anyway, I reread chrome/browser/net/url_fetcher.{h,cc} and found it issues core_->Stop() that post a task CancelURLRequest, when URLFetcher is deleted, so I add core_->Cancel() when OCSPRequestSession is deleted. Do you think this will solve the crash? > I suspect that the problem may be: > delete request_; > request_ = NULL; > > Perhaps the URLRequest destructor calls back to its > delegate. I wonder if we can fix this crash by simply > reversing those two lines: > > URLRequest* request = request_; > request_ = NULL; > delete request; > > http://codereview.chromium.org/165362/diff/1001/1002 > File net/ocsp/nss_ocsp.cc (right): > > http://codereview.chromium.org/165362/diff/1001/1002#newcode258 > Line 258: delete request; > Nit: it seems nicer to call cv_.Signal() as soon as we > exit the critical section, so I suggest reordering lines > 258 and 259. > > http://codereview.chromium.org/165362/diff/1001/1002#newcode321 > Line 321: // |lock_| protects |finished_| and |request_|. > I just moving the request_ member down, right above > finished_, to make it clear that it is under the protection > of lock_.
Hi Fumitoshi, Welcome back! I hope you had a good vacation. On 2009/08/18 09:28:48, ukai wrote: > > From stack trace at http://code.google.com/p/chromium/issues/detail?id=18907, it > looks URLRequest has been changed between > > #3 0x08889283 in OnReadCompleted (this=0xf1a2d180, request=0xf1a2d280, > bytes_read=0) > at /usr/local/google/home/wtc/chrome2/src/net/ocsp/nss_ocsp.cc:255 > > and > > #2 0x088b42e2 in ~URLRequest (this=0xbb56140) > at /usr/local/google/home/wtc/chrome2/src/net/url_request/url_request.cc:72 > > 0xf1a2d280 -> 0xbb56140 > Do you have any idea what is happening? I don't know why. It's very strange. > Anyway, I reread chrome/browser/net/url_fetcher.{h,cc} and found it issues > core_->Stop() that post a task CancelURLRequest, when URLFetcher is deleted, so > I add core_->Cancel() when OCSPRequestSession is deleted. > Do you think this will solve the crash? Yes. I did some debugging today before I read your message, and I came to the same conclusion. More on this in my next message.
http://codereview.chromium.org/165362/diff/3001/4001 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/165362/diff/3001/4001#newcode149 Line 149: DCHECK(!request_); I like this DCHECK. http://codereview.chromium.org/165362/diff/3001/4001#newcode252 Line 252: URLRequest* request = request_; I believe this change is not necessary and won't make a difference. Please undo it. http://codereview.chromium.org/165362/diff/3001/4001#newcode296 Line 296: request_->Cancel(); I also noticed that this is a difference between our Core and URLFetcher::Core. However, the first thing the URLRequest destructor does it to call Cancel(), so this request_->Cancel() call is redundant. http://codereview.chromium.org/165362/diff/3001/4001#newcode334 Line 334: core_->Cancel(); Yes, my debugging today came to the same conclusion that this is the key to the solution. I found that the problem may have to do with the ocsp_req_ backpointer in Core. We need a way to ensure that after the OCSPRequestSession destructor is called, Core never uses ocsp_req_. In URLFetcher, this is accomplished by setting delegate_ to NULL in URLFetcher::Core::Stop(), because the backpointer fetcher_ is only used when delegate_ is not NULL. So in addition to adding this core_->Cancel() call, we may need to do something to prevent core_ from using the ocsp_req_ backpointer. At least, I think our Core::Cancel() should set ocsp_req_ to NULL to force a crash if ocsp_req_ is used by mistake.
Fumitoshi, One way to prevent Core from using the ocsp_req_ backpointer after OCSPRequestSession is deleted is to not have the ocsp_req_ member in the first place. Since Core uses ocsp_req_ only for two things: ocsp_req_->http_request_method() ocsp_req_->timeout() we can replace the ocsp_req_ member by http_request_method_ and timeout_ members. Once we do this, then you'll see that OCSPRequestSession and Core are almost the same -- OCSPRequestSession just holds a reference to Core. So I believe we can get rid of the Core class, and change OCSPFree to just release a reference to OCSPRequestSession. I hope you understand what I'm proposing... Feel free to do the collapsing of OCSPRequestSession and Core in a follow-up CL because it may make this CL very complicated to obscure the crash fix.
Thanks for review. Just set ocsp_req_ to NULL when Cancel, and I'll try collapse OCSPRequestSession and Core in another CL. On 2009/08/19 02:24:05, wtc wrote: > Fumitoshi, > > One way to prevent Core from using the ocsp_req_ backpointer > after OCSPRequestSession is deleted is to not have the > ocsp_req_ member in the first place. Since Core uses > ocsp_req_ only for two things: > ocsp_req_->http_request_method() > ocsp_req_->timeout() > we can replace the ocsp_req_ member by http_request_method_ > and timeout_ members. > > Once we do this, then you'll see that OCSPRequestSession > and Core are almost the same -- OCSPRequestSession just > holds a reference to Core. So I believe we can get rid > of the Core class, and change OCSPFree to just release > a reference to OCSPRequestSession. > > I hope you understand what I'm proposing... > > Feel free to do the collapsing of OCSPRequestSession > and Core in a follow-up CL because it may make this > CL very complicated to obscure the crash fix.
LGTM. You can check in with the following changes.
Note that setting ocsp_req_ to NULL in Core::Cancel()
is not thread-safe as I explained below. Note: URLFetcher
doesn't have this problem because it sets and tests
delegate_ on the same thread (the "delegate thread").
An alternative solution is to replace the ocsp_req_ member
by these two members:
std::string http_request_method_;
base::TimeDelta timeout_;
and initialize them in the Core constructor like this:
url_(req->url()),
http_request_method_(req->http_request_method()),
timeout_(req->timeout()),
I think this is a better solution. The drawback is
that in rare cases (if the IO thread is slow) it may
start and then cancel a URLRequest unnecessarilly.
Your solution can avoid that with the
if (!ocsp_req_) // Cancelled.
return;
check that I suggested below.
http://codereview.chromium.org/165362/diff/5002/6003
File net/ocsp/nss_ocsp.cc (right):
http://codereview.chromium.org/165362/diff/5002/6003#newcode181
Line 181: ocsp_req_ = NULL;
Please add a comment to note that this is not thread-safe
because we are using ocsp_req_ on the IO thread in
StartURLRequest() whereas the Cancel() method is not on
the UI thread.
http://codereview.chromium.org/165362/diff/5002/6003#newcode186
Line 186: bool Wait() {
Just FYI: I verified that this method will never be
called after ocsp_req_ is deleted.
http://codereview.chromium.org/165362/diff/5002/6003#newcode266
Line 266: void StartURLRequest() {
This method may be called after ocsp_req_ is deleted.
It rarely happens, but it may happen. (This can happen
if you make ocsp_req_->timeout() returns base::TimeDelta(),
i.e., a zero timeout. Please try that.)
So this method needs to do
if (!ocsp_req_) // Cancelled.
return;
http://codereview.chromium.org/165362/diff/5002/6003#newcode300
Line 300: // URLRequest destructor will call request_->Cancel() first.
Note: on second thought, I think we should add
request_->Cancel();
here, so that our code is the same as URLFetcher.
This is because URLFetcher's code has been heavily
tested.
On 2009/08/19 04:01:52, wtc wrote: > LGTM. You can check in with the following changes. Thanks for review. I take alternative solution to use http_request_method_ and timeout_ instead of ocsp_req_. > > Note that setting ocsp_req_ to NULL in Core::Cancel() > is not thread-safe as I explained below. Note: URLFetcher > doesn't have this problem because it sets and tests > delegate_ on the same thread (the "delegate thread"). > > An alternative solution is to replace the ocsp_req_ member > by these two members: > > std::string http_request_method_; > base::TimeDelta timeout_; > > and initialize them in the Core constructor like this: > > url_(req->url()), > http_request_method_(req->http_request_method()), > timeout_(req->timeout()), > > I think this is a better solution. The drawback is > that in rare cases (if the IO thread is slow) it may > start and then cancel a URLRequest unnecessarilly. > Your solution can avoid that with the > if (!ocsp_req_) // Cancelled. > return; > check that I suggested below. > > http://codereview.chromium.org/165362/diff/5002/6003 > File net/ocsp/nss_ocsp.cc (right): > > http://codereview.chromium.org/165362/diff/5002/6003#newcode181 > Line 181: ocsp_req_ = NULL; > Please add a comment to note that this is not thread-safe > because we are using ocsp_req_ on the IO thread in > StartURLRequest() whereas the Cancel() method is not on > the UI thread. > > http://codereview.chromium.org/165362/diff/5002/6003#newcode186 > Line 186: bool Wait() { > Just FYI: I verified that this method will never be > called after ocsp_req_ is deleted. > > http://codereview.chromium.org/165362/diff/5002/6003#newcode266 > Line 266: void StartURLRequest() { > This method may be called after ocsp_req_ is deleted. > It rarely happens, but it may happen. (This can happen > if you make ocsp_req_->timeout() returns base::TimeDelta(), > i.e., a zero timeout. Please try that.) > > So this method needs to do > if (!ocsp_req_) // Cancelled. > return; > > http://codereview.chromium.org/165362/diff/5002/6003#newcode300 > Line 300: // URLRequest destructor will call request_->Cancel() first. > Note: on second thought, I think we should add > request_->Cancel(); > here, so that our code is the same as URLFetcher. > This is because URLFetcher's code has been heavily > tested.
LGTM! http://codereview.chromium.org/165362/diff/6006/5006 File net/ocsp/nss_ocsp.cc (right): http://codereview.chromium.org/165362/diff/6006/5006#newcode297 Line 297: // URLRequest destructor will call request_->Cancel() first. Please add request_->Cancel(); here so that we're the same as URLFetcher's code. It's good to be consistent. http://codereview.chromium.org/165362/diff/6006/5006#newcode305 Line 305: base::TimeDelta timeout_; Please add a comment // The timeout for OCSP
Thanks! Fixed both. On 2009/08/19 04:22:13, wtc wrote: > LGTM! > > http://codereview.chromium.org/165362/diff/6006/5006 > File net/ocsp/nss_ocsp.cc (right): > > http://codereview.chromium.org/165362/diff/6006/5006#newcode297 > Line 297: // URLRequest destructor will call request_->Cancel() first. > Please add > request_->Cancel(); > here so that we're the same as URLFetcher's code. > > It's good to be consistent. > > http://codereview.chromium.org/165362/diff/6006/5006#newcode305 > Line 305: base::TimeDelta timeout_; > Please add a comment // The timeout for OCSP
LGTM. Hopefully this will fix the crash. Unfortunately I can't explain the strange change of 'request_' value in the call stack in the bug report with this finding. |
