|
|
Created:
8 years, 11 months ago by enal Modified:
8 years, 11 months ago CC:
chromium-reviews, hclam+watch_chromium.org, ddorwin+watch_chromium.org, fischman+watch_chromium.org, jam, acolwell+watch_chromium.org, annacc+watch_chromium.org, dpranke-watch+content_chromium.org, joi+watch-content_chromium.org, darin-cc_chromium.org, vrk (LEFT CHROMIUM), scherkus (not reviewing), ihf+watch_chromium.org Base URL:
http://src.chromium.org/svn/trunk/src/ Visibility:
Public. |
DescriptionSafe and reliable fix for 2 race conditions.
(We may want to revisit the issue later, implementing more performant one...)
BUG=107933
TEST=Everything should work, and there would be no more crashes in AudioDevice.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=117464
Patch Set 1 #Patch Set 2 : '' #
Total comments: 4
Messages
Total messages: 18 (0 generated)
Chris, can you please take a look at the fix. We may want to return to the issue in the future and implement something better, but right now I want fix to be in the canary so we can find out if that fixes all AudioDevice crashes.
Hi Eugene, thanks for taking the time to debug this and discover the causes. I'm not very keen about this particular approach to fixing the problem. I've just looked quickly at this right now, so I'll take a little more time tomorrow to look more deeply into it. One thing which comes to mind is that the AudioDevice class is ref-counted (base::RefCountedThreadSafe) so can you think of a way to simply add appropriate references to "this" if there are particular async events which need to happen before it can be deleted (and then remove the ref once the event has occurred)? I've found that using reference counting can usually solve this type of problem. On 2012/01/06 01:58:12, enal wrote: > Chris, can you please take a look at the fix. > > We may want to return to the issue in the future and implement something better, > but right now I want fix to be in the canary so we can find out if that fixes > all AudioDevice crashes.
I was driving home and thinking about the change, and now I believe there are only 2 problems, not 3 -- yes, AudioDevice is ref-counted, so there is no problem in the destructor. (There was very similar problem in AudioOutputController, it is not ref-counted, and it really had issue in the destructor, that is why I was confused...) I'll send updated CL tomorrow morning. It would be tiny. On Thu, Jan 5, 2012 at 6:32 PM, <crogers@google.com> wrote: > Hi Eugene, thanks for taking the time to debug this and discover the causes. > > I'm not very keen about this particular approach to fixing the problem. > I've > just looked quickly at this right now, so I'll take a little more time > tomorrow > to look more deeply into it. One thing which comes to mind is that the > AudioDevice class is ref-counted (base::RefCountedThreadSafe) so can you > think > of a way to simply add appropriate references to "this" if there are > particular > async events which need to happen before it can be deleted (and then remove > the > ref once the event has occurred)? I've found that using reference counting > can > usually solve this type of problem. > > > On 2012/01/06 01:58:12, enal wrote: >> >> Chris, can you please take a look at the fix. > > >> We may want to return to the issue in the future and implement something > > better, >> >> but right now I want fix to be in the canary so we can find out if that >> fixes >> all AudioDevice crashes. > > > > > http://codereview.chromium.org/9112029/
Just wanted to clarify some things - sorry if this is already very clear to you: AudioDevice is a ref-counted object, but there still could be problems with early deletion if we're not taking care to "ref" and "deref" it when dealing with async completion events like we have here. I haven't looked at the details of this bug to know if it's a problem or not. Saying this a different way, just because AudioDevice *is* a ref-counted object doesn't automatically protect us from these types of problems. On 2012/01/06 03:01:52, enal1 wrote: > I was driving home and thinking about the change, and now I believe > there are only 2 problems, not 3 -- yes, AudioDevice is ref-counted, > so there is no problem in the destructor. (There was very similar > problem in AudioOutputController, it is not ref-counted, and it really > had issue in the destructor, that is why I was confused...) > > I'll send updated CL tomorrow morning. It would be tiny. > > On Thu, Jan 5, 2012 at 6:32 PM, <mailto:crogers@google.com> wrote: > > Hi Eugene, thanks for taking the time to debug this and discover the causes. > > > > I'm not very keen about this particular approach to fixing the problem. > > I've > > just looked quickly at this right now, so I'll take a little more time > > tomorrow > > to look more deeply into it. One thing which comes to mind is that the > > AudioDevice class is ref-counted (base::RefCountedThreadSafe) so can you > > think > > of a way to simply add appropriate references to "this" if there are > > particular > > async events which need to happen before it can be deleted (and then remove > > the > > ref once the event has occurred)? I've found that using reference counting > > can > > usually solve this type of problem. > > > > > > On 2012/01/06 01:58:12, enal wrote: > >> > >> Chris, can you please take a look at the fix. > > > > > >> We may want to return to the issue in the future and implement something > > > > better, > >> > >> but right now I want fix to be in the canary so we can find out if that > >> fixes > >> all AudioDevice crashes. > > > > > > > > > > http://codereview.chromium.org/9112029/
Verified that AudioDevice has proper ref counting, so there is no race condition in the destructor. Got rid of my attempt to fix it. There are still race conditions because of of time limit in the wait. For now just removed it.
http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... File content/renderer/media/audio_device.cc (right): http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... content/renderer/media/audio_device.cc:120: completion.Wait(); Is there a case where the time-out is actually needed? Shijing put that in there, so we need his opinion about it. As an alternative to this, we could move the "completion" variable into a member variable instead of stack-based to avoid it going away when this method finished.
On 2012/01/06 18:26:28, Chris Rogers wrote: > http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... > File content/renderer/media/audio_device.cc (right): > > http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... > content/renderer/media/audio_device.cc:120: completion.Wait(); > Is there a case where the time-out is actually needed? Shijing put that in > there, so we need his opinion about it. As an alternative to this, we could > move the "completion" variable into a member variable instead of stack-based to > avoid it going away when this method finished. (Adding Shijing as reviewer) As I wrote in the CL description, I don't want to do extensive changes, I just want to make it as simple and reliable as possible, and left big changes for later (and hopefully for somebody else, not me). Shutting audio thread before closing the stream may cause hang, at least on Windows. SyncSocket::Send() would eventually block on WriteFile() if audio thread does not empty the pipe from its side -- NaCl reported such issue. I definitely can reproduce such behavior by modifying the code to replace PostTask() by PostDelayedTask() with delay of several seconds, and getting rid of event. We would be more robust when I change the way AudioRendererImpl handles end of stream -- right now it signals the event only when it is called, and that means we need AudioRendererImpl::Render() to be called frequently enough. That means that there is lot of sockets send/receive near end of stream, and we can hit exactly problem I mentioned. I am planning to change the code to post delayed task, after that we would not need to pass length of the buffer, and callbacks would become less frequent.
The reason of adding a timeout for the WaitableEvent is to avoid blocking the Stop() too much in case IO thread is super busy. The initial implementation was using a member for the event, but was changed to be a local variable during the review. Obviously it is not a good change if the ShutdownOnIOThread can happen after the variable goes out of scope. It is also clearly written that Stop() must be called before deleting the AudioDevice instance, so we may need to make sure ShutdownOnIOThread won't happen after the destructor. So removing the timeout makes sense to me. Regarding to the hang problem, it has been known for a while that the socket can't be closed while it is waiting for receiving data. And we have been working on it. Tommi has a CL which supports the cancelable SyncSocket, http://codereview.chromium.org/8965053/. It may take some time for him to land the patch because it is not a trivial CL, but the problem will be fixed when it is landed.
drive-by On 2012/01/09 09:43:33, xians1 wrote: > The reason of adding a timeout for the WaitableEvent is to avoid blocking the > Stop() too much in case IO thread is super busy. FWIW, this rationale is almost always indicative of a deeper design problem. TimedWait can only be used correctly if a false return value is handled explicitly (such as by running it inside a loop that terminates when TimedWait returns true, or by forcing the event being waited for by other means). LOG(ERROR) doesn't count as "handling it explicitly" :) Put another way, code following a TimedWait could go *before* the wait if didn't care whether the event was signaled or not. If that's not the case, and it really belongs after the wait, then that's an indication that it relies on the signal having been delivered or that a timeout is "good enough". I can't think of a case I've seen where the timeout was "good enough".
In any case, let's check in that fix, and after that anybody can work on better one... On Mon, Jan 9, 2012 at 10:01 AM, <fischman@chromium.org> wrote: > drive-by > > On 2012/01/09 09:43:33, xians1 wrote: >> >> The reason of adding a timeout for the WaitableEvent is to avoid blocking >> the >> Stop() too much in case IO thread is super busy. > > > FWIW, this rationale is almost always indicative of a deeper design problem. > TimedWait can only be used correctly if a false return value is handled > explicitly (such as by running it inside a loop that terminates when > TimedWait > returns true, or by forcing the event being waited for by other means). > LOG(ERROR) doesn't count as "handling it explicitly" :) > > Put another way, code following a TimedWait could go *before* the wait if > didn't > care whether the event was signaled or not. If that's not the case, and it > really belongs after the wait, then that's an indication that it relies on > the > signal having been delivered or that a timeout is "good enough". I can't > think > of a case I've seen where the timeout was "good enough". > > > http://codereview.chromium.org/9112029/
ping... On Mon, Jan 9, 2012 at 2:07 PM, Eugene Nalimov <enal@google.com> wrote: > In any case, let's check in that fix, and after that anybody can work > on better one... > > On Mon, Jan 9, 2012 at 10:01 AM, <fischman@chromium.org> wrote: >> drive-by >> >> On 2012/01/09 09:43:33, xians1 wrote: >>> >>> The reason of adding a timeout for the WaitableEvent is to avoid blocking >>> the >>> Stop() too much in case IO thread is super busy. >> >> >> FWIW, this rationale is almost always indicative of a deeper design problem. >> TimedWait can only be used correctly if a false return value is handled >> explicitly (such as by running it inside a loop that terminates when >> TimedWait >> returns true, or by forcing the event being waited for by other means). >> LOG(ERROR) doesn't count as "handling it explicitly" :) >> >> Put another way, code following a TimedWait could go *before* the wait if >> didn't >> care whether the event was signaled or not. If that's not the case, and it >> really belongs after the wait, then that's an indication that it relies on >> the >> signal having been delivered or that a timeout is "good enough". I can't >> think >> of a case I've seen where the timeout was "good enough". >> >> >> http://codereview.chromium.org/9112029/
This seems fine if Shijing and Henrik are comfortable with it.
Shijing is on vacation but Tommi has been involved in these discussions so I am adding him as a reviewer instead. You can't expect any input from Shijing this month.
http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... File content/renderer/media/audio_device.cc (right): http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... content/renderer/media/audio_device.cc:120: completion.Wait(); On 2012/01/06 18:26:28, Chris Rogers wrote: > Is there a case where the time-out is actually needed? Shijing put that in > there, so we need his opinion about it. As an alternative to this, we could > move the "completion" variable into a member variable instead of stack-based to > avoid it going away when this method finished. What about moving the call to ShutDownAudioThread to be the last step inside ShutDownOnIOThread? That way we don't need the completion event at all. This is assuming that the caller of Stop doesn't expect everything to be fully stopped when Stop() returns. http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... content/renderer/media/audio_device.cc:191: if (!stream_id_) { nit: this method can be simplified by removing this early return and reversing the if: if (stream_id_) { filter_->RemoveDelegate(stream_id_); Send(new AudioHostMsg_CloseStream(stream_id_); stream_id_ = 0; } if (completion) completion->Signal(); http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... content/renderer/media/audio_device.cc:341: // Close the socket to terminate the main thread function in the add a check that makes sure we're not currently on the audio thread.
On 2012/01/12 13:13:07, tommi wrote: > http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... > File content/renderer/media/audio_device.cc (right): > > http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... > content/renderer/media/audio_device.cc:120: completion.Wait(); > On 2012/01/06 18:26:28, Chris Rogers wrote: > > Is there a case where the time-out is actually needed? Shijing put that in > > there, so we need his opinion about it. As an alternative to this, we could > > move the "completion" variable into a member variable instead of stack-based > to > > avoid it going away when this method finished. > > What about moving the call to ShutDownAudioThread to be the last step inside > ShutDownOnIOThread? That way we don't need the completion event at all. > > This is assuming that the caller of Stop doesn't expect everything to be fully > stopped when Stop() returns. > > http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... > content/renderer/media/audio_device.cc:191: if (!stream_id_) { > nit: this method can be simplified by removing this early return and reversing > the if: > > if (stream_id_) { > filter_->RemoveDelegate(stream_id_); > Send(new AudioHostMsg_CloseStream(stream_id_); > stream_id_ = 0; > } > > if (completion) > completion->Signal(); > > http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio... > content/renderer/media/audio_device.cc:341: // Close the socket to terminate the > main thread function in the > add a check that makes sure we're not currently on the audio thread. I don't want to make Stop() asynchronous right now. I am not sure all clients would be Ok if audio thread start asking for more data after Stop() returned. Can we just check in simple fix for race condition / use of uninitialized memory right now, and let original author or any volunteer to make big changes later? We had more than enough new crashes recently, I want some stability... Thanks, Eugene
sure. I'll volunteer. lgtm. On Thu, Jan 12, 2012 at 6:27 PM, <enal@chromium.org> wrote: > On 2012/01/12 13:13:07, tommi wrote: > > http://codereview.chromium.**org/9112029/diff/6001/content/** > renderer/media/audio_device.cc<http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio_device.cc> > >> File content/renderer/media/audio_**device.cc (right): >> > > > http://codereview.chromium.**org/9112029/diff/6001/content/** > renderer/media/audio_device.**cc#newcode120<http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio_device.cc#newcode120> > >> content/renderer/media/audio_**device.cc:120: completion.Wait(); >> On 2012/01/06 18:26:28, Chris Rogers wrote: >> > Is there a case where the time-out is actually needed? Shijing put >> that in >> > there, so we need his opinion about it. As an alternative to this, we >> could >> > move the "completion" variable into a member variable instead of >> stack-based >> to >> > avoid it going away when this method finished. >> > > What about moving the call to ShutDownAudioThread to be the last step >> inside >> ShutDownOnIOThread? That way we don't need the completion event at all. >> > > This is assuming that the caller of Stop doesn't expect everything to be >> fully >> stopped when Stop() returns. >> > > > http://codereview.chromium.**org/9112029/diff/6001/content/** > renderer/media/audio_device.**cc#newcode191<http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio_device.cc#newcode191> > >> content/renderer/media/audio_**device.cc:191: if (!stream_id_) { >> nit: this method can be simplified by removing this early return and >> reversing >> the if: >> > > if (stream_id_) { >> filter_->RemoveDelegate(**stream_id_); >> Send(new AudioHostMsg_CloseStream(**stream_id_); >> stream_id_ = 0; >> } >> > > if (completion) >> completion->Signal(); >> > > > http://codereview.chromium.**org/9112029/diff/6001/content/** > renderer/media/audio_device.**cc#newcode341<http://codereview.chromium.org/9112029/diff/6001/content/renderer/media/audio_device.cc#newcode341> > >> content/renderer/media/audio_**device.cc:341: // Close the socket to >> terminate >> > the > >> main thread function in the >> add a check that makes sure we're not currently on the audio thread. >> > > I don't want to make Stop() asynchronous right now. I am not sure all > clients > would be Ok if audio thread start asking for more data after Stop() > returned. > > Can we just check in simple fix for race condition / use of uninitialized > memory > right now, and let original author or any volunteer to make big changes > later? > We had more than enough new crashes recently, I want some stability... > > Thanks, > Eugene > > http://codereview.chromium.**org/9112029/<http://codereview.chromium.org/9112... >
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/enal@chromium.org/9112029/6001
Change committed as 117464 |