|
|
Chromium Code Reviews|
Created:
4 years, 4 months ago by watk Modified:
4 years, 4 months ago CC:
chromium-reviews, feature-media-reviews_chromium.org, piman+watch_chromium.org, posciak+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionConvert AVDAs thread hang detection to be timer based
Previously AVDAManager counted the number of tasks pending on the construction
thread and considered the thread hung when that number exceeded a threshold.
Now we use a MessageLoop::TaskObserver to watch how long tasks take. If a task
takes longer than 800ms it's considered hung. This mechanism should be more
robust because we don't have to worry about counting the tasks correctly.
BUG=638406
Committed: https://crrev.com/d8e0dfd15ced352e5a4128553397609fb08f4c00
Cr-Commit-Position: refs/heads/master@{#413527}
Patch Set 1 #
Total comments: 2
Patch Set 2 : timeout #
Total comments: 4
Patch Set 3 : Addressed comments. Added thread stopping back. #
Total comments: 5
Patch Set 4 : Renamed to IsThreadLikelyHung #Patch Set 5 : rename IsAutodetection... #
Total comments: 4
Patch Set 6 : clarified comment #
Messages
Total messages: 34 (11 generated)
watk@chromium.org changed reviewers: + liberato@chromium.org
WDYT?
i rather like it. might pull it out of avda, though. -fl https://codereview.chromium.org/2245333004/diff/1/media/gpu/android_video_dec... File media/gpu/android_video_decode_accelerator.cc (right): https://codereview.chromium.org/2245333004/diff/1/media/gpu/android_video_dec... media/gpu/android_video_decode_accelerator.cc:188: base::Unretained(this), task), not sure if this should be unretained. seems like it needs to hold a reference. https://codereview.chromium.org/2245333004/diff/1/media/gpu/android_video_dec... media/gpu/android_video_decode_accelerator.cc:199: base::Unretained(this), task), same question.
dalecurtis@chromium.org changed reviewers: + dalecurtis@chromium.org
Have you looked at MessageLoop::TaskObserver? WillProcessTask/DidProcessTask?
On 2016/08/16 23:59:17, DaleCurtis wrote: > Have you looked at MessageLoop::TaskObserver? WillProcessTask/DidProcessTask? Yeah, that works if we use a watchdog timeout mechanism right? We can't implement the task count mechanism with that interface. That's worth considering again, but GpuWatchdog is pretty complicated :( If only some (smallish) subset of devices hang, perhaps can we blacklist those to software MediaCodec and not bother with hang detection?
On 2016/08/17 00:08:12, watk wrote: > On 2016/08/16 23:59:17, DaleCurtis wrote: > > Have you looked at MessageLoop::TaskObserver? WillProcessTask/DidProcessTask? What about in WillProcessTask we save the current time. In DidProcessTask we clear it. IsAVDAThreadHung() checks to see if there's a task being processed. If it's been running for more than some timeout consider it hung. We might get a couple of false positives around Suspend(), but false positives have low impact.
Yeah, that's what I was thinking. You only need to worry about suspend on desktop platforms, on Android there's only a recommendation or a process kill IIUC.
On 2016/08/17 01:00:42, DaleCurtis wrote: > Yeah, that's what I was thinking. You only need to worry about suspend on > desktop platforms, on Android there's only a recommendation or a process kill > IIUC. Oh ok, cool. I uploaded a prototype of that.
lg2m - defer to frank on final review. https://codereview.chromium.org/2245333004/diff/20001/media/gpu/android_video... File media/gpu/android_video_decode_accelerator.cc (right): https://codereview.chromium.org/2245333004/diff/20001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:212: if (task_start_time_.has_value()) { is_null() is sufficient, ticks never starts at zero in practice.
https://codereview.chromium.org/2245333004/diff/20001/media/gpu/android_video... File media/gpu/android_video_decode_accelerator.cc (right): https://codereview.chromium.org/2245333004/diff/20001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:257: thread_avda_instances_.erase(avda); if we end up not resetting below, then i think that |thread_avda_instances_| can go away entirely, along with StopThread. https://codereview.chromium.org/2245333004/diff/20001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:261: // TODO: post a task to stop the thread. not sure that posting a task to stop the thread is all that simple. seems like it'd be hard to guarantee that the thread is re-started properly if the shutdown is racing with a future post that believes that it's already started. https://codereview.chromium.org/2245333004/diff/20001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:262: hang_detector_.Reset(); i don't think that it should reset here. for example, if we destroy the only avda instance while an async codec config is hung, then we don't want to start posting tasks again. if it does eventually complete, it'll reset itself normally even if the "AndReply" part cancels due to the invalidated weak ptr.
Description was changed from ========== Proof of concept task runner wrapper BUG= ========== to ========== Convert AVDAs thread hang detection to be timer based Previously AVDAManager counted the number of tasks pending on the construction thread and considered the thread hung when that number exceeded a threshold. Now we use a MessageLoop::TaskObserver to watch how long tasks take. If a task takes longer than 800ms it's considered hung. This mechanism should be more robust because we don't have to worry about counting the tasks correctly. BUG=638406 ==========
Patchset #3 (id:40001) has been deleted
I went back and forth on whether to leak the thread or not.. According to my measurements, the gpu thread private dirty memory increases by ~50kb after creating and using the thread. So it's pretty low, and seems a reasonable trade-off to leak it. LMK what you think.
https://codereview.chromium.org/2245333004/diff/60001/media/gpu/android_video... File media/gpu/android_video_decode_accelerator.cc (right): https://codereview.chromium.org/2245333004/diff/60001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:206: bool ThreadLikelyHung() { IsThreadLikelyHung? https://codereview.chromium.org/2245333004/diff/60001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:259: construction_thread_.task_runner()->PostTaskAndReply( Clever! https://codereview.chromium.org/2245333004/diff/60001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:1141: } Shouldn't this be based on the IsLikelyHung method now?
https://codereview.chromium.org/2245333004/diff/60001/media/gpu/android_video... File media/gpu/android_video_decode_accelerator.cc (right): https://codereview.chromium.org/2245333004/diff/60001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:206: bool ThreadLikelyHung() { On 2016/08/19 22:56:03, DaleCurtis wrote: > IsThreadLikelyHung? Done. https://codereview.chromium.org/2245333004/diff/60001/media/gpu/android_video... media/gpu/android_video_decode_accelerator.cc:1141: } On 2016/08/19 22:56:03, DaleCurtis wrote: > Shouldn't this be based on the IsLikelyHung method now? Yep, the IsCodecAutodetectionProbablySafe() on l.1117 above calls it.
lgtm, is it worth having both methods though?
On 2016/08/19 23:24:34, DaleCurtis wrote: > lgtm, is it worth having both methods though? I could just rename IsAutoDetectionProbablySafe to IsThreadLikelyHung? And AVDAManager::IsThreadLikelyHung calls hang_detector_.IsThreadLikelyHung?
On 2016/08/19 at 23:30:38, watk wrote: > On 2016/08/19 23:24:34, DaleCurtis wrote: > > lgtm, is it worth having both methods though? > > I could just rename IsAutoDetectionProbablySafe to IsThreadLikelyHung? And AVDAManager::IsThreadLikelyHung calls hang_detector_.IsThreadLikelyHung? Seems cleaner to me, but up to you.
On 2016/08/20 00:04:20, DaleCurtis wrote: > On 2016/08/19 at 23:30:38, watk wrote: > > On 2016/08/19 23:24:34, DaleCurtis wrote: > > > lgtm, is it worth having both methods though? > > > > I could just rename IsAutoDetectionProbablySafe to IsThreadLikelyHung? And > AVDAManager::IsThreadLikelyHung calls hang_detector_.IsThreadLikelyHung? > > Seems cleaner to me, but up to you. Agreed, done.
lgtm. https://codereview.chromium.org/2245333004/diff/100001/media/gpu/android_vide... File media/gpu/android_video_decode_accelerator.cc (right): https://codereview.chromium.org/2245333004/diff/100001/media/gpu/android_vide... media/gpu/android_video_decode_accelerator.cc:241: weak_this_factory_.InvalidateWeakPtrs(); i had no idea that one could ask for weak refs after doing this. very neat. https://codereview.chromium.org/2245333004/diff/100001/media/gpu/android_vide... media/gpu/android_video_decode_accelerator.cc:258: // will stay alive until next time an AVDA tries to stop it. it's a little subtle that the thread can't hang between the DoNothing and the reply. i think that it can't, since it would require an avda instance to StartThread, and cancel the reply in the process. might want to comment that.
The CQ bit was checked by watk@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks, I've tested this locally by PlatformThread::Sleep()ing on the AVDA thread, but it would be great if you could try it with the same test case you used when you introduced hang detection Frank. https://codereview.chromium.org/2245333004/diff/100001/media/gpu/android_vide... File media/gpu/android_video_decode_accelerator.cc (right): https://codereview.chromium.org/2245333004/diff/100001/media/gpu/android_vide... media/gpu/android_video_decode_accelerator.cc:241: weak_this_factory_.InvalidateWeakPtrs(); On 2016/08/21 01:52:30, liberato wrote: > i had no idea that one could ask for weak refs after doing this. very neat. Yeah, handy! https://codereview.chromium.org/2245333004/diff/100001/media/gpu/android_vide... media/gpu/android_video_decode_accelerator.cc:258: // will stay alive until next time an AVDA tries to stop it. On 2016/08/21 01:52:30, liberato wrote: > it's a little subtle that the thread can't hang between the DoNothing and the > reply. i think that it can't, since it would require an avda instance to > StartThread, and cancel the reply in the process. > > might want to comment that. Done (I think).
On 2016/08/22 18:20:34, watk wrote: > Thanks, I've tested this locally by PlatformThread::Sleep()ing on the AVDA > thread, but it would be great if you could try it with the same test case you > used when you introduced hang detection Frank. it works well. failed over to software when needed, and switched back to hardware if i restarted mediaserver via adb. didn't see any false positives from codec configure taking too long.
On 2016/08/22 19:49:07, liberato wrote: > On 2016/08/22 18:20:34, watk wrote: > > Thanks, I've tested this locally by PlatformThread::Sleep()ing on the AVDA > > thread, but it would be great if you could try it with the same test case you > > used when you introduced hang detection Frank. > > it works well. failed over to software when needed, and switched back to > hardware if i restarted mediaserver via adb. didn't see any false positives > from codec configure taking too long. Awesome, I appreciate it.
The CQ bit was unchecked by watk@chromium.org
The CQ bit was checked by watk@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dalecurtis@chromium.org, liberato@chromium.org Link to the patchset: https://codereview.chromium.org/2245333004/#ps120001 (title: "clarified comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Convert AVDAs thread hang detection to be timer based Previously AVDAManager counted the number of tasks pending on the construction thread and considered the thread hung when that number exceeded a threshold. Now we use a MessageLoop::TaskObserver to watch how long tasks take. If a task takes longer than 800ms it's considered hung. This mechanism should be more robust because we don't have to worry about counting the tasks correctly. BUG=638406 ========== to ========== Convert AVDAs thread hang detection to be timer based Previously AVDAManager counted the number of tasks pending on the construction thread and considered the thread hung when that number exceeded a threshold. Now we use a MessageLoop::TaskObserver to watch how long tasks take. If a task takes longer than 800ms it's considered hung. This mechanism should be more robust because we don't have to worry about counting the tasks correctly. BUG=638406 ==========
Message was sent while issue was closed.
Committed patchset #6 (id:120001)
Message was sent while issue was closed.
Description was changed from ========== Convert AVDAs thread hang detection to be timer based Previously AVDAManager counted the number of tasks pending on the construction thread and considered the thread hung when that number exceeded a threshold. Now we use a MessageLoop::TaskObserver to watch how long tasks take. If a task takes longer than 800ms it's considered hung. This mechanism should be more robust because we don't have to worry about counting the tasks correctly. BUG=638406 ========== to ========== Convert AVDAs thread hang detection to be timer based Previously AVDAManager counted the number of tasks pending on the construction thread and considered the thread hung when that number exceeded a threshold. Now we use a MessageLoop::TaskObserver to watch how long tasks take. If a task takes longer than 800ms it's considered hung. This mechanism should be more robust because we don't have to worry about counting the tasks correctly. BUG=638406 Committed: https://crrev.com/d8e0dfd15ced352e5a4128553397609fb08f4c00 Cr-Commit-Position: refs/heads/master@{#413527} ==========
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/d8e0dfd15ced352e5a4128553397609fb08f4c00 Cr-Commit-Position: refs/heads/master@{#413527} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
