Description
A deadlock could happen in the following situation:
1. As part of program shutdown, VideoCaptureManager::StopCaptureForClient() is
called. This posts a task to destroy the capture device to the "device
thread", which eventually leads to this destructor being called (on the
"device thread").
2. While things on the "device thread" are running asynchronously,
BrowserMainLoop is being destroyed. Its destructor waits for the "device
thread" (audio thread) to complete.
3. |io_task_runner_| no longer accepts the posted task.
4. |event.Wait()| then waits indefinitely, causing the shutdown to hang.
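The four steps above reduce to one pattern: code on the "device thread" posts a task to |io_task_runner_| and then blocks on an event that only that task signals. The sketch below models it in standard C++ with hypothetical stand-ins (ToyEvent, ToyTaskRunner and TeardownReceiver() are invented for illustration; this is neither Chromium's base API nor the committed patch). It assumes a mitigation consistent with the review discussion: check whether posting succeeded and skip the wait when the task was rejected.

```cpp
#include <cassert>  // for the checks in the usage example
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical one-shot event, loosely modeled on a waitable event.
class ToyEvent {
 public:
  void Signal() {
    std::lock_guard<std::mutex> lock(mu_);
    signaled_ = true;
    cv_.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return signaled_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Hypothetical stand-in for |io_task_runner_|: one worker thread that runs
// posted tasks in order and rejects new tasks once Shutdown() is called.
class ToyTaskRunner {
 public:
  ToyTaskRunner() : worker_([this] { RunLoop(); }) {}
  ~ToyTaskRunner() { Shutdown(); }

  // Returns false once shutdown has begun (step 3 in the description).
  bool PostTask(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mu_);
    if (stopped_) return false;
    tasks_.push(std::move(task));
    cv_.notify_one();
    return true;
  }

  // Stops accepting tasks, lets the worker drain the queue, then joins it.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (stopped_) return;  // Already shut down and joined.
      stopped_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

 private:
  void RunLoop() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // Stopped and fully drained.
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      lock.unlock();
      task();  // Run without holding the lock.
      lock.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopped_ = false;
  std::thread worker_;  // Declared last so the members above exist first.
};

// Hypothetical destructor-time teardown. Returns true if it waited for the
// posted task, false if the task was rejected. Calling Wait() without checking
// the PostTask() result is the step-4 hang: nothing would ever Signal().
bool TeardownReceiver(ToyTaskRunner& io_runner) {
  ToyEvent event;
  if (!io_runner.PostTask([&event] { event.Signal(); })) {
    return false;
  }
  event.Wait();
  return true;
}
```

With a live runner, TeardownReceiver() returns true once the posted task has signaled the event; after Shutdown() it returns false immediately rather than blocking forever as in step 4.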
This deadlock occurred on the linux_android_rel_ng build bot in CL
https://codereview.chromium.org/2772963002/. It reproduces locally on a Nexus 5
device running KitKat.
In production, class GpuJpegDecodeAcceleratorHost is currently only used on
Chrome OS, but the deadlock might occur there as well, since the situation does
not appear to be platform-dependent.
BUG=706186
TEST=
content_browsertests --gtest_filter="VideoCaptureBrowserTest.*"
content_unittests --gtest_filter="*Video*"
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Review-Url: https://codereview.chromium.org/2786503002
Cr-Commit-Position: refs/heads/master@{#460531}
Committed: https://chromium.googlesource.com/chromium/src/+/c64c69fce0b29af200419ab2f9df75e3d8e693d5
Patch Set 1
Messages
Total messages: 17 (11 generated)
Description was changed: the CQ_INCLUDE_TRYBOTS line was added.
The CQ bit was checked by chfremer@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
chfremer@chromium.org changed reviewers: + sandersd@chromium.org
sandersd@: PTAL. This is currently blocking the landing of CL https://codereview.chromium.org/2772963002/ because it causes a test failure (timeout).
chfremer@chromium.org changed reviewers: + emircan@chromium.org, mcasas@chromium.org
mcasas@, emircan@: FYI
Description was changed.
Note that this isn't a complete fix; it's still possible to successfully post a task that never runs. I'm not sure exactly what circumstances can lead to that, so I don't know how common it is. (I'm assuming very rare or never for this specific case.) lgtm.
On 2017/03/28 23:21:41, sandersd wrote:
> Note that this isn't a complete fix, it's still possible to successfully post a
> task that never runs. I'm not sure exactly what circumstances can lead to that,
> so I don't know how common it is. (I'm assuming very rare or never for this
> specific case.)
>
> lgtm.

Thanks for the quick turnaround. Agreed that this isn't a complete fix. I guess for a full fix, we would have to change the design. But hopefully this CL can at least provide some level of mitigation for now.
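The reviewer's caveat can be modeled the same way. In the hypothetical single-threaded queue below (ManualTaskQueue and PostThenShutDown() are invented for illustration, not Chromium APIs), PostTask() succeeds, but shutdown then discards the queued task before anything runs it. A caller that waited because the post succeeded would therefore still hang, which is why checking the post result is a mitigation rather than a complete fix.

```cpp
#include <cassert>  // for the checks in the usage example
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical task queue used only to illustrate the caveat; tasks run only
// when RunPendingTasks() is called on the current thread.
class ManualTaskQueue {
 public:
  // Accepts tasks until shutdown begins.
  bool PostTask(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mu_);
    if (shut_down_) return false;
    tasks_.push(std::move(task));
    return true;
  }

  // A shutdown that drops already-queued tasks without running them.
  void ShutdownAndDiscard() {
    std::lock_guard<std::mutex> lock(mu_);
    shut_down_ = true;
    std::queue<std::function<void()>>().swap(tasks_);
  }

  // Runs whatever is still queued.
  void RunPendingTasks() {
    std::queue<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> lock(mu_);
      batch.swap(tasks_);
    }
    for (; !batch.empty(); batch.pop()) batch.front()();
  }

 private:
  std::mutex mu_;
  std::queue<std::function<void()>> tasks_;
  bool shut_down_ = false;
};

// Returns {post_succeeded, task_ran}. The result (true, false) is the case in
// which a waiter that trusted the successful post would still block forever.
std::pair<bool, bool> PostThenShutDown() {
  ManualTaskQueue queue;
  bool ran = false;
  bool posted = queue.PostTask([&ran] { ran = true; });
  queue.ShutdownAndDiscard();  // Shutdown gets ahead of the queued task.
  queue.RunPendingTasks();     // Nothing is left to run.
  return {posted, ran};
}
```

PostThenShutDown() returns {true, false}: the post succeeded, yet the task never ran.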
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...) win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by chfremer@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 1, "attempt_start_ts": 1490820202396650, "parent_rev": "9efba0cd0d357c433d878635e673d5de4a2f4be1", "commit_rev": "c64c69fce0b29af200419ab2f9df75e3d8e693d5"}
Message was sent while issue was closed.
Description was changed: the Review-Url, Cr-Commit-Position and Committed lines were appended.
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://chromium.googlesource.com/chromium/src/+/c64c69fce0b29af200419ab2f9df...