|
|
DescriptionAndroid: Fix random crash in HW encode accelerator
There are reports of random Chrome crash due to HW encode accelerator
on Android.
Possible reason may be: currently we'll reset client factory at
encoding error and notify client. But before client stops encoding
(calling Destroy() here), if there is any pending frame delivered to
encoder, it will fail to get the pointer of client factory.
Here we try to leave the client factory life managment to client and
only notify the encoding error to client in error handling.
BUG=676987
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Committed: https://crrev.com/9bb17e2cba3cdac48a775f9375d23fd8f0ac7365
Cr-Commit-Position: refs/heads/master@{#440864}
Patch Set 1 #Patch Set 2 : address comments #
Total comments: 8
Patch Set 3 : address comments #
Messages
Total messages: 36 (24 generated)
Description was changed from ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding, if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 ========== to ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding, if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ==========
The CQ bit was checked by braveyao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding, if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding (calling Destroy() here), if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ==========
Description was changed from ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding (calling Destroy() here), if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding (calling Destroy() here), if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
braveyao@chromium.org changed reviewers: + watk@chromium.org
Hi, watk@, please take a look at this.
On 2016/12/27 23:16:39, braveyao wrote: > Hi, watk@, please take a look at this. Hmm, either WeakPtrFactory is being used incorrectly or I'm really confused.. There are multiple checks for if (!client_ptr_factory_->GetWeakPtr()). But it looks to me that it's never true? The only time client_ptr_factory_->GetWeakPtr() will be false is if the factory was initialized with a null pointer, right? And we don't need to handle that case. The two important problems are: 1) don't call client methods after Destroy(), which is the point of the weakptrfactory. 2) return early from all methods after calling NotifyError() (because we're in an error state). I think this is what resetting the client_ptr_factory was supposed to achieve, but it failed.
On 2016/12/27 23:54:36, watk wrote: > On 2016/12/27 23:16:39, braveyao wrote: > > Hi, watk@, please take a look at this. > > Hmm, either WeakPtrFactory is being used incorrectly or I'm really confused.. > > There are multiple checks for if (!client_ptr_factory_->GetWeakPtr()). But it > looks to me that it's never true? > > The only time client_ptr_factory_->GetWeakPtr() will be false is if the factory > was initialized with a null pointer, right? And we don't need to handle that > case. > > The two important problems are: > 1) don't call client methods after Destroy(), which is the point of the > weakptrfactory. > 2) return early from all methods after calling NotifyError() (because we're in > an error state). I think this is what resetting the client_ptr_factory was > supposed to achieve, but it failed. Hi watk@, thanks so much for the comments! Below is some of my findings leading to this cl: 1) if I add "client_ptr_factory_.reset()" before "if (!client_ptr_factory_->GetWeakPtr())", then "if (!client_ptr_factory_->GetWeakPtr())" will cause a crash with a similar stack. So I suppose "if (!client_ptr_factory_->GetWeakPtr())" is not the correct way for such protection. 2) After "Destroy()" there will be no more encoding. So I suppose the problem happens before "Destroy()" is called (but I can't verify it without reproduction.) 3) At first, I tried the method to check "client_ptr_factory_.get()"(or maybe "client_ptr_factory_.HasWeakPtrs()" is better?) for each method called in "DoIOTask()". Then I think it may be better to let client control the life cycle of client_ptr_factory_" as this cl does. Please help to suggest which one is better ^_^
On 2016/12/28 00:08:51, braveyao wrote: > On 2016/12/27 23:54:36, watk wrote: > > On 2016/12/27 23:16:39, braveyao wrote: > > > Hi, watk@, please take a look at this. > > > > Hmm, either WeakPtrFactory is being used incorrectly or I'm really confused.. > > > > There are multiple checks for if (!client_ptr_factory_->GetWeakPtr()). But it > > looks to me that it's never true? > > > > The only time client_ptr_factory_->GetWeakPtr() will be false is if the > factory > > was initialized with a null pointer, right? And we don't need to handle that > > case. > > > > The two important problems are: > > 1) don't call client methods after Destroy(), which is the point of the > > weakptrfactory. > > 2) return early from all methods after calling NotifyError() (because we're in > > an error state). I think this is what resetting the client_ptr_factory was > > supposed to achieve, but it failed. > > Hi watk@, thanks so much for the comments! Below is some of my findings leading > to this cl: > > 1) if I add "client_ptr_factory_.reset()" before "if > (!client_ptr_factory_->GetWeakPtr())", then "if > (!client_ptr_factory_->GetWeakPtr())" will cause a crash with a similar stack. > So I suppose "if (!client_ptr_factory_->GetWeakPtr())" is not the correct way > for such protection. > > 2) After "Destroy()" there will be no more encoding. So I suppose the problem > happens before "Destroy()" is called (but I can't verify it without > reproduction.) > > 3) At first, I tried the method to check "client_ptr_factory_.get()"(or maybe > "client_ptr_factory_.HasWeakPtrs()" is better?) for each method called in > "DoIOTask()". Then I think it may be better to let client control the life cycle > of client_ptr_factory_" as this cl does. Please help to suggest which one is > better ^_^ IMO what we should do is add a member variable e.g., bool error_occurred_. Set it in RETURN_ON_FAILURE, and replace the if (client_ptr_factory_->GetWeakPtr()) with it. Then we should only call client_ptr_factory_->GetWeakPtr() when we're posting a task that might need to be canceled, which is much clearer to me. And we only need to call client_ptr_factory_.reset() from Destroy(). WDYT? This is all assuming I've understood the intent of the original code :)
On 2016/12/28 00:16:35, watk wrote: > On 2016/12/28 00:08:51, braveyao wrote: > > On 2016/12/27 23:54:36, watk wrote: > > > On 2016/12/27 23:16:39, braveyao wrote: > > > > Hi, watk@, please take a look at this. > > > > > > Hmm, either WeakPtrFactory is being used incorrectly or I'm really > confused.. > > > > > > There are multiple checks for if (!client_ptr_factory_->GetWeakPtr()). But > it > > > looks to me that it's never true? > > > > > > The only time client_ptr_factory_->GetWeakPtr() will be false is if the > > factory > > > was initialized with a null pointer, right? And we don't need to handle that > > > case. > > > > > > The two important problems are: > > > 1) don't call client methods after Destroy(), which is the point of the > > > weakptrfactory. > > > 2) return early from all methods after calling NotifyError() (because we're > in > > > an error state). I think this is what resetting the client_ptr_factory was > > > supposed to achieve, but it failed. > > > > Hi watk@, thanks so much for the comments! Below is some of my findings > leading > > to this cl: > > > > 1) if I add "client_ptr_factory_.reset()" before "if > > (!client_ptr_factory_->GetWeakPtr())", then "if > > (!client_ptr_factory_->GetWeakPtr())" will cause a crash with a similar stack. > > So I suppose "if (!client_ptr_factory_->GetWeakPtr())" is not the correct way > > for such protection. > > > > 2) After "Destroy()" there will be no more encoding. So I suppose the problem > > happens before "Destroy()" is called (but I can't verify it without > > reproduction.) > > > > 3) At first, I tried the method to check "client_ptr_factory_.get()"(or maybe > > "client_ptr_factory_.HasWeakPtrs()" is better?) for each method called in > > "DoIOTask()". Then I think it may be better to let client control the life > cycle > > of client_ptr_factory_" as this cl does. Please help to suggest which one is > > better ^_^ > > IMO what we should do is add a member variable e.g., bool error_occurred_. Set > it in RETURN_ON_FAILURE, and replace the if (client_ptr_factory_->GetWeakPtr()) > with it. > > Then we should only call client_ptr_factory_->GetWeakPtr() when we're posting a > task that might need to be canceled, which is much clearer to me. And we only > need to call client_ptr_factory_.reset() from Destroy(). > > WDYT? This is all assuming I've understood the intent of the original code :) Actually, one more thing calling InvalidateWeakPtrs() is clearer than reset() in Destroy as well.
The CQ bit was checked by braveyao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by braveyao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Hi watk@, thanks so much for the suggestion. PTAL. BTW: InvalidateWeakPtrs() causes other problem at launching chromium with Debug building. So keep reset() method here.
https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... File media/gpu/android_video_encode_accelerator.cc (right): https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:50: if (client_ptr_factory_->GetWeakPtr()) { \ It's not possible for this to be null AFAICT. Should probably change it to if (!error_occurred). https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:154: client_ptr_factory_.reset(new base::WeakPtrFactory<Client>(client)); I'd suggest adding a DCHECK(client) or if (!client) return false; to make it clear that client can't be null. https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:155: error_occurred_ = false; Initialize() cannot be called more than once so this is unnecessary. https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:289: error_occurred_ = false; No need to reset this, we're deleting below. (The .reset() above is also unnecessary, but I can see it adding clarity).
The CQ bit was checked by braveyao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
All done. Thanks again for the insight! PTAL. https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... File media/gpu/android_video_encode_accelerator.cc (right): https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:50: if (client_ptr_factory_->GetWeakPtr()) { \ On 2016/12/28 01:01:51, watk wrote: > It's not possible for this to be null AFAICT. Should probably change it to if > (!error_occurred). Done. https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:154: client_ptr_factory_.reset(new base::WeakPtrFactory<Client>(client)); On 2016/12/28 01:01:51, watk wrote: > I'd suggest adding a DCHECK(client) or if (!client) return false; to make it > clear that client can't be null. Done. https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:155: error_occurred_ = false; On 2016/12/28 01:01:51, watk wrote: > Initialize() cannot be called more than once so this is unnecessary. Done. https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video... media/gpu/android_video_encode_accelerator.cc:289: error_occurred_ = false; On 2016/12/28 01:01:51, watk wrote: > No need to reset this, we're deleting below. (The .reset() above is also > unnecessary, but I can see it adding clarity). Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
The CQ bit was checked by braveyao@chromium.org
The CQ bit was unchecked by braveyao@chromium.org
The CQ bit was checked by braveyao@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1482947772467370, "parent_rev": "91d867f59194c9197523f67a7bc44388d552d681", "commit_rev": "151ced0e7df1b1b1461c94381c1fa104c088fbe7"}
Message was sent while issue was closed.
Description was changed from ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding (calling Destroy() here), if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding (calling Destroy() here), if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2601003002 ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Description was changed from ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding (calling Destroy() here), if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2601003002 ========== to ========== Android: Fix random crash in HW encode accelerator There are reports of random Chrome crash due to HW encode accelerator on Android. Possible reason may be: currently we'll reset client factory at encoding error and notify client. But before client stops encoding (calling Destroy() here), if there is any pending frame delivered to encoder, it will fail to get the pointer of client factory. Here we try to leave the client factory life managment to client and only notify the encoding error to client in error handling. BUG=676987 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Committed: https://crrev.com/9bb17e2cba3cdac48a775f9375d23fd8f0ac7365 Cr-Commit-Position: refs/heads/master@{#440864} ==========
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/9bb17e2cba3cdac48a775f9375d23fd8f0ac7365 Cr-Commit-Position: refs/heads/master@{#440864} |