Issue 2601003002: Android: Fix random crash in HW encode accelerator

braveyao

Description was changed from ========== Android: Fix random crash in HW encode accelerator There are ...

3 years, 12 months ago (2016-12-27 22:09:49 UTC) #1

braveyao

The CQ bit was checked by braveyao@chromium.org to run a CQ dry run

3 years, 12 months ago (2016-12-27 22:10:03 UTC) #2

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2601003002/1

3 years, 12 months ago (2016-12-27 22:10:18 UTC) #3

braveyao

Description was changed from ========== Android: Fix random crash in HW encode accelerator There are ...

3 years, 12 months ago (2016-12-27 22:23:04 UTC) #4

braveyao

Description was changed from ========== Android: Fix random crash in HW encode accelerator There are ...

3 years, 12 months ago (2016-12-27 22:23:28 UTC) #5

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 12 months ago (2016-12-27 23:08:22 UTC) #6

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 12 months ago (2016-12-27 23:08:22 UTC) #7

braveyao

braveyao@chromium.org changed reviewers: + watk@chromium.org

3 years, 12 months ago (2016-12-27 23:16:38 UTC) #8

watk

On 2016/12/27 23:16:39, braveyao wrote: > Hi, watk@, please take a look at this. Hmm, ...

3 years, 12 months ago (2016-12-27 23:54:36 UTC) #10

braveyao

On 2016/12/27 23:54:36, watk wrote: > On 2016/12/27 23:16:39, braveyao wrote: > > Hi, watk@, ...

3 years, 12 months ago (2016-12-28 00:08:51 UTC) #11

watk

On 2016/12/28 00:08:51, braveyao wrote: > On 2016/12/27 23:54:36, watk wrote: > > On 2016/12/27 ...

3 years, 12 months ago (2016-12-28 00:16:35 UTC) #12

On 2016/12/28 00:08:51, braveyao wrote:
> On 2016/12/27 23:54:36, watk wrote:
> > On 2016/12/27 23:16:39, braveyao wrote:
> > > Hi, watk@, please take a look at this.
> > 
> > Hmm, either WeakPtrFactory is being used incorrectly or I'm really confused..
> > 
> > There are multiple checks for if (!client_ptr_factory_->GetWeakPtr()). But
> > it looks to me that it's never true?
> > 
> > The only time client_ptr_factory_->GetWeakPtr() will be false is if the
> > factory was initialized with a null pointer, right? And we don't need to
> > handle that case.
> > 
> > The two important problems are:
> > 1) don't call client methods after Destroy(), which is the point of the
> > weakptrfactory.
> > 2) return early from all methods after calling NotifyError() (because we're
> > in an error state). I think this is what resetting the client_ptr_factory
> > was supposed to achieve, but it failed.
> 
> Hi watk@, thanks so much for the comments! Below is some of my findings
> leading to this cl:
> 
> 1) if I add "client_ptr_factory_.reset()" before "if
> (!client_ptr_factory_->GetWeakPtr())", then "if
> (!client_ptr_factory_->GetWeakPtr())" will cause a crash with a similar stack.
> So I suppose "if (!client_ptr_factory_->GetWeakPtr())" is not the correct way
> for such protection.
> 
> 2) After "Destroy()" there will be no more encoding. So I suppose the problem
> happens before "Destroy()" is called (but I can't verify it without
> reproduction.)
> 
> 3) At first, I tried the method to check "client_ptr_factory_.get()" (or maybe
> "client_ptr_factory_.HasWeakPtrs()" is better?) for each method called in
> "DoIOTask()". Then I think it may be better to let client control the life
> cycle of "client_ptr_factory_" as this cl does. Please help to suggest which
> one is better ^_^

IMO what we should do is add a member variable e.g., bool error_occurred_. Set
it in RETURN_ON_FAILURE, and replace the if (client_ptr_factory_->GetWeakPtr())
with it.

Then we should only call client_ptr_factory_->GetWeakPtr() when we're posting a
task that might need to be canceled, which is much clearer to me. And we only
need to call client_ptr_factory_.reset() from Destroy().

WDYT? This is all assuming I've understood the intent of the original code :)
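
For readers following along, the suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that landed: `SketchClient`, `SketchEncoder`, and the event strings are hypothetical, the macro is a simplified stand-in for the real RETURN_ON_FAILURE, and a raw pointer replaces the weak pointer so the example stays self-contained. The point it shows is that a sticky `error_occurred_` flag, rather than a weak pointer check, is what makes later calls return early.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the accelerator's client; records what it is told.
struct SketchClient {
  std::vector<std::string> events;
  void NotifyError(const std::string& what) { events.push_back("error: " + what); }
  void BitstreamBufferReady() { events.push_back("buffer ready"); }
};

// Simplified stand-in for RETURN_ON_FAILURE: on failure, set the sticky flag,
// notify the client once, and return from the calling member function.
#define SKETCH_RETURN_ON_FAILURE(cond, msg) \
  do {                                      \
    if (!(cond)) {                          \
      error_occurred_ = true;               \
      client_->NotifyError(msg);            \
      return;                               \
    }                                       \
  } while (0)

class SketchEncoder {
 public:
  explicit SketchEncoder(SketchClient* client) : client_(client) {}

  // Hypothetical unit of work; codec_ok simulates whether the codec call worked.
  void DoIOTask(bool codec_ok) {
    if (error_occurred_)
      return;  // Already in the error state: do nothing further.
    SKETCH_RETURN_ON_FAILURE(codec_ok, "codec failure");
    client_->BitstreamBufferReady();
  }

  bool error_occurred() const { return error_occurred_; }

 private:
  SketchClient* client_;         // Raw pointer only to keep the sketch small.
  bool error_occurred_ = false;  // Set once, never cleared.
};
```

With this shape, a successful call followed by a failing one and then another call leaves the client with exactly two events: one "buffer ready" and one error notification.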

watk

On 2016/12/28 00:16:35, watk wrote: > On 2016/12/28 00:08:51, braveyao wrote: > > On 2016/12/27 ...

3 years, 12 months ago (2016-12-28 00:36:09 UTC) #13

On 2016/12/28 00:16:35, watk wrote:
> On 2016/12/28 00:08:51, braveyao wrote:
> > On 2016/12/27 23:54:36, watk wrote:
> > > On 2016/12/27 23:16:39, braveyao wrote:
> > > > Hi, watk@, please take a look at this.
> > >
> > > Hmm, either WeakPtrFactory is being used incorrectly or I'm really
> > > confused..
> > >
> > > There are multiple checks for if (!client_ptr_factory_->GetWeakPtr()).
> > > But it looks to me that it's never true?
> > >
> > > The only time client_ptr_factory_->GetWeakPtr() will be false is if the
> > > factory was initialized with a null pointer, right? And we don't need to
> > > handle that case.
> > >
> > > The two important problems are:
> > > 1) don't call client methods after Destroy(), which is the point of the
> > > weakptrfactory.
> > > 2) return early from all methods after calling NotifyError() (because
> > > we're in an error state). I think this is what resetting the
> > > client_ptr_factory was supposed to achieve, but it failed.
> >
> > Hi watk@, thanks so much for the comments! Below is some of my findings
> > leading to this cl:
> >
> > 1) if I add "client_ptr_factory_.reset()" before "if
> > (!client_ptr_factory_->GetWeakPtr())", then "if
> > (!client_ptr_factory_->GetWeakPtr())" will cause a crash with a similar
> > stack. So I suppose "if (!client_ptr_factory_->GetWeakPtr())" is not the
> > correct way for such protection.
> >
> > 2) After "Destroy()" there will be no more encoding. So I suppose the
> > problem happens before "Destroy()" is called (but I can't verify it without
> > reproduction.)
> >
> > 3) At first, I tried the method to check "client_ptr_factory_.get()" (or
> > maybe "client_ptr_factory_.HasWeakPtrs()" is better?) for each method
> > called in "DoIOTask()". Then I think it may be better to let client control
> > the life cycle of "client_ptr_factory_" as this cl does. Please help to
> > suggest which one is better ^_^
>
> IMO what we should do is add a member variable e.g., bool error_occurred_.
> Set it in RETURN_ON_FAILURE, and replace the if
> (client_ptr_factory_->GetWeakPtr()) with it.
>
> Then we should only call client_ptr_factory_->GetWeakPtr() when we're posting
> a task that might need to be canceled, which is much clearer to me. And we
> only need to call client_ptr_factory_.reset() from Destroy().
>
> WDYT? This is all assuming I've understood the intent of the original code :)

Actually, one more thing: calling InvalidateWeakPtrs() is clearer than reset()
in Destroy() as well.
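
To make the distinction concrete, here is a rough sketch of what invalidating buys over resetting. It deliberately does not use Chromium's base::WeakPtrFactory; `SketchWeakFactory` and `Deref` are hypothetical stand-ins built on std::shared_ptr and std::weak_ptr, written under the assumption that invalidation should expire every outstanding weak pointer while leaving the factory itself usable. Resetting a std::unique_ptr that owns the factory, by contrast, destroys the factory, so any later `client_ptr_factory_->GetWeakPtr()` dereferences a null pointer, which matches the crash described in point 1) of the quoted findings.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a weak-pointer factory (not the Chromium class).
// Weak handles observe a shared token; replacing the token expires them all.
template <typename T>
class SketchWeakFactory {
 public:
  explicit SketchWeakFactory(T* target)
      : target_(target), token_(std::make_shared<T*>(target)) {}

  // Hands out a weak handle tied to the current token.
  std::weak_ptr<T*> GetWeakPtr() const { return token_; }

  // Expires all outstanding handles; the factory stays valid afterwards.
  void InvalidateWeakPtrs() { token_ = std::make_shared<T*>(target_); }

 private:
  T* target_;
  std::shared_ptr<T*> token_;
};

// Returns the target if the handle is still valid, otherwise nullptr.
template <typename T>
T* Deref(const std::weak_ptr<T*>& weak) {
  std::shared_ptr<T*> strong = weak.lock();
  return strong ? *strong : nullptr;
}
```

In this sketch, a handle taken before InvalidateWeakPtrs() resolves to nullptr afterwards, while a handle taken after the call resolves to the target again, and at no point is the factory itself null.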

braveyao

The CQ bit was checked by braveyao@chromium.org to run a CQ dry run

3 years, 12 months ago (2016-12-28 00:36:33 UTC) #14

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2601003002/1

3 years, 12 months ago (2016-12-28 00:36:46 UTC) #15

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 12 months ago (2016-12-28 00:39:26 UTC) #16

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 12 months ago (2016-12-28 00:39:26 UTC) #17

braveyao

The CQ bit was checked by braveyao@chromium.org to run a CQ dry run

3 years, 12 months ago (2016-12-28 00:49:13 UTC) #18

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2601003002/20001

3 years, 12 months ago (2016-12-28 00:49:28 UTC) #19

braveyao

Hi watk@, thanks so much for the suggestion. PTAL. BTW: InvalidateWeakPtrs() causes other problem at ...

3 years, 12 months ago (2016-12-28 00:51:58 UTC) #20

watk

https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video_encode_accelerator.cc File media/gpu/android_video_encode_accelerator.cc (right): https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video_encode_accelerator.cc#newcode50 media/gpu/android_video_encode_accelerator.cc:50: if (client_ptr_factory_->GetWeakPtr()) { \ It's not possible for this ...

3 years, 12 months ago (2016-12-28 01:01:51 UTC) #21

braveyao

The CQ bit was checked by braveyao@chromium.org to run a CQ dry run

3 years, 12 months ago (2016-12-28 01:17:55 UTC) #22

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2601003002/40001

3 years, 12 months ago (2016-12-28 01:18:07 UTC) #23

braveyao

All done. Thanks again for the insight! PTAL. https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video_encode_accelerator.cc File media/gpu/android_video_encode_accelerator.cc (right): https://codereview.chromium.org/2601003002/diff/20001/media/gpu/android_video_encode_accelerator.cc#newcode50 media/gpu/android_video_encode_accelerator.cc:50: if ...

3 years, 12 months ago (2016-12-28 01:28:19 UTC) #24

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 12 months ago (2016-12-28 02:20:14 UTC) #25

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 12 months ago (2016-12-28 02:20:15 UTC) #26

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2601003002/40001

3 years, 11 months ago (2016-12-28 17:56:18 UTC) #31

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1482947772467370, "parent_rev": "91d867f59194c9197523f67a7bc44388d552d681", "commit_rev": "151ced0e7df1b1b1461c94381c1fa104c088fbe7"}

3 years, 11 months ago (2016-12-28 18:00:02 UTC) #32

commit-bot: I haz the power

Description was changed from ========== Android: Fix random crash in HW encode accelerator There are ...

3 years, 11 months ago (2016-12-28 18:00:29 UTC) #33

commit-bot: I haz the power

Description was changed from ========== Android: Fix random crash in HW encode accelerator There are ...

3 years, 11 months ago (2017-01-02 15:49:27 UTC) #35

commit-bot: I haz the power

3 years, 11 months ago (2017-01-02 15:49:28 UTC) #36

Message was sent while issue was closed.

Patchset 3 (id:??) landed as
https://crrev.com/9bb17e2cba3cdac48a775f9375d23fd8f0ac7365
Cr-Commit-Position: refs/heads/master@{#440864}

Issue 2601003002: Android: Fix random crash in HW encode accelerator (Closed)

Patch Set 1 #

Patch Set 2 : address comments #

Patch Set 3 : address comments #