|
|
Description

Verify if GPU message pump is signaled when it hangs in WaitForWork
This is a diagnostic change. The code introduced in this
change runs only when the GPU process is about to
terminate with a deliberate crash.
We are getting a number of crashes triggered by a GPU hang
in MessagePumpForGpu::WaitForWork. There is already some
instrumentation that indicates that (the wait/wake pattern
involved is sketched after this list):
a) MessagePumpForGpu::WaitForWork is sitting in
MsgWaitForMultipleObjectsEx for longer than 15 seconds
b) MessagePumpForGpu::ScheduleWork is called after
WaitForWork enters the wait, sometimes several seconds
after, and SetEvent must be called (at least we grab the
timestamp right before calling the SetEvent method)
c) The event is set but it doesn't wake up the wait
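
For context, the wait/wake pattern referred to in a) through c)
looks roughly like the following minimal C++/Win32 sketch. This is
illustrative only, not the actual MessagePumpForGpu code; the class
and member names are hypothetical.

#include <windows.h>

// Minimal sketch of the wait/wake pattern described in a)-c).
// Hypothetical names; not the actual Chromium implementation.
class GpuPumpSketch {
 public:
  GpuPumpSketch() {
    // Auto-reset event: a successful wait atomically resets it.
    event_ = CreateEvent(nullptr, /*bManualReset=*/FALSE,
                         /*bInitialState=*/FALSE, nullptr);
  }

  void WaitForWork() {
    // a) The pump thread blocks here; the instrumentation shows it
    // stuck in this call for longer than 15 seconds.
    DWORD result = MsgWaitForMultipleObjectsEx(
        1, &event_, INFINITE, QS_ALLINPUT, MWMO_INPUTAVAILABLE);
    // ... dispatch work or input messages depending on |result| ...
  }

  void ScheduleWork() {
    // b) Another thread records a timestamp and signals the event;
    // c) yet the wait above apparently never wakes up.
    SetEvent(event_);
  }

 private:
  HANDLE event_;
};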
While it is possible for a thread that has been woken up
to not be immediately scheduled by the OS, it is hard to
imagine that going on for 15+ seconds. So the theory is
that the event handle might be recycled: some other code
might have closed its handle but kept using the stale
value. Since the event is auto-reset, when there are
multiple waiters only one of them would be woken up and
would reset the event, and the other one would
just continue waiting. So if that other code is somehow
still waiting on its closed and now recycled handle, that
would explain the hang.
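
The auto-reset behavior that this theory hinges on can be
demonstrated in isolation. Below is a standalone C++/Win32 sketch
(an illustration only, not part of this CL) showing that a single
SetEvent releases exactly one of two waiters on an auto-reset event:

#include <windows.h>
#include <stdio.h>

// Demonstrates the auto-reset property: one SetEvent wakes exactly
// one waiter; the other waiter keeps waiting on the reset event.
DWORD WINAPI Waiter(LPVOID param) {
  HANDLE event = static_cast<HANDLE>(param);
  DWORD r = WaitForSingleObject(event, 2000);  // 2 s demo timeout.
  printf("waiter %lu: %s\n", GetCurrentThreadId(),
         r == WAIT_OBJECT_0 ? "woke up" : "timed out");
  return 0;
}

int main() {
  HANDLE event = CreateEvent(nullptr, /*bManualReset=*/FALSE,
                             /*bInitialState=*/FALSE, nullptr);
  HANDLE threads[2];
  for (HANDLE& t : threads)
    t = CreateThread(nullptr, 0, Waiter, event, 0, nullptr);
  Sleep(100);        // Let both threads enter their waits.
  SetEvent(event);   // Releases one waiter; the event auto-resets.
  WaitForMultipleObjects(2, threads, TRUE, INFINITE);
  // Expected output: one "woke up" line and one "timed out" line.
  return 0;
}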
This code would allow the GPU watchdog to check whether
the event was set at the time of the crash. That would
give us a clue as to whether the situation described above
is actually happening.
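
The check itself can be as small as a zero-timeout wait performed
right before the deliberate crash. A minimal sketch, assuming the
watchdog can reach the pump's event handle (the free function below
is hypothetical; the CL exposes this through the
MessagePumpWasSignalled method discussed in the review):

#include <windows.h>

// Hypothetical sketch of the diagnostic probe: a zero-timeout wait
// reports whether |event| is signaled, without blocking. Note that
// for an auto-reset event a successful wait also consumes the signal.
bool MessagePumpWasSignaled(HANDLE event) {
  return WaitForSingleObject(event, 0) == WAIT_OBJECT_0;
}

As the revert message at the end of this review explains, a probe of
this style turns out to always read non-signaled for an auto-reset
event that already has a waiter.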
Extra bonus: the investigation would also show whether
Renderer hangs in MessagePumpDefault::Run are caused by
the same issue.
BUG=620904
Committed: https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c
Cr-Commit-Position: refs/heads/master@{#400871}
Patch Set 1
Patch Set 2: Limit to OS_WIN (Total comments: 2)
Patch Set 3: Fixed spelling (Total comments: 1)
Messages
Total messages: 30 (14 generated)
stanisc@chromium.org changed reviewers: + brucedawson@chromium.org, dcheng@chromium.org, kbr@chromium.org
Please note that the intention is to revert this change in a few days once we get a few crash dumps back.

kbr@chromium.org: Please review changes in GPU watchdog
dcheng@chromium.org: Please review changes in base
brucedawson@chromium.org: Take a look if you have time
I'm not actually convinced that we should revert this. It seems like a valid bit of diagnostic information to collect immediately before crashing. Thoughts?

Code looks straightforward.

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
File base/message_loop/message_loop.h (right):

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
base/message_loop/message_loop.h:399: bool MessagePumpWasSignalled();
It looks like the US spelling is signaled (one 'l'), so maybe change to that? If so then change the CL description also, and the comments.
BTW, a VsChromium search of my repo says that 'signaled' (one 'l') is used more frequently: the double-'l' 'signall' spelling is just 1,236 out of 10,003 hits, so 12.3%.
lgtm from my standpoint. cc'ing jbauman for any additional comments.
jbauman@chromium.org changed reviewers: + jbauman@chromium.org
lgtm
kbr@, jbauman@, how do you feel about removing the mentions of reverting immediately?
On 2016/06/17 00:42:37, brucedawson wrote: > kbr@, jbauman@, how do you feel about removing the mentions of reverting > immediately? Fine by me to leave this in if you think it'll be useful, since the code's only executed when the GPU process is tearing itself down.
On 2016/06/17 00:59:04, Ken Russell wrote: > On 2016/06/17 00:42:37, brucedawson wrote: > > kbr@, jbauman@, how do you feel about removing the mentions of reverting > > immediately? > > Fine by me to leave this in if you think it'll be useful, since the code's only > executed when the GPU process is tearing itself down. Yeah, I don't have a problem with leaving this in indefinitely, though if the result is never interesting we might as well remove it at that point. So, it'd probably make sense to remove the mention of reverting immediately, though keeping the bug # in the comment makes sense.
Fixed spelling.

dcheng@, need your OWNER approval for changes in base.

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
File base/message_loop/message_loop.h (right):

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
base/message_loop/message_loop.h:399: bool MessagePumpWasSignalled();
On 2016/06/16 23:19:12, brucedawson wrote:
> It looks like the US spelling is signaled (one 'l'), so maybe change to that? If
> so then change the CL description also, and the comments.
Done.
lgtm, just a minor comment question.

https://codereview.chromium.org/2077613002/diff/40001/base/message_loop/messa...
File base/message_loop/message_loop.h (right):

https://codereview.chromium.org/2077613002/diff/40001/base/message_loop/messa...
base/message_loop/message_loop.h:394: // TODO (stanisc): crbug.com/596190: Remove this after the signaling issue
Is this comment about removal still needed? That is, should this still be a TODO comment? Bug number is still good. Applies to other comments also.
Sorry for not replying yet. I was at BlinkOn during the day and I'm sleepy / flying back to the US on Saturday. I'm happy to look on Monday, but if this is urgent, you may want to find another //base OWNER.
LGTM
The CQ bit was checked by stanisc@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from kbr@chromium.org, jbauman@chromium.org. Link to the patchset: https://codereview.chromium.org/2077613002/#ps40001 (title: "Fixed spelling")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2077613002/40001
Committed patchset #3 (id:40001)
Patchset 3 (id:??) landed as https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c Cr-Commit-Position: refs/heads/master@{#400871}
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.chromium.org/2396093003/ by stanisc@chromium.org.

The reason for reverting is: This check isn't needed anymore, and in practice it is negative in 100% of crash dumps. After some additional research I realized that that was a false negative and that this check doesn't work as expected with auto-reset events. I confirmed that an auto-reset event gets promptly reset back to non-signaled when it gets signaled, as long as there is at least one thread already waiting on it. That is the case even when the target thread is never scheduled to run. The check would work with a manual-reset event, but apparently it is useless in the case of an auto-reset event.
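
The false negative described above can be reproduced in isolation.
In the standalone C++/Win32 sketch below (demo code under standard
Win32 semantics, not the watchdog implementation), a thread is
already blocked on the auto-reset event when SetEvent is called; the
wait is satisfied atomically and the event resets at once, so a
zero-timeout probe right after SetEvent reports non-signaled:

#include <windows.h>
#include <stdio.h>

// With a thread already blocked on an auto-reset event, SetEvent
// atomically satisfies that wait and resets the event, so a probe
// right after SetEvent sees the event as non-signaled.
DWORD WINAPI Waiter(LPVOID param) {
  WaitForSingleObject(static_cast<HANDLE>(param), INFINITE);
  return 0;
}

int main() {
  HANDLE event = CreateEvent(nullptr, /*bManualReset=*/FALSE,
                             /*bInitialState=*/FALSE, nullptr);
  HANDLE thread = CreateThread(nullptr, 0, Waiter, event, 0, nullptr);
  Sleep(100);       // Ensure the waiter is blocked before signaling.
  SetEvent(event);  // Satisfies the pending wait; event resets at once.
  DWORD probe = WaitForSingleObject(event, 0);
  printf("probe: %s\n",  // Expected output: "probe: non-signaled".
         probe == WAIT_OBJECT_0 ? "signaled" : "non-signaled");
  WaitForSingleObject(thread, INFINITE);
  return 0;
}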
Patchset #4 (id:60001) has been deleted |