|
|
Description

Verify if GPU message pump is signaled when it hangs in WaitForWork
This is a diagnostic change. The code introduced in this
change runs only when the GPU process is about to
terminate with a deliberate crash.
We are getting a number of crashes triggered by a GPU hang
in MessagePumpForGpu::WaitForWork. There is already some
instrumentation that indicates that (the wait/wake pattern
involved is sketched after this list):
a) MessagePumpForGpu::WaitForWork is sitting in
MsgWaitForMultipleObjectsEx for longer than 15 seconds
b) MessagePumpForGpu::ScheduleWork is called after
WaitForWork enters the wait, sometimes several seconds
after, and SetEvent must be called (at least we grab the
timestamp right before calling the SetEvent method)
c) The event is set but it doesn't wake up the wait
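
For context, the wait/wake pattern referred to in a) through c)
looks roughly like the following minimal C++/Win32 sketch. This is
illustrative only, not the actual MessagePumpForGpu code; the class
and member names are hypothetical.

#include <windows.h>

// Minimal sketch of the wait/wake pattern described in a)-c).
// Hypothetical names; not the actual Chromium implementation.
class GpuPumpSketch {
 public:
  GpuPumpSketch() {
    // Auto-reset event: a successful wait atomically resets it.
    event_ = CreateEvent(nullptr, /*bManualReset=*/FALSE,
                         /*bInitialState=*/FALSE, nullptr);
  }

  void WaitForWork() {
    // a) The pump thread blocks here; the instrumentation shows it
    // stuck in this call for longer than 15 seconds.
    DWORD result = MsgWaitForMultipleObjectsEx(
        1, &event_, INFINITE, QS_ALLINPUT, MWMO_INPUTAVAILABLE);
    // ... dispatch work or input messages depending on |result| ...
  }

  void ScheduleWork() {
    // b) Another thread records a timestamp and signals the event;
    // c) yet the wait above apparently never wakes up.
    SetEvent(event_);
  }

 private:
  HANDLE event_;
};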
While it is possible for a thread that has been woken up
to not be immediately scheduled by the OS, it is hard to
imagine that going on for 15+ seconds. So the theory is
that the event handle might be recycled: some other code
might have closed its handle but kept using the stale
value. Since the event is auto-reset, when there are
multiple waiters only one of them would be woken up and
would reset the event, and the other one would
just continue waiting. So if that other code is somehow
still waiting on its closed and now recycled handle, that
would explain the hang.
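
The auto-reset behavior that this theory hinges on can be
demonstrated in isolation. Below is a standalone C++/Win32 sketch
(an illustration only, not part of this CL) showing that a single
SetEvent releases exactly one of two waiters on an auto-reset event:

#include <windows.h>
#include <stdio.h>

// Demonstrates the auto-reset property: one SetEvent wakes exactly
// one waiter; the other waiter keeps waiting on the reset event.
DWORD WINAPI Waiter(LPVOID param) {
  HANDLE event = static_cast<HANDLE>(param);
  DWORD r = WaitForSingleObject(event, 2000);  // 2 s demo timeout.
  printf("waiter %lu: %s\n", GetCurrentThreadId(),
         r == WAIT_OBJECT_0 ? "woke up" : "timed out");
  return 0;
}

int main() {
  HANDLE event = CreateEvent(nullptr, /*bManualReset=*/FALSE,
                             /*bInitialState=*/FALSE, nullptr);
  HANDLE threads[2];
  for (HANDLE& t : threads)
    t = CreateThread(nullptr, 0, Waiter, event, 0, nullptr);
  Sleep(100);        // Let both threads enter their waits.
  SetEvent(event);   // Releases one waiter; the event auto-resets.
  WaitForMultipleObjects(2, threads, TRUE, INFINITE);
  // Expected output: one "woke up" line and one "timed out" line.
  return 0;
}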
This code would allow the GPU watchdog to check whether
the event was set at the time of the crash. That would
give us a clue as to whether the situation described above
is actually happening.
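
The check itself can be as small as a zero-timeout wait performed
right before the deliberate crash. A minimal sketch, assuming the
watchdog can reach the pump's event handle (the free function below
is hypothetical; the CL exposes this through the
MessagePumpWasSignalled method discussed in the review):

#include <windows.h>

// Hypothetical sketch of the diagnostic probe: a zero-timeout wait
// reports whether |event| is signaled, without blocking. Note that
// for an auto-reset event a successful wait also consumes the signal.
bool MessagePumpWasSignaled(HANDLE event) {
  return WaitForSingleObject(event, 0) == WAIT_OBJECT_0;
}

As the revert message at the end of this review explains, a probe of
this style turns out to always read non-signaled for an auto-reset
event that already has a waiter.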
Extra bonus: the investigation would also show whether
Renderer hangs in MessagePumpDefault::Run are caused by
the same issue.
BUG=620904
Committed: https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c
Cr-Commit-Position: refs/heads/master@{#400871}
Patch Set 1
Patch Set 2: Limit to OS_WIN (Total comments: 2)
Patch Set 3: Fixed spelling (Total comments: 1)
Messages
Total messages: 30 (14 generated)
stanisc@chromium.org changed reviewers: + brucedawson@chromium.org, dcheng@chromium.org, kbr@chromium.org
Please note that the intention is to revert this change in a few days once we get a few crash dumps back.

kbr@chromium.org: Please review changes in GPU watchdog
dcheng@chromium.org: Please review changes in base
brucedawson@chromium.org: Take a look if you have time
I'm not actually convinced that we should revert this. It seems like a valid bit of diagnostic information to collect immediately before crashing. Thoughts?

Code looks straightforward.

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
File base/message_loop/message_loop.h (right):

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
base/message_loop/message_loop.h:399: bool MessagePumpWasSignalled();
It looks like the US spelling is signaled (one 'l'), so maybe change to that? If so then change the CL description also, and the comments.
BTW, a VsChromium search of my repo says that 'signaled' (one 'l') is used more frequently: the double-'l' 'signall' spelling is just 1,236 out of 10,003 hits, so 12.3%.
lgtm from my standpoint. cc'ing jbauman for any additional comments.
jbauman@chromium.org changed reviewers: + jbauman@chromium.org
lgtm
kbr@, jbauman@, how do you feel about removing the mentions of reverting immediately?
On 2016/06/17 00:42:37, brucedawson wrote: > kbr@, jbauman@, how do you feel about removing the mentions of reverting > immediately? Fine by me to leave this in if you think it'll be useful, since the code's only executed when the GPU process is tearing itself down.
On 2016/06/17 00:59:04, Ken Russell wrote: > On 2016/06/17 00:42:37, brucedawson wrote: > > kbr@, jbauman@, how do you feel about removing the mentions of reverting > > immediately? > > Fine by me to leave this in if you think it'll be useful, since the code's only > executed when the GPU process is tearing itself down. Yeah, I don't have a problem with leaving this in indefinitely, though if the result is never interesting we might as well remove it at that point. So, it'd probably make sense to remove the mention of reverting immediately, though keeping the bug # in the comment makes sense.
Fixed spelling.

dcheng@, need your OWNER approval for changes in base.

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
File base/message_loop/message_loop.h (right):

https://codereview.chromium.org/2077613002/diff/20001/base/message_loop/messa...
base/message_loop/message_loop.h:399: bool MessagePumpWasSignalled();
On 2016/06/16 23:19:12, brucedawson wrote:
> It looks like the US spelling is signaled (one 'l'), so maybe change to that? If
> so then change the CL description also, and the comments.
Done.
lgtm, just a minor comment question.

https://codereview.chromium.org/2077613002/diff/40001/base/message_loop/messa...
File base/message_loop/message_loop.h (right):

https://codereview.chromium.org/2077613002/diff/40001/base/message_loop/messa...
base/message_loop/message_loop.h:394: // TODO (stanisc): crbug.com/596190: Remove this after the signaling issue
Is this comment about removal still needed? That is, should this still be a TODO comment? Bug number is still good. Applies to other comments also.
Sorry for not replying yet. I was at BlinkOn during the day and I'm sleepy / flying back to the US on Saturday. I'm happy to look on Monday, but if this is urgent, you may want to find another //base OWNER.
LGTM
The CQ bit was checked by stanisc@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from kbr@chromium.org, jbauman@chromium.org. Link to the patchset: https://codereview.chromium.org/2077613002/#ps40001 (title: "Fixed spelling")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2077613002/40001
Committed patchset #3 (id:40001)
Patchset 3 (id:??) landed as https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c Cr-Commit-Position: refs/heads/master@{#400871}
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.chromium.org/2396093003/ by stanisc@chromium.org.

The reason for reverting is: This check isn't needed anymore, and in practice it is negative in 100% of crash dumps. After some additional research I realized that that was a false negative and that this check doesn't work as expected with auto-reset events. I confirmed that an auto-reset event gets promptly reset back to non-signaled when it gets signaled, as long as there is at least one thread already waiting on it. That is the case even when the target thread is never scheduled to run. The check would work with a manual-reset event, but apparently it is useless in the case of an auto-reset event.
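
The false negative described above can be reproduced in isolation.
In the standalone C++/Win32 sketch below (demo code under standard
Win32 semantics, not the watchdog implementation), a thread is
already blocked on the auto-reset event when SetEvent is called; the
wait is satisfied atomically and the event resets at once, so a
zero-timeout probe right after SetEvent reports non-signaled:

#include <windows.h>
#include <stdio.h>

// With a thread already blocked on an auto-reset event, SetEvent
// atomically satisfies that wait and resets the event, so a probe
// right after SetEvent sees the event as non-signaled.
DWORD WINAPI Waiter(LPVOID param) {
  WaitForSingleObject(static_cast<HANDLE>(param), INFINITE);
  return 0;
}

int main() {
  HANDLE event = CreateEvent(nullptr, /*bManualReset=*/FALSE,
                             /*bInitialState=*/FALSE, nullptr);
  HANDLE thread = CreateThread(nullptr, 0, Waiter, event, 0, nullptr);
  Sleep(100);       // Ensure the waiter is blocked before signaling.
  SetEvent(event);  // Satisfies the pending wait; event resets at once.
  DWORD probe = WaitForSingleObject(event, 0);
  printf("probe: %s\n",  // Expected output: "probe: non-signaled".
         probe == WAIT_OBJECT_0 ? "signaled" : "non-signaled");
  WaitForSingleObject(thread, INFINITE);
  return 0;
}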
Patchset #4 (id:60001) has been deleted |