|
|
DescriptionAbort if BrowserMainLoops fails to start a browser thread
BUG=390853
Committed: https://crrev.com/1762c8f9f6dd980da99bea1c3ab5f04201ab4884
Cr-Commit-Position: refs/heads/master@{#295501}
Patch Set 1 #
Total comments: 4
Patch Set 2 : add a CHECK in UserInputMonitorLinux #Patch Set 3 : LOG(FATAL) in BrowserMainLoop #Messages
Total messages: 34 (10 generated)
jiayl@chromium.org changed reviewers: + darin@chromium.org
darin@, Could you take a look?
jiayl@chromium.org changed reviewers: - darin@chromium.org
jiayl@chromium.org changed reviewers: + creis@chromium.org
Charlie, Could you take a look?
creis@chromium.org changed reviewers: + jam@chromium.org
[+jam] This seems like it would just paper over a bigger problem. What would happen if the UserInputMonitor wasn't created in that case? John, do you know how one of those threads might be null in this part of startup? It appears we're crashing on that line.
On 2014/09/10 17:34:23, Charlie Reis wrote: > [+jam] > > This seems like it would just paper over a bigger problem. What would happen if > the UserInputMonitor wasn't created in that case? > > John, do you know how one of those threads might be null in this part of > startup? It appears we're crashing on that line. The places using the UserInputMonitor already does NULL check: https://code.google.com/p/chromium/codesearch#chromium/src/media/audio/audio_... So I think we are fine even if the code gets executed.
On 2014/09/10 19:42:30, jiayl wrote: > On 2014/09/10 17:34:23, Charlie Reis wrote: > > [+jam] > > > > This seems like it would just paper over a bigger problem. What would happen > if > > the UserInputMonitor wasn't created in that case? > > > > John, do you know how one of those threads might be null in this part of > > startup? It appears we're crashing on that line. > > The places using the UserInputMonitor already does NULL check: > https://code.google.com/p/chromium/codesearch#chromium/src/media/audio/audio_... > > So I think we are fine even if the code gets executed. That only makes sure we don't crash, though. There still won't be a UserInputMonitor in cases we might have expected it in the past, causing the feature to regress. I don't know the UserInputMonitor code well, so I'll defer if someone else takes a look. It seems like we should understand whether it's expected for those threads to be null at this place in the code or not, if that's the cause of the crash.
On 2014/09/10 23:24:31, Charlie Reis wrote: > On 2014/09/10 19:42:30, jiayl wrote: > > On 2014/09/10 17:34:23, Charlie Reis wrote: > > > [+jam] > > > > > > This seems like it would just paper over a bigger problem. What would > happen > > if > > > the UserInputMonitor wasn't created in that case? > > > > > > John, do you know how one of those threads might be null in this part of > > > startup? It appears we're crashing on that line. > > > > The places using the UserInputMonitor already does NULL check: > > > https://code.google.com/p/chromium/codesearch#chromium/src/media/audio/audio_... > > > > So I think we are fine even if the code gets executed. > > That only makes sure we don't crash, though. There still won't be a > UserInputMonitor in cases we might have expected it in the past, causing the > feature to regress. > > I don't know the UserInputMonitor code well, so I'll defer if someone else takes > a look. It seems like we should understand whether it's expected for those > threads to be null at this place in the code or not, if that's the cause of the > crash. It seems to me that IO/UI message loops can be NULL only when shutdown has started, so there should be no feature regression because it's going to exit soon and will not run any feature code.
On 2014/09/10 23:48:03, jiayl wrote: > On 2014/09/10 23:24:31, Charlie Reis wrote: > > On 2014/09/10 19:42:30, jiayl wrote: > > > On 2014/09/10 17:34:23, Charlie Reis wrote: > > > > [+jam] > > > > > > > > This seems like it would just paper over a bigger problem. What would > > happen > > > if > > > > the UserInputMonitor wasn't created in that case? > > > > > > > > John, do you know how one of those threads might be null in this part of > > > > startup? It appears we're crashing on that line. > > > > > > The places using the UserInputMonitor already does NULL check: > > > > > > https://code.google.com/p/chromium/codesearch#chromium/src/media/audio/audio_... > > > > > > So I think we are fine even if the code gets executed. > > > > That only makes sure we don't crash, though. There still won't be a > > UserInputMonitor in cases we might have expected it in the past, causing the > > feature to regress. > > > > I don't know the UserInputMonitor code well, so I'll defer if someone else > takes > > a look. It seems like we should understand whether it's expected for those > > threads to be null at this place in the code or not, if that's the cause of > the > > crash. > > It seems to me that IO/UI message loops can be NULL only when shutdown has > started, so there should be no feature regression because it's going to exit > soon and will not run any feature code. That's sounds plausible, but it's hard to confirm that theory if we just put in a null check. I've suggested an alternative below that I'd be happy with. https://codereview.chromium.org/549303003/diff/1/content/browser/browser_main... File content/browser/browser_main_loop.cc (right): https://codereview.chromium.org/549303003/diff/1/content/browser/browser_main... content/browser/browser_main_loop.cc:1049: if (io_thread_->message_loop_proxy().get() != NULL && If we want to confirm that the cause of this crash is early shutdown, then perhaps a better way is to check Thread::IsRunning() here instead of doing a null check on message_loop_proxy. Then we'll still notice (via crash reports) if some other unknown issue is present. https://codereview.chromium.org/549303003/diff/1/content/browser/browser_main... content/browser/browser_main_loop.cc:1052: io_thread_->message_loop_proxy(), main_thread_->message_loop_proxy()); Can this use task_runner() instead of message_loop_proxy()? I noticed that the latter is deprecated according to thread.h, and the Create call already has the right contract for it.
https://codereview.chromium.org/549303003/diff/1/content/browser/browser_main... File content/browser/browser_main_loop.cc (right): https://codereview.chromium.org/549303003/diff/1/content/browser/browser_main... content/browser/browser_main_loop.cc:1049: if (io_thread_->message_loop_proxy().get() != NULL && On 2014/09/11 23:10:30, Charlie Reis wrote: > If we want to confirm that the cause of this crash is early shutdown, then > perhaps a better way is to check Thread::IsRunning() here instead of doing a > null check on message_loop_proxy. Then we'll still notice (via crash reports) > if some other unknown issue is present. I tested it with your suggestion, but IsRunning() is false even during normal startup when the TaskRunner is not NULL. https://codereview.chromium.org/549303003/diff/1/content/browser/browser_main... content/browser/browser_main_loop.cc:1052: io_thread_->message_loop_proxy(), main_thread_->message_loop_proxy()); On 2014/09/11 23:10:30, Charlie Reis wrote: > Can this use task_runner() instead of message_loop_proxy()? I noticed that the > latter is deprecated according to thread.h, and the Create call already has the > right contract for it. Will do.
the check is useless. check how BrowserThreadsStarted is called. the IO thread must have been created beforehand in CreateThreads.
On 2014/09/12 04:21:47, jam wrote: > the check is useless. check how BrowserThreadsStarted is called. the IO thread > must have been created beforehand in CreateThreads. What if BrowserMainLoop::ShutdownThreadsAndCleanUp is called before the startup is done? This check (https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/br...) indicates that it may happen.
On 2014/09/12 15:56:36, jiayl wrote: > On 2014/09/12 04:21:47, jam wrote: > > the check is useless. check how BrowserThreadsStarted is called. the IO thread > > must have been created beforehand in CreateThreads. > > What if BrowserMainLoop::ShutdownThreadsAndCleanUp is called before the startup > is done? This check > (https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/br...) > indicates that it may happen. Ping jam.
On 2014/09/15 23:18:00, jiayl wrote: > On 2014/09/12 15:56:36, jiayl wrote: > > On 2014/09/12 04:21:47, jam wrote: > > > the check is useless. check how BrowserThreadsStarted is called. the IO > thread > > > must have been created beforehand in CreateThreads. > > > > What if BrowserMainLoop::ShutdownThreadsAndCleanUp is called before the > startup > > is done? This check > > > (https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/br...) > > indicates that it may happen. > > Ping jam. sorry for the delay (overloaded with CY) that line is only for android, which runs this code asynchronously. on non-android, the code is run synchronously
jiayl@chromium.org changed reviewers: + vrk@chromium.org - creis@chromium.org, jam@chromium.org
jiayl@chromium.org changed reviewers: + tommi@chromium.org - vrk@chromium.org
tommi Please take a look. I think it may happen in the rare case that the IO thread exits by the time UserInputMonitor is created on the main thread.
Is this to track down an issue on Android or only on Linux? From what I can tell, on linux, the tasks are lined up and executed directly thereafter from BrowserMainLoop::CreateStartupTasks(), so that doesn't seem like it would be possible. BrowserThreadsStarted should always run immediately after creating the threads and I don't see how the io_thread_ variable could ever be set to NULL... have you figured out how that could happen?
On 2014/09/17 21:10:01, tommi wrote: > Is this to track down an issue on Android or only on Linux? > From what I can tell, on linux, the tasks are lined up and executed directly > thereafter from BrowserMainLoop::CreateStartupTasks(), so that doesn't seem like > it would be possible. > BrowserThreadsStarted should always run immediately after creating the threads > and I don't see how the io_thread_ variable could ever be set to NULL... have > you figured out how that could happen? OK, I take my last comment back: there seems no way to exit the IO thread while starting up. But Thread::StartWithOptions may fail and BrowserMainLoop::CreateThreads does not check the result: https://code.google.com/p/chromium/codesearch#chromium/src/base/threading/thr... I think we should check the result and do something, maybe CHECK in CreateThreads. Wdyt?
What about listening to the WillDestroyCurrentMessageLoop event for the IO thread/loop and clear the pointer if it hits before destructing the input monitor?
On 2014/09/17 22:09:25, tommi wrote: > What about listening to the WillDestroyCurrentMessageLoop event for the IO > thread/loop and clear the pointer if it hits before destructing the input > monitor? But the TaskRunner/MessageLoopProxy passed to UserInputMonitor is NULL. Also it's not affecting only UserInputMonitor, but everything depending on IO message loop. The browser is running in a really bad state if StartWithOptions failed.
OK, lgtm. If the pointer is NULL then that seems like an important thing to understand. It sounds unlikely to me that the IO thread can't be created for some reason but I guess we'll see what the process state is when we start to get crash dumps.
jiayl@chromium.org changed reviewers: + jam@chromium.org
jiayl@chromium.org changed reviewers: - jam@chromium.org
jiayl@chromium.org changed reviewers: + creis@chromium.org, jam@chromium.org
Adding creis@ for owner's review, since I'm not sure if jam@ is available to take a look.
LGTM
The CQ bit was checked by jiayl@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/549303003/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as 8eeef2aed447daeeda1114706450dadb4b06c36f
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/1762c8f9f6dd980da99bea1c3ab5f04201ab4884 Cr-Commit-Position: refs/heads/master@{#295501} |