|
|
Created:
3 years, 5 months ago by hugoh_UTC2 Modified:
3 years, 5 months ago CC:
chromium-reviews, jam, darin-cc_chromium.org, piman+watch_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionAvoid race at shutdown between browser process' IO thread and GPU thread
Background:
- When the GPU thread is *in-process* it runs inside the browser process.
- GpuChildThread is a ChildThreadImpl and a IPC::Listener.
- When GpuChildThread sees an error (or shutdown) of its IPC channel,
it quits its message loop (= stops the thread).
Problem:
The browser process' IOThread calls InProcessGpuThread::StopSoon()
while the in-process GPU thread destructs itself.
When InProcessGpuThread::StopSoon asks for task_runner() the in-process
GPU thread has - sometimes - already destructed itself and its
MessageQueue. The IOThread then sees task_runner() == null and crashes:
[FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr.
0 base::debug::StackTrace::StackTrace()
1 base::debug::StackTrace::StackTrace()
2 logging::LogMessage::~LogMessage()
3 base::Thread::StopSoon()
4 base::Thread::Stop()
5 content::InProcessGpuThread::~InProcessGpuThread()
6 content::InProcessGpuThread::~InProcessGpuThread()
7 content::GpuProcessHost::~GpuProcessHost()
8 content::GpuProcessHost::~GpuProcessHost()
9 content::BrowserChildProcessHostImpl::TerminateAll()
10 content::BrowserProcessSubThread::IOThreadPreCleanUp()
11 content::BrowserProcessSubThread::CleanUp()
12 base::Thread::ThreadMain()
13 base::(anonymous namespace)::ThreadFunc()
Solution:
If a thread runs in the browser process, only Thread::Stop should
stop its message loop. Otherwise, QuitWhenIdle could race Thread::Stop.
BUG=258935
Review-Url: https://codereview.chromium.org/2973723003
Cr-Commit-Position: refs/heads/master@{#485895}
Committed: https://chromium.googlesource.com/chromium/src/+/f1c501fa3022cc7de41048cce085a3e0ade8d225
Patch Set 1 : Change destruction order #Patch Set 2 : Move CHECK to parent class #Patch Set 3 : Only let Thread::Stop stop child threads inside the browser process #Messages
Total messages: 34 (16 generated)
hugoh@opera.com changed reviewers: + jam@chromium.org, piman@chromium.org
PTAL
Description was changed from ========== Avoid race at shutdown between browser's IO thread and in-process GPU thread Background: When the GPU thread is in-process it runs inside the browser process. Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. This means that for some timings, when InProcessGpuThread::StopSoon(), running on the IOThread, asks for task_runner() the GPU thread has already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crash: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: Destruct InProcessGpuThread before its host process, i.e. before its BrowserChildProcessHostImpl. This ensures that the IOThread still can reach a non-null task_runner() when it runs InProcessGpuThread::StopSoon(). BUG=258935 ========== to ========== Avoid race at shutdown between browser's IO thread and in-process GPU thread Background: When the GPU thread is in-process it runs inside the browser process. Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. This means that for some timings, when InProcessGpuThread::StopSoon(), running on the IOThread, asks for task_runner() the GPU thread has already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crashes: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: Destruct InProcessGpuThread before its host process, i.e. before its BrowserChildProcessHostImpl. This ensures that the IOThread still can reach a non-null task_runner() when it runs InProcessGpuThread::StopSoon(). BUG=258935 ==========
I don't think I understand the race. What causes the in-process GPU thread to "destroy itself"? This sounds like a double-free, since the GPU thread should be destroyed by GpuProcessHost
On 2017/07/06 20:11:04, piman wrote: > I don't think I understand the race. What causes the in-process GPU thread to > "destroy itself"? This sounds like a double-free, since the GPU thread should be > destroyed by GpuProcessHost My understanding is that BrowserChildProcessHostImpl::~BrowserChildProcessHostImpl() somehow causes the native GPU thread to exit and destruct itself. This exit happens just before InProcessGpuThread::StopSoon() tries to grab the task_runner(). The fact the this happens with low probability made me think of it as a race. So we have a race that leads to a double-free? Here are my debug prints (without patch): [5119:5234:0706/225225.185535:3842216599716:ERROR:browser_child_process_host_impl.cc(207)] HUGO BrowserChildProcessHostImpl deletes: 0x2495270dfc20 [5119:5234:0706/225225.186215:3842216600396:ERROR:in_process_gpu_thread.cc(52)] HUGO InProcessGpuThread::~InProcessGpuThread() [5119:5234:0706/225225.186241:3842216600422:ERROR:in_process_gpu_thread.cc(123)] HUGO InProcessGpuThread::Stop() PlatformThread exits A message_loop_: 0x249526f64020 [5119:5234:0706/225225.186251:3842216600442:ERROR:in_process_gpu_thread.cc(177)] HUGO A InProcessGpuThread::StopSoon() short sleep before task_runner()->PostTask. message_loop_: 0x249526f64020 run_loop: 0x7fba82dcedc0 [5119:5251:0706/225225.186337:3842216600518:ERROR:message_loop.cc(227)] HUGO MessageLoop::QuitWhenIdle() this:0x249526f64020 run_loop_ : 0x7fba82dcedc0 [5119:5251:0706/225225.186371:3842216600552:ERROR:run_loop.cc(134)] HUGO RunLoop::QuitWhenIdle 0x7fba82dcedc0 [5119:5251:0706/225225.186403:3842216600584:ERROR:child_process.cc(81)] HUGO ChildProcess::~ChildProcess() [5119:5251:0706/225225.186442:3842216600623:ERROR:gpu_child_thread.cc(188)] HUGO GpuChildThread::~GpuChildThread() [5119:5234:0706/225225.236074:3842216650257:ERROR:in_process_gpu_thread.cc(179)] HUGO B InProcessGpuThread::StopSoon() task_runner()->PostTask, message_loop_: 0 run_loop: 0 BANG! When I change the destruction order, task_runner() is always non-null and I see that ~GpuChildThread() runs while InProcessGpuThread::Stop()'s Join() (IOThread) waits for the GPU thread's exit.
Oh that wasn't very readable. Reformat: [5119:5234 browser_child_process_host_impl.cc(207)] BrowserChildProcessHostImpl deletes: 0x2495270dfc20 [5119:5234 in_process_gpu_thread.cc(52)] InProcessGpuThread::~InProcessGpuThread() [5119:5234 in_process_gpu_thread.cc(123)] InProcessGpuThread::Stop() [5119:5234 in_process_gpu_thread.cc(177)] InProcessGpuThread::StopSoon() sleeps _before_ task_runner()->PostTask. [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() [5119:5251 run_loop.cc(134)] RunLoop::QuitWhenIdle 0x7fba82dcedc0 [5119:5251 child_process.cc(81)] ChildProcess::~ChildProcess() [5119:5251 gpu_child_thread.cc(188)] GpuChildThread::~GpuChildThread() [5119:5234 in_process_gpu_thread.cc(179)] InProcessGpuThread::StopSoon() does task_runner()->PostTask, message_loop_: 0 run_loop: 0 -> BANG!
On Thu, Jul 6, 2017 at 2:29 PM, <hugoh@opera.com> wrote: > Oh that wasn't very readable. Reformat: > > [5119:5234 browser_child_process_host_impl.cc(207)] > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > [5119:5234 in_process_gpu_thread.cc(52)] > InProcessGpuThread::~InProcessGpuThread() > > [5119:5234 in_process_gpu_thread.cc(123)] > InProcessGpuThread::Stop() > > [5119:5234 in_process_gpu_thread.cc(177)] > InProcessGpuThread::StopSoon() sleeps _before_ task_runner()->PostTask. > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > Do you know what calls this MessageLoop::QuitWhenIdle()? I think this is what leads to a race in base::Thread, which I think assumes it only exits the MessageLoop from StopSoon. > [5119:5251 run_loop.cc(134)] RunLoop::QuitWhenIdle 0x7fba82dcedc0 > [5119:5251 child_process.cc(81)] ChildProcess::~ChildProcess() > [5119:5251 gpu_child_thread.cc(188)] GpuChildThread::~GpuChildThread() > > [5119:5234 in_process_gpu_thread.cc(179)] > InProcessGpuThread::StopSoon() does task_runner()->PostTask, > message_loop_: 0 run_loop: 0 -> BANG! > > https://codereview.chromium.org/2973723003/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/07/06 22:34:29, chromium-reviews wrote: > On Thu, Jul 6, 2017 at 2:29 PM, <mailto:hugoh@opera.com> wrote: > > > Oh that wasn't very readable. Reformat: > > > > [5119:5234 browser_child_process_host_impl.cc(207)] > > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > > > [5119:5234 in_process_gpu_thread.cc(52)] > > InProcessGpuThread::~InProcessGpuThread() > > > > [5119:5234 in_process_gpu_thread.cc(123)] > > InProcessGpuThread::Stop() > > > > [5119:5234 in_process_gpu_thread.cc(177)] > > InProcessGpuThread::StopSoon() sleeps _before_ task_runner()->PostTask. > > > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > > > > Do you know what calls this MessageLoop::QuitWhenIdle()? I think this is > what leads to a race in base::Thread, which I think assumes it only exits > the MessageLoop from StopSoon. > Yes, I think so too! Some debugging showed me that ChildThreadImpl::OnChannelError() calls it: [25635:25658:in_process_gpu_thread.cc(182)] InProcessGpuThread::Stop() pausing 5 seconds message_loop_: 0x28224c01a000 [25635:25665:message_loop.cc(432)] MessageLoop::PostTask posted from: OnChannelError@../../ipc/ipc_channel_proxy.cc:160 * [25635:25665:child_thread_impl.cc(622)] ChildThreadImpl::OnChannelError [25635:25665:message_loop.cc(235)] MessageLoop::QuitWhenIdle() this: 0x28224c01a000 #0 0x0000016c9907 base::debug::StackTrace::StackTrace() #1 0x0000016e46e9 base::MessageLoop::QuitWhenIdle() #2 0x00000260f4a5 content::ChildThreadImpl::OnChannelError() #3 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() #4 0x0000016e50d3 base::MessageLoop::RunTask() #5 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() #6 0x0000016e5874 base::MessageLoop::DoWork() #7 0x0000016e78ea base::MessagePumpDefault::Run() #8 0x000001701e80 base::RunLoop::Run() #9 0x000001724877 base::Thread::ThreadMain() #10 0x00000171f093 base::(anonymous namespace)::ThreadFunc() #11 0x7f6c93cfb184 start_thread #12 0x7f6c924ebbed clone [25635:25665:child_process.cc(81)] ChildProcess::~ChildProcess() [25635:25665:gpu_child_thread.cc(195)] GpuChildThread::~GpuChildThread() [25635:25658:in_process_gpu_thread.cc(220)] InProcessGpuThread::StopSoon() message_loop_ = 0 * Why do we get an error on the IPC channel that GpuChildThread listens to...? #0 0x0000016c9907 base::debug::StackTrace::StackTrace() #1 0x00000189f80b IPC::ChannelProxy::Context::OnChannelError() #2 0x0000018a219b IPC::internal::MessagePipeReader::OnPipeError() #3 0x00000188bc2c mojo::InterfaceEndpointClient::NotifyError() #4 0x0000018a705d IPC::(anonymous namespace)::ChannelAssociatedGroupController::NotifyEndpointOfError() #5 0x0000018a5f22 IPC::(anonymous namespace)::ChannelAssociatedGroupController::OnPeerAssociatedEndpointClosed() #6 0x0000018a602a IPC::(anonymous namespace)::ChannelAssociatedGroupController::OnPeerAssociatedEndpointClosed() #7 0x0000018927d1 mojo::PipeControlMessageHandler::RunOrClosePipe() #8 0x000001889e86 mojo::Connector::ReadSingleMessage() #9 0x00000188a512 mojo::Connector::ReadAllAvailableMessages() #10 0x000001899674 mojo::SimpleWatcher::OnHandleReady() #11 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() #12 0x0000016e50d3 base::MessageLoop::RunTask() #13 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() #14 0x0000016e5874 base::MessageLoop::DoWork() #15 0x0000016e8439 base::MessagePumpLibevent::Run() #16 0x000001701e80 base::RunLoop::Run() #17 0x000001724877 base::Thread::ThreadMain() #18 0x00000171f093 base::(anonymous namespace)::ThreadFunc() #19 0x7f4bf1fb3184 start_thread #20 0x7f4bf07a3bed clone ... I think because OnPeerAssociatedEndpointClosed() has: if (!endpoint->peer_closed()) { if (endpoint->client()) NotifyEndpointOfError(endpoint.get(), false /* force_async */); Here the connection still has a listening client (presumably because ~InProcessGpuThread hasn't run); so an error is sent to GpuChildThread. When ChildThreadImpl::OnChannelError gets the error it shuts down the message loop. https://chromium.googlesource.com/chromium/src/base/+/master/threading/thread... has an early return when !message_loop_ = null. But this protection is not enough. In this bug, GpuChildThread "wins" the race and sets message_loop_ it to null just after that check.
On Fri, Jul 7, 2017 at 6:50 AM, <hugoh@opera.com> wrote: > On 2017/07/06 22:34:29, chromium-reviews wrote: > > On Thu, Jul 6, 2017 at 2:29 PM, <mailto:hugoh@opera.com> wrote: > > > > > Oh that wasn't very readable. Reformat: > > > > > > [5119:5234 browser_child_process_host_impl.cc(207)] > > > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > > > > > [5119:5234 in_process_gpu_thread.cc(52)] > > > InProcessGpuThread::~InProcessGpuThread() > > > > > > [5119:5234 in_process_gpu_thread.cc(123)] > > > InProcessGpuThread::Stop() > > > > > > [5119:5234 in_process_gpu_thread.cc(177)] > > > InProcessGpuThread::StopSoon() sleeps _before_ task_runner()->PostTask. > > > > > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > > > > > > > Do you know what calls this MessageLoop::QuitWhenIdle()? I think this is > > what leads to a race in base::Thread, which I think assumes it only exits > > the MessageLoop from StopSoon. > > > > Yes, I think so too! > > Some debugging showed me that ChildThreadImpl::OnChannelError() calls it: > > [25635:25658:in_process_gpu_thread.cc(182)] > InProcessGpuThread::Stop() pausing 5 seconds > message_loop_: 0x28224c01a000 > > [25635:25665:message_loop.cc(432)] > MessageLoop::PostTask posted from: > OnChannelError@../../ipc/ipc_channel_proxy.cc:160 * > > [25635:25665:child_thread_impl.cc(622)] > ChildThreadImpl::OnChannelError > > [25635:25665:message_loop.cc(235)] > MessageLoop::QuitWhenIdle() this: 0x28224c01a000 > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > #1 0x0000016e46e9 base::MessageLoop::QuitWhenIdle() > #2 0x00000260f4a5 content::ChildThreadImpl::OnChannelError() > #3 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > #4 0x0000016e50d3 base::MessageLoop::RunTask() > #5 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > #6 0x0000016e5874 base::MessageLoop::DoWork() > #7 0x0000016e78ea base::MessagePumpDefault::Run() > #8 0x000001701e80 base::RunLoop::Run() > #9 0x000001724877 base::Thread::ThreadMain() > #10 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > #11 0x7f6c93cfb184 start_thread > #12 0x7f6c924ebbed clone > > [25635:25665:child_process.cc(81)] > ChildProcess::~ChildProcess() > > [25635:25665:gpu_child_thread.cc(195)] > GpuChildThread::~GpuChildThread() > > [25635:25658:in_process_gpu_thread.cc(220)] > InProcessGpuThread::StopSoon() message_loop_ = 0 > > > * Why do we get an error on the IPC channel that GpuChildThread listens > to...? > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > #1 0x00000189f80b IPC::ChannelProxy::Context::OnChannelError() > #2 0x0000018a219b IPC::internal::MessagePipeReader::OnPipeError() > #3 0x00000188bc2c mojo::InterfaceEndpointClient::NotifyError() > #4 0x0000018a705d IPC::(anonymous > namespace)::ChannelAssociatedGroupController::NotifyEndpointOfError() > #5 0x0000018a5f22 IPC::(anonymous > namespace)::ChannelAssociatedGroupController:: > OnPeerAssociatedEndpointClosed() > #6 0x0000018a602a IPC::(anonymous > namespace)::ChannelAssociatedGroupController:: > OnPeerAssociatedEndpointClosed() > #7 0x0000018927d1 mojo::PipeControlMessageHandler::RunOrClosePipe() > #8 0x000001889e86 mojo::Connector::ReadSingleMessage() > #9 0x00000188a512 mojo::Connector::ReadAllAvailableMessages() > #10 0x000001899674 mojo::SimpleWatcher::OnHandleReady() > #11 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > #12 0x0000016e50d3 base::MessageLoop::RunTask() > #13 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > #14 0x0000016e5874 base::MessageLoop::DoWork() > #15 0x0000016e8439 base::MessagePumpLibevent::Run() > #16 0x000001701e80 base::RunLoop::Run() > #17 0x000001724877 base::Thread::ThreadMain() > #18 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > #19 0x7f4bf1fb3184 start_thread > #20 0x7f4bf07a3bed clone > > ... I think because OnPeerAssociatedEndpointClosed() has: > > if (!endpoint->peer_closed()) { > if (endpoint->client()) > NotifyEndpointOfError(endpoint.get(), false /* force_async */); > > Here the connection still has a listening client (presumably because > ~InProcessGpuThread hasn't run); so an error is sent to GpuChildThread. > When > ChildThreadImpl::OnChannelError gets the error it shuts down the message > loop. > > https://chromium.googlesource.com/chromium/src/base/+/ > master/threading/thread.cc#198 > has an early return when !message_loop_ = null. But this protection is not > enough. In this bug, GpuChildThread "wins" the race and sets message_loop_ > it to > null just after that check. > Ok, thanks a lot. OnChannelError makes sense and is desirable. Also, quitting the message loop is necessary when out-of-process (e.g. if the browser crashes, we want child processes to terminate). But the problem is that when the child thread is in-process, this causes a race. I think there's 2 options: 1- fix the data race in base::Thread (e.g. adding an atomic flag) 2- make ChildThreadImpl::OnChannelError not quit the message loop when in-process (ChildThreadImpl::IsInBrowserProcess is true). > > https://codereview.chromium.org/2973723003/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/07/07 18:14:52, chromium-reviews wrote: > On Fri, Jul 7, 2017 at 6:50 AM, <mailto:hugoh@opera.com> wrote: > > > On 2017/07/06 22:34:29, chromium-reviews wrote: > > > On Thu, Jul 6, 2017 at 2:29 PM, <mailto:hugoh@opera.com> wrote: > > > > > > > Oh that wasn't very readable. Reformat: > > > > > > > > [5119:5234 browser_child_process_host_impl.cc(207)] > > > > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > > > > > > > [5119:5234 in_process_gpu_thread.cc(52)] > > > > InProcessGpuThread::~InProcessGpuThread() > > > > > > > > [5119:5234 in_process_gpu_thread.cc(123)] > > > > InProcessGpuThread::Stop() > > > > > > > > [5119:5234 in_process_gpu_thread.cc(177)] > > > > InProcessGpuThread::StopSoon() sleeps _before_ task_runner()->PostTask. > > > > > > > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > > > > > > > > > > Do you know what calls this MessageLoop::QuitWhenIdle()? I think this is > > > what leads to a race in base::Thread, which I think assumes it only exits > > > the MessageLoop from StopSoon. > > > > > > > Yes, I think so too! > > > > Some debugging showed me that ChildThreadImpl::OnChannelError() calls it: > > > > [25635:25658:in_process_gpu_thread.cc(182)] > > InProcessGpuThread::Stop() pausing 5 seconds > > message_loop_: 0x28224c01a000 > > > > [25635:25665:message_loop.cc(432)] > > MessageLoop::PostTask posted from: > > OnChannelError@../../ipc/ipc_channel_proxy.cc:160 * > > > > [25635:25665:child_thread_impl.cc(622)] > > ChildThreadImpl::OnChannelError > > > > [25635:25665:message_loop.cc(235)] > > MessageLoop::QuitWhenIdle() this: 0x28224c01a000 > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > #1 0x0000016e46e9 base::MessageLoop::QuitWhenIdle() > > #2 0x00000260f4a5 content::ChildThreadImpl::OnChannelError() > > #3 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > #4 0x0000016e50d3 base::MessageLoop::RunTask() > > #5 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > #6 0x0000016e5874 base::MessageLoop::DoWork() > > #7 0x0000016e78ea base::MessagePumpDefault::Run() > > #8 0x000001701e80 base::RunLoop::Run() > > #9 0x000001724877 base::Thread::ThreadMain() > > #10 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > #11 0x7f6c93cfb184 start_thread > > #12 0x7f6c924ebbed clone > > > > [25635:25665:child_process.cc(81)] > > ChildProcess::~ChildProcess() > > > > [25635:25665:gpu_child_thread.cc(195)] > > GpuChildThread::~GpuChildThread() > > > > [25635:25658:in_process_gpu_thread.cc(220)] > > InProcessGpuThread::StopSoon() message_loop_ = 0 > > > > > > * Why do we get an error on the IPC channel that GpuChildThread listens > > to...? > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > #1 0x00000189f80b IPC::ChannelProxy::Context::OnChannelError() > > #2 0x0000018a219b IPC::internal::MessagePipeReader::OnPipeError() > > #3 0x00000188bc2c mojo::InterfaceEndpointClient::NotifyError() > > #4 0x0000018a705d IPC::(anonymous > > namespace)::ChannelAssociatedGroupController::NotifyEndpointOfError() > > #5 0x0000018a5f22 IPC::(anonymous > > namespace)::ChannelAssociatedGroupController:: > > OnPeerAssociatedEndpointClosed() > > #6 0x0000018a602a IPC::(anonymous > > namespace)::ChannelAssociatedGroupController:: > > OnPeerAssociatedEndpointClosed() > > #7 0x0000018927d1 mojo::PipeControlMessageHandler::RunOrClosePipe() > > #8 0x000001889e86 mojo::Connector::ReadSingleMessage() > > #9 0x00000188a512 mojo::Connector::ReadAllAvailableMessages() > > #10 0x000001899674 mojo::SimpleWatcher::OnHandleReady() > > #11 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > #12 0x0000016e50d3 base::MessageLoop::RunTask() > > #13 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > #14 0x0000016e5874 base::MessageLoop::DoWork() > > #15 0x0000016e8439 base::MessagePumpLibevent::Run() > > #16 0x000001701e80 base::RunLoop::Run() > > #17 0x000001724877 base::Thread::ThreadMain() > > #18 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > #19 0x7f4bf1fb3184 start_thread > > #20 0x7f4bf07a3bed clone > > > > ... I think because OnPeerAssociatedEndpointClosed() has: > > > > if (!endpoint->peer_closed()) { > > if (endpoint->client()) > > NotifyEndpointOfError(endpoint.get(), false /* force_async */); > > > > Here the connection still has a listening client (presumably because > > ~InProcessGpuThread hasn't run); so an error is sent to GpuChildThread. > > When > > ChildThreadImpl::OnChannelError gets the error it shuts down the message > > loop. > > > > https://chromium.googlesource.com/chromium/src/base/+/ > > master/threading/thread.cc#198 > > has an early return when !message_loop_ = null. But this protection is not > > enough. In this bug, GpuChildThread "wins" the race and sets message_loop_ > > it to > > null just after that check. > > > > Ok, thanks a lot. OnChannelError makes sense and is desirable. Also, > quitting the message loop is necessary when out-of-process (e.g. if the > browser crashes, we want child processes to terminate). > But the problem is that when the child thread is in-process, this causes a > race. I think there's 2 options: > 1- fix the data race in base::Thread (e.g. adding an atomic flag) > 2- make ChildThreadImpl::OnChannelError not quit the message loop when > in-process (ChildThreadImpl::IsInBrowserProcess is true). > > 1. True, that could make base::Thread a bit more thread safe. I preferred to not add more locking in base::Thread because of the risk of introducing dead locks... Fixing the call site felt safer (no new logic was needed). 2. If the channel gets an error don't we still want ChildThreadImpl::OnChannelError to quit the message loop even if the thread is IsInBrowserProcess?
On 2017/07/07 21:40:46, hugoh_UTC2 wrote: > On 2017/07/07 18:14:52, chromium-reviews wrote: > > On Fri, Jul 7, 2017 at 6:50 AM, <mailto:hugoh@opera.com> wrote: > > > > > On 2017/07/06 22:34:29, chromium-reviews wrote: > > > > On Thu, Jul 6, 2017 at 2:29 PM, <mailto:hugoh@opera.com> wrote: > > > > > > > > > Oh that wasn't very readable. Reformat: > > > > > > > > > > [5119:5234 browser_child_process_host_impl.cc(207)] > > > > > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(52)] > > > > > InProcessGpuThread::~InProcessGpuThread() > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(123)] > > > > > InProcessGpuThread::Stop() > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(177)] > > > > > InProcessGpuThread::StopSoon() sleeps _before_ task_runner()->PostTask. > > > > > > > > > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > > > > > > > > > > > > > Do you know what calls this MessageLoop::QuitWhenIdle()? I think this is > > > > what leads to a race in base::Thread, which I think assumes it only exits > > > > the MessageLoop from StopSoon. > > > > > > > > > > Yes, I think so too! > > > > > > Some debugging showed me that ChildThreadImpl::OnChannelError() calls it: > > > > > > [25635:25658:in_process_gpu_thread.cc(182)] > > > InProcessGpuThread::Stop() pausing 5 seconds > > > message_loop_: 0x28224c01a000 > > > > > > [25635:25665:message_loop.cc(432)] > > > MessageLoop::PostTask posted from: > > > OnChannelError@../../ipc/ipc_channel_proxy.cc:160 * > > > > > > [25635:25665:child_thread_impl.cc(622)] > > > ChildThreadImpl::OnChannelError > > > > > > [25635:25665:message_loop.cc(235)] > > > MessageLoop::QuitWhenIdle() this: 0x28224c01a000 > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > #1 0x0000016e46e9 base::MessageLoop::QuitWhenIdle() > > > #2 0x00000260f4a5 content::ChildThreadImpl::OnChannelError() > > > #3 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > #4 0x0000016e50d3 base::MessageLoop::RunTask() > > > #5 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > #6 0x0000016e5874 base::MessageLoop::DoWork() > > > #7 0x0000016e78ea base::MessagePumpDefault::Run() > > > #8 0x000001701e80 base::RunLoop::Run() > > > #9 0x000001724877 base::Thread::ThreadMain() > > > #10 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > #11 0x7f6c93cfb184 start_thread > > > #12 0x7f6c924ebbed clone > > > > > > [25635:25665:child_process.cc(81)] > > > ChildProcess::~ChildProcess() > > > > > > [25635:25665:gpu_child_thread.cc(195)] > > > GpuChildThread::~GpuChildThread() > > > > > > [25635:25658:in_process_gpu_thread.cc(220)] > > > InProcessGpuThread::StopSoon() message_loop_ = 0 > > > > > > > > > * Why do we get an error on the IPC channel that GpuChildThread listens > > > to...? > > > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > #1 0x00000189f80b IPC::ChannelProxy::Context::OnChannelError() > > > #2 0x0000018a219b IPC::internal::MessagePipeReader::OnPipeError() > > > #3 0x00000188bc2c mojo::InterfaceEndpointClient::NotifyError() > > > #4 0x0000018a705d IPC::(anonymous > > > namespace)::ChannelAssociatedGroupController::NotifyEndpointOfError() > > > #5 0x0000018a5f22 IPC::(anonymous > > > namespace)::ChannelAssociatedGroupController:: > > > OnPeerAssociatedEndpointClosed() > > > #6 0x0000018a602a IPC::(anonymous > > > namespace)::ChannelAssociatedGroupController:: > > > OnPeerAssociatedEndpointClosed() > > > #7 0x0000018927d1 mojo::PipeControlMessageHandler::RunOrClosePipe() > > > #8 0x000001889e86 mojo::Connector::ReadSingleMessage() > > > #9 0x00000188a512 mojo::Connector::ReadAllAvailableMessages() > > > #10 0x000001899674 mojo::SimpleWatcher::OnHandleReady() > > > #11 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > #12 0x0000016e50d3 base::MessageLoop::RunTask() > > > #13 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > #14 0x0000016e5874 base::MessageLoop::DoWork() > > > #15 0x0000016e8439 base::MessagePumpLibevent::Run() > > > #16 0x000001701e80 base::RunLoop::Run() > > > #17 0x000001724877 base::Thread::ThreadMain() > > > #18 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > #19 0x7f4bf1fb3184 start_thread > > > #20 0x7f4bf07a3bed clone > > > > > > ... I think because OnPeerAssociatedEndpointClosed() has: > > > > > > if (!endpoint->peer_closed()) { > > > if (endpoint->client()) > > > NotifyEndpointOfError(endpoint.get(), false /* force_async */); > > > > > > Here the connection still has a listening client (presumably because > > > ~InProcessGpuThread hasn't run); so an error is sent to GpuChildThread. > > > When > > > ChildThreadImpl::OnChannelError gets the error it shuts down the message > > > loop. > > > > > > https://chromium.googlesource.com/chromium/src/base/+/ > > > master/threading/thread.cc#198 > > > has an early return when !message_loop_ = null. But this protection is not > > > enough. In this bug, GpuChildThread "wins" the race and sets message_loop_ > > > it to > > > null just after that check. > > > > > > > Ok, thanks a lot. OnChannelError makes sense and is desirable. Also, > > quitting the message loop is necessary when out-of-process (e.g. if the > > browser crashes, we want child processes to terminate). > > But the problem is that when the child thread is in-process, this causes a > > race. I think there's 2 options: > > 1- fix the data race in base::Thread (e.g. adding an atomic flag) > > 2- make ChildThreadImpl::OnChannelError not quit the message loop when > > in-process (ChildThreadImpl::IsInBrowserProcess is true). > > > > > > 1. True, that could make base::Thread a bit more thread safe. I preferred to not > add more locking in base::Thread because of the risk of introducing dead > locks... Fixing the call site felt safer (no new logic was needed). > > 2. If the channel gets an error don't we still want > ChildThreadImpl::OnChannelError to quit the message loop even if the thread is > IsInBrowserProcess? I'm not sure why we would - we should never quit the message loop of a base::Thread (except via Stop()) because it causes the above data race.
The CQ bit was checked by mostynb@opera.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2017/07/07 21:51:25, piman wrote: > On 2017/07/07 21:40:46, hugoh_UTC2 wrote: > > On 2017/07/07 18:14:52, chromium-reviews wrote: > > > On Fri, Jul 7, 2017 at 6:50 AM, <mailto:hugoh@opera.com> wrote: > > > > > > > On 2017/07/06 22:34:29, chromium-reviews wrote: > > > > > On Thu, Jul 6, 2017 at 2:29 PM, <mailto:hugoh@opera.com> wrote: > > > > > > > > > > > Oh that wasn't very readable. Reformat: > > > > > > > > > > > > [5119:5234 browser_child_process_host_impl.cc(207)] > > > > > > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(52)] > > > > > > InProcessGpuThread::~InProcessGpuThread() > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(123)] > > > > > > InProcessGpuThread::Stop() > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(177)] > > > > > > InProcessGpuThread::StopSoon() sleeps _before_ > task_runner()->PostTask. > > > > > > > > > > > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > > > > > > > > > > > > > > > > Do you know what calls this MessageLoop::QuitWhenIdle()? I think this is > > > > > what leads to a race in base::Thread, which I think assumes it only > exits > > > > > the MessageLoop from StopSoon. > > > > > > > > > > > > > Yes, I think so too! > > > > > > > > Some debugging showed me that ChildThreadImpl::OnChannelError() calls it: > > > > > > > > [25635:25658:in_process_gpu_thread.cc(182)] > > > > InProcessGpuThread::Stop() pausing 5 seconds > > > > message_loop_: 0x28224c01a000 > > > > > > > > [25635:25665:message_loop.cc(432)] > > > > MessageLoop::PostTask posted from: > > > > OnChannelError@../../ipc/ipc_channel_proxy.cc:160 * > > > > > > > > [25635:25665:child_thread_impl.cc(622)] > > > > ChildThreadImpl::OnChannelError > > > > > > > > [25635:25665:message_loop.cc(235)] > > > > MessageLoop::QuitWhenIdle() this: 0x28224c01a000 > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > > #1 0x0000016e46e9 base::MessageLoop::QuitWhenIdle() > > > > #2 0x00000260f4a5 content::ChildThreadImpl::OnChannelError() > > > > #3 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > > #4 0x0000016e50d3 base::MessageLoop::RunTask() > > > > #5 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > > #6 0x0000016e5874 base::MessageLoop::DoWork() > > > > #7 0x0000016e78ea base::MessagePumpDefault::Run() > > > > #8 0x000001701e80 base::RunLoop::Run() > > > > #9 0x000001724877 base::Thread::ThreadMain() > > > > #10 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > > #11 0x7f6c93cfb184 start_thread > > > > #12 0x7f6c924ebbed clone > > > > > > > > [25635:25665:child_process.cc(81)] > > > > ChildProcess::~ChildProcess() > > > > > > > > [25635:25665:gpu_child_thread.cc(195)] > > > > GpuChildThread::~GpuChildThread() > > > > > > > > [25635:25658:in_process_gpu_thread.cc(220)] > > > > InProcessGpuThread::StopSoon() message_loop_ = 0 > > > > > > > > > > > > * Why do we get an error on the IPC channel that GpuChildThread listens > > > > to...? > > > > > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > > #1 0x00000189f80b IPC::ChannelProxy::Context::OnChannelError() > > > > #2 0x0000018a219b IPC::internal::MessagePipeReader::OnPipeError() > > > > #3 0x00000188bc2c mojo::InterfaceEndpointClient::NotifyError() > > > > #4 0x0000018a705d IPC::(anonymous > > > > namespace)::ChannelAssociatedGroupController::NotifyEndpointOfError() > > > > #5 0x0000018a5f22 IPC::(anonymous > > > > namespace)::ChannelAssociatedGroupController:: > > > > OnPeerAssociatedEndpointClosed() > > > > #6 0x0000018a602a IPC::(anonymous > > > > namespace)::ChannelAssociatedGroupController:: > > > > OnPeerAssociatedEndpointClosed() > > > > #7 0x0000018927d1 mojo::PipeControlMessageHandler::RunOrClosePipe() > > > > #8 0x000001889e86 mojo::Connector::ReadSingleMessage() > > > > #9 0x00000188a512 mojo::Connector::ReadAllAvailableMessages() > > > > #10 0x000001899674 mojo::SimpleWatcher::OnHandleReady() > > > > #11 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > > #12 0x0000016e50d3 base::MessageLoop::RunTask() > > > > #13 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > > #14 0x0000016e5874 base::MessageLoop::DoWork() > > > > #15 0x0000016e8439 base::MessagePumpLibevent::Run() > > > > #16 0x000001701e80 base::RunLoop::Run() > > > > #17 0x000001724877 base::Thread::ThreadMain() > > > > #18 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > > #19 0x7f4bf1fb3184 start_thread > > > > #20 0x7f4bf07a3bed clone > > > > > > > > ... I think because OnPeerAssociatedEndpointClosed() has: > > > > > > > > if (!endpoint->peer_closed()) { > > > > if (endpoint->client()) > > > > NotifyEndpointOfError(endpoint.get(), false /* force_async */); > > > > > > > > Here the connection still has a listening client (presumably because > > > > ~InProcessGpuThread hasn't run); so an error is sent to GpuChildThread. > > > > When > > > > ChildThreadImpl::OnChannelError gets the error it shuts down the message > > > > loop. > > > > > > > > https://chromium.googlesource.com/chromium/src/base/+/ > > > > master/threading/thread.cc#198 > > > > has an early return when !message_loop_ = null. But this protection is not > > > > enough. In this bug, GpuChildThread "wins" the race and sets message_loop_ > > > > it to > > > > null just after that check. > > > > > > > > > > Ok, thanks a lot. OnChannelError makes sense and is desirable. Also, > > > quitting the message loop is necessary when out-of-process (e.g. if the > > > browser crashes, we want child processes to terminate). > > > But the problem is that when the child thread is in-process, this causes a > > > race. I think there's 2 options: > > > 1- fix the data race in base::Thread (e.g. adding an atomic flag) > > > 2- make ChildThreadImpl::OnChannelError not quit the message loop when > > > in-process (ChildThreadImpl::IsInBrowserProcess is true). > > > > > > > > > > 1. True, that could make base::Thread a bit more thread safe. I preferred to > not > > add more locking in base::Thread because of the risk of introducing dead > > locks... Fixing the call site felt safer (no new logic was needed). > > > > 2. If the channel gets an error don't we still want > > ChildThreadImpl::OnChannelError to quit the message loop even if the thread is > > IsInBrowserProcess? > > I'm not sure why we would - we should never quit the message loop of a > base::Thread (except via Stop()) because it causes the above data race. It feels like we could hide errors that shouldn't have happened in the first place? PS1, changing destruction order, does avoid the error. I see that RenderThreadImpl::OnChannelError() crashes upon channel errors in --single-process mode... Maybe we should do the same for other ChildThreadImpl as well? I do this in PS2. Try bots passed but it might be too bold... If an in-browser-process child thread's host (=IOThread?) crashes - that means the complete browser has crashed anyway?
On Mon, Jul 10, 2017 at 6:00 AM, <hugoh@opera.com> wrote: > On 2017/07/07 21:51:25, piman wrote: > > On 2017/07/07 21:40:46, hugoh_UTC2 wrote: > > > On 2017/07/07 18:14:52, chromium-reviews wrote: > > > > On Fri, Jul 7, 2017 at 6:50 AM, <mailto:hugoh@opera.com> wrote: > > > > > > > > > On 2017/07/06 22:34:29, chromium-reviews wrote: > > > > > > On Thu, Jul 6, 2017 at 2:29 PM, <mailto:hugoh@opera.com> wrote: > > > > > > > > > > > > > Oh that wasn't very readable. Reformat: > > > > > > > > > > > > > > [5119:5234 browser_child_process_host_impl.cc(207)] > > > > > > > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(52)] > > > > > > > InProcessGpuThread::~InProcessGpuThread() > > > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(123)] > > > > > > > InProcessGpuThread::Stop() > > > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(177)] > > > > > > > InProcessGpuThread::StopSoon() sleeps _before_ > > task_runner()->PostTask. > > > > > > > > > > > > > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > > > > > > > > > > > > > > > > > > > Do you know what calls this MessageLoop::QuitWhenIdle()? I think > this > is > > > > > > what leads to a race in base::Thread, which I think assumes it > only > > exits > > > > > > the MessageLoop from StopSoon. > > > > > > > > > > > > > > > > Yes, I think so too! > > > > > > > > > > Some debugging showed me that ChildThreadImpl::OnChannelError() > calls > it: > > > > > > > > > > [25635:25658:in_process_gpu_thread.cc(182)] > > > > > InProcessGpuThread::Stop() pausing 5 seconds > > > > > message_loop_: 0x28224c01a000 > > > > > > > > > > [25635:25665:message_loop.cc(432)] > > > > > MessageLoop::PostTask posted from: > > > > > OnChannelError@../../ipc/ipc_channel_proxy.cc:160 * > > > > > > > > > > [25635:25665:child_thread_impl.cc(622)] > > > > > ChildThreadImpl::OnChannelError > > > > > > > > > > [25635:25665:message_loop.cc(235)] > > > > > MessageLoop::QuitWhenIdle() this: 0x28224c01a000 > > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > > > #1 0x0000016e46e9 base::MessageLoop::QuitWhenIdle() > > > > > #2 0x00000260f4a5 content::ChildThreadImpl::OnChannelError() > > > > > #3 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > > > #4 0x0000016e50d3 base::MessageLoop::RunTask() > > > > > #5 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > > > #6 0x0000016e5874 base::MessageLoop::DoWork() > > > > > #7 0x0000016e78ea base::MessagePumpDefault::Run() > > > > > #8 0x000001701e80 base::RunLoop::Run() > > > > > #9 0x000001724877 base::Thread::ThreadMain() > > > > > #10 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > > > #11 0x7f6c93cfb184 start_thread > > > > > #12 0x7f6c924ebbed clone > > > > > > > > > > [25635:25665:child_process.cc(81)] > > > > > ChildProcess::~ChildProcess() > > > > > > > > > > [25635:25665:gpu_child_thread.cc(195)] > > > > > GpuChildThread::~GpuChildThread() > > > > > > > > > > [25635:25658:in_process_gpu_thread.cc(220)] > > > > > InProcessGpuThread::StopSoon() message_loop_ = 0 > > > > > > > > > > > > > > > * Why do we get an error on the IPC channel that GpuChildThread > listens > > > > > to...? > > > > > > > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > > > #1 0x00000189f80b IPC::ChannelProxy::Context::OnChannelError() > > > > > #2 0x0000018a219b IPC::internal::MessagePipeReader::OnPipeError() > > > > > #3 0x00000188bc2c mojo::InterfaceEndpointClient::NotifyError() > > > > > #4 0x0000018a705d IPC::(anonymous > > > > > namespace)::ChannelAssociatedGroupControll > er::NotifyEndpointOfError() > > > > > #5 0x0000018a5f22 IPC::(anonymous > > > > > namespace)::ChannelAssociatedGroupController:: > > > > > OnPeerAssociatedEndpointClosed() > > > > > #6 0x0000018a602a IPC::(anonymous > > > > > namespace)::ChannelAssociatedGroupController:: > > > > > OnPeerAssociatedEndpointClosed() > > > > > #7 0x0000018927d1 mojo::PipeControlMessageHandler:: > RunOrClosePipe() > > > > > #8 0x000001889e86 mojo::Connector::ReadSingleMessage() > > > > > #9 0x00000188a512 mojo::Connector::ReadAllAvailableMessages() > > > > > #10 0x000001899674 mojo::SimpleWatcher::OnHandleReady() > > > > > #11 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > > > #12 0x0000016e50d3 base::MessageLoop::RunTask() > > > > > #13 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > > > #14 0x0000016e5874 base::MessageLoop::DoWork() > > > > > #15 0x0000016e8439 base::MessagePumpLibevent::Run() > > > > > #16 0x000001701e80 base::RunLoop::Run() > > > > > #17 0x000001724877 base::Thread::ThreadMain() > > > > > #18 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > > > #19 0x7f4bf1fb3184 start_thread > > > > > #20 0x7f4bf07a3bed clone > > > > > > > > > > ... I think because OnPeerAssociatedEndpointClosed() has: > > > > > > > > > > if (!endpoint->peer_closed()) { > > > > > if (endpoint->client()) > > > > > NotifyEndpointOfError(endpoint.get(), false /* force_async */); > > > > > > > > > > Here the connection still has a listening client (presumably > because > > > > > ~InProcessGpuThread hasn't run); so an error is sent to > GpuChildThread. > > > > > When > > > > > ChildThreadImpl::OnChannelError gets the error it shuts down the > message > > > > > loop. > > > > > > > > > > https://chromium.googlesource.com/chromium/src/base/+/ > > > > > master/threading/thread.cc#198 > > > > > has an early return when !message_loop_ = null. But this > protection is > not > > > > > enough. In this bug, GpuChildThread "wins" the race and sets > message_loop_ > > > > > it to > > > > > null just after that check. > > > > > > > > > > > > > Ok, thanks a lot. OnChannelError makes sense and is desirable. Also, > > > > quitting the message loop is necessary when out-of-process (e.g. if > the > > > > browser crashes, we want child processes to terminate). > > > > But the problem is that when the child thread is in-process, this > causes a > > > > race. I think there's 2 options: > > > > 1- fix the data race in base::Thread (e.g. adding an atomic flag) > > > > 2- make ChildThreadImpl::OnChannelError not quit the message loop > when > > > > in-process (ChildThreadImpl::IsInBrowserProcess is true). > > > > > > > > > > > > > > 1. True, that could make base::Thread a bit more thread safe. I > preferred to > > not > > > add more locking in base::Thread because of the risk of introducing > dead > > > locks... Fixing the call site felt safer (no new logic was needed). > > > > > > 2. If the channel gets an error don't we still want > > > ChildThreadImpl::OnChannelError to quit the message loop even if the > thread > is > > > IsInBrowserProcess? > > > > I'm not sure why we would - we should never quit the message loop of a > > base::Thread (except via Stop()) because it causes the above data race. > > It feels like we could hide errors that shouldn't have happened in the > first > place? PS1, changing destruction order, does avoid the error. > > I see that RenderThreadImpl::OnChannelError() crashes upon channel errors > in > --single-process mode... Maybe we should do the same for other > ChildThreadImpl > as well? I do this in PS2. Try bots passed but it might be too bold... > > If an in-browser-process child thread's host (=IOThread?) crashes - that > means > the complete browser has crashed anyway? > > https://codereview.chromium.org/2973723003/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Mon, Jul 10, 2017 at 6:00 AM, <hugoh@opera.com> wrote: > On 2017/07/07 21:51:25, piman wrote: > > On 2017/07/07 21:40:46, hugoh_UTC2 wrote: > > > On 2017/07/07 18:14:52, chromium-reviews wrote: > > > > On Fri, Jul 7, 2017 at 6:50 AM, <mailto:hugoh@opera.com> wrote: > > > > > > > > > On 2017/07/06 22:34:29, chromium-reviews wrote: > > > > > > On Thu, Jul 6, 2017 at 2:29 PM, <mailto:hugoh@opera.com> wrote: > > > > > > > > > > > > > Oh that wasn't very readable. Reformat: > > > > > > > > > > > > > > [5119:5234 browser_child_process_host_impl.cc(207)] > > > > > > > BrowserChildProcessHostImpl deletes: 0x2495270dfc20 > > > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(52)] > > > > > > > InProcessGpuThread::~InProcessGpuThread() > > > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(123)] > > > > > > > InProcessGpuThread::Stop() > > > > > > > > > > > > > > [5119:5234 in_process_gpu_thread.cc(177)] > > > > > > > InProcessGpuThread::StopSoon() sleeps _before_ > > task_runner()->PostTask. > > > > > > > > > > > > > > [5119:5251 message_loop.cc(227)] MessageLoop::QuitWhenIdle() > > > > > > > > > > > > > > > > > > > Do you know what calls this MessageLoop::QuitWhenIdle()? I think > this > is > > > > > > what leads to a race in base::Thread, which I think assumes it > only > > exits > > > > > > the MessageLoop from StopSoon. > > > > > > > > > > > > > > > > Yes, I think so too! > > > > > > > > > > Some debugging showed me that ChildThreadImpl::OnChannelError() > calls > it: > > > > > > > > > > [25635:25658:in_process_gpu_thread.cc(182)] > > > > > InProcessGpuThread::Stop() pausing 5 seconds > > > > > message_loop_: 0x28224c01a000 > > > > > > > > > > [25635:25665:message_loop.cc(432)] > > > > > MessageLoop::PostTask posted from: > > > > > OnChannelError@../../ipc/ipc_channel_proxy.cc:160 * > > > > > > > > > > [25635:25665:child_thread_impl.cc(622)] > > > > > ChildThreadImpl::OnChannelError > > > > > > > > > > [25635:25665:message_loop.cc(235)] > > > > > MessageLoop::QuitWhenIdle() this: 0x28224c01a000 > > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > > > #1 0x0000016e46e9 base::MessageLoop::QuitWhenIdle() > > > > > #2 0x00000260f4a5 content::ChildThreadImpl::OnChannelError() > > > > > #3 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > > > #4 0x0000016e50d3 base::MessageLoop::RunTask() > > > > > #5 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > > > #6 0x0000016e5874 base::MessageLoop::DoWork() > > > > > #7 0x0000016e78ea base::MessagePumpDefault::Run() > > > > > #8 0x000001701e80 base::RunLoop::Run() > > > > > #9 0x000001724877 base::Thread::ThreadMain() > > > > > #10 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > > > #11 0x7f6c93cfb184 start_thread > > > > > #12 0x7f6c924ebbed clone > > > > > > > > > > [25635:25665:child_process.cc(81)] > > > > > ChildProcess::~ChildProcess() > > > > > > > > > > [25635:25665:gpu_child_thread.cc(195)] > > > > > GpuChildThread::~GpuChildThread() > > > > > > > > > > [25635:25658:in_process_gpu_thread.cc(220)] > > > > > InProcessGpuThread::StopSoon() message_loop_ = 0 > > > > > > > > > > > > > > > * Why do we get an error on the IPC channel that GpuChildThread > listens > > > > > to...? > > > > > > > > > > #0 0x0000016c9907 base::debug::StackTrace::StackTrace() > > > > > #1 0x00000189f80b IPC::ChannelProxy::Context::OnChannelError() > > > > > #2 0x0000018a219b IPC::internal::MessagePipeReader::OnPipeError() > > > > > #3 0x00000188bc2c mojo::InterfaceEndpointClient::NotifyError() > > > > > #4 0x0000018a705d IPC::(anonymous > > > > > namespace)::ChannelAssociatedGroupControll > er::NotifyEndpointOfError() > > > > > #5 0x0000018a5f22 IPC::(anonymous > > > > > namespace)::ChannelAssociatedGroupController:: > > > > > OnPeerAssociatedEndpointClosed() > > > > > #6 0x0000018a602a IPC::(anonymous > > > > > namespace)::ChannelAssociatedGroupController:: > > > > > OnPeerAssociatedEndpointClosed() > > > > > #7 0x0000018927d1 mojo::PipeControlMessageHandler:: > RunOrClosePipe() > > > > > #8 0x000001889e86 mojo::Connector::ReadSingleMessage() > > > > > #9 0x00000188a512 mojo::Connector::ReadAllAvailableMessages() > > > > > #10 0x000001899674 mojo::SimpleWatcher::OnHandleReady() > > > > > #11 0x0000016c9e4b base::debug::TaskAnnotator::RunTask() > > > > > #12 0x0000016e50d3 base::MessageLoop::RunTask() > > > > > #13 0x0000016e5592 base::MessageLoop::DeferOrRunPendingTask() > > > > > #14 0x0000016e5874 base::MessageLoop::DoWork() > > > > > #15 0x0000016e8439 base::MessagePumpLibevent::Run() > > > > > #16 0x000001701e80 base::RunLoop::Run() > > > > > #17 0x000001724877 base::Thread::ThreadMain() > > > > > #18 0x00000171f093 base::(anonymous namespace)::ThreadFunc() > > > > > #19 0x7f4bf1fb3184 start_thread > > > > > #20 0x7f4bf07a3bed clone > > > > > > > > > > ... I think because OnPeerAssociatedEndpointClosed() has: > > > > > > > > > > if (!endpoint->peer_closed()) { > > > > > if (endpoint->client()) > > > > > NotifyEndpointOfError(endpoint.get(), false /* force_async */); > > > > > > > > > > Here the connection still has a listening client (presumably > because > > > > > ~InProcessGpuThread hasn't run); so an error is sent to > GpuChildThread. > > > > > When > > > > > ChildThreadImpl::OnChannelError gets the error it shuts down the > message > > > > > loop. > > > > > > > > > > https://chromium.googlesource.com/chromium/src/base/+/ > > > > > master/threading/thread.cc#198 > > > > > has an early return when !message_loop_ = null. But this > protection is > not > > > > > enough. In this bug, GpuChildThread "wins" the race and sets > message_loop_ > > > > > it to > > > > > null just after that check. > > > > > > > > > > > > > Ok, thanks a lot. OnChannelError makes sense and is desirable. Also, > > > > quitting the message loop is necessary when out-of-process (e.g. if > the > > > > browser crashes, we want child processes to terminate). > > > > But the problem is that when the child thread is in-process, this > causes a > > > > race. I think there's 2 options: > > > > 1- fix the data race in base::Thread (e.g. adding an atomic flag) > > > > 2- make ChildThreadImpl::OnChannelError not quit the message loop > when > > > > in-process (ChildThreadImpl::IsInBrowserProcess is true). > > > > > > > > > > > > > > 1. True, that could make base::Thread a bit more thread safe. I > preferred to > > not > > > add more locking in base::Thread because of the risk of introducing > dead > > > locks... Fixing the call site felt safer (no new logic was needed). > > > > > > 2. If the channel gets an error don't we still want > > > ChildThreadImpl::OnChannelError to quit the message loop even if the > thread > is > > > IsInBrowserProcess? > > > > I'm not sure why we would - we should never quit the message loop of a > > base::Thread (except via Stop()) because it causes the above data race. > > It feels like we could hide errors that shouldn't have happened in the > first > place? > I'm not sure what error you're talking about that we'd be hiding - if we quit the thread's loop we don't warn or DCHECK or anything per se, we just introduce a data race. It's invalid to do and it should absolutely be avoided. On the contrary, if anything, I'll argue that your change hides the error, by hoping that the thread never gets a chance to detect the channel error and run the incorrect code in this particular case. Terminating the thread before the channel could cause some tasks to fail executing. I think it's an undesirable pattern - at least it means the test behavior differs from the production behavior. > PS1, changing destruction order, does avoid the error. > Relying on destruction order is fragile, doubly so when in this case there no obvious relationship between the 2 objects. I would ask for checks (or better, tests) to make sure this doesn't regress. I see that RenderThreadImpl::OnChannelError() crashes upon channel errors in > --single-process mode... Maybe we should do the same for other > ChildThreadImpl > as well? I do this in PS2. Try bots passed but it might be too bold... > > If an in-browser-process child thread's host (=IOThread?) crashes - that > means > the complete browser has crashed anyway? > OnChannelError doesn't only indicate that the host process has crashed, it indicates that the host connection was terminated, which can happen for many reasons (including, in this case, destroying an object). > > https://codereview.chromium.org/2973723003/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Patchset #3 (id:40001) has been deleted
Description was changed from ========== Avoid race at shutdown between browser's IO thread and in-process GPU thread Background: When the GPU thread is in-process it runs inside the browser process. Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. This means that for some timings, when InProcessGpuThread::StopSoon(), running on the IOThread, asks for task_runner() the GPU thread has already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crashes: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: Destruct InProcessGpuThread before its host process, i.e. before its BrowserChildProcessHostImpl. This ensures that the IOThread still can reach a non-null task_runner() when it runs InProcessGpuThread::StopSoon(). BUG=258935 ========== to ========== Avoid race at shutdown between browser's IO thread and in-process GPU thread Background: - When the GPU thread is in-process it runs inside the browser process. - GpuChildThread is a ChildThreadImpl and a IPC::Listener. - When GpuChildThread sees an error (or shutdown) of its IPC channel, it quits its message loop (= stops the thread). Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. This means that for some timings, when InProcessGpuThread::StopSoon(), running on the IOThread, asks for task_runner() the GPU thread has already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crashes: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: When in --single-process, only let Thread::Stop() stop the thread's message loop. BUG=258935 ==========
Patchset #3 (id:60001) has been deleted
Patchset #3 (id:80001) has been deleted
Ok, thanks for clarifying. PTAL at PS3.
lgtm
Description was changed from ========== Avoid race at shutdown between browser's IO thread and in-process GPU thread Background: - When the GPU thread is in-process it runs inside the browser process. - GpuChildThread is a ChildThreadImpl and a IPC::Listener. - When GpuChildThread sees an error (or shutdown) of its IPC channel, it quits its message loop (= stops the thread). Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. This means that for some timings, when InProcessGpuThread::StopSoon(), running on the IOThread, asks for task_runner() the GPU thread has already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crashes: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: When in --single-process, only let Thread::Stop() stop the thread's message loop. BUG=258935 ========== to ========== Avoid race at shutdown between browser process' IO thread and GPU thread Background: - When the GPU thread is *in-process* it runs inside the browser process. - GpuChildThread is a ChildThreadImpl and a IPC::Listener. - When GpuChildThread sees an error (or shutdown) of its IPC channel, it quits its message loop (= stops the thread). Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. When InProcessGpuThread::StopSoon asks for task_runner() the in-process GPU thread has - sometimes - already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crashes: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: If a thread runs in the browser process, only Thread::Stop should stop its message loop. Otherwise, QuitWhenIdle could race Thread::Stop. BUG=258935 ==========
The CQ bit was checked by hugoh@opera.com
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by hugoh@opera.com
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 100001, "attempt_start_ts": 1499844489319030, "parent_rev": "a4f5fdaff26071bf65dfe22683b6826942376c64", "commit_rev": "f1c501fa3022cc7de41048cce085a3e0ade8d225"}
Message was sent while issue was closed.
Description was changed from ========== Avoid race at shutdown between browser process' IO thread and GPU thread Background: - When the GPU thread is *in-process* it runs inside the browser process. - GpuChildThread is a ChildThreadImpl and a IPC::Listener. - When GpuChildThread sees an error (or shutdown) of its IPC channel, it quits its message loop (= stops the thread). Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. When InProcessGpuThread::StopSoon asks for task_runner() the in-process GPU thread has - sometimes - already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crashes: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: If a thread runs in the browser process, only Thread::Stop should stop its message loop. Otherwise, QuitWhenIdle could race Thread::Stop. BUG=258935 ========== to ========== Avoid race at shutdown between browser process' IO thread and GPU thread Background: - When the GPU thread is *in-process* it runs inside the browser process. - GpuChildThread is a ChildThreadImpl and a IPC::Listener. - When GpuChildThread sees an error (or shutdown) of its IPC channel, it quits its message loop (= stops the thread). Problem: The browser process' IOThread calls InProcessGpuThread::StopSoon() while the in-process GPU thread destructs itself. When InProcessGpuThread::StopSoon asks for task_runner() the in-process GPU thread has - sometimes - already destructed itself and its MessageQueue. The IOThread then sees task_runner() == null and crashes: [FATAL:ref_counted.h(484)] Assert failed: ptr_ != nullptr. 0 base::debug::StackTrace::StackTrace() 1 base::debug::StackTrace::StackTrace() 2 logging::LogMessage::~LogMessage() 3 base::Thread::StopSoon() 4 base::Thread::Stop() 5 content::InProcessGpuThread::~InProcessGpuThread() 6 content::InProcessGpuThread::~InProcessGpuThread() 7 content::GpuProcessHost::~GpuProcessHost() 8 content::GpuProcessHost::~GpuProcessHost() 9 content::BrowserChildProcessHostImpl::TerminateAll() 10 content::BrowserProcessSubThread::IOThreadPreCleanUp() 11 content::BrowserProcessSubThread::CleanUp() 12 base::Thread::ThreadMain() 13 base::(anonymous namespace)::ThreadFunc() Solution: If a thread runs in the browser process, only Thread::Stop should stop its message loop. Otherwise, QuitWhenIdle could race Thread::Stop. BUG=258935 Review-Url: https://codereview.chromium.org/2973723003 Cr-Commit-Position: refs/heads/master@{#485895} Committed: https://chromium.googlesource.com/chromium/src/+/f1c501fa3022cc7de41048cce085... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:100001) as https://chromium.googlesource.com/chromium/src/+/f1c501fa3022cc7de41048cce085... |