|
|
Created:
6 years, 3 months ago by haraken Modified:
6 years, 1 month ago Reviewers:
jamesr, jochen (gone - plz use gerrit), jar (doing other things), Mads Ager (chromium), Ken Russell (switch to Gerrit), tkent CC:
chromium-reviews, mkwst+moarreviews-renderer_chromium.org, jam, erikwright+watch_chromium.org, sadrul, darin-cc_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Project:
chromium Visibility:
Public. |
DescriptionPending tasks in a message loop should be deleted before shutting down Blink
Currently Blink is shut down before all the pending tasks in the message loop are deleted. This is problematic in Oilpan because a destructor of the pending tasks can touch Oilpan objects. Because Oilpan is already detached from the renderer thread at that point, touching Oilpan objects in the destructor leads to a crash. (See the bug report for a concrete scenario.)
To prevent Blink objects from getting accessed after Blink is shut down, this CL deletes all pending tasks in a message loop before shutting down Blink.
BUG=411026
TEST=None. I cannot reproduce the crash.
Committed: https://crrev.com/fdd5612c20f777e1279efd7c1e99d82ed04afaaf
Cr-Commit-Position: refs/heads/master@{#296697}
Committed: https://crrev.com/16d32a9f7f6d1ebb639cacedb5156272a9fec764
Cr-Commit-Position: refs/heads/master@{#297338}
Committed: https://crrev.com/53f081de05b86f73eca4e383a16c8dc723b78a99
Cr-Commit-Position: refs/heads/master@{#303557}
Patch Set 1 #
Total comments: 3
Patch Set 2 : #Patch Set 3 : #Patch Set 4 : #
Total comments: 1
Patch Set 5 : #Patch Set 6 : #Patch Set 7 : #Patch Set 8 : #
Total comments: 2
Patch Set 9 : #Patch Set 10 : #
Total comments: 1
Patch Set 11 : #Patch Set 12 : #Patch Set 13 : #
Messages
Total messages: 64 (8 generated)
haraken@chromium.org changed reviewers: + ager@chromium.org, jamesr@chromium.org, jochen@chromium.org, tkent@chromium.org
PTAL
LGTM Oilpan exposes this issue but it seems bad in general that task destructors can touch Blink objects after Blink has been shut down.
jochen@: would you take a look at this when you have a cycle?
i'm not sure this is the right approach. which tasks are touching oilpan objects? Can we fix the tasks instead. You will want to have some message loop expert on this CL, like jar@ for exmaple
On 2014/09/22 at 06:44:45, jochen wrote: > i'm not sure this is the right approach. > > which tasks are touching oilpan objects? Can we fix the tasks instead. My two cents: I don't think the tasks need fixing. A Blink task that operates on Blink objects should be allowed to operate on those objects in its destructor. Consider a task that holds a RefPtr/Persistent to a Blink object to perform some cleanup task before letting that object die. The destructor of the task should then deref the object (or delete the persistent). I think that is totally reasonable and I think it would be unnatural to have a restriction saying that you cannot touch Blink objects in Blink task's destructors? So, if we buy that, I think some change along these lines is the right thing to do. You should not touch Blink objects after you have asked Blink to shutdown. Therefore, the tasks should be destructed before Blink is shut down. > You will want to have some message loop expert on this CL, like jar@ for exmaple
> i'm not sure this is the right approach. > > which tasks are touching oilpan objects? Can we fix the tasks instead. Here is a stack trace of the crash: #1 0x7fcd38fa9b93 in blink::WebGeolocationPermissionRequestManager::reset() src/third_party/WebKit/Source/web/WebGeolocationPermissionRequestManager.cpp:38:7 #2 0x7fcd3794b00f in content::GeolocationDispatcher::~GeolocationDispatcher() src/third_party/WebKit/public/web/WebGeolocationPermissionRequestManager.h:46:49 #3 0x7fcd376f7e73 in content::RenderFrameImpl::~RenderFrameImpl() src/content/renderer/render_frame_impl.cc:478:3 #4 0x7fcd376f86aa in content::RenderFrameImpl::~RenderFrameImpl() src/content/renderer/render_frame_impl.cc:476:37 #5 0x7fcd37761138 in content::RenderViewImpl::~RenderViewImpl() src/base/memory/scoped_ptr.h:137:5 #6 0x7fcd37761a11 in content::RenderViewImpl::~RenderViewImpl() src/content/renderer/render_view_impl.cc:839:35 #7 0x7fcd377b9fc6 in base::internal::BindState<base::internal::RunnableAdapter<void (content::RenderWidget::*)()>, void (content::RenderWidget*), void (content::RenderWidget*)>::~BindState() src/base/memory/ref_counted.h:131:7 #8 0x7fcd2de3917b in base::PendingTask::~PendingTask() src/base/pending_task.cc:34:1 #9 0x7fcd2de17a16 in base::MessageLoop::DeletePendingTasks() src/third_party/libc++/trunk/include/memory:1700:58 #10 0x7fcd2de166c7 in base::MessageLoop::~MessageLoop() src/base/message_loop/message_loop.cc:174:5 #11 0x7fcd377bb079 in content::RendererMain(content::MainFunctionParams const&) src/content/renderer/renderer_main.cc:233:3 I'm not sure if the CL is a right fix or not, but at least it looks strange that the current implementation touches Blink objects after asking Blink to shut down (it doesn't cause any issue in non-oilpan by accident).
haraken@chromium.org changed reviewers: + jar@chromium.org
+jar (but it looks like jar@ is OOO for a while)
Can we just quit the message loop completely before deinitializing Blink? What tasks in the process *do* need to run after blink it shut down?
https://codereview.chromium.org/583043005/diff/1/base/message_loop/message_lo... File base/message_loop/message_loop.h (right): https://codereview.chromium.org/583043005/diff/1/base/message_loop/message_lo... base/message_loop/message_loop.h:377: bool DeletePendingTasks(); this doesn't make sense. ~MessageLoop() does the same thing but better https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... File content/renderer/renderer_main.cc (right): https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... content/renderer/renderer_main.cc:238: main_message_loop.DeletePendingTasks(); since you're deleting *all* pending tasks and not just Blink ones then you definitely do not care about anything else running, so not lgtm. You should simply shut down the loop before shutting down blink
Thanks James for review! https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... File content/renderer/renderer_main.cc (right): https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... content/renderer/renderer_main.cc:238: main_message_loop.DeletePendingTasks(); On 2014/09/22 14:38:04, jamesr wrote: > since you're deleting *all* pending tasks and not just Blink ones then you > definitely do not care about anything else running, so not lgtm. You should > simply shut down the loop before shutting down blink I'm not familiar with a message loop, but what's a common way to completely shut down the main_message_loop here? At first I tried something like: scoped_ptr<base::MessageLoop> main_message_loop(new base::MessageLoop()); // line 155 { ...; main_message_loop.reset(); // line 238 } but it crashed chrome :-/
On 2014/09/22 15:54:18, haraken wrote: > Thanks James for review! > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > File content/renderer/renderer_main.cc (right): > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > content/renderer/renderer_main.cc:238: main_message_loop.DeletePendingTasks(); > On 2014/09/22 14:38:04, jamesr wrote: > > since you're deleting *all* pending tasks and not just Blink ones then you > > definitely do not care about anything else running, so not lgtm. You should > > simply shut down the loop before shutting down blink > > I'm not familiar with a message loop, but what's a common way to completely shut > down the main_message_loop here? > > At first I tried something like: > > scoped_ptr<base::MessageLoop> main_message_loop(new base::MessageLoop()); // > line 155 > { > ...; > main_message_loop.reset(); // line 238 > } > > but it crashed chrome :-/ What is the crash stack? If I run code like this standalone it works fine. If somebody is trying to access the message loop after you are destroying it here to post a task then this patch won't fix the bug anyway, since this newly posted task will still be destroyed after blink is shut down. You'll need to figure out who is accessing the message loop when and why.
On 2014/09/22 16:40:33, jamesr wrote: > On 2014/09/22 15:54:18, haraken wrote: > > Thanks James for review! > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > File content/renderer/renderer_main.cc (right): > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > content/renderer/renderer_main.cc:238: main_message_loop.DeletePendingTasks(); > > On 2014/09/22 14:38:04, jamesr wrote: > > > since you're deleting *all* pending tasks and not just Blink ones then you > > > definitely do not care about anything else running, so not lgtm. You should > > > simply shut down the loop before shutting down blink > > > > I'm not familiar with a message loop, but what's a common way to completely > shut > > down the main_message_loop here? > > > > At first I tried something like: > > > > scoped_ptr<base::MessageLoop> main_message_loop(new base::MessageLoop()); // > > line 155 > > { > > ...; > > main_message_loop.reset(); // line 238 > > } > > > > but it crashed chrome :-/ > > > What is the crash stack? If I run code like this standalone it works fine. > > If somebody is trying to access the message loop after you are destroying it > here to post a task then this patch won't fix the bug anyway, since this newly > posted task will still be destroyed after blink is shut down. You'll need to > figure out who is accessing the message loop when and why. I uploaded the change to a patch set 2. This crashes with the following call stack. The problem is that RenderProcessImpl's destructor posts a task to the message loop and thus the message loop needs to be kept alive in the RenderProcessImpl's destructor. Thus we cannot destruct the message loop before destructing the RenderProcessImpl's destructor. That being said, this is a weird situation. Probably should we restructure the code so that the RenderProcessImpl's destructor doesn't post any task to the message loop? Any advice is welcome :) SUMMARY: AddressSanitizer: SEGV ??:0 ?? ==9==ABORTING [36:36:0924/101619:FATAL:thread_task_runner_handle.cc(23)] Check failed: current. #0 0x7f2961f1ea5f __interceptor_backtrace #1 0x7f296bc49384 base::debug::StackTrace::StackTrace() #2 0x7f296c09c6f2 logging::LogMessage::~LogMessage() #3 0x7f296c70bfab base::ThreadTaskRunnerHandle::Get() #4 0x7f296c82a847 base::Timer::PostNewScheduledTask() #5 0x7f296c829702 base::Timer::Reset() #6 0x7f296c828b1a base::Timer::Start() #7 0x7f29a423b0ab base::BaseTimerMethodPointer<>::Start() #8 0x7f29a421d1b6 content::BlinkPlatformImpl::setSharedTimerFireInterval() #9 0x7f297cc234df blink::Scheduler::setSharedTimerFireInterval() #10 0x7f297cc3120f blink::setSharedTimerFireInterval() #11 0x7f297cbdbdd4 blink::MainThreadSharedTimer::setFireInterval() #12 0x7f297cbd7e1a blink::ThreadTimers::updateSharedTimer() #13 0x7f297cb1e668 blink::TimerBase::setNextFireTime() #14 0x7f297cb1dbac blink::TimerBase::start() #15 0x7f297c8215ca blink::TimerBase::startOneShot() #16 0x7f2981e09984 blink::FrameLoader::scheduleCheckCompleted() #17 0x7f2981e08bf3 blink::FrameLoader::setDefersLoading() #18 0x7f298201e100 blink::Page::setDefersLoading() #19 0x7f29820ba8ea blink::ScopedPageLoadDeferrer::detach() #20 0x7f29820bacbf blink::ScopedPageLoadDeferrer::~ScopedPageLoadDeferrer() #21 0x7f297e495263 blink::WebView::didExitModalLoop() #22 0x7f29a4cd7661 content::RenderThreadImpl::Shutdown() #23 0x7f29a4cd7f51 content::RenderThreadImpl::Shutdown() #24 0x7f29a45cf571 content::ChildProcess::~ChildProcess() #25 0x7f29a4cb34ca content::RenderProcess::~RenderProcess() #26 0x7f29a4cb2ab4 content::RenderProcessImpl::~RenderProcessImpl() #27 0x7f29a4ff8105 content::RendererMain() #28 0x7f296baafb93 content::RunZygote() #29 0x7f296bab14ff content::RunNamedProcessTypeMain() #30 0x7f296bac16e0 content::ContentMainRunnerImpl::Run() #31 0x7f296baac995 content::ContentMain() #32 0x7f2961f7ceba ChromeMain #33 0x7f2961f7cabe main #34 0x7f29557ba76d __libc_start_main #35 0x7f2961f7c7ad <unknown>
On 2014/09/24 01:24:04, haraken wrote: > On 2014/09/22 16:40:33, jamesr wrote: > > On 2014/09/22 15:54:18, haraken wrote: > > > Thanks James for review! > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > File content/renderer/renderer_main.cc (right): > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > content/renderer/renderer_main.cc:238: > main_message_loop.DeletePendingTasks(); > > > On 2014/09/22 14:38:04, jamesr wrote: > > > > since you're deleting *all* pending tasks and not just Blink ones then you > > > > definitely do not care about anything else running, so not lgtm. You > should > > > > simply shut down the loop before shutting down blink > > > > > > I'm not familiar with a message loop, but what's a common way to completely > > shut > > > down the main_message_loop here? > > > > > > At first I tried something like: > > > > > > scoped_ptr<base::MessageLoop> main_message_loop(new base::MessageLoop()); > // > > > line 155 > > > { > > > ...; > > > main_message_loop.reset(); // line 238 > > > } > > > > > > but it crashed chrome :-/ > > > > > > What is the crash stack? If I run code like this standalone it works fine. > > > > If somebody is trying to access the message loop after you are destroying it > > here to post a task then this patch won't fix the bug anyway, since this newly > > posted task will still be destroyed after blink is shut down. You'll need to > > figure out who is accessing the message loop when and why. > > I uploaded the change to a patch set 2. This crashes with the following call > stack. > > The problem is that RenderProcessImpl's destructor posts a task to the message > loop and thus the message loop needs to be kept alive in the RenderProcessImpl's > destructor. Thus we cannot destruct the message loop before destructing the > RenderProcessImpl's destructor. > > That being said, this is a weird situation. Probably should we restructure the > code so that the RenderProcessImpl's destructor doesn't post any task to the > message loop? Any advice is welcome :) At a first glance, it seems not that easy to make the RenderProcessImpl's destructor not post any task to the message loop. Probably what we can do is: 1) Flush all pending tasks (which may touch Blink objects) in the message loop. 2) Destruct the RenderProcessImpl's destructor. 3) Allow the RenderProcessImpl's destructor to post a task to the message loop but the task shouldn't touch Blink objects of course. 4) Destruct the message loop. This is something the patch set 1 is doing. I'd like to hear your thoughts.
On 2014/09/24 02:09:40, haraken wrote: > On 2014/09/24 01:24:04, haraken wrote: > > On 2014/09/22 16:40:33, jamesr wrote: > > > On 2014/09/22 15:54:18, haraken wrote: > > > > Thanks James for review! > > > > > > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > > File content/renderer/renderer_main.cc (right): > > > > > > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > > content/renderer/renderer_main.cc:238: > > main_message_loop.DeletePendingTasks(); > > > > On 2014/09/22 14:38:04, jamesr wrote: > > > > > since you're deleting *all* pending tasks and not just Blink ones then > you > > > > > definitely do not care about anything else running, so not lgtm. You > > should > > > > > simply shut down the loop before shutting down blink > > > > > > > > I'm not familiar with a message loop, but what's a common way to > completely > > > shut > > > > down the main_message_loop here? > > > > > > > > At first I tried something like: > > > > > > > > scoped_ptr<base::MessageLoop> main_message_loop(new base::MessageLoop()); > > // > > > > line 155 > > > > { > > > > ...; > > > > main_message_loop.reset(); // line 238 > > > > } > > > > > > > > but it crashed chrome :-/ > > > > > > > > > What is the crash stack? If I run code like this standalone it works fine. > > > > > > If somebody is trying to access the message loop after you are destroying it > > > here to post a task then this patch won't fix the bug anyway, since this > newly > > > posted task will still be destroyed after blink is shut down. You'll need > to > > > figure out who is accessing the message loop when and why. > > > > I uploaded the change to a patch set 2. This crashes with the following call > > stack. > > > > The problem is that RenderProcessImpl's destructor posts a task to the message > > loop and thus the message loop needs to be kept alive in the > RenderProcessImpl's > > destructor. Thus we cannot destruct the message loop before destructing the > > RenderProcessImpl's destructor. > > > > That being said, this is a weird situation. Probably should we restructure the > > code so that the RenderProcessImpl's destructor doesn't post any task to the > > message loop? Any advice is welcome :) > > At a first glance, it seems not that easy to make the RenderProcessImpl's > destructor not post any task to the message loop. Probably what we can do is: > > 1) Flush all pending tasks (which may touch Blink objects) in the message loop. > 2) Destruct the RenderProcessImpl's destructor. > 3) Allow the RenderProcessImpl's destructor to post a task to the message loop > but the task shouldn't touch Blink objects of course. > 4) Destruct the message loop. > > This is something the patch set 1 is doing. > > I'd like to hear your thoughts. The stack you posted above is a task that *is* touching blink objects, so this proposal is definitely not going to work.
The blink shutdown call is here: https://code.google.com/p/chromium/codesearch#chromium/src/content/renderer/r... which is after the code you posted that posts the task: https://code.google.com/p/chromium/codesearch#chromium/src/content/renderer/r... why are you trying to shut it down earlier?
On 2014/09/24 02:47:09, jamesr wrote: > On 2014/09/24 02:09:40, haraken wrote: > > On 2014/09/24 01:24:04, haraken wrote: > > > On 2014/09/22 16:40:33, jamesr wrote: > > > > On 2014/09/22 15:54:18, haraken wrote: > > > > > Thanks James for review! > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > > > File content/renderer/renderer_main.cc (right): > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > > > content/renderer/renderer_main.cc:238: > > > main_message_loop.DeletePendingTasks(); > > > > > On 2014/09/22 14:38:04, jamesr wrote: > > > > > > since you're deleting *all* pending tasks and not just Blink ones then > > you > > > > > > definitely do not care about anything else running, so not lgtm. You > > > should > > > > > > simply shut down the loop before shutting down blink > > > > > > > > > > I'm not familiar with a message loop, but what's a common way to > > completely > > > > shut > > > > > down the main_message_loop here? > > > > > > > > > > At first I tried something like: > > > > > > > > > > scoped_ptr<base::MessageLoop> main_message_loop(new > base::MessageLoop()); > > > // > > > > > line 155 > > > > > { > > > > > ...; > > > > > main_message_loop.reset(); // line 238 > > > > > } > > > > > > > > > > but it crashed chrome :-/ > > > > > > > > > > > > What is the crash stack? If I run code like this standalone it works > fine. > > > > > > > > If somebody is trying to access the message loop after you are destroying > it > > > > here to post a task then this patch won't fix the bug anyway, since this > > newly > > > > posted task will still be destroyed after blink is shut down. You'll need > > to > > > > figure out who is accessing the message loop when and why. > > > > > > I uploaded the change to a patch set 2. This crashes with the following call > > > stack. > > > > > > The problem is that RenderProcessImpl's destructor posts a task to the > message > > > loop and thus the message loop needs to be kept alive in the > > RenderProcessImpl's > > > destructor. Thus we cannot destruct the message loop before destructing the > > > RenderProcessImpl's destructor. > > > > > > That being said, this is a weird situation. Probably should we restructure > the > > > code so that the RenderProcessImpl's destructor doesn't post any task to the > > > message loop? Any advice is welcome :) > > > > At a first glance, it seems not that easy to make the RenderProcessImpl's > > destructor not post any task to the message loop. Probably what we can do is: > > > > 1) Flush all pending tasks (which may touch Blink objects) in the message > loop. > > 2) Destruct the RenderProcessImpl's destructor. > > 3) Allow the RenderProcessImpl's destructor to post a task to the message loop > > but the task shouldn't touch Blink objects of course. > > 4) Destruct the message loop. > > > > This is something the patch set 1 is doing. > > > > I'd like to hear your thoughts. > > The stack you posted above is a task that *is* touching blink objects, so this > proposal is definitely not going to work. It will work in practice because the task is not touching *Oilpaned* Blink objects (the problem happens only if Oilpaned Blink objects are touched after the main thread is detached from Oilpan's GC), but I agree that's a fragile assumption. Do you think that removing all the tasks from the RenderProcessImpl's destructor is a right way to go?
On 2014/09/24 02:51:16, haraken wrote: > It will work in practice because the task is not touching *Oilpaned* Blink > objects (the problem happens only if Oilpaned Blink objects are touched after > the main thread is detached from Oilpan's GC), but I agree that's a fragile > assumption. > > Do you think that removing all the tasks from the RenderProcessImpl's destructor > is a right way to go? Does oilpan blow up before the blink::shutdown() call? I don't understand what you are trying to do here. It should be valid to call in to blink at any time before the blink::shutdown() call, which is much later than the code you're patching here, and not safe to call into blink at all after blink::shutdown().
On 2014/09/24 01:24:04, haraken wrote: > On 2014/09/22 16:40:33, jamesr wrote: > > On 2014/09/22 15:54:18, haraken wrote: > > > Thanks James for review! > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > File content/renderer/renderer_main.cc (right): > > > > > > > > > https://codereview.chromium.org/583043005/diff/1/content/renderer/renderer_ma... > > > content/renderer/renderer_main.cc:238: > main_message_loop.DeletePendingTasks(); > > > On 2014/09/22 14:38:04, jamesr wrote: > > > > since you're deleting *all* pending tasks and not just Blink ones then you > > > > definitely do not care about anything else running, so not lgtm. You > should > > > > simply shut down the loop before shutting down blink > > > > > > I'm not familiar with a message loop, but what's a common way to completely > > shut > > > down the main_message_loop here? > > > > > > At first I tried something like: > > > > > > scoped_ptr<base::MessageLoop> main_message_loop(new base::MessageLoop()); > // > > > line 155 > > > { > > > ...; > > > main_message_loop.reset(); // line 238 > > > } > > > > > > but it crashed chrome :-/ > > > > > > What is the crash stack? If I run code like this standalone it works fine. > > > > If somebody is trying to access the message loop after you are destroying it > > here to post a task then this patch won't fix the bug anyway, since this newly > > posted task will still be destroyed after blink is shut down. You'll need to > > figure out who is accessing the message loop when and why. > > I uploaded the change to a patch set 2. This crashes with the following call > stack. > > The problem is that RenderProcessImpl's destructor posts a task to the message > loop and thus the message loop needs to be kept alive in the RenderProcessImpl's > destructor. Thus we cannot destruct the message loop before destructing the > RenderProcessImpl's destructor. > > That being said, this is a weird situation. Probably should we restructure the > code so that the RenderProcessImpl's destructor doesn't post any task to the > message loop? Any advice is welcome :) > > > SUMMARY: AddressSanitizer: SEGV ??:0 ?? > ==9==ABORTING > [36:36:0924/101619:FATAL:thread_task_runner_handle.cc(23)] Check failed: > current. > #0 0x7f2961f1ea5f __interceptor_backtrace > #1 0x7f296bc49384 base::debug::StackTrace::StackTrace() > #2 0x7f296c09c6f2 logging::LogMessage::~LogMessage() > #3 0x7f296c70bfab base::ThreadTaskRunnerHandle::Get() > #4 0x7f296c82a847 base::Timer::PostNewScheduledTask() > #5 0x7f296c829702 base::Timer::Reset() > #6 0x7f296c828b1a base::Timer::Start() > #7 0x7f29a423b0ab base::BaseTimerMethodPointer<>::Start() > #8 0x7f29a421d1b6 content::BlinkPlatformImpl::setSharedTimerFireInterval() > #9 0x7f297cc234df blink::Scheduler::setSharedTimerFireInterval() > #10 0x7f297cc3120f blink::setSharedTimerFireInterval() > #11 0x7f297cbdbdd4 blink::MainThreadSharedTimer::setFireInterval() > #12 0x7f297cbd7e1a blink::ThreadTimers::updateSharedTimer() > #13 0x7f297cb1e668 blink::TimerBase::setNextFireTime() > #14 0x7f297cb1dbac blink::TimerBase::start() > #15 0x7f297c8215ca blink::TimerBase::startOneShot() > #16 0x7f2981e09984 blink::FrameLoader::scheduleCheckCompleted() > #17 0x7f2981e08bf3 blink::FrameLoader::setDefersLoading() > #18 0x7f298201e100 blink::Page::setDefersLoading() > #19 0x7f29820ba8ea blink::ScopedPageLoadDeferrer::detach() > #20 0x7f29820bacbf blink::ScopedPageLoadDeferrer::~ScopedPageLoadDeferrer() > #21 0x7f297e495263 blink::WebView::didExitModalLoop() > #22 0x7f29a4cd7661 content::RenderThreadImpl::Shutdown() > #23 0x7f29a4cd7f51 content::RenderThreadImpl::Shutdown() > #24 0x7f29a45cf571 content::ChildProcess::~ChildProcess() > #25 0x7f29a4cb34ca content::RenderProcess::~RenderProcess() > #26 0x7f29a4cb2ab4 content::RenderProcessImpl::~RenderProcessImpl() What if you just keep the loop around until ~RenderProcessImpl() completes? I.e. scope the RenderProcessImpl more tightly than the message loop. > #27 0x7f29a4ff8105 content::RendererMain() > #28 0x7f296baafb93 content::RunZygote() > #29 0x7f296bab14ff content::RunNamedProcessTypeMain() > #30 0x7f296bac16e0 content::ContentMainRunnerImpl::Run() > #31 0x7f296baac995 content::ContentMain() > #32 0x7f2961f7ceba ChromeMain > #33 0x7f2961f7cabe main > #34 0x7f29557ba76d __libc_start_main > #35 0x7f2961f7c7ad <unknown>
I'm not sure I'm fully following your proposal... but in general, it would be better to not have the message loop destruction sequence tied so tightly to a single specific user (RenderProcessImpl). It is plausible that the message loop destruction should be "staged," but it would IMO be bad to cater too tightly to a single user, as there may be other users with different restrictions, and then it would be hard to untangle the semantics for each user. Much better would be to fashion users to either be able to to deal with task processing ending, or handle destruction without running trailing tasks. It may be valuable to think about some standard staging of the message loop shutdown, and a lot of OSs have such systems that we can look to for reasonable models. We'd need to have an API where interested users could get notifications... a way to indicate that they are OK with moving on to the "next" stage. <sigh>... we'd have to think through avoiding deadlock... and probably have *some* timers to force things when there was no agreement from users... but it could be done. </sigh> Examples of stages might be: a) Going down RSN (posted time, unless all observers say OK?) b) Future task posts won't be run, but will be destroyed. c) Future task posts won't be run or destroyed (they will leak). d) Message loop is destroyed, and continued use (posting) will potentially crash. There *might* be some finer granularity... but the above might work for most uses. Most users should try to avoid involvement with the above. Risks of deadlocks, and confusion, are IMO too great... so you need a really good reason to go there. On Tue, Sep 23, 2014 at 7:09 PM, <haraken@chromium.org> wrote: > On 2014/09/24 01:24:04, haraken wrote: > >> On 2014/09/22 16:40:33, jamesr wrote: >> > On 2014/09/22 15:54:18, haraken wrote: >> > > Thanks James for review! >> > > >> > > >> > >> > > https://codereview.chromium.org/583043005/diff/1/content/ > renderer/renderer_main.cc > >> > > File content/renderer/renderer_main.cc (right): >> > > >> > > >> > >> > > https://codereview.chromium.org/583043005/diff/1/content/ > renderer/renderer_main.cc#newcode238 > >> > > content/renderer/renderer_main.cc:238: >> main_message_loop.DeletePendingTasks(); >> > > On 2014/09/22 14:38:04, jamesr wrote: >> > > > since you're deleting *all* pending tasks and not just Blink ones >> then >> > you > >> > > > definitely do not care about anything else running, so not lgtm. >> You >> should >> > > > simply shut down the loop before shutting down blink >> > > >> > > I'm not familiar with a message loop, but what's a common way to >> > completely > >> > shut >> > > down the main_message_loop here? >> > > >> > > At first I tried something like: >> > > >> > > scoped_ptr<base::MessageLoop> main_message_loop(new >> base::MessageLoop()); >> // >> > > line 155 >> > > { >> > > ...; >> > > main_message_loop.reset(); // line 238 >> > > } >> > > >> > > but it crashed chrome :-/ >> > >> > >> > What is the crash stack? If I run code like this standalone it works >> fine. >> > >> > If somebody is trying to access the message loop after you are >> destroying it >> > here to post a task then this patch won't fix the bug anyway, since this >> > newly > >> > posted task will still be destroyed after blink is shut down. You'll >> need >> > to > >> > figure out who is accessing the message loop when and why. >> > > I uploaded the change to a patch set 2. This crashes with the following >> call >> stack. >> > > The problem is that RenderProcessImpl's destructor posts a task to the >> message >> loop and thus the message loop needs to be kept alive in the >> > RenderProcessImpl's > >> destructor. Thus we cannot destruct the message loop before destructing >> the >> RenderProcessImpl's destructor. >> > > That being said, this is a weird situation. Probably should we >> restructure the >> code so that the RenderProcessImpl's destructor doesn't post any task to >> the >> message loop? Any advice is welcome :) >> > > At a first glance, it seems not that easy to make the RenderProcessImpl's > destructor not post any task to the message loop. Probably what we can do > is: > > 1) Flush all pending tasks (which may touch Blink objects) in the message > loop. > 2) Destruct the RenderProcessImpl's destructor. > 3) Allow the RenderProcessImpl's destructor to post a task to the message > loop > but the task shouldn't touch Blink objects of course. > 4) Destruct the message loop. > > This is something the patch set 1 is doing. > > I'd like to hear your thoughts. > > > https://codereview.chromium.org/583043005/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Looking more closely you need to shut the loop down at a specific point in RenderThreadImpl shutdown, which'll mean spreading this code out a bit. In particular, you need to run everything up to but not including the blink::shutdown() call with the loop active, then destroy the loop, then call blink::shutdown(). I think anything trying to access the message loop after the blink::shutdown() call is probably bogus and we should find + fix it. This thread exists for blink pretty much, anything else that happens is in service of blink.
On 2014/09/24 03:40:09, jamesr wrote: > Looking more closely you need to shut the loop down at a specific point in > RenderThreadImpl shutdown, which'll mean spreading this code out a bit. In > particular, you need to run everything up to but not including the > blink::shutdown() call with the loop active, then destroy the loop, then call > blink::shutdown(). I think anything trying to access the message loop after the > blink::shutdown() call is probably bogus and we should find + fix it. This > thread exists for blink pretty much, anything else that happens is in service of > blink. Thanks James. Is your proposal something like this? int RendererMain() { scoped_ptr<base::MessageLoop> main_message_loop(new base::MessageLoop()); ...; { RenderProcessImpl render_process(main_message_loop.Pass()); ...; } } class RenderProcessImpl { RenderProcessImpl(scoped_ptr<base::MessageLoop> message_loop) : message_loop_(message_loop) { } ~RenderProcessImpl() { ...; message_loop_.reset(); ...; shutdown_blink(); } scoped_ptr<base::MessageLoop> message_loop_; };
Worth a try!
I tried to completely shut down (i.e., destruct) the message loop just before RenderThreadImpl::Shutdown() calls blink::shutdown(). However, this caused a crash in blink::shutdown() because blink::shutdown() accesses the (already destructed) message loop in order to unregister a task observer from the message loop (See shutdown() in WebKit.cpp). There are two ways to address the issue: a) Remove the unregistration code from the blink::shutdown() so that blink::shutdown() never touches the message loop. b) Instead of having RenderThreadImpl::Shutdown() completely shut down the message loop, just flush all pending tasks of the message loop before RenderThreadImpl::Shutdown() calls blink::shutdown(). I think that b) is more modest and the patch set 3 takes the approach. What do you think?
On 2014/09/24 04:53:29, haraken wrote: > I tried to completely shut down (i.e., destruct) the message loop just before > RenderThreadImpl::Shutdown() calls blink::shutdown(). However, this caused a > crash in blink::shutdown() because blink::shutdown() accesses the (already > destructed) message loop in order to unregister a task observer from the message > loop (See shutdown() in WebKit.cpp). > > There are two ways to address the issue: > > a) Remove the unregistration code from the blink::shutdown() so that > blink::shutdown() never touches the message loop. > > b) Instead of having RenderThreadImpl::Shutdown() completely shut down the > message loop, just flush all pending tasks of the message loop before > RenderThreadImpl::Shutdown() calls blink::shutdown(). > > I think that b) is more modest and the patch set 3 takes the approach. What do > you think? s/patch set 3/patch set 4/
This is still risky because if blink::shutdown() calls any code that creates a task or puts anything else on the message loop then those tasks may run or be destroyed after blink::shutdown() returns. The only way to be safe and make sure nothing calls in to blink after blink::shutdown() is to already have the message loop destroyed before blink::shutdown(), i.e. do option 1. There's no point in doing cleanup if the loop is already gone. https://codereview.chromium.org/583043005/diff/60001/base/message_loop/messag... File base/message_loop/message_loop.h (right): https://codereview.chromium.org/583043005/diff/60001/base/message_loop/messag... base/message_loop/message_loop.h:374: // Flushes all pending tasks in this message loop. i really think if we need to patch base/ for this then we're doing something wrong
On 2014/09/24 04:57:26, jamesr wrote: > This is still risky because if blink::shutdown() calls any code that creates a > task or puts anything else on the message loop then those tasks may run or be > destroyed after blink::shutdown() returns. The only way to be safe and make > sure nothing calls in to blink after blink::shutdown() is to already have the > message loop destroyed before blink::shutdown(), i.e. do option 1. There's no > point in doing cleanup if the loop is already gone. Implemented the option 1 in the patch set 5. This needs a Blink-side CL (https://codereview.chromium.org/598903002). The patch set 5 introduces MessageLoop::Shutdown() so that RenderThreadImpl::Shutdown() can shut down the message loop just before calling blink::shutdown(). If we really don't want to modify src/base/ at all, I can pass the message_loop_ around further down to the RenderThreadImpl (instead of introducing MessageLoop::Shutdown()), but it will require a more intrusive change.
Adding MessageLoop::Shutdown() introduces a new awkward state for base::MessageLoop where it's still around but is "dead". We shouldn't do that.
On 2014/09/24 05:46:10, jamesr wrote: > Adding MessageLoop::Shutdown() introduces a new awkward state for > base::MessageLoop where it's still around but is "dead". We shouldn't do that. Done. I passed a scoped_ptr<base::MessageLoop> to RenderThreadImpl so that RenderThreadImpl can destruct the message loop before calling blink::shutdown(). Would you take a look at the patch set 7?
jamesr@: friendly ping :)
The lack of trybots makes me nervous, but the code change lgtm as far as I can tell. It'd be superduper awesome if you could cook up a testcase.
On 2014/09/25 06:03:21, jamesr wrote: > The lack of trybots makes me nervous, but the code change lgtm as far as I can > tell. It'd be superduper awesome if you could cook up a testcase. Thanks! Unfortunately I cannot reproduce the crash reported in the bug though... I need to land a Blink-side fix (https://codereview.chromium.org/598903002/) before landing this CL.
On 2014/09/25 06:24:53, haraken wrote: > On 2014/09/25 06:03:21, jamesr wrote: > > The lack of trybots makes me nervous, but the code change lgtm as far as I can > > tell. It'd be superduper awesome if you could cook up a testcase. > > Thanks! Unfortunately I cannot reproduce the crash reported in the bug though... > > I need to land a Blink-side fix (https://codereview.chromium.org/598903002/) > before landing this CL. The Blink-side CL landed, so let me land this one.
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/583043005/120001
Message was sent while issue was closed.
Committed patchset #7 (id:120001) as 1fb8091790b5065fa77956fbdc2f5ce5b539494b
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/fdd5612c20f777e1279efd7c1e99d82ed04afaaf Cr-Commit-Position: refs/heads/master@{#296697}
Message was sent while issue was closed.
A revert of this CL (patchset #7 id:120001) has been created in https://codereview.chromium.org/608043002/ by reveman@chromium.org. The reason for reverting is: Speculative revert. Likely cause of crbug.com/418206.
Message was sent while issue was closed.
On 2014/09/26 20:24:59, reveman wrote: > A revert of this CL (patchset #7 id:120001) has been created in > https://codereview.chromium.org/608043002/ by mailto:reveman@chromium.org. > > The reason for reverting is: Speculative revert. Likely cause of > crbug.com/418206. This CL needs to be reverted. Just confirmed, this caused a lot of crashes in Linux ASAN build. To reproduce, build blink_tests with ASAN, then: third_party/WebKit/Tools/Scripts/run-webkit-tests --no-show-results --no-new-test-results --full-results-html --clobber-old-results --exit-after-n-failures 5000 --exit-after-n-crashes-or-timeouts 100 --debug-rwt-logging --target Release --builder-name "WebKit Linux ASAN" --build-number 15000 --master-name ChromiumWebkit --build-name WebKit_Linux_ASAN --time-out-ms 48000 --additional-expectations=third_party/WebKit/LayoutTests/ASANExpectations --enable-sanitizer virtual/implsidepainting/inspector/timeline
jamesr@: PTAL again. The previous patch caused crashes in ASAN bots because GpuChannelHost is destructed in RenderThreadImpl's destructor and the GpuChannelHost's destructor touches the message loop. (See https://code.google.com/p/chromium/issues/detail?id=418114 for more details.) To avoid the issue, I added 'gpu_channel_ = NULL;' before the RenderThreadImpl destructs the message loop.
https://codereview.chromium.org/583043005/diff/140001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/583043005/diff/140001/content/renderer/render... content/renderer/render_thread_impl.cc:681: NPChannelBase::CleanupChannels(); could this cause problems? it appears to call into blink again: https://code.google.com/p/chromium/codesearch#chromium/src/content/plugin/plu...
jamesr@: Thanks for review. PTAL. https://codereview.chromium.org/583043005/diff/140001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/583043005/diff/140001/content/renderer/render... content/renderer/render_thread_impl.cc:681: NPChannelBase::CleanupChannels(); On 2014/09/29 19:03:56, jamesr wrote: > could this cause problems? it appears to call into blink again: > > https://code.google.com/p/chromium/codesearch#chromium/src/content/plugin/plu... Moved this line up to the main_message_loop_.reset(). (Though I couldn't find any real issue caused by calling CleanupChannels() after destructing the message loop.)
OK. I wasn't totally sure about that either, but this seems safer
On 2014/09/30 01:09:54, jamesr wrote: > OK. I wasn't totally sure about that either, but this seems safer Thanks, let me land the CL.
lgtm
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/583043005/160001
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as de8c98a7bfd02b8022da6577d3e4112b15c34ba2
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/16d32a9f7f6d1ebb639cacedb5156272a9fec764 Cr-Commit-Position: refs/heads/master@{#297338}
Message was sent while issue was closed.
A revert of this CL (patchset #9 id:160001) has been created in https://codereview.chromium.org/614233002/ by kareng@google.com. The reason for reverting is: BUG=418206 broke canary AGAIN! please do NOT reland without fixing! https://code.google.com/p/chromium/issues/detail?id=418206.
jamesr@: Please take another look. kareng@: Sorry about the repeated breakage. Actually it's hard to make sure that this CL doesn't cause any crash until hitting actual crash reports from the wild... All tests are passing. I think the latest CL fixes the crash reported in https://code.google.com/p/chromium/issues/detail?id=418206#c9, but honestly speaking I'm not 100% sure that the latest CL won't cause any crash. I won't land this CL without your approval. https://codereview.chromium.org/583043005/diff/180001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/583043005/diff/180001/content/renderer/render... content/renderer/render_thread_impl.cc:668: gpu_channel_->DestroyChannel(); Just doing 'gpu_channel_ = NULL' was not enough, because it just clears a scoped_refptr from the RenderThreadImpl object. Another scoped_refptr is held by ContextProviderCommandBuffer() (and probably by other places as well) and thus 'gpu_channel_ = NULL' does not cause the destruction of GPUChannelHost (the GPUChannelHost's destructor touches the message loop). To prevent the GPUChannelHost's destructor from touching the message loop, I added GPUChannelHost::DestroyChannel and explicitly called it here. The GPUChannelHost::DestroyChannel flushes the message loop so that the GPUChannelHost's destructor doesn't need to do anything.
Let me reland this CL.
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/583043005/240001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
haraken@chromium.org changed reviewers: + kbr@chromium.org
kbr@: Would you take a look at the change in content/common/gpu/ ?
content/common/gpu changes LGTM.
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/583043005/240001
Message was sent while issue was closed.
Committed patchset #13 (id:240001)
Message was sent while issue was closed.
Patchset 13 (id:??) landed as https://crrev.com/53f081de05b86f73eca4e383a16c8dc723b78a99 Cr-Commit-Position: refs/heads/master@{#303557} |