Issue 9837053: Added code for renderer to cleanly exit.

Issue 9837053: Added code for renderer to cleanly exit. (Closed)

Created:
8 years, 9 months ago by asharif1

Modified:
8 years, 8 months ago

Reviewers:
agl, Evan Martin, Markus (顧孟勤)

CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, jam, bjanakiraman

Base URL:
http://git.chromium.org/chromium/src.git@master

Visibility:
Public.

More Reviews

Description

This is only executed when Chrome is built with -DRENDERER_CLEAN_EXIT. Before this CL, if you kill Chrome's main process, the renderer would not exit cleanly. Because of this, when running Chrome under Profile-Guided-Optimization mode, we would not get the profile for the renderer process. That results in only minor speed gains when using PGO (<5% improvement). When the profile of the renderer process is captured and fed back, we get up to 27% performance improvement in page cyclers on ChromeOS. TEST=Rebuilt Chrome with -DRENDERER_CLEAN_EXIT. Ran on ChromeOS. Attached gdb to the renderer. Set a breakpoint on exit(). Sent SIGTERM to parent Chrome process. Saw the breakpoint get hit. BUG=107584

Patch Set 1 #

Patch Set 2 : git try #

Total comments: 4

Patch Set 3 : Tested this patch on ChromeOS. Working. #

Patch Set 4 : Added a more descriptive message. #

Total comments: 15

Patch Set 5 : Addressed some comments. #

Created: 8 years, 9 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+50 lines, -0 lines)			Patch
M	base/process_util.h	View	1 2 3 4	2 chunks	+9 lines, -0 lines	0 comments	Download
M	base/process_util.cc	View	1 2 3 4	1 chunk	+15 lines, -0 lines	0 comments	Download
M	chrome/renderer/chrome_render_process_observer.cc	View	1 2	1 chunk	+4 lines, -0 lines	0 comments	Download
M	content/renderer/renderer_main.cc	View	1 2 3 4	2 chunks	+22 lines, -0 lines	0 comments	Download

Messages

Total messages: 10 (0 generated)

Expand Messages | Collapse Messages

Evan Martin

markus or agl should review, here are my comments http://codereview.chromium.org/9837053/diff/3/content/renderer/renderer_main.cc File content/renderer/renderer_main.cc (right): http://codereview.chromium.org/9837053/diff/3/content/renderer/renderer_main.cc#newcode131 content/renderer/renderer_main.cc:131: ...

8 years, 9 months ago (2012-03-26 17:17:25 UTC) #2

agl

In additional to Evan's comments, the CL description needs to mention why we want this. ...

8 years, 9 months ago (2012-03-26 17:24:32 UTC) #3

asharif1

On 2012/03/26 17:24:32, agl wrote: > In additional to Evan's comments, the CL description needs ...

8 years, 9 months ago (2012-03-26 17:37:41 UTC) #4

asharif1

On 2012/03/26 17:37:41, asharif1 wrote: > On 2012/03/26 17:24:32, agl wrote: > > In additional ...

8 years, 9 months ago (2012-03-28 20:48:16 UTC) #5

Markus (顧孟勤)

I think, this still needs a little bit of work. And it probably also needs ...

8 years, 9 months ago (2012-03-28 23:07:56 UTC) #6

asharif1

http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_render_process_observer.cc File chrome/renderer/chrome_render_process_observer.cc (right): http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_render_process_observer.cc#newcode168 chrome/renderer/chrome_render_process_observer.cc:168: _exit(0); On 2012/03/28 23:07:56, Markus (顧孟勤) wrote: > _exit() ...

8 years, 9 months ago (2012-03-29 02:13:11 UTC) #7

http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_rend...
File chrome/renderer/chrome_render_process_observer.cc (right):

http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_rend...
chrome/renderer/chrome_render_process_observer.cc:168: _exit(0);
On 2012/03/28 23:07:56, Markus (顧孟勤) wrote:
> _exit() is a great function to call in case of unexpected behavior. Unlike
> exit(), it is async-signal-safe. And it also doesn't have any problem with
> multiple threads trying to exit at the same time.
> 
> On the other hand, your code calls exit(). That's probably not what you want.
> Destructors cannot reliably execute from signal context.
> 
> If you want to call exit(), you pretty much have to do so from the main thread
> after exiting its mainloop. If that's not possible, call _exit() instead.

In this case, I do want to call exit().

If I change this to _exit(), what happens is the following:

One thread receives enters this function and calls _exit().
Around the same time, the main process sends a SIGTERM to the renderer and that
calls exit().

Sometimes the exit handlers are called and other times they are not. I want to
guarantee that they are called.

http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m...
File content/renderer/renderer_main.cc (right):

http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m...
content/renderer/renderer_main.cc:133: fprintf(stderr, "Calling
clean_exit().\n");
On 2012/03/28 23:07:56, Markus (顧孟勤) wrote:
> You cannot call fprintf() from a signal handler.

Done.

http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m...
content/renderer/renderer_main.cc:139: struct sigaction term_action;
On 2012/03/28 23:07:56, Markus (顧孟勤) wrote:
> It is good practice to use memset() to clear out any data structure of type
> "struct sigaction". If you do that, you also don't have to explicitly set
> sa_flags.

Done.

http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m...
content/renderer/renderer_main.cc:144: fprintf(stderr, "Installing SIGTERM
handler to the renderer process.\n");
On 2012/03/28 23:07:56, Markus (顧孟勤) wrote:
> Why do you need to print this message? We typically don't print messages that
> just inform about the normal execution of the program.

Done.

http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m...
content/renderer/renderer_main.cc:145: return_value = sigaction(SIGTERM,
&term_action, NULL);
On 2012/03/28 23:07:56, Markus (顧孟勤) wrote:
> If sigaction() failed, you should trigger a CHECK() failure. Something is
> clearly very wrong. If it didn't fail, then no need to print anything.

Done.

http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m...
content/renderer/renderer_main.cc:261: AddSigtermHandler();
On 2012/03/28 23:07:56, Markus (顧孟勤) wrote:
> Can you say a little more about the rationale for why we are adding this
> handler?
> 
> The default behavior for this SIGTERM is to terminate the program. You can
check
> this by reading "man 7 signal". The only thing that your signal handler does
> different is printing a message and (unreliably) running at_exit() handlers.
> 
> Is that what you need to do? If so, we should probably do things correctly. In
> other words, we would need to find a way that the signal handler can safely
> inform the main loop that it should exit.
> 
> And if we do that, then we probably should do so unconditionally. So, no need
> for RENDERER_CLEAN_EXIT.

Read the CL description.

My goal is to run exit handlers upon exit. That's it.

I experimented with just adding a signal handler here and calling exit() and
just calling exit() instead of _exit in the other file. Both need to be in place
for it to work.

Since you're more familiar with the codebase, what do you suggest I do to make
it cleanly exit instead of the current situation where both an uncaught SIGTERM
and _exit() call try to terminate the process around the same time.

Markus (顧孟勤)

On Wed, Mar 28, 2012 at 19:13, <asharif@chromium.org> wrote: > One thread receives enters this ...

8 years, 9 months ago (2012-03-29 04:48:07 UTC) #8

On Wed, Mar 28, 2012 at 19:13, <asharif@chromium.org> wrote:

> One thread receives enters this function and calls _exit().
> Around the same time, the main process sends a SIGTERM to the renderer
> and that calls exit().
>
> Sometimes the exit handlers are called and other times they are not. I
> want to guarantee that they are called.

You are actually describing one of the many possible problems with signals.
If you need to run arbitrary code, don't use signal handlers. And exit
handlers are exactly that: arbitrary code.

Signal handlers are very difficult to write correctly, and they are very
limited in what they can do. They are almost always the wrong answer.

My goal is to run exit handlers upon exit. That's it.
>

If you exit the renderer normally, don't the exit handlers already run?
RendererMain() certainly looks as if it intends to exit and uninitialize at
some point.

I experimented with just adding a signal handler here and calling exit()
> and just calling exit() instead of _exit in the other file. Both need to
> be in place for it to work.
>

That sounds wrong. In the presence of threads, it is generally not safe to
call exit(); that's why _exit() was invented.

You almost always will have to either a) shut down all threads and then
call exit() from the remaining main thread, or b) call _exit() and forgo
the use of exit handlers. This is pretty much the same problem that you
would encounter with signal handlers; for the purposes of this discussion,
you can think of a signal handler as something very similar to a new
short-lived thread.

Doing anything else is guaranteed to result in deadlocks and data
corruption. Depending on what the other threads are doing when you are
shutting down the program, you might get lucky. You might even get lucky
most of the time. But this is not a bug we want to check into Chrome. I'd
hate to be the one who has to debug this problem.

Since you're more familiar with the codebase, what do you suggest I do
> to make it cleanly exit instead of the current situation where both an
> uncaught SIGTERM and _exit() call try to terminate the process around
> the same time.
>

Both uncaught SIGTERM and _exit() actually do the right thing. They shut
the program (i.e. all threads) down, without running any exit handlers.
That's a safe operation -- but apparently not what you need...

So, I think, I need to understand a little more about the actual problem
that you want to solve. Why is this suddenly becoming a problem? Do we
normally send SIGTERM to the renderers? Shouldn't we ask them to shut down
normally (presumably by either sending them an IPC, or by closing the IPC
channel, which implicitly notifies the other end of the channel)? In fact,
the kernel ensures that closing the IPC channel happens automatically, if
the browser process terminated without doing anything special.

Markus

asharif1

Okay, a little background since we're going back and forth on this. ChromeOS wants to ...

8 years, 9 months ago (2012-03-29 05:09:49 UTC) #9

Okay, a little background since we're going back and forth on this.

ChromeOS wants to use Profile-Guided Optimization (PGO) with Chrome. It
provides up to 27% performance improvement on PageCyclers.

How does PGO work? First, a binary is built with instrumentation which
records edge information in memory using counters. At exit time, the binary
dumps all edge counts with function ids, etc. to a file (it installs a
handler using atexit() and exit() calls that handler). The code to register
the handler and dump the profile is generated by gcc when
-fprofile-generate is used.

Chrome is a multiprocess app as we all know and when it is terminated, the
main process writes its counters to the file properly. The renderer
currently does not. Without the renderer's profile we do not get the
performance improvement we desire.

I debugged a bit as to how the renderer shuts down and gathered the
following information: one of the threads receives a SIGTERM from the main
process. One of the other threads just calls _exit(). These two events
happen around the same time. The SIGTERM is uncaught and terminates the
process. Even if I change the _exit() to exit(), the SIGTERM terminates the
process before the profile is written to the file. If I add a SIGTERM
handler to call exit(), the other thread calls _exit() before the profile
is written. So both places have to change in order to cleanly call the exit
handlers (one of which writes the profile).

On Wed, Mar 28, 2012 at 9:47 PM, Markus Gutschke <markus@chromium.org>wrote:

> On Wed, Mar 28, 2012 at 19:13, <asharif@chromium.org> wrote:
>
>> One thread receives enters this function and calls _exit().
>> Around the same time, the main process sends a SIGTERM to the renderer
>> and that calls exit().
>>
>> Sometimes the exit handlers are called and other times they are not. I
>> want to guarantee that they are called.
>
>
> You are actually describing one of the many possible problems with
> signals. If you need to run arbitrary code, don't use signal handlers. And
> exit handlers are exactly that: arbitrary code.
>
> Signal handlers are very difficult to write correctly, and they are very
> limited in what they can do. They are almost always the wrong answer.
>
> My goal is to run exit handlers upon exit. That's it.
>>
>
> If you exit the renderer normally, don't the exit handlers already run?
>

No they do not currently. I filed a bug against Chrome
http://code.google.com/p/chromium/issues/detail?id=107584&q=renderer%20clean%....
It's very easy to reproduce. Run Chrome, attach gdb to the renderer and put
a breakpoint on exit() or an exit handler (in my case gcov_exit()). It will
not be hit when you close Chrome or send a SIGTERM to the main Chrome
process.

> RendererMain() certainly looks as if it intends to exit and uninitialize
> at some point.
>
> I experimented with just adding a signal handler here and calling exit()
>> and just calling exit() instead of _exit in the other file. Both need to
>> be in place for it to work.
>>
>
> That sounds wrong. In the presence of threads, it is generally not safe to
> call exit(); that's why _exit() was invented.
>

I have verified that my patch actually does work every time I have tested
it (tens of times). But there may be subtle issues that are wrong with it.
Could you provide some test case for me to test it against?

>
> You almost always will have to either a) shut down all threads and then
> call exit() from the remaining main thread, or b) call _exit() and forgo
> the use of exit handlers. This is pretty much the same problem that you
> would encounter with signal handlers; for the purposes of this discussion,
> you can think of a signal handler as something very similar to a new
> short-lived thread.
>
> Doing anything else is guaranteed to result in deadlocks and data
> corruption. Depending on what the other threads are doing when you are
> shutting down the program, you might get lucky. You might even get lucky
> most of the time. But this is not a bug we want to check into Chrome. I'd
> hate to be the one who has to debug this problem.
>

I don't want to check-in something to Chrome which causes issues like
delaying exit, etc. If you look at the comment above the call to _exit() it
provides justification to using _exit(). This is why I added a define which
I will only enable when I am building the instrumented version
(-fprofile-generate in CXXFLAGS).

>
> Since you're more familiar with the codebase, what do you suggest I do
>> to make it cleanly exit instead of the current situation where both an
>> uncaught SIGTERM and _exit() call try to terminate the process around
>> the same time.
>>
>
> Both uncaught SIGTERM and _exit() actually do the right thing. They shut
> the program (i.e. all threads) down, without running any exit handlers.
> That's a safe operation -- but apparently not what you need...
>
> So, I think, I need to understand a little more about the actual problem
> that you want to solve. Why is this suddenly becoming a problem? Do we
> normally send SIGTERM to the renderers? Shouldn't we ask them to shut down
> normally (presumably by either sending them an IPC, or by closing the IPC
> channel, which implicitly notifies the other end of the channel)? In fact,
> the kernel ensures that closing the IPC channel happens automatically, if
> the browser process terminated without doing anything special.
>

Now that you hopefully understand the situation, let's come up with the
design and I can try to author the patch.

Thanks,

>
>
> Markus
>

Markus (顧孟勤)

8 years, 9 months ago (2012-03-29 05:46:39 UTC) #10

We really need input from people who have looked at this code in more
detail (unfortunately, that probably means Evan). But here are some
suggestions that might get you started on how to do this properly.

In the vast majority of cases, the renderer will eventually return to the
main loop. So, if we are interested in running exit handlers (e.g. when
profiling), we should not call _exit() from the
SuicideOnChannelErrorFilter. If we just continue, we will hopefully shut
down properly in a couple of seconds.

Of course, there will be rare cases, when we are in fact hung and don't
exit properly. You can probably address this problem by calling something
like "alarm(30)" in the SuicideOnChannelErrorFilter. This will trigger a
SIGALRM after 30s, and I believe we keep the default behavior, which will
end up terminating the program.

Of course, you now need to double-check that the browser does in fact try
to shut down the renderer properly, instead of sending it a SIGTERM.

If it doesn't do that, we have to figure out how to fix that -- and again,
this might have to be code that is different between the normal case and
the code that runs when profiling.


Markus

Expand Messages | Collapse Messages