|
|
Created:
8 years, 9 months ago by asharif1 Modified:
8 years, 8 months ago CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, jam, bjanakiraman Base URL:
http://git.chromium.org/chromium/src.git@master Visibility:
Public. |
DescriptionThis is only executed when Chrome is built with -DRENDERER_CLEAN_EXIT.
Before this CL, if you kill Chrome's main process, the renderer would not exit
cleanly. Because of this, when running Chrome under Profile-Guided-Optimization
mode, we would not get the profile for the renderer process. That results in
only minor speed gains when using PGO (<5% improvement). When the profile of the
renderer process is captured and fed back, we get up to 27% performance
improvement in page cyclers on ChromeOS.
TEST=Rebuilt Chrome with -DRENDERER_CLEAN_EXIT.
Ran on ChromeOS.
Attached gdb to the renderer. Set a breakpoint on exit().
Sent SIGTERM to parent Chrome process.
Saw the breakpoint get hit.
BUG=107584
Patch Set 1 #Patch Set 2 : git try #
Total comments: 4
Patch Set 3 : Tested this patch on ChromeOS. Working. #Patch Set 4 : Added a more descriptive message. #
Total comments: 15
Patch Set 5 : Addressed some comments. #
Messages
Total messages: 10 (0 generated)
Thoughts on this patch?
markus or agl should review, here are my comments http://codereview.chromium.org/9837053/diff/3/content/renderer/renderer_main.cc File content/renderer/renderer_main.cc (right): http://codereview.chromium.org/9837053/diff/3/content/renderer/renderer_main.... content/renderer/renderer_main.cc:131: { curly should be on same line of function decl (see surrounding code) http://codereview.chromium.org/9837053/diff/3/content/renderer/renderer_main.... content/renderer/renderer_main.cc:132: LOG(INFO) << "SIGTERM caught by the renderer process. Calling exit()."; I think LOG() probably does a bunch of stuff that isn't safe from a signal handler. You might want to just write() to the stderr fd. http://codereview.chromium.org/9837053/diff/3/content/renderer/renderer_main.... content/renderer/renderer_main.cc:138: struct sigaction termAction; variable names should be like term_action http://codereview.chromium.org/9837053/diff/3/content/renderer/renderer_main.... content/renderer/renderer_main.cc:144: LOG(INFO) << "Added SIGTERM handler to the renderer process."; Rather than this, perhaps you should just check the return value of sigaction? grep the code for use of the PLOG() macro to automatically strerror(errno).
In additional to Evan's comments, the CL description needs to mention why we want this. (i.e. in order to cleanly collect profiling data for feedback-directed optimising compilers.)
On 2012/03/26 17:24:32, agl wrote: > In additional to Evan's comments, the CL description needs to mention why we > want this. (i.e. in order to cleanly collect profiling data for > feedback-directed optimising compilers.) Thanks for your feedback and I will be taking that into account when I upload the next patchset. At the moment, I'm still debugging why this is flaky on ChromeOS. I mean when I put a breakpoint on exit(), it sometimes gets called and sometimes not when I click the close button on the Chrome parent window. I'll update here once I make progress.
On 2012/03/26 17:37:41, asharif1 wrote: > On 2012/03/26 17:24:32, agl wrote: > > In additional to Evan's comments, the CL description needs to mention why we > > want this. (i.e. in order to cleanly collect profiling data for > > feedback-directed optimising compilers.) > > Thanks for your feedback and I will be taking that into account when I upload > the next patchset. At the moment, I'm still debugging why this is flaky on > ChromeOS. I mean when I put a breakpoint on exit(), it sometimes gets called and > sometimes not when I click the close button on the Chrome parent window. I'll > update here once I make progress. Please take another look. This time I extended the patch a bit to work under more situations. Now it works even if you kill the main Chrome process or pkill Chrome, or kill just the renderer process using SIGTERM.
I think, this still needs a little bit of work. And it probably also needs a bit more of an explanation why we are doing this. http://codereview.chromium.org/9837053/diff/12003/base/process_util.cc File base/process_util.cc (right): http://codereview.chromium.org/9837053/diff/12003/base/process_util.cc#newcode82 base/process_util.cc:82: extern void clean_exit(int return_value) This function should be marked as __attribute__((noreturn)) http://codereview.chromium.org/9837053/diff/12003/base/process_util.cc#newcode89 base/process_util.cc:89: fprintf(stderr, "Renderer process exiting.\n"); If you plan on calling clean_exit() from a signal handler, then you cannot call fprintf() here. You have to restrict yourself to functions that are async-signal-safe. See http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html for a list of suitable functions. http://codereview.chromium.org/9837053/diff/12003/base/process_util.cc#newcode95 base/process_util.cc:95: fprintf(stderr, "Not calling exit() myself.\n"); I don't think you ever want to return from this function. The calling code is unlikely to be prepared for an exit-type function to return. http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_rend... File chrome/renderer/chrome_render_process_observer.cc (right): http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_rend... chrome/renderer/chrome_render_process_observer.cc:168: _exit(0); _exit() is a great function to call in case of unexpected behavior. Unlike exit(), it is async-signal-safe. And it also doesn't have any problem with multiple threads trying to exit at the same time. On the other hand, your code calls exit(). That's probably not what you want. Destructors cannot reliably execute from signal context. If you want to call exit(), you pretty much have to do so from the main thread after exiting its mainloop. If that's not possible, call _exit() instead. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... File content/renderer/renderer_main.cc (right): http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:133: fprintf(stderr, "Calling clean_exit().\n"); You cannot call fprintf() from a signal handler. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:139: struct sigaction term_action; It is good practice to use memset() to clear out any data structure of type "struct sigaction". If you do that, you also don't have to explicitly set sa_flags. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:144: fprintf(stderr, "Installing SIGTERM handler to the renderer process.\n"); Why do you need to print this message? We typically don't print messages that just inform about the normal execution of the program. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:145: return_value = sigaction(SIGTERM, &term_action, NULL); If sigaction() failed, you should trigger a CHECK() failure. Something is clearly very wrong. If it didn't fail, then no need to print anything. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:261: AddSigtermHandler(); Can you say a little more about the rationale for why we are adding this handler? The default behavior for this SIGTERM is to terminate the program. You can check this by reading "man 7 signal". The only thing that your signal handler does different is printing a message and (unreliably) running at_exit() handlers. Is that what you need to do? If so, we should probably do things correctly. In other words, we would need to find a way that the signal handler can safely inform the main loop that it should exit. And if we do that, then we probably should do so unconditionally. So, no need for RENDERER_CLEAN_EXIT.
http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_rend... File chrome/renderer/chrome_render_process_observer.cc (right): http://codereview.chromium.org/9837053/diff/12003/chrome/renderer/chrome_rend... chrome/renderer/chrome_render_process_observer.cc:168: _exit(0); On 2012/03/28 23:07:56, Markus (顧孟勤) wrote: > _exit() is a great function to call in case of unexpected behavior. Unlike > exit(), it is async-signal-safe. And it also doesn't have any problem with > multiple threads trying to exit at the same time. > > On the other hand, your code calls exit(). That's probably not what you want. > Destructors cannot reliably execute from signal context. > > If you want to call exit(), you pretty much have to do so from the main thread > after exiting its mainloop. If that's not possible, call _exit() instead. In this case, I do want to call exit(). If I change this to _exit(), what happens is the following: One thread receives enters this function and calls _exit(). Around the same time, the main process sends a SIGTERM to the renderer and that calls exit(). Sometimes the exit handlers are called and other times they are not. I want to guarantee that they are called. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... File content/renderer/renderer_main.cc (right): http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:133: fprintf(stderr, "Calling clean_exit().\n"); On 2012/03/28 23:07:56, Markus (顧孟勤) wrote: > You cannot call fprintf() from a signal handler. Done. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:139: struct sigaction term_action; On 2012/03/28 23:07:56, Markus (顧孟勤) wrote: > It is good practice to use memset() to clear out any data structure of type > "struct sigaction". If you do that, you also don't have to explicitly set > sa_flags. Done. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:144: fprintf(stderr, "Installing SIGTERM handler to the renderer process.\n"); On 2012/03/28 23:07:56, Markus (顧孟勤) wrote: > Why do you need to print this message? We typically don't print messages that > just inform about the normal execution of the program. Done. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:145: return_value = sigaction(SIGTERM, &term_action, NULL); On 2012/03/28 23:07:56, Markus (顧孟勤) wrote: > If sigaction() failed, you should trigger a CHECK() failure. Something is > clearly very wrong. If it didn't fail, then no need to print anything. Done. http://codereview.chromium.org/9837053/diff/12003/content/renderer/renderer_m... content/renderer/renderer_main.cc:261: AddSigtermHandler(); On 2012/03/28 23:07:56, Markus (顧孟勤) wrote: > Can you say a little more about the rationale for why we are adding this > handler? > > The default behavior for this SIGTERM is to terminate the program. You can check > this by reading "man 7 signal". The only thing that your signal handler does > different is printing a message and (unreliably) running at_exit() handlers. > > Is that what you need to do? If so, we should probably do things correctly. In > other words, we would need to find a way that the signal handler can safely > inform the main loop that it should exit. > > And if we do that, then we probably should do so unconditionally. So, no need > for RENDERER_CLEAN_EXIT. Read the CL description. My goal is to run exit handlers upon exit. That's it. I experimented with just adding a signal handler here and calling exit() and just calling exit() instead of _exit in the other file. Both need to be in place for it to work. Since you're more familiar with the codebase, what do you suggest I do to make it cleanly exit instead of the current situation where both an uncaught SIGTERM and _exit() call try to terminate the process around the same time.
On Wed, Mar 28, 2012 at 19:13, <asharif@chromium.org> wrote: > One thread receives enters this function and calls _exit(). > Around the same time, the main process sends a SIGTERM to the renderer > and that calls exit(). > > Sometimes the exit handlers are called and other times they are not. I > want to guarantee that they are called. You are actually describing one of the many possible problems with signals. If you need to run arbitrary code, don't use signal handlers. And exit handlers are exactly that: arbitrary code. Signal handlers are very difficult to write correctly, and they are very limited in what they can do. They are almost always the wrong answer. My goal is to run exit handlers upon exit. That's it. > If you exit the renderer normally, don't the exit handlers already run? RendererMain() certainly looks as if it intends to exit and uninitialize at some point. I experimented with just adding a signal handler here and calling exit() > and just calling exit() instead of _exit in the other file. Both need to > be in place for it to work. > That sounds wrong. In the presence of threads, it is generally not safe to call exit(); that's why _exit() was invented. You almost always will have to either a) shut down all threads and then call exit() from the remaining main thread, or b) call _exit() and forgo the use of exit handlers. This is pretty much the same problem that you would encounter with signal handlers; for the purposes of this discussion, you can think of a signal handler as something very similar to a new short-lived thread. Doing anything else is guaranteed to result in deadlocks and data corruption. Depending on what the other threads are doing when you are shutting down the program, you might get lucky. You might even get lucky most of the time. But this is not a bug we want to check into Chrome. I'd hate to be the one who has to debug this problem. Since you're more familiar with the codebase, what do you suggest I do > to make it cleanly exit instead of the current situation where both an > uncaught SIGTERM and _exit() call try to terminate the process around > the same time. > Both uncaught SIGTERM and _exit() actually do the right thing. They shut the program (i.e. all threads) down, without running any exit handlers. That's a safe operation -- but apparently not what you need... So, I think, I need to understand a little more about the actual problem that you want to solve. Why is this suddenly becoming a problem? Do we normally send SIGTERM to the renderers? Shouldn't we ask them to shut down normally (presumably by either sending them an IPC, or by closing the IPC channel, which implicitly notifies the other end of the channel)? In fact, the kernel ensures that closing the IPC channel happens automatically, if the browser process terminated without doing anything special. Markus
Okay, a little background since we're going back and forth on this. ChromeOS wants to use Profile-Guided Optimization (PGO) with Chrome. It provides up to 27% performance improvement on PageCyclers. How does PGO work? First, a binary is built with instrumentation which records edge information in memory using counters. At exit time, the binary dumps all edge counts with function ids, etc. to a file (it installs a handler using atexit() and exit() calls that handler). The code to register the handler and dump the profile is generated by gcc when -fprofile-generate is used. Chrome is a multiprocess app as we all know and when it is terminated, the main process writes its counters to the file properly. The renderer currently does not. Without the renderer's profile we do not get the performance improvement we desire. I debugged a bit as to how the renderer shuts down and gathered the following information: one of the threads receives a SIGTERM from the main process. One of the other threads just calls _exit(). These two events happen around the same time. The SIGTERM is uncaught and terminates the process. Even if I change the _exit() to exit(), the SIGTERM terminates the process before the profile is written to the file. If I add a SIGTERM handler to call exit(), the other thread calls _exit() before the profile is written. So both places have to change in order to cleanly call the exit handlers (one of which writes the profile). On Wed, Mar 28, 2012 at 9:47 PM, Markus Gutschke <markus@chromium.org>wrote: > On Wed, Mar 28, 2012 at 19:13, <asharif@chromium.org> wrote: > >> One thread receives enters this function and calls _exit(). >> Around the same time, the main process sends a SIGTERM to the renderer >> and that calls exit(). >> >> Sometimes the exit handlers are called and other times they are not. I >> want to guarantee that they are called. > > > You are actually describing one of the many possible problems with > signals. If you need to run arbitrary code, don't use signal handlers. And > exit handlers are exactly that: arbitrary code. > > Signal handlers are very difficult to write correctly, and they are very > limited in what they can do. They are almost always the wrong answer. > > My goal is to run exit handlers upon exit. That's it. >> > > If you exit the renderer normally, don't the exit handlers already run? > No they do not currently. I filed a bug against Chrome http://code.google.com/p/chromium/issues/detail?id=107584&q=renderer%20clean%.... It's very easy to reproduce. Run Chrome, attach gdb to the renderer and put a breakpoint on exit() or an exit handler (in my case gcov_exit()). It will not be hit when you close Chrome or send a SIGTERM to the main Chrome process. > RendererMain() certainly looks as if it intends to exit and uninitialize > at some point. > > I experimented with just adding a signal handler here and calling exit() >> and just calling exit() instead of _exit in the other file. Both need to >> be in place for it to work. >> > > That sounds wrong. In the presence of threads, it is generally not safe to > call exit(); that's why _exit() was invented. > I have verified that my patch actually does work every time I have tested it (tens of times). But there may be subtle issues that are wrong with it. Could you provide some test case for me to test it against? > > You almost always will have to either a) shut down all threads and then > call exit() from the remaining main thread, or b) call _exit() and forgo > the use of exit handlers. This is pretty much the same problem that you > would encounter with signal handlers; for the purposes of this discussion, > you can think of a signal handler as something very similar to a new > short-lived thread. > > Doing anything else is guaranteed to result in deadlocks and data > corruption. Depending on what the other threads are doing when you are > shutting down the program, you might get lucky. You might even get lucky > most of the time. But this is not a bug we want to check into Chrome. I'd > hate to be the one who has to debug this problem. > I don't want to check-in something to Chrome which causes issues like delaying exit, etc. If you look at the comment above the call to _exit() it provides justification to using _exit(). This is why I added a define which I will only enable when I am building the instrumented version (-fprofile-generate in CXXFLAGS). > > Since you're more familiar with the codebase, what do you suggest I do >> to make it cleanly exit instead of the current situation where both an >> uncaught SIGTERM and _exit() call try to terminate the process around >> the same time. >> > > Both uncaught SIGTERM and _exit() actually do the right thing. They shut > the program (i.e. all threads) down, without running any exit handlers. > That's a safe operation -- but apparently not what you need... > > So, I think, I need to understand a little more about the actual problem > that you want to solve. Why is this suddenly becoming a problem? Do we > normally send SIGTERM to the renderers? Shouldn't we ask them to shut down > normally (presumably by either sending them an IPC, or by closing the IPC > channel, which implicitly notifies the other end of the channel)? In fact, > the kernel ensures that closing the IPC channel happens automatically, if > the browser process terminated without doing anything special. > Now that you hopefully understand the situation, let's come up with the design and I can try to author the patch. Thanks, > > > Markus >
We really need input from people who have looked at this code in more detail (unfortunately, that probably means Evan). But here are some suggestions that might get you started on how to do this properly. In the vast majority of cases, the renderer will eventually return to the main loop. So, if we are interested in running exit handlers (e.g. when profiling), we should not call _exit() from the SuicideOnChannelErrorFilter. If we just continue, we will hopefully shut down properly in a couple of seconds. Of course, there will be rare cases, when we are in fact hung and don't exit properly. You can probably address this problem by calling something like "alarm(30)" in the SuicideOnChannelErrorFilter. This will trigger a SIGALRM after 30s, and I believe we keep the default behavior, which will end up terminating the program. Of course, you now need to double-check that the browser does in fact try to shut down the renderer properly, instead of sending it a SIGTERM. If it doesn't do that, we have to figure out how to fix that -- and again, this might have to be code that is different between the normal case and the code that runs when profiling. Markus |