|
|
Chromium Code Reviews|
Created:
11 years ago by viettrungluu Modified:
9 years, 6 months ago CC:
chromium-reviews_googlegroups.com, John Grabowski, pam+watch_chromium.org, ben+cc_chromium.org Visibility:
Public. |
DescriptionMac: for branded Google Chrome, handle SIGTERM in renderer to clean up Breakpad.
Test:
1. Build a (branded) Google Chrome.
2. From the command-line, run: "launchctl bslist | wc -l" and note the number.
3. Start Google Chrome on a fresh profile.
4. Once a browser window has opened, do #2 again, and note the number -- it should have increased (by 2).
5. Close the tab, and do #2 again. The number should have decreased by 1.
6. Open tabs, navigate, etc., and do #2 between each stage. Whenever a new (e.g., renderer) process is started, the number should increase (e.g., when you go to a "new" site). When a process is terminated (e.g., you close the last tab for a particular site), the number should decrease.
7. Quit, and do #2 again. Hopefully, the number should be the same as the first time.
(Note: when a renderer crashes, the number will not decrease -- so the number after quitting will not be the same as before starting. This is why this is only a stopgap. For comparison, do this on a branded Google Chrome without this patch; typically, the numbers will not decrease on renderer termination.)
BUG=28547
TEST=See above.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=34318
Patch Set 1 #
Total comments: 1
Patch Set 2 : Added missing _exit(). #Patch Set 3 : Moved call to DestructCrashReporter() outside signal handler (into thread). #
Total comments: 1
Patch Set 4 : Updated per Mark's comment. #
Total comments: 14
Patch Set 5 : Updated per Mark's review. #Patch Set 6 : Oops, I wasn't done. #Patch Set 7 : foo #
Total comments: 8
Patch Set 8 : Updated per Mark's re-review. #Messages
Total messages: 34 (0 generated)
Just sending this out to a bunch of people who may be around soon-ish and have the expertise to look at it. N.B.: I haven't tested it at all on a branded build (and have only checked that, with obvious modifications, that the signal handler gets called in a Chromium build)....
Oops. Meant to include Avi, who did all the important legwork in tracking the problem down.
>> I haven't tested it at all on a branded build Since all your changes are in GOOGLE_CHROME_BUILD, does that basically mean you have never really compiled this? http://codereview.chromium.org/484007/diff/1/2 File chrome/renderer/renderer_main.cc (right): http://codereview.chromium.org/484007/diff/1/2#newcode50 chrome/renderer/renderer_main.cc:50: DestructCrashReporter(); There are specific rules about what can be done in a signal handler (e.g. can't call malloc, or exit). We'd need to look at the source of DestructCrashReporter to know if it's safe to call. Also, presumably SIGTERM means "renderer should die", and the default handler does an exit. We don't actually die in this handler but I think we want to. Yes?
On 2009/12/10 06:57:05, John Grabowski wrote: > >> I haven't tested it at all on a branded build > > Since all your changes are in GOOGLE_CHROME_BUILD, does that basically mean you > have never really compiled this? This? No. I've only compiled versions with the && defined(GOOGLE_CHROME_BUILD) commented out (or, more recently, by just #define-ing GOOGLE_CHROME_BUILD inside that file), and the DestructCrashReporter() replaced with something else. > http://codereview.chromium.org/484007/diff/1/2 > File chrome/renderer/renderer_main.cc (right): > > http://codereview.chromium.org/484007/diff/1/2#newcode50 > chrome/renderer/renderer_main.cc:50: DestructCrashReporter(); > There are specific rules about what can be done in a signal handler (e.g. can't > call malloc, or exit). We'd need to look at the source of DestructCrashReporter > to know if it's safe to call. Right. If not, we can use a mechanism like the one in browser_main.cc. > Also, presumably SIGTERM means "renderer should die", and the default handler > does an exit. We don't actually die in this handler but I think we want to. > Yes? Oops. That's embarrassing.
On 2009/12/10 07:29:34, viettrungluu wrote: > > http://codereview.chromium.org/484007/diff/1/2#newcode50 > > chrome/renderer/renderer_main.cc:50: DestructCrashReporter(); > > There are specific rules about what can be done in a signal handler (e.g. > can't > > call malloc, or exit). We'd need to look at the source of > DestructCrashReporter > > to know if it's safe to call. > > Right. If not, we can use a mechanism like the one in browser_main.cc. It calls BreakpadRelease(), which doesn't look so safe. Back to the drawing board, I guess.
New version's up. It's an awful cut-and-paste job, which should be fixed properly. Again, I haven't tested it on a branded build, but with a |#define GOOGLE_CHROME_BUILD| (in the file) and |DestructCrashReporter();| replaced (with, e.g., |write(1,"foo\n",4u);|), it seems to work on a Chromium debug build.
(disclaimer: I may be misunderstanding the issue in which case I hope Mark, Avi or Neal will correct me) This bug causes Chrome to gradually degrade global system state which is really bad. We should take this seriously and make sure we fix it properly. Even if we use a temporary stopgap solution we should IMHO keep this high priority until we have a good fix in place. A few points: 1. We need to make sure whatever solution we have here is backed up with unit tests which test the following: a. Graceful shutdown of child processes. b. Process killed via sigkill in which case you can't catch this from the dying process. 2. Can't the leak also happen with other process types apart from the Renderer? Breakpad is enabled for Browser/Renderer/Plugin/Utility/Worker processes. Nealsid: Regarding a permanent solution, is there a way to make Breakpad only allocate one port globally? This would solve the problem since we could only ever leak a single port. Suggestions for a temporary stopgap solution: Is it possible for the browser process to handle cleanup for subprocesses? Perhaps each subprocess could send back the bootstrap port to the browser on startup and the browser process could run a thread that uses kqueues to monitor child process termination, each time a child process dies we make sure to kill the relevant bootstrap port.
drive by: are we gonna have this issue in the utility process also? (or any other process type used) Should we this in some help that actually wraps breakpad startup/shutdown so it can be done for all processes?
On 2009/12/10 13:22:13, TVL wrote: > drive by: are we gonna have this issue in the utility process also? This is why I'm worried about this way of handling it. If Breakpad is modifying global system state, it should be up to Breakpad to ensure that no matter how the process terminates that things get cleaned up. Forcing clients to deal with the situation leads to either duplicated code (if everyone does it right) or unhandled problems (if not everyone does).
I haven't read this carefully, but I saw that there was concern over the macro being used, and sure enough the concern was well-placed. Trung, please adapt the change per my comment. I'd like you to test this with a Breakpad-enabled build. Talk to me if you're not sure how to produce one. http://codereview.chromium.org/484007/diff/5001/5 File chrome/renderer/renderer_main.cc (right): http://codereview.chromium.org/484007/diff/5001/5#newcode42 chrome/renderer/renderer_main.cc:42: #if defined(OS_MACOSX) && defined(GOOGLE_CHROME_BUILD) This is the wrong way to say "am I using Breakpad?" You need to make a runtime call: IsCrashReporterEnabled(), from chrome/app/breakpad_mac.h. The implementation either comes from breakpad_mac.mm (if Breakpad is compiled in) or breakpad_mac_stubs.mm (otherwise). It's impossible to make the assumption based on preprocessor macros.
On 2009/12/10 15:27:33, Avi wrote: > On 2009/12/10 13:22:13, TVL wrote: > > drive by: are we gonna have this issue in the utility process also? > > This is why I'm worried about this way of handling it. If Breakpad is modifying > global system state, it should be up to Breakpad to ensure that no matter how > the process terminates that things get cleaned up. Forcing clients to deal with > the situation leads to either duplicated code (if everyone does it right) or > unhandled problems (if not everyone does). Agreed. Let me address various issues which have been raised all at once: 1. The renderer in particular can be terminated by SIGTERM (look for things associated to sudden_termination_allowed_ in renderer_host/). This seems to be the common case, and seems to me to be a reasonable optimization to speed things up (by just dropping the render process on the floor and letting the system clean up). The theory is/should be that, since the renderer is sandboxed, killing it really shouldn't affect things outside (disclaimer: I know very little about our sandboxing). This makes it disturbing that the renderer dying can cause a leak of global resources. It opens a route towards a denial-of-service attack (albeit a really slow one; not necessarily that slow, even, if a web page can figure out how to get renderers spawned off). 2. We have this problem -- leaking on "abnormal" termination -- with other processes as well. It's just that it's not as acute (and self-inflicted) as it is with the renderer. (I don't know if we use kills as a normal way to terminate other processes.) 3. I'm not happy with a design which allows this. Do we really need to install signal handlers on all our processes just to ensure that Breakpad cleans up? This seems like bad design. Moreover, there'd be nothing we could ever do for SIGKILLs. 4. As a result, my feeling is that the real solution is to fix Breakpad....
On Thu, Dec 10, 2009 at 7:27 AM, <avi@chromium.org> wrote: > On 2009/12/10 13:22:13, TVL wrote: > >> drive by: are we gonna have this issue in the utility process also? >> > > This is why I'm worried about this way of handling it. If Breakpad is > modifying > global system state, it should be up to Breakpad to ensure that no matter > how > the process terminates that things get cleaned up. Forcing clients to deal > with > the situation leads to either duplicated code (if everyone does it right) > or > The problem with a Framework taking over global signals handlers for the process in order to clean up is that I still can't guarantee they'll be called, because someone might override them later, and if I am overriding a signal handler the process has already set up I need to deal with messy signal handler chaining, which was a source of bugs in the previous Linux version. An atexit handler might work, but I'm not even sure if that gets called when unhandled signals terminate a process. Additionally, it seems to me that Chrome has a bunch of code it needs to execute on renderer shutdown, which it hasn't been. So, given that the duplicated code you are mentioning for all clients to tear down Breakpad properly is something like this: BreakpadRelease(breakpadRef); I believe it is quite reasonable for clients of Breakpad to control when they initialize AND when they teardown Breakpad. Especially given Chrome's unique process teardown model. Neal > unhandled problems (if not everyone does). > > > http://codereview.chromium.org/484007 >
On 2009/12/10 16:20:01, nealsid wrote: > Additionally, it seems to me that Chrome has a bunch of code it needs to > execute on renderer shutdown, which it hasn't been. The renderer should not need to do anything, ever, by design (this is the theory, evidently not the practice), which is why killing it abruptly should be safe and not leak resources.
By definition we may kill renderers (misbehaved/hung processes, etc). This may also not be that rare of an event (e.g. a misbehaved plugin on a user's machine). This means that we can't safely assume we can run code at process termination in the dying process. If we could use a single bootstrap port for all Chrome processes that would be ideal. Otherwise a solution where the browser process takes responsibility for closing any leaked ports would be best. Best regards, Jeremy On Thu, Dec 10, 2009 at 6:19 PM, Neal Sidhwaney <nealsid@gmail.com> wrote: > > > On Thu, Dec 10, 2009 at 7:27 AM, <avi@chromium.org> wrote: > >> On 2009/12/10 13:22:13, TVL wrote: >> >>> drive by: are we gonna have this issue in the utility process also? >>> >> >> This is why I'm worried about this way of handling it. If Breakpad is >> modifying >> global system state, it should be up to Breakpad to ensure that no matter >> how >> the process terminates that things get cleaned up. Forcing clients to deal >> with >> the situation leads to either duplicated code (if everyone does it right) >> or >> > > The problem with a Framework taking over global signals handlers for the > process in order to clean up is that I still can't guarantee they'll be > called, because someone might override them later, and if I am overriding a > signal handler the process has already set up I need to deal with messy > signal handler chaining, which was a source of bugs in the previous Linux > version. > > An atexit handler might work, but I'm not even sure if that gets called > when unhandled signals terminate a process. > > Additionally, it seems to me that Chrome has a bunch of code it needs to > execute on renderer shutdown, which it hasn't been. So, given that the > duplicated code you are mentioning for all clients to tear down Breakpad > properly is something like this: > > BreakpadRelease(breakpadRef); > > I believe it is quite reasonable for clients of Breakpad to control when > they initialize AND when they teardown Breakpad. Especially given Chrome's > unique process teardown model. > > Neal > > > > >> unhandled problems (if not everyone does). >> >> >> http://codereview.chromium.org/484007 >> > >
Here's my take: We need to tolerate abnormal process toleration. This can be SIGTERM, or another signal, or exit/_exit without returning to main. Doing a signal handler to avoid leaking a global resource is a band-aid. I won't block it, and it will solve the common way for us to leak, but I don't think that it's ultimately the best solution. If we do implement this band-aid, it should certainly be Chrome-side, and not Breakpad-side. Breakpad shouldn't interfere with signal handlers at all, because the application might want to do some other stuff with signal handlers. I haven't looked closely enough to determine if there's a way to get Breakpad to not leak these ports - that is, I don't know what non-band-aid solutions are possible yet. Mark
On 2009/12/10 15:33:05, Mark Mentovai wrote:
> I haven't read this carefully, but I saw that there was concern over the macro
> being used, and sure enough the concern was well-placed.
>
> Trung, please adapt the change per my comment.
Done. Should we also change it around the instance of DestructCrashReporter() in
chrome_dll_main.cc?
> I'd like you to test this with a
> Breakpad-enabled build.
Tested. The bandaid (and I agree that that's what it is) seems to work, at least
for the common case:
$ launchctl bslist | wc -l
118
$ xcodebuild/Release/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
--user-data-dir=/tmp/foo &
[1] 43387
$ launchctl bslist | wc -l
120
$ launchctl bslist | wc -l [after closing a tab]
119
$ launchctl bslist | wc -l [after opening a tab]
120
$ launchctl bslist | wc -l [after quitting]
118
[1]+ Done xcodebuild/Release/Google\
Chrome.app/Contents/MacOS/Google\ Chrome --user-data-dir=/tmp/foo
$
Extra irony value: to build, I had to reboot due to leakage from the beta
channel Chrome.
Trung wrote: > Done. Should we also change it around the instance of > DestructCrashReporter() in > chrome_dll_main.cc? Certainly.
To comment on Jeremy's idea of using one port - it's a good one and might be feasible. Right now we automatically deallocate the port upon a crash, because otherwise the launched process(the Inspector which does the crashdump generation) will hang until the crashing process exits, which obviously cannot work. Deallocating the bootstrap port during a crash can't work if one port is shared among all services, but it's likely that our use of the bootstrap services could be tweaked. I need to play around with it a little, but it's something on the order of at least a week, since I have commitments in my other project(I'm 50% allocated to Crash this quarter). Neal On Thu, Dec 10, 2009 at 10:51 AM, Mark Mentovai <mark@chromium.org> wrote: > Trung wrote: > > Done. Should we also change it around the instance of > > DestructCrashReporter() in > > chrome_dll_main.cc? > > Certainly. >
I think we all agree that the right thing to do is to fix Breakpad, possibly along the lines suggested by Jeremy. As a bandaid to stop the bleeding (as in, we're killing machines running Snow Leopard now), I think we should go with this, so could you guys take a closer look? This patch is far from foolproof, but should stop the bleeding enough. It's built to be as simple as possible; I wouldn't want a more complicated patch which would require more QA, especially with a proper fix hopefully coming soon. (If anyone has ideas for an equally simple but better stopgap, I'm all ears.) Notes: - I set up a signal handler, which unblocks a thread which will release Breakpad and shut down. - If that thread fails to shut down the renderer (and, e.g., hangs), then the renderer should go through a normal shut down. (I've tested that it does.) - If that fails to happen, the browser process should kill off the renderer after a few seconds (if the browser is still around). We can be reasonably sure of the renderer dying.
In http://crbug.com/28547, jrg said: > Hmm. I tried your test and agree; we never clean up. For my test we die in _exit as called from > SuicideOnChannelErrorFilter::OnChannelError(). (I don't know if that means we always kill renderers by > triggering an error, such as by closing the fd, or if so if that is expected.) Do we need to do anything to handle an _exit from there? http://codereview.chromium.org/484007/diff/5002/7 File chrome/renderer/renderer_main.cc (right): http://codereview.chromium.org/484007/diff/5002/7#newcode43 chrome/renderer/renderer_main.cc:43: // TODO(viettrungluu): Code taken from browser_main.cc (with a bit of editing). You must use an anonymous namespace (or mark everything static): namespace { http://codereview.chromium.org/484007/diff/5002/7#newcode46 chrome/renderer/renderer_main.cc:46: int g_shutdown_pipe_read_fd = -1; This variable is not needed and should be removed. (It was removed from the file where you copied this from, but then backed out for unrelated reasons. Eventually it'll be re-removed.) http://codereview.chromium.org/484007/diff/5002/7#newcode48 chrome/renderer/renderer_main.cc:48: // Common code between SIG{HUP, INT, TERM}Handler. No. Maybe just move the whole body of GracefulShutdownHandler into SIGTERMHandler. We don't handle any other signals in here. http://codereview.chromium.org/484007/diff/5002/7#newcode90 chrome/renderer/renderer_main.cc:90: ShutdownDetector::ShutdownDetector(int shutdown_fd) You can write this inline in the class definition above. http://codereview.chromium.org/484007/diff/5002/7#newcode92 chrome/renderer/renderer_main.cc:92: CHECK(shutdown_fd_ != -1); CHECK_NE? http://codereview.chromium.org/484007/diff/5002/7#newcode95 chrome/renderer/renderer_main.cc:95: void ShutdownDetector::ThreadMain() { And this too. http://codereview.chromium.org/484007/diff/5002/7#newcode124 chrome/renderer/renderer_main.cc:124: _exit(0); I don't think we want 0. We're always going down on a signal here, right? (Well, maybe not - you will need to check bytes_read before using signal.) See how I re-raise the signal and fall back to _exit using the signal number in browser_main.cc and adapt that, but if bytes_read != sizeof(signal), just _exit(1) (or something else nonzero).
I realize you've tested this manually and I don't want to slow down this patch, but is there any chance you could add a unit test to make sure that at least in the trivial case we aren't leaking these. If it's a ton of work I can live without it but given the severity here I think it would be good to have. On Fri, Dec 11, 2009 at 12:05 AM, <viettrungluu@chromium.org> wrote: > I think we all agree that the right thing to do is to fix Breakpad, > possibly > along the lines suggested by Jeremy. > > As a bandaid to stop the bleeding (as in, we're killing machines running > Snow > Leopard now), I think we should go with this, so could you guys take a > closer > look? This patch is far from foolproof, but should stop the bleeding > enough. > It's built to be as simple as possible; I wouldn't want a more complicated > patch > which would require more QA, especially with a proper fix hopefully coming > soon. > (If anyone has ideas for an equally simple but better stopgap, I'm all > ears.) > > Notes: > - I set up a signal handler, which unblocks a thread which will release > Breakpad > and shut down. > - If that thread fails to shut down the renderer (and, e.g., hangs), then > the > renderer should go through a normal shut down. (I've tested that it does.) > - If that fails to happen, the browser process should kill off the renderer > after a few seconds (if the browser is still around). > We can be reasonably sure of the renderer dying. > > > http://codereview.chromium.org/484007 >
On 2009/12/10 22:15:47, Mark Mentovai wrote: > > Hmm. I tried your test and agree; we never clean up. For my test we die in > _exit as called from > > SuicideOnChannelErrorFilter::OnChannelError(). (I don't know if that means we > always kill renderers by > > triggering an error, such as by closing the fd, or if so if that is expected.) > > Do we need to do anything to handle an _exit from there? Probably we should also release Breakpad when we _exit() here, I think. This isn't the common case though (I never managed to get the stack that John did, or even a call to SuicideOnChannelErrorFilter::OnChannelError() -- it might be that he got this stack due to debugging and timing). This is to say that this would be even harder to test. This makes me a little reluctant to just insert code here.
Thanks, Mark. I'm going to go test it again now.... http://codereview.chromium.org/484007/diff/5002/7 File chrome/renderer/renderer_main.cc (right): http://codereview.chromium.org/484007/diff/5002/7#newcode43 chrome/renderer/renderer_main.cc:43: // TODO(viettrungluu): Code taken from browser_main.cc (with a bit of editing). On 2009/12/10 22:15:47, Mark Mentovai wrote: > You must use an anonymous namespace (or mark everything static): > > namespace { Done. http://codereview.chromium.org/484007/diff/5002/7#newcode46 chrome/renderer/renderer_main.cc:46: int g_shutdown_pipe_read_fd = -1; On 2009/12/10 22:15:47, Mark Mentovai wrote: > This variable is not needed and should be removed. > > (It was removed from the file where you copied this from, but then backed out > for unrelated reasons. Eventually it'll be re-removed.) Done. http://codereview.chromium.org/484007/diff/5002/7#newcode48 chrome/renderer/renderer_main.cc:48: // Common code between SIG{HUP, INT, TERM}Handler. On 2009/12/10 22:15:47, Mark Mentovai wrote: > No. > > Maybe just move the whole body of GracefulShutdownHandler into SIGTERMHandler. > We don't handle any other signals in here. Done. http://codereview.chromium.org/484007/diff/5002/7#newcode90 chrome/renderer/renderer_main.cc:90: ShutdownDetector::ShutdownDetector(int shutdown_fd) On 2009/12/10 22:15:47, Mark Mentovai wrote: > You can write this inline in the class definition above. Done. http://codereview.chromium.org/484007/diff/5002/7#newcode92 chrome/renderer/renderer_main.cc:92: CHECK(shutdown_fd_ != -1); On 2009/12/10 22:15:47, Mark Mentovai wrote: > CHECK_NE? No such thing, apparently. http://codereview.chromium.org/484007/diff/5002/7#newcode95 chrome/renderer/renderer_main.cc:95: void ShutdownDetector::ThreadMain() { On 2009/12/10 22:15:47, Mark Mentovai wrote: > And this too. Done. http://codereview.chromium.org/484007/diff/5002/7#newcode124 chrome/renderer/renderer_main.cc:124: _exit(0); On 2009/12/10 22:15:47, Mark Mentovai wrote: > I don't think we want 0. We're always going down on a signal here, right? > (Well, maybe not - you will need to check bytes_read before using signal.) See > how I re-raise the signal and fall back to _exit using the signal number in > browser_main.cc and adapt that, but if bytes_read != sizeof(signal), just > _exit(1) (or something else nonzero). Done.
If you cannot repro my stack, that's fine. The hope is that "normal" cases are cleaned up (band-aid), and that the real fix is in breakpad. Presumably, then, my stack was an abnormal case which won't be band-aided. jrg On Thu, Dec 10, 2009 at 3:51 PM, <viettrungluu@chromium.org> wrote: > On 2009/12/10 22:15:47, Mark Mentovai wrote: > >> > Hmm. I tried your test and agree; we never clean up. For my test we >> die in >> _exit as called from >> > SuicideOnChannelErrorFilter::OnChannelError(). (I don't know if that >> means >> > we > >> always kill renderers by >> > triggering an error, such as by closing the fd, or if so if that is >> > expected.) > > Do we need to do anything to handle an _exit from there? >> > > Probably we should also release Breakpad when we _exit() here, I think. > > This isn't the common case though (I never managed to get the stack that > John > did, or even a call to SuicideOnChannelErrorFilter::OnChannelError() -- it > might > be that he got this stack due to debugging and timing). This is to say that > this > would be even harder to test. > > This makes me a little reluctant to just insert code here. > > > http://codereview.chromium.org/484007 >
LGTM pending positive test results and these minor changes. http://codereview.chromium.org/484007/diff/8001/8002 File chrome/renderer/renderer_main.cc (right): http://codereview.chromium.org/484007/diff/8001/8002#newcode73 chrome/renderer/renderer_main.cc:73: RAW_LOG(INFO, I guess I didn't read Will's string literal carefully enough in the other file when I reviewed that, but the text here is wrong in that the signal handler change comes before the pipe write. Just revise the literal here. For that matter, change all of the strings you're using here so that they don't match what the other file uses. Reason? RAW_LOG just puts a string out to stderr. There isn't any file or line number or anything else on the message. If we ever need to use these messages for diagnosis, we'll need a way to distinguish them from the messages in the other file. http://codereview.chromium.org/484007/diff/8001/8002#newcode102 chrome/renderer/renderer_main.cc:102: LOG(INFO) << "Handling shutdown for signal " << signal << "."; |signal| may be garbage, don't use it without checking bytes_read. http://codereview.chromium.org/484007/diff/8001/8002#newcode185 chrome/renderer/renderer_main.cc:185: // We need to handle SIGTERM when we're using Breakpad, otherwise Mach ports Sentences are almost always clearer and more concise without "we." This one would be better as: // When Breakpad is in use, handle SIGTERM to avoid leaking Mach ports. You should put a bug reference either in this comment, or in the larger comment up at the top inside your first #ifdef, or both.
LGTM I agree with Jeremy's sentiment about a unit test but the expectation is that this code gets removed in 2 weeks. Yes? Because of that I would also be happy if you could describe in detail how to test this (e.g. add steps with launchctl bslist commands) for krisr et al to hit manually. http://codereview.chromium.org/484007/diff/8001/8002 File chrome/renderer/renderer_main.cc (right): http://codereview.chromium.org/484007/diff/8001/8002#newcode62 chrome/renderer/renderer_main.cc:62: RAW_CHECK(g_shutdown_pipe_write_fd != -1); if (g_shutdown_pipe_write_fd == -1) _exit(1);
Thanks, Mark and John. Jeremy, I agree that testing would be good, but it's also kind of tough in this case (at least in a standard unit test). (To make things worse, checking Mach ports directly requires permissions, IIRC.) Probably the easiest test would be an automated version of the one now described in this CL's description.... http://codereview.chromium.org/484007/diff/8001/8002 File chrome/renderer/renderer_main.cc (right): http://codereview.chromium.org/484007/diff/8001/8002#newcode62 chrome/renderer/renderer_main.cc:62: RAW_CHECK(g_shutdown_pipe_write_fd != -1); On 2009/12/11 00:07:28, John Grabowski wrote: > if (g_shutdown_pipe_write_fd == -1) _exit(1); After discussion (in person), we decided the RAW_CHECK() should be okay. http://codereview.chromium.org/484007/diff/8001/8002#newcode73 chrome/renderer/renderer_main.cc:73: RAW_LOG(INFO, On 2009/12/11 00:02:08, Mark Mentovai wrote: > I guess I didn't read Will's string literal carefully enough in the other file > when I reviewed that, but the text here is wrong in that the signal handler > change comes before the pipe write. Just revise the literal here. > > For that matter, change all of the strings you're using here so that they don't > match what the other file uses. Reason? RAW_LOG just puts a string out to > stderr. There isn't any file or line number or anything else on the message. > If we ever need to use these messages for diagnosis, we'll need a way to > distinguish them from the messages in the other file. Done. http://codereview.chromium.org/484007/diff/8001/8002#newcode102 chrome/renderer/renderer_main.cc:102: LOG(INFO) << "Handling shutdown for signal " << signal << "."; On 2009/12/11 00:02:08, Mark Mentovai wrote: > |signal| may be garbage, don't use it without checking bytes_read. Done. http://codereview.chromium.org/484007/diff/8001/8002#newcode185 chrome/renderer/renderer_main.cc:185: // We need to handle SIGTERM when we're using Breakpad, otherwise Mach ports On 2009/12/11 00:02:08, Mark Mentovai wrote: > Sentences are almost always clearer and more concise without "we." This one > would be better as: > > // When Breakpad is in use, handle SIGTERM to avoid leaking Mach ports. > > You should put a bug reference either in this comment, or in the larger comment > up at the top inside your first #ifdef, or both. Done.
is this signal hander what is causing problems for browser_tests in release?
or the tab switching test on the perf bots?
On 2009/12/11 14:28:43, TVL wrote: > or the tab switching test on the perf bots? Not unless it changed the fabric of time? It was committed r34318. The tab switching test began failing r34316-r34317. The browser_tests began failing much earlier, though it's interesting that they accumulated more failures over time. It's not impossible that this signal handler is partly responsible, but I doubt it. Seems like something screwy is going on.
On Fri, Dec 11, 2009 at 8:26 AM, <viettrungluu@chromium.org> wrote: > On 2009/12/11 14:28:43, TVL wrote: > >> or the tab switching test on the perf bots? >> > > Not unless it changed the fabric of time? Perhaps you could write a unit test to confirm the fabric of time was not changed, just to be sure? jrg > It was committed r34318. The tab > switching test began failing r34316-r34317. > > The browser_tests began failing much earlier, though it's interesting that > they > accumulated more failures over time. It's not impossible that this signal > handler is partly responsible, but I doubt it. Seems like something screwy > is > going on. > > > http://codereview.chromium.org/484007 >
Howdy folks, it looks like breakpad is initialized in chrome_dll_main.cc's ChromeMain() (which then calls RendererMain(), PluginMain(), UtilityMain(), etc, depending on the process type). Shouldn't this cleanup stuff be done for all child process types and not just for the renderers? Nico
Nico, you are correct. Although killing a renderer might be more common, exactly the same thing can happen for all other process types. On Sat, Dec 26, 2009 at 4:31 AM, <thakis@chromium.org> wrote: > Howdy folks, > > it looks like breakpad is initialized in chrome_dll_main.cc's ChromeMain() > (which then calls RendererMain(), PluginMain(), UtilityMain(), etc, > depending on > the process type). Shouldn't this cleanup stuff be done for all child > process > types and not just for the renderers? > > Nico > > http://codereview.chromium.org/484007 >
To elaborate on what Jeremy said: - It is standard operating procedure to kill renderers (instead of telling them to quit), so that is an extremely common case. - I've seen code which will kill a plugin process, but I don't know if it's used on Mac (at all; in any case, killing a plugin process doesn't seem to be a common case). - I don't know about other process types, but I don't think killing any of them is a common case. - Of course, killing these processes should not result in any leakage. The correct solution is to fix Breakpad so that no cleanup is required, except perhaps in the browser process. (Neal is working on this.) Jeremy Moskovich wrote: > Nico, you are correct. > > Although killing a renderer might be more common, exactly the same thing > can happen for all other process types. > > > On Sat, Dec 26, 2009 at 4:31 AM, <thakis@chromium.org > <mailto:thakis@chromium.org>> wrote: > > Howdy folks, > > it looks like breakpad is initialized in chrome_dll_main.cc's > ChromeMain() > (which then calls RendererMain(), PluginMain(), UtilityMain(), etc, > depending on > the process type). Shouldn't this cleanup stuff be done for all > child process > types and not just for the renderers? > > Nico > > http://codereview.chromium.org/484007 > > |
