|
|
Created:
4 years, 5 months ago by Łukasz Anforowicz Modified:
4 years, 5 months ago CC:
chromium-reviews, asanka, darin-cc_chromium.org, jam Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionAvoid crashing if MHTMLGenerationManager::JobFinished is called twice.
This CL makes sure that MHTMLGenerationManager::JobFinished is
idempotent and doesn't do anything when called for a second time.
This avoids the crash from the linked bug.
BUG=612098
Committed: https://crrev.com/a668d7811c2c19662cbeec9e10a3e283292445e5
Cr-Commit-Position: refs/heads/master@{#407244}
Patch Set 1 #
Total comments: 4
Patch Set 2 : Extra IPC sanitization + simpler idempotency. #Patch Set 3 : Repro test for the crash. #Patch Set 4 : Posting the injection ->UI->FILE->UI. #
Messages
Total messages: 26 (8 generated)
lukasza@chromium.org changed reviewers: + dewittj@chromium.org, rdsmith@chromium.org
Randy + Justin, can you take a look? I've confirmed (with some printf-style debugging) that the problem is indeed that JobFinished is called from MHTMLGenerationManager::RenderProcessExited shortly after it was already called from MHTMLGenerationManager::OnSerializeAsMHTMLResponse. To fix the crash, we need to avoid destroying the job for the second time in MHTMLGenerationManager::OnFileClosed. I think we can do it by 1) checking result of FindResult in OnFileClosed, 2) exiting early from JobFinished if it is called a second time. I think that #2 is preferrable because A) it puts the fix closer to the source of trouble (the JobFinished method being called twice) and B) it allows keeping the NOTREACHED assert in FindJob method. #2 ends up being a bit verbose, but I think it is okay (i.e. I think it is actually nice that things are explicitly spelled by Job::is_finished and Job::MarkAsFinished). For a little while I wondered if I can make JobFinished idempotent without introducing a separate Job::is_finished_ field, but it felt hacky and difficult to understand later on. WDYT?
this lgtm, but it all makes me wonder if there is a better ownership model for Job that would make this less of a problem. https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... File content/browser/download/mhtml_generation_manager.cc (right): https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... content/browser/download/mhtml_generation_manager.cc:404: job->MarkAsFinished(); May want to clear |job|'s |observed_renderer_process_host_| as well, to prevent more spurious callbacks.
LGTM either way, but if it would work to just stop observation on JobFinished(), do consider that option w/ downgrading the is_finished_ to assertion fodder (or eliminating it if you want, but it might well be worth keeping for assertion fodder). https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... File content/browser/download/mhtml_generation_manager.cc (right): https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... content/browser/download/mhtml_generation_manager.cc:404: job->MarkAsFinished(); On 2016/07/20 16:37:48, dewittj wrote: > May want to clear |job|'s |observed_renderer_process_host_| as well, to prevent > more spurious callbacks. This is a good point, but wouldn't this solve this problem in a different (possibly cleaner) way? If you stop observing things when the job is signaled as finished on the UI thread, you remove the possibility of getting further spurious notifications and keep the flow simple. (I have no objection to adding the is_finished_ flag and using it in DCHECKs or CHECKs to enforce the behavior, but I'm pulled towards a model that naturally guarantees a single call to JobFinished because of the state machine rather than one that handles idempotency.)
Oh, whoops, two followon comments: * Test for this case? * When reading the code in order to do this review I found myself tripping over the concept of job before quite getting it. Suggestion: Consider adding a sentence (either in the header file or the .md documentation) indicating that MHTMLGenerationManager is a singleton and there is one job per requested save operation which interacts with all relevant renderers?
Can you take another look please? On 2016/07/20 16:48:20, Randy Smith - Not in Fridays wrote: > Oh, whoops, two followon comments: > > * Test for this case? I am hoping that regressions would be caught by ClusterFuzz. I am having a difficult time thinking how to write a new, explicit testcase for this crash. Two reasons: - The repro is inherently racey - I don't understand what ClusterFuzz does to cause the renderer to exit Actually, after some more thinking I thought that I could shut down RenderProcessHost in a BrowserMessageFilter that watches out for MHTML responses from the renderer (*). This seemed like a good idea, but despite verifying that MHTMLGenerationManager::RenderProcessExited gets called I cannot repro the crash in 100 iterations... :-/ (*) class GenerateMHTMLAndExitRendererMessageFilter : public BrowserMessageFilter { public: GenerateMHTMLAndExitRendererMessageFilter( RenderProcessHostImpl* render_process_host) : BrowserMessageFilter(FrameMsgStart), render_process_host_(render_process_host) {} protected: ~GenerateMHTMLAndExitRendererMessageFilter() override {} private: bool OnMessageReceived(const IPC::Message& message) override { if (message.type() == FrameHostMsg_SerializeAsMHTMLResponse::ID) { LOG(ERROR) << "TEST FILTER KICKING IN"; BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, base::Bind( base::IgnoreResult(&RenderProcessHostImpl::FastShutdownIfPossible), base::Unretained(render_process_host_))); } return false; }; RenderProcessHostImpl* render_process_host_; DISALLOW_COPY_AND_ASSIGN(GenerateMHTMLAndExitRendererMessageFilter); }; // Regression test for the crash/race from https://crbug.com/612098. IN_PROC_BROWSER_TEST_F(MHTMLGenerationTest, GenerateMHTMLAndExitRenderer) { NavigateToURL(shell(), embedded_test_server()->GetURL("/simple_page.html")); RenderProcessHostImpl* render_process_host = static_cast<RenderProcessHostImpl*>( shell()->web_contents()->GetRenderProcessHost()); scoped_refptr<BrowserMessageFilter> filter = new GenerateMHTMLAndExitRendererMessageFilter(render_process_host); render_process_host->AddFilter(filter.get()); base::FilePath path(temp_dir_.path()); path = path.Append(FILE_PATH_LITERAL("test.mht")); GenerateMHTMLForCurrentPage(MHTMLGenerationParams(path)); EXPECT_GT(ReadFileSizeFromDisk(path), 100); // Verify the actual file size. } > * When reading the code in order to do this review I found myself tripping over > the concept of job before quite getting it. Suggestion: Consider adding a > sentence (either in the header file or the .md documentation) indicating that > MHTMLGenerationManager is a singleton and there is one job per requested save > operation which interacts with all relevant renderers? Done - I've updated mhtml_generation_manager.h and save-page-as.md (it already talked a little bit about this in "Classes overview" section, but I've expanded other sections a little bit) https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... File content/browser/download/mhtml_generation_manager.cc (right): https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... content/browser/download/mhtml_generation_manager.cc:404: job->MarkAsFinished(); On 2016/07/20 16:46:58, Randy Smith - Not in Fridays wrote: > On 2016/07/20 16:37:48, dewittj wrote: > > May want to clear |job|'s |observed_renderer_process_host_| as well, to > prevent > > more spurious callbacks. Done. Good point. > > This is a good point, but wouldn't this solve this problem in a different > (possibly cleaner) way? If you stop observing things when the job is signaled > as finished on the UI thread, you remove the possibility of getting further > spurious notifications and keep the flow simple. > > (I have no objection to adding the is_finished_ flag and using it in DCHECKs or > CHECKs to enforce the behavior, but I'm pulled towards a model that naturally > guarantees a single call to JobFinished because of the state machine rather than > one that handles idempotency.) Interesting. This requires ensuring that there is no other way to call JobFinished twice. Before this CL a malicious-or-buggy renderer could send 2 SerializeAsMHTMLResponse(mhtml_generation_in_renderer_succeeded=false) quickly one after the other, and cause similar crash (with no involvement from RenderProcessExited). Let me ensure that |frame_tree_node_id_of_busy_frame_|-related verification is done also in |mhtml_generation_in_renderer_succeeded=false| case. I believe this should mean that JobFinished is only called once per job. Done?
On 2016/07/20 21:10:07, Łukasz Anforowicz wrote: > Can you take another look please? > > On 2016/07/20 16:48:20, Randy Smith - Not in Fridays wrote: > > Oh, whoops, two followon comments: > > > > * Test for this case? > > I am hoping that regressions would be caught by ClusterFuzz. > > I am having a difficult time thinking how to write a new, explicit testcase for > this crash. Two reasons: > - The repro is inherently racey > - I don't understand what ClusterFuzz does to cause the renderer to exit > > Actually, after some more thinking I thought that I could shut down > RenderProcessHost in a BrowserMessageFilter that watches out for MHTML responses > from the renderer (*). This seemed like a good idea, but despite verifying that > MHTMLGenerationManager::RenderProcessExited gets called I cannot repro the crash > in 100 iterations... :-/ Ok, I thought I understood this crash and now I no longer think that. Job::CloseFile turns into a no-op (well, ok, runs the callback with a -1 argument) if the browser_file_ isn't valid, and the browser_file_ is std::move'd to the file thread in that function. So repeated calls to Job::CloseFile should be idempotent (modulo repeated callbacks) as long as job hasn't been deleted. If job has been deleted, FindJob() should return null, which should result in a ReceivedBadMessage() in OnSErializeAsMHTMLResponse. And destruction of the job should result in the removal of the observer. So if the success result happens first this should work fine, and if the failure result happens I'd expect a ReceivedBadMessage() in OnSerializeAsMHTMLResponse. I'm sure there's something basic I'm missing, but I'm now missing something; can you clarify it? In terms of the test, my (obviously incomplete) analysis above suggests that there's probably a difference based on the success/failure ordering, and it sounds like your test should result in failure then success; maybe that's the problem? More generally, if your test were to work I would think it would cause the crash in all cases, so I'd think single stepping through how it interfaces through the code would allow you to figure out where the problem is. At least if you understand how the crash is happening in the first place, which I no longer do :-}. > > (*) > > class GenerateMHTMLAndExitRendererMessageFilter : public BrowserMessageFilter { > public: > GenerateMHTMLAndExitRendererMessageFilter( > RenderProcessHostImpl* render_process_host) > : BrowserMessageFilter(FrameMsgStart), > render_process_host_(render_process_host) {} > > protected: > ~GenerateMHTMLAndExitRendererMessageFilter() override {} > > private: > bool OnMessageReceived(const IPC::Message& message) override { > if (message.type() == FrameHostMsg_SerializeAsMHTMLResponse::ID) { > LOG(ERROR) << "TEST FILTER KICKING IN"; > BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, base::Bind( > base::IgnoreResult(&RenderProcessHostImpl::FastShutdownIfPossible), > base::Unretained(render_process_host_))); > } > > return false; > }; > > RenderProcessHostImpl* render_process_host_; > > DISALLOW_COPY_AND_ASSIGN(GenerateMHTMLAndExitRendererMessageFilter); > }; > > // Regression test for the crash/race from https://crbug.com/612098. > IN_PROC_BROWSER_TEST_F(MHTMLGenerationTest, GenerateMHTMLAndExitRenderer) { > NavigateToURL(shell(), embedded_test_server()->GetURL("/simple_page.html")); > > RenderProcessHostImpl* render_process_host = > static_cast<RenderProcessHostImpl*>( > shell()->web_contents()->GetRenderProcessHost()); > scoped_refptr<BrowserMessageFilter> filter = > new GenerateMHTMLAndExitRendererMessageFilter(render_process_host); > render_process_host->AddFilter(filter.get()); > > base::FilePath path(temp_dir_.path()); > path = path.Append(FILE_PATH_LITERAL("test.mht")); > GenerateMHTMLForCurrentPage(MHTMLGenerationParams(path)); > > EXPECT_GT(ReadFileSizeFromDisk(path), 100); // Verify the actual file size. > } > > > * When reading the code in order to do this review I found myself tripping > over > > the concept of job before quite getting it. Suggestion: Consider adding a > > sentence (either in the header file or the .md documentation) indicating that > > MHTMLGenerationManager is a singleton and there is one job per requested save > > operation which interacts with all relevant renderers? > > Done - I've updated mhtml_generation_manager.h and save-page-as.md (it already > talked a little bit about this in "Classes overview" section, but I've expanded > other sections a little bit) > > https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... > File content/browser/download/mhtml_generation_manager.cc (right): > > https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... > content/browser/download/mhtml_generation_manager.cc:404: job->MarkAsFinished(); > On 2016/07/20 16:46:58, Randy Smith - Not in Fridays wrote: > > On 2016/07/20 16:37:48, dewittj wrote: > > > May want to clear |job|'s |observed_renderer_process_host_| as well, to > > prevent > > > more spurious callbacks. > > Done. Good point. > > > > This is a good point, but wouldn't this solve this problem in a different > > (possibly cleaner) way? If you stop observing things when the job is signaled > > as finished on the UI thread, you remove the possibility of getting further > > spurious notifications and keep the flow simple. > > > > (I have no objection to adding the is_finished_ flag and using it in DCHECKs > or > > CHECKs to enforce the behavior, but I'm pulled towards a model that naturally > > guarantees a single call to JobFinished because of the state machine rather > than > > one that handles idempotency.) > > Interesting. This requires ensuring that there is no other way to call > JobFinished twice. Before this CL a malicious-or-buggy renderer could send 2 > SerializeAsMHTMLResponse(mhtml_generation_in_renderer_succeeded=false) quickly > one after the other, and cause similar crash (with no involvement from > RenderProcessExited). > > Let me ensure that |frame_tree_node_id_of_busy_frame_|-related verification is > done also in |mhtml_generation_in_renderer_succeeded=false| case. I believe > this should mean that JobFinished is only called once per job. > > Done?
Whoops, also one comment. https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... File content/browser/download/mhtml_generation_manager.cc (right): https://codereview.chromium.org/2163073002/diff/1/content/browser/download/mh... content/browser/download/mhtml_generation_manager.cc:404: job->MarkAsFinished(); On 2016/07/20 21:10:07, Łukasz Anforowicz wrote: > On 2016/07/20 16:46:58, Randy Smith - Not in Fridays wrote: > > On 2016/07/20 16:37:48, dewittj wrote: > > > May want to clear |job|'s |observed_renderer_process_host_| as well, to > > prevent > > > more spurious callbacks. > > Done. Good point. > > > > This is a good point, but wouldn't this solve this problem in a different > > (possibly cleaner) way? If you stop observing things when the job is signaled > > as finished on the UI thread, you remove the possibility of getting further > > spurious notifications and keep the flow simple. > > > > (I have no objection to adding the is_finished_ flag and using it in DCHECKs > or > > CHECKs to enforce the behavior, but I'm pulled towards a model that naturally > > guarantees a single call to JobFinished because of the state machine rather > than > > one that handles idempotency.) > > Interesting. This requires ensuring that there is no other way to call > JobFinished twice. Before this CL a malicious-or-buggy renderer could send 2 > SerializeAsMHTMLResponse(mhtml_generation_in_renderer_succeeded=false) quickly > one after the other, and cause similar crash (with no involvement from > RenderProcessExited). > > Let me ensure that |frame_tree_node_id_of_busy_frame_|-related verification is > done also in |mhtml_generation_in_renderer_succeeded=false| case. I believe > this should mean that JobFinished is only called once per job. > > Done? Sure, that sounds good. Thanks for thinking the issue through further than I had. But that sounds exactly right--there's one place the message can come from in the success path (the single frame with the message outstanding to it) one it can come from in the error path, and each path guarantees that the other won't be able to send a message following.
On 2016/07/20 22:47:33, Randy Smith - Not in Fridays wrote: > Ok, I thought I understood this crash and now I no longer think that. > Job::CloseFile turns into a no-op (well, ok, runs the callback with a -1 > argument) if the browser_file_ isn't valid, and the browser_file_ is std::move'd > to the file thread in that function. So repeated calls to Job::CloseFile should > be idempotent (modulo repeated callbacks) as long as job hasn't been deleted. > If job has been deleted, FindJob() should return null, which should result in a > ReceivedBadMessage() in OnSErializeAsMHTMLResponse. And destruction of the job > should result in the removal of the observer. So if the success result happens > first this should work fine, and if the failure result happens I'd expect a > ReceivedBadMessage() in OnSerializeAsMHTMLResponse. I'm sure there's something > basic I'm missing, but I'm now missing something; can you clarify it? You are correct in observing that browser_file_ won't be valid when Job::CloseFile is called for the second time, but it doesn't make it idempotent - in this case CloseFile will *still* call the callback passed as an argument (and this callback is always MHTMLGenerationManager::OnFileClosed where the crash happens).
Randy, I have a repro test in the latest patchset. It consistently fails on my machine if I comment out MarkAsFinished call, but I am not sure how flaky and icky the PostDelayedTask(... 1us) is in practice; FWIW, 1ms doesn't repro.
does a regular PostTask without delay repro the failure?
On 2016/07/21 21:14:45, dewittj wrote: > does a regular PostTask without delay repro the failure? It does not. As Randy pointed out offline we have to achieve the following sequence on the UI thread: 1. Regular handling of the IPC calls JobFinished 2. RendererProcessExited calls JobFinished and synchronously calls OnFileClosed (because file was std::moved) 3. OnFileClosed is called from #1 (and crashes before this CL) When we intercept the IPC message (and are ready to inject the test-trigerred renderer exit), we are on IO thread and have not yet posted to the UI thread a call to the real/product handler of the IPC message. If we called PostTask, then #2 would happen before #1. Calling PostDelayedTask means that #1 and #2 are in the right order. We still have a race between #2 and #3 that depends on the delay (if the delay is too big, then #3 wins and we have no crash because #3 stops observation that triggers #2 [even before this CL]).
On 2016/07/21 21:24:56, Łukasz Anforowicz wrote: > On 2016/07/21 21:14:45, dewittj wrote: > > does a regular PostTask without delay repro the failure? > > It does not. > > As Randy pointed out offline we have to achieve the following sequence on the UI > thread: > 1. Regular handling of the IPC calls JobFinished > 2. RendererProcessExited calls JobFinished and synchronously calls OnFileClosed > (because file was std::moved) > 3. OnFileClosed is called from #1 (and crashes before this CL) > > When we intercept the IPC message (and are ready to inject the test-trigerred > renderer exit), we are on IO thread and have not yet posted to the UI thread a > call to the real/product handler of the IPC message. If we called PostTask, > then #2 would happen before #1. Calling PostDelayedTask means that #1 and #2 > are in the right order. We still have a race between #2 and #3 that depends on > the delay (if the delay is too big, then #3 wins and we have no crash because #3 > stops observation that triggers #2 [even before this CL]). Would it work to bounce the FastShutdownIfPossible task off the file thread before heading over to the UI thread? Actually, thinking it through, what you want to do is send it to the UI thread, and *then* bounce it off the FILE thread. My thinking is that our interception is on the IO thread before the post to the UI thread of the successful result, but we know that successful result will be posted in the same IO processed task as we are in. If we send the shutdown message -> UI -> FILE -> UI, it will initially hit the UI thread before the success message (but the success message will be right behind it in the queue because it'll be dispatched right afterwards from a different message handling function), will hit the FILE thread before the file close message from the UI thread generated by the success message, and (key), then hit the UI thread *before* the return message from the FILE thread indicating success of the file shutdown. There's still a bit of a race in that if for some reason our message gets delayed in getting to the UI thread there's some chance the message back from the FILE thread will get executed first, but I basically can't imagine how that would happen; the IO->UI messages are going to be put right next to each other in the UI queue, and when the UI sends the message to the FILE thread, the next thing it should do is execute our destruction message. WDYT? Please do document the h**l out of whatever you/we end of doing--I pity the poor person who will need to figure out what the heck is going on in this test in the future.
On 2016/07/21 21:34:25, Randy Smith - Not in Fridays wrote: > On 2016/07/21 21:24:56, Łukasz Anforowicz wrote: > > On 2016/07/21 21:14:45, dewittj wrote: > > > does a regular PostTask without delay repro the failure? > > > > It does not. > > > > As Randy pointed out offline we have to achieve the following sequence on the > UI > > thread: > > 1. Regular handling of the IPC calls JobFinished > > 2. RendererProcessExited calls JobFinished and synchronously calls > OnFileClosed > > (because file was std::moved) > > 3. OnFileClosed is called from #1 (and crashes before this CL) > > > > When we intercept the IPC message (and are ready to inject the test-trigerred > > renderer exit), we are on IO thread and have not yet posted to the UI thread a > > call to the real/product handler of the IPC message. If we called PostTask, > > then #2 would happen before #1. Calling PostDelayedTask means that #1 and #2 > > are in the right order. We still have a race between #2 and #3 that depends > on > > the delay (if the delay is too big, then #3 wins and we have no crash because > #3 > > stops observation that triggers #2 [even before this CL]). > > Would it work to bounce the FastShutdownIfPossible task off the file thread > before heading over to the UI thread? Actually, thinking it through, what you > want to do is send it to the UI thread, and *then* bounce it off the FILE > thread. My thinking is that our interception is on the IO thread before the > post to the UI thread of the successful result, but we know that successful > result will be posted in the same IO processed task as we are in. If we send > the shutdown message -> UI -> FILE -> UI, it will initially hit the UI thread > before the success message (but the success message will be right behind it in > the queue because it'll be dispatched right afterwards from a different message > handling function), will hit the FILE thread before the file close message from > the UI thread generated by the success message, and (key), then hit the UI > thread *before* the return message from the FILE thread indicating success of > the file shutdown. There's still a bit of a race in that if for some reason our > message gets delayed in getting to the UI thread there's some chance the message > back from the FILE thread will get executed first, but I basically can't imagine > how that would happen; the IO->UI messages are going to be put right next to > each other in the UI queue, and when the UI sends the message to the FILE > thread, the next thing it should do is execute our destruction message. WDYT? Hmmm... indeed - this works and consistently repros the crash. I've got this in patchset #4. > Please do document the h**l out of whatever you/we end of doing--I pity the poor > person who will need to figure out what the heck is going on in this test in the > future. Done in a big comment in the ...browsertest.cc file. I wonder how easy or difficult it will be to preserve this test after mojofication... :-/
I assume it is okay to land this CL in the current shape. Please shout if this is not correct. I'll push to CQ later today unless I hear otherwise.
The CQ bit was checked by lukasza@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by lukasza@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rdsmith@chromium.org, dewittj@chromium.org Link to the patchset: https://codereview.chromium.org/2163073002/#ps60001 (title: "Posting the injection ->UI->FILE->UI.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Description was changed from ========== Avoid crashing if MHTMLGenerationManager::JobFinished is called twice. This CL makes sure that MHTMLGenerationManager::JobFinished is idempotent and doesn't do anything when called for a second time. This avoids the crash from the linked bug. BUG=612098 ========== to ========== Avoid crashing if MHTMLGenerationManager::JobFinished is called twice. This CL makes sure that MHTMLGenerationManager::JobFinished is idempotent and doesn't do anything when called for a second time. This avoids the crash from the linked bug. BUG=612098 Committed: https://crrev.com/a668d7811c2c19662cbeec9e10a3e283292445e5 Cr-Commit-Position: refs/heads/master@{#407244} ==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/a668d7811c2c19662cbeec9e10a3e283292445e5 Cr-Commit-Position: refs/heads/master@{#407244}
Message was sent while issue was closed.
Still/again LGTM. Sorry to not get back to you yesterday; busy, messy day. |