Issue 2921063003: Fix process reuse for dedicated processes when over process limit.

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-02 19:09:10 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/1

3 years, 6 months ago (2017-06-02 19:10:02 UTC) #2

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-02 20:16:22 UTC) #3

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_ozone_rel_ng/builds/399117)

3 years, 6 months ago (2017-06-02 20:16:24 UTC) #4

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-08 22:32:39 UTC) #5

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/20001

3 years, 6 months ago (2017-06-08 22:33:22 UTC) #6

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-08 23:09:15 UTC) #7

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linux/builds/385676)

3 years, 6 months ago (2017-06-08 23:09:17 UTC) #8

alexmos

Description was changed from ========== Fix process reuse for dedicated processes when over process limit. ...

3 years, 6 months ago (2017-06-09 06:15:51 UTC) #9

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-09 06:15:58 UTC) #10

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/40001

3 years, 6 months ago (2017-06-09 06:16:12 UTC) #11

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-09 06:23:42 UTC) #12

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presubmit/builds/459895)

3 years, 6 months ago (2017-06-09 06:23:42 UTC) #13

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-09 18:49:15 UTC) #14

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/60001

3 years, 6 months ago (2017-06-09 18:49:57 UTC) #15

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-09 19:37:38 UTC) #16

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_ozone_rel_ng/builds/403850)

3 years, 6 months ago (2017-06-09 19:37:39 UTC) #17

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-10 01:14:30 UTC) #18

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/80001

3 years, 6 months ago (2017-06-10 01:15:00 UTC) #19

alexmos

Description was changed from ========== Fix process reuse for dedicated processes when over process limit. ...

3 years, 6 months ago (2017-06-10 01:36:17 UTC) #20

Description was changed from

==========
Fix process reuse for dedicated processes when over process limit.

BUG=513036
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
==========

to

==========
Fix process reuse for dedicated processes when over process limit.

Previously, process reuse was broken for --isolate-origins, where an
isolated origin was allowed to reuse a process containing content from
other non-isolated sites when over process limit.  It was also overly
restrictive for --site-per-process, where a site could never reuse an
existing process when over process, even if the process was dedicated
to the same site.

This CL fixes these cases by modifying
RenderProcessHostImpl::IsSuitableHost() to consult
ChildProcessSecurityPolicy and check whether the process is locked to
a compatible site.

One tricky case was caused by lazily-initialied SiteInstances: when a
new WebContents is created with a new SiteInstance that has no site,
that SiteInstance's process should be allowed to host a site that
requires a dedicated process until something commits in it (and the
SiteInstance sets its site).  Looking purely at origin locks isn't
sufficient here, as we can't tell apart the case where the process is
brand new and hasn't yet hosted any content vs. process that has
committed URLs that don't require dedicated processes -- both end up
with empty origin locks.  To track this state, a new flag is added to
RenderProcessHost.

Another tricky case is when a page navigates itself to about:blank.
This ought to be allowed to stay in-process, even for dedicated
processes.  This required adding an extra check to
HasWrongProcessForURL() so it allows about:blank to stay in-process
before it consults IsSuitableHost().

The changes in this CL also revealed a bug in
RenderProcessHostImpl::GetProcessHostForSite, which passed in a full
URL instead of site URL to IsSuitableHost.

BUG=513036,713444
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
==========

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-10 02:29:31 UTC) #21

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 6 months ago (2017-06-10 02:29:32 UTC) #22

alexmos

Description was changed from ========== Fix process reuse for dedicated processes when over process limit. ...

3 years, 6 months ago (2017-06-12 16:59:26 UTC) #23

Description was changed from

==========
Fix process reuse for dedicated processes when over process limit.

Previously, process reuse was broken for --isolate-origins, where an
isolated origin was allowed to reuse a process containing content from
other non-isolated sites when over process limit.  It was also overly
restrictive for --site-per-process, where a site could never reuse an
existing process when over process, even if the process was dedicated
to the same site.

This CL fixes these cases by modifying
RenderProcessHostImpl::IsSuitableHost() to consult
ChildProcessSecurityPolicy and check whether the process is locked to
a compatible site.

One tricky case was caused by lazily-initialied SiteInstances: when a
new WebContents is created with a new SiteInstance that has no site,
that SiteInstance's process should be allowed to host a site that
requires a dedicated process until something commits in it (and the
SiteInstance sets its site).  Looking purely at origin locks isn't
sufficient here, as we can't tell apart the case where the process is
brand new and hasn't yet hosted any content vs. process that has
committed URLs that don't require dedicated processes -- both end up
with empty origin locks.  To track this state, a new flag is added to
RenderProcessHost.

Another tricky case is when a page navigates itself to about:blank.
This ought to be allowed to stay in-process, even for dedicated
processes.  This required adding an extra check to
HasWrongProcessForURL() so it allows about:blank to stay in-process
before it consults IsSuitableHost().

The changes in this CL also revealed a bug in
RenderProcessHostImpl::GetProcessHostForSite, which passed in a full
URL instead of site URL to IsSuitableHost.

BUG=513036,713444
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
==========

to

==========
Fix process reuse for dedicated processes when over process limit.

Previously, process reuse was broken for --isolate-origins, where an
isolated origin was allowed to reuse a process containing content from
other non-isolated sites when over process limit.  It was also overly
restrictive for --site-per-process, where a site could never reuse an
existing process when over process limit, even if the process was
dedicated to the same site.

This CL fixes these cases by modifying
RenderProcessHostImpl::IsSuitableHost() to consult
ChildProcessSecurityPolicy and check whether the process is locked to
a compatible site.  For --site-per-process, it removes the "return
false" in ShouldTryToUseExistingProcessHost(), which allows using
IsSuitableHost().

One tricky case was caused by lazily-initialied SiteInstances: when a
new WebContents is created with a new SiteInstance that has no site,
that SiteInstance's process should be allowed to host a site that
requires a dedicated process until something commits in it and the
SiteInstance sets its site.  Looking purely at origin locks isn't
sufficient here, as we can't tell apart the case where the process is
brand new and hasn't yet hosted any content vs. process that has
committed URLs that don't require dedicated processes -- both end up
with empty origin locks.  To track this state, a new flag is added to
RenderProcessHost.

Another tricky case is when a page navigates itself to about:blank.
This ought to be allowed to stay in-process, even for dedicated
processes.  This required adding an extra check to
HasWrongProcessForURL() so it allows about:blank to stay in-process
before it consults IsSuitableHost(), which would otherwise prevent
this.

The changes in this CL also revealed a bug in
RenderProcessHostImpl::GetProcessHostForSite, which passed in a full
URL instead of site URL to IsSuitableHost.

BUG=513036,713444
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
==========

alexmos

alexmos@chromium.org changed reviewers: + creis@chromium.org

3 years, 6 months ago (2017-06-12 17:35:06 UTC) #24

alexmos

Charlie, can you please take a look? This attempts to fix process reuse for both ...

3 years, 6 months ago (2017-06-12 17:35:08 UTC) #25

Charlie, can you please take a look?  This attempts to fix process reuse for
both --isolate-origins and --site-per-process.  Lots of intricate cases involved
- some explanations below.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
File content/browser/isolated_origin_browsertest.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:373:
NavigateIframeToURL(web_contents(), "test_iframe", isolated_foo_url);
Interesting note: without this CL, this actually hangs.  This is because we
first attempted the transfer but then, because of (incorrectly) trying to reuse
the existing www.foo.com process, we hit the is_transfer_to_same path:
   bool is_transfer_to_same =
        is_transfer &&
        entry.transferred_global_request_id().child_id ==
            dest_render_frame_host->GetProcess()->GetID();

because we compare process IDs and not SiteInstances (the SiteInstances are
different but process ID is the same due to the reuse).  This causes us to not
send the Navigate IPC.  I wonder if this can come up in other cases as well, and
whether we should avoid comparing process IDs in the is_transfer_to_same check
due to this.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:415:
EXPECT_EQ(child->current_frame_host()->GetProcess(), isolated_foo_process);
I decided to deal with this for a followup to keep this CL simpler.  The
sequence of events is:
- start with all processes hosting only dedicated sites
- create new tab with site-less SiteInstance
- call GetProcess() on that SiteInstance.  This creates a new process since none
of the dedicated processes can host it.
- navigate new tab to a site that has an existing dedicated process.  Even
though process reuse is possible, we hit the checks that say that a site-less
SiteInstance that sets a site should stay in the same process.

It doesn't matter much in practice, as opening a new tab has the NTP and never
hits this path.  Potential fixes could be to checking for process reuse when a
site-less SI sets a site, or to allow dedicated processes to host uninitialized
SiteInstances (they should transfer out if necessary when they navigate).

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
File content/browser/renderer_host/render_process_host_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:1769:
can_become_dedicated_process_ = false;
Note that I'm specifically not calling this in
IncrementServiceWorkerRefCount/IncrementSharedWorkerRefCount.  Shared workers
would need a page to already be committed in the process to be created, meaning
we should've already called this.  Service workers also should be fine:
ServiceWorkerProcessManager::AllocateWorkerProcess will either put the worker
into an existing process which already has a regular page committed, or it will
create a new SiteInstance for a specific URL (causing site_ to be assigned), and
then will immediately call GetProcess(), which will lead to LockToOrigin and
again disallow reuse.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:3009:
!IsSuitableHost(host, browser_context, site_url))) {
After my IsSuitableHost modifications, a few extension tests (such as
WebNavigationApiTest.UserAction) caught this bug, which caused unneeded process
swaps (when site_url was used for origin lock comparison).

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:215: if (url == url::kAboutBlankURL)
Several tests (like DumpAccessibilityTreeTest.AccessibilityA or other
accessibility tests) expect to navigate A->about:blank->A and for all this to
end up in same process (i.e., no swap for about:blank).  Without this check,
with --site-per-process, we'd go into IsSuitableHost, which would compare the
origin lock against "about:" (site_url for about:blank) and fail.  So I added a
special case here.

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
File content/public/browser/render_process_host.h (right):

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
content/public/browser/render_process_host.h:359: virtual bool
CanBecomeDedicatedProcess() = 0;
These are public because of MockRenderProcessHost. :(  I struggled with finding
good names here.  What I really wanted this to mean was that this is true up
until the first LockToOrigin call (which might come either when the first
navigation commits, or when a SiteInstance with a site that's already set calls
LockToOrigin).  I.e., this intends to be true for
uninitialized/untainted/site-less SiteInstances so that IsSuitableHost allows
their process to be used as a dedicated process.  Without this, the very first
navigation in a new tab caused a process swap from the initial SiteInstance,
causing lots of test failures.

Charlie Reis

Nice. I'm really excited about the problems we're solving here (and the other ones you're ...

3 years, 6 months ago (2017-06-12 23:08:18 UTC) #26

Nice.  I'm really excited about the problems we're solving here (and the other
ones you're identifying).  A few thoughts below.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
File content/browser/isolated_origin_browsertest.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:373:
NavigateIframeToURL(web_contents(), "test_iframe", isolated_foo_url);
On 2017/06/12 17:35:07, alexmos wrote:
> Interesting note: without this CL, this actually hangs.  This is because we
> first attempted the transfer but then, because of (incorrectly) trying to
reuse
> the existing http://www.foo.com process, we hit the is_transfer_to_same path:
>    bool is_transfer_to_same =
>         is_transfer &&
>         entry.transferred_global_request_id().child_id ==
>             dest_render_frame_host->GetProcess()->GetID();
> 
> because we compare process IDs and not SiteInstances (the SiteInstances are
> different but process ID is the same due to the reuse).  This causes us to not
> send the Navigate IPC.  I wonder if this can come up in other cases as well,
and
> whether we should avoid comparing process IDs in the is_transfer_to_same check
> due to this.

Wow, nice find-- that could definitely come up in other cases.  It's perhaps not
as common as it could be since most sites don't trigger a cross-process redirect
and those that do (e.g., CWS) tend to require a different process.  But I could
see it happening for a redirect from one extension to another when we're over
the limit and the extensions end up sharing a process.  Can you file a bug?

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:402: isolated_foo_instance);
Can you also check that isolated_foo_instance and the previous subframe
SiteInstance are not related, as the comment mentions?

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:415:
EXPECT_EQ(child->current_frame_host()->GetProcess(), isolated_foo_process);
On 2017/06/12 17:35:07, alexmos wrote:
> I decided to deal with this for a followup to keep this CL simpler.  The
> sequence of events is:
> - start with all processes hosting only dedicated sites
> - create new tab with site-less SiteInstance
> - call GetProcess() on that SiteInstance.  This creates a new process since
none
> of the dedicated processes can host it.
> - navigate new tab to a site that has an existing dedicated process.  Even
> though process reuse is possible, we hit the checks that say that a site-less
> SiteInstance that sets a site should stay in the same process.
> 
> It doesn't matter much in practice, as opening a new tab has the NTP and never
> hits this path.  Potential fixes could be to checking for process reuse when a
> site-less SI sets a site, or to allow dedicated processes to host
uninitialized
> SiteInstances (they should transfer out if necessary when they navigate).

For potential fix #1, what would we do at the time of setting the site?  That's
at commit time, right?  (That would be too late to transfer to an existing
process.)

For potential fix #2, I'm not sure I understand-- would we randomly put a blank
SiteInstance into a dedicated process and hope that's the site we end up on? 
That seems unlikely to succeed in most cases, but maybe I'm missing the idea.

I could see setting up a transfer once we know we're going to commit a URL that
needs a dedicated (existing) process.  Or rather, letting PlzNavigate switch to
an RFH in that process, since there's not much point in fixing it in the
non-PlzNavigate case.  Either way, I agree there's probably no rush.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
File content/browser/renderer_host/render_process_host_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:2862: // TODO(nick):
Consult the SiteIsolationPolicy here. https://crbug.com/513036
Is this TODO resolved now?

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:2874: return
policy->IsLockedToOrigin(host->GetID(), site_url);
I'm a little concerned about TOCTTOU problems here, since HasOriginLock and
IsLockedToOrigin separately acquire and release the AutoLock.  I'm not sure I
can picture a scenario where it's a problem (since once HasOriginLock returns
true, the value of either method isn't going to change), but it feels fragile.

I suppose we could document the risks on the declarations.  Or we could store
which site the process is locked to on RPHI, so we don't need to acquire the
AutoLock to query it.  Thoughts?

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:3009:
!IsSuitableHost(host, browser_context, site_url))) {
On 2017/06/12 17:35:07, alexmos wrote:
> After my IsSuitableHost modifications, a few extension tests (such as
> WebNavigationApiTest.UserAction) caught this bug, which caused unneeded
process
> swaps (when site_url was used for origin lock comparison).

Nice.  Can you give an example of what the previous values were?  I'm not sure I
follow what the bug was.

(It may also be worth mentioning something to this effect in the comment above,
not necessarily with the example.)

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
File content/browser/renderer_host/render_process_host_unittest.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_unittest.cc:181:
EXPECT_FALSE(site_instance->GetProcess()->CanBecomeDedicatedProcess());
What would happen if kUrl1 was a site requiring a dedicated process?  The name
is a bit odd here, since GetProcess() would be a dedicated process, but
CanBecomeDedicatedProcess() would presumably return false.

(Maybe the later suggestion about the name might help.)

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:215: if (url == url::kAboutBlankURL)
On 2017/06/12 17:35:07, alexmos wrote:
> Several tests (like DumpAccessibilityTreeTest.AccessibilityA or other
> accessibility tests) expect to navigate A->about:blank->A and for all this to
> end up in same process (i.e., no swap for about:blank).  Without this check,
> with --site-per-process, we'd go into IsSuitableHost, which would compare the
> origin lock against "about:" (site_url for about:blank) and fail.  So I added
a
> special case here.

This makes me a bit nervous.  In other places, we've had to deal with
about:srcdoc and data: URLs alongside about:blank (e.g., later in
DetermineSiteInstanceForURL).  Those seem like they would have the same problem.

Is there a way we can reorder or share code to avoid having to copy/paste all
these cases in both places?

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:472:
process_->UnsetCanBecomeDedicatedProcess();
I found it confusing that this was inside "LockToOrigin" but relies on that
method being called in all cases, including normal web sites.  Maybe we should
call it MaybeLockToOrigin or LockToOriginIfNeeded?

There almost might be a bug here-- there are a bunch of early returns in the
block above.  Should this line be the first line of the method, so that it
happens regardless?

(I wondered if it should be done before the call to this method, since it's more
related to the commit/set-site behavior than the origin lock behavior, but maybe
that would require too much duplicating logic across two call sites.)

https://codereview.chromium.org/2921063003/diff/80001/content/browser/web_con...
File content/browser/web_contents/web_contents_impl_unittest.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/web_con...
content/browser/web_contents/web_contents_impl_unittest.cc:1030:
ui::PAGE_TRANSITION_TYPED);
It's unfortunate this wasn't failing already before your CL.  I suppose that's
because we aren't yet killing for committing the wrong URL, due to error pages?

(I wonder if we can start killing for illegal URLs with an exception for error
pages.  It won't provide a security boundary until we handle error pages, but it
will at least help us catch bugs in code that has the wrong expectations.)

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
File content/public/browser/render_process_host.h (right):

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
content/public/browser/render_process_host.h:359: virtual bool
CanBecomeDedicatedProcess() = 0;
On 2017/06/12 17:35:07, alexmos wrote:
> These are public because of MockRenderProcessHost. :(  I struggled with
finding
> good names here.  What I really wanted this to mean was that this is true up
> until the first LockToOrigin call (which might come either when the first
> navigation commits, or when a SiteInstance with a site that's already set
calls
> LockToOrigin).  I.e., this intends to be true for
> uninitialized/untainted/site-less SiteInstances so that IsSuitableHost allows
> their process to be used as a dedicated process.  Without this, the very first
> navigation in a new tab caused a process swap from the initial SiteInstance,
> causing lots of test failures.

Yeah, I wonder if there's a better concept to be tracking here.  Would something
like IsUnused work, with documentation that it gets cleared in any of the
following cases?
1) Commits a page.
2) Given to a SiteInstance that already has a site assigned.
3) Hosts a ServiceWorker or SharedWorker.

That makes it potentially more generally useful, without being specific to the
dedicated process case.

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
content/public/browser/render_process_host.h:359: virtual bool
CanBecomeDedicatedProcess() = 0;
On 2017/06/12 17:35:07, alexmos wrote:
> These are public because of MockRenderProcessHost. :(

Sigh.  Some day we'll fix that.

>  I struggled with finding
> good names here.  What I really wanted this to mean was that this is true up
> until the first LockToOrigin call (which might come either when the first
> navigation commits, or when a SiteInstance with a site that's already set
calls
> LockToOrigin).  I.e., this intends to be true for
> uninitialized/untainted/site-less SiteInstances so that IsSuitableHost allows
> their process to be used as a dedicated process.  Without this, the very first
> navigation in a new tab caused a process swap from the initial SiteInstance,
> causing lots of test failures.

https://codereview.chromium.org/2921063003/diff/80001/content/public/test/moc...
File content/public/test/mock_render_process_host.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/public/test/moc...
content/public/test/mock_render_process_host.cc:300:
UnsetCanBecomeDedicatedProcess();
It's not obvious why these need to be here in MockRenderProcessHost but not
RenderProcessHostImpl.  If it's because the mock case won't have the same
context as the real case, it might be worth a comment to that effect.

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-14 22:22:18 UTC) #27

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/100001

3 years, 6 months ago (2017-06-14 22:22:46 UTC) #28

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-14 22:24:53 UTC) #29

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds/232106) ios-simulator on ...

3 years, 6 months ago (2017-06-14 22:24:54 UTC) #30

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-14 22:36:56 UTC) #31

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/120001

3 years, 6 months ago (2017-06-14 22:37:16 UTC) #32

alexmos

Thanks Charlie - responses below. For HasWrongProcessForURL and CPSP API, I'll wait for your feedback ...

3 years, 6 months ago (2017-06-14 22:39:05 UTC) #33

Thanks Charlie - responses below.  For HasWrongProcessForURL and CPSP API, I'll
wait for your feedback on my responses before making changes.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
File content/browser/isolated_origin_browsertest.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:373:
NavigateIframeToURL(web_contents(), "test_iframe", isolated_foo_url);
On 2017/06/12 23:08:17, Charlie Reis wrote:
> On 2017/06/12 17:35:07, alexmos wrote:
> > Interesting note: without this CL, this actually hangs.  This is because we
> > first attempted the transfer but then, because of (incorrectly) trying to
> reuse
> > the existing http://www.foo.com process, we hit the is_transfer_to_same
path:
> >    bool is_transfer_to_same =
> >         is_transfer &&
> >         entry.transferred_global_request_id().child_id ==
> >             dest_render_frame_host->GetProcess()->GetID();
> > 
> > because we compare process IDs and not SiteInstances (the SiteInstances are
> > different but process ID is the same due to the reuse).  This causes us to
not
> > send the Navigate IPC.  I wonder if this can come up in other cases as well,
> and
> > whether we should avoid comparing process IDs in the is_transfer_to_same
check
> > due to this.
> 
> Wow, nice find-- that could definitely come up in other cases.  It's perhaps
not
> as common as it could be since most sites don't trigger a cross-process
redirect
> and those that do (e.g., CWS) tend to require a different process.  But I
could
> see it happening for a redirect from one extension to another when we're over
> the limit and the extensions end up sharing a process.  Can you file a bug?

Done - filed https://crbug.com/733023.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:402: isolated_foo_instance);
On 2017/06/12 23:08:17, Charlie Reis wrote:
> Can you also check that isolated_foo_instance and the previous subframe
> SiteInstance are not related, as the comment mentions?

Done.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/isolate...
content/browser/isolated_origin_browsertest.cc:415:
EXPECT_EQ(child->current_frame_host()->GetProcess(), isolated_foo_process);
On 2017/06/12 23:08:17, Charlie Reis wrote:
> On 2017/06/12 17:35:07, alexmos wrote:
> > I decided to deal with this for a followup to keep this CL simpler.  The
> > sequence of events is:
> > - start with all processes hosting only dedicated sites
> > - create new tab with site-less SiteInstance
> > - call GetProcess() on that SiteInstance.  This creates a new process since
> none
> > of the dedicated processes can host it.
> > - navigate new tab to a site that has an existing dedicated process.  Even
> > though process reuse is possible, we hit the checks that say that a
site-less
> > SiteInstance that sets a site should stay in the same process.
> > 
> > It doesn't matter much in practice, as opening a new tab has the NTP and
never
> > hits this path.  Potential fixes could be to checking for process reuse when
a
> > site-less SI sets a site, or to allow dedicated processes to host
> uninitialized
> > SiteInstances (they should transfer out if necessary when they navigate).
> 
> For potential fix #1, what would we do at the time of setting the site? 
That's
> at commit time, right?  (That would be too late to transfer to an existing
> process.)

Yeah, agreed that commit-time SetSite is too late (without PlzNavigate).

> 
> For potential fix #2, I'm not sure I understand-- would we randomly put a
blank
> SiteInstance into a dedicated process and hope that's the site we end up on? 
> That seems unlikely to succeed in most cases, but maybe I'm missing the idea.

Yes, the idea was to allow a blank SiteInstance into a dedicated process and
then transfer it out if it turns out to be for a different site.  This does seem
somewhat fragile though.

> 
> I could see setting up a transfer once we know we're going to commit a URL
that
> needs a dedicated (existing) process.  Or rather, letting PlzNavigate switch
to
> an RFH in that process, since there's not much point in fixing it in the
> non-PlzNavigate case.  Either way, I agree there's probably no rush.

I agree this will be easier with PlzNavigate, so maybe we can just wait until
PlzNavigate ships before tackling this, since there's no rush.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
File content/browser/renderer_host/render_process_host_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:2862: // TODO(nick):
Consult the SiteIsolationPolicy here. https://crbug.com/513036
On 2017/06/12 23:08:17, Charlie Reis wrote:
> Is this TODO resolved now?

Indeed, removed.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:2874: return
policy->IsLockedToOrigin(host->GetID(), site_url);
On 2017/06/12 23:08:17, Charlie Reis wrote:
> I'm a little concerned about TOCTTOU problems here, since HasOriginLock and
> IsLockedToOrigin separately acquire and release the AutoLock.  I'm not sure I
> can picture a scenario where it's a problem (since once HasOriginLock returns
> true, the value of either method isn't going to change), but it feels fragile.
> 
> I suppose we could document the risks on the declarations.  Or we could store
> which site the process is locked to on RPHI, so we don't need to acquire the
> AutoLock to query it.  Thoughts?

Yes, good questions.  Since we always set origin locks on the UI thread, I don't
think this will be a problem in practice.  We could add documentation/DCHECK to
that effect in ChildProcessSecurityPolicyImpl::LockToOrigin.

I did think about this, and in an earlier PS, I actually had a version of this
which simply queried policy->GetOriginLock(id), which we can then check for
is_empty() and for site_url equality.  I ultimately moved away from that, as I
thought it was better to keep origin lock checks encapsulated inside CPSP.

Another option could be for IsLockedToOrigin(id, site_url) to return an enum 
corresponding to the three {no_lock, wrong_lock, right_lock} cases.  And perhaps
rename it to CheckOriginLock.  This makes it a bit more difficult to use, but it
avoids the TOCTTOU concerns completely, and also forces call sites to explicitly
reason about whether or not they care about an empty origin lock.

WDYT?

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:3009:
!IsSuitableHost(host, browser_context, site_url))) {
On 2017/06/12 23:08:17, Charlie Reis wrote:
> On 2017/06/12 17:35:07, alexmos wrote:
> > After my IsSuitableHost modifications, a few extension tests (such as
> > WebNavigationApiTest.UserAction) caught this bug, which caused unneeded
> process
> > swaps (when site_url was used for origin lock comparison).

Sorry, typo here - this should've just been "when |url| was used for origin lock
comparison."

> 
> Nice.  Can you give an example of what the previous values were?  I'm not sure
I
> follow what the bug was.
> 
> (It may also be worth mentioning something to this effect in the comment
above,
> not necessarily with the example.)

So, for example, the old code would pass in
  chrome-extension://plbgfcnklnmpjkhjcfeaibohlaohmjhc/a.html?32773
to IsSuitableHost, which would compare that against the origin lock of
  chrome-extension://plbgfcnklnmpjkhjcfeaibohlaohmjhc/
and fail, which would result in returning null from here and not staying in the
existing extension process.

I added a short comment about this; I think it's just a straightforward case of
passing the wrong thing.  It'd be nice if site URLs used a different type to
avoid these kinds of bugs, since this isn't the first time it happened (I think
Nick fixed another one somewhere).

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
File content/browser/renderer_host/render_process_host_unittest.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_unittest.cc:181:
EXPECT_FALSE(site_instance->GetProcess()->CanBecomeDedicatedProcess());
On 2017/06/12 23:08:17, Charlie Reis (slow) wrote:
> What would happen if kUrl1 was a site requiring a dedicated process?  The name
> is a bit odd here, since GetProcess() would be a dedicated process, but
> CanBecomeDedicatedProcess() would presumably return false.
> 
> (Maybe the later suggestion about the name might help.)

Yes, I renamed the method and reworded the comment to avoid this confusion.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:215: if (url == url::kAboutBlankURL)
On 2017/06/12 23:08:17, Charlie Reis wrote:
> On 2017/06/12 17:35:07, alexmos wrote:
> > Several tests (like DumpAccessibilityTreeTest.AccessibilityA or other
> > accessibility tests) expect to navigate A->about:blank->A and for all this
to
> > end up in same process (i.e., no swap for about:blank).  Without this check,
> > with --site-per-process, we'd go into IsSuitableHost, which would compare
the
> > origin lock against "about:" (site_url for about:blank) and fail.  So I
added
> a
> > special case here.
> 
> This makes me a bit nervous.  In other places, we've had to deal with
> about:srcdoc and data: URLs alongside about:blank (e.g., later in
> DetermineSiteInstanceForURL).  Those seem like they would have the same
problem.
> 
> Is there a way we can reorder or share code to avoid having to copy/paste all
> these cases in both places?

This is a bit tricky.  Just to clarify, my intent here was to fix only
browser-initiated navigations (the tests were doing this via NavigateToURL), and
the DetermineSiteInstanceForURL special cases only matter for renderer-initiated
navigations (since they kick in only when source_instance is not null).  The
renderer-initiated navigations don't really reach the change here: subframes
will replace about:blank, data:, and other unique-origin URLs with the source
SiteInstance's site URL in CanSubframeSwapProcess before reaching
IsRendererTransferNeededForNavigation -> IsCurrentlySameSite ->
HasWrongProcessForURL.  For top-level frames, we will hit the logic you
mentioned in DetermineSiteInstanceForURL before ever getting here.  But for
browser-initiated navigations, DetermineSiteInstanceForURL doesn't use the
about:blank logic (since source_instance is null), we fall through to check
IsCurrentlySameSite, which first checks HasWrongProcessForURL and then
SiteInstance::IsSameWebSite.  Before this CL, HasWrongProcessForURL was false
(i.e., process was good), and SiteInstance::IsSameWebSite handled this case
here:

  // If the destination url is just a blank page, we treat them as part of the
  // same site.
  GURL blank_page(url::kAboutBlankURL);
  if (dest_url == blank_page)
    return true;

After this CL, the origin locks caused HasWrongProcessForURL to return true
before we could get to that about:blank special case in IsSameWebSite and say
that it's really same-site.  So the reason I only covered about:blank here was
to mirror the case in IsSameWebSite.

I could merge the about:blank usage in HasWrongProcessForURL and in
IsSameWebSite into something like ShouldConsiderURLSameSite, but do you think
this is worth it for just about:blank?  I don't think we want to do this for
data:, which isn't same-site, and it feels safer if HasWrongProcessForURL forces
a process swap for a browser-initiated data: navigation from a site with a
dedicated process.  (Also, actually handling it and returning false would have
no effect, as  it would force a swap anyway later on, because IsSameWebSite will
return false.)  It also doesn't seem like you can get a browser-initiated
about:srcdoc navigation, can you?

It's possible that we could avoid this special case and just force a process
swap for browser-initiated about:blank navigations (e.g., if user types in
about:blank into omnibox), and fix the accessibility test harness to expect the
process swap, though I don't know how difficult that'd be, and swapping on
about:blank still seems a bit wasteful.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:472:
process_->UnsetCanBecomeDedicatedProcess();
On 2017/06/12 23:08:17, Charlie Reis wrote:
> I found it confusing that this was inside "LockToOrigin" but relies on that
> method being called in all cases, including normal web sites.  Maybe we should
> call it MaybeLockToOrigin or LockToOriginIfNeeded?
> 
> There almost might be a bug here-- there are a bunch of early returns in the
> block above.  Should this line be the first line of the method, so that it
> happens regardless?
> 
> (I wondered if it should be done before the call to this method, since it's
more
> related to the commit/set-site behavior than the origin lock behavior, but
maybe
> that would require too much duplicating logic across two call sites.)

Great questions, and I agree with your concerns.  I moved up setting of the flag
to be done as first thing in the method and renamed it to LockToOriginIfNeeded. 
And yes, I put it here because I didn't want to duplicate the logic/comment
across the two call sites, and was also concerned with us missing a new call
site down the road.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/web_con...
File content/browser/web_contents/web_contents_impl_unittest.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/web_con...
content/browser/web_contents/web_contents_impl_unittest.cc:1030:
ui::PAGE_TRANSITION_TYPED);
On 2017/06/12 23:08:17, Charlie Reis wrote:
> It's unfortunate this wasn't failing already before your CL.  I suppose that's
> because we aren't yet killing for committing the wrong URL, due to error
pages?
> 
> (I wonder if we can start killing for illegal URLs with an exception for error
> pages.  It won't provide a security boundary until we handle error pages, but
it
> will at least help us catch bugs in code that has the wrong expectations.)

Yes, and that's a really good idea to start catching any other bugs with these
kills and not have the enforcement be blocked on error pages.  Let's explore
that in a followup.

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
File content/public/browser/render_process_host.h (right):

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
content/public/browser/render_process_host.h:359: virtual bool
CanBecomeDedicatedProcess() = 0;
On 2017/06/12 23:08:17, Charlie Reis (slow) wrote:
> On 2017/06/12 17:35:07, alexmos wrote:
> > These are public because of MockRenderProcessHost. :(  I struggled with
> finding
> > good names here.  What I really wanted this to mean was that this is true up
> > until the first LockToOrigin call (which might come either when the first
> > navigation commits, or when a SiteInstance with a site that's already set
> calls
> > LockToOrigin).  I.e., this intends to be true for
> > uninitialized/untainted/site-less SiteInstances so that IsSuitableHost
allows
> > their process to be used as a dedicated process.  Without this, the very
first
> > navigation in a new tab caused a process swap from the initial SiteInstance,
> > causing lots of test failures.
> 
> Yeah, I wonder if there's a better concept to be tracking here.  Would
something
> like IsUnused work, with documentation that it gets cleared in any of the
> following cases?
> 1) Commits a page.
> 2) Given to a SiteInstance that already has a site assigned.
> 3) Hosts a ServiceWorker or SharedWorker.
> 
> That makes it potentially more generally useful, without being specific to the
> dedicated process case.

I like this, especially given the potential for this to be used elsewhere. 
Changed.

(Note that I didn't add (3) to the comment - see my explanation in
https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere....
 Though I guess I could say that ServiceWorkers are implicitly handled via (2),
and SharedWorkers are implicitly handled via (1).)

https://codereview.chromium.org/2921063003/diff/80001/content/public/test/moc...
File content/public/test/mock_render_process_host.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/public/test/moc...
content/public/test/mock_render_process_host.cc:300:
UnsetCanBecomeDedicatedProcess();
On 2017/06/12 23:08:17, Charlie Reis wrote:
> It's not obvious why these need to be here in MockRenderProcessHost but not
> RenderProcessHostImpl.  If it's because the mock case won't have the same
> context as the real case, it might be worth a comment to that effect.

Ah, good catch.  These shouldn't be here, I just forgot to clean them up.  I
started out by doing this in both RenderProcessHostImpl and
MockRenderProcessHost but then persuaded myself that it isn't needed (see
https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...)

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-14 23:44:24 UTC) #34

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/449908)

3 years, 6 months ago (2017-06-14 23:44:25 UTC) #35

alexmos

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_instance_impl.cc File content/browser/site_instance_impl.cc (right): https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_instance_impl.cc#newcode472 content/browser/site_instance_impl.cc:472: process_->UnsetCanBecomeDedicatedProcess(); On 2017/06/14 22:39:05, alexmos wrote: > On 2017/06/12 ...

3 years, 6 months ago (2017-06-15 16:40:54 UTC) #36

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-15 17:24:37 UTC) #37

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/140001

3 years, 6 months ago (2017-06-15 17:24:56 UTC) #38

alexmos

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_instance_impl.cc File content/browser/site_instance_impl.cc (right): https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_instance_impl.cc#newcode472 content/browser/site_instance_impl.cc:472: process_->UnsetCanBecomeDedicatedProcess(); On 2017/06/15 16:40:54, alexmos wrote: > On 2017/06/14 ...

3 years, 6 months ago (2017-06-15 17:27:59 UTC) #39

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:472:
process_->UnsetCanBecomeDedicatedProcess();
On 2017/06/15 16:40:54, alexmos wrote:
> On 2017/06/14 22:39:05, alexmos wrote:
> > On 2017/06/12 23:08:17, Charlie Reis wrote:
> > > I found it confusing that this was inside "LockToOrigin" but relies on
that
> > > method being called in all cases, including normal web sites.  Maybe we
> should
> > > call it MaybeLockToOrigin or LockToOriginIfNeeded?
> > > 
> > > There almost might be a bug here-- there are a bunch of early returns in
the
> > > block above.  Should this line be the first line of the method, so that it
> > > happens regardless?
> > > 
> > > (I wondered if it should be done before the call to this method, since
it's
> > more
> > > related to the commit/set-site behavior than the origin lock behavior, but
> > maybe
> > > that would require too much duplicating logic across two call sites.)
> > 
> > Great questions, and I agree with your concerns.  I moved up setting of the
> flag
> > to be done as first thing in the method and renamed it to
> LockToOriginIfNeeded. 
> > And yes, I put it here because I didn't want to duplicate the logic/comment
> > across the two call sites, and was also concerned with us missing a new call
> > site down the road.
> 
> Hmm, the trybots show that this wasn't as straightforward as I hoped.  Moving
it
> up exposed an issue with these early return cases, such as for WebUI.  These
> return true from RequiresDedicatedProcess (when running with
--site-per-process
> for WebUI), yet they don't set the origin lock.  My new check in
IsSuitableHost
> would incorrectly return false in those cases, since from its point of view,
the
> process looks like it's "used" and the site_url requires a dedicated process. 
> I'm working on a fix.

I've just uploaded the fix (PS8), which is essentially to share the logic of
whether an origin lock is required between LockToOrigin and IsSuitableHost. 
Let's see what the trybots say.

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-15 18:43:55 UTC) #40

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 6 months ago (2017-06-15 18:43:56 UTC) #41

Charlie Reis

Thanks! This basically looks good, and let's go with your enum idea for the CPSP ...

3 years, 6 months ago (2017-06-17 23:13:53 UTC) #42

Thanks!  This basically looks good, and let's go with your enum idea for the
CPSP API.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
File content/browser/renderer_host/render_process_host_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:2874: return
policy->IsLockedToOrigin(host->GetID(), site_url);
On 2017/06/14 22:39:05, alexmos wrote:
> On 2017/06/12 23:08:17, Charlie Reis wrote:
> > I'm a little concerned about TOCTTOU problems here, since HasOriginLock and
> > IsLockedToOrigin separately acquire and release the AutoLock.  I'm not sure
I
> > can picture a scenario where it's a problem (since once HasOriginLock
returns
> > true, the value of either method isn't going to change), but it feels
fragile.
> > 
> > I suppose we could document the risks on the declarations.  Or we could
store
> > which site the process is locked to on RPHI, so we don't need to acquire the
> > AutoLock to query it.  Thoughts?
> 
> Yes, good questions.  Since we always set origin locks on the UI thread, I
don't
> think this will be a problem in practice.  We could add documentation/DCHECK
to
> that effect in ChildProcessSecurityPolicyImpl::LockToOrigin.
> 
> I did think about this, and in an earlier PS, I actually had a version of this
> which simply queried policy->GetOriginLock(id), which we can then check for
> is_empty() and for site_url equality.  I ultimately moved away from that, as I
> thought it was better to keep origin lock checks encapsulated inside CPSP.
> 
> Another option could be for IsLockedToOrigin(id, site_url) to return an enum 
> corresponding to the three {no_lock, wrong_lock, right_lock} cases.  And
perhaps
> rename it to CheckOriginLock.  This makes it a bit more difficult to use, but
it
> avoids the TOCTTOU concerns completely, and also forces call sites to
explicitly
> reason about whether or not they care about an empty origin lock.
> 
> WDYT?

I really like the enum idea, especially because it makes people think about the
various cases when checking the origin lock.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:3009:
!IsSuitableHost(host, browser_context, site_url))) {
On 2017/06/14 22:39:05, alexmos wrote:
> On 2017/06/12 23:08:17, Charlie Reis wrote:
> > On 2017/06/12 17:35:07, alexmos wrote:
> > > After my IsSuitableHost modifications, a few extension tests (such as
> > > WebNavigationApiTest.UserAction) caught this bug, which caused unneeded
> > process
> > > swaps (when site_url was used for origin lock comparison).
> 
> Sorry, typo here - this should've just been "when |url| was used for origin
lock
> comparison."
> 
> > 
> > Nice.  Can you give an example of what the previous values were?  I'm not
sure
> I
> > follow what the bug was.
> > 
> > (It may also be worth mentioning something to this effect in the comment
> above,
> > not necessarily with the example.)
> 
> So, for example, the old code would pass in
>   chrome-extension://plbgfcnklnmpjkhjcfeaibohlaohmjhc/a.html?32773
> to IsSuitableHost, which would compare that against the origin lock of
>   chrome-extension://plbgfcnklnmpjkhjcfeaibohlaohmjhc/
> and fail, which would result in returning null from here and not staying in
the
> existing extension process.

I see now-- thanks!

> I added a short comment about this; I think it's just a straightforward case
of
> passing the wrong thing.  It'd be nice if site URLs used a different type to
> avoid these kinds of bugs, since this isn't the first time it happened (I
think
> Nick fixed another one somewhere).

Agreed.  We wanted to introduce a notion of a principal earlier (I think in
https://crbug.com/109792 and perhaps another bug having to do with
BrowserPlugin's guest SiteInstances), but it was vetoed at the time.  I think
we'll want to revisit that, especially as (1) the definition of site changes to
sometimes include origins and/or ports, (2) we want to introduce knowledge of
StoragePartition to support cross-process navigations and OOPIFs in guests, and
(3) we continue to discover more reasons that GURL or Origin alone are not
sufficient to represent a security principal.

I think this will be a fun project, once we have a chance.  Want to file a bug
for it?

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:215: if (url == url::kAboutBlankURL)
On 2017/06/14 22:39:05, alexmos wrote:
> On 2017/06/12 23:08:17, Charlie Reis wrote:
> > On 2017/06/12 17:35:07, alexmos wrote:
> > > Several tests (like DumpAccessibilityTreeTest.AccessibilityA or other
> > > accessibility tests) expect to navigate A->about:blank->A and for all this
> to
> > > end up in same process (i.e., no swap for about:blank).  Without this
check,
> > > with --site-per-process, we'd go into IsSuitableHost, which would compare
> the
> > > origin lock against "about:" (site_url for about:blank) and fail.  So I
> added
> > a
> > > special case here.
> > 
> > This makes me a bit nervous.  In other places, we've had to deal with
> > about:srcdoc and data: URLs alongside about:blank (e.g., later in
> > DetermineSiteInstanceForURL).  Those seem like they would have the same
> problem.
> > 
> > Is there a way we can reorder or share code to avoid having to copy/paste
all
> > these cases in both places?
> 
> This is a bit tricky.  Just to clarify, my intent here was to fix only
> browser-initiated navigations (the tests were doing this via NavigateToURL),
and
> the DetermineSiteInstanceForURL special cases only matter for
renderer-initiated
> navigations (since they kick in only when source_instance is not null).  The
> renderer-initiated navigations don't really reach the change here: subframes
> will replace about:blank, data:, and other unique-origin URLs with the source
> SiteInstance's site URL in CanSubframeSwapProcess before reaching
> IsRendererTransferNeededForNavigation -> IsCurrentlySameSite ->
> HasWrongProcessForURL.  For top-level frames, we will hit the logic you
> mentioned in DetermineSiteInstanceForURL before ever getting here.  But for
> browser-initiated navigations, DetermineSiteInstanceForURL doesn't use the
> about:blank logic (since source_instance is null), we fall through to check
> IsCurrentlySameSite, which first checks HasWrongProcessForURL and then
> SiteInstance::IsSameWebSite.  Before this CL, HasWrongProcessForURL was false
> (i.e., process was good), and SiteInstance::IsSameWebSite handled this case
> here:
> 
>   // If the destination url is just a blank page, we treat them as part of the
>   // same site.
>   GURL blank_page(url::kAboutBlankURL);
>   if (dest_url == blank_page)
>     return true;
> 
> After this CL, the origin locks caused HasWrongProcessForURL to return true
> before we could get to that about:blank special case in IsSameWebSite and say
> that it's really same-site.  So the reason I only covered about:blank here was
> to mirror the case in IsSameWebSite.
> 
> I could merge the about:blank usage in HasWrongProcessForURL and in
> IsSameWebSite into something like ShouldConsiderURLSameSite, but do you think
> this is worth it for just about:blank?  I don't think we want to do this for
> data:, which isn't same-site, and it feels safer if HasWrongProcessForURL
forces
> a process swap for a browser-initiated data: navigation from a site with a
> dedicated process.  (Also, actually handling it and returning false would have
> no effect, as  it would force a swap anyway later on, because IsSameWebSite
will
> return false.)  It also doesn't seem like you can get a browser-initiated
> about:srcdoc navigation, can you?
> 
> It's possible that we could avoid this special case and just force a process
> swap for browser-initiated about:blank navigations (e.g., if user types in
> about:blank into omnibox), and fix the accessibility test harness to expect
the
> process swap, though I don't know how difficult that'd be, and swapping on
> about:blank still seems a bit wasteful.

Interesting points.  I think you're right that we can keep this as you have it. 
I'm sure I won't remember the distinction between this and the
renderer-initiated case (which does care about data: and about:srcdoc), so maybe
there's a way to leave a short explanation in a comment why checking about:blank
is sufficient here?

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
File content/public/browser/render_process_host.h (right):

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
content/public/browser/render_process_host.h:359: virtual bool
CanBecomeDedicatedProcess() = 0;
On 2017/06/14 22:39:05, alexmos wrote:
> On 2017/06/12 23:08:17, Charlie Reis (slow) wrote:
> > On 2017/06/12 17:35:07, alexmos wrote:
> > > These are public because of MockRenderProcessHost. :(  I struggled with
> > finding
> > > good names here.  What I really wanted this to mean was that this is true
up
> > > until the first LockToOrigin call (which might come either when the first
> > > navigation commits, or when a SiteInstance with a site that's already set
> > calls
> > > LockToOrigin).  I.e., this intends to be true for
> > > uninitialized/untainted/site-less SiteInstances so that IsSuitableHost
> allows
> > > their process to be used as a dedicated process.  Without this, the very
> first
> > > navigation in a new tab caused a process swap from the initial
SiteInstance,
> > > causing lots of test failures.
> > 
> > Yeah, I wonder if there's a better concept to be tracking here.  Would
> something
> > like IsUnused work, with documentation that it gets cleared in any of the
> > following cases?
> > 1) Commits a page.
> > 2) Given to a SiteInstance that already has a site assigned.
> > 3) Hosts a ServiceWorker or SharedWorker.
> > 
> > That makes it potentially more generally useful, without being specific to
the
> > dedicated process case.
> 
> I like this, especially given the potential for this to be used elsewhere. 
> Changed.
> 
> (Note that I didn't add (3) to the comment - see my explanation in
>
https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere....
>  Though I guess I could say that ServiceWorkers are implicitly handled via
(2),
> and SharedWorkers are implicitly handled via (1).)

Yes, saying it's implicit in those cases might be worthwhile, just to show that
those cases are covered.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/render...
File content/browser/renderer_host/render_process_host_impl.h (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/render...
content/browser/renderer_host/render_process_host_impl.h:626: // not any web
content, and it has not been given to a SiteInstance that has
nit: has not committed any web content

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.cc:415: // Guest processes cannot be locked
to its site because guests always have
nit: s/its/their/

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.cc:423: // TODO(creis, nick)
https://crbug.com/510588 Chrome UI pages use the same
Oh, I see.  Actually, dbeam@ was just working on a CL
(https://codereview.chromium.org/2931243002/) that may unblock a lot of that, if
I understood correctly.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
File content/browser/site_instance_impl.h (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:152: // Returns true unless a process for
|site_url| cannot be locked to just that
nit: s/unless/if/, s/cannot/should/
(Otherwise it's too hard to tell if the comment and method name agree.)

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:153: // site. Returning true here implies
that |site_url| requires a dedicated
nit: also implies

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:154: // process. However, the converse does
not hold: if |site_url| requires a
I'm finding this pretty hard to read, since "lock to origin" and "requires
dedicated process" sound synonymous.  Can you explain the difference in the
comment a bit more?

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:156: // cases, most of which are temporary
bug workarounds.
"Temporary bug workarounds" is confusing me a bit.  I thought WebUI and
extensions were the cases that required "dedicated" processes but weren't locked
to origin, but those don't sound like temporary bug workarounds.  (I thought
that at least the extension case was about letting extensions share with each
other but not other types of processes.)  Are there plans to change that?

alexmos

The CQ bit was checked by alexmos@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-19 20:01:23 UTC) #43

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/160001

3 years, 6 months ago (2017-06-19 20:01:38 UTC) #44

alexmos

Thanks for reviewing! More responses below. https://codereview.chromium.org/2921063003/diff/80001/content/browser/renderer_host/render_process_host_impl.cc File content/browser/renderer_host/render_process_host_impl.cc (right): https://codereview.chromium.org/2921063003/diff/80001/content/browser/renderer_host/render_process_host_impl.cc#newcode2874 content/browser/renderer_host/render_process_host_impl.cc:2874: return policy->IsLockedToOrigin(host->GetID(), site_url); ...

3 years, 6 months ago (2017-06-19 20:03:59 UTC) #45

Thanks for reviewing!  More responses below.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
File content/browser/renderer_host/render_process_host_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:2874: return
policy->IsLockedToOrigin(host->GetID(), site_url);
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> On 2017/06/14 22:39:05, alexmos wrote:
> > On 2017/06/12 23:08:17, Charlie Reis wrote:
> > > I'm a little concerned about TOCTTOU problems here, since HasOriginLock
and
> > > IsLockedToOrigin separately acquire and release the AutoLock.  I'm not
sure
> I
> > > can picture a scenario where it's a problem (since once HasOriginLock
> returns
> > > true, the value of either method isn't going to change), but it feels
> fragile.
> > > 
> > > I suppose we could document the risks on the declarations.  Or we could
> store
> > > which site the process is locked to on RPHI, so we don't need to acquire
the
> > > AutoLock to query it.  Thoughts?
> > 
> > Yes, good questions.  Since we always set origin locks on the UI thread, I
> don't
> > think this will be a problem in practice.  We could add documentation/DCHECK
> to
> > that effect in ChildProcessSecurityPolicyImpl::LockToOrigin.
> > 
> > I did think about this, and in an earlier PS, I actually had a version of
this
> > which simply queried policy->GetOriginLock(id), which we can then check for
> > is_empty() and for site_url equality.  I ultimately moved away from that, as
I
> > thought it was better to keep origin lock checks encapsulated inside CPSP.
> > 
> > Another option could be for IsLockedToOrigin(id, site_url) to return an enum

> > corresponding to the three {no_lock, wrong_lock, right_lock} cases.  And
> perhaps
> > rename it to CheckOriginLock.  This makes it a bit more difficult to use,
but
> it
> > avoids the TOCTTOU concerns completely, and also forces call sites to
> explicitly
> > reason about whether or not they care about an empty origin lock.
> > 
> > WDYT?
> 
> I really like the enum idea, especially because it makes people think about
the
> various cases when checking the origin lock.

Done.

https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere...
content/browser/renderer_host/render_process_host_impl.cc:3009:
!IsSuitableHost(host, browser_context, site_url))) {
On 2017/06/17 23:13:52, Charlie Reis (slow) wrote:
> On 2017/06/14 22:39:05, alexmos wrote:
> > On 2017/06/12 23:08:17, Charlie Reis wrote:
> > > On 2017/06/12 17:35:07, alexmos wrote:
> > > > After my IsSuitableHost modifications, a few extension tests (such as
> > > > WebNavigationApiTest.UserAction) caught this bug, which caused unneeded
> > > process
> > > > swaps (when site_url was used for origin lock comparison).
> > 
> > Sorry, typo here - this should've just been "when |url| was used for origin
> lock
> > comparison."
> > 
> > > 
> > > Nice.  Can you give an example of what the previous values were?  I'm not
> sure
> > I
> > > follow what the bug was.
> > > 
> > > (It may also be worth mentioning something to this effect in the comment
> > above,
> > > not necessarily with the example.)
> > 
> > So, for example, the old code would pass in
> >   chrome-extension://plbgfcnklnmpjkhjcfeaibohlaohmjhc/a.html?32773
> > to IsSuitableHost, which would compare that against the origin lock of
> >   chrome-extension://plbgfcnklnmpjkhjcfeaibohlaohmjhc/
> > and fail, which would result in returning null from here and not staying in
> the
> > existing extension process.
> 
> I see now-- thanks!
> 
> 
> > I added a short comment about this; I think it's just a straightforward case
> of
> > passing the wrong thing.  It'd be nice if site URLs used a different type to
> > avoid these kinds of bugs, since this isn't the first time it happened (I
> think
> > Nick fixed another one somewhere).
> 
> Agreed.  We wanted to introduce a notion of a principal earlier (I think in
> https://crbug.com/109792 and perhaps another bug having to do with
> BrowserPlugin's guest SiteInstances), but it was vetoed at the time.  I think
> we'll want to revisit that, especially as (1) the definition of site changes
to
> sometimes include origins and/or ports, (2) we want to introduce knowledge of
> StoragePartition to support cross-process navigations and OOPIFs in guests,
and
> (3) we continue to discover more reasons that GURL or Origin alone are not
> sufficient to represent a security principal.
> 
> I think this will be a fun project, once we have a chance.  Want to file a bug
> for it?

Done - filed https://bugs.chromium.org/p/chromium/issues/detail?id=734722

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:215: if (url == url::kAboutBlankURL)
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> On 2017/06/14 22:39:05, alexmos wrote:
> > On 2017/06/12 23:08:17, Charlie Reis wrote:
> > > On 2017/06/12 17:35:07, alexmos wrote:
> > > > Several tests (like DumpAccessibilityTreeTest.AccessibilityA or other
> > > > accessibility tests) expect to navigate A->about:blank->A and for all
this
> > to
> > > > end up in same process (i.e., no swap for about:blank).  Without this
> check,
> > > > with --site-per-process, we'd go into IsSuitableHost, which would
compare
> > the
> > > > origin lock against "about:" (site_url for about:blank) and fail.  So I
> > added
> > > a
> > > > special case here.
> > > 
> > > This makes me a bit nervous.  In other places, we've had to deal with
> > > about:srcdoc and data: URLs alongside about:blank (e.g., later in
> > > DetermineSiteInstanceForURL).  Those seem like they would have the same
> > problem.
> > > 
> > > Is there a way we can reorder or share code to avoid having to copy/paste
> all
> > > these cases in both places?
> > 
> > This is a bit tricky.  Just to clarify, my intent here was to fix only
> > browser-initiated navigations (the tests were doing this via NavigateToURL),
> and
> > the DetermineSiteInstanceForURL special cases only matter for
> renderer-initiated
> > navigations (since they kick in only when source_instance is not null).  The
> > renderer-initiated navigations don't really reach the change here: subframes
> > will replace about:blank, data:, and other unique-origin URLs with the
source
> > SiteInstance's site URL in CanSubframeSwapProcess before reaching
> > IsRendererTransferNeededForNavigation -> IsCurrentlySameSite ->
> > HasWrongProcessForURL.  For top-level frames, we will hit the logic you
> > mentioned in DetermineSiteInstanceForURL before ever getting here.  But for
> > browser-initiated navigations, DetermineSiteInstanceForURL doesn't use the
> > about:blank logic (since source_instance is null), we fall through to check
> > IsCurrentlySameSite, which first checks HasWrongProcessForURL and then
> > SiteInstance::IsSameWebSite.  Before this CL, HasWrongProcessForURL was
false
> > (i.e., process was good), and SiteInstance::IsSameWebSite handled this case
> > here:
> > 
> >   // If the destination url is just a blank page, we treat them as part of
the
> >   // same site.
> >   GURL blank_page(url::kAboutBlankURL);
> >   if (dest_url == blank_page)
> >     return true;
> > 
> > After this CL, the origin locks caused HasWrongProcessForURL to return true
> > before we could get to that about:blank special case in IsSameWebSite and
say
> > that it's really same-site.  So the reason I only covered about:blank here
was
> > to mirror the case in IsSameWebSite.
> > 
> > I could merge the about:blank usage in HasWrongProcessForURL and in
> > IsSameWebSite into something like ShouldConsiderURLSameSite, but do you
think
> > this is worth it for just about:blank?  I don't think we want to do this for
> > data:, which isn't same-site, and it feels safer if HasWrongProcessForURL
> forces
> > a process swap for a browser-initiated data: navigation from a site with a
> > dedicated process.  (Also, actually handling it and returning false would
have
> > no effect, as  it would force a swap anyway later on, because IsSameWebSite
> will
> > return false.)  It also doesn't seem like you can get a browser-initiated
> > about:srcdoc navigation, can you?
> > 
> > It's possible that we could avoid this special case and just force a process
> > swap for browser-initiated about:blank navigations (e.g., if user types in
> > about:blank into omnibox), and fix the accessibility test harness to expect
> the
> > process swap, though I don't know how difficult that'd be, and swapping on
> > about:blank still seems a bit wasteful.
> 
> Interesting points.  I think you're right that we can keep this as you have
it. 
> I'm sure I won't remember the distinction between this and the
> renderer-initiated case (which does care about data: and about:srcdoc), so
maybe
> there's a way to leave a short explanation in a comment why checking
about:blank
> is sufficient here?

Comment added - please let me know if it's helpful enough.

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
File content/public/browser/render_process_host.h (right):

https://codereview.chromium.org/2921063003/diff/80001/content/public/browser/...
content/public/browser/render_process_host.h:359: virtual bool
CanBecomeDedicatedProcess() = 0;
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> On 2017/06/14 22:39:05, alexmos wrote:
> > On 2017/06/12 23:08:17, Charlie Reis (slow) wrote:
> > > On 2017/06/12 17:35:07, alexmos wrote:
> > > > These are public because of MockRenderProcessHost. :(  I struggled with
> > > finding
> > > > good names here.  What I really wanted this to mean was that this is
true
> up
> > > > until the first LockToOrigin call (which might come either when the
first
> > > > navigation commits, or when a SiteInstance with a site that's already
set
> > > calls
> > > > LockToOrigin).  I.e., this intends to be true for
> > > > uninitialized/untainted/site-less SiteInstances so that IsSuitableHost
> > allows
> > > > their process to be used as a dedicated process.  Without this, the very
> > first
> > > > navigation in a new tab caused a process swap from the initial
> SiteInstance,
> > > > causing lots of test failures.
> > > 
> > > Yeah, I wonder if there's a better concept to be tracking here.  Would
> > something
> > > like IsUnused work, with documentation that it gets cleared in any of the
> > > following cases?
> > > 1) Commits a page.
> > > 2) Given to a SiteInstance that already has a site assigned.
> > > 3) Hosts a ServiceWorker or SharedWorker.
> > > 
> > > That makes it potentially more generally useful, without being specific to
> the
> > > dedicated process case.
> > 
> > I like this, especially given the potential for this to be used elsewhere. 
> > Changed.
> > 
> > (Note that I didn't add (3) to the comment - see my explanation in
> >
>
https://codereview.chromium.org/2921063003/diff/80001/content/browser/rendere....
> >  Though I guess I could say that ServiceWorkers are implicitly handled via
> (2),
> > and SharedWorkers are implicitly handled via (1).)
> 
> Yes, saying it's implicit in those cases might be worthwhile, just to show
that
> those cases are covered.

Done.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/render...
File content/browser/renderer_host/render_process_host_impl.h (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/render...
content/browser/renderer_host/render_process_host_impl.h:626: // not any web
content, and it has not been given to a SiteInstance that has
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> nit: has not committed any web content

Done.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.cc:415: // Guest processes cannot be locked
to its site because guests always have
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> nit: s/its/their/

Done.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.cc:423: // TODO(creis, nick)
https://crbug.com/510588 Chrome UI pages use the same
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> Oh, I see.  Actually, dbeam@ was just working on a CL
> (https://codereview.chromium.org/2931243002/) that may unblock a lot of that,
if
> I understood correctly.

Ack - thanks for the pointer!

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
File content/browser/site_instance_impl.h (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:152: // Returns true unless a process for
|site_url| cannot be locked to just that
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> nit: s/unless/if/, s/cannot/should/
> (Otherwise it's too hard to tell if the comment and method name agree.)

Done.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:153: // site. Returning true here implies
that |site_url| requires a dedicated
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> nit: also implies

Done.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:154: // process. However, the converse does
not hold: if |site_url| requires a
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> I'm finding this pretty hard to read, since "lock to origin" and "requires
> dedicated process" sound synonymous.  Can you explain the difference in the
> comment a bit more?

Done, and also see below.

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:156: // cases, most of which are temporary
bug workarounds.
On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> "Temporary bug workarounds" is confusing me a bit.  I thought WebUI and
> extensions were the cases that required "dedicated" processes but weren't
locked
> to origin, but those don't sound like temporary bug workarounds.  (I thought
> that at least the extension case was about letting extensions share with each
> other but not other types of processes.)  Are there plans to change that?

I was mostly thinking about this comment on
ContentBrowserClient::ShouldLockToOrigin when I wrote this:

  // Returns true unless the effective URL is part of a site that cannot live in
  // a process restricted to just that site.  This is only called if site
  // isolation is enabled for this URL, and is a bug workaround.
  //
  // TODO(nick): Remove this function once https://crbug.com/160576 is fixed,
  // and origin lock can be applied to all URLs.
  virtual bool ShouldLockToOrigin(BrowserContext* browser_context,
                                  const GURL& effective_url);

The other two parts of it in content/ (guests and chrome://) also have TODOs and
sound like we can eventually remove them.  But overall, I agree that it's not
clear how "temporary" some of these are (e.g., extension process reuse case), so
I've rephrased this -- let me know if that's better.

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-06-19 21:01:02 UTC) #46

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-clang/builds/120785)

3 years, 6 months ago (2017-06-19 21:01:04 UTC) #47

Charlie Reis

Wow, sorry for the longer-than-expected delay. Thanks for updating the comments and CPSP API. LGTM! ...

3 years, 5 months ago (2017-06-28 00:08:38 UTC) #48

Wow, sorry for the longer-than-expected delay.  Thanks for updating the comments
and CPSP API.  LGTM!

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
File content/browser/site_instance_impl.cc (right):

https://codereview.chromium.org/2921063003/diff/80001/content/browser/site_in...
content/browser/site_instance_impl.cc:215: if (url == url::kAboutBlankURL)
On 2017/06/19 20:03:58, alexmos wrote:
> On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> > On 2017/06/14 22:39:05, alexmos wrote:
> > > On 2017/06/12 23:08:17, Charlie Reis wrote:
> > > > On 2017/06/12 17:35:07, alexmos wrote:
> > > > > Several tests (like DumpAccessibilityTreeTest.AccessibilityA or other
> > > > > accessibility tests) expect to navigate A->about:blank->A and for all
> this
> > > to
> > > > > end up in same process (i.e., no swap for about:blank).  Without this
> > check,
> > > > > with --site-per-process, we'd go into IsSuitableHost, which would
> compare
> > > the
> > > > > origin lock against "about:" (site_url for about:blank) and fail.  So
I
> > > added
> > > > a
> > > > > special case here.
> > > > 
> > > > This makes me a bit nervous.  In other places, we've had to deal with
> > > > about:srcdoc and data: URLs alongside about:blank (e.g., later in
> > > > DetermineSiteInstanceForURL).  Those seem like they would have the same
> > > problem.
> > > > 
> > > > Is there a way we can reorder or share code to avoid having to
copy/paste
> > all
> > > > these cases in both places?
> > > 
> > > This is a bit tricky.  Just to clarify, my intent here was to fix only
> > > browser-initiated navigations (the tests were doing this via
NavigateToURL),
> > and
> > > the DetermineSiteInstanceForURL special cases only matter for
> > renderer-initiated
> > > navigations (since they kick in only when source_instance is not null). 
The
> > > renderer-initiated navigations don't really reach the change here:
subframes
> > > will replace about:blank, data:, and other unique-origin URLs with the
> source
> > > SiteInstance's site URL in CanSubframeSwapProcess before reaching
> > > IsRendererTransferNeededForNavigation -> IsCurrentlySameSite ->
> > > HasWrongProcessForURL.  For top-level frames, we will hit the logic you
> > > mentioned in DetermineSiteInstanceForURL before ever getting here.  But
for
> > > browser-initiated navigations, DetermineSiteInstanceForURL doesn't use the
> > > about:blank logic (since source_instance is null), we fall through to
check
> > > IsCurrentlySameSite, which first checks HasWrongProcessForURL and then
> > > SiteInstance::IsSameWebSite.  Before this CL, HasWrongProcessForURL was
> false
> > > (i.e., process was good), and SiteInstance::IsSameWebSite handled this
case
> > > here:
> > > 
> > >   // If the destination url is just a blank page, we treat them as part of
> the
> > >   // same site.
> > >   GURL blank_page(url::kAboutBlankURL);
> > >   if (dest_url == blank_page)
> > >     return true;
> > > 
> > > After this CL, the origin locks caused HasWrongProcessForURL to return
true
> > > before we could get to that about:blank special case in IsSameWebSite and
> say
> > > that it's really same-site.  So the reason I only covered about:blank here
> was
> > > to mirror the case in IsSameWebSite.
> > > 
> > > I could merge the about:blank usage in HasWrongProcessForURL and in
> > > IsSameWebSite into something like ShouldConsiderURLSameSite, but do you
> think
> > > this is worth it for just about:blank?  I don't think we want to do this
for
> > > data:, which isn't same-site, and it feels safer if HasWrongProcessForURL
> > forces
> > > a process swap for a browser-initiated data: navigation from a site with a
> > > dedicated process.  (Also, actually handling it and returning false would
> have
> > > no effect, as  it would force a swap anyway later on, because
IsSameWebSite
> > will
> > > return false.)  It also doesn't seem like you can get a browser-initiated
> > > about:srcdoc navigation, can you?
> > > 
> > > It's possible that we could avoid this special case and just force a
process
> > > swap for browser-initiated about:blank navigations (e.g., if user types in
> > > about:blank into omnibox), and fix the accessibility test harness to
expect
> > the
> > > process swap, though I don't know how difficult that'd be, and swapping on
> > > about:blank still seems a bit wasteful.
> > 
> > Interesting points.  I think you're right that we can keep this as you have
> it. 
> > I'm sure I won't remember the distinction between this and the
> > renderer-initiated case (which does care about data: and about:srcdoc), so
> maybe
> > there's a way to leave a short explanation in a comment why checking
> about:blank
> > is sufficient here?
> 
> Comment added - please let me know if it's helpful enough.

Thanks-- that works!

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
File content/browser/site_instance_impl.h (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:156: // cases, most of which are temporary
bug workarounds.
On 2017/06/19 20:03:59, alexmos wrote:
> On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> > "Temporary bug workarounds" is confusing me a bit.  I thought WebUI and
> > extensions were the cases that required "dedicated" processes but weren't
> locked
> > to origin, but those don't sound like temporary bug workarounds.  (I thought
> > that at least the extension case was about letting extensions share with
each
> > other but not other types of processes.)  Are there plans to change that?
> 
> I was mostly thinking about this comment on
> ContentBrowserClient::ShouldLockToOrigin when I wrote this:
> 
>   // Returns true unless the effective URL is part of a site that cannot live
in
>   // a process restricted to just that site.  This is only called if site
>   // isolation is enabled for this URL, and is a bug workaround.
>   //
>   // TODO(nick): Remove this function once https://crbug.com/160576 is fixed,
>   // and origin lock can be applied to all URLs.
>   virtual bool ShouldLockToOrigin(BrowserContext* browser_context,
>                                   const GURL& effective_url);

Huh.  How would we apply origin lock to all URLs?  Maybe the idea there was to
allow a process to be locked to multiple origins, and keep track of them all via
origin lock?

> The other two parts of it in content/ (guests and chrome://) also have TODOs
and
> sound like we can eventually remove them.  But overall, I agree that it's not
> clear how "temporary" some of these are (e.g., extension process reuse case),
so
> I've rephrased this -- let me know if that's better.

Yes, thanks.  The new comment works for me.

alexmos

Thanks for the review! https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_instance_impl.h File content/browser/site_instance_impl.h (right): https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_instance_impl.h#newcode156 content/browser/site_instance_impl.h:156: // cases, most of which ...

3 years, 5 months ago (2017-06-28 00:33:26 UTC) #49

Thanks for the review!

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
File content/browser/site_instance_impl.h (right):

https://codereview.chromium.org/2921063003/diff/140001/content/browser/site_i...
content/browser/site_instance_impl.h:156: // cases, most of which are temporary
bug workarounds.
On 2017/06/28 00:08:37, Charlie Reis (slow) wrote:
> On 2017/06/19 20:03:59, alexmos wrote:
> > On 2017/06/17 23:13:53, Charlie Reis (slow) wrote:
> > > "Temporary bug workarounds" is confusing me a bit.  I thought WebUI and
> > > extensions were the cases that required "dedicated" processes but weren't
> > locked
> > > to origin, but those don't sound like temporary bug workarounds.  (I
thought
> > > that at least the extension case was about letting extensions share with
> each
> > > other but not other types of processes.)  Are there plans to change that?
> > 
> > I was mostly thinking about this comment on
> > ContentBrowserClient::ShouldLockToOrigin when I wrote this:
> > 
> >   // Returns true unless the effective URL is part of a site that cannot
live
> in
> >   // a process restricted to just that site.  This is only called if site
> >   // isolation is enabled for this URL, and is a bug workaround.
> >   //
> >   // TODO(nick): Remove this function once https://crbug.com/160576 is
fixed,
> >   // and origin lock can be applied to all URLs.
> >   virtual bool ShouldLockToOrigin(BrowserContext* browser_context,
> >                                   const GURL& effective_url);
> 
> Huh.  How would we apply origin lock to all URLs?  Maybe the idea there was to
> allow a process to be locked to multiple origins, and keep track of them all
via
> origin lock?

I assumed this just meant "origin lock can be applied to all URLs that require a
dedicated process", given that ContentBrowserClient::ShouldLockToOrigin() is
only called when RequiresDedicatedProcess() is true in LockToOrigin().  But
maybe we should double-check with Nick.

> 
> > The other two parts of it in content/ (guests and chrome://) also have TODOs
> and
> > sound like we can eventually remove them.  But overall, I agree that it's
not
> > clear how "temporary" some of these are (e.g., extension process reuse
case),
> so
> > I've rephrased this -- let me know if that's better.
> 
> Yes, thanks.  The new comment works for me.

commit-bot: I haz the power

CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/160001

3 years, 5 months ago (2017-06-28 00:34:07 UTC) #51

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 5 months ago (2017-06-28 00:36:29 UTC) #52

commit-bot: I haz the power

Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds/241082) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, ...

3 years, 5 months ago (2017-06-28 00:36:31 UTC) #53

alexmos

The patchset sent to the CQ was uploaded after l-g-t-m from creis@chromium.org Link to the ...

3 years, 5 months ago (2017-06-28 00:52:19 UTC) #55

commit-bot: I haz the power

CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/180001

3 years, 5 months ago (2017-06-28 00:52:47 UTC) #56

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 5 months ago (2017-06-28 02:42:57 UTC) #57

commit-bot: I haz the power

Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/489574)

3 years, 5 months ago (2017-06-28 02:42:58 UTC) #58

commit-bot: I haz the power

CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2921063003/180001

3 years, 5 months ago (2017-06-28 03:42:19 UTC) #60

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 180001, "attempt_start_ts": 1498621320788840, "parent_rev": "ac4fed459c00385fa0c86a041259f0bad16b029a", "commit_rev": "13fe1960e0819a5b5e642de1b5b845875601f769"}

3 years, 5 months ago (2017-06-28 04:25:15 UTC) #61

commit-bot: I haz the power

Description was changed from ========== Fix process reuse for dedicated processes when over process limit. ...

3 years, 5 months ago (2017-06-28 04:25:28 UTC) #62

Message was sent while issue was closed.

Description was changed from

==========
Fix process reuse for dedicated processes when over process limit.

Previously, process reuse was broken for --isolate-origins, where an
isolated origin was allowed to reuse a process containing content from
other non-isolated sites when over process limit.  It was also overly
restrictive for --site-per-process, where a site could never reuse an
existing process when over process limit, even if the process was
dedicated to the same site.

This CL fixes these cases by modifying
RenderProcessHostImpl::IsSuitableHost() to consult
ChildProcessSecurityPolicy and check whether the process is locked to
a compatible site.  For --site-per-process, it removes the "return
false" in ShouldTryToUseExistingProcessHost(), which allows using
IsSuitableHost().

One tricky case was caused by lazily-initialied SiteInstances: when a
new WebContents is created with a new SiteInstance that has no site,
that SiteInstance's process should be allowed to host a site that
requires a dedicated process until something commits in it and the
SiteInstance sets its site.  Looking purely at origin locks isn't
sufficient here, as we can't tell apart the case where the process is
brand new and hasn't yet hosted any content vs. process that has
committed URLs that don't require dedicated processes -- both end up
with empty origin locks.  To track this state, a new flag is added to
RenderProcessHost.

Another tricky case is when a page navigates itself to about:blank.
This ought to be allowed to stay in-process, even for dedicated
processes.  This required adding an extra check to
HasWrongProcessForURL() so it allows about:blank to stay in-process
before it consults IsSuitableHost(), which would otherwise prevent
this.

The changes in this CL also revealed a bug in
RenderProcessHostImpl::GetProcessHostForSite, which passed in a full
URL instead of site URL to IsSuitableHost.

BUG=513036,713444
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
==========

to

==========
Fix process reuse for dedicated processes when over process limit.

Previously, process reuse was broken for --isolate-origins, where an
isolated origin was allowed to reuse a process containing content from
other non-isolated sites when over process limit.  It was also overly
restrictive for --site-per-process, where a site could never reuse an
existing process when over process limit, even if the process was
dedicated to the same site.

This CL fixes these cases by modifying
RenderProcessHostImpl::IsSuitableHost() to consult
ChildProcessSecurityPolicy and check whether the process is locked to
a compatible site.  For --site-per-process, it removes the "return
false" in ShouldTryToUseExistingProcessHost(), which allows using
IsSuitableHost().

One tricky case was caused by lazily-initialied SiteInstances: when a
new WebContents is created with a new SiteInstance that has no site,
that SiteInstance's process should be allowed to host a site that
requires a dedicated process until something commits in it and the
SiteInstance sets its site.  Looking purely at origin locks isn't
sufficient here, as we can't tell apart the case where the process is
brand new and hasn't yet hosted any content vs. process that has
committed URLs that don't require dedicated processes -- both end up
with empty origin locks.  To track this state, a new flag is added to
RenderProcessHost.

Another tricky case is when a page navigates itself to about:blank.
This ought to be allowed to stay in-process, even for dedicated
processes.  This required adding an extra check to
HasWrongProcessForURL() so it allows about:blank to stay in-process
before it consults IsSuitableHost(), which would otherwise prevent
this.

The changes in this CL also revealed a bug in
RenderProcessHostImpl::GetProcessHostForSite, which passed in a full
URL instead of site URL to IsSuitableHost.

BUG=513036,713444
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation

Review-Url: https://codereview.chromium.org/2921063003
Cr-Commit-Position: refs/heads/master@{#482879}
Committed:
https://chromium.googlesource.com/chromium/src/+/13fe1960e0819a5b5e642de1b5b8...
==========

commit-bot: I haz the power

3 years, 5 months ago (2017-06-28 04:25:30 UTC) #63

Message was sent while issue was closed.

Committed patchset #10 (id:180001) as
https://chromium.googlesource.com/chromium/src/+/13fe1960e0819a5b5e642de1b5b8...

Issue 2921063003: Fix process reuse for dedicated processes when over process limit. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Test fixes #

Patch Set 3 : More fixes #

Patch Set 4 : Fix some about:blank cases #

Patch Set 5 : Cleanup #

Patch Set 6 : Charlie's comments #

Patch Set 7 : Rebase #

Patch Set 8 : Fix IsSuitableHost for sites that require a dedicated process but don't set an origin lock #

Patch Set 9 : Charlie's comments #

Patch Set 10 : Rebase #

Messages