|
|
DescriptionCreate FormatUrlForSecurityDisplay helper function.
FormatUrlForSecurityDisplay provides a canonical, concise, human-friendly
display for origins.
Based heavily on earlier work by dewittj and felt. Thanks!
BUG=402698
Committed: https://crrev.com/d72cc48699ed600c26911b0061f44aa9cfc39e62
Cr-Commit-Position: refs/heads/master@{#332736}
Patch Set 1 #Patch Set 2 : Add a caller. #Patch Set 3 : Change both call sites in content_setting_bubble_model, and supply the preferred languages! #
Total comments: 18
Patch Set 4 : Respond to comments, add tests with languages. #
Total comments: 26
Patch Set 5 : Respond to comments. #Patch Set 6 : Add a test to ensure we show trailing dots in DNS names. #
Total comments: 6
Patch Set 7 : Recurse instead of duplicate; clean-up. #
Total comments: 9
Patch Set 8 : Respond to comments. #Patch Set 9 : Rename to FormatUrlForSecurityDisplay. #
Total comments: 16
Patch Set 10 : Update some test expectations, and fix nits. #
Total comments: 6
Patch Set 11 : Change testing strategy, respond to comments. #
Messages
Total messages: 41 (4 generated)
palmer@chromium.org changed reviewers: + felt@chromium.org, msw@chromium.org
Please take a look: A new function for presenting origins to users. With tests and an example caller (replacing 1 of only 2 call sites for net::AppendFormattedHost).
> and an example caller (replacing 1 of only 2 call sites for > net::AppendFormattedHost). Er, 2 files have call sites; there may be more than 2 total call sites.
msw@chromium.org changed reviewers: + asanka@chromium.org, brettw@chromium.org, dewittj@chromium.org, rsleevi@chromium.org, thakis@chromium.org
I haven't done a full review... Why didn't you add the reviewers of the original CL (now added)? They already have some context for this change and feedback on the work. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... File chrome/browser/ui/content_settings/content_setting_bubble_model.cc (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... chrome/browser/ui/content_settings/content_setting_bubble_model.cc:42: #include "net/base/net_util.h" Can this be removed now? https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... chrome/browser/ui/content_settings/content_setting_bubble_model.cc:686: url, profile()->GetPrefs()->GetString(prefs::kAcceptLanguages)) Doesn't this need a bool argument and a semicolon? https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... chrome/browser/ui/content_settings/content_setting_bubble_model.cc:687: std::string display_host(base::UTF16ToUTF8(display_host_utf16)); nit: fix indent. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:312: return (omit_scheme ? base::string16() : base::ASCIIToUTF16("file://")) + If we aren't omitting the scheme from a file URL, why not just return the entire URL instead of rebuilding it from the scheme and path? https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:324: } else { nit: no else after return. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:56: // nit: remove this blank comment line; the next paragraph meshes with this well. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:59: // necessity. As examples: nit: "For example when GURL::SchemeIsCryptographic() or content::IsOriginSecure() is true." and then remove the example lines below. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:66: bool omit_scheme); Do we need an |omit_scheme| argument already? Why can't it be determined automatically based on the scheme security? https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:210: const char* languages; None of the test cases supply languages, can this be removed or should another test check some language cases?
https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... File chrome/browser/ui/content_settings/content_setting_bubble_model.cc (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... chrome/browser/ui/content_settings/content_setting_bubble_model.cc:42: #include "net/base/net_util.h" On 2015/05/29 00:18:51, msw wrote: > Can this be removed now? Done. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... chrome/browser/ui/content_settings/content_setting_bubble_model.cc:686: url, profile()->GetPrefs()->GetString(prefs::kAcceptLanguages)) On 2015/05/29 00:18:51, msw wrote: > Doesn't this need a bool argument and a semicolon? Done. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/conte... chrome/browser/ui/content_settings/content_setting_bubble_model.cc:687: std::string display_host(base::UTF16ToUTF8(display_host_utf16)); On 2015/05/29 00:18:51, msw wrote: > nit: fix indent. Done. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:312: return (omit_scheme ? base::string16() : base::ASCIIToUTF16("file://")) + On 2015/05/29 00:18:51, msw wrote: > If we aren't omitting the scheme from a file URL, why not just return the entire > URL instead of rebuilding it from the scheme and path? Because it might have non-origin material such as query string or fragment. And the host is not part of the origin for a file: URL. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:324: } else { On 2015/05/29 00:18:51, msw wrote: > nit: no else after return. Done. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:56: // On 2015/05/29 00:18:51, msw wrote: > nit: remove this blank comment line; the next paragraph meshes with this well. Done. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:59: // necessity. As examples: On 2015/05/29 00:18:51, msw wrote: > nit: "For example when GURL::SchemeIsCryptographic() or > content::IsOriginSecure() is true." and then remove the example lines below. Done. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:66: bool omit_scheme); > Do we need an |omit_scheme| argument already? Why can't it be determined > automatically based on the scheme security? 1. Flexibility. Callers may have different needs. 2. Because the web is not yet at the place where people can safely assume that "no scheme shown must mean it was a secure scheme". Someday, hopefully soon, but not yet. In the short term I expect many/most callers to pass false even if GURL::SchemeIsCryptographic returns true. https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/40001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:210: const char* languages; On 2015/05/29 00:18:51, msw wrote: > None of the test cases supply languages, can this be removed or should another > test check some language cases? Done.
> I haven't done a full review... Why didn't you add the reviewers of the original > CL (now added)? They already have some context for this change and feedback on > the work. brettw and thakis were owners of files that, in the previous CL, I only changed comments in. To avoid bothering them, I decided not to add those not-exactly-crucial comments in this CL. I hope that the net/ people are happy with this CL, since it removes call sites for a function they want to move out of net/. (Indeed, its only callers are outside of net/.)
Does it make sense to also include an ElideOrigin method in this file as well? I know that in my use case (notifications) long origins are not always going to fit in the space available.
> Does it make sense to also include an ElideOrigin method in this file as well? > I know that in my use case (notifications) long origins are not always going to > fit in the space available. Yes, quite likely. But in another CL, please.
Yes, definitely happy this is landing outside of //net, and this seems like a reasonable place for it to land unless/until someone needs it elsewhere, so that works. From the original review, there's still several sharp edges regarding the usage of //url here, so I've left some suggestions on how to avoid those, as well as some places where they're causing bugs (from an API contract view). Mostly, this is just the special hell of URLs. Oh, and there's one more thing to consider about - how you want to canonicalize URLs with dots in trailing hostnames http://accounts.google.com/ http://accounts.google.com./ The trailing '.' is just a DNS resolver hint. On the DNS wire, these two hostnames are equivalent, but up until they hit the wire, they aren't (e.g. calling the gethostbyname() functions, these are semantically and syntactically different hosts, until they go out via UDP:53) Because they're different presently from a perspective of the Same Origin Policy (we haven't sorted out the canonicalization), it's probably OK to show these differently. But if there's a UX component to this, maybe it makes sense to show them the same. I'm not sure which you want to have happen here, but worth calling out. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:308: if (!url.IsStandard() || url.is_empty() || !url.is_valid()) BUG: You want to reverse the order of these checks. While arguably an API wart, IsStandard() is only safe to be called if is_valid() is true. You can see this in the //url implementation, and in those that use //url [Yes, it's "weird" that GURL can have partially initialized state - this is the grand debate about Factory Functions vs Constructors, for which reasonable arguments exist for both] Because IsStandard() requires a scheme component to be parsed, and that won't be true for a !is_valid() URL (canonical example being "GURL url;"), you want to work through each of these checks if (!url.is_valid() || url.is_empty() || !url.IsStandard()) return ... https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:313: base::UTF8ToUTF16(url.path()); So this is a little weird for a few reasons As crazy as it may sound, file URLs can be interpreted as having a host portion (e.g. a UNC path can be parsed as a host+path), and this would make them appear as local files Off the top-of-my-head, I'm not quite sure what you want to happen, so I'm not quite sure what to suggest here, other than to point out the host is relevant. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:322: base::UTF8ToUTF16(inner_url->path()) + Does it make sense to recurse in to this function for inner_url? It seems like you're duplicating the logic of these two lines and 312/313, and you'll want them to stay in sync? Only if and when you need to distinguish these two would it seem good to break this out. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:325: return base::ASCIIToUTF16("filesystem:") + I'd recommend you replace all of these constants with the ones from //url/url_constants.h Beyond being existing constants (for example, you've duplicated "filesystem" and "file" here a few times in this function), it also gives the compiler enough grace to know these are intentionally duplicate constants, rather than hoping the linker will fold them with string constant reduction (which it wouldn't be able to do across translation units in a shared library build, but it would be able to do if you used the constants) https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:332: if (scheme.empty() || host.empty()) Perhaps you could expand a comment about why this is falling in to net::FormatURL? What are the situations where this could happen, and why is FormatUrl (which would include the path, presumably) appropriate? https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:337: result = base::UTF8ToUTF16(scheme) + base::ASCIIToUTF16("://"); This is kStandardSchemeSeparator, fwiw https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:344: if (origin.port().length() > 0 && port != 0 && port != default_port) This != 0 check seems odd. If the port isn't specified, the API contract is that it returns PORT_UNSPECIFIED. Is that what you meant to check? Also, it seems odd to check the .length() > 0 - is that short-hand for checking port != PORT_UNSPECIFIED? It just seems like you're mixing two different API contracts here https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:59: // content::IsOriginSecure() return true. Mostly a question for the enamel people - but should we omit the scheme for something like localhost? http://localhost and https://localhost would have different storage and permission models (at least, that's my understanding from Rayme's thread), and despite them both being apriori secure, they wouldn't be considered equivalent origins (this is also reiterated in the Cookie and URL specs) So it seems like if you're meaning to display origin, maybe eliding scheme should only be done for SchemeIsCryptographic? And if that is true, then like msw said, should this bool be dropped and that logic moved internal to here? That would allow you, for example, to change the policies on scheme-display in the future without having to update all call sites. Just trying to work through the eventual goal, as from an API shape, it does seem a little weird to foist that on the caller when it sounds like really, this is something that -enamel would unilaterally decide ("Today, we'll omit the scheme for all https:// URLs because https:// is the new norm") https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:216: const OriginTestData tests[] = { Suggestions: 1) Add IDN test cases 2) Add IPv6 test cases 3) Add some of the file:// test cases (UNC, no UNC) https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:542: base::string16 formatted = FormatOriginForDisplay(GURL(), "", false); nit: std::string() is preferred over "" (it's slightly more optimized; we talked about landing an auto-reformater as part of presubmit but opted not to add the execution costs to all developers, but still preferred practice)
https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:411: "en-US,en,am,zh", thought: I think it's important to test this with an RTL language. So that even if we aren't 100% sure we're doing the right thing (because there's lots of RTL work to do in general), we have documented behavior.
https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:335: "file://localhost/usr/example/file.html", As Ryan mentioned, we should worry about UNC filenames here. They will look like: file://server/some/path and should come out looking the same.
https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:308: if (!url.IsStandard() || url.is_empty() || !url.is_valid()) On 2015/05/29 18:28:31, Ryan Sleevi wrote: > BUG: You want to reverse the order of these checks. > > While arguably an API wart, IsStandard() is only safe to be called if is_valid() > is true. You can see this in the //url implementation, and in those that use > //url [Yes, it's "weird" that GURL can have partially initialized state - this > is the grand debate about Factory Functions vs Constructors, for which > reasonable arguments exist for both] > > Because IsStandard() requires a scheme component to be parsed, and that won't be > true for a !is_valid() URL (canonical example being "GURL url;"), you want to > work through each of these checks > > if (!url.is_valid() || url.is_empty() || !url.IsStandard()) > return ... Done. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:313: base::UTF8ToUTF16(url.path()); On 2015/05/29 18:28:31, Ryan Sleevi wrote: > So this is a little weird for a few reasons > > As crazy as it may sound, file URLs can be interpreted as having a host portion > (e.g. a UNC path can be parsed as a host+path), and this would make them appear > as local files > > Off the top-of-my-head, I'm not quite sure what you want to happen, so I'm not > quite sure what to suggest here, other than to point out the host is relevant. Although we humans know that (imagine Windows shares here) \\CONTOSO\someshare\goat.gif is potentially a different resource than \\127.0.0.1\someshare\goat.gif, I think it's the case that Chrome treats only the pathname as the origin. (Not that GURL::GetOrigin agrees...) I'm happy to change the code to do whatever the right thing is. But if the hostname does not change the origin of 2 otherwise identical file: URLs, then I think we should not display it. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:322: base::UTF8ToUTF16(inner_url->path()) + On 2015/05/29 18:28:31, Ryan Sleevi wrote: > Does it make sense to recurse in to this function for inner_url? I don't think so; not sure. inner_url behaves extremely oddly. This is the best I could do — short of string bashing, which you liked even less. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:325: return base::ASCIIToUTF16("filesystem:") + On 2015/05/29 18:28:31, Ryan Sleevi wrote: > I'd recommend you replace all of these constants with the ones from > //url/url_constants.h > > Beyond being existing constants (for example, you've duplicated "filesystem" and > "file" here a few times in this function), it also gives the compiler enough > grace to know these are intentionally duplicate constants, rather than hoping > the linker will fold them with string constant reduction (which it wouldn't be > able to do across translation units in a shared library build, but it would be > able to do if you used the constants) Done. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:332: if (scheme.empty() || host.empty()) On 2015/05/29 18:28:31, Ryan Sleevi wrote: > Perhaps you could expand a comment about why this is falling in to > net::FormatURL? > > What are the situations where this could happen, and why is FormatUrl (which > would include the path, presumably) appropriate? I was trying to be maximally defensive, but it turns out not to be necessary. Removed this code. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:337: result = base::UTF8ToUTF16(scheme) + base::ASCIIToUTF16("://"); On 2015/05/29 18:28:31, Ryan Sleevi wrote: > This is kStandardSchemeSeparator, fwiw Done. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:344: if (origin.port().length() > 0 && port != 0 && port != default_port) On 2015/05/29 18:28:31, Ryan Sleevi wrote: > This != 0 check seems odd. If the port isn't specified, the API contract is that > it returns PORT_UNSPECIFIED. Is that what you meant to check? > > Also, it seems odd to check the .length() > 0 - is that short-hand for checking > port != PORT_UNSPECIFIED? It just seems like you're mixing two different API > contracts here Just trying to be maximally defensive, again. When given "https://173.194.65.103:000" (test case "Invalid port 1"), the port is in fact 0. That test case is set to have the expected return value be L"https://173.194.65.103"; should I change it to L"https://173.194.65.103:0"? https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.h:59: // content::IsOriginSecure() return true. > Just trying to work through the eventual goal, as from an API shape, it does > seem a little weird to foist that on the caller when it sounds like really, this > is something that -enamel would unilaterally decide ("Today, we'll omit the > scheme for all https:// URLs because https:// is the new norm") OK, I'm convinced this policy should be internal to the function. Making that change. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:216: const OriginTestData tests[] = { On 2015/05/29 18:28:31, Ryan Sleevi wrote: > Suggestions: > > 1) Add IDN test cases > 2) Add IPv6 test cases > 3) Add some of the file:// test cases (UNC, no UNC) Done. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:335: "file://localhost/usr/example/file.html", On 2015/05/29 20:12:16, brettw wrote: > As Ryan mentioned, we should worry about UNC filenames here. They will look > like: > file://server/some/path > and should come out looking the same. I agree, if hostnames count as part of the origin for file: URLs. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:411: "en-US,en,am,zh", On 2015/05/29 18:31:02, felt wrote: > thought: > > I think it's important to test this with an RTL language. So that even if we > aren't 100% sure we're doing the right thing (because there's lots of RTL work > to do in general), we have documented behavior. Done. https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url_unittest.cc:542: base::string16 formatted = FormatOriginForDisplay(GURL(), "", false); On 2015/05/29 18:28:31, Ryan Sleevi wrote: > nit: std::string() is preferred over "" (it's slightly more optimized; we talked > about landing an auto-reformater as part of presubmit but opted not to add the > execution costs to all developers, but still preferred practice) Done.
https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:313: base::UTF8ToUTF16(url.path()); On 2015/05/29 20:40:10, palmer wrote: > On 2015/05/29 18:28:31, Ryan Sleevi wrote: > > So this is a little weird for a few reasons > > > > As crazy as it may sound, file URLs can be interpreted as having a host > portion > > (e.g. a UNC path can be parsed as a host+path), and this would make them > appear > > as local files > > > > Off the top-of-my-head, I'm not quite sure what you want to happen, so I'm not > > quite sure what to suggest here, other than to point out the host is relevant. > > Although we humans know that (imagine Windows shares here) > \\CONTOSO\someshare\goat.gif is potentially a different resource than > \\127.0.0.1\someshare\goat.gif, I think it's the case that Chrome treats only > the pathname as the origin. If true, this seems like a big bug.
> Oh, and there's one more thing to consider about - how you want to canonicalize > URLs with dots in trailing hostnames > > http://accounts.google.com/ > http://accounts.google.com./ > > The trailing '.' is just a DNS resolver hint. On the DNS wire, these two > hostnames are equivalent, but up until they hit the wire, they aren't (e.g. > calling the gethostbyname() functions, these are semantically and syntactically > different hosts, until they go out via UDP:53) > > Because they're different presently from a perspective of the Same Origin Policy > (we haven't sorted out the canonicalization), it's probably OK to show these > differently. But if there's a UX component to this, maybe it makes sense to show > them the same. > > I'm not sure which you want to have happen here, but worth calling out. If SOP treats them differently, then I want to show them differently. I'm going to add a test case to make sure we do.
On 2015/05/29 20:43:14, palmer wrote: > If SOP treats them differently, then I want to show them differently. I'm going > to add a test case to make sure we do. Well, this is the whole SOP debate about file:// URLs, which are magical little snowflakes for Chrome. RFC 6454 (abarth@'s baby) notes that file: URLs may return implementation-defined values for the URL, reflecting the historical debate. Section 4, part 4, of https://www.ietf.org/rfc/rfc6454.txt The "replacement" (ah standards) for 6454 is https://url.spec.whatwg.org/ , which is where we get the notion of how to handle blob URIs and filesystem URIs, which are also special little snowflakes that don't fit in / lead to incompatible results with 6454. Specifically, https://url.spec.whatwg.org/#origin It too notes that file:// URIs are special little snowflakes. I guess I point this out that you can't reason about file:// from the spec, you have to consider what Blink is doing in its SecurityOrigin space. Mike would know more about that, but I thought we did some real quasi-weird stuff for this. From a quick scan, we have two modes - whether or not we enforce file:// path separation. The default is we don't, and all file://'s are treated just as a single origin (specifically, file://). When we do, we treat them all as unique origins. But we definitely don't distinguish file URLs by host - OR by path - when the enforcement is off, so it seems like the presentation for that mode would just be file:// ?
On 2015/05/29 20:43:14, palmer wrote: > If SOP treats them differently, then I want to show them differently. I'm going > to add a test case to make sure we do. Sorry, comment #16 was a response to the file:// case. For this case, we currently treat them as distinct from SOP, although not through intentionality but 'happy accident'. I think the solution to this problem will eventually be solved if we're (Mike, Brett and I, although he hasn't talked to Brett yet to avoid bothering him :P) able to figure out a good solution for representing Origin's on the Chrome side. There's still some issues being worked out there. If we are, then having this function use Origin will "do the right thing" (probably)
https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/60001/chrome/browser/ui/elide... chrome/browser/ui/elide_url.cc:344: if (origin.port().length() > 0 && port != 0 && port != default_port) On 2015/05/29 20:40:10, palmer wrote: > On 2015/05/29 18:28:31, Ryan Sleevi wrote: > > This != 0 check seems odd. If the port isn't specified, the API contract is > that > > it returns PORT_UNSPECIFIED. Is that what you meant to check? > > > > Also, it seems odd to check the .length() > 0 - is that short-hand for > checking > > port != PORT_UNSPECIFIED? It just seems like you're mixing two different API > > contracts here > > Just trying to be maximally defensive, again. > > When given "https://173.194.65.103:000" (test case "Invalid port 1"), the port > is in fact 0. That test case is set to have the expected return value be > L"https://173.194.65.103"; should I change it to L"https://173.194.65.103:0"? Yeah, I think :0 is correct, because that's actually a different origin than "https://173.194.65.103" [0 is an invalid, out of range port - there's a whole lot of URL spec angst going around how to handle that, but effectively, the URL is invalid at that point, because :0 isn't a valid port. As such, it's good to preserve the :0] https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:316: return (base::ASCIIToUTF16(url::kFileScheme)) + scheme_separator + nit: You don't need the wrapping () https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:327: base::UTF8ToUTF16(url.path()); I'm not sure I fully grokked your reply about this. That is, it seems like it could be return base::ASCIIToUTF16(url::kFileSystemScheme) + colon + FormatOriginForDisplay(inner_url) + base::UTF8ToUTF16(url.path()); You're doing string manipulation either way, but I think the type of string manipulation you're doing is fine for this purpose. The old code was more akin to GURL::GetWithEmptyPath(), which not as safe as GWEP Alternatively, it seems like you could also replace this (lines 323 - 329) with return base::ASCIIToUTf16(url::kFileSystemScheme) + colon + FormatOriginForDisplay(inner_origin, languages) + (inner_origin.SchemeIsFile() ? base::UTF8ToUTF16(url.path()) : string16()); https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:337: base::string16 result = base::UTF8ToUTF16(scheme) + scheme_separator; EXTREME PEDANTRY: This does result in an extra copy in a naive form compared to base::string16 result = base::UTF8ToUTF16(scheme); result += scheme_separator; result += base::UTF8ToUTF16(host); [this form allows for R-V-O to be used for the 'result', and then appends the scheme_separator using +=; your current form penalizes RVO by forcing a temporary to be constructed to perform the "+ scheme_separator" prior to initializing |result|. Technically, the compiler is allowed to optimize it to be equivalent to the code I suggested, but I haven't really seen any do it]
https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:316: return (base::ASCIIToUTF16(url::kFileScheme)) + scheme_separator + On 2015/05/29 21:13:51, Ryan Sleevi wrote: > nit: You don't need the wrapping () Done. https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:327: base::UTF8ToUTF16(url.path()); > That is, it seems like it could be > > return base::ASCIIToUTF16(url::kFileSystemScheme) + colon + > FormatOriginForDisplay(inner_url) + base::UTF8ToUTF16(url.path()); Ahh, yes, thanks. Done. https://codereview.chromium.org/1147363005/diff/100001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:337: base::string16 result = base::UTF8ToUTF16(scheme) + scheme_separator; On 2015/05/29 21:13:51, Ryan Sleevi wrote: > EXTREME PEDANTRY: This does result in an extra copy in a naive form compared to > > base::string16 result = base::UTF8ToUTF16(scheme); > result += scheme_separator; > result += base::UTF8ToUTF16(host); > > [this form allows for R-V-O to be used for the 'result', and then appends the > scheme_separator using +=; your current form penalizes RVO by forcing a > temporary to be constructed to perform the "+ scheme_separator" prior to > initializing |result|. Technically, the compiler is allowed to optimize it to be > equivalent to the code I suggested, but I haven't really seen any do it] Done.
> > When given "https://173.194.65.103:000" (test case "Invalid port 1"), the port > > is in fact 0. That test case is set to have the expected return value be > > L"https://173.194.65.103"; should I change it to L"https://173.194.65.103:0"? > > Yeah, I think :0 is correct, because that's actually a different origin than > "https://173.194.65.103" [0 is an invalid, out of range port - there's a whole > lot of URL spec angst going around how to handle that, but effectively, the URL > is invalid at that point, because :0 isn't a valid port. As such, it's good to > preserve the :0] Yep, I changed the test accordingly.
> From a quick scan, we have two modes - whether or not we enforce file:// path > separation. The default is we don't, and all file://'s are treated just as a > single origin (specifically, file://). When we do, we treat them all as unique > origins. > > But we definitely don't distinguish file URLs by host - OR by path - when the > enforcement is off, so it seems like the presentation for that mode would just > be file:// ? Yow. brettw considers it a bad bug that we apparently don't distinguish origins by the hostnames in file: URLs. I'm agnostic there because (a) I believe that it is hard to realistically, programatically distinguish \\CONTOSO\sharename\foo.html from \\localhost\sharename\foo.html; yet, (b) distinguishing them would be strictly safer, and that's always nice in my book. Basically, a marginal use-case ("contoso is localhost and I develop web apps as local files") would fail closed. It would protect against a marginal (?) attack ("LAN server contoso is malicious and is name-squatting on my local rich apps and I navigated to the contoso page"). I'm sure we can all agree that it's a bad bug that we apparently don't always treat distinct pathnames as distinct origins. :\ I'm going to dig around in SecurityOrigin and see if I can come up with the same analysis rsleevi has. If so, we'll have a separate bug on our hands.
On 2015/05/29 22:13:38, palmer wrote: > > From a quick scan, we have two modes - whether or not we enforce file:// path > > separation. The default is we don't, and all file://'s are treated just as a > > single origin (specifically, file://). When we do, we treat them all as unique > > origins. > > > > But we definitely don't distinguish file URLs by host - OR by path - when the > > enforcement is off, so it seems like the presentation for that mode would just > > be file:// ? > > Yow. > > brettw considers it a bad bug that we apparently don't distinguish origins by > the hostnames in file: URLs. I'm agnostic there because (a) I believe that it is > hard to realistically, programatically distinguish \\CONTOSO\sharename\foo.html > from \\localhost\sharename\foo.html; yet, (b) distinguishing them would be > strictly safer, and that's always nice in my book. Basically, a marginal > use-case ("contoso is localhost and I develop web apps as local files") would > fail closed. It would protect against a marginal (?) attack ("LAN server contoso > is malicious and is name-squatting on my local rich apps and I navigated to the > contoso page"). > > I'm sure we can all agree that it's a bad bug that we apparently don't always > treat distinct pathnames as distinct origins. :\ I'm going to dig around in > SecurityOrigin and see if I can come up with the same analysis rsleevi has. If > so, we'll have a separate bug on our hands. https://code.google.com/p/chromium/issues/detail?id=455882 is the bug for that, IIRC :)
On 2015/05/29 22:13:38, palmer wrote: > Yow. > > brettw considers it a bad bug that we apparently don't distinguish origins by > the hostnames in file: URLs. I'm agnostic there because (a) I believe that it is > hard to realistically, programatically distinguish \\CONTOSO\sharename\foo.html > from \\localhost\sharename\foo.html; yet, (b) distinguishing them would be > strictly safer, and that's always nice in my book. Basically, a marginal > use-case ("contoso is localhost and I develop web apps as local files") would > fail closed. It would protect against a marginal (?) attack ("LAN server contoso > is malicious and is name-squatting on my local rich apps and I navigated to the > contoso page"). To be clear, it's not necessarily LAN servers. Such UNC paths can work with CIFS URLs, and there's been a whole raft of ICANN SSAC "findings" regarding how the new gTLDs + Windows path resolver work (some of which still haven't been disclosed other than they're "Super Bad" and Microsoft released patches for EVERYTHING to try to fix) Because of that, and because of the related bit on https://code.google.com/p/chromium/issues/detail?id=455882 , I think we should try to fully preserve the host & path information for file:// URLs for presentation. The only thing we should be dropping would be the query string and fragments - all other parts get preserved. Would that make you terribly sad? Seems like a net-win, especially if/when we make file origins fully unique, since that host component would factor in.
TL;DR: I *think* we only need to worry about the hostnames issue. As rsleevi says, the attack scenarios may not be as marginal as I thought, so we do. I am totally OK with showing the hostname in file: origins, iff we do distinguish hostnames on that basis. Since we currently do not, I don't want to make such a user-visible claim. Thus, the person who fixes that bug (https://code.google.com/p/chromium/issues/detail?id=493935) should also change |FormatOriginForDisplay|. As discussed below, I think the current form of |FormatOriginForDisplay| is correct *given the current behavior of file: origins*. Yes, there exists a function |SecurityOrigin::enforceFilePathSeparation| (it just sets a bool to true). Its sole call site is: 4710 if (Settings* settings = initializer.settings()) { 4711 if (!settings->webSecurityEnabled()) { 4712 // Web security is turned off. We should let this document access every other document. This is used primary by testing 4713 // harnesses for web sites. 4714 securityOrigin()->grantUniversalAccess(); 4715 } else if (securityOrigin()->isLocal()) { 4716 if (settings->allowUniversalAccessFromFileURLs()) { 4717 // Some clients want local URLs to have universal access, but that setting is dangerous for other clients. 4718 securityOrigin()->grantUniversalAccess(); 4719 } else if (!settings->allowFileAccessFromFileURLs()) { 4720 // Some clients want local URLs to have even tighter restrictions by default, and not be able to access other local files. 4721 // FIXME 81578: The naming of this is confusing. Files with restricted access to other local files 4722 // still can have other privileges that can be remembered, thereby not making them unique origins. 4723 securityOrigin()->enforceFilePathSeparation(); 4724 } 4725 } 4726 } In Chrome, |switches::kAllowFileAccessFromFiles| 420 prefs.allow_file_access_from_file_urls = 421 command_line.HasSwitch(switches::kAllowFileAccessFromFiles); changes |allow_file_access_from_file_urls| to true. It defaults to false: 83 WebPreferences::WebPreferences() // ... 120 allow_file_access_from_file_urls(false), It's true in places you'd expect it, e.g.: content/shell/common/test_runner/test_preferences.cc:22: allow_file_access_from_file_urls = true; So, AFAICT, the fact that this function exists seems scary :) but production Chrome sets it to false, so distinct file: pathnames are still distinct origins. That is not quite the same thing as having file: origins be *unique* (as discussed in https://code.google.com/p/chromium/issues/detail?id=455882), which means things like (from SecurityOrigin.h): 145 bool canAccessDatabase() const { return !isUnique(); }; 146 bool canAccessLocalStorage() const { return !isUnique(); }; 147 bool canAccessSharedWorkers() const { return !isUnique(); } 148 bool canAccessCookies() const { return !isUnique(); } 149 bool canAccessPasswordManager() const { return !isUnique(); } 150 bool canAccessFileSystem() const { return !isUnique(); } 151 bool canAccessCacheStorage() const { return !isUnique(); }; 152 Policy canShowNotifications() const;
Mostly good, although a few more things that came up with fresh eyes reviewing. https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:309: return net::FormatUrl(url, languages); In writing my comment about the function comment, it made me realize that we don't account for Blob URLs here. The effective security origin of a Blob URL is similar to that of a Filesystem URL. However, //url doesn't have awareness of Blob. Separately, Mike and I have been talking about whether or not it's "right" for //url to know about Filesystem URLs and their parsing, or whether //url should be doing something like Blink's Security Origin does, which is allow a way for a higher layer to add parsing knowledge. Given that 1) //url doesn't know that blob's share an inner_url like filesystem does 2) Consumers of //url (excepting Blink's security origin) are coded against SchemeIsFilesystem rather than something more generic like SchemeHasInnerUrl() or something weird like that 3) It's not clear how much //url should even know about (blob, filesystem) I think it's fine to punt on that problem, but I leave that up to you on which is more pressing for you. [Note I think it'd be weird to solve 1/2 without first having an idea about what to do for 3, since you'd end up with twice as much codechurn if it's decided //url shouldn't know about these sorts of 'web platform-y' URLs. RFC 6454 is already woefully obsolete in this respect, and the URL Standard a more accurate reflection of reality, but very much a "web platform" thing and not a "generic standard URI parser" that //url originally started out as] Just highlighting this issue. https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:336: base::string16 result(base::UTF8ToUTF16(scheme)); EXTREME PEDANTRY REDUX: I suggested the "=" initialization over this form because this form is explicitly copy-ctor initialization, and at least MSVC in the past ha some stupidity around here. Both forms base::string16 result = base::UTF8ToUTF16(scheme) base::string16 result(base::UTF8ToUTF16(scheme)) follow the same "initializer" syntax in C++03 8.5, and thus both should be subject to the "as-if" rule allowing for RVO, but again, past experience (and mind you, it's been nearly 8 years since I took the time to dig into the assembly - this stuff may be fixed by now) was that the " = " syntax got optimized, when the () syntax didn't. That's why I prefer the " = " (Yes, I realize I'm discussing about a microoptimization that for this case is likely entirely lost in the noise; but like the "" vs std::string(), it's one of those 'useful' things to know/realize, and generally has no side-effect on readability but can add up) Like I said, EXTREME PEDANTRY https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.h:53: // - Omits the port if it is the default for the scheme. This is more a comment nit, but after the discussion about filesystem, and the fact that we're including path for filesystem, it does feel weird to call this origin. That is, it's not an origin in the scheme/host/port tuple, and it's not an origin in the sense of "globally unique" origin (e.g. as in RFC 6454 for non scheme/host/port , or data URls) This is more about formatting the effective security boundary, but "ish". Ontologies and naming are hard, and I don't think we'd want to take a hypothetical url::Origin or blink::SecurityOrigin here in this function (which feels somewhat implied by FormatOrigin and naming |origin|), so maybe FormatUrlForSecurityDisplay? Note, I absolutely hate that naming, but at least it works as a strawman to explore the slight unease I feel at naming. ElideUrlForSecurityDisplay? https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:211: const wchar_t* output; PEDANTRY nit: All of these could (should?) be const char* const (that is, const pointers to constant chars). There was a discussion on chromium-dev around 2.5-3.5 years ago talking about how the compiler is able/not able to optimize for the case when you can't use "const char[]" (I believe Siggi or Joi did that investigation as part of a memory shave yak) Again, not strictly necessary for a unit test code like this, but I think probably useful in encouraging a common idiom across test & prod code, hence why I mention it.
https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:309: return net::FormatUrl(url, languages); Most pressing for me right now is fixing bugs in Chrome's UX, for which the fix depends on a function like FormatOriginForDisplay. AFAIK, blob: URLs are not currently the most likely to surface those bugs. But clearly we ultimately will need to handle them as gracefully as possible. The current code does handle blob: URLs; they immediately fall through to FormatUrl on line 309. I have added a test case to illustrate this. https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:336: base::string16 result(base::UTF8ToUTF16(scheme)); On 2015/06/01 21:18:46, Ryan Sleevi wrote: > EXTREME PEDANTRY REDUX: > > I suggested the "=" initialization over this form because this form is > explicitly copy-ctor initialization, and at least MSVC in the past ha some > stupidity around here. > > Both forms > > base::string16 result = base::UTF8ToUTF16(scheme) > base::string16 result(base::UTF8ToUTF16(scheme)) > > follow the same "initializer" syntax in C++03 8.5, and thus both should be > subject to the "as-if" rule allowing for RVO, but again, past experience (and > mind you, it's been nearly 8 years since I took the time to dig into the > assembly - this stuff may be fixed by now) was that the " = " syntax got > optimized, when the () syntax didn't. > > That's why I prefer the " = " > > (Yes, I realize I'm discussing about a microoptimization that for this case is > likely entirely lost in the noise; but like the "" vs std::string(), it's one of > those 'useful' things to know/realize, and generally has no side-effect on > readability but can add up) > > Like I said, EXTREME PEDANTRY Done. https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.h:53: // - Omits the port if it is the default for the scheme. > This is more a comment nit, but after the discussion about filesystem, and the Do you mean the discussion about file: ? > fact that we're including path for filesystem, As currently implemented in this CL, the code only does that when the inner URL is a file: URL. > / it does feel weird to call this > origin. That is, it's not an origin in the scheme/host/port tuple, and it's not > an origin in the sense of "globally unique" origin (e.g. as in RFC 6454 for non > scheme/host/port , or data URls) As I understand, paths are (the only) part of an origin for file: URLs. (As discussed previously.) So, I do think that this function returns a string that displays, as human-readably as is currently possible, the origin-as-web-security-boundary-as-enforced-right-now-in-Chrome. > This is more about formatting the effective security boundary, but "ish". Right, but, what is the "ish"? Are there aspects of certain kinds of origins that form part of the effective security boundary that we are not displaying? And/or, are we displaying aspects of certain kinds of origins that do not form part of the security boundary? (Perhaps you mean I should leave off the "filesystem:" prefix for filesystem: URLs?) > Ontologies and naming are hard, and I don't think we'd want to take a > hypothetical url::Origin or blink::SecurityOrigin here in this function (which > feels somewhat implied by FormatOrigin and naming |origin|), so maybe Why not? I do have some concern that FormatOriginForDisplay is closer to correct than GURL::GetOrigin, if that's what you mean. (ISTR that GURL::GetOrigin returns objects whose spec() is "file://" for *all* file: URLs, which implies untrue things about Chrome's behavior. But I don't own GURL.) (In my dream world, FWIW, url.GetOrigin().spec() would be pretty close to FormatOriginForDisplay(url).) > FormatUrlForSecurityDisplay? I mean, sure, I could do that, with the appropriate comment changes too. But first I want to make sure I have understood your concern correctly. > ElideUrlForSecurityDisplay? No; "elide" here means "replace long strings with '...'." https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:211: const wchar_t* output; On 2015/06/01 21:18:46, Ryan Sleevi wrote: > PEDANTRY nit: All of these could (should?) be > > const char* const (that is, const pointers to constant chars). There was a > discussion on chromium-dev around 2.5-3.5 years ago talking about how the > compiler is able/not able to optimize for the case when you can't use "const > char[]" (I believe Siggi or Joi did that investigation as part of a memory shave > yak) > > Again, not strictly necessary for a unit test code like this, but I think > probably useful in encouraging a common idiom across test & prod code, hence why > I mention it. Done.
https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/120001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.h:53: // - Omits the port if it is the default for the scheme. On 2015/06/01 23:30:34, palmer wrote: > > This is more a comment nit, but after the discussion about filesystem, and the > > Do you mean the discussion about file: ? > No, I meant filesystem. Filesystem is a non-standard URL that has an inner URL that is the true origin, as discussed on url.spec.whatwg.org (and not handled by RFC 6454) > As I understand, paths are (the only) part of an origin for file: URLs. (As > discussed previously.) > > So, I do think that this function returns a string that displays, as > human-readably as is currently possible, the > origin-as-web-security-boundary-as-enforced-right-now-in-Chrome. > Put differently, you aren't returning globally unique identifiers, which is the true origin (in RFC 6454 sense) for data URIs, for example. And you are reaching inside filesystem: (because //URL knows about it, but 6454 says unique identifier), but not blob (because //URL doesn't know about it, even though it is the same as blob), which are the true origins-as-a-principle. I left this a nit because, as discussed with Mike, the origin terminology is heavily overloaded (two specs, three concepts at play, multiple implementations) In Windows terms, I think it is like using "Account" to refer to both the SID and the NtLogonName. They overlap, but not completely, you can have duplicate but distinct SIDs (in the Local space arc, distinct machines can have the same sid, but different principles; in origins, the null origin is unique and non-identity), you can have abbreviations (omitting Domain or DC and omitting port/path), things like that. Just like Account is confusing, I think there is reasonable confusion here. > > This is more about formatting the effective security boundary, but "ish". > > Right, but, what is the "ish"? Are there aspects of certain kinds of origins > that form part of the effective security boundary that we are not displaying? > And/or, are we displaying aspects of certain kinds of origins that do not form > part of the security boundary? (Perhaps you mean I should leave off the > "filesystem:" prefix for filesystem: URLs?) See above. > Why not? I do have some concern that FormatOriginForDisplay is closer to correct > than GURL::GetOrigin, if that's what you mean. (ISTR that GURL::GetOrigin > returns objects whose spec() is "file://" for *all* file: URLs, which implies > untrue things about Chrome's behavior. But I don't own GURL.) Because a SecurityOrigin is lossy. It could be simply a unique number or "null". It doesn't convert back to a user friendly URL. > > (In my dream world, FWIW, url.GetOrigin().spec() would be pretty close to > FormatOriginForDisplay(url).) I don't have the CL handy (mobile), but if you look at the introduction of URL::origin, you can see abarths explanation to tsepez on why that is bad. > > > FormatUrlForSecurityDisplay? > > I mean, sure, I could do that, with the appropriate comment changes too. But > first I want to make sure I have understood your concern correctly. > Origin is a loaded term. If you got a GURL for a sandboxed iframe, what is the expectation? An empty string? That's the true origin.
Calling it FormatUrlForSecurityDisplay now. msw and felt, as OWNERS, any thoughts?
https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:313: (base::ASCIIToUTF16(url::kStandardSchemeSeparator))); nit: drop extra parens around base::ASCII... https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.h:57: base::string16 FormatUrlForSecurityDisplay(const GURL& origin, nit: maybe replace security for "simple" or "concise"? https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:210: const char* const languages; afaict, the languages never affect the output for any of these test cases (all outputs match the corresponding cases with no languages specified). Can you make the loop test an empty languages output against a constant (or per-case...) non-empty languages output to avoid all the duplicate cases? Can you add one or more test cases (in a separate test?) that verifies some output is actually affected by the languages string? (net::FormatURL says it'll support unicode IDN host names only with certain languages). https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:217: {"Empty URL with languages", "", "en-US,zh-TW,zh", L""}, nit: every other desc including "with languages" has a comma before. https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:389: {"HTTPS IP address, non-default port, with languages", nit: no corresponding case without languages https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:465: {"Trailing dot in DNS name", nit: no corresponding case without languages, languages not mentioned in desc. https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:469: {"Blob URL", nit: no corresponding case without languages, languages not mentioned in desc.
https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:313: (base::ASCIIToUTF16(url::kStandardSchemeSeparator))); On 2015/06/02 23:45:16, msw wrote: > nit: drop extra parens around base::ASCII... Done. https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.h:57: base::string16 FormatUrlForSecurityDisplay(const GURL& origin, On 2015/06/02 23:45:16, msw wrote: > nit: maybe replace security for "simple" or "concise"? So far I think security UX is the main driver for this, so I'd like to leave it. https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:210: const char* const languages; On 2015/06/02 23:45:16, msw wrote: > afaict, the languages never affect the output for any of these test cases (all > outputs match the corresponding cases with no languages specified). Can you make > the loop test an empty languages output against a constant (or per-case...) > non-empty languages output to avoid all the duplicate cases? Can you add one or > more test cases (in a separate test?) that verifies some output is actually > affected by the languages string? (net::FormatURL says it'll support unicode IDN > host names only with certain languages). I'm not sure I know how to do that. :) Ultimately, we end up at this code (in net/base/net_util_icu.cc): 360 // TODO(brettw) bug 734373: check the scripts for each host component and 361 // don't un-IDN-ize if there is more than one. Alternatively, only IDN for 362 // scripts that the user has installed. For now, just put the entire 363 // path through IDN. Maybe this feature can be implemented in ICU itself? 364 // 365 // We may want to skip this step in the case of file URLs to allow unicode 366 // UNC hostnames regardless of encodings. 367 base::string16 IDNToUnicodeWithAdjustments( 368 const std::string& host, 369 const std::string& languages, 370 base::OffsetAdjuster::Adjustments* adjustments) { which references an ancient bug (in the Google internal bug tracker). It seems to describe the problem that there is not yet a clear policy for how we support Unicode code host names? https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:217: {"Empty URL with languages", "", "en-US,zh-TW,zh", L""}, On 2015/06/02 23:45:16, msw wrote: > nit: every other desc including "with languages" has a comma before. Done. https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:389: {"HTTPS IP address, non-default port, with languages", On 2015/06/02 23:45:16, msw wrote: > nit: no corresponding case without languages Done. https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:465: {"Trailing dot in DNS name", On 2015/06/02 23:45:16, msw wrote: > nit: no corresponding case without languages, languages not mentioned in desc. Done. https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:469: {"Blob URL", On 2015/06/02 23:45:16, msw wrote: > nit: no corresponding case without languages, languages not mentioned in desc. Done.
https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:210: const char* const languages; On 2015/06/03 01:01:46, palmer wrote: > On 2015/06/02 23:45:16, msw wrote: > > afaict, the languages never affect the output for any of these test cases (all > > outputs match the corresponding cases with no languages specified). Can you > make > > the loop test an empty languages output against a constant (or per-case...) > > non-empty languages output to avoid all the duplicate cases? Can you add one > or > > more test cases (in a separate test?) that verifies some output is actually > > affected by the languages string? (net::FormatURL says it'll support unicode > IDN > > host names only with certain languages). > > I'm not sure I know how to do that. :) Ultimately, we end up at this code (in > net/base/net_util_icu.cc): > > 360 // TODO(brettw) bug 734373: check the scripts for each host component and > 361 // don't un-IDN-ize if there is more than one. Alternatively, only IDN for > 362 // scripts that the user has installed. For now, just put the entire > 363 // path through IDN. Maybe this feature can be implemented in ICU itself? > 364 // > 365 // We may want to skip this step in the case of file URLs to allow unicode > 366 // UNC hostnames regardless of encodings. > 367 base::string16 IDNToUnicodeWithAdjustments( > 368 const std::string& host, > 369 const std::string& languages, > 370 base::OffsetAdjuster::Adjustments* adjustments) { > > which references an ancient bug (in the Google internal bug tracker). It seems > to describe the problem that there is not yet a clear policy for how we support > Unicode code host names? Maybe ask Brett about that. You can follow my first recommendation: A) Omit |languages| from this struct and the test cases. B) Delete all the tests cases "with languages" (they're dups). C) Make the loop check with and without languages like: base::string16 formatted = FormatUrlForSecurityDisplay(GURL(tests[i].input), std::string()); EXPECT_EQ(base::WideToUTF16(tests[i].output), formatted) << tests[i].description; base::string16 formatted_with_languages = FormatUrlForSecurityDisplay(GURL(tests[i].input), "en-US,zh-TW,zh"); // Or choose some other set of languages here... EXPECT_EQ(base::WideToUTF16(tests[i].output), formatted_with_languages) << tests[i].description;
https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:344: result += base::ASCIIToUTF16(":") + base::UTF8ToUTF16(origin.port()); should this use the |colon| var?
ui/content_settings lgtm
I'll leave msw for the tests, but LGTM (knowing the caveats already discussed) Note: A possible BUG below (or possibly a nit; I think a BUG though) and a comment nit https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:322: const GURL& inner_origin = inner_url->GetOrigin(); nit: Strictly speaking, it'd probably be better/safer to just pass inner_url on line 329, and don't use inner_origin at all. You end up calling GetOrigin on line 332, so this just avoids creating an extra temporary for the duration of that execution. POSSIBLE BUG is that you're taking a const-ref to GetOrigin, which returns a new GURL. The GURL would be a temporary scoped to this full statement, so I think it'd go out of scope by line 323? Valgrind (but not MSAN) would likely confirm this. https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.h:57: base::string16 FormatUrlForSecurityDisplay(const GURL& origin, Document languages? (jshin can update it when he tackles crbug.com/336973 ) Something like // otherwise-simplified URLs from eachother) // // The IDN may be presented in Unicode if |languages| accepts the // Unicode representation (see net::FormatUrl for more details on // the algorithm).
https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url_unittest.cc (right): https://codereview.chromium.org/1147363005/diff/160001/chrome/browser/ui/elid... chrome/browser/ui/elide_url_unittest.cc:210: const char* const languages; On 2015/06/03 01:19:21, msw wrote: > On 2015/06/03 01:01:46, palmer wrote: > > On 2015/06/02 23:45:16, msw wrote: > > > afaict, the languages never affect the output for any of these test cases > (all > > > outputs match the corresponding cases with no languages specified). Can you > > make > > > the loop test an empty languages output against a constant (or per-case...) > > > non-empty languages output to avoid all the duplicate cases? Can you add one > > or > > > more test cases (in a separate test?) that verifies some output is actually > > > affected by the languages string? (net::FormatURL says it'll support unicode > > IDN > > > host names only with certain languages). > > > > I'm not sure I know how to do that. :) Ultimately, we end up at this code (in > > net/base/net_util_icu.cc): > > > > 360 // TODO(brettw) bug 734373: check the scripts for each host component and > > 361 // don't un-IDN-ize if there is more than one. Alternatively, only IDN for > > 362 // scripts that the user has installed. For now, just put the entire > > 363 // path through IDN. Maybe this feature can be implemented in ICU itself? > > 364 // > > 365 // We may want to skip this step in the case of file URLs to allow unicode > > 366 // UNC hostnames regardless of encodings. > > 367 base::string16 IDNToUnicodeWithAdjustments( > > 368 const std::string& host, > > 369 const std::string& languages, > > 370 base::OffsetAdjuster::Adjustments* adjustments) { > > > > which references an ancient bug (in the Google internal bug tracker). It seems > > to describe the problem that there is not yet a clear policy for how we > support > > Unicode code host names? > > Maybe ask Brett about that. You can follow my first recommendation: > A) Omit |languages| from this struct and the test cases. > B) Delete all the tests cases "with languages" (they're dups). > C) Make the loop check with and without languages like: > base::string16 formatted = FormatUrlForSecurityDisplay(GURL(tests[i].input), > std::string()); > EXPECT_EQ(base::WideToUTF16(tests[i].output), formatted) << > tests[i].description; > base::string16 formatted_with_languages = > FormatUrlForSecurityDisplay(GURL(tests[i].input), "en-US,zh-TW,zh"); // Or > choose some other set of languages here... > EXPECT_EQ(base::WideToUTF16(tests[i].output), formatted_with_languages) << > tests[i].description; Done. https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.cc (right): https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:322: const GURL& inner_origin = inner_url->GetOrigin(); > nit: Strictly speaking, it'd probably be better/safer to just pass inner_url on > line 329, and don't use inner_origin at all. Done. https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.cc:344: result += base::ASCIIToUTF16(":") + base::UTF8ToUTF16(origin.port()); On 2015/06/03 01:31:58, felt wrote: > should this use the |colon| var? Done. https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... File chrome/browser/ui/elide_url.h (right): https://codereview.chromium.org/1147363005/diff/180001/chrome/browser/ui/elid... chrome/browser/ui/elide_url.h:57: base::string16 FormatUrlForSecurityDisplay(const GURL& origin, On 2015/06/03 19:09:04, Ryan Sleevi wrote: > Document languages? > > (jshin can update it when he tackles crbug.com/336973 ) > > Something like > > // otherwise-simplified URLs from eachother) > // > // The IDN may be presented in Unicode if |languages| accepts the > // Unicode representation (see net::FormatUrl for more details on > // the algorithm). Done.
lgtm
The CQ bit was checked by palmer@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from felt@chromium.org, rsleevi@chromium.org Link to the patchset: https://codereview.chromium.org/1147363005/#ps200001 (title: "Change testing strategy, respond to comments.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1147363005/200001
Message was sent while issue was closed.
Committed patchset #11 (id:200001)
Message was sent while issue was closed.
Patchset 11 (id:??) landed as https://crrev.com/d72cc48699ed600c26911b0061f44aa9cfc39e62 Cr-Commit-Position: refs/heads/master@{#332736} |