|
|
Created:
4 years, 2 months ago by brettw Modified:
4 years, 2 months ago Reviewers:
Peter Kasting CC:
chromium-reviews Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionMark URLs with empty schemes as invalid.
Previously Chrome allowed empty schemes. This meant that URLs such as ":" or
":foo" were marked as valid.
According to WHATWG:
A scheme must be one ASCII alpha, followed by zero or more of ASCII
alphanumeric, "+", "-", and ".".
This change marks URLs with empty schemes as invalid.
This forced some changes to the URL fixer and autocomplete systems to keep from breaking the behavior of input like ":w" (which we want to do a search). Previously the URL fixer would see ":w" as having a valid (but empty) scheme. Therefore, it would not attempt to prepend "http://" and the autocomplete system would make a GURL out of that literal string. GURL would interpret it as a "path" URL (with no host). AutocompleteInput::Parse checks for URLs with no host and a comment that this only happens for URLs beginning with a colon, and converts those to a query.
With this change, ":w" becomes invalid and url_fixer attempts to prepend schemes. This changes the behavior for certain invalid URLs like ":b005::68]" in the tests which used to mean "give up" and now is treated like "http" like most other garbage input that doesn't start with a colon. For these types of inputs, I think either behavior should be fine.
To preserve ":w" means query this patch just checks for a colon at the beginning of the input and removes the empty host check. I was originally trying to avoid this, but since there was already a special check for this unusual case, it seems inevitable that some check for this case will always be present.
BUG=515497
Committed: https://crrev.com/46f9b83f18486b8b735afbc108b75ed0d9537123
Cr-Commit-Position: refs/heads/master@{#423249}
Patch Set 1 #Patch Set 2 : Fixes #Patch Set 3 : . #
Total comments: 5
Patch Set 4 : . #
Total comments: 4
Patch Set 5 : Comments #
Messages
Total messages: 41 (25 generated)
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
Description was changed from ========== Mark URLs with empty schemes as invalid. Previously Chrome allowed empty schemes. This meant that URLs such as ":" or ":foo" were marked as valid. According to WHATWG: A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". This change marks URLs with empty schemes as invalid. BUG=515497 ========== to ========== Mark URLs with empty schemes as invalid. Previously Chrome allowed empty schemes. This meant that URLs such as ":" or ":foo" were marked as valid. According to WHATWG: A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". This change marks URLs with empty schemes as invalid. This forced some changes to the URL fixer and autocomplete systems to keep from breaking the behavior of input like ":w" (which we want to do a search). Previously the URL fixer would see ":w" as having a valid (but empty) scheme. Therefore, it would not attempt to prepend "http://" and the autocomplete system would make a GURL out of that literal string. GURL would interpret it as a "path" URL (with no host). AutocompleteInput::Parse checks for URLs with no host and a comment that this only happens for URLs beginning with a colon, and converts those to a query. With this change, ":w" becomes invalid and url_fixer attempts to prepend schemes. This changes the behavior for certain invalid URLs like ":b005::68]" in the tests which used to mean "give up" and now is treated like "http" like most other garbage input that doesn't start with a colon. For these types of inputs, I think either behavior should be fine. To preserve ":w" means query this patch just checks for a colon at the beginning of the input and removes the empty host check. I was originally trying to avoid this, but since there was already a special check for this unusual case, it seems inevitable that some check for this case will always be present. BUG=515497 ==========
.
brettw@chromium.org changed reviewers: + pkasting@chromium.org
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
What's the practical effect of our current behavior? I'm not necessarily opposed to fixing it but I'd like to understand why this is a real bug that fixing will help, and not e.g. something that should change in the WHATWG spec (or, less likely, that we should just ignore the discrepancy with). https://codereview.chromium.org/2378213002/diff/40001/components/omnibox/brow... File components/omnibox/browser/autocomplete_input.cc (right): https://codereview.chromium.org/2378213002/diff/40001/components/omnibox/brow... components/omnibox/browser/autocomplete_input.cc:158: // Treat input that beings with a colon as a query. Otherwise the URL Nit: begins https://codereview.chromium.org/2378213002/diff/40001/components/omnibox/brow... components/omnibox/browser/autocomplete_input.cc:162: return metrics::OmniboxInputType::QUERY; It would be nice to avoid doing this check here, and maybe instead detect this lower down, ideally without actually looking for an explicit colon. Where does control flow go if we don't do this here? You implied ":w" will become a navigation, but I don't know what exit point below we actually take in that case. (Especially depending on whether you make other changes to stuff I comment on elsewhere.) https://codereview.chromium.org/2378213002/diff/40001/components/url_formatte... File components/url_formatter/url_fixer.cc (right): https://codereview.chromium.org/2378213002/diff/40001/components/url_formatte... components/url_formatter/url_fixer.cc:483: if (first_nonwhite == text->end() || *first_nonwhite != ':') Not sure I really understand this change. Why convert ":foo" to "http:foo" and not "http://foo" or "http://:foo"? Seems like "http://foo" is better if we're trying to make something invalid become valid, and "http://:foo" is better if we're intentionally trying to not care about this case and let it fail. https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_etc.cc File url/url_canon_etc.cc (right): https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_etc.cc#ne... url/url_canon_etc.cc:91: output->push_back(':'); What happens if we don't bother to insert a colon? Canonicalization becomes non-idempotent? https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_unittest.cc File url/url_canon_unittest.cc (right): https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_unittest.... url/url_canon_unittest.cc:1762: {":\":This /is interesting;?#", false, ":\":This /is interesting;?#"}, Should this have instead dropped the initial colon and checked that the result was true? That seems like maybe a more interesting test? Or maybe we should just do both versions.
On 2016/09/29 04:54:12, Peter Kasting (busy Sep 29) wrote: > What's the practical effect of our current behavior? I'm not necessarily > opposed to fixing it but I'd like to understand why this is a real bug that > fixing will help, and not e.g. something that should change in the WHATWG spec > (or, less likely, that we should just ignore the discrepancy with). It's surprising (which is what the bug report was about), the URL can't be loaded no matter what so it seems good to mark it invalid, and the spec agrees with these previous two things. This won't affect links on web pages since ":foo.com/" is actually a relative URL and it will be resolved as such relative to the current scheme. > https://codereview.chromium.org/2378213002/diff/40001/components/omnibox/brow... > File components/omnibox/browser/autocomplete_input.cc (right): > > https://codereview.chromium.org/2378213002/diff/40001/components/omnibox/brow... > components/omnibox/browser/autocomplete_input.cc:158: // Treat input that beings > with a colon as a query. Otherwise the URL > Nit: begins > > https://codereview.chromium.org/2378213002/diff/40001/components/omnibox/brow... > components/omnibox/browser/autocomplete_input.cc:162: return > metrics::OmniboxInputType::QUERY; > It would be nice to avoid doing this check here, and maybe instead detect this > lower down, ideally without actually looking for an explicit colon. > > Where does control flow go if we don't do this here? You implied ":w" will > become a navigation, but I don't know what exit point below we actually take in > that case. (Especially depending on whether you make other changes to stuff I > comment on elsewhere.) The URL fixer will add a scheme because the URL is unparseable, and we'll end up with "http:w" once that call is made. That's why I had to add this check before the URL fixer is called. I think this is the most reasonable behavior for the URL fixer given this input which is why I didn't change it. The other possibility is to make GetValidScheme in url_fixer.cc special case empty schemes and return true to prevent SegmentURL from trying to add schemes. This would match the previous behavior, and would mean there are no changes required in AutocompleteInput. But that seemed wrong since the scheme really isn't valid, and AutocompleteInput is already special-casing the "begins with a colon" case anyway (just in a roundabout way). > https://codereview.chromium.org/2378213002/diff/40001/components/url_formatte... > File components/url_formatter/url_fixer.cc (right): > > https://codereview.chromium.org/2378213002/diff/40001/components/url_formatte... > components/url_formatter/url_fixer.cc:483: if (first_nonwhite == text->end() || > *first_nonwhite != ':') > Not sure I really understand this change. Why convert ":foo" to "http:foo" and > not "http://foo" or "http://:foo"? Seems like "http://foo" is better if we're > trying to make something invalid become valid, and "http://:foo" is better if > we're intentionally trying to not care about this case and let it fail. Without this change it was doing "http://:foo" which seemed quite wrong to me. "http:foo" works fine (GURL will fix it) so I didn't feel like it was necessary to go out of my way to fix it further. > https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_etc.cc > File url/url_canon_etc.cc (right): > > https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_etc.cc#ne... > url/url_canon_etc.cc:91: output->push_back(':'); > What happens if we don't bother to insert a colon? Canonicalization becomes > non-idempotent? Yes, then we'd canonicalize ":foo/bar" to "foo/bar" which would change the meaning of everything. > https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_unittest.cc > File url/url_canon_unittest.cc (right): > > https://codereview.chromium.org/2378213002/diff/40001/url/url_canon_unittest.... > url/url_canon_unittest.cc:1762: {":\":This /is interesting;?#", false, ":\":This > /is interesting;?#"}, > Should this have instead dropped the initial colon and checked that the result > was true? That seems like maybe a more interesting test? Or maybe we should > just do both versions. I changed this test. I moved the "empty scheme" test to the scheme unittest cases. This test is for path URLs, of which an empty scheme is not. Removing the initial colon doesn't make it a path URL either, that's a test of an invalid scheme character of which we already have some.
.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
(new snap up if that wasn't clear)
Will re-review. Nice-to-have: if you can reply on the codereview tool, rather than by email, so responses are captured inline on patch sets as comment chains, that makes it a lot easier to track down things later compared to reading through email threads :). I've been doing a lot of spelunking back through old code reviews lately so I'm more aware of this than usual...
ping
On 2016/10/03 18:09:37, brettw (ping on IM after 24h) wrote: > ping Will try to get to this Tue 10/4
This all seems fine save the autocomplete_input.cc issue. https://codereview.chromium.org/2378213002/diff/60001/components/omnibox/brow... File components/omnibox/browser/autocomplete_input.cc (right): https://codereview.chromium.org/2378213002/diff/60001/components/omnibox/brow... components/omnibox/browser/autocomplete_input.cc:162: return metrics::OmniboxInputType::QUERY; I'm fine with doing what you did to the URLFixerUpper and I agree that changing GetValidScheme() would be wrong. I'm still a little uncomfortable adding this up here though. I think it's probably OK if the fixer upper fixes ":w" into "http:w". The result of that will go in |canonicalized_url|, and we don't automatically navigate just because the canonicalized URL contains something valid (because there are lots of other cases where the input can be canonicalized, but was intended as a search -- we distinguish those below, which is why I'm hoping to catch this case below as well). What I was trying to ask before was, if you don't touch this file, and you apply the rest of this CL, then where is the specific exit point below) from this function for ":w"? I'd assume we'd drop all the way to the bottom of the function and return UNKNOWN, which seems like a perfectly fine behavior to have. (Though it might require changing some tests.) This doesn't duplicate the previous behavior precisely, but that's OK -- the old conditional was there to prevent a crash trying to read from a nonexistent host, not because we specifically wanted to ensure inputs starting with a colon were forced to be queries. https://codereview.chromium.org/2378213002/diff/60001/components/url_formatte... File components/url_formatter/url_fixer.cc (right): https://codereview.chromium.org/2378213002/diff/60001/components/url_formatte... components/url_formatter/url_fixer.cc:484: inserted_text.append(url::kStandardSchemeSeparator); // Add "://". Nit: After your explanation, I understand why this works. GURL fixing up "foo:bar" into "foo://bar" was the missing piece. This seems a little unclear, so it'd be nice to do one of the following: (1) Skip over the colon in the input: inserted_text.append(url::kStandardSchemeSeparator); // Assume a leading colon was meant to be a scheme separator (which we can // thus skip, since we just added this). if (first_nonwhite != text->end() && *first_nonwhite == ':') ++first_nonwhite; (2) Add a comment about GURL's behavior here: // Assume a leading colon was meant to be a scheme separator (which GURL will // fix up for us into the full "://"). Otherwise add the separator ourselves. if (first_nonwhite == text->end() || *first_nonwhite != ':') inserted_text.append(url::kStandardSchemeSeparator);
Comments
New snap up. https://codereview.chromium.org/2378213002/diff/60001/components/omnibox/brow... File components/omnibox/browser/autocomplete_input.cc (right): https://codereview.chromium.org/2378213002/diff/60001/components/omnibox/brow... components/omnibox/browser/autocomplete_input.cc:162: return metrics::OmniboxInputType::QUERY; It does return UNKNOWN at the bottom, and I was trying to keep the current behavior. GURL will return an empty string if the host doesn't exist, so you don't need to check beforehand. I removed the empty host check below. https://codereview.chromium.org/2378213002/diff/60001/components/url_formatte... File components/url_formatter/url_fixer.cc (right): https://codereview.chromium.org/2378213002/diff/60001/components/url_formatte... components/url_formatter/url_fixer.cc:484: inserted_text.append(url::kStandardSchemeSeparator); // Add "://". I did the second.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM
The CQ bit was checked by brettw@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Mark URLs with empty schemes as invalid. Previously Chrome allowed empty schemes. This meant that URLs such as ":" or ":foo" were marked as valid. According to WHATWG: A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". This change marks URLs with empty schemes as invalid. This forced some changes to the URL fixer and autocomplete systems to keep from breaking the behavior of input like ":w" (which we want to do a search). Previously the URL fixer would see ":w" as having a valid (but empty) scheme. Therefore, it would not attempt to prepend "http://" and the autocomplete system would make a GURL out of that literal string. GURL would interpret it as a "path" URL (with no host). AutocompleteInput::Parse checks for URLs with no host and a comment that this only happens for URLs beginning with a colon, and converts those to a query. With this change, ":w" becomes invalid and url_fixer attempts to prepend schemes. This changes the behavior for certain invalid URLs like ":b005::68]" in the tests which used to mean "give up" and now is treated like "http" like most other garbage input that doesn't start with a colon. For these types of inputs, I think either behavior should be fine. To preserve ":w" means query this patch just checks for a colon at the beginning of the input and removes the empty host check. I was originally trying to avoid this, but since there was already a special check for this unusual case, it seems inevitable that some check for this case will always be present. BUG=515497 ========== to ========== Mark URLs with empty schemes as invalid. Previously Chrome allowed empty schemes. This meant that URLs such as ":" or ":foo" were marked as valid. According to WHATWG: A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". This change marks URLs with empty schemes as invalid. This forced some changes to the URL fixer and autocomplete systems to keep from breaking the behavior of input like ":w" (which we want to do a search). Previously the URL fixer would see ":w" as having a valid (but empty) scheme. Therefore, it would not attempt to prepend "http://" and the autocomplete system would make a GURL out of that literal string. GURL would interpret it as a "path" URL (with no host). AutocompleteInput::Parse checks for URLs with no host and a comment that this only happens for URLs beginning with a colon, and converts those to a query. With this change, ":w" becomes invalid and url_fixer attempts to prepend schemes. This changes the behavior for certain invalid URLs like ":b005::68]" in the tests which used to mean "give up" and now is treated like "http" like most other garbage input that doesn't start with a colon. For these types of inputs, I think either behavior should be fine. To preserve ":w" means query this patch just checks for a colon at the beginning of the input and removes the empty host check. I was originally trying to avoid this, but since there was already a special check for this unusual case, it seems inevitable that some check for this case will always be present. BUG=515497 ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001)
Message was sent while issue was closed.
Description was changed from ========== Mark URLs with empty schemes as invalid. Previously Chrome allowed empty schemes. This meant that URLs such as ":" or ":foo" were marked as valid. According to WHATWG: A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". This change marks URLs with empty schemes as invalid. This forced some changes to the URL fixer and autocomplete systems to keep from breaking the behavior of input like ":w" (which we want to do a search). Previously the URL fixer would see ":w" as having a valid (but empty) scheme. Therefore, it would not attempt to prepend "http://" and the autocomplete system would make a GURL out of that literal string. GURL would interpret it as a "path" URL (with no host). AutocompleteInput::Parse checks for URLs with no host and a comment that this only happens for URLs beginning with a colon, and converts those to a query. With this change, ":w" becomes invalid and url_fixer attempts to prepend schemes. This changes the behavior for certain invalid URLs like ":b005::68]" in the tests which used to mean "give up" and now is treated like "http" like most other garbage input that doesn't start with a colon. For these types of inputs, I think either behavior should be fine. To preserve ":w" means query this patch just checks for a colon at the beginning of the input and removes the empty host check. I was originally trying to avoid this, but since there was already a special check for this unusual case, it seems inevitable that some check for this case will always be present. BUG=515497 ========== to ========== Mark URLs with empty schemes as invalid. Previously Chrome allowed empty schemes. This meant that URLs such as ":" or ":foo" were marked as valid. According to WHATWG: A scheme must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, "+", "-", and ".". This change marks URLs with empty schemes as invalid. This forced some changes to the URL fixer and autocomplete systems to keep from breaking the behavior of input like ":w" (which we want to do a search). Previously the URL fixer would see ":w" as having a valid (but empty) scheme. Therefore, it would not attempt to prepend "http://" and the autocomplete system would make a GURL out of that literal string. GURL would interpret it as a "path" URL (with no host). AutocompleteInput::Parse checks for URLs with no host and a comment that this only happens for URLs beginning with a colon, and converts those to a query. With this change, ":w" becomes invalid and url_fixer attempts to prepend schemes. This changes the behavior for certain invalid URLs like ":b005::68]" in the tests which used to mean "give up" and now is treated like "http" like most other garbage input that doesn't start with a colon. For these types of inputs, I think either behavior should be fine. To preserve ":w" means query this patch just checks for a colon at the beginning of the input and removes the empty host check. I was originally trying to avoid this, but since there was already a special check for this unusual case, it seems inevitable that some check for this case will always be present. BUG=515497 Committed: https://crrev.com/46f9b83f18486b8b735afbc108b75ed0d9537123 Cr-Commit-Position: refs/heads/master@{#423249} ==========
Message was sent while issue was closed.
Patchset 5 (id:??) landed as https://crrev.com/46f9b83f18486b8b735afbc108b75ed0d9537123 Cr-Commit-Position: refs/heads/master@{#423249} |