|
|
Chromium Code Reviews
DescriptionURL Formatter: Add StripSubdomain method that preserves eTLD + 1.
We already had a StripWWW and StripWWWFromHost method, but they are
designed to be simple and may turn "www.com" into "com", which is
undesirable for some use cases.
This method is more sophisticated and designed to to elide subdomains
from hosts for the purposes of Omnibox suggestion display.
BUG=732582
Patch Set 1 #Patch Set 2 : reformat #Patch Set 3 : remove spurious change #Patch Set 4 : fix #
Total comments: 14
Patch Set 5 : update comment #Patch Set 6 : fix #
Total comments: 4
Patch Set 7 : fix #Patch Set 8 : Return std::string instead of base::StringPiece (which makes no sense) #
Dependent Patchsets: Messages
Total messages: 38 (24 generated)
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
tommycli@chromium.org changed reviewers: + pkasting@chromium.org
pkasting: PTAL. This patch is laying the groundwork for the trivially-informative subdomain elision experiment.
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.cc:587: std::string etld_plus_one = Nit: Prefer to use the same terminology as the RCDS, "domain and registry". https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.cc:597: return host_piece.substr(4); This early-returns if the host is www.m.foo.com, which seems wrong. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.cc:601: base::StartsWith(host_piece, "m.", base::CompareCase::SENSITIVE)) { This fails to fix up the most common mobile host I encounter, en.m.wikipedia.org. It seems like the solution to these last two comments is to remove these subdomains anywhere they occur prior to the eTLD+1. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... File components/url_formatter/url_formatter.h (right): https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.h:141: // Even if |text| is a valid host, this method may return an invalid host. I think these two sentences use "valid" in a way that differs from the GURL sense. For example, in GURL, "www." is a valid host, as is "com". I think what you mean is something like "Both |text| and the returned value are arbitrary strings and are not guaranteed to be navigable hostnames." https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:1238: // Only strip one least-significant subdomain. Yeah, this seems not-what-we-want to me. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:1241: There are no tests for "strip www from m.foo.com" and "strip m from www.foo.com". https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:1249: for (size_t i = 0; i < arraysize(cases); ++i) { Nit: Use range-based for instead
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
pkasting: Thanks! I re-jiggered this based on your comments. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.cc:587: std::string etld_plus_one = On 2017/06/17 02:46:55, Peter Kasting wrote: > Nit: Prefer to use the same terminology as the RCDS, "domain and registry". Done. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.cc:597: return host_piece.substr(4); On 2017/06/17 02:46:55, Peter Kasting wrote: > This early-returns if the host is http://www.m.foo.com, which seems wrong. Done. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.cc:601: base::StartsWith(host_piece, "m.", base::CompareCase::SENSITIVE)) { On 2017/06/17 02:46:55, Peter Kasting wrote: > This fails to fix up the most common mobile host I encounter, > http://en.m.wikipedia.org. > > It seems like the solution to these last two comments is to remove these > subdomains anywhere they occur prior to the eTLD+1. Done. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... File components/url_formatter/url_formatter.h (right): https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter.h:141: // Even if |text| is a valid host, this method may return an invalid host. On 2017/06/17 02:46:55, Peter Kasting wrote: > I think these two sentences use "valid" in a way that differs from the GURL > sense. For example, in GURL, "www." is a valid host, as is "com". > > I think what you mean is something like "Both |text| and the returned value are > arbitrary strings and are not guaranteed to be navigable hostnames." Done. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:1238: // Only strip one least-significant subdomain. On 2017/06/17 02:46:55, Peter Kasting wrote: > Yeah, this seems not-what-we-want to me. Done. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:1241: On 2017/06/17 02:46:55, Peter Kasting wrote: > There are no tests for "strip www from m.foo.com" and "strip m from > http://www.foo.com%22. Done. https://codereview.chromium.org/2939423003/diff/60001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:1249: for (size_t i = 0; i < arraysize(cases); ++i) { On 2017/06/17 02:46:55, Peter Kasting wrote: > Nit: Use range-based for instead Done.
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
LGTM https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... components/url_formatter/url_formatter.cc:599: if ((subdomains & kStripWWW) != 0) Nit: Personally I find omitting the "!= 0" more readable, since these bitmasks function more like bools than ints. (2 places) https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... components/url_formatter/url_formatter_unittest.cc:1251: // Sanity check that we respect other registries. Are these tests just extensions of the block below? Maybe put them into the end of it?
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Awesome thanks, I'm sending this in now. https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... components/url_formatter/url_formatter.cc:599: if ((subdomains & kStripWWW) != 0) On 2017/06/19 23:42:32, Peter Kasting wrote: > Nit: Personally I find omitting the "!= 0" more readable, since these bitmasks > function more like bools than ints. (2 places) Done. Cool. Me too. I was just following precedent elsewhere. https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2939423003/diff/100001/components/url_formatt... components/url_formatter/url_formatter_unittest.cc:1251: // Sanity check that we respect other registries. On 2017/06/19 23:42:32, Peter Kasting wrote: > Are these tests just extensions of the block below? Maybe put them into the end > of it? Done.
The CQ bit was unchecked by tommycli@chromium.org
The CQ bit was checked by tommycli@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/2939423003/#ps120001 (title: "fix")
Description was changed from ========== URL Formatter: Add StripSubdomain method that preserves eTLD + 1. We already had a StripWWW and StripWWWFromHost method, but they are designed to be simple and may turn "www.com" into "com", which is undesirable for some use cases. This method is more sophisticated and designed to to elide subdomains from hosts for the purposes of Omnibox suggestion display. BUG=732582 ========== to ========== URL Formatter: Add StripSubdomain method that preserves eTLD + 1. We already had a StripWWW and StripWWWFromHost method, but they are designed to be simple and may turn "www.com" into "com", which is undesirable for some use cases. This method is more sophisticated and designed to to elide subdomains from hosts for the purposes of Omnibox suggestion display. BUG=732582 ==========
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
+cc jdonnelly - this will be the logic used for the subdomain elision experiment.
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
tommycli@chromium.org changed reviewers: + felt@chromium.org
felt: PTAL components/ssl_errors, thanks!
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Hi, we generally don't encourage removing subdomains from Chrome UI, with the exception of "www" and "m". Is this being added with one project specifically in mind? I'm worried it will end up being used elsewhere in ways that are not appropriate.
On 2017/06/20 20:50:03, felt wrote: > Hi, we generally don't encourage removing subdomains from Chrome UI, with the > exception of "www" and "m". Is this being added with one project specifically in > mind? I'm worried it will end up being used elsewhere in ways that are not > appropriate. Hey felt@ -- The code only permits removal of 'www' and 'm' currently. This is specifically for the Omnibox suggestions dropdown list, which is not supposed to be a security surface. (Treated as insecure). We could potentially move this code into component/omnibox, but it really feels like it belongs in url_formatter. Are there any specific scarier warnings we should be adding to the .h file comments?
On 2017/06/20 20:52:11, tommycli wrote: > On 2017/06/20 20:50:03, felt wrote: > > Hi, we generally don't encourage removing subdomains from Chrome UI, with the > > exception of "www" and "m". Is this being added with one project specifically > in > > mind? I'm worried it will end up being used elsewhere in ways that are not > > appropriate. > > Hey felt@ -- The code only permits removal of 'www' and 'm' currently. This is > specifically for the Omnibox suggestions dropdown list, which is not supposed to > be a security surface. (Treated as insecure). > > We could potentially move this code into component/omnibox, but it really feels > like it belongs in url_formatter. Are there any specific scarier warnings we > should be adding to the .h file comments? felt: ping, thanks!
On 2017/06/22 19:10:09, tommycli wrote: > On 2017/06/20 20:52:11, tommycli wrote: > > On 2017/06/20 20:50:03, felt wrote: > > > Hi, we generally don't encourage removing subdomains from Chrome UI, with > the > > > exception of "www" and "m". Is this being added with one project > specifically > > in > > > mind? I'm worried it will end up being used elsewhere in ways that are not > > > appropriate. > > > > Hey felt@ -- The code only permits removal of 'www' and 'm' currently. This is > > specifically for the Omnibox suggestions dropdown list, which is not supposed > to > > be a security surface. (Treated as insecure). > > > > We could potentially move this code into component/omnibox, but it really > feels > > like it belongs in url_formatter. Are there any specific scarier warnings we > > should be adding to the .h file comments? > > felt: ping, thanks! hi felt: friendly ping thanks
On 2017/06/23 18:22:10, tommycli wrote: > On 2017/06/22 19:10:09, tommycli wrote: > > On 2017/06/20 20:52:11, tommycli wrote: > > > On 2017/06/20 20:50:03, felt wrote: > > > > Hi, we generally don't encourage removing subdomains from Chrome UI, with > > the > > > > exception of "www" and "m". Is this being added with one project > > specifically > > > in > > > > mind? I'm worried it will end up being used elsewhere in ways that are not > > > > appropriate. > > > > > > Hey felt@ -- The code only permits removal of 'www' and 'm' currently. This > is > > > specifically for the Omnibox suggestions dropdown list, which is not > supposed > > to > > > be a security surface. (Treated as insecure). > > > > > > We could potentially move this code into component/omnibox, but it really > > feels > > > like it belongs in url_formatter. Are there any specific scarier warnings we > > > should be adding to the .h file comments? > > > > felt: ping, thanks! > > hi felt: friendly ping thanks Actually DO NOT REVIEW or SUBMIT. This CL contains the wrong approach, as it fails for the wwww.google.com case (four w's). |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
