|
|
Chromium Code Reviews|
Created:
3 years, 5 months ago by tommycli Modified:
3 years, 5 months ago Reviewers:
Peter Kasting CC:
chromium-reviews Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionOmnibox UI Experiments: Strip trivial subdomains
This revised patch strips the trivial subdomains while preserving the
registry and domain. It also correctly handles the wwww.foo.com (4 w's),
which the previous version did not.
Previous version: https://codereview.chromium.org/2939423003/
BUG=732582
Review-Url: https://codereview.chromium.org/2966233002
Cr-Commit-Position: refs/heads/master@{#484695}
Committed: https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f61931882408c7
Patch Set 1 #Patch Set 2 : fix #Patch Set 3 : make cast explicit for windows #
Total comments: 20
Patch Set 4 : merge #Patch Set 5 : add comment #Patch Set 6 : fix #
Total comments: 2
Patch Set 7 : address one more comment #
Messages
Total messages: 35 (24 generated)
Description was changed from ========== Omnibox UI Experiments: Add subdomain stripping preserving eTLD+1. BUG=732582 ========== to ========== Omnibox UI Experiments: Strip trivial subdomains This revised patch strips the trivial subdomains while preserving the registry and domain. It also correctly handles the wwww.foo.com (4 w's), which the previous version did not. BUG=732582 ==========
tommycli@chromium.org changed reviewers: + pkasting@chromium.org
Description was changed from ========== Omnibox UI Experiments: Strip trivial subdomains This revised patch strips the trivial subdomains while preserving the registry and domain. It also correctly handles the wwww.foo.com (4 w's), which the previous version did not. BUG=732582 ========== to ========== Omnibox UI Experiments: Strip trivial subdomains This revised patch strips the trivial subdomains while preserving the registry and domain. It also correctly handles the wwww.foo.com (4 w's), which the previous version did not. Previous version: https://codereview.chromium.org/2939423003/ BUG=732582 ==========
pkasting: PTAL - I think this version is a lot better than the old one. the biggest benefit is that it's more correct.
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...)
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
One substantive behavior concern. LGTM if I misunderstood how the code should work and it's not actually broken. https://codereview.chromium.org/2966233002/diff/40001/components/omnibox/brow... File components/omnibox/browser/autocomplete_match_unittest.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/omnibox/brow... components/omnibox/browser/autocomplete_match_unittest.cc:114: TEST(AutocompleteMatchTest, FormatUrlForSuggestionDisplay) { Nit: Might want a comment somewhere in this function to the effect that you don't need in-depth tests for the functionality of each formatting flag because those happen in <cross-reference location>; you just need to check that flipping the right feature flags does in fact toggle the behavior at this level. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:50: // with an inline temporary (see http://gcc.gnu.org/bugs/#cxx%5Frvalbind ). FWIW, I think we could probably replace this comment with DISALLOW_COPY_AND_ASSIGN since I think we no longer compile with gcc anywhere? But probably best done in a separate patch :) https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:74: base::StringTokenizer t(component_text.begin(), Nit: t -> tokenizer (avoid abbreviation in general; see http://google.github.io/styleguide/cppguide.html#General_Naming_Rules ) https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:79: std::string new_subdomain_string; Nit: |transformed_subdomain|? https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:84: new_subdomain_string.append(t.token()); Nit: append() is fine, I find += a little more idiomatic myself https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:100: offset += t.token().length() + 1; Wait, is this right? I think t.token() is pointing at the delimiter here, so this always adds 2. I think you wanted trivial_subdomain_length + 1 here, didn't you? But in that case I don't know how your adjustment unittest passes. It feels like this is symptomatic of a design problem: |offset| itself seems like it's just trying to track where in the original string you are. But I think you can read that off the tokenizer (via e.g. t.token_begin() - component_text.begin(), or maybe using token_end(), depending on when you're reading it and what you want). So rather than iterating an offset and also iterating through the string, you can just do one. Or you could eliminate the tokenizer, keep the read offset, and use a simple loop that calls find() starting from that offset. Either way would probably work. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:653: HostComponentTransform(false /* trim_trivial_subdomains */), output, (I still dislike these kinda comments but whatever) https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:952: {"omit trivial subdomains but leave subdomains ending in www", Nit: Maybe "don't do blind substring matches for www" would be a more accurate description? Similarly for "m" below. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:958: {"omit trivial subdomains and consume one trailing delimiter only", Is it important that we test this? Such URLs aren't really valid, are they? If not I feel like I'd rather not have a test mandate any particular behavior.
pkasting: Thanks! One pushback below (last one) https://codereview.chromium.org/2966233002/diff/40001/components/omnibox/brow... File components/omnibox/browser/autocomplete_match_unittest.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/omnibox/brow... components/omnibox/browser/autocomplete_match_unittest.cc:114: TEST(AutocompleteMatchTest, FormatUrlForSuggestionDisplay) { On 2017/07/06 06:01:21, Peter Kasting wrote: > Nit: Might want a comment somewhere in this function to the effect that you > don't need in-depth tests for the functionality of each formatting flag because > those happen in <cross-reference location>; you just need to check that flipping > the right feature flags does in fact toggle the behavior at this level. Done. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:50: // with an inline temporary (see http://gcc.gnu.org/bugs/#cxx%5Frvalbind ). On 2017/07/06 06:01:21, Peter Kasting wrote: > FWIW, I think we could probably replace this comment with > DISALLOW_COPY_AND_ASSIGN since I think we no longer compile with gcc anywhere? > But probably best done in a separate patch :) Acknowledged. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:74: base::StringTokenizer t(component_text.begin(), On 2017/07/06 06:01:21, Peter Kasting wrote: > Nit: t -> tokenizer (avoid abbreviation in general; see > http://google.github.io/styleguide/cppguide.html#General_Naming_Rules ) Done. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:79: std::string new_subdomain_string; On 2017/07/06 06:01:21, Peter Kasting wrote: > Nit: |transformed_subdomain|? Done. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:84: new_subdomain_string.append(t.token()); On 2017/07/06 06:01:21, Peter Kasting wrote: > Nit: append() is fine, I find += a little more idiomatic myself Done. Ah i did not realize that string had operator+=. I thought it was the equivalent of s = s + t; https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:100: offset += t.token().length() + 1; On 2017/07/06 06:01:21, Peter Kasting wrote: > Wait, is this right? I think t.token() is pointing at the delimiter here, so > this always adds 2. I think you wanted trivial_subdomain_length + 1 here, > didn't you? But in that case I don't know how your adjustment unittest passes. > > It feels like this is symptomatic of a design problem: |offset| itself seems > like it's just trying to track where in the original string you are. But I > think you can read that off the tokenizer (via e.g. t.token_begin() - > component_text.begin(), or maybe using token_end(), depending on when you're > reading it and what you want). So rather than iterating an offset and also > iterating through the string, you can just do one. > > Or you could eliminate the tokenizer, keep the read offset, and use a simple > loop that calls find() starting from that offset. Either way would probably > work. Done. Good suggestion on re-using the tokenizer iterator rather than keeping track of "offset" separately. Much cleaner now. The unit tests worked because the adjustment verification unit tests for trivial subdomains only trimmed a single subdomain (which was correct). only subsequent trimmings would be incorrect. I beefed up the test. It now fails on the old code and passes on the new. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:653: HostComponentTransform(false /* trim_trivial_subdomains */), output, On 2017/07/06 06:01:21, Peter Kasting wrote: > (I still dislike these kinda comments but whatever) Done. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:952: {"omit trivial subdomains but leave subdomains ending in www", On 2017/07/06 06:01:21, Peter Kasting wrote: > Nit: Maybe "don't do blind substring matches for www" would be a more accurate > description? Similarly for "m" below. Done. https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:958: {"omit trivial subdomains and consume one trailing delimiter only", On 2017/07/06 06:01:21, Peter Kasting wrote: > Is it important that we test this? Such URLs aren't really valid, are they? If > not I feel like I'd rather not have a test mandate any particular behavior. It's really to verify that the code doesn't crash. I updated the description.
The CQ bit was checked by tommycli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
LGTM! https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter.cc:100: offset += t.token().length() + 1; On 2017/07/06 16:25:46, tommycli wrote: > On 2017/07/06 06:01:21, Peter Kasting wrote: > > Wait, is this right? I think t.token() is pointing at the delimiter here, so > > this always adds 2. I think you wanted trivial_subdomain_length + 1 here, > > didn't you? But in that case I don't know how your adjustment unittest > passes. > > > > It feels like this is symptomatic of a design problem: |offset| itself seems > > like it's just trying to track where in the original string you are. But I > > think you can read that off the tokenizer (via e.g. t.token_begin() - > > component_text.begin(), or maybe using token_end(), depending on when you're > > reading it and what you want). So rather than iterating an offset and also > > iterating through the string, you can just do one. > > > > Or you could eliminate the tokenizer, keep the read offset, and use a simple > > loop that calls find() starting from that offset. Either way would probably > > work. > > Done. Good suggestion on re-using the tokenizer iterator rather than keeping > track of "offset" separately. Much cleaner now. > > The unit tests worked because the adjustment verification unit tests for trivial > subdomains only trimmed a single subdomain (which was correct). only subsequent > trimmings would be incorrect. > > I beefed up the test. It now fails on the old code and passes on the new. Yeah, the new version feels a lot cleaner to me now. High five! https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2966233002/diff/40001/components/url_formatte... components/url_formatter/url_formatter_unittest.cc:958: {"omit trivial subdomains and consume one trailing delimiter only", On 2017/07/06 16:25:46, tommycli wrote: > On 2017/07/06 06:01:21, Peter Kasting wrote: > > Is it important that we test this? Such URLs aren't really valid, are they? > If > > not I feel like I'd rather not have a test mandate any particular behavior. > > It's really to verify that the code doesn't crash. I updated the description. Oh, did it crash in some version of your code? That seems like an OK reason. https://codereview.chromium.org/2966233002/diff/100001/components/url_formatt... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2966233002/diff/100001/components/url_formatt... components/url_formatter/url_formatter.cc:102: transformed_subdomain + domain_and_registry; Nit: Or optionally just inline this into the statement below
thanks! https://codereview.chromium.org/2966233002/diff/100001/components/url_formatt... File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2966233002/diff/100001/components/url_formatt... components/url_formatter/url_formatter.cc:102: transformed_subdomain + domain_and_registry; On 2017/07/06 16:31:47, Peter Kasting wrote: > Nit: Or optionally just inline this into the statement below Done.
The CQ bit was checked by tommycli@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/2966233002/#ps120001 (title: "address one more comment")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 120001, "attempt_start_ts": 1499361904530210,
"parent_rev": "511dbccd99e3c586c18de63e640ed4bdff51d6df", "commit_rev":
"92807e9faaff9f742a6f45c946f61931882408c7"}
Message was sent while issue was closed.
Description was changed from ========== Omnibox UI Experiments: Strip trivial subdomains This revised patch strips the trivial subdomains while preserving the registry and domain. It also correctly handles the wwww.foo.com (4 w's), which the previous version did not. Previous version: https://codereview.chromium.org/2939423003/ BUG=732582 ========== to ========== Omnibox UI Experiments: Strip trivial subdomains This revised patch strips the trivial subdomains while preserving the registry and domain. It also correctly handles the wwww.foo.com (4 w's), which the previous version did not. Previous version: https://codereview.chromium.org/2939423003/ BUG=732582 Review-Url: https://codereview.chromium.org/2966233002 Cr-Commit-Position: refs/heads/master@{#484695} Committed: https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6... ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001) as https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6...
Message was sent while issue was closed.
On 2017/07/06 18:55:28, commit-bot: I haz the power wrote: > Committed patchset #7 (id:120001) as > https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6... I might be blind, but how does this code handle the "wwww" case? This line in the test seems to imply it does not: 953 "http://wwww.google.com/", kFormatUrlExperimentalOmitTrivialSubdomains, 954 net::UnescapeRule::NORMAL, L"http://wwww.google.com/", 7}, --mark
Message was sent while issue was closed.
Description was changed from ========== Omnibox UI Experiments: Strip trivial subdomains This revised patch strips the trivial subdomains while preserving the registry and domain. It also correctly handles the wwww.foo.com (4 w's), which the previous version did not. Previous version: https://codereview.chromium.org/2939423003/ BUG=732582 Review-Url: https://codereview.chromium.org/2966233002 Cr-Commit-Position: refs/heads/master@{#484695} Committed: https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6... ========== to ========== Omnibox UI Experiments: Strip trivial subdomains This revised patch strips the trivial subdomains while preserving the registry and domain. It also correctly handles the wwww.foo.com (4 w's), which the previous version did not. Previous version: https://codereview.chromium.org/2939423003/ BUG=732582 Review-Url: https://codereview.chromium.org/2966233002 Cr-Commit-Position: refs/heads/master@{#484695} Committed: https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6... ==========
Message was sent while issue was closed.
On 2017/07/06 19:05:16, Mark P wrote: > On 2017/07/06 18:55:28, commit-bot: I haz the power wrote: > > Committed patchset #7 (id:120001) as > > > https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6... > > I might be blind, but how does this code handle the "wwww" case? This line in > the test seems to imply it does not: > 953 "http://wwww.google.com/", > kFormatUrlExperimentalOmitTrivialSubdomains, > 954 net::UnescapeRule::NORMAL, L"http://wwww.google.com/", 7}, "Handle" in this case means "don't touch", as opposed to the last patch that erroneously converted "wwww.foo" to "w.foo" (or was it "wfoo"?). This code successfully leaves "wwww." alone :)
Message was sent while issue was closed.
On 2017/07/06 23:51:10, Peter Kasting wrote: > On 2017/07/06 19:05:16, Mark P wrote: > > On 2017/07/06 18:55:28, commit-bot: I haz the power wrote: > > > Committed patchset #7 (id:120001) as > > > > > > https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6... > > > > I might be blind, but how does this code handle the "wwww" case? This line in > > the test seems to imply it does not: > > 953 "http://wwww.google.com/", > > kFormatUrlExperimentalOmitTrivialSubdomains, > > 954 net::UnescapeRule::NORMAL, L"http://wwww.google.com/", 7}, > > "Handle" in this case means "don't touch", as opposed to the last patch that > erroneously converted "wwww.foo" to "w.foo" (or was it "wfoo"?). > > This code successfully leaves "wwww." alone :) It used to change it to wfoo.com, which was really bad. ;)
Message was sent while issue was closed.
On 2017/07/06 23:59:48, tommycli wrote: > On 2017/07/06 23:51:10, Peter Kasting wrote: > > On 2017/07/06 19:05:16, Mark P wrote: > > > On 2017/07/06 18:55:28, commit-bot: I haz the power wrote: > > > > Committed patchset #7 (id:120001) as > > > > > > > > > > https://chromium.googlesource.com/chromium/src/+/92807e9faaff9f742a6f45c946f6... > > > > > > I might be blind, but how does this code handle the "wwww" case? This line > in > > > the test seems to imply it does not: > > > 953 "http://wwww.google.com/", > > > kFormatUrlExperimentalOmitTrivialSubdomains, > > > 954 net::UnescapeRule::NORMAL, L"http://wwww.google.com/", 7}, > > > > "Handle" in this case means "don't touch", as opposed to the last patch that > > erroneously converted "wwww.foo" to "w.foo" (or was it "wfoo"?). > > > > This code successfully leaves "wwww." alone :) > > It used to change it to http://wfoo.com, which was really bad. ;) Ah, thanks for the clarification. --mark |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
