|
|
Created:
5 years, 6 months ago by Mark P Modified:
5 years, 6 months ago Reviewers:
Peter Kasting CC:
chromium-reviews Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionOmnibox - Mark As Duplicates URLs that only differ by a trailing slash
Make the stripped URL always end the path component in a trailing slash.
BUG=490084
Committed: https://crrev.com/86d6ea9832f60721adc36a3372b51016947dfcb2
Cr-Commit-Position: refs/heads/master@{#334238}
Patch Set 1 #Patch Set 2 : finally works #
Total comments: 7
Patch Set 3 : peter's minor comments #Patch Set 4 : mac has no std::string.back() #Patch Set 5 : peter's optimizations #
Total comments: 10
Patch Set 6 : minor improvements #
Messages
Total messages: 22 (6 generated)
mpearson@chromium.org changed reviewers: + pkasting@chromium.org
Hi Peter, Please take a look. thanks, mark
LGTM https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... File components/omnibox/autocomplete_match.cc (right): https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:418: // Add a trailing slash if the path does not have one. Nit: I might add "It's OK if this adds a slash even where one would be inappropriate if we were really going to navigate to this URL, e.g. "foo.com/index.html/"." https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... File components/omnibox/autocomplete_match.h (right): https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... components/omnibox/autocomplete_match.h:310: // The destination URL somewhat normalized for better dupe finding. Nit: Add comma before "somewhat"
https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... File components/omnibox/autocomplete_match.cc (right): https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:418: // Add a trailing slash if the path does not have one. On 2015/06/11 23:05:37, Peter Kasting wrote: > Nit: I might add "It's OK if this adds a slash even where one would be > inappropriate if we were really going to navigate to this URL, e.g. > "foo.com/index.html/"." Done. (I thought it was clear from preexisting comments that we never use the stripped_destination_url as a destination as it may often be non-navigable. Nonetheless, added this comment here because it cannot hurt.) https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... File components/omnibox/autocomplete_match.h (right): https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... components/omnibox/autocomplete_match.h:310: // The destination URL somewhat normalized for better dupe finding. On 2015/06/11 23:05:37, Peter Kasting wrote: > Nit: Add comma before "somewhat" Done.
The CQ bit was checked by mpearson@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/1169173005/#ps40001 (title: "peter's minor comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1169173005/40001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: mac_chromium_compile_dbg_ng on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... File components/omnibox/autocomplete_match.cc (right): https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:418: // Add a trailing slash if the path does not have one. On 2015/06/12 17:25:57, Mark P wrote: > On 2015/06/11 23:05:37, Peter Kasting wrote: > > Nit: I might add "It's OK if this adds a slash even where one would be > > inappropriate if we were really going to navigate to this URL, e.g. > > "foo.com/index.html/"." > > Done. (I thought it was clear from preexisting comments that we never use the > stripped_destination_url as a destination as it may often be non-navigable. > Nonetheless, added this comment here because it cannot hurt.) It's clear we're not going to navigate there -- but the other changes at least look like plausible sorts of transforms to the URL, whereas appending a slash sometimes makes no sense at all (except in terms of dupe elimination). For this reason I'd actually prefer to always strip the slash than always add one. That's a more plausible-looking transform and I don't think would need an extra comment. The only reason I didn't suggest this is that you can't strip the slash if it's the only thing in the path, and I was worried that the inconsistency there might be confusing. But if you think you could feasibly remove slashes (except for lone ones) as easily as add them, I'd go with that. It has the added bonus that you won't have to allocate space for a new string and copy the existing path, too (yay efficiency).
https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... File components/omnibox/autocomplete_match.cc (right): https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:418: // Add a trailing slash if the path does not have one. On 2015/06/12 17:49:04, Peter Kasting wrote: > On 2015/06/12 17:25:57, Mark P wrote: > > On 2015/06/11 23:05:37, Peter Kasting wrote: > > > Nit: I might add "It's OK if this adds a slash even where one would be > > > inappropriate if we were really going to navigate to this URL, e.g. > > > "foo.com/index.html/"." > > > > Done. (I thought it was clear from preexisting comments that we never use the > > stripped_destination_url as a destination as it may often be non-navigable. > > Nonetheless, added this comment here because it cannot hurt.) > > It's clear we're not going to navigate there -- but the other changes at least > look like plausible sorts of transforms to the URL, whereas appending a slash > sometimes makes no sense at all (except in terms of dupe elimination). > > For this reason I'd actually prefer to always strip the slash than always add > one. That's a more plausible-looking transform and I don't think would need an > extra comment. The only reason I didn't suggest this is that you can't strip > the slash if it's the only thing in the path, and I was worried that the > inconsistency there might be confusing. > > But if you think you could feasibly remove slashes (except for lone ones) as > easily as add them, I'd go with that. It has the added bonus that you won't > have to allocate space for a new string and copy the existing path, too (yay > efficiency). I think the problem with lone slashes cannot be easily fixed. I too wanted to strip the slash rather than add one, but thought of the lone slash issue. I could fix that issue by not stripping lone slashes, but I think the "path non-empty" restriction only applies to certain types of URLs (http and the like), not all kinds. Indeed, that's why I asked on the bug whether you wanted to do this for all protocols. Given that you said yes, I thought adding the "/" was the safer, better thing to do.
https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... File components/omnibox/autocomplete_match.cc (right): https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:418: // Add a trailing slash if the path does not have one. On 2015/06/12 18:04:59, Mark P wrote: > On 2015/06/12 17:49:04, Peter Kasting wrote: > > On 2015/06/12 17:25:57, Mark P wrote: > > > On 2015/06/11 23:05:37, Peter Kasting wrote: > > > > Nit: I might add "It's OK if this adds a slash even where one would be > > > > inappropriate if we were really going to navigate to this URL, e.g. > > > > "foo.com/index.html/"." > > > > > > Done. (I thought it was clear from preexisting comments that we never use > the > > > stripped_destination_url as a destination as it may often be non-navigable. > > > Nonetheless, added this comment here because it cannot hurt.) > > > > It's clear we're not going to navigate there -- but the other changes at least > > look like plausible sorts of transforms to the URL, whereas appending a slash > > sometimes makes no sense at all (except in terms of dupe elimination). > > > > For this reason I'd actually prefer to always strip the slash than always add > > one. That's a more plausible-looking transform and I don't think would need > an > > extra comment. The only reason I didn't suggest this is that you can't strip > > the slash if it's the only thing in the path, and I was worried that the > > inconsistency there might be confusing. > > > > But if you think you could feasibly remove slashes (except for lone ones) as > > easily as add them, I'd go with that. It has the added bonus that you won't > > have to allocate space for a new string and copy the existing path, too (yay > > efficiency). > > I think the problem with lone slashes cannot be easily fixed. I too wanted to > strip > the slash rather than add one, but thought of the lone slash issue. I could fix > that issue by not stripping lone slashes, but I think the "path non-empty" > restriction only applies to certain types of URLs (http and the like), not all > kinds. > Indeed, that's why I asked on the bug whether you wanted to do this for all > protocols. Given that you said yes, I thought adding the "/" was the safer, > better > thing to do. Well, just because another protocol may allow an empty path doesn't mean that having a slash in that case would be unsafe or break anything. The logic could be something like this: if (!path.empty() && path.back() == '/') strip slash; if (path.empty()) path = '/'; This ensures no paths have trailing slashes except the empty path, and also ensures that "" and "/" both get transformed to "/". (You could easily write a single-conditional form of the logic that guaranteed the first condition, but not the second.) I think that would be fine for all protocols.
On 2015/06/12 18:25:37, Peter Kasting wrote: > https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... > File components/omnibox/autocomplete_match.cc (right): > > https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... > components/omnibox/autocomplete_match.cc:418: // Add a trailing slash if the > path does not have one. > On 2015/06/12 18:04:59, Mark P wrote: > > On 2015/06/12 17:49:04, Peter Kasting wrote: > > > On 2015/06/12 17:25:57, Mark P wrote: > > > > On 2015/06/11 23:05:37, Peter Kasting wrote: > > > > > Nit: I might add "It's OK if this adds a slash even where one would be > > > > > inappropriate if we were really going to navigate to this URL, e.g. > > > > > "foo.com/index.html/"." > > > > > > > > Done. (I thought it was clear from preexisting comments that we never use > > the > > > > stripped_destination_url as a destination as it may often be > non-navigable. > > > > Nonetheless, added this comment here because it cannot hurt.) > > > > > > It's clear we're not going to navigate there -- but the other changes at > least > > > look like plausible sorts of transforms to the URL, whereas appending a > slash > > > sometimes makes no sense at all (except in terms of dupe elimination). > > > > > > For this reason I'd actually prefer to always strip the slash than always > add > > > one. That's a more plausible-looking transform and I don't think would need > > an > > > extra comment. The only reason I didn't suggest this is that you can't > strip > > > the slash if it's the only thing in the path, and I was worried that the > > > inconsistency there might be confusing. > > > > > > But if you think you could feasibly remove slashes (except for lone ones) as > > > easily as add them, I'd go with that. It has the added bonus that you won't > > > have to allocate space for a new string and copy the existing path, too (yay > > > efficiency). > > > > I think the problem with lone slashes cannot be easily fixed. I too wanted to > > strip > > the slash rather than add one, but thought of the lone slash issue. I could > fix > > that issue by not stripping lone slashes, but I think the "path non-empty" > > restriction only applies to certain types of URLs (http and the like), not all > > kinds. > > Indeed, that's why I asked on the bug whether you wanted to do this for all > > protocols. Given that you said yes, I thought adding the "/" was the safer, > > better > > thing to do. > > Well, just because another protocol may allow an empty path doesn't mean that > having a slash in that case would be unsafe or break anything. > > The logic could be something like this: > > if (!path.empty() && path.back() == '/') > strip slash; > if (path.empty()) > path = '/'; > > This ensures no paths have trailing slashes except the empty path, and also > ensures that "" and "/" both get transformed to "/". (You could easily write a > single-conditional form of the logic that guaranteed the first condition, but > not the second.) > > I think that would be fine for all protocols. Ah, I agree this works. However, I think it's more complicated than the simple always-add solution. I don't see the appeal.
On 2015/06/12 18:41:07, Mark P wrote: > On 2015/06/12 18:25:37, Peter Kasting wrote: > > > https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... > > File components/omnibox/autocomplete_match.cc (right): > > > > > https://codereview.chromium.org/1169173005/diff/20001/components/omnibox/auto... > > components/omnibox/autocomplete_match.cc:418: // Add a trailing slash if the > > path does not have one. > > On 2015/06/12 18:04:59, Mark P wrote: > > > On 2015/06/12 17:49:04, Peter Kasting wrote: > > > > On 2015/06/12 17:25:57, Mark P wrote: > > > > > On 2015/06/11 23:05:37, Peter Kasting wrote: > > > > > > Nit: I might add "It's OK if this adds a slash even where one would be > > > > > > inappropriate if we were really going to navigate to this URL, e.g. > > > > > > "foo.com/index.html/"." > > > > > > > > > > Done. (I thought it was clear from preexisting comments that we never > use > > > the > > > > > stripped_destination_url as a destination as it may often be > > non-navigable. > > > > > Nonetheless, added this comment here because it cannot hurt.) > > > > > > > > It's clear we're not going to navigate there -- but the other changes at > > least > > > > look like plausible sorts of transforms to the URL, whereas appending a > > slash > > > > sometimes makes no sense at all (except in terms of dupe elimination). > > > > > > > > For this reason I'd actually prefer to always strip the slash than always > > add > > > > one. That's a more plausible-looking transform and I don't think would > need > > > an > > > > extra comment. The only reason I didn't suggest this is that you can't > > strip > > > > the slash if it's the only thing in the path, and I was worried that the > > > > inconsistency there might be confusing. > > > > > > > > But if you think you could feasibly remove slashes (except for lone ones) > as > > > > easily as add them, I'd go with that. It has the added bonus that you > won't > > > > have to allocate space for a new string and copy the existing path, too > (yay > > > > efficiency). > > > > > > I think the problem with lone slashes cannot be easily fixed. I too wanted > to > > > strip > > > the slash rather than add one, but thought of the lone slash issue. I could > > fix > > > that issue by not stripping lone slashes, but I think the "path non-empty" > > > restriction only applies to certain types of URLs (http and the like), not > all > > > kinds. > > > Indeed, that's why I asked on the bug whether you wanted to do this for all > > > protocols. Given that you said yes, I thought adding the "/" was the safer, > > > better > > > thing to do. > > > > Well, just because another protocol may allow an empty path doesn't mean that > > having a slash in that case would be unsafe or break anything. > > > > The logic could be something like this: > > > > if (!path.empty() && path.back() == '/') > > strip slash; > > if (path.empty()) > > path = '/'; > > > > This ensures no paths have trailing slashes except the empty path, and also > > ensures that "" and "/" both get transformed to "/". (You could easily write > a > > single-conditional form of the logic that guaranteed the first condition, but > > not the second.) > > > > I think that would be fine for all protocols. > > Ah, I agree this works. However, I think it's more complicated than the simple > always-add solution. I don't see the appeal. I guess for me the key factor would be whether I could implement the stripping solution in a way that doesn't require me to copy the path. After you sent me that bug with slow responses when pasting long strings, I'm more aware of that sort of thing. If I could do that, I'd take that solution. If not, then it doesn't matter much.
Ah, this looks fairly good. It seems to work. Let me know what you think. --mark P.S. Apparently Mac doesn't support std::string.back() so I had to remove that call.
LGTM https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... File components/omnibox/autocomplete_match.cc (right): https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:418: // Remove the trailing slash if any (if it's not a lone slash), and Nit: "Remove any trailing slash ..."; and -> or https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:419: // add a slash (to make a lone slash) if the path is empty. Nit: Maybe add: "(We can't unconditionally remove even lone slashes because for some schemes the path must consist of at least a slash.)" https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:423: std::string slash("/"); Can we do: static const char slash[] = "/"; ? Then you could put it inside the conditional if you wanted and I think it would be more efficient to boot (maybe) https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... File components/omnibox/autocomplete_match.h (right): https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.h:178: // to http, making sure the path ends in a "/", and stripping excess query This comment isn't accurate now. Maybe just "normalizing trailing slashes"? https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... File components/omnibox/autocomplete_match_unittest.cc (right): https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match_unittest.cc:141: { "http://www.google.com/1", "http://www.google.com/1/", true }, Nit: How come there are two spaces between url1 and url2 even in this case?
https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... File components/omnibox/autocomplete_match.cc (right): https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:418: // Remove the trailing slash if any (if it's not a lone slash), and On 2015/06/12 19:30:36, Peter Kasting wrote: > Nit: "Remove any trailing slash ..."; Done. (removed "if any", replaced "the" with "any") > and -> or Done. https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:419: // add a slash (to make a lone slash) if the path is empty. On 2015/06/12 19:30:36, Peter Kasting wrote: > Nit: Maybe add: "(We can't unconditionally remove even lone slashes because for > some schemes the path must consist of at least a slash.)" Okay. Done. https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.cc:423: std::string slash("/"); On 2015/06/12 19:30:36, Peter Kasting wrote: > Can we do: > > static const char slash[] = "/"; > > ? Then you could put it inside the conditional if you wanted and I think it > would be more efficient to boot (maybe) Apparently we can. Done. https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... File components/omnibox/autocomplete_match.h (right): https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match.h:178: // to http, making sure the path ends in a "/", and stripping excess query On 2015/06/12 19:30:36, Peter Kasting wrote: > This comment isn't accurate now. Maybe just "normalizing trailing slashes"? Done. https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... File components/omnibox/autocomplete_match_unittest.cc (right): https://codereview.chromium.org/1169173005/diff/80001/components/omnibox/auto... components/omnibox/autocomplete_match_unittest.cc:141: { "http://www.google.com/1", "http://www.google.com/1/", true }, On 2015/06/12 19:30:36, Peter Kasting wrote: > Nit: How come there are two spaces between url1 and url2 even in this case? Fixed extra space (in whole vertical column).
The CQ bit was checked by mpearson@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/1169173005/#ps100001 (title: "minor improvements")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1169173005/100001
Message was sent while issue was closed.
Committed patchset #6 (id:100001)
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/86d6ea9832f60721adc36a3372b51016947dfcb2 Cr-Commit-Position: refs/heads/master@{#334238} |