|
|
Created:
4 years, 1 month ago by Maria Modified:
3 years, 11 months ago Reviewers:
Peter Kasting CC:
chromium-reviews, Ted C Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionAdd verification that google URL has a valid TLD.
There are a number of privacy-sensitive checks that we make using this
function. In order to ensure that the URLs really belong to Google, add
a check that TLD has been registered by Google.
BUG=665624
Committed: https://crrev.com/a27191167e1a0c71c19b3581894384ae07427806
Cr-Commit-Position: refs/heads/master@{#441421}
Patch Set 1 #
Total comments: 7
Patch Set 2 : Refactor to not compute tld length twice. #Patch Set 3 : Change implementation to set. #
Total comments: 9
Patch Set 4 : Address comments #Patch Set 5 : Use CR_DEFINE_STATIC_LOCAL. #Patch Set 6 : Move the list of Google TLDs into a separate header file. #
Total comments: 2
Patch Set 7 : Remove a blank line #
Messages
Total messages: 23 (7 generated)
mariakhomenko@chromium.org changed reviewers: + pkasting@chromium.org
Is there a public bug rather than an internal one for this? Chromium doesn't generally use buganizer (and I can't even access it easily from here at home). Bugs can be marked Google-only if need be. On http://crbug.com/120657 , various folks have discussed this particular solution, and it didn't seem like we had reached a conclusion to add or forbid addition of this list. (So my bias is clear: I'm personally negative.) It _did_ seem like we were generally negative on features and code that actually used this function for security- or privacy-critical things, which meant that we generally wanted to remove (or not newly add) features justifying this change where possible. I'm not sure how much of this history you're already aware of. https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... components/google/core/browser/google_util.cc:42: static const char* const g_google_tld_list[] = {"ac", "ad", "ae", "af", "ag", Nit: constexpr If this remains used by only one function, prefer to declare this in the function that actually uses it, so its intent is maximally clear. https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... components/google/core/browser/google_util.cc:136: net::registry_controlled_domains::GetCanonicalHostRegistryLength( This implementation computes the registry length (which is not cheap) both in IsValidHostName() and here. I'd like to see us find a way to only compute this once. It seems like besides here, the only other caller of IsValidHostName() is IsYoutubeDomainUrl() (whose very existence makes my skin crawl; if at all possible can we kill this?), which probably wants a similar TLD check. https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... components/google/core/browser/google_util.cc:149: return false; This linear search is less-efficient than it could be even using your array datatype, because your TLD list is alphabetically sorted. But even more efficient would be to use a set or trie.
On 2016/11/15 22:11:57, Peter Kasting wrote: > Is there a public bug rather than an internal one for this? Chromium doesn't > generally use buganizer (and I can't even access it easily from here at home). > Bugs can be marked Google-only if need be. > > On http://crbug.com/120657 , various folks have discussed this particular > solution, and it didn't seem like we had reached a conclusion to add or forbid > addition of this list. (So my bias is clear: I'm personally negative.) It > _did_ seem like we were generally negative on features and code that actually > used this function for security- or privacy-critical things, which meant that we > generally wanted to remove (or not newly add) features justifying this change > where possible. > > I'm not sure how much of this history you're already aware of. Thanks for bringing up the bug -- I haven't seen it before and that's some good background for me. I filed a separate bug with an excerpt from buganizer here: https://bugs.chromium.org/p/chromium/issues/detail?id=665624 just to separate the discussions to specifics at hand. Instant Apps are the reasons this came up as we have agreed to some Google-Search specific behaviour and we need to guarantee that a particular action really came out of SERP. Also it seems like X-Geo headers are pretty privacy sensitive. What I'd like to understand is who are the core people required (in addition to you, who is obviously the OWNER for this) to make go/no-go decision on this, so that we can loop in the right set of people. Perhaps an in-person meeting will let us resolve this faster. > > https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... > File components/google/core/browser/google_util.cc (right): > > https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... > components/google/core/browser/google_util.cc:42: static const char* const > g_google_tld_list[] = {"ac", "ad", "ae", "af", "ag", > Nit: constexpr > > If this remains used by only one function, prefer to declare this in the > function that actually uses it, so its intent is maximally clear. > > https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... > components/google/core/browser/google_util.cc:136: > net::registry_controlled_domains::GetCanonicalHostRegistryLength( > This implementation computes the registry length (which is not cheap) both in > IsValidHostName() and here. I'd like to see us find a way to only compute this > once. > > It seems like besides here, the only other caller of IsValidHostName() is > IsYoutubeDomainUrl() (whose very existence makes my skin crawl; if at all > possible can we kill this?), which probably wants a similar TLD check. > > https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... > components/google/core/browser/google_util.cc:149: return false; > This linear search is less-efficient than it could be even using your array > datatype, because your TLD list is alphabetically sorted. > > But even more efficient would be to use a set or trie.
Description was changed from ========== Add verification that google URL has a valid TLD. There are a number of privacy-sensitive checks that we make using this function. In order to ensure that the URLs really belong to Google, add a check that TLD has been registered by Google. BUG=b/31745847 ========== to ========== Add verification that google URL has a valid TLD. There are a number of privacy-sensitive checks that we make using this function. In order to ensure that the URLs really belong to Google, add a check that TLD has been registered by Google. BUG=665624 ==========
Made changes / asking a few questions for technical comments. https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... components/google/core/browser/google_util.cc:42: static const char* const g_google_tld_list[] = {"ac", "ad", "ae", "af", "ag", On 2016/11/15 22:11:56, Peter Kasting wrote: > Nit: constexpr > > If this remains used by only one function, prefer to declare this in the > function that actually uses it, so its intent is maximally clear. Switched to constexpr. https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... components/google/core/browser/google_util.cc:136: net::registry_controlled_domains::GetCanonicalHostRegistryLength( On 2016/11/15 22:11:56, Peter Kasting wrote: > This implementation computes the registry length (which is not cheap) both in > IsValidHostName() and here. I'd like to see us find a way to only compute this > once. > > It seems like besides here, the only other caller of IsValidHostName() is > IsYoutubeDomainUrl() (whose very existence makes my skin crawl; if at all > possible can we kill this?), which probably wants a similar TLD check. Done -- switched to return tld from IsValidHostName. I don't think we can get rid of IsYoutubeDomainUrl easily. I looked at the use and it seems there are two: 1) safe browsing, which is also used for policy restrictions -- looks like it's used to block youtube in some domain policies? 2) UMA variations headers are sent there -- I don't know how that's used, but if we need that for experiment tracking, I don't see a way around that If we wanted to check TLDs on youtube domain, we could do that as well -- but that would be a separate list. https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... components/google/core/browser/google_util.cc:149: return false; On 2016/11/15 22:11:57, Peter Kasting wrote: > This linear search is less-efficient than it could be even using your array > datatype, because your TLD list is alphabetically sorted. > > But even more efficient would be to use a set or trie. Agree. I was thinking about that. The reason I went with file-global const array is because it's created once and reused. Whereas, if we used a data-structure, then we need to either generate it every time the function is called or somehow manage its memory (intentionally leak it?). Do you have thoughts on the right approach here?
On 2016/11/15 22:57:08, Maria wrote: > What I'd like to understand is who are the core people required (in addition to > you, who is obviously the OWNER for this) to make go/no-go decision on this, so > that we can loop in the right set of people. Perhaps an in-person meeting will > let us resolve this faster. I would like to know that too. I CCed a couple people on the bug you filed, and I asked jschuh for help figuring this out. https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/1/components/google/core/brow... components/google/core/browser/google_util.cc:149: return false; On 2016/11/15 23:36:31, Maria wrote: > On 2016/11/15 22:11:57, Peter Kasting wrote: > > This linear search is less-efficient than it could be even using your array > > datatype, because your TLD list is alphabetically sorted. > > > > But even more efficient would be to use a set or trie. > > Agree. I was thinking about that. The reason I went with file-global const array > is because it's created once and reused. Whereas, if we used a data-structure, > then we need to either generate it every time the function is called or somehow > manage its memory (intentionally leak it?). Do you have thoughts on the right > approach here? CR_DEFINE_STATIC_LOCAL within the function is the way to do this if all access is single-threaded. If you need something threadsafe, use a base::LazyInstance. I don't know offhand what the threading requirements here are.
Hey, Peter, I updated this to change implementation to set, let me know if the implementation looks okay. I know we've talked about extracting this function elsewhere, I will send out mail with our options on that, but wanted to proceed in parallel with code review to see if the new implementation looks ok.
Implementation LGTM with a couple comments. In particular, using a set seems fine. I don't see a reason to go for something like a trie here. https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:55: Nit: Don't add a blank line here. https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:107: return valid; Why check "|| tld.empty()" here? It seems like this can never happen if |valid| is true, since IsValidHostName() checks for 0/npos for the tld_length and returns false. It's also confusing, since it somehow the TLD could be empty on a valid hostname, we'd return true here, which would be wrong. So it seems better to rewrite as just: if (!IsValidHostName(...)) return false; We don't even need to DCHECK(!tld.empty()), since an empty TLD would not cause correctness problems (it'd just do an unnecessary set check). https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:109: static std::set<std::string> google_tlds = {"ac", "ad", "ae", "af", "ag", Use CR_DEFINE_STATIC_LOCAL for this. https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:146: return google_tlds.find(tld.as_string()) != google_tlds.end(); Nit: Reads more clearly: return base::ContainsKey(google_tlds, tld.as_string());
https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:55: On 2016/12/15 01:21:59, Peter Kasting wrote: > Nit: Don't add a blank line here. Done. https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:107: return valid; On 2016/12/15 01:21:59, Peter Kasting wrote: > Why check "|| tld.empty()" here? > > It seems like this can never happen if |valid| is true, since IsValidHostName() > checks for 0/npos for the tld_length and returns false. > > It's also confusing, since it somehow the TLD could be empty on a valid > hostname, we'd return true here, which would be wrong. > > So it seems better to rewrite as just: > > if (!IsValidHostName(...)) > return false; > > We don't even need to DCHECK(!tld.empty()), since an empty TLD would not cause > correctness problems (it'd just do an unnecessary set check). Done. https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:109: static std::set<std::string> google_tlds = {"ac", "ad", "ae", "af", "ag", On 2016/12/15 01:21:59, Peter Kasting wrote: > Use CR_DEFINE_STATIC_LOCAL for this. I don't think I can because CR_DEFINE_STATIC_LOCAL macro is defined as "static type& name = *new type arguments", which wouldn't work with initializer lists that I'm using here. https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:146: return google_tlds.find(tld.as_string()) != google_tlds.end(); On 2016/12/15 01:21:59, Peter Kasting wrote: > Nit: Reads more clearly: > > return base::ContainsKey(google_tlds, tld.as_string()); Done.
I realized I didn't think much about whether a different kind of set (unordered_set, or the in-progress flat_set) might be better than set here. But I don't think the difference matters enough to worry about. https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:109: static std::set<std::string> google_tlds = {"ac", "ad", "ae", "af", "ag", On 2016/12/15 17:56:33, Maria wrote: > On 2016/12/15 01:21:59, Peter Kasting wrote: > > Use CR_DEFINE_STATIC_LOCAL for this. > > I don't think I can because CR_DEFINE_STATIC_LOCAL macro is defined as > "static type& name = *new type arguments", which wouldn't work with initializer > lists that I'm using here. I think this should work: CR_DEFINE_STATIC_LOCAL(std::set<std::string>, google_tlds, ({...})); I've seen this elsewhere, e.g. https://cs.chromium.org/chromium/src/chrome/browser/global_keyboard_shortcuts... .
On 2016/12/15 19:00:25, Peter Kasting wrote: > I realized I didn't think much about whether a different kind of set > (unordered_set, or the in-progress flat_set) might be better than set here. But > I don't think the difference matters enough to worry about. > > https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... > File components/google/core/browser/google_util.cc (right): > > https://codereview.chromium.org/2498113003/diff/40001/components/google/core/... > components/google/core/browser/google_util.cc:109: static std::set<std::string> > google_tlds = {"ac", "ad", "ae", "af", "ag", > On 2016/12/15 17:56:33, Maria wrote: > > On 2016/12/15 01:21:59, Peter Kasting wrote: > > > Use CR_DEFINE_STATIC_LOCAL for this. > > > > I don't think I can because CR_DEFINE_STATIC_LOCAL macro is defined as > > "static type& name = *new type arguments", which wouldn't work with > initializer > > lists that I'm using here. > > I think this should work: > > CR_DEFINE_STATIC_LOCAL(std::set<std::string>, google_tlds, ({...})); > > I've seen this elsewhere, e.g. > https://cs.chromium.org/chromium/src/chrome/browser/global_keyboard_shortcuts... > . Done. Checked that it works with unit tests, but I am surprised!
Made the change you suggested in the design doc, PTAL
Yup, LGTM https://codereview.chromium.org/2498113003/diff/100001/components/google/core... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/100001/components/google/core... components/google/core/browser/google_util.cc:109: Nit: Blank line here now probably unnecessary? I dunno
https://codereview.chromium.org/2498113003/diff/100001/components/google/core... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2498113003/diff/100001/components/google/core... components/google/core/browser/google_util.cc:109: On 2016/12/16 01:35:58, Peter Kasting wrote: > Nit: Blank line here now probably unnecessary? I dunno Done.
The CQ bit was checked by mariakhomenko@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/2498113003/#ps120001 (title: "Remove a blank line")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 120001, "attempt_start_ts": 1483552735408660, "parent_rev": "9b21c11aaa6ad0e4d5bc0cd6e750b67e86ad0fbc", "commit_rev": "c3265b16645fcde45beee276e662249c4a393d8a"}
Message was sent while issue was closed.
Description was changed from ========== Add verification that google URL has a valid TLD. There are a number of privacy-sensitive checks that we make using this function. In order to ensure that the URLs really belong to Google, add a check that TLD has been registered by Google. BUG=665624 ========== to ========== Add verification that google URL has a valid TLD. There are a number of privacy-sensitive checks that we make using this function. In order to ensure that the URLs really belong to Google, add a check that TLD has been registered by Google. BUG=665624 Review-Url: https://codereview.chromium.org/2498113003 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Description was changed from ========== Add verification that google URL has a valid TLD. There are a number of privacy-sensitive checks that we make using this function. In order to ensure that the URLs really belong to Google, add a check that TLD has been registered by Google. BUG=665624 Review-Url: https://codereview.chromium.org/2498113003 ========== to ========== Add verification that google URL has a valid TLD. There are a number of privacy-sensitive checks that we make using this function. In order to ensure that the URLs really belong to Google, add a check that TLD has been registered by Google. BUG=665624 Committed: https://crrev.com/a27191167e1a0c71c19b3581894384ae07427806 Cr-Commit-Position: refs/heads/master@{#441421} ==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/a27191167e1a0c71c19b3581894384ae07427806 Cr-Commit-Position: refs/heads/master@{#441421} |