Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(40)

Issue 2649033004: [3 of 4] Speedup GetRegistryLengthImpl() by seeding the DAFSA in reverse

Created:
3 years, 11 months ago by ncarter (slow)
Modified:
3 years, 9 months ago
Reviewers:
Ryan Sleevi
CC:
chromium-reviews, cbentzel+watch_chromium.org
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Speedup GetRegistryLengthImpl() by seeding the DAFSA in reverse. Add an option to make_dafsa.py to reverse the input words during encoding. Adapt the implementation of GetRegistryLengthImpl to the reverse-seeded DAFSA. The new implementation is more efficient, because it needs to query the DAFSA only once. Improve the documentation of GetRegistryLengthImpl, with links to relevent steps of the pseudocode algorithm specified at publicsuffix.org. Document the two places where we depart from that spec (one is INCLUDE_UNKNOWN_DOMAINS being optional, the other is the platform.sh problem). Add a TODO regarding empty-label matches. Dafsa size after this change: 40861 bytes RegistryControlledDomainTest.GetDomainAndRegistry 55ms RegistryControlledDomainTest.SameDomainOrHost_Different 95ms RegistryControlledDomainTest.SameDomainOrHost_Same 100ms BUG=678334

Patch Set 1 #

Patch Set 2 : rebase #

Total comments: 6

Patch Set 3 : python cleanup #

Patch Set 4 : More comment changes. #

Patch Set 5 : Language. #

Total comments: 6
Unified diffs Side-by-side diffs Delta from patch set Stats (+142 lines, -87 lines) Patch
M net/base/lookup_string_in_fixed_set_unittest.cc View 1 2 4 chunks +5 lines, -5 lines 2 comments Download
M net/base/registry_controlled_domains/BUILD.gn View 1 chunk +1 line, -0 lines 0 comments Download
M net/base/registry_controlled_domains/registry_controlled_domain.cc View 1 2 3 4 4 chunks +113 lines, -75 lines 4 comments Download
M net/tools/dafsa/make_dafsa.py View 1 2 2 chunks +20 lines, -6 lines 0 comments Download
M net/tools/dafsa/make_dafsa_unittest.py View 2 chunks +3 lines, -1 line 0 comments Download

Depends on Patchset:

Messages

Total messages: 15 (10 generated)
Ryan Sleevi
https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc#newcode105 net/base/registry_controlled_domains/registry_controlled_domain.cc:105: // as part of a longer rule only? "If ...
3 years, 11 months ago (2017-01-25 22:13:25 UTC) #5
ncarter (slow)
https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc#newcode105 net/base/registry_controlled_domains/registry_controlled_domain.cc:105: // as part of a longer rule only? On ...
3 years, 11 months ago (2017-01-26 00:33:02 UTC) #6
Ryan Sleevi
https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc#newcode105 net/base/registry_controlled_domains/registry_controlled_domain.cc:105: // as part of a longer rule only? On ...
3 years, 11 months ago (2017-01-26 00:41:45 UTC) #7
ncarter (slow)
PTAL https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2649033004/diff/20001/net/base/registry_controlled_domains/registry_controlled_domain.cc#newcode120 net/base/registry_controlled_domains/registry_controlled_domain.cc:120: // there's a subdomain. Are there unittests of ...
3 years, 9 months ago (2017-02-27 23:22:05 UTC) #14
Ryan Sleevi
3 years, 9 months ago (2017-03-06 17:20:04 UTC) #15
Mostly LG... I think the one thing we may want to add is something I should have
done sooner, which is the tests from publicsuffix.org -
https://github.com/publicsuffix/list/blob/master/tests/tests.txt and/or
https://github.com/publicsuffix/list/blob/master/tests/test_psl.txt

We should pass all of those, with the INCLUDE_UNKNOWN_REGISTRIES flag passed.

https://codereview.chromium.org/2649033004/diff/80001/net/base/lookup_string_...
File net/base/lookup_string_in_fixed_set_unittest.cc (right):

https://codereview.chromium.org/2649033004/diff/80001/net/base/lookup_string_...
net/base/lookup_string_in_fixed_set_unittest.cc:53: std::string
rev_key(key.rbegin(), key.rend());
Is it worth documenting why this is reversed?

https://codereview.chromium.org/2649033004/diff/80001/net/base/lookup_string_...
net/base/lookup_string_in_fixed_set_unittest.cc:76: std::string
line(sequence->rbegin(), sequence->rend());
Similarly

https://codereview.chromium.org/2649033004/diff/80001/net/base/registry_contr...
File net/base/registry_controlled_domains/registry_controlled_domain.cc (right):

https://codereview.chromium.org/2649033004/diff/80001/net/base/registry_contr...
net/base/registry_controlled_domains/registry_controlled_domain.cc:104: size_t
in_wildcard = (unknown_filter == INCLUDE_UNKNOWN_REGISTRIES);
Based on your usage (155/162/172), this should be bool, right?

https://codereview.chromium.org/2649033004/diff/80001/net/base/registry_contr...
net/base/registry_controlled_domains/registry_controlled_domain.cc:125: if
(is_last_char_in_label) {
Does it make more sense to do

if (!is_last_char_in_label)
  continue;

?

This should allow removing a level of indentation for the remaining code.

https://codereview.chromium.org/2649033004/diff/80001/net/base/registry_contr...
net/base/registry_controlled_domains/registry_controlled_domain.cc:141: // this
case would have to be '*', which is explicitly disallowed.
Maybe it's because I haven't had my morning coffee, but I'm trying to determine
the source of the "which is explicitly disallowed" case? Is it because the
implicit '*'?

(Not saying I disagree, just having trouble parsing the why)

https://codereview.chromium.org/2649033004/diff/80001/net/base/registry_contr...
net/base/registry_controlled_domains/registry_controlled_domain.cc:171: //
"a..b.c". This is unnecessarily permissive.
Can you clarify the 'currently' - do you mean in this code or the previous code?

Powered by Google App Engine
This is Rietveld 408576698