Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(349)

Issue 2816693002: [IndexedRuleset] Improve worst-case domain list matching. (Closed)

Created:
3 years, 8 months ago by pkalinnikov
Modified:
3 years, 8 months ago
CC:
chromium-reviews, subresource-filter-reviews_chromium.org
Target Ref:
refs/heads/master
Project:
chromium
Visibility:
Public.

Description

[IndexedRuleset] Improve worst-case domain list matching. Some URL rules have long domain lists (up to 100-200 items). The previous solution simply scanned the list linearly whenever the rule was being matched against a network request, and checked each domain to be a subdomain of the document's origin. The CL replaces this algorithm. Now the origin which is matched against the list is scanned. Each of its subdomains (in descending order of length) gets binary-searched in the sorted list of domains, until some of them is found. The old linear scanning still has place for short domain lists. Quick measurements have shown that the length threshold of 5 achieves a decent trade-off. The new binary-search-based algorithm activates only for domain lists longer than 5 items. Additionally, the CL changes the FlatBuffers format of the ruleset by splitting domain list into 2 parts (included/excluded) instead of embedding the exclusion bit into domain strings. This CL should improve the long-tail/worst-case matching performance. BUG=708458 Review-Url: https://codereview.chromium.org/2816693002 Cr-Commit-Position: refs/heads/master@{#464457} Committed: https://chromium.googlesource.com/chromium/src/+/154fc5ad392d229570cf7332ce8fb686da7d96c0

Patch Set 1 #

Total comments: 8

Patch Set 2 : Clean up, add unittest. #

Total comments: 40

Patch Set 3 : Address comments. #

Total comments: 14

Patch Set 4 : Address more comments. #

Patch Set 5 : Add TODO. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+266 lines, -129 lines) Patch
M components/subresource_filter/core/common/flat/rules.fbs View 1 chunk +2 lines, -3 lines 0 comments Download
M components/subresource_filter/core/common/indexed_ruleset.cc View 1 2 3 4 6 chunks +146 lines, -64 lines 0 comments Download
M components/subresource_filter/core/common/indexed_ruleset_unittest.cc View 1 2 2 chunks +118 lines, -62 lines 0 comments Download

Messages

Total messages: 43 (24 generated)
pkalinnikov
PTAL. I will add more tests with domains lists soon.
3 years, 8 months ago (2017-04-12 11:00:44 UTC) #4
pkalinnikov
https://codereview.chromium.org/2816693002/diff/1/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/1/components/subresource_filter/core/common/indexed_ruleset.cc#newcode368 components/subresource_filter/core/common/indexed_ruleset.cc:368: flatbuffers::uoffset_t left = 0; I will add more comments ...
3 years, 8 months ago (2017-04-12 11:02:07 UTC) #5
pkalinnikov
Some self-comments. Please don't review now, will fix them and return back to you. https://codereview.chromium.org/2816693002/diff/1/components/subresource_filter/core/common/indexed_ruleset.cc ...
3 years, 8 months ago (2017-04-12 11:23:21 UTC) #6
engedy
Looking forward to tests too. :-)
3 years, 8 months ago (2017-04-12 12:17:22 UTC) #9
pkalinnikov
Ready for review.
3 years, 8 months ago (2017-04-12 14:15:12 UTC) #12
pkalinnikov
https://codereview.chromium.org/2816693002/diff/1/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/1/components/subresource_filter/core/common/indexed_ruleset.cc#newcode350 components/subresource_filter/core/common/indexed_ruleset.cc:350: size_t DomainListMatch(const url::Origin& origin, const FlatDomains& domains) { On ...
3 years, 8 months ago (2017-04-12 14:19:50 UTC) #13
engedy
Could you please explain in the CL description the rationale for this change?
3 years, 8 months ago (2017-04-12 14:22:37 UTC) #14
Charlie Harrison
Generally looks good. I think it needs a bit more documentation of the high level ...
3 years, 8 months ago (2017-04-12 15:19:50 UTC) #15
engedy
LGTM % comments. https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc#newcode65 components/subresource_filter/core/common/indexed_ruleset.cc:65: // Reserve only for |domains_included| because ...
3 years, 8 months ago (2017-04-12 15:35:45 UTC) #18
Charlie Harrison
https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc#newcode71 components/subresource_filter/core/common/indexed_ruleset.cc:71: HasNoUpperAscii(domain) ? domain : base::ToLowerASCII(domain)); On 2017/04/12 15:35:45, engedy ...
3 years, 8 months ago (2017-04-12 15:41:11 UTC) #19
pkalinnikov
Addressed, thanks! Will you PTAL again? https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc#newcode65 components/subresource_filter/core/common/indexed_ruleset.cc:65: // Reserve only ...
3 years, 8 months ago (2017-04-13 12:09:09 UTC) #22
Charlie Harrison
LGTM % comments https://codereview.chromium.org/2816693002/diff/40001/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/40001/components/subresource_filter/core/common/indexed_ruleset.cc#newcode409 components/subresource_filter/core/common/indexed_ruleset.cc:409: DCHECK_LT(left, domains.size()); Might be nice to ...
3 years, 8 months ago (2017-04-13 12:39:36 UTC) #26
engedy
Still LGTM % comments, thanks! https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc#newcode89 components/subresource_filter/core/common/indexed_ruleset.cc:89: std::sort(domains_included.begin(), domains_included.end(), precedes); On ...
3 years, 8 months ago (2017-04-13 12:59:14 UTC) #29
pkalinnikov
Addressed more comments. PTAL. https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/20001/components/subresource_filter/core/common/indexed_ruleset.cc#newcode89 components/subresource_filter/core/common/indexed_ruleset.cc:89: std::sort(domains_included.begin(), domains_included.end(), precedes); On 2017/04/13 ...
3 years, 8 months ago (2017-04-13 14:40:24 UTC) #32
Charlie Harrison
still LGTM https://codereview.chromium.org/2816693002/diff/40001/components/subresource_filter/core/common/indexed_ruleset.cc File components/subresource_filter/core/common/indexed_ruleset.cc (right): https://codereview.chromium.org/2816693002/diff/40001/components/subresource_filter/core/common/indexed_ruleset.cc#newcode409 components/subresource_filter/core/common/indexed_ruleset.cc:409: DCHECK_LT(left, domains.size()); On 2017/04/13 14:40:24, pkalinnikov wrote: ...
3 years, 8 months ago (2017-04-13 15:53:27 UTC) #35
engedy
LGTM.
3 years, 8 months ago (2017-04-13 16:33:06 UTC) #36
pkalinnikov
Added a TODO.
3 years, 8 months ago (2017-04-13 16:35:46 UTC) #37
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2816693002/80001
3 years, 8 months ago (2017-04-13 16:36:20 UTC) #40
commit-bot: I haz the power
3 years, 8 months ago (2017-04-13 17:39:16 UTC) #43
Message was sent while issue was closed.
Committed patchset #5 (id:80001) as
https://chromium.googlesource.com/chromium/src/+/154fc5ad392d229570cf7332ce8f...

Powered by Google App Engine
This is Rietveld 408576698