|
|
Created:
4 years, 5 months ago by pkalinnikov Modified:
4 years, 5 months ago Reviewers:
engedy CC:
chromium-reviews Base URL:
https://chromium.googlesource.com/chromium/src.git@ruleset Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionImplement matching of URL patterns.
A compound Knuth-Morris-Pratt failure function is used to match URLs against the
|pattern|.
The |pattern| is split on the '*' wildcard symbol and then a failure function is
built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction
(depending on whether the subpattern contains separator placeholders), and
appended to a vector. Some of the subpatterns can be exempted from being
indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern
can be matched by checking if it's a prefix of a URL.
Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value
1 (to distinguish them from the subpatterns indexed with BuildFailureFunction,
their failure functions always start with 0).
BUG=609747
Committed: https://crrev.com/15cf72498b6786b1f3f11b0b0695541484e50d0a
Cr-Commit-Position: refs/heads/master@{#405098}
Patch Set 1 #
Total comments: 23
Patch Set 2 : Make the unittests code shorter. #Patch Set 3 : Add dependency. #Patch Set 4 : Address comments. #Patch Set 5 : Fix int-to-bool cast warning. #Patch Set 6 : Change comment. #Patch Set 7 : Fix another cast warning. #Messages
Total messages: 30 (18 generated)
Description was changed from ========== Data structures for Safe Browsing subresource filtering rules. BUG=609747 ========== to ========== Data structures for Safe Browsing subresource filtering rules. BUG=609747 ==========
Patchset #1 (id:1) has been deleted
Description was changed from ========== Data structures for Safe Browsing subresource filtering rules. BUG=609747 ========== to ========== Implement matching of URL patterns. BUG=609747 ==========
pkalinnikov@chromium.org changed reviewers: + engedy@chromium.org
Description was changed from ========== Implement matching of URL patterns. BUG=609747 ========== to ========== Implement matching of URL patterns. To build a compound failure function used to match URLs against a |pattern|, the |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders). Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ==========
Patchset #1 (id:20001) has been deleted
PTAL.
Patchset #3 (id:80001) has been deleted
Patchset #3 (id:100001) has been deleted
LGTM, thanks! I have added a comment to the text that you are also using in the CL description, could you please propagate the changes? We might need to rethink our approach for the test cases, but it's not blocking this CL. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern.h (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern.h:21: proto::UrlPatternType type = proto::URL_PATTERN_TYPE_UNSPECIFIED; style nit: Please move the data members to the end. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern.h:41: // representations. The passed |rule| must outlive the created instance. nit: passed in https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching.cc (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.cc:51: // The 1 value is put in order to let matchers distinguish the subpatterns nit: Prepend the value '1' to the failure function to ... https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching.h (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:5: // Throughout this file and the corresponding .cc we consistently distinguish nit: The matching logic distinguishes between the terms URL pattern and subpattern. ... https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:28: // Builds a compound failure function used to match URLs against the |pattern|. nit: ... compound Knuth-Morris-Pratt failure ... https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:32: // placeholders), and appended to the returned vector. Some of the subpatterns nit: Can you please split this into 2 or 3 paragraphs? https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:39: // The URL |pattern| is supposed to be normalized. For instance, it shouldn't nit: ... must be normalized, namely, it must not have the ... To clarify, shouldn't all pipe '|' characters have been removed from the pattern already by this point? https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:45: // is used that was previously created by BuildFailureFunction(pattern). The nit: How about: The |failure| function must be the output of BuildFailureFunction() called with the same |pattern|. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:51: const std::vector<size_t>& failure); #include <stddef.h> https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching_unittest.cc (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching_unittest.cc:20: {{"abcd", proto::URL_PATTERN_TYPE_SUBSTRING}, {0, 0, 0, 0}}, Can we just add a using declaration to the enum to make these shorter?
Description was changed from ========== Implement matching of URL patterns. To build a compound failure function used to match URLs against a |pattern|, the |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders). Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ========== to ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to the returned vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ==========
Description was changed from ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to the returned vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ========== to ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ==========
Description was changed from ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ========== to ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. TThe |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ==========
PTAL again. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern.h (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern.h:21: proto::UrlPatternType type = proto::URL_PATTERN_TYPE_UNSPECIFIED; On 2016/07/12 14:26:07, engedy wrote: > style nit: Please move the data members to the end. Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern.h:41: // representations. The passed |rule| must outlive the created instance. On 2016/07/12 14:26:07, engedy wrote: > nit: passed in Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching.cc (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.cc:51: // The 1 value is put in order to let matchers distinguish the subpatterns On 2016/07/12 14:26:07, engedy wrote: > nit: Prepend the value '1' to the failure function to ... Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching.h (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:5: // Throughout this file and the corresponding .cc we consistently distinguish On 2016/07/12 14:26:07, engedy wrote: > nit: The matching logic distinguishes between the terms URL pattern and > subpattern. ... Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:28: // Builds a compound failure function used to match URLs against the |pattern|. On 2016/07/12 14:26:07, engedy wrote: > nit: ... compound Knuth-Morris-Pratt failure ... Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:32: // placeholders), and appended to the returned vector. Some of the subpatterns On 2016/07/12 14:26:07, engedy wrote: > nit: Can you please split this into 2 or 3 paragraphs? Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:39: // The URL |pattern| is supposed to be normalized. For instance, it shouldn't On 2016/07/12 14:26:07, engedy wrote: > nit: ... must be normalized, namely, it must not have the ... > > To clarify, shouldn't all pipe '|' characters have been removed from the pattern > already by this point? They should have. But still, if the pattern has a BOUNDARY left anchor, then it shouldn't start with an asterisk. What do you think is the best way to shortly express this? https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:45: // is used that was previously created by BuildFailureFunction(pattern). The On 2016/07/12 14:26:07, engedy wrote: > nit: How about: > > The |failure| function must be the output of BuildFailureFunction() called with > the same |pattern|. Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:51: const std::vector<size_t>& failure); On 2016/07/12 14:26:07, engedy wrote: > #include <stddef.h> Done. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching_unittest.cc (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching_unittest.cc:20: {{"abcd", proto::URL_PATTERN_TYPE_SUBSTRING}, {0, 0, 0, 0}}, On 2016/07/12 14:26:07, engedy wrote: > Can we just add a using declaration to the enum to make these shorter? Please see the patchset #2. Is that what you mean?
PTAL again.
Description was changed from ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. TThe |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ========== to ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ==========
Still LGTM % comments. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching.h (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:39: // The URL |pattern| is supposed to be normalized. For instance, it shouldn't On 2016/07/12 14:48:13, pkalinnikov wrote: > On 2016/07/12 14:26:07, engedy wrote: > > nit: ... must be normalized, namely, it must not have the ... > > > > To clarify, shouldn't all pipe '|' characters have been removed from the > pattern > > already by this point? > > They should have. But still, if the pattern has a BOUNDARY left anchor, then it > shouldn't start with an asterisk. What do you think is the best way to shortly > express this? In that case I would just rephrase this in terms of where wildcards are allowed depending on the left/right anchor type. https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching_unittest.cc (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching_unittest.cc:20: {{"abcd", proto::URL_PATTERN_TYPE_SUBSTRING}, {0, 0, 0, 0}}, On 2016/07/12 14:48:13, pkalinnikov wrote: > On 2016/07/12 14:26:07, engedy wrote: > > Can we just add a using declaration to the enum to make these shorter? > > Please see the patchset #2. Is that what you mean? Yep, this is even better, thanks!
https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... File components/subresource_filter/core/common/url_pattern_matching.h (right): https://codereview.chromium.org/2098233002/diff/40001/components/subresource_... components/subresource_filter/core/common/url_pattern_matching.h:39: // The URL |pattern| is supposed to be normalized. For instance, it shouldn't On 2016/07/12 16:06:31, engedy wrote: > On 2016/07/12 14:48:13, pkalinnikov wrote: > > On 2016/07/12 14:26:07, engedy wrote: > > > nit: ... must be normalized, namely, it must not have the ... > > > > > > To clarify, shouldn't all pipe '|' characters have been removed from the > > pattern > > > already by this point? > > > > They should have. But still, if the pattern has a BOUNDARY left anchor, then > it > > shouldn't start with an asterisk. What do you think is the best way to shortly > > express this? > > In that case I would just rephrase this in terms of where wildcards are allowed > depending on the left/right anchor type. Done.
The CQ bit was checked by pkalinnikov@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from engedy@chromium.org Link to the patchset: https://codereview.chromium.org/2098233002/#ps200001 (title: "Fix another cast warning.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by pkalinnikov@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ========== to ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:200001)
Message was sent while issue was closed.
CQ bit was unchecked.
Message was sent while issue was closed.
Description was changed from ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 ========== to ========== Implement matching of URL patterns. A compound Knuth-Morris-Pratt failure function is used to match URLs against the |pattern|. The |pattern| is split on the '*' wildcard symbol and then a failure function is built for each subpattern by BuildFailureFunctionFuzzy or BuildFailureFunction (depending on whether the subpattern contains separator placeholders), and appended to a vector. Some of the subpatterns can be exempted from being indexed. E.g., if the |pattern| has a BOUNDARY left anchor, the first subpattern can be matched by checking if it's a prefix of a URL. Each subpattern indexed with BuildFailureFunctionFuzzy is prepended with a value 1 (to distinguish them from the subpatterns indexed with BuildFailureFunction, their failure functions always start with 0). BUG=609747 Committed: https://crrev.com/15cf72498b6786b1f3f11b0b0695541484e50d0a Cr-Commit-Position: refs/heads/master@{#405098} ==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/15cf72498b6786b1f3f11b0b0695541484e50d0a Cr-Commit-Position: refs/heads/master@{#405098} |