Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(247)

Issue 2128173003: Merge the hash prefixes from the old store and the additions in the partial (Closed)

Created:
4 years, 5 months ago by vakh (use Gerrit instead)
Modified:
4 years, 4 months ago
CC:
chromium-reviews, palmer, noé, woz
Base URL:
https://chromium.googlesource.com/chromium/src.git@master
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Merge the hash prefixes from the old store and the additions in the partial update fetched from the PVer4 service to update the hash prefixes in the new store. Design doc: https://goto.google.com/design-doc-v4store BUG=543161 Committed: https://crrev.com/2eb43a6078426caf95b9fd46342e47696bfb4272 Cr-Commit-Position: refs/heads/master@{#405331}

Patch Set 1 #

Patch Set 2 : git fetch and pull #

Total comments: 1

Patch Set 3 : Minor changes to the test inputs #

Total comments: 25

Patch Set 4 : HashPrefix as std::string and so is HashPrefixes (the list of hashes). HashPrefixes is a concatenat… #

Patch Set 5 : Nit: Added a NOTREACHED #

Patch Set 6 : git fetch and pull #

Total comments: 21

Patch Set 7 : Simplified merge logic. +Nit: Fix a failing test #

Total comments: 8

Patch Set 8 : Discard update if there's an error in processing it. Add tests. CR feedback. #

Patch Set 9 : Minor: Address comments by nparker@ #

Patch Set 10 : Add the histogram info to histograms.xml #

Total comments: 1

Patch Set 11 : Nit: s/soace/space #

Unified diffs Side-by-side diffs Delta from patch set Stats (+612 lines, -20 lines) Patch
M components/safe_browsing_db/v4_database.h View 1 2 3 4 5 6 7 1 chunk +1 line, -0 lines 0 comments Download
M components/safe_browsing_db/v4_database.cc View 1 2 3 4 5 6 7 1 chunk +3 lines, -1 line 0 comments Download
M components/safe_browsing_db/v4_database_unittest.cc View 1 2 3 4 5 6 7 4 chunks +49 lines, -11 lines 0 comments Download
M components/safe_browsing_db/v4_store.h View 1 2 3 4 5 6 7 8 9 10 4 chunks +114 lines, -0 lines 0 comments Download
M components/safe_browsing_db/v4_store.cc View 1 2 3 4 5 6 7 8 3 chunks +230 lines, -8 lines 0 comments Download
M components/safe_browsing_db/v4_store_unittest.cc View 1 2 3 4 5 6 7 8 1 chunk +196 lines, -0 lines 0 comments Download
M tools/metrics/histograms/histograms.xml View 1 2 3 4 5 6 7 8 9 2 chunks +19 lines, -0 lines 0 comments Download

Dependent Patchsets:

Messages

Total messages: 37 (6 generated)
vakh (use Gerrit instead)
nparker: This isn't ready for a full review yet but it might be worth taking ...
4 years, 5 months ago (2016-07-07 10:41:50 UTC) #2
vakh (use Gerrit instead)
git fetch and pull
4 years, 5 months ago (2016-07-07 10:42:53 UTC) #3
vakh (use Gerrit instead)
Minor changes to the test inputs
4 years, 5 months ago (2016-07-07 10:53:03 UTC) #4
Nathan Parker
https://codereview.chromium.org/2128173003/diff/40001/components/safe_browsing_db/v4_store.cc File components/safe_browsing_db/v4_store.cc (right): https://codereview.chromium.org/2128173003/diff/40001/components/safe_browsing_db/v4_store.cc#newcode22 components/safe_browsing_db/v4_store.cc:22: const uint32_t kMinHashPrefixLength = 4; Nit: Would it be ...
4 years, 5 months ago (2016-07-11 18:09:58 UTC) #5
vakh (use Gerrit instead)
HashPrefix as std::string and so is HashPrefixes (the list of hashes). HashPrefixes is a concatenation ...
4 years, 5 months ago (2016-07-12 07:30:10 UTC) #6
vakh (use Gerrit instead)
Two comments left unresolved. Addressed all other feedback. PTAL. https://codereview.chromium.org/2128173003/diff/20001/components/safe_browsing_db/v4_store.cc File components/safe_browsing_db/v4_store.cc (right): https://codereview.chromium.org/2128173003/diff/20001/components/safe_browsing_db/v4_store.cc#newcode39 components/safe_browsing_db/v4_store.cc:39: ...
4 years, 5 months ago (2016-07-12 07:34:19 UTC) #7
vakh (use Gerrit instead)
Nit: Added a NOTREACHED
4 years, 5 months ago (2016-07-12 07:35:03 UTC) #8
vakh (use Gerrit instead)
git fetch and pull
4 years, 5 months ago (2016-07-12 07:44:50 UTC) #9
vakh (use Gerrit instead)
Simplified merge logic. +Nit: Fix a failing test
4 years, 5 months ago (2016-07-12 18:33:14 UTC) #10
Nathan Parker
I ended up commenting on two different versions, but I think it worked out. https://codereview.chromium.org/2128173003/diff/100001/components/safe_browsing_db/v4_store.cc ...
4 years, 5 months ago (2016-07-12 20:35:06 UTC) #11
vakh (use Gerrit instead)
Discard update if there's an error in processing it. Add tests. CR feedback.
4 years, 5 months ago (2016-07-12 21:56:47 UTC) #12
vakh (use Gerrit instead)
https://codereview.chromium.org/2128173003/diff/40001/components/safe_browsing_db/v4_store.cc File components/safe_browsing_db/v4_store.cc (right): https://codereview.chromium.org/2128173003/diff/40001/components/safe_browsing_db/v4_store.cc#newcode248 components/safe_browsing_db/v4_store.cc:248: if (found_in_old && found_in_additions) { On 2016/07/11 18:09:58, Nathan ...
4 years, 5 months ago (2016-07-12 21:57:03 UTC) #13
Nathan Parker
lgtm w/ two comments to address https://codereview.chromium.org/2128173003/diff/100001/components/safe_browsing_db/v4_store.cc File components/safe_browsing_db/v4_store.cc (right): https://codereview.chromium.org/2128173003/diff/100001/components/safe_browsing_db/v4_store.cc#newcode161 components/safe_browsing_db/v4_store.cc:161: (*additions_map)[prefix_size] = lumped_hashes; ...
4 years, 5 months ago (2016-07-12 22:47:12 UTC) #14
vakh (use Gerrit instead)
Minor: Address comments by nparker@
4 years, 5 months ago (2016-07-13 00:21:30 UTC) #15
vakh (use Gerrit instead)
Add the histogram info to histograms.xml
4 years, 5 months ago (2016-07-13 00:28:11 UTC) #16
vakh (use Gerrit instead)
rkaplow@ -- can you please review changes in histograms.xml? nparker@ -- thanks for the review! ...
4 years, 5 months ago (2016-07-13 00:29:17 UTC) #18
Scott Hess - ex-Googler
I find that I don't have the time/energy to really understand/review during the few days ...
4 years, 5 months ago (2016-07-13 05:46:08 UTC) #20
rkaplow
lgtm histogram lg
4 years, 5 months ago (2016-07-13 19:54:38 UTC) #21
vakh (use Gerrit instead)
Nit: s/soace/space
4 years, 5 months ago (2016-07-13 20:24:05 UTC) #22
vakh (use Gerrit instead)
On 2016/07/13 05:46:08, Scott Hess (OOO Jul 1-Aug 7) wrote: > I find that I ...
4 years, 5 months ago (2016-07-13 20:27:02 UTC) #23
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2128173003/200001
4 years, 5 months ago (2016-07-13 20:28:30 UTC) #26
commit-bot: I haz the power
Committed patchset #11 (id:200001)
4 years, 5 months ago (2016-07-13 22:55:51 UTC) #27
commit-bot: I haz the power
CQ bit was unchecked.
4 years, 5 months ago (2016-07-13 22:56:10 UTC) #28
Scott Hess - ex-Googler
On 2016/07/13 20:27:02, vakh wrote: > On 2016/07/13 05:46:08, Scott Hess (OOO Jul 1-Aug 7) ...
4 years, 5 months ago (2016-07-13 22:57:05 UTC) #29
commit-bot: I haz the power
Patchset 11 (id:??) landed as https://crrev.com/2eb43a6078426caf95b9fd46342e47696bfb4272 Cr-Commit-Position: refs/heads/master@{#405331}
4 years, 5 months ago (2016-07-13 22:59:06 UTC) #31
vakh (use Gerrit instead)
On 2016/07/13 22:57:05, Scott Hess (OOO Jul 1-Aug 7) wrote: > On 2016/07/13 20:27:02, vakh ...
4 years, 5 months ago (2016-07-13 23:10:54 UTC) #32
Scott Hess - ex-Googler
On 2016/07/13 23:10:54, vakh wrote: > On 2016/07/13 22:57:05, Scott Hess (OOO Jul 1-Aug 7) ...
4 years, 4 months ago (2016-08-10 17:45:53 UTC) #33
vakh (use Gerrit instead)
On 2016/08/10 17:45:53, Scott Hess wrote: > On 2016/07/13 23:10:54, vakh wrote: > > On ...
4 years, 4 months ago (2016-08-10 17:58:50 UTC) #34
Scott Hess - ex-Googler
On 2016/08/10 17:58:50, vakh wrote: > I am not sure I understand your comment completely ...
4 years, 4 months ago (2016-08-10 18:22:10 UTC) #35
Scott Hess - ex-Googler
On 2016/08/10 17:58:50, vakh wrote: > I am not sure I understand your comment completely ...
4 years, 4 months ago (2016-08-10 18:22:15 UTC) #36
vakh (use Gerrit instead)
4 years, 4 months ago (2016-08-10 18:44:15 UTC) #37
Message was sent while issue was closed.
On 2016/08/10 18:22:15, Scott Hess wrote:
> On 2016/08/10 17:58:50, vakh wrote:
> > I am not sure I understand your comment completely but the core of it seems
> be:
> > why are we using different-sized prefixes?
> > The reason is that the SafeBrowsing service now supports different sized
> > prefixes.
> > These are used to prevent collisions between a bad website and a hugely
> popular
> > website.
> > You probably understand this better than me, but to illustrate with an
> example,
> > let's assume:
> > 1. example.com/one is a bad URL with hash: aabbccddeeeeeeee
> > 2. http://google.com with hash: aabbccddgggggggg
> > 
> > They have a common 32-bit hash-prefix: aabbccdd
> > This means if we only support 32-bit hash prefixes, every time a user goes
to
> > http://google.com, it would lead to a full hash request to SafeBrowsing
> service.
> > 
> > By supporting 5-byte hash prefixes, we blacklist: aabbccddee and do not send
> > full hash requests for aabbccddgg
> 
> My point is that for a set of ~2M items using 32-bit prefixes, there should be
> only a handful of collisions, so the advantage of having a 40-bit prefix for
> that case over a 32-byte full hash is pretty trivial.
> 
> If you had ~2M items using 24-bit prefixes, there should be a fair number of
> collisions, so being able to send both 24-bit and 32-bit prefixes could save
> some bandwidth.  If the set was well-sorted, though, the initial download
> wouldn't be much impacted (due to compression opportunities), and if prefixes
> waffle between 24-bit and 32-bit during updates you could easily eat up the
> savings in overhead.  In any case, moving from 24-bit to 32-bit should reduce
> collisions by 1/256, at which point it's unlikely that adding 40-bit prefixes
> would be a material improvement.
> 
> Mostly where I'm going is that it's probably not worthwhile to make the code
> complicated to save ~1% in bandwidth.  Having a map from prefix size to lists
> feels complicated to me

I know the new design is slightly more complex to understand at first due to
variable sized
hash prefixes, but to be honest, I find it easier to understand overall than the
PVer2/3
implementation. Part of it is because you helped keep it simple by avoiding use
of locks etc and
partly because the data structures used (unordered_maps) are ones used very
commonly
(unlike BloomFilters/PrefixSet).

I would say that the increase in complexity of code is subjective.

> because it makes it really easy for minor errors to
> have major impacts.
With the kind of user base Chrome has, this is true of any errors.

> As a for-instance of the kind of thing I worry about, you
> have a map full of iterators with the same signatures from different
containers,
> so you could imagine cases comparing across containers resulting in running
off
> the end of the container, but maybe due to statistics of hashing it only
happens
> infrequently so the case kicks up based on a data change, rather than a code
> change (so suddenly all channels start crashing).
I agree that any condition that can trigger based on data change can start
causing crashes anytime
and should be avoided.
At the same time, I think that's why having unit tests and rolling out the
change slowly
is the right way to go about this launch, which is what I plan to do.

As for the particular case of map of iterators, it is always initialized just
before use
and has a short lifetime so I don't expect it to go into a weird state ever.
Also, there are a number of unit tests for that piece of code.

> 
> [It's possible the server side has already set everything in stone for you. 
> I've often worried about whether Chrome team gave them enough pushback about
> offloading processing and memory requirements onto clients, given that there
are
> billions of clients.]

That's a fair point, but it is something that can be discussed for the next
iteration
of the server implementation. As the service works currently, this is the only
option
for Chrome [other than to not adopt the new server implementation].

Powered by Google App Engine
This is Rietveld 408576698