Issue 650113: Convert SafeBrowsingStoreFile to do bulk reads and writes. (Closed)

Created:
10 years, 10 months ago by Scott Hess - ex-Googler

Modified:
9 years, 6 months ago

Reviewers:
Erik does not do reviews, piman

CC:
chromium-reviews_googlegroups.com

Visibility:
Public.

Description

Convert SafeBrowsingStoreFile to do bulk reads and writes. Read/write the data in the style of fread/fwrite, rather than doing I/O element by element. This lays the groundwork for adding checksumming. BUG=none TEST=none Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=39619 Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=40186
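A minimal sketch of what fread/fwrite-style helpers could look like, not the code from the patch. ReadArray() is named later in the review; WriteArray(), the Record type, and both signatures are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <type_traits>

// Hypothetical fixed-size record, for illustration only.
struct Record {
  std::int32_t chunk_id;
  std::int32_t prefix;
};

// Writes nmemb elements with a single fwrite() call instead of one call
// per element. Returns true only if every element was written.
template <typename T>
bool WriteArray(const T* ptr, std::size_t nmemb, std::FILE* fp) {
  static_assert(std::is_trivially_copyable<T>::value,
                "raw I/O is only safe for trivially copyable types");
  assert(fp != nullptr);
  return std::fwrite(ptr, sizeof(T), nmemb, fp) == nmemb;
}

// Reads nmemb elements into caller-provided storage with a single fread()
// call. Returns true only if every element was read.
template <typename T>
bool ReadArray(T* ptr, std::size_t nmemb, std::FILE* fp) {
  static_assert(std::is_trivially_copyable<T>::value,
                "raw I/O is only safe for trivially copyable types");
  assert(fp != nullptr);
  return std::fread(ptr, sizeof(T), nmemb, fp) == nmemb;
}
```

Because every caller goes through the same two functions, a later change such as checksumming would have one place to hook into.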

Patch Set 1 #

Total comments: 9

Patch Set 2 : Drop functor, dedicated read/write ops for sets. #

Patch Set 3 : Typo. #

Patch Set 4 : Fix for gcc 4.4 #

Patch Set 5 : Additional comment on the tweak #

Total comments: 2

Patch Set 6 : Default-initialize POD contents. #

Created: 10 years, 10 months ago


Stats: +201 lines, -290 lines
M	chrome/browser/safe_browsing/safe_browsing_store.h	4 chunks	+5 lines, -0 lines	0 comments
M	chrome/browser/safe_browsing/safe_browsing_store_file.h	1 chunk	+0 lines, -19 lines	0 comments
M	chrome/browser/safe_browsing/safe_browsing_store_file.cc	9 chunks	+196 lines, -271 lines	0 comments

Messages

Total messages: 18 (0 generated)


Scott Hess - ex-Googler

OK, now we're getting somewhere. This is the first half of adding checksums to the ...

10 years, 10 months ago (2010-02-20 20:34:13 UTC) #1

Erik does not do reviews

http://codereview.chromium.org/650113/diff/1/3 File chrome/browser/safe_browsing/safe_browsing_store_file.cc (right): http://codereview.chromium.org/650113/diff/1/3#newcode98 chrome/browser/safe_browsing/safe_browsing_store_file.cc:98: class DeletedChunkRemover { name is a bit confusing. Maybe ...

10 years, 10 months ago (2010-02-22 17:50:58 UTC) #2

Scott Hess - ex-Googler

10 years, 10 months ago (2010-02-22 18:25:55 UTC) #4

http://codereview.chromium.org/650113/diff/1/3
File chrome/browser/safe_browsing/safe_browsing_store_file.cc (right):

http://codereview.chromium.org/650113/diff/1/3#newcode98
chrome/browser/safe_browsing/safe_browsing_store_file.cc:98: class
DeletedChunkRemover {
On 2010/02/22 17:50:58, Erik Kay wrote:
> name is a bit confusing.  Maybe DeletedChunkExists or DeletedChunkTest?

Now that you mention it - that _is_ confusing, since it's not the active remover
thing.  DeletedChunkTester seems more reasonable.

http://codereview.chromium.org/650113/diff/1/3#newcode120
chrome/browser/safe_browsing/safe_browsing_store_file.cc:120:
std::remove_if(old_end, vec->end(), DeletedChunkRemover<T>(del_set));
On 2010/02/22 17:50:58, Erik Kay wrote:
> I was going to comment that this seemed like it didn't have the nice perf
> characteristics of your old code, but then I thought better and read the docs
> for this (poorly named) method.  It's pretty cool, but I would have never
> guessed what it did from its name.

The remove_if() is good for what it does, writing it longhand would be fine,
too.  Thinking back, I believe that using the functor is an artifact of how I
ended up with this design - now that it's only used in these two places, it's
not as compelling.  Requiring the reader to look something up is poor, I think.

Unfortunately, we can either read in bulk then delete, or read record-by-record
and delete before inserting, so at core this probably is a bit less efficient in
memory usage.
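For readers who, like the reviewer, have to look up what std::remove_if() does: it shifts the kept elements toward the front of the range and returns the new logical end, without shrinking the container. A rough sketch of the "read in bulk, then delete" step follows. RemoveDeleted() is named later in this thread, but the signature, the Item type, and the first_new parameter are assumptions, and the lambda stands in for the functor discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical record type, for illustration only.
struct Item {
  std::int32_t chunk_id;
  std::int32_t prefix;
};

// Drops every element at index >= first_new whose chunk_id is in del_set.
// Elements before first_new are left alone, mirroring the idea of filtering
// only the records that were just appended by a bulk read.
void RemoveDeleted(std::vector<Item>* vec, std::size_t first_new,
                   const std::set<std::int32_t>& del_set) {
  assert(vec != nullptr && first_new <= vec->size());
  auto begin_new = vec->begin() + static_cast<std::ptrdiff_t>(first_new);
  // remove_if() compacts the survivors and returns the new logical end;
  // the tail past that point still has to be erased explicitly.
  auto new_end = std::remove_if(
      begin_new, vec->end(),
      [&del_set](const Item& item) { return del_set.count(item.chunk_id) > 0; });
  vec->erase(new_end, vec->end());
}
```

The pass is linear in the number of new elements and keeps the survivors in their original order.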

http://codereview.chromium.org/650113/diff/1/3#newcode361
chrome/browser/safe_browsing/safe_browsing_store_file.cc:361:
DeletedChunkRemover<SBAddFullHash>(add_del_cache_));
This usage is also kind of degenerate.  Once I had a use for
DeletedChunkRemover<>(), having multiple styles of testing was poor.

http://codereview.chromium.org/650113/diff/1/3#newcode394
chrome/browser/safe_browsing/safe_browsing_store_file.cc:394: std::vector<int32>
add_chunks_flat(add_chunks_cache_.begin(),
On 2010/02/22 17:50:58, Erik Kay wrote:
> These sets can be pretty large, right?  It's a little bit of a bummer that we're
> doubling their memory use here, but I guess it's not really avoidable.  At a
> minimum, you could create and write these vectors one at a time so we're not
> paying the extra memory use for both of them at the same time.

I could recall the distinct writer specific to sets, but I don't think these are
all that big, relatively speaking.  From a 5M database, they're maybe 25k records (100k of
data)?  Having all callers use the same readers and writers is useful for the
checksum change.

Hmm.  It would be easy to toss this into a wrapper function to get the
half-at-a-time without losing the all-through-one-chokepoint.  OK.
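A sketch of the wrapper idea, under assumptions: WriteVector() and WriteChunkSet() are hypothetical names, not functions from the patch. Each set is flattened into a temporary vector that lives only for the duration of one call, so at most one flattened copy exists at a time, while all bytes still pass through the same writer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <set>
#include <vector>

// Hypothetical single chokepoint through which all array data is written.
bool WriteVector(const std::vector<std::int32_t>& values, std::FILE* fp) {
  assert(fp != nullptr);
  if (values.empty()) return true;
  return std::fwrite(values.data(), sizeof(values[0]), values.size(), fp) ==
         values.size();
}

// Flattens one set and writes it. The temporary vector is destroyed when
// this function returns, before the next set is flattened.
bool WriteChunkSet(const std::set<std::int32_t>& chunks, std::FILE* fp) {
  const std::vector<std::int32_t> flat(chunks.begin(), chunks.end());
  return WriteVector(flat, fp);
}
```

A caller would then invoke WriteChunkSet() once for the add chunks and once for the sub chunks rather than building both flat vectors up front.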

Scott Hess - ex-Googler

Sorry to waste a perfectly good LGTM, but ... could you check out these two ...

10 years, 10 months ago (2010-02-22 19:32:22 UTC) #5

Scott Hess - ex-Googler

+piman because of the ARM change. ARM builder failed with: chrome/browser/safe_browsing/safe_browsing_store_file.cc: In function 'bool<unnamed>::ReadToVectorAndDelete(std::vector<T, std::allocator<_Tp1> ...

10 years, 10 months ago (2010-02-23 21:47:43 UTC) #7

Scott Hess - ex-Googler

On 2010/02/23 21:47:43, shess wrote: > ARM builder failed with: <crap> +1 for websites freely ...

10 years, 10 months ago (2010-02-23 21:51:46 UTC) #8

piman

http://codereview.chromium.org/650113/diff/1011/1012 File chrome/browser/safe_browsing/safe_browsing_store.h (right): http://codereview.chromium.org/650113/diff/1011/1012#newcode46 chrome/browser/safe_browsing/safe_browsing_store.h:46: SBAddPrefix() {} mmh, I wonder if the gcc issue ...

10 years, 10 months ago (2010-02-23 22:02:53 UTC) #9

Scott Hess - ex-Googler

10 years, 10 months ago (2010-02-23 22:22:38 UTC) #10

piman

On Tue, Feb 23, 2010 at 2:22 PM, <shess@chromium.org> wrote: > > http://codereview.chromium.org/650113/diff/1011/1012 > File ...

10 years, 10 months ago (2010-02-23 22:31:52 UTC) #11

Scott Hess - ex-Googler

On 2010/02/23 22:31:52, piman wrote: > Different inlining may or may not let the compiler ...

10 years, 10 months ago (2010-02-23 22:39:53 UTC) #12

Scott Hess - ex-Googler

On 2010/02/23 22:39:53, shess wrote: > On 2010/02/23 22:31:52, piman wrote: > > Different inlining ...

10 years, 10 months ago (2010-02-23 22:42:27 UTC) #13

Scott Hess - ex-Googler

On 2010/02/23 22:42:27, shess wrote: > On 2010/02/23 22:39:53, shess wrote: > > On 2010/02/23 ...

10 years, 10 months ago (2010-02-25 00:53:30 UTC) #14

On 2010/02/23 22:42:27, shess wrote:
> On 2010/02/23 22:39:53, shess wrote:
> > On 2010/02/23 22:31:52, piman wrote:
> > > Different inlining may or may not let the compiler detect uninitialized
> > > reads - std::vector::resize likely copies a default object into its final
> > > position when growing (meaning it will read from the newly constructed
> > > object).
> > > In general we frown upon uninitialized data (constant source of bugs), so I
> > > strongly suggest fixing that.
> > 
> > I'm concerned about code-generation because there are no uninitialized reads.
> > The data is uninitialized after the resize(), if ReadArray() succeeds the data
> > has been initialized (from the file), if it fails the second resize() should
> > remove it.  So if the code is detecting an uninitialized read, that implies that
> > it can detect an initialized read, and that chunk_id is initialized with 0, thus
> > the count() on the const set in RemoveDeleted() will always be the same and the
> > loop can be reduced appropriately.
> 
> Beyond all that, though, chunk_id is uninitialized in the replacement code, too
> :-).
> 
> And ... adding empty destructors to the objects also allows the original code to
> work.

OK, so in the interests of moving this forward... the code in question wants to
enlarge the vector, then fill the data directly from a blob.  This is a
degenerate thing to do, but it is also intentional.  Things that seem to fix the
error:

 - Use explicit insert() and erase() as alternatives to resize().
 - Default constructors initialize chunk_id.
 - Provide default destructors.
 - Provide a copy constructor.
 - Remove all constructors entirely.

Initializing chunk_id seems right, but insofar as the compiler can see that it's
used uninitialized and is making a legitimate complaint, I'm not entirely clear
that it wouldn't optimize that use WRT the default constructor, which would be
incorrect.

I kind of like the last option of removing all constructors.  I originally added
constructors as a convenience for creating new elements.
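The pattern under discussion (grow the vector, fill the new slots straight from the file, shrink back on failure) can be sketched as below. This is an illustration, not the patch: AddPrefix is a hypothetical stand-in for the store's record types, and AppendFromFile() is an assumed name. It uses the "default constructors initialize chunk_id" option from the list above, so the slots created by resize() hold defined values before the read overwrites them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical record type. The default constructor zeroes every field so
// that elements created by resize() are never read uninitialized.
struct AddPrefix {
  std::int32_t chunk_id;
  std::int32_t prefix;
  AddPrefix() : chunk_id(0), prefix(0) {}
  AddPrefix(std::int32_t id, std::int32_t p) : chunk_id(id), prefix(p) {}
};

// Appends count records read from fp. On a short read the vector is
// restored to its original size and false is returned.
bool AppendFromFile(std::vector<AddPrefix>* vec, std::size_t count,
                    std::FILE* fp) {
  assert(vec != nullptr && fp != nullptr);
  if (count == 0) return true;
  const std::size_t old_size = vec->size();
  vec->resize(old_size + count);  // New slots are default-constructed.
  if (std::fread(&(*vec)[old_size], sizeof(AddPrefix), count, fp) != count) {
    vec->resize(old_size);  // Roll back the partially filled tail.
    return false;
  }
  return true;
}
```

The extra zeroing pass is a memory write per element, which is small next to the file read that follows it.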

piman

On Wed, Feb 24, 2010 at 4:53 PM, <shess@chromium.org> wrote: > On 2010/02/23 22:42:27, shess ...

10 years, 10 months ago (2010-02-25 02:09:21 UTC) #15

On Wed, Feb 24, 2010 at 4:53 PM, <shess@chromium.org> wrote:

> On 2010/02/23 22:42:27, shess wrote:
> > On 2010/02/23 22:39:53, shess wrote:
> > > On 2010/02/23 22:31:52, piman wrote:
> > > > Different inlining may or may not let the compiler detect uninitialized
> > > > reads - std::vector::resize likely copies a default object into its final
> > > > position when growing (meaning it will read from the newly constructed
> > > > object).
> > > > In general we frown upon uninitialized data (constant source of bugs), so I
> > > > strongly suggest fixing that.
> > >
> > > I'm concerned about code-generation because there are no uninitialized reads.
> > > The data is uninitialized after the resize(), if ReadArray() succeeds the data
> > > has been initialized (from the file), if it fails the second resize() should
> > > remove it.  So if the code is detecting an uninitialized read, that implies that
> > > it can detect an initialized read, and that chunk_id is initialized with 0, thus
> > > the count() on the const set in RemoveDeleted() will always be the same and the
> > > loop can be reduced appropriately.
> >
> > Beyond all that, though, chunk_id is uninitialized in the replacement code, too
> > :-).
> >
> > And ... adding empty destructors to the objects also allows the original code to
> > work.
>
> OK, so in the interests of moving this forward... the code in question wants to
> enlarge the vector, then fill the data directly from a blob.  This is a
> degenerate thing to do, but it is also intentional.  Things that seem to fix the
> error:
>
>  - Use explicit insert() and erase() as alternatives to resize().
>

This seems wrong. I don't buy the GCC bug; we have uninitialized stuff that
we most likely shouldn't. Switching to insert/erase only masks the problem.


>  - Default constructors initialize chunk_id.
>

(and other pod fields)


>  - Provide default destructors.
>  - Provide a copy constructor.
>

Those two probably mask the problem because they make the struct non-POD and
the compiler considers it differently


>  - Remove all constructors entirely.
>

I'm not sure why that fixes it...


>
> Initializing chunk_id seems right, but insofar as the compiler can see that it's
> used uninitialized and is making a legitimate complaint, I'm not entirely clear
> that it wouldn't optimize that use WRT the default constructor, which would be
> incorrect.
>

What's the concern about initializing the data? You're doing I/O, that'll
be orders of magnitude slower than writing a bunch of 0s in memory.


>
> I kind of like the last option of removing all constructors.  I originally added
> constructors as a convenience for creating new elements.


>
> http://codereview.chromium.org/650113
>

Scott Hess - ex-Googler

Sorry for the delay. Sheriffing. On Wed, Feb 24, 2010 at 6:08 PM, Antoine Labour ...

10 years, 10 months ago (2010-02-26 23:39:41 UTC) #16

Scott Hess - ex-Googler

10 years, 10 months ago (2010-02-27 00:40:07 UTC) #18

On 2010/02/26 23:46:46, piman wrote:
> LGTM, thanks. I think it's better that way.

Thank you for the patience.  I learned a lot.  Probably not stuff I'll ever be
able to productively use, but I'll sure wow them at the next cocktail party I
attend.
