Chromium Code Reviews

Created: 3 years, 8 months ago by Maks Orlovich
Modified: 3 years, 8 months ago
CC: chromium-reviews, cbentzel+watch_chromium.org, gavinp+disk_chromium.org, net-reviews_chromium.org
Target Ref: refs/heads/master
Project: chromium
Visibility: Public
Description

Speed up SimpleCache eviction set computation
The code previously used to collect hash keys into a vector and sort
that by LRU order, which required indexing into the hash table on
every comparison.
Instead, collect pointers into the hashtable (which are the same size),
to avoid the hash lookups.
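A minimal sketch of the idea, with a hypothetical EntryMetadata stand-in (the real one lives in net/disk_cache/simple/simple_index.h) and a made-up OldestFirst helper: pointers into an unordered_map's elements are stable, so we can sort those instead of copying keys and re-hashing on every comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in for the real 8-byte EntryMetadata (32-bit time + 32-bit size).
struct EntryMetadata {
  int32_t last_used_seconds;  // seconds since some epoch
  uint32_t entry_size;
};
using EntrySet = std::unordered_map<uint64_t, EntryMetadata>;

// Old approach: copy all 8-byte hash keys into a vector, then look each
// key up in the table on every comparison made by the sort.
// New approach: collect pointers to the table's value_type (stable in
// unordered_map), sort those by last-used time, then read the keys out.
std::vector<uint64_t> OldestFirst(const EntrySet& entries, size_t count) {
  std::vector<const EntrySet::value_type*> by_age;
  by_age.reserve(entries.size());
  for (EntrySet::const_iterator i = entries.begin(); i != entries.end(); ++i)
    by_age.push_back(&*i);
  std::sort(by_age.begin(), by_age.end(),
            [](const EntrySet::value_type* a, const EntrySet::value_type* b) {
              return a->second.last_used_seconds < b->second.last_used_seconds;
            });
  std::vector<uint64_t> hashes;  // only the evicted entries' hashes survive
  for (size_t i = 0; i < count && i < by_age.size(); ++i)
    hashes.push_back(by_age[i]->first);
  return hashes;
}
```

The comparator never touches the hash buckets, which is where the CPU win in the numbers below comes from.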
CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark:
4.82ms -> 0.33ms on Linux/clang workstation
9.26ms -> 0.64ms on Windows/MSVC laptop
(With 10K entries)
Memory usage impact:
Let N = number of entries in cache. E = number of entries evicted,
which is usually around N/20.
My workstation currently has 16325 entries in cache, which is within the "not-surprising" range, so I'll plug N=16000 plus corresponding E=800 into formulas to eyeball numbers.
32-bit:
Peak/short-term usage (not counting any vector rounding) goes from:
8*N -> 4*N + 8*E
e.g. 128,000 -> 70,400
Long-term, while I/O going on, usage goes from:
8*N -> 8*E
e.g. 128,000 -> 6,400
64-bit:
Peak/short-term usage goes from:
8*N -> 8*N + 8*E
e.g. 128,000 -> 134,400
Long-term, while I/O going on, usage goes from:
8*N -> 8*E
e.g. 128,000 -> 6,400
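The arithmetic above can be checked mechanically; kN and kE below are just the example values from the text, not anything from the code under review.

```cpp
#include <cassert>
#include <cstddef>

// Example values from the description: N = 16000 cached entries,
// E = N/20 = 800 evicted entries.
constexpr std::size_t kN = 16000;
constexpr std::size_t kE = kN / 20;

// Old peak: a vector of 8-byte hashes for every entry, on both word sizes.
static_assert(8 * kN == 128000, "old peak");
// 32-bit: 4-byte pointers for every entry plus 8-byte hashes for evictees.
static_assert(4 * kN + 8 * kE == 70400, "new 32-bit peak");
// 64-bit: pointers are as wide as hashes, so the peak grows slightly.
static_assert(8 * kN + 8 * kE == 134400, "new 64-bit peak");
// Long-term, only the evicted hashes are kept while I/O is in flight.
static_assert(8 * kE == 6400, "new long-term usage, both word sizes");
```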
BUG=706878
Review-Url: https://codereview.chromium.org/2789683002
Cr-Commit-Position: refs/heads/master@{#461815}
Committed: https://chromium.googlesource.com/chromium/src/+/c348edb0c510441efe4ae5b8b1cfccc47f3c15ce
Patch Set 1 #
Total comments: 17
Patch Set 2 : Apply review feedback. #
Total comments: 6
Patch Set 3 : git cl try #
Messages
Total messages: 36 (21 generated)
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Windows numbers, on a slowish test laptop: ~25ms -> ~12ms. The GetLastUsedTime tweak gets it down further to the 5.8ms range. Sounds like I should do that, then, unless there is a good reason not to?

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
Can get it further down to the 2.9-3ms range by adding an inline accessor for last_used_time_seconds_since_epoch_ to EntryMetadata and comparing that, avoiding the external call and the conversion to base::Time, etc. Should I?
morlovich@chromium.org changed reviewers: + pasko@chromium.org
Please see some of the above comments re: potential tweak as well.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
overall simplecache part looks good, asking for a more elaborate benchmark

thank you for taking this on! Some history: the reason why we did not do it straight this way is, AFAIR, because sizeof(EntryMetadata) was bigger than the hash and we wanted to consume less memory at peak.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
net/disk_cache/disk_cache_perftest.cc:337: LOG(ERROR) << "Took:" << timer.Elapsed().InMillisecondsF() << "ms";
How stable are the numbers produced? I think it is easy to create a significantly better benchmark that would avoid the disadvantages of:
1. timing insertions together with evictions
2. being susceptible to noise from brief background activities on the machine running the benchmark

The usual recommendation in such cases is to repeat the workload in a tight loop for 10-20 seconds and then divide by the number of operations (the number of iterations is a constant in the code, chosen for some random powerful machine). So I'm imagining an iteration to be like:
1. insert a bunch
2. start timer
3. make a series of evicts by gradually shrinking with SetMaxSize+UpdateEntrySize
4. stop timer

We may not need numerous iterations if the number of entries is sufficiently large, but I guess the number would be more realistic if we get a 'typical' order of magnitude of entries, and repeat evictions on those if one such shrink is too short. WDYT?

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:310: // Flatten for sorting
nit: period at the end of the sentence

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:313: for (EntrySet::const_iterator i = entries_set_.begin();
how about: for (auto& entry : entries_set_) { ... ?

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
On 2017/03/30 18:51:29, morlovich1 wrote:
> Can get it further down to 2.9-3ms range by adding an inline accessor for
> last_used_time_seconds_since_epoch_ to EntryMetadata and comparing that,
> avoiding needing the external call and conversion to time::Base, etc.
>
> Should I?
yeah, comparing the seconds directly sounds worth this small change, even if only to avoid thinking of GetLastUsedTime() being more expensive on Windows (though I am not sure whether the perf difference is actually noticeable). Whether to inline it or not - maybe let the compiler decide for now? I think at least GCC would do a good job of inlining simple 32-bit getters, even when we optimize for size. Anyway, it is not such a big deal, so just avoid the clutter of a manual inline.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:327: entry_hashes.reserve(entries_set_.size() / 16);
why 16? The low_watermark_ is based on kEvictionMarginDivisor; using it would be good under the assumption that all entries are equally sized, and probably OK otherwise as well.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.h (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.h:178: FRIEND_TEST_ALL_PREFIXES(SimpleIndexPerfTest, EvictionPerformance);
It would be more readable to introduce a public SimpleIndex::InsertEntryForTesting() and avoid this friendship. Do we need it for something else?
https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
net/disk_cache/disk_cache_perftest.cc:337: LOG(ERROR) << "Took:" << timer.Elapsed().InMillisecondsF() << "ms";
On 2017/03/31 17:41:56, pasko wrote:
> How stable are the numbers produced?
Less stable now that you've asked about it :) (usually it was in about the 0.1-0.2ms range, but sometimes spiked)... and at any rate your suggestion produced numbers that vary by single-digit microseconds or so.
> 1. timing insertions together with evictions
That turned out to be a bigger deal than I thought; it looks like insertions actually dominated post-this-CL. Bottom-line numbers after the revised benchmark (and with a smaller # of entries than before): 4.82ms -> 0.33ms on my Linux workstation. (Will get the test Windows laptop numbers as a follow-up, need to upload the change first.)
> The usual recommendation in such cases is to repeat the workload in a tight loop
> to run for 10-20 seconds and then divide by the amount of operations
Anyway, did something similar. Any reason for constant iterations rather than constant time?
> 3. make series of evicts by gradually shrinking with SetMaxSize+UpdateEntrySize
Hmm, didn't think of shrinking the size; that's a nice way of avoiding mixed timing. Why multiple evicts, though?
> we may not need numerous iterations if the number of entries is sufficiently
> large
Yeah, don't want to increase the # of entries too much since the algorithm is super-linear. Actually, with these changes I think I'll go to 10K since it's more realistic.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:310: // Flatten for sorting
On 2017/03/31 17:41:56, pasko wrote:
> nit: period at the end of the sentence
Done.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:313: for (EntrySet::const_iterator i = entries_set_.begin();
On 2017/03/31 17:41:56, pasko wrote:
> how about: for (auto& entry : entries_set_) { ... ?
Hmm, I think it'll work (with a const... might not without it), but it's really hard to be confident that it'd actually take pointers into the structure with one more layer of abstraction here hiding the iterators. That's the trick here, really --- EntryMetadata is biggish, but a pointer into the table is <= 8 bytes. (The iterator isn't. I tried.)

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
> yeah, comparing the seconds directly sounds worth this small change, even if
> only to avoid thinking of GetLastUsedTime() being more expensive on Windows
> (though I am not sure whether the perf difference is actually noticeable).
> Whether to inline it or not - maybe let the compiler decide for now?
Done. By inline I just meant int Accessor() const { return field_; } above.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:327: entry_hashes.reserve(entries_set_.size() / 16);
On 2017/03/31 17:41:56, pasko wrote:
> why 16? The low_watermark_ is based on kEvictionMarginDivisor, using it would be
> good in assumptions that all entries are equally sized and probably OK otherwise
> as well.
Mostly just going with the nearest power of 2. Probably a pointless overoptimization --- I don't suppose a proper divide would be an issue even on a lowest-end phone? (Though GCC/clang can probably turn it into a multiply, too.)

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.h (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.h:178: FRIEND_TEST_ALL_PREFIXES(SimpleIndexPerfTest, EvictionPerformance);
On 2017/03/31 17:41:56, pasko wrote:
> It would be more readable to introduce a public
> SimpleIndex::InsertEntryForTesting() and avoid this friendship. Do we need it
> for something else?
Great idea, done. Also lets me avoid being in the disk_cache namespace for the test.
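The "repeat the workload in a tight loop, divide by iterations" recipe discussed above can be sketched as follows. This uses std::chrono in place of Chromium's timer helpers, and MicrosecondsPerIteration is a made-up name, not anything in the CL.

```cpp
#include <chrono>
#include <cstddef>

// Runs `workload` back to back `iterations` times and reports the mean
// cost per iteration in microseconds. The iteration count is a constant
// tuned once for some reference machine, as the review suggests, so the
// clock is read only twice rather than once per iteration.
template <typename Workload>
double MicrosecondsPerIteration(Workload workload, std::size_t iterations) {
  auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i)
    workload();
  auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration<double, std::micro>(elapsed).count() /
         static_cast<double>(iterations);
}
```

Reading the clock only around the whole loop is what makes this cheap even on platforms where timer calls are slow, which is the concern raised below about Android/ARM.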
Description was changed from ========== Speed up SimpleCache eviction set computation The code previously used to collect hash keys into vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are same size), to avoid the hash lookups. CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark seems to go from about 12.3ms to about 5.3ms (on Linux/clang, will also add Windows data later) Memory usage impact: short-term impact is increased by needing a separate output buffer for actual hashes, which I would expect to be < 5KiB or so at 95% percentile, plus vector rounding up. Longer-term impact should actually be lower, I think, since the buffer that's kept around during disk activity will be sized to max(index entries/16, actually evicted entries)*8 or so, rather than index entries * 8 bytes. BUG=706878 ========== to ========== Speed up SimpleCache eviction set computation The code previously used to collect hash keys into vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are same size), to avoid the hash lookups. CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark: 4.82ms -> 0.33ms on Linux/clang workstation 9.26ms -> 0.64ms on Windows/MSVC laptop (With 10K entries) Memory usage impact: short-term impact is increased by needing a separate output buffer for actual hashes, which I would expect to be < 5KiB or so at 95% percentile, plus vector rounding up. Longer-term impact should actually be lower, I think, since the buffer that's kept around during disk activity will be sized to max(index entries/16, actually evicted entries)*8 or so, rather than index entries * 8 bytes. BUG=706878 ==========
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
looks really nice now! my remaining notes are about nits, stylistic sugar, and random thoughts. All looks addressable in one review roundtrip.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
net/disk_cache/disk_cache_perftest.cc:337: LOG(ERROR) << "Took:" << timer.Elapsed().InMillisecondsF() << "ms";
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Anyway, did something similar. Any reason for constant iterations rather than
> constant time?
Reasons I can see:
1. with const iterations there is less code
2. checking for time between iterations may be slow on some systems (Android/ARM is one of them)
3. it may affect CPU caches
The disadvantage of const iterations is that hypothetically at some point you get a fast machine, and in the world of standardized benchmarks this presents a problem, but we can "re-standardize" this benchmark in almost no time.
> Hmm, didn't think of shrinking the size, that's a nice way of avoiding mixed
> timing. Why multiple evicts, though?
I thought it's nice because it reduces the amount of preparation work to be done before the timed part of the benchmark, hence the overall testing will take less time. Also covers eviction of different sizes in some way (but this latter may also confuse the interpretation of the number - less sure).

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:313: for (EntrySet::const_iterator i = entries_set_.begin();
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Hmm, I think it'll work (with a const... might not w/o it), but it's really hard
> to be confident that it'd actually take pointers into the structure with one
> more layer of abstraction here hiding the iterators.
After looking at it a 2nd time I think the explicit iterator is the right call, since it might look confusing to have the 'auto' map to std::pair. Maybe not so confusing in 2020, but in 2017 for pasko@ it would :)
> That's the trick here,
> really --- EntryMetadata is biggish, but a pointer into the table is <= 8 bytes.
> (The iterator isn't. I tried)
sizeof(EntryMetadata) == 8, see:
https://codesearch.chromium.org/chromium/src/net/disk_cache/simple/simple_ind...
But then you are right about one target: sizeof(EntryMetadata*) == 4 on Android/ARM (i.e. what we currently ship), so it's a win. By the way, please update the description to mention that it saves one aspect of memory consumption on android/arm32.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Done. By inline I just meant int Accessor() const { return field_; } above.
Acknowledged.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:327: entry_hashes.reserve(entries_set_.size() / 16);
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Mostly just going with a nearest power of 2. Probably a pointless
> overoptimization --- I don't suppose a proper divide would be an issue even on a
> lowest-end phone? (Though GCC/clang can probably turn it into multiply, too)
Hm, I did not think of the division overhead here, thanks for the insight :) It would be hard to spot its cost on the benchmark we have, which is already a microbenchmark, so in real-world scenarios the effect is even smaller. I think the absence of .reserve() will also not be noticeable. Let's remove it for simplicity? Looks like premature optimization.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:333: i, disk_cache::EntryMetadata(start + base::TimeDelta::FromSeconds(i),
nit: maybe do a sha1(i) to make the keys look more realistic? the hashmap has its own hashing, which may play differently performance-wise with sequential numbers than with the seemingly random hashes we have. Does not seem worth the complication though; fyi and up to you.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:335: index.UpdateEntrySize(i, 1u);
I think it would be less error-prone to make InsertEntryForTesting update index.cache_size_ from the size of the entry.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
net/disk_cache/simple/simple_index.cc:326: // collecting relevant hashed into entry_hashes.
s/relevant/least recently used/ <- reduces guessing for the reader
s/entry_hashes/|entry_hashes|/
Description was changed from

==========
Speed up SimpleCache eviction set computation

The code previously used to collect hash keys into a vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are the same size), to avoid the hash lookups.

CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark:
4.82ms -> 0.33ms on Linux/clang workstation
9.26ms -> 0.64ms on Windows/MSVC laptop
(With 10K entries)

Memory usage impact: short-term impact is increased by needing a separate output buffer for actual hashes, which I would expect to be < 5KiB or so at the 95th percentile, plus vector rounding up. Longer-term impact should actually be lower, I think, since the buffer that's kept around during disk activity will be sized to max(index entries/16, actually evicted entries)*8 or so, rather than index entries * 8 bytes.

BUG=706878
==========

to

==========
Speed up SimpleCache eviction set computation

The code previously used to collect hash keys into a vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are the same size), to avoid the hash lookups.

CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark:
4.82ms -> 0.33ms on Linux/clang workstation
9.26ms -> 0.64ms on Windows/MSVC laptop
(With 10K entries)

Memory usage impact:
Let N = number of entries in cache, E = number of entries evicted, which is usually around N/20. My workstation currently has 16325 entries in cache, which is within the "not-surprising" range, so I'll plug N=16000 plus the corresponding E=800 into the formulas to eyeball numbers.

32-bit:
Peak/short-term usage (not counting any vector rounding) goes from:
8*N -> 4*N + 8*E, e.g. 128,000 -> 70,400
Long-term, while I/O is going on, usage goes from:
8*N -> 8*E, e.g. 128,000 -> 6,400

64-bit:
Peak/short-term usage goes from:
8*N -> 8*N + 8*E, e.g. 128,000 -> 134,400
Long-term, while I/O is going on, usage goes from:
8*N -> 8*E, e.g. 128,000 -> 6,400

BUG=706878
==========
https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:333: i, disk_cache::EntryMetadata(start + base::TimeDelta::FromSeconds(i),
On 2017/04/03 09:39:08, pasko wrote:
> nit: maybe do a sha1(i) to make the keys look more realistic? the hashmap has
> its own hashing, which may play differently performance-wise with sequential
> numbers than with seemingly random hashes we have.
> Does not seem like worth complication though, fyi and up to you.
Yeah, I think I'll pass, since I trust unordered_map more than I trust myself not to forget to indirect via keys[] somewhere....

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:335: index.UpdateEntrySize(i, 1u);
On 2017/04/03 09:39:08, pasko wrote:
> I think it would be less error prone to make InsertEntryForTesting to update
> index.cache_size_ from the size of the entry.
Done. Also added a DCHECK to require the entry to not already be there, since the behavior is weird otherwise, and making it right seems like more code than this is worth.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
net/disk_cache/simple/simple_index.cc:326: // collecting relevant hashed into entry_hashes.
On 2017/04/03 09:39:08, pasko wrote:
> s/relevant/least recently used/ <- reduces guessing for the reader
> s/entry_hashes/|entry_hashes|/
Done.
> Reasons I can see:
> 1. with const iterations there is less code
I think in this case it just changes while (iterations < 61000) {
to while (evict_elapsed_ms < 20000) { or such.
> 2. checking for time between iterations may be slow on some systems
(Android/ARM
> is one of them)
I guess that goes in general for anything that does stop/start timing? Kinda
hard to avoid that when the action is destructive wrt the setup.
> The disadvantage of const iterations is that hypothetically at some point you
> get a fast machine and in the world of standardized benchmarks this presents a
> problem, but we can "re-standardize" this benchmark at almost no time.
The other disadvantage is that when I measure on a relatively slow machine it
takes
way too long...
> I thought it's nice because it reduces the amount of preparation work to be
done
> before the timed part of the benchmark, hence the overall testing will take
less
> time. Also covers eviction of different sizes in some way (but this latter may
> also confuse the interpretation of the number - less sure).
The different sizes is why I avoided it, though I could argue either way.
> > That's the trick here,
> > really --- EntryMetadata is biggish, but a pointer into the table is <= 8
> bytes.
> > (The iterator isn't. I tried)
>
> sizeof(EntryMetadata) == 8, see:
>
https://codesearch.chromium.org/chromium/src/net/disk_cache/simple/simple_ind...
Oh, right, but you still need the key, which is an extra 8 bytes.
> But then you are right about one target: sizeof(EntryMetadata*) == 4 on
> Android/ARM (i.e. what we currently ship), so it's a win. By the way, please
> update the description to mention that it saves one aspect of memory
consumption
> on android/arm32.
Will do/already did. This is a nice thought, given that this is probably the
most
RAM sensitive platform.
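The sizes being discussed can be made concrete. EntryMetadataLike below is a hypothetical stand-in mirroring the real struct's 32-bit timestamp plus 32-bit size layout; the point is that a table slot (key + metadata) is 16 bytes, while a pointer to it is one machine word, i.e. 4 bytes on android/arm32 and 8 bytes on 64-bit targets.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-in with the same layout as the real 8-byte EntryMetadata.
struct EntryMetadataLike {
  int32_t last_used_seconds;
  uint32_t entry_size;
};
using EntrySetLike = std::unordered_map<uint64_t, EntryMetadataLike>;

static_assert(sizeof(EntryMetadataLike) == 8, "metadata itself is 8 bytes");
static_assert(sizeof(uint64_t) == 8, "a hash key is always 8 bytes");
// One table slot carries key + metadata (before any bucket overhead).
static_assert(sizeof(EntrySetLike::value_type) == 16, "key + metadata");
// The sorted vector now holds these instead of 8-byte hash copies:
static_assert(sizeof(const EntrySetLike::value_type*) == sizeof(void*),
              "one word: 4 bytes on arm32, 8 bytes on 64-bit");
```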
> I think the absence of .reserve() will also be not noticeable. Let's remove it
> for simplicity? Looks like premature optimization.
OK, tried it, and can't measure a difference, so removed.
lgtm
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by morlovich@chromium.org
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
Looks like I need an owner for disk_cache_perftest.cc
morlovich@chromium.org changed reviewers: + jkarlin@chromium.org
Josh, could you take a look as well? Looks like I need non-SimpleCache approval due to changing disk_cache_perftest.cc
disk_cache_perftest.cc lgtm
The CQ bit was checked by morlovich@chromium.org
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing the patch.
Bot data: {"patchset_id": 40001, "attempt_start_ts": 1491331650917480,
"parent_rev": "f55816a64a901b81f70c128361fc5df16ceb40f6", "commit_rev":
"c348edb0c510441efe4ae5b8b1cfccc47f3c15ce"}
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/chromium/src/+/c348edb0c510441efe4ae5b8b1cf...
