Chromium Code Reviews

Created: 3 years, 8 months ago by Maks Orlovich
Modified: 3 years, 8 months ago
CC: chromium-reviews, cbentzel+watch_chromium.org, gavinp+disk_chromium.org, net-reviews_chromium.org
Target Ref: refs/heads/master
Project: chromium
Visibility: Public
Description

Speed up SimpleCache eviction set computation
The code previously used to collect hash keys into a vector and sort
that by LRU order, which required indexing into the hash table on
every comparison.
Instead, collect pointers into the hashtable (which are the same size),
to avoid the hash lookups.
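A minimal sketch of the idea, with a hypothetical EntryMetadata stand-in (the real one lives in net/disk_cache/simple/simple_index.h) and a made-up OldestFirst helper: pointers into an unordered_map's elements are stable, so we can sort those instead of copying keys and re-hashing on every comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in for the real 8-byte EntryMetadata (32-bit time + 32-bit size).
struct EntryMetadata {
  int32_t last_used_seconds;  // seconds since some epoch
  uint32_t entry_size;
};
using EntrySet = std::unordered_map<uint64_t, EntryMetadata>;

// Old approach: copy all 8-byte hash keys into a vector, then look each
// key up in the table on every comparison made by the sort.
// New approach: collect pointers to the table's value_type (stable in
// unordered_map), sort those by last-used time, then read the keys out.
std::vector<uint64_t> OldestFirst(const EntrySet& entries, size_t count) {
  std::vector<const EntrySet::value_type*> by_age;
  by_age.reserve(entries.size());
  for (EntrySet::const_iterator i = entries.begin(); i != entries.end(); ++i)
    by_age.push_back(&*i);
  std::sort(by_age.begin(), by_age.end(),
            [](const EntrySet::value_type* a, const EntrySet::value_type* b) {
              return a->second.last_used_seconds < b->second.last_used_seconds;
            });
  std::vector<uint64_t> hashes;  // only the evicted entries' hashes survive
  for (size_t i = 0; i < count && i < by_age.size(); ++i)
    hashes.push_back(by_age[i]->first);
  return hashes;
}
```

The comparator never touches the hash buckets, which is where the CPU win in the numbers below comes from.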
CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark:
4.82ms -> 0.33ms on Linux/clang workstation
9.26ms -> 0.64ms on Windows/MSVC laptop
(With 10K entries)
Memory usage impact:
Let N = number of entries in cache. E = number of entries evicted,
which is usually around N/20.
My workstation currently has 16325 entries in cache, which is within the "not-surprising" range, so I'll plug N=16000 plus corresponding E=800 into formulas to eyeball numbers.
32-bit:
Peak/short-term usage (not counting any vector rounding) goes from:
8*N -> 4*N + 8*E
e.g. 128,000 -> 70,400
Long-term, while I/O going on, usage goes from:
8*N -> 8*E
e.g. 128,000 -> 6,400
64-bit:
Peak/short-term usage goes from:
8*N -> 8*N + 8*E
e.g. 128,000 -> 134,400
Long-term, while I/O going on, usage goes from:
8*N -> 8*E
e.g. 128,000 -> 6,400
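The arithmetic above can be checked mechanically; kN and kE below are just the example values from the text, not anything from the code under review.

```cpp
#include <cassert>
#include <cstddef>

// Example values from the description: N = 16000 cached entries,
// E = N/20 = 800 evicted entries.
constexpr std::size_t kN = 16000;
constexpr std::size_t kE = kN / 20;

// Old peak: a vector of 8-byte hashes for every entry, on both word sizes.
static_assert(8 * kN == 128000, "old peak");
// 32-bit: 4-byte pointers for every entry plus 8-byte hashes for evictees.
static_assert(4 * kN + 8 * kE == 70400, "new 32-bit peak");
// 64-bit: pointers are as wide as hashes, so the peak grows slightly.
static_assert(8 * kN + 8 * kE == 134400, "new 64-bit peak");
// Long-term, only the evicted hashes are kept while I/O is in flight.
static_assert(8 * kE == 6400, "new long-term usage, both word sizes");
```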
BUG=706878
Review-Url: https://codereview.chromium.org/2789683002
Cr-Commit-Position: refs/heads/master@{#461815}
Committed: https://chromium.googlesource.com/chromium/src/+/c348edb0c510441efe4ae5b8b1cfccc47f3c15ce
Patch Set 1 #
Total comments: 17
Patch Set 2 : Apply review feedback. #
Total comments: 6
Patch Set 3 : git cl try #
Messages
Total messages: 36 (21 generated)
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Windows numbers, on a slowish test laptop: ~25ms -> ~12ms. The GetLastUsedTime tweak gets it down further to the 5.8ms range. Sounds like I should do that, then, unless there is a good reason not to?

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
Can get it further down to the 2.9-3ms range by adding an inline accessor for last_used_time_seconds_since_epoch_ to EntryMetadata and comparing that, avoiding the external call and the conversion to base::Time, etc. Should I?
morlovich@chromium.org changed reviewers: + pasko@chromium.org
Please see some of the above comments re: potential tweak as well.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
overall simplecache part looks good, asking for a more elaborate benchmark

thank you for taking this on! Some history: the reason why we did not do it straight this way is, AFAIR, because sizeof(EntryMetadata) was bigger than the hash and we wanted to consume less memory at peak.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
net/disk_cache/disk_cache_perftest.cc:337: LOG(ERROR) << "Took:" << timer.Elapsed().InMillisecondsF() << "ms";
How stable are the numbers produced? I think it is easy to create a significantly better benchmark that would avoid the disadvantages of:
1. timing insertions together with evictions
2. being susceptible to noise from brief background activities on the machine running the benchmark

The usual recommendation in such cases is to repeat the workload in a tight loop for 10-20 seconds and then divide by the number of operations (the number of iterations is a constant in the code, chosen for some random powerful machine). So I'm imagining an iteration to be like:
1. insert a bunch
2. start timer
3. make a series of evicts by gradually shrinking with SetMaxSize+UpdateEntrySize
4. stop timer

We may not need numerous iterations if the number of entries is sufficiently large, but I guess the number would be more realistic if we get a 'typical' order of magnitude of entries, and repeat evictions on those if one such shrink is too short. WDYT?

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:310: // Flatten for sorting
nit: period at the end of the sentence

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:313: for (EntrySet::const_iterator i = entries_set_.begin();
how about: for (auto& entry : entries_set_) { ... ?

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
On 2017/03/30 18:51:29, morlovich1 wrote:
> Can get it further down to 2.9-3ms range by adding an inline accessor for
> last_used_time_seconds_since_epoch_ to EntryMetadata and comparing that,
> avoiding needing the external call and conversion to time::Base, etc.
>
> Should I?
yeah, comparing the seconds directly sounds worth this small change, even if only to avoid thinking of GetLastUsedTime() being more expensive on Windows (though I am not sure whether the perf difference is actually noticeable). Whether to inline it or not - maybe let the compiler decide for now? I think at least GCC would do a good job of inlining simple 32-bit getters, even when we optimize for size. Anyway, it is not such a big deal, so just avoid the clutter of a manual inline.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:327: entry_hashes.reserve(entries_set_.size() / 16);
why 16? The low_watermark_ is based on kEvictionMarginDivisor; using it would be good under the assumption that all entries are equally sized, and probably OK otherwise as well.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.h (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.h:178: FRIEND_TEST_ALL_PREFIXES(SimpleIndexPerfTest, EvictionPerformance);
It would be more readable to introduce a public SimpleIndex::InsertEntryForTesting() and avoid this friendship. Do we need it for something else?
https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
net/disk_cache/disk_cache_perftest.cc:337: LOG(ERROR) << "Took:" << timer.Elapsed().InMillisecondsF() << "ms";
On 2017/03/31 17:41:56, pasko wrote:
> How stable are the numbers produced?
Less stable now that you've asked about it :) (usually it was in about the 0.1-0.2ms range, but sometimes spiked)... and at any rate your suggestion produced numbers that vary by single-digit microseconds or so.
> 1. timing insertions together with evictions
That turned out to be a bigger deal than I thought; it looks like insertions actually dominated post-this-CL. Bottom-line numbers after the revised benchmark (and with a smaller # of entries than before): 4.82ms -> 0.33ms on my Linux workstation. (Will get the test Windows laptop numbers as a follow-up, need to upload the change first.)
> The usual recommendation in such cases is to repeat the workload in a tight loop
> to run for 10-20 seconds and then divide by the amount of operations
Anyway, did something similar. Any reason for constant iterations rather than constant time?
> 3. make series of evicts by gradually shrinking with SetMaxSize+UpdateEntrySize
Hmm, didn't think of shrinking the size; that's a nice way of avoiding mixed timing. Why multiple evicts, though?
> we may not need numerous iterations if the number of entries is sufficiently
> large
Yeah, don't want to increase the # of entries too much since the algorithm is super-linear. Actually, with these changes I think I'll go to 10K since it's more realistic.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:310: // Flatten for sorting
On 2017/03/31 17:41:56, pasko wrote:
> nit: period at the end of the sentence
Done.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:313: for (EntrySet::const_iterator i = entries_set_.begin();
On 2017/03/31 17:41:56, pasko wrote:
> how about: for (auto& entry : entries_set_) { ... ?
Hmm, I think it'll work (with a const... might not without it), but it's really hard to be confident that it'd actually take pointers into the structure with one more layer of abstraction here hiding the iterators. That's the trick here, really --- EntryMetadata is biggish, but a pointer into the table is <= 8 bytes. (The iterator isn't. I tried.)

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
> yeah, comparing the seconds directly sounds worth this small change, even if
> only to avoid thinking of GetLastUsedTime() being more expensive on Windows
> (though I am not sure whether the perf difference is actually noticeable).
> Whether to inline it or not - maybe let the compiler decide for now?
Done. By inline I just meant int Accessor() const { return field_; } above.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:327: entry_hashes.reserve(entries_set_.size() / 16);
On 2017/03/31 17:41:56, pasko wrote:
> why 16? The low_watermark_ is based on kEvictionMarginDivisor, using it would be
> good in assumptions that all entries are equally sized and probably OK otherwise
> as well.
Mostly just going with the nearest power of 2. Probably a pointless overoptimization --- I don't suppose a proper divide would be an issue even on a lowest-end phone? (Though GCC/clang can probably turn it into a multiply, too.)

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.h (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.h:178: FRIEND_TEST_ALL_PREFIXES(SimpleIndexPerfTest, EvictionPerformance);
On 2017/03/31 17:41:56, pasko wrote:
> It would be more readable to introduce a public
> SimpleIndex::InsertEntryForTesting() and avoid this friendship. Do we need it
> for something else?
Great idea, done. Also lets me avoid being in the disk_cache namespace for the test.
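The "repeat the workload in a tight loop, divide by iterations" recipe discussed above can be sketched as follows. This uses std::chrono in place of Chromium's timer helpers, and MicrosecondsPerIteration is a made-up name, not anything in the CL.

```cpp
#include <chrono>
#include <cstddef>

// Runs `workload` back to back `iterations` times and reports the mean
// cost per iteration in microseconds. The iteration count is a constant
// tuned once for some reference machine, as the review suggests, so the
// clock is read only twice rather than once per iteration.
template <typename Workload>
double MicrosecondsPerIteration(Workload workload, std::size_t iterations) {
  auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i)
    workload();
  auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration<double, std::micro>(elapsed).count() /
         static_cast<double>(iterations);
}
```

Reading the clock only around the whole loop is what makes this cheap even on platforms where timer calls are slow, which is the concern raised below about Android/ARM.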
Description was changed from ========== Speed up SimpleCache eviction set computation The code previously used to collect hash keys into vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are same size), to avoid the hash lookups. CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark seems to go from about 12.3ms to about 5.3ms (on Linux/clang, will also add Windows data later) Memory usage impact: short-term impact is increased by needing a separate output buffer for actual hashes, which I would expect to be < 5KiB or so at 95% percentile, plus vector rounding up. Longer-term impact should actually be lower, I think, since the buffer that's kept around during disk activity will be sized to max(index entries/16, actually evicted entries)*8 or so, rather than index entries * 8 bytes. BUG=706878 ========== to ========== Speed up SimpleCache eviction set computation The code previously used to collect hash keys into vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are same size), to avoid the hash lookups. CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark: 4.82ms -> 0.33ms on Linux/clang workstation 9.26ms -> 0.64ms on Windows/MSVC laptop (With 10K entries) Memory usage impact: short-term impact is increased by needing a separate output buffer for actual hashes, which I would expect to be < 5KiB or so at 95% percentile, plus vector rounding up. Longer-term impact should actually be lower, I think, since the buffer that's kept around during disk activity will be sized to max(index entries/16, actually evicted entries)*8 or so, rather than index entries * 8 bytes. BUG=706878 ==========
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
looks really nice now! my remaining notes are about nits, stylistic sugar, and random thoughts. All looks addressable in one review roundtrip.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/disk_cache_p...
net/disk_cache/disk_cache_perftest.cc:337: LOG(ERROR) << "Took:" << timer.Elapsed().InMillisecondsF() << "ms";
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Anyway, did something similar. Any reason for constant iterations rather than
> constant time?
Reasons I can see:
1. with const iterations there is less code
2. checking for time between iterations may be slow on some systems (Android/ARM is one of them)
3. it may affect CPU caches
The disadvantage of const iterations is that hypothetically at some point you get a fast machine, and in the world of standardized benchmarks this presents a problem, but we can "re-standardize" this benchmark in almost no time.
> Hmm, didn't think of shrinking the size, that's a nice way of avoiding mixed
> timing. Why multiple evicts, though?
I thought it's nice because it reduces the amount of preparation work to be done before the timed part of the benchmark, hence the overall testing will take less time. Also covers eviction of different sizes in some way (but this latter may also confuse the interpretation of the number - less sure).

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:313: for (EntrySet::const_iterator i = entries_set_.begin();
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Hmm, I think it'll work (with a const... might not w/o it), but it's really hard
> to be confident that it'd actually take pointers into the structure with one
> more layer of abstraction here hiding the iterators.
After looking at it a 2nd time I think the explicit iterator is the right call, since it might look confusing to have the 'auto' map to std::pair. Maybe not so confusing in 2020, but in 2017 for pasko@ it would :)
> That's the trick here,
> really --- EntryMetadata is biggish, but a pointer into the table is <= 8 bytes.
> (The iterator isn't. I tried)
sizeof(EntryMetadata) == 8, see:
https://codesearch.chromium.org/chromium/src/net/disk_cache/simple/simple_ind...
But then you are right about one target: sizeof(EntryMetadata*) == 4 on Android/ARM (i.e. what we currently ship), so it's a win. By the way, please update the description to mention that it saves one aspect of memory consumption on android/arm32.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:321: return a->second.GetLastUsedTime() < b->second.GetLastUsedTime();
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Done. By inline I just meant int Accessor() const { return field_; } above.
Acknowledged.

https://codereview.chromium.org/2789683002/diff/1/net/disk_cache/simple/simpl...
net/disk_cache/simple/simple_index.cc:327: entry_hashes.reserve(entries_set_.size() / 16);
On 2017/03/31 19:01:29, Maks Orlovich wrote:
> Mostly just going with a nearest power of 2. Probably a pointless
> overoptimization --- I don't suppose a proper divide would be an issue even on a
> lowest-end phone? (Though GCC/clang can probably turn it into multiply, too)
Hm, I did not think of the division overhead here, thanks for the insight :) It would be hard to spot its cost on the benchmark we have, which is already a microbenchmark, so in real-world scenarios the effect is even smaller. I think the absence of .reserve() will also not be noticeable. Let's remove it for simplicity? Looks like premature optimization.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:333: i, disk_cache::EntryMetadata(start + base::TimeDelta::FromSeconds(i),
nit: maybe do a sha1(i) to make the keys look more realistic? the hashmap has its own hashing, which may play differently performance-wise with sequential numbers than with the seemingly random hashes we have. Does not seem worth the complication though; fyi and up to you.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:335: index.UpdateEntrySize(i, 1u);
I think it would be less error-prone to make InsertEntryForTesting update index.cache_size_ from the size of the entry.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
net/disk_cache/simple/simple_index.cc:326: // collecting relevant hashed into entry_hashes.
s/relevant/least recently used/ <- reduces guessing for the reader
s/entry_hashes/|entry_hashes|/
Description was changed from

==========
Speed up SimpleCache eviction set computation

The code previously used to collect hash keys into a vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are the same size), to avoid the hash lookups.

CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark:
4.82ms -> 0.33ms on Linux/clang workstation
9.26ms -> 0.64ms on Windows/MSVC laptop
(With 10K entries)

Memory usage impact: short-term impact is increased by needing a separate output buffer for actual hashes, which I would expect to be < 5KiB or so at the 95th percentile, plus vector rounding up. Longer-term impact should actually be lower, I think, since the buffer that's kept around during disk activity will be sized to max(index entries/16, actually evicted entries)*8 or so, rather than index entries * 8 bytes.

BUG=706878
==========

to

==========
Speed up SimpleCache eviction set computation

The code previously used to collect hash keys into a vector and sort that by LRU order, which required indexing into the hash table on every comparison. Instead, collect pointers into the hashtable (which are the same size), to avoid the hash lookups.

CPU impact: new SimpleIndexPerfTest, EvictionPerformance ubenchmark:
4.82ms -> 0.33ms on Linux/clang workstation
9.26ms -> 0.64ms on Windows/MSVC laptop
(With 10K entries)

Memory usage impact:
Let N = number of entries in cache, E = number of entries evicted, which is usually around N/20. My workstation currently has 16325 entries in cache, which is within the "not-surprising" range, so I'll plug N=16000 plus the corresponding E=800 into the formulas to eyeball numbers.

32-bit:
Peak/short-term usage (not counting any vector rounding) goes from:
8*N -> 4*N + 8*E, e.g. 128,000 -> 70,400
Long-term, while I/O is going on, usage goes from:
8*N -> 8*E, e.g. 128,000 -> 6,400

64-bit:
Peak/short-term usage goes from:
8*N -> 8*N + 8*E, e.g. 128,000 -> 134,400
Long-term, while I/O is going on, usage goes from:
8*N -> 8*E, e.g. 128,000 -> 6,400

BUG=706878
==========
https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
File net/disk_cache/disk_cache_perftest.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:333: i, disk_cache::EntryMetadata(start + base::TimeDelta::FromSeconds(i),
On 2017/04/03 09:39:08, pasko wrote:
> nit: maybe do a sha1(i) to make the keys look more realistic? the hashmap has
> its own hashing, which may play differently performance-wise with sequential
> numbers than with seemingly random hashes we have.
> Does not seem like worth complication though, fyi and up to you.
Yeah, I think I'll pass, since I trust unordered_map more than I trust myself not to forget to indirect via keys[] somewhere....

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/disk_cac...
net/disk_cache/disk_cache_perftest.cc:335: index.UpdateEntrySize(i, 1u);
On 2017/04/03 09:39:08, pasko wrote:
> I think it would be less error prone to make InsertEntryForTesting to update
> index.cache_size_ from the size of the entry.
Done. Also added a DCHECK to require the entry to not already be there, since the behavior is weird otherwise, and making it right seems like more code than this is worth.

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
File net/disk_cache/simple/simple_index.cc (right):

https://codereview.chromium.org/2789683002/diff/20001/net/disk_cache/simple/s...
net/disk_cache/simple/simple_index.cc:326: // collecting relevant hashed into entry_hashes.
On 2017/04/03 09:39:08, pasko wrote:
> s/relevant/least recently used/ <- reduces guessing for the reader
> s/entry_hashes/|entry_hashes|/
Done.
> Reasons I can see:
> 1. with const iterations there is less code
I think in this case it just changes while (iterations < 61000) {
to while (evict_elapsed_ms < 20000) { or such.
> 2. checking for time between iterations may be slow on some systems
(Android/ARM
> is one of them)
I guess that goes in general for anything that does stop/start timing? Kinda
hard to avoid that when the action is destructive wrt the setup.
> The disadvantage of const iterations is that hypothetically at some point you
> get a fast machine and in the world of standardized benchmarks this presents a
> problem, but we can "re-standardize" this benchmark at almost no time.
The other disadvantage is that when I measure on a relatively slow machine it
takes
way too long...
> I thought it's nice because it reduces the amount of preparation work to be
done
> before the timed part of the benchmark, hence the overall testing will take
less
> time. Also covers eviction of different sizes in some way (but this latter may
> also confuse the interpretation of the number - less sure).
The different sizes is why I avoided it, though I could argue either way.
> > That's the trick here,
> > really --- EntryMetadata is biggish, but a pointer into the table is <= 8
> bytes.
> > (The iterator isn't. I tried)
>
> sizeof(EntryMetadata) == 8, see:
>
https://codesearch.chromium.org/chromium/src/net/disk_cache/simple/simple_ind...
Oh, right, but you still need the key, which is an extra 8 bytes.
> But then you are right about one target: sizeof(EntryMetadata*) == 4 on
> Android/ARM (i.e. what we currently ship), so it's a win. By the way, please
> update the description to mention that it saves one aspect of memory
consumption
> on android/arm32.
Will do/already did. This is a nice thought, given that this is probably the
most
RAM sensitive platform.
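The sizes being discussed can be made concrete. EntryMetadataLike below is a hypothetical stand-in mirroring the real struct's 32-bit timestamp plus 32-bit size layout; the point is that a table slot (key + metadata) is 16 bytes, while a pointer to it is one machine word, i.e. 4 bytes on android/arm32 and 8 bytes on 64-bit targets.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-in with the same layout as the real 8-byte EntryMetadata.
struct EntryMetadataLike {
  int32_t last_used_seconds;
  uint32_t entry_size;
};
using EntrySetLike = std::unordered_map<uint64_t, EntryMetadataLike>;

static_assert(sizeof(EntryMetadataLike) == 8, "metadata itself is 8 bytes");
static_assert(sizeof(uint64_t) == 8, "a hash key is always 8 bytes");
// One table slot carries key + metadata (before any bucket overhead).
static_assert(sizeof(EntrySetLike::value_type) == 16, "key + metadata");
// The sorted vector now holds these instead of 8-byte hash copies:
static_assert(sizeof(const EntrySetLike::value_type*) == sizeof(void*),
              "one word: 4 bytes on arm32, 8 bytes on 64-bit");
```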
> I think the absence of .reserve() will also be not noticeable. Let's remove it
> for simplicity? Looks like premature optimization.
OK, tried it, and can't measure a difference, so removed.
lgtm
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by morlovich@chromium.org
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
Looks like I need an owner for disk_cache_perftest.cc
morlovich@chromium.org changed reviewers: + jkarlin@chromium.org
Josh, could you take a look as well? Looks like I need non-SimpleCache approval due to changing disk_cache_perftest.cc
disk_cache_perftest.cc lgtm
The CQ bit was checked by morlovich@chromium.org
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing the patch.
Bot data: {"patchset_id": 40001, "attempt_start_ts": 1491331650917480,
"parent_rev": "f55816a64a901b81f70c128361fc5df16ceb40f6", "commit_rev":
"c348edb0c510441efe4ae5b8b1cfccc47f3c15ce"}
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/chromium/src/+/c348edb0c510441efe4ae5b8b1cf...
