|
|
Created:
5 years, 7 months ago by Daniel Bratell Modified:
5 years, 7 months ago CC:
chromium-reviews, piman+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionMore compact representation of the GL Query cache.
The only thing we need to know per QuerySync block is its
index in the block sequence. Everything else can be
calculated from that if we just store a few numbers in
the Bucket objects.
Total memory reduction is 92%.
This is alt #2 in bug 485536.
BUG=485536
Committed: https://crrev.com/e49ed7adac7c2100b340f8963fcafa48544dccdf
Cr-Commit-Position: refs/heads/master@{#330334}
Patch Set 1 #Patch Set 2 : Shrink even more. #Patch Set 3 : Minimize patch size #Patch Set 4 : Smaller patch still. #
Total comments: 18
Patch Set 5 : Post review. #Patch Set 6 : Removed the |used| counter #Patch Set 7 : And unittest #
Messages
Total messages: 21 (5 generated)
bratell@opera.com changed reviewers: + reveman@chromium.org
bratell@opera.com changed reviewers: + jbauman@chromium.org
Adding jbauman as reviewer too since he recommended this approach in the bug.
https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.cc (right): https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:30: QuerySyncManager::Bucket::~Bucket() = default; is "= default;" allowed/recommended now? https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:49: if (bucket_candidate->free_queries.size() != 0) { s/free_queries.size() != 0/!free_queries.empty()/ https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:65: bucket->free_queries.reserve(kSyncsPerBucket); Please use kSyncsPerBucket when creating the vector in the ctor instead. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:66: for (size_t ii = 0; ii < kSyncsPerBucket; ++ii) maybe move this loop to the ctor too https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.h (right): https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:37: unsigned used_query_count; is |used_query_count| still needed? https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:39: unsigned int base_shm_offset; nit: uint32 base_shm_offset to be consistent with SyncInfo https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:40: std::vector<unsigned short> free_queries; can we just use vector<size_t> here instead? that's the type used for kSyncsPerBucket...
https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.h (right): https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:40: std::vector<unsigned short> free_queries; Could you add a COMPILE_ASSERT that kSyncsPerBucket <= USHRT_MAX (or similar constant)?
https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.cc (right): https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:30: QuerySyncManager::Bucket::~Bucket() = default; On 2015/05/11 17:13:07, reveman wrote: > is "= default;" allowed/recommended now? It is allowed. The differences between Foo::~Foo() {} and Foo::~Foo() = default are minuscule but I believe that = default will in some cases generate better code (since the compiler and type traits can determine that the constructor isn't hand written) so I'm trying to get into the habit of using it. I don't mind changing to {} though so leaving it for you to decide which one you want here. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:49: if (bucket_candidate->free_queries.size() != 0) { On 2015/05/11 17:13:07, reveman wrote: > s/free_queries.size() != 0/!free_queries.empty()/ Done. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:65: bucket->free_queries.reserve(kSyncsPerBucket); On 2015/05/11 17:13:07, reveman wrote: > Please use kSyncsPerBucket when creating the vector in the ctor instead. Done. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:66: for (size_t ii = 0; ii < kSyncsPerBucket; ++ii) On 2015/05/11 17:13:07, reveman wrote: > maybe move this loop to the ctor too Done. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.h (right): https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:37: unsigned used_query_count; On 2015/05/11 17:13:07, reveman wrote: > is |used_query_count| still needed? It can be replaced by |kSyncsPerBucket - bucket->free_queries.size()| but I didn't want to make the patch more complicated so I left if. I wouldn't mind doing that change if you think it's useful. For memory usage it would save 4 bytes per QueryBucket but probably add some 10-30 bytes of machine code. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:39: unsigned int base_shm_offset; On 2015/05/11 17:13:08, reveman wrote: > nit: uint32 base_shm_offset to be consistent with SyncInfo Done. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:40: std::vector<unsigned short> free_queries; On 2015/05/11 17:13:08, reveman wrote: > can we just use vector<size_t> here instead? that's the type used for > kSyncsPerBucket... This is the main source of memory usage in the cache so it's valuable to use the smallest type possible. I will add a static assert as jbauman suggests so that we don't get bitten by it if the cache is grown to more than 64k slots. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:40: std::vector<unsigned short> free_queries; On 2015/05/11 19:12:53, jbauman wrote: > Could you add a COMPILE_ASSERT that kSyncsPerBucket <= USHRT_MAX (or similar > constant)? Done.
https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.h (right): https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:37: unsigned used_query_count; On 2015/05/12 16:16:40, Daniel Bratell wrote: > On 2015/05/11 17:13:07, reveman wrote: > > is |used_query_count| still needed? > > It can be replaced by |kSyncsPerBucket - bucket->free_queries.size()| but I > didn't want to make the patch more complicated so I left if. I wouldn't mind > doing that change if you think it's useful. For memory usage it would save 4 > bytes per QueryBucket but probably add some 10-30 bytes of machine code. I generally prefer unnecessary state when possible but I'll leave it up to you to decide. https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.h:40: std::vector<unsigned short> free_queries; On 2015/05/12 16:16:40, Daniel Bratell wrote: > On 2015/05/11 17:13:08, reveman wrote: > > can we just use vector<size_t> here instead? that's the type used for > > kSyncsPerBucket... > > This is the main source of memory usage in the cache so it's valuable to use the > smallest type possible. I will add a static assert as jbauman suggests so that > we don't get bitten by it if the cache is grown to more than 64k slots. Acknowledged. Btw, have you considered using std::bitset<kSyncsPerBucket> where bit k represents the query at index k? That would very compact and I bet it would be plenty fast to just search it for a free query.
On 2015/05/12 16:45:05, reveman wrote: > https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... > File gpu/command_buffer/client/query_tracker.h (right): > > https://codereview.chromium.org/1131913004/diff/60001/gpu/command_buffer/clie... > gpu/command_buffer/client/query_tracker.h:37: unsigned used_query_count; > On 2015/05/12 16:16:40, Daniel Bratell wrote: > > On 2015/05/11 17:13:07, reveman wrote: > > > is |used_query_count| still needed? > > > > It can be replaced by |kSyncsPerBucket - bucket->free_queries.size()| but I > > didn't want to make the patch more complicated so I left if. I wouldn't mind > > doing that change if you think it's useful. For memory usage it would save 4 > > bytes per QueryBucket but probably add some 10-30 bytes of machine code. > > I generally prefer unnecessary state when possible but I'll leave it up to you > to decide. Ok, consider it removed! > Btw, have you considered using std::bitset<kSyncsPerBucket> where bit k > represents the query at index k? That would very compact and I bet it would be > plenty fast to just search it for a free query. Hey, that is the Opera way of thinking! (People underestimate the speed of linear scan through memory so for guaranteed not-large N, arrays will be smaller, faster and need less machine code than more complex data structures - that is something we take advantage of in Presto but not something that is generally accepted in Chromium because of the risks and maintenance once N might be increasing beyond "small"). I have considered it but right now I rather land this. I can prepare a followup patch once this has landed and we can see if we think it's worth it.
lgtm
The CQ bit was checked by bratell@opera.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1131913004/120001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
jbauman, we need your reviews as well! Can you please take a look?
lgtm
The CQ bit was checked by bratell@opera.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1131913004/120001
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/e49ed7adac7c2100b340f8963fcafa48544dccdf Cr-Commit-Position: refs/heads/master@{#330334}
Message was sent while issue was closed.
> > Btw, have you considered using std::bitset<kSyncsPerBucket> where bit k > > represents the query at index k? That would very compact and I bet it would be > > plenty fast to just search it for a free query. > > Hey, that is the Opera way of thinking! (People underestimate the speed of > linear scan through memory so for guaranteed not-large N, arrays will be > smaller, faster and need less machine code than more complex data structures - > that is something we take advantage of in Presto but not something that is > generally accepted in Chromium because of the risks and maintenance once N might > be increasing beyond "small"). I have considered it but right now I rather land > this. I can prepare a followup patch once this has landed and we can see if we > think it's worth it. reveman: Uploaded (but not yet published) std::bitset solution to https://codereview.chromium.org/1129253006/ It will use an estimated 6-10 KB less memory and has the same performance for normal use cases as far as I can judge. A bit tricker and more complex though. |