|
|
Created:
6 years ago by sof Modified:
6 years ago CC:
blink-reviews, oilpan-reviews, kouhei+heap_chromium.org, Mads Ager (chromium) Base URL:
svn://svn.chromium.org/blink/trunk Target Ref:
refs/heads/master Project:
blink Visibility:
Public. |
DescriptionOilpan: attempt first-fit freelist allocation for backing heaps.
When not being able to bump allocate a vector/hash table's backing storage,
try to reuse an entry from the corresponding sized bin, picking the first.
Should that fail, fall into allocating the largest chunk available.
This freelist allocation scheme has some merit on
shadow_dom:LargeDistributionWithoutLayout, improving Linux performance
by 27%.
R=haraken
BUG=420515
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=187243
Patch Set 1 #Patch Set 2 : compile fix #
Total comments: 5
Patch Set 3 : Share more low-level allocation code #
Total comments: 25
Patch Set 4 : Comments + tidying #
Total comments: 24
Patch Set 5 : variable renamings #Patch Set 6 : rebased #Messages
Total messages: 30 (3 generated)
sigbjornf@opera.com changed reviewers: + oilpan-reviews@chromium.org
Been looking at quicklist allocation performance etc, and split out the first-fit sized bin matching into this CL instead. https://docs.google.com/spreadsheets/d/1U8QeZsVuAzGARDMHSoj0VnUJAJRjqmFg2phAJ... has performance data for both linux and windows. Not quite sure what to think of it; only doing first-fit for the backing heaps pays off on one test (on one platform), but if i extend it to other heaps, there are a couple of mild shadow_dom regressions. So, if it reproduces on other linux boxes and being better on that test really matters, then I suppose it could be considered. I'm a bit worried about fragmentation costs (overall), but haven't found a way to avoid that here without losing out on the perf gain.
haraken@chromium.org changed reviewers: + haraken@chromium.org
> This freelist allocation scheme has some merit on shadow_dom:LargeDistributionWithoutLayout, improving Linux performance by 27%. You did!!!!!!!!!!!!!!! I also confirmed that this CL improves LargeDistributionWithoutLayout by 26% :) I'm curious but what's your theory for the performance improvement? Why does the first-fit benefit collection backing heaps even though it doesn't benefit other heaps? My guess is that it's related to coalescing. In case of collection backings, a lot of promptly freed small blocks are registered to a free list because of coalescing, but those blocks are unlikely to be reused in the worst-fit scheme. On the other hand, the first-fit scheme aggressively uses the promptly freed blocks and thus improves memory reuse. Probably can we just disable coalescing with this CL and how it affects performance? I'm not intending to ask you to do a full investigation to justify this change but just want to understand what is good and get a hint for further optimizations on our allocation scheme. https://codereview.chromium.org/798293002/diff/20001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/20001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:713: Address ThreadHeap<Header>::allocateFromFreeListEntry(FreeListEntry* entry, size_t allocationSize, const GCInfo* gcInfo) It's a bit unfortunate that we have to duplicate the code between ThreadHeap::allocateFromFreeListEntry and ThreadHeap::allocate. Can we somehow merge the implementation? Then we can also get rid of the |result| parameter from allocateFromFreeList. https://codereview.chromium.org/798293002/diff/20001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:750: // If a larger chunk becomes available for bump allocation, switch. It's not clear to me what this comment means. https://codereview.chromium.org/798293002/diff/20001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:753: // current allocation area. Ditto. It looks like the following block is just adjusting m_biggestFreeListIndex.
On 2014/12/15 02:10:52, haraken wrote: > > This freelist allocation scheme has some merit on > shadow_dom:LargeDistributionWithoutLayout, improving Linux performance by 27%. > > You did!!!!!!!!!!!!!!! > > I also confirmed that this CL improves LargeDistributionWithoutLayout by 26% :) > > I'm curious but what's your theory for the performance improvement? Why does the > first-fit benefit collection backing heaps even though it doesn't benefit other > heaps? > > My guess is that it's related to coalescing. In case of collection backings, a > lot of promptly freed small blocks are registered to a free list because of > coalescing, but those blocks are unlikely to be reused in the worst-fit scheme. > On the other hand, the first-fit scheme aggressively uses the promptly freed > blocks and thus improves memory reuse. Probably can we just disable coalescing > with this CL and how it affects performance? > > I'm not intending to ask you to do a full investigation to justify this change > but just want to understand what is good and get a hint for further > optimizations on our allocation scheme. keishi-san: Would you possible compare the output of the heap visualizer between with/without the CL for LargeDistributionWithoutLayout and study what's going on there?
I've added a pair of sheets to https://docs.google.com/spreadsheets/d/1U8QeZsVuAzGARDMHSoj0VnUJAJRjqmFg2phAJ... which compares against trunk (no Oilpan), for this CL and without.
On 2014/12/15 02:10:52, haraken wrote: > > This freelist allocation scheme has some merit on > shadow_dom:LargeDistributionWithoutLayout, improving Linux performance by 27%. > > You did!!!!!!!!!!!!!!! > > I also confirmed that this CL improves LargeDistributionWithoutLayout by 26% :) > > I'm curious but what's your theory for the performance improvement? Why does the > first-fit benefit collection backing heaps even though it doesn't benefit other > heaps? > The collection backing store heap that benefits the most is HashTableBackingHeapNonFinalized. Notice that the allocation happens into a block that fits on the freelist, but we do not switch allocation area. My theory is that this enables reuse of recently freed backing stores in a (more) consistent manner than would otherwise happen (i.e., if we do switch allocation area, the win for this sub-micro benchmark evaporates.) At the cost of turning the remaining block into a freelist entry of its own (that's not immediately allocated into, or ever.) Preferring to fall into the 'worst fit' case for the other heaps allows more bump allocations to happen, which co-locates their allocation (and lowers amortized allocation cost.) This appears to matter! > My guess is that it's related to coalescing. In case of collection backings, a > lot of promptly freed small blocks are registered to a free list because of > coalescing, but those blocks are unlikely to be reused in the worst-fit scheme. > On the other hand, the first-fit scheme aggressively uses the promptly freed > blocks and thus improves memory reuse. Probably can we just disable coalescing > with this CL and how it affects performance? > Coalescing of promptly freeds does not have an observable impact here.
On 2014/12/15 09:11:54, sof wrote: > On 2014/12/15 02:10:52, haraken wrote: > > > This freelist allocation scheme has some merit on > > shadow_dom:LargeDistributionWithoutLayout, improving Linux performance by 27%. > > > > You did!!!!!!!!!!!!!!! > > > > I also confirmed that this CL improves LargeDistributionWithoutLayout by 26% > :) > > > > I'm curious but what's your theory for the performance improvement? Why does > the > > first-fit benefit collection backing heaps even though it doesn't benefit > other > > heaps? > > > > The collection backing store heap that benefits the most is > HashTableBackingHeapNonFinalized. Notice that the allocation happens into a > block that fits on the freelist, but we do not switch allocation area. My theory > is that this enables reuse of recently freed backing stores in a (more) > consistent manner than would otherwise happen (i.e., if we do switch allocation > area, the win for this sub-micro benchmark evaporates.) At the cost of turning > the remaining block into a freelist entry of its own (that's not immediately > allocated into, or ever.) ah, that's plausible. Great theory :) keishi-san: Probably does this optimization make it useless to implement expandBuffer for HashTable backings? > Preferring to fall into the 'worst fit' case for the other heaps allows more > bump allocations to happen, which co-locates their allocation (and lowers > amortized allocation cost.) This appears to matter! This makes sense too.
On 2014/12/15 09:32:52, haraken wrote: > On 2014/12/15 09:11:54, sof wrote: > > On 2014/12/15 02:10:52, haraken wrote: > > > > This freelist allocation scheme has some merit on > > > shadow_dom:LargeDistributionWithoutLayout, improving Linux performance by > 27%. > > > > > > You did!!!!!!!!!!!!!!! > > > > > > I also confirmed that this CL improves LargeDistributionWithoutLayout by 26% > > :) > > > > > > I'm curious but what's your theory for the performance improvement? Why does > > the > > > first-fit benefit collection backing heaps even though it doesn't benefit > > other > > > heaps? > > > > > > > The collection backing store heap that benefits the most is > > HashTableBackingHeapNonFinalized. Notice that the allocation happens into a > > block that fits on the freelist, but we do not switch allocation area. My > theory > > is that this enables reuse of recently freed backing stores in a (more) > > consistent manner than would otherwise happen (i.e., if we do switch > allocation > > area, the win for this sub-micro benchmark evaporates.) At the cost of turning > > the remaining block into a freelist entry of its own (that's not immediately > > allocated into, or ever.) > > ah, that's plausible. Great theory :) > The last bit is a source of worry. I do suspect we need to address fragmentation for Oilpan so as to be able to handle workloads that run for an extended period of time. Keeping it under control by having quicklists or other schemes that more often reuses freelist entries only goes so far. And at the cost of regressing performance tests, it appears. But just current thinking; more study of behavior needed. (A real eye opener btw that linux behavior is so sensitive - almost no difference in perf seen on windows.)
https://codereview.chromium.org/798293002/diff/20001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/20001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:713: Address ThreadHeap<Header>::allocateFromFreeListEntry(FreeListEntry* entry, size_t allocationSize, const GCInfo* gcInfo) On 2014/12/15 02:10:52, haraken wrote: > > It's a bit unfortunate that we have to duplicate the code between > ThreadHeap::allocateFromFreeListEntry and ThreadHeap::allocate. Can we somehow > merge the implementation? Then we can also get rid of the |result| parameter > from allocateFromFreeList. > Done, moving things around. Not the best of names for some local helpers, but best I can think of right now. https://codereview.chromium.org/798293002/diff/20001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:750: // If a larger chunk becomes available for bump allocation, switch. On 2014/12/15 02:10:52, haraken wrote: > > It's not clear to me what this comment means. Sorry; various strategies were experimented with, but forgot to sync comments in the end :) Tidied up now.
Mostly looks good. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:67: #define PREFETCH(ptr) _mm_prefetch((char*)(ptr), _MM_HINT_T0) I think we should add these macros to wtf/, or at least ThreadState.h. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:715: Address ThreadHeap<Header>::allocateFromFreeListEntry(FreeListEntry* entry, size_t allocationSize, const GCInfo* gcInfo) I'd prefer inline this method in the (only one) call site. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:724: static bool useFirstFitForHeap(int heapIndex) useFirstFitForHeap => shouldUseBestFit ? https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:726: if (heapIndex >= VectorBackingHeap && heapIndex <= HashTableBackingHeap) Add a comment about why we should use a best-fit scheme for these backing storages. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:735: Address ThreadHeap<Header>::allocateFromFreeList(size_t allocationSize, const GCInfo* gcInfo) Let's add a comment about the allocation scheme. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:737: int bucket = FreeList<Header>::bucketIndexForSize(allocationSize) + 1; Nit: I'm not sure if this matters for performance, but the current way of calculating bucketIndexForSize is slow. I think we should replace the implementation with a smarter bit-wise operation (if it matters for performance). allocateFromFreeList is in a hot call path. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:740: PREFETCH(entry->address()); Does this matter for performance? - If no, let's remove it. - If yes, you might also want to insert more PREFETCHes to the marking phase (as you experimented before) in a follow-up CL. (I'd prefer introducing PREFETCH in a separate CL either way.) https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:744: int maxIndex = m_freeList.m_biggestFreeListIndex; int maxIndex = m_freeList.m_biggestFreeListIndex - 1; https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:748: // Allocate into the freelist block without disturbing the current allocation area. I do agree that the best fit scheme works fine for collection backings, but what's the point of not disturbing the current allocation point? What happens if we just adjust the current allocation point to the best-fit position? Intuitively, if we want to reuse as much promptly freed area of hash table backings as possible, I guess we should adjust the current allocation point to the best-fit slot, so that we can fall back to outOfLineAllocate more frequently and thus have more chances to allocate the backings on the best-fit positions. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:755: int i = m_freeList.m_biggestFreeListIndex; i => index
https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:715: Address ThreadHeap<Header>::allocateFromFreeListEntry(FreeListEntry* entry, size_t allocationSize, const GCInfo* gcInfo) On 2014/12/15 15:42:55, haraken wrote: > > I'd prefer inline this method in the (only one) call site. I'd hope to reuse it for quicklists, but why not for now. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:737: int bucket = FreeList<Header>::bucketIndexForSize(allocationSize) + 1; On 2014/12/15 15:42:55, haraken wrote: > > Nit: I'm not sure if this matters for performance, but the current way of > calculating bucketIndexForSize is slow. I think we should replace the > implementation with a smarter bit-wise operation (if it matters for > performance). allocateFromFreeList is in a hot call path. Thought you said last week that outOfLineAllocate() wasn't a perf concern? https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:748: // Allocate into the freelist block without disturbing the current allocation area. On 2014/12/15 15:42:56, haraken wrote: > > I do agree that the best fit scheme works fine for collection backings, but > what's the point of not disturbing the current allocation point? What happens if > we just adjust the current allocation point to the best-fit position? > > Intuitively, if we want to reuse as much promptly freed area of hash table > backings as possible, I guess we should adjust the current allocation point to > the best-fit slot, so that we can fall back to outOfLineAllocate more frequently > and thus have more chances to allocate the backings on the best-fit positions. What if you have a fairly big slab left, but you switch to a smaller one by changing -- are you better off? (And I did try to come up with heuristics for when it makes sense to switch, without finding any.)
https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:737: int bucket = FreeList<Header>::bucketIndexForSize(allocationSize) + 1; On 2014/12/15 16:25:11, sof wrote: > On 2014/12/15 15:42:55, haraken wrote: > > > > Nit: I'm not sure if this matters for performance, but the current way of > > calculating bucketIndexForSize is slow. I think we should replace the > > implementation with a smarter bit-wise operation (if it matters for > > performance). allocateFromFreeList is in a hot call path. > > Thought you said last week that outOfLineAllocate() wasn't a perf concern? Right, so it wouldn't be worth doing. Ignore my comment. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:748: // Allocate into the freelist block without disturbing the current allocation area. On 2014/12/15 16:25:11, sof wrote: > On 2014/12/15 15:42:56, haraken wrote: > > > > I do agree that the best fit scheme works fine for collection backings, but > > what's the point of not disturbing the current allocation point? What happens > if > > we just adjust the current allocation point to the best-fit position? > > > > Intuitively, if we want to reuse as much promptly freed area of hash table > > backings as possible, I guess we should adjust the current allocation point to > > the best-fit slot, so that we can fall back to outOfLineAllocate more > frequently > > and thus have more chances to allocate the backings on the best-fit positions. > > What if you have a fairly big slab left, but you switch to a smaller one by > changing -- are you better off? > > (And I did try to come up with heuristics for when it makes sense to switch, > without finding any.) Hmm, I'm getting a bit confused. You're at this point because you've eaten up the remaining size of the current allocation area, so at this point there should be no big area left in the current allocation area, right? Then it seems it doesn't really matter whether we should switch the current allocation point to the best fit area or not in most cases.
https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:748: // Allocate into the freelist block without disturbing the current allocation area. On 2014/12/15 16:38:18, haraken wrote: > On 2014/12/15 16:25:11, sof wrote: > > On 2014/12/15 15:42:56, haraken wrote: > > > > > > I do agree that the best fit scheme works fine for collection backings, but > > > what's the point of not disturbing the current allocation point? What > happens > > if > > > we just adjust the current allocation point to the best-fit position? > > > > > > Intuitively, if we want to reuse as much promptly freed area of hash table > > > backings as possible, I guess we should adjust the current allocation point > to > > > the best-fit slot, so that we can fall back to outOfLineAllocate more > > frequently > > > and thus have more chances to allocate the backings on the best-fit > positions. > > > > What if you have a fairly big slab left, but you switch to a smaller one by > > changing -- are you better off? > > > > (And I did try to come up with heuristics for when it makes sense to switch, > > without finding any.) > > Hmm, I'm getting a bit confused. > > You're at this point because you've eaten up the remaining size of the current > allocation area, so at this point there should be no big area left in the > current allocation area, right? Then it seems it doesn't really matter whether > we should switch the current allocation point to the best fit area or not in > most cases. It does matter in this case. What if the backing store allocation is larger than most in a sequence, or that you get to reuse another freelist entry for the next allocation by not switching? I don't have all the answers here about why this particularly helps performance on this test.
https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:748: // Allocate into the freelist block without disturbing the current allocation area. On 2014/12/15 16:52:52, sof wrote: > On 2014/12/15 16:38:18, haraken wrote: > > On 2014/12/15 16:25:11, sof wrote: > > > On 2014/12/15 15:42:56, haraken wrote: > > > > > > > > I do agree that the best fit scheme works fine for collection backings, > but > > > > what's the point of not disturbing the current allocation point? What > > happens > > > if > > > > we just adjust the current allocation point to the best-fit position? > > > > > > > > Intuitively, if we want to reuse as much promptly freed area of hash table > > > > backings as possible, I guess we should adjust the current allocation > point > > to > > > > the best-fit slot, so that we can fall back to outOfLineAllocate more > > > frequently > > > > and thus have more chances to allocate the backings on the best-fit > > positions. > > > > > > What if you have a fairly big slab left, but you switch to a smaller one by > > > changing -- are you better off? > > > > > > (And I did try to come up with heuristics for when it makes sense to switch, > > > without finding any.) > > > > Hmm, I'm getting a bit confused. > > > > You're at this point because you've eaten up the remaining size of the current > > allocation area, so at this point there should be no big area left in the > > current allocation area, right? Then it seems it doesn't really matter whether > > we should switch the current allocation point to the best fit area or not in > > most cases. > > It does matter in this case. What if the backing store allocation is larger than > most in a sequence, or that you get to reuse another freelist entry for the next > allocation by not switching? I don't have all the answers here about why this > particularly helps performance on this test. Thanks, I'm 80% convinced :) Either way let's land this CL and iterate from there. I'll take another close look tomorrow.
On 2014/12/15 17:03:32, haraken wrote: > https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... > File Source/platform/heap/Heap.cpp (right): > > https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... > Source/platform/heap/Heap.cpp:748: // Allocate into the freelist block without > disturbing the current allocation area. > On 2014/12/15 16:52:52, sof wrote: > > On 2014/12/15 16:38:18, haraken wrote: > > > On 2014/12/15 16:25:11, sof wrote: > > > > On 2014/12/15 15:42:56, haraken wrote: > > > > > > > > > > I do agree that the best fit scheme works fine for collection backings, > > but > > > > > what's the point of not disturbing the current allocation point? What > > > happens > > > > if > > > > > we just adjust the current allocation point to the best-fit position? > > > > > > > > > > Intuitively, if we want to reuse as much promptly freed area of hash > table > > > > > backings as possible, I guess we should adjust the current allocation > > point > > > to > > > > > the best-fit slot, so that we can fall back to outOfLineAllocate more > > > > frequently > > > > > and thus have more chances to allocate the backings on the best-fit > > > positions. > > > > > > > > What if you have a fairly big slab left, but you switch to a smaller one > by > > > > changing -- are you better off? > > > > > > > > (And I did try to come up with heuristics for when it makes sense to > switch, > > > > without finding any.) > > > > > > Hmm, I'm getting a bit confused. > > > > > > You're at this point because you've eaten up the remaining size of the > current > > > allocation area, so at this point there should be no big area left in the > > > current allocation area, right? Then it seems it doesn't really matter > whether > > > we should switch the current allocation point to the best fit area or not in > > > most cases. > > > > It does matter in this case. What if the backing store allocation is larger > than > > most in a sequence, or that you get to reuse another freelist entry for the > next > > allocation by not switching? I don't have all the answers here about why this > > particularly helps performance on this test. > > Thanks, I'm 80% convinced :) Either way let's land this CL and iterate from > there. I'll take another close look tomorrow. Sounds good, thanks. I'm in two minds about it as well, but let's get it all ready in any case.
https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:715: Address ThreadHeap<Header>::allocateFromFreeListEntry(FreeListEntry* entry, size_t allocationSize, const GCInfo* gcInfo) On 2014/12/15 15:42:55, haraken wrote: > > I'd prefer inline this method in the (only one) call site. Done. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:724: static bool useFirstFitForHeap(int heapIndex) On 2014/12/15 15:42:55, haraken wrote: > > useFirstFitForHeap => shouldUseBestFit ? shouldUseFirstFitForHeap() https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:726: if (heapIndex >= VectorBackingHeap && heapIndex <= HashTableBackingHeap) On 2014/12/15 15:42:55, haraken wrote: > > Add a comment about why we should use a best-fit scheme for these backing > storages. Done. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:735: Address ThreadHeap<Header>::allocateFromFreeList(size_t allocationSize, const GCInfo* gcInfo) On 2014/12/15 15:42:56, haraken wrote: > > Let's add a comment about the allocation scheme. Done. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:740: PREFETCH(entry->address()); On 2014/12/15 15:42:55, haraken wrote: > > Does this matter for performance? > > - If no, let's remove it. > - If yes, you might also want to insert more PREFETCHes to the marking phase (as > you experimented before) in a follow-up CL. > > (I'd prefer introducing PREFETCH in a separate CL either way.) It improves perf by 0.5-1% on a handful of shadow_dom tests, we can handle prefetching separately. I haven't thought about how prefetching could be utilized with eager marking - prefetching the vtbl? https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:744: int maxIndex = m_freeList.m_biggestFreeListIndex; On 2014/12/15 15:42:56, haraken wrote: > > int maxIndex = m_freeList.m_biggestFreeListIndex - 1; Done. https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:755: int i = m_freeList.m_biggestFreeListIndex; On 2014/12/15 15:42:56, haraken wrote: > > i => index Done.
LGTM https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/40001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:740: PREFETCH(entry->address()); On 2014/12/15 21:36:40, sof wrote: > On 2014/12/15 15:42:55, haraken wrote: > > > > Does this matter for performance? > > > > - If no, let's remove it. > > - If yes, you might also want to insert more PREFETCHes to the marking phase > (as > > you experimented before) in a follow-up CL. > > > > (I'd prefer introducing PREFETCH in a separate CL either way.) > > It improves perf by 0.5-1% on a handful of shadow_dom tests, we can handle > prefetching separately. > > I haven't thought about how prefetching could be utilized with eager marking - > prefetching the vtbl? Ah, it wouldn't be trivial to utilize prefetching in eager tracing. That said, I think we should look for more opportunity of prefetching. Probably would it be helpful for sweeping? https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:707: static bool shouldUseFirstFitForHeap(int heapIndex) Isn't this "best fit", not "first fit"? https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:717: // first-fit matching. Shall we also mention that this optimization makes sense only for backing storages where coalescing is supported? I think coalescing is a key. If coalescing is not supported, we cannot get any freed block until a GC is triggered and thus this optimization wouldn't be helpful. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:723: // heap types. We are currently seeking an understanding of why; I guess the reason this optimization doesn't help normal heaps is that normal heaps don't support coalescing. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:727: if (heapIndex >= VectorBackingHeap && heapIndex <= HashTableBackingHeap) For safety, I'd prefer: heapIndex == VectorBackingHeap || heapIndex == HashTableBackingHeap to exactly match what we want to match. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:729: if (heapIndex >= VectorBackingHeapNonFinalized && heapIndex <= HashTableBackingHeapNonFinalized) Ditto. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:755: int bucket = FreeList<Header>::bucketIndexForSize(allocationSize) + 1; bucket => index https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:757: if (FreeListEntry* entry = m_freeList.m_freeLists[bucket]) { Don't we want to look up the entry in m_freeLists[x] where x is larger than bucket? I mean: for (int index = bucket; index < blinkPageSizeLog2; ++index) { if (FreeListEntry* entry = m_freeList.m_freeLists[index]) { entry->unlink(&m_freeList.m_freeLists[index]); ...; break; } } https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:763: m_freeList.m_biggestFreeListIndex = maxIndex < 0 ? 0 : maxIndex; The calculation of m_biggestFreeListIndex is getting complicated. We might want to have an ASSERT somewhere to verify that m_biggestFreeListIndex is actually pointing to the max index. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:777: for (; index > 0; index--, bucketSize >>= 1) { --index https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Heap.h File Source/platform/heap/Heap.h (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.h:1277: Unnecessary empty line.
https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:707: static bool shouldUseFirstFitForHeap(int heapIndex) On 2014/12/16 02:26:33, haraken wrote: > > Isn't this "best fit", not "first fit"? Not as I know the terms -- best fit would search that bin for the optimal block. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:717: // first-fit matching. On 2014/12/16 02:26:33, haraken wrote: > > Shall we also mention that this optimization makes sense only for backing > storages where coalescing is supported? > > I think coalescing is a key. If coalescing is not supported, we cannot get any > freed block until a GC is triggered and thus this optimization wouldn't be > helpful. > Coalescing is not the key -- as I pointed out yesterday, it doesn't make a difference. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:723: // heap types. We are currently seeking an understanding of why; On 2014/12/16 02:26:33, haraken wrote: > > I guess the reason this optimization doesn't help normal heaps is that normal > heaps don't support coalescing. They do coalesce when sweeping though. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:727: if (heapIndex >= VectorBackingHeap && heapIndex <= HashTableBackingHeap) On 2014/12/16 02:26:33, haraken wrote: > > For safety, I'd prefer: > > heapIndex == VectorBackingHeap || heapIndex == HashTableBackingHeap > > to exactly match what we want to match. And also the inline heap case -- this seems verbose (and a trifling detail.) https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:755: int bucket = FreeList<Header>::bucketIndexForSize(allocationSize) + 1; On 2014/12/16 02:26:33, haraken wrote: > > bucket => index No, sorry - this is a bucket, not an index. Or do you want bucketIndex? https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:757: if (FreeListEntry* entry = m_freeList.m_freeLists[bucket]) { On 2014/12/16 02:26:33, haraken wrote: > > Don't we want to look up the entry in m_freeLists[x] where x is larger than > bucket? I mean: > > for (int index = bucket; index < blinkPageSizeLog2; ++index) { > if (FreeListEntry* entry = m_freeList.m_freeLists[index]) { > entry->unlink(&m_freeList.m_freeLists[index]); > ...; > break; > } > } That's been checked as regressing performance earlier.
https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:707: static bool shouldUseFirstFitForHeap(int heapIndex) On 2014/12/16 07:31:54, sof wrote: > On 2014/12/16 02:26:33, haraken wrote: > > > > Isn't this "best fit", not "first fit"? > > Not as I know the terms -- best fit would search that bin for the optimal block. (I'm not intending to bike-shed terms but) as far as I understand, what we are doing in allocateFromFreeList is a choice between approximately-best-fit or approximately-worst-fit. First-fit sounds like a scheme that allocates on a slot that is found first by accident (without maintaining a list of power-of-2 free slots). https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:717: // first-fit matching. On 2014/12/16 07:31:54, sof wrote: > On 2014/12/16 02:26:33, haraken wrote: > > > > Shall we also mention that this optimization makes sense only for backing > > storages where coalescing is supported? > > > > I think coalescing is a key. If coalescing is not supported, we cannot get any > > freed block until a GC is triggered and thus this optimization wouldn't be > > helpful. > > > > Coalescing is not the key -- as I pointed out yesterday, it doesn't make a > difference. Hmm. Then probably the situation is like this? Fact 1: Worst-fit is better than best-fit in terms of allocation overhead. Fact 2: Best-fit is better than worst-fit in terms of memory reuse. Therefore, a smart allocation scheme is to allocate large objects in best-fit and small objects in worst-fit. If this is the case, what we should do is to determine best-fit or worst-fit depending on allocation size, not depending on whether it is a collection backing heap or not. (I guess there are a lot of large allocations in collection backing heaps but there are few large allocations in normal heaps. That's why this optimization makes sense only for collection backing heaps in most cases.) Just 2 cents. Either way, I'm fine with landing this CL and investigate a better scheme later.
https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:707: static bool shouldUseFirstFitForHeap(int heapIndex) On 2014/12/16 08:26:14, haraken wrote: > On 2014/12/16 07:31:54, sof wrote: > > On 2014/12/16 02:26:33, haraken wrote: > > > > > > Isn't this "best fit", not "first fit"? > > > > Not as I know the terms -- best fit would search that bin for the optimal > block. > > (I'm not intending to bike-shed terms but) as far as I understand, what we are > doing in allocateFromFreeList is a choice between approximately-best-fit or > approximately-worst-fit. First-fit sounds like a scheme that allocates on a slot > that is found first by accident (without maintaining a list of power-of-2 free > slots). It performs first fit within segregated bins.
https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:717: // first-fit matching. On 2014/12/16 08:26:14, haraken wrote: > On 2014/12/16 07:31:54, sof wrote: > > On 2014/12/16 02:26:33, haraken wrote: > > > > > > Shall we also mention that this optimization makes sense only for backing > > > storages where coalescing is supported? > > > > > > I think coalescing is a key. If coalescing is not supported, we cannot get > any > > > freed block until a GC is triggered and thus this optimization wouldn't be > > > helpful. > > > > > > > Coalescing is not the key -- as I pointed out yesterday, it doesn't make a > > difference. > > Hmm. Then probably the situation is like this? > > Fact 1: Worst-fit is better than best-fit in terms of allocation overhead. > > Fact 2: Best-fit is better than worst-fit in terms of memory reuse. > > Therefore, a smart allocation scheme is to allocate large objects in best-fit > and small objects in worst-fit. > > If this is the case, what we should do is to determine best-fit or worst-fit > depending on allocation size, not depending on whether it is a collection > backing heap or not. (I guess there are a lot of large allocations in collection > backing heaps but there are few large allocations in normal heaps. That's why > this optimization makes sense only for collection backing heaps in most cases.) > Just 2 cents. > > Either way, I'm fine with landing this CL and investigate a better scheme later. > As I said earlier, doing a decision based on size doesn't work out. (I'm very sorry, but it's frustrating and off-putting when followups aren't listened to.)
https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:707: static bool shouldUseFirstFitForHeap(int heapIndex) On 2014/12/16 08:32:15, sof wrote: > On 2014/12/16 08:26:14, haraken wrote: > > On 2014/12/16 07:31:54, sof wrote: > > > On 2014/12/16 02:26:33, haraken wrote: > > > > > > > > Isn't this "best fit", not "first fit"? > > > > > > Not as I know the terms -- best fit would search that bin for the optimal > > block. > > > > (I'm not intending to bike-shed terms but) as far as I understand, what we are > > doing in allocateFromFreeList is a choice between approximately-best-fit or > > approximately-worst-fit. First-fit sounds like a scheme that allocates on a > slot > > that is found first by accident (without maintaining a list of power-of-2 free > > slots). > > It performs first fit within segregated bins. Then our worst-fit is also working as a first-fit within segregated bins, so there is no difference... The difference is from which segregated bin we pick up a free list. Best-fit segregated bin or worst-fit segregated bin. That's why I thought best-fit or worst-fit reflects the scheme. https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:717: // first-fit matching. On 2014/12/16 08:36:18, sof wrote: > On 2014/12/16 08:26:14, haraken wrote: > > On 2014/12/16 07:31:54, sof wrote: > > > On 2014/12/16 02:26:33, haraken wrote: > > > > > > > > Shall we also mention that this optimization makes sense only for backing > > > > storages where coalescing is supported? > > > > > > > > I think coalescing is a key. If coalescing is not supported, we cannot get > > any > > > > freed block until a GC is triggered and thus this optimization wouldn't be > > > > helpful. > > > > > > > > > > Coalescing is not the key -- as I pointed out yesterday, it doesn't make a > > > difference. > > > > Hmm. Then probably the situation is like this? > > > > Fact 1: Worst-fit is better than best-fit in terms of allocation overhead. > > > > Fact 2: Best-fit is better than worst-fit in terms of memory reuse. > > > > Therefore, a smart allocation scheme is to allocate large objects in best-fit > > and small objects in worst-fit. > > > > If this is the case, what we should do is to determine best-fit or worst-fit > > depending on allocation size, not depending on whether it is a collection > > backing heap or not. (I guess there are a lot of large allocations in > collection > > backing heaps but there are few large allocations in normal heaps. That's why > > this optimization makes sense only for collection backing heaps in most > cases.) > > Just 2 cents. > > > > Either way, I'm fine with landing this CL and investigate a better scheme > later. > > > > As I said earlier, doing a decision based on size doesn't work out. (I'm very > sorry, but it's frustrating and off-putting when followups aren't listened to.) I'm sorry about it... Then the situation is that we don't know why this allocation scheme is working well.
https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:707: static bool shouldUseFirstFitForHeap(int heapIndex) On 2014/12/16 08:51:19, haraken wrote: > On 2014/12/16 08:32:15, sof wrote: > > On 2014/12/16 08:26:14, haraken wrote: > > > On 2014/12/16 07:31:54, sof wrote: > > > > On 2014/12/16 02:26:33, haraken wrote: > > > > > > > > > > Isn't this "best fit", not "first fit"? > > > > > > > > Not as I know the terms -- best fit would search that bin for the optimal > > > block. > > > > > > (I'm not intending to bike-shed terms but) as far as I understand, what we > are > > > doing in allocateFromFreeList is a choice between approximately-best-fit or > > > approximately-worst-fit. First-fit sounds like a scheme that allocates on a > > slot > > > that is found first by accident (without maintaining a list of power-of-2 > free > > > slots). > > > > It performs first fit within segregated bins. > > Then our worst-fit is also working as a first-fit within segregated bins, so > there is no difference... The difference is from which segregated bin we pick up > a free list. Best-fit segregated bin or worst-fit segregated bin. That's why I > thought best-fit or worst-fit reflects the scheme. > It's performing first-fit within the worst bin in the normal case. I haven't used the worst-fit term, i think.
https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... File Source/platform/heap/Heap.cpp (right): https://codereview.chromium.org/798293002/diff/60001/Source/platform/heap/Hea... Source/platform/heap/Heap.cpp:707: static bool shouldUseFirstFitForHeap(int heapIndex) On 2014/12/16 08:54:22, sof wrote: > On 2014/12/16 08:51:19, haraken wrote: > > On 2014/12/16 08:32:15, sof wrote: > > > On 2014/12/16 08:26:14, haraken wrote: > > > > On 2014/12/16 07:31:54, sof wrote: > > > > > On 2014/12/16 02:26:33, haraken wrote: > > > > > > > > > > > > Isn't this "best fit", not "first fit"? > > > > > > > > > > Not as I know the terms -- best fit would search that bin for the > optimal > > > > block. > > > > > > > > (I'm not intending to bike-shed terms but) as far as I understand, what we > > are > > > > doing in allocateFromFreeList is a choice between approximately-best-fit > or > > > > approximately-worst-fit. First-fit sounds like a scheme that allocates on > a > > > slot > > > > that is found first by accident (without maintaining a list of power-of-2 > > free > > > > slots). > > > > > > It performs first fit within segregated bins. > > > > Then our worst-fit is also working as a first-fit within segregated bins, so > > there is no difference... The difference is from which segregated bin we pick > up > > a free list. Best-fit segregated bin or worst-fit segregated bin. That's why I > > thought best-fit or worst-fit reflects the scheme. > > > > It's performing first-fit within the worst bin in the normal case. That's why I think shouldUseFirstFit doesn't capture what it is doing. In your definition, we're doing first-fit in both cases. So I suggested to use best-fit/worst-fit as terms to distinguish allocation schemes between collection backing heaps and other heaps. That said, I don't really care about it since you have a very good comment here.
The CQ bit was checked by sigbjornf@opera.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/798293002/100001
Message was sent while issue was closed.
Committed patchset #6 (id:100001) as https://src.chromium.org/viewvc/blink?view=rev&revision=187243
Message was sent while issue was closed.
Perfbot confirms improvement, https://chromeperf.appspot.com/report?masters=ChromiumPerf&bots=linux-oilpan-...
Message was sent while issue was closed.
On 2014/12/16 19:53:31, sof wrote: > Perfbot confirms improvement, > > > https://chromeperf.appspot.com/report?masters=ChromiumPerf&bots=linux-oilpan-... Great! |