Created: 4 years, 10 months ago by haraken
Modified: 4 years, 10 months ago
CC: chromium-reviews, oilpan-reviews, Mads Ager (chromium), blink-reviews, kinuko+watch, kouhei+heap_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public.

Description

Oilpan: Decommit backing storage of CallbackStacks
CallbackStacks are used only while Oilpan's GC is doing marking & weak processing.
However, each CallbackStack currently retains one Block forever.
This wastes a lot of memory. Oilpan has 4 global CallbackStacks and 1 CallbackStack per thread.
Each Block consumes 8192 * sizeof(Item) = 128 KB. This means that Oilpan wastes 128 KB * (4 + # of threads).
When I start Chrome's new tab page, it creates 18 CallbackStacks,
meaning that it wastes 2.3 MB of memory.
This CL removes the waste by discarding system pages of the Block after finishing every GC phase.
BUG=
Committed: https://crrev.com/34393519671e62e5a66602736899b94ad33dd45a
Cr-Commit-Position: refs/heads/master@{#374674}
Committed: https://crrev.com/494e3b6b092a89e8987781e9fb4412f24c0f4bd8
Cr-Commit-Position: refs/heads/master@{#377575}
Patch Set 1
Patch Set 2
Patch Set 3
Patch Set 4
Patch Set 5
Patch Set 6
Total comments: 2
Patch Set 7
Total comments: 6
Patch Set 8
Patch Set 9
Messages
Total messages: 68 (15 generated)
haraken@chromium.org changed reviewers: + keishi@chromium.org
PTAL
sigbjornf@opera.com changed reviewers: + sigbjornf@opera.com
what if this allocation fails when under memory pressure?
On 2016/02/10 08:34:54, sof wrote:
> what if this allocation fails when under memory pressure?

That's a valid concern conceptually, but in practice memory pressure listeners for renderers are not yet working well in the first place. So I might want to see crash reports if it becomes a real problem before worrying about it too much.
On 2016/02/10 08:39:28, haraken wrote:
> That's a valid concern conceptually, but in practice memory pressure listeners for renderers are not yet working well in the first place. So I might want to see crash reports if it becomes a real problem before worrying about it too much.

We cannot fail doing a GC; to me this is an upfront allocation cost to Oilpan.

(Tuning the limits used was attempted some time ago: https://codereview.chromium.org/1115993003/ )
On 2016/02/10 08:47:48, sof wrote:
> We cannot fail doing a GC; to me this is an upfront allocation cost to Oilpan.

The 640 KB is one of the top memory consumers in Blink, so we need to take some action here.

I understand your point, but I think it's a bit too premature to worry about the allocation failure at this point -- there are other scenarios where allocation may fail during a GC. For example, the second or later Block allocation may fail.
Why are there 5 stacks? Is there a design doc that explains this? I'm with haraken@ though; I don't think adding half a MB to every renderer for Oilpan is acceptable. :/
On 2016/02/10 09:04:53, esprehn wrote:
> Why are there 5 stacks? Is there a design doc that explains this?

1) Normal stack for marking objects.
2) Stack for marking objects that must be marked at the very end of the marking phase.
3) Stack for process-wide weak processing.
4) Stack for ephemerons.
5) Per-thread stack for thread-local weak processing. (This stack exists per thread.)

2) is rarely used. 3) will be gone once we implement per-thread heaps. Either way, I don't think the memory usage is a concern as long as the stack is allocated only during GCs. (It's worth considering reducing the stack size for 2), 4) and 5).)

> I'm with haraken@ though; I don't think adding half a MB to every renderer for Oilpan is acceptable. :/
On 2016/02/10 08:56:40, haraken wrote:
> I understand your point, but I think it's a bit too premature to worry about the allocation failure at this point.

Given the background of having released and worked on browsers, current & past, with GCs, it is not premature.

What problem are you really trying to solve? Having 640k of a 32-bit(?) address space taken is not that considerable.
On 2016/02/10 09:54:06, sof wrote:
> Given the background of having released and worked on browsers, current & past, with GCs, it is not premature.

I'm saying it's premature because we haven't yet handled memory pressure notifications in renderers. So we don't yet know if OOM during a memory-pressure GC is a real problem or not. I'd be more interested in trying this CL and seeing how much Oilpan crashes when allocating the CallbackStacks.

Even if we don't accept this CL, reserving 128 KB * (4 + # of threads) is too much. Oilpan should decommit more memory after finishing the marking phase. (cf. V8 decommits its marking deque after finishing the marking phase, reserving 256 KB: https://code.google.com/p/chromium/codesearch#chromium/src/v8/src/heap/heap.c...)

> What problem are you really trying to solve? Having 640k of a 32-bit(?) address space taken is not that considerable.

My concern is the physical memory behind it. If Oilpan releases the 640 KB, we can let other components use the 640 KB without causing page swapping.
On 2016/02/10 11:55:50, haraken wrote:
> Even if we don't accept this CL, reserving 128 KB * (4 + # of threads) is too much. Oilpan should decommit more memory after finishing the marking phase.

Hinting that the stack area is unused post GC (and decommitting it) would be a good idea. Let's do that instead?
When I start Chrome, 36 CallbackStacks are created. When I navigate to Twitter, (some of the 36 CallbackStacks are destroyed and) another 18 CallbackStacks are created.
On 2016/02/10 12:04:11, sof wrote:
> Hinting that the stack area is unused post GC (and decommitting it) would be a good idea. Let's do that instead?

We could do that.

But just to clarify: Are you worrying about out-of-address-space or out-of-physical-memory?

(I guess you're concerned about out-of-address-space. If you're concerned about out-of-physical-memory, there would be no difference between decommitting pages and destructing CallbackStacks.)
(Decommitting the pages is a bit complicated since we need to implement the CallbackStack's backing storage with mmap. It's just work, though.)
On 2016/02/10 12:11:37, haraken wrote:
> But just to clarify: Are you worrying about out-of-address-space or out-of-physical-memory?

The former: OOM happening while trying to service a GC. I.e., it is preferable for a mutator to GC more often when it runs up against OOM than to have it over-allocate first so that the GC is unable to do its job.

> (I guess you're concerned about out-of-address-space. If you're concerned about out-of-physical-memory, there would be no difference between decommitting pages and destructing CallbackStacks.)

Yes, I leave it to the OS to worry about paging behavior. But hinting to it that this stack region will now be unused for a while seems worthwhile.
On 2016/02/10 12:40:09, sof wrote:
> Yes, I leave it to the OS to worry about paging behavior. But hinting to it that this stack region will now be unused for a while seems worthwhile.

Updated the CL with the decommit approach. PTAL.
Description was changed from
==========
Oilpan: Reduce 640 KB from CallbackStacks

CallbackStacks are used only while Oilpan's GC is doing marking & weak processing.
However, currently each CallbackStack continues retaining one Block forever.
This wastes memory a lot. Oilpan has 5 CallbackStacks and each Block consumes
8192 * sizeof(Item) = 128 KB. This means that Oilpan wastes 128 KB * 5 = 640 KB.

This CL removes the 640 KB by explicitly initializing & shutting down the
CallbackStacks at every GC phase.

BUG=
==========
to
==========
Oilpan: Reduce 640 KB from CallbackStacks

CallbackStacks are used only while Oilpan's GC is doing marking & weak processing.
However, currently each CallbackStack continues retaining one Block forever.
This wastes memory a lot. Oilpan has 4 global CallbackStacks and 1 CallbackStack
per thread. Each Block consumes 8192 * sizeof(Item) = 128 KB. This means that
Oilpan wastes 128 KB * (4 + # of threads). When I start Chrome's new tab page,
it spawns 18 threads attached to Oilpan, meaning that it wastes 2.3 MB of memory.

This CL removes the waste by discarding system pages of the Block after
finishing every GC phase.

BUG=
==========
Description was changed: "it spawns 18 threads attached to Oilpan" was reworded to "it creates 18 CallbackStacks"; the rest of the description is unchanged.
lgtm. could you sync & update the description as well?

https://codereview.chromium.org/1686943002/diff/100001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):

https://codereview.chromium.org/1686943002/diff/100001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/CallbackStack.cpp:37: WTF::discardSystemPages(m_buffer, blockSize * sizeof(Item));
Move this below the loop?
https://codereview.chromium.org/1686943002/diff/100001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):

https://codereview.chromium.org/1686943002/diff/100001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/CallbackStack.cpp:37: WTF::discardSystemPages(m_buffer, blockSize * sizeof(Item));
On 2016/02/10 14:11:38, sof wrote:
> Move this below the loop?

Done.
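For reference, a minimal sketch of the decommit path being reviewed here, with the discard call placed below the clearing loop as requested. Only m_buffer, blockSize, Item, and WTF::discardSystemPages come from the quoted snippet; the method shape and the loop body are assumptions, not the actual patch.

// Hedged sketch, not the actual Blink code.
void CallbackStack::Block::decommit()
{
    // Clear every slot first (the "loop" referred to above); the exact body is assumed.
    for (size_t i = 0; i < blockSize; ++i)
        m_buffer[i] = Item();
    // Then hint to the OS, in a single call below the loop, that the whole backing
    // buffer is unused for now; the pages are faulted back in lazily at the next GC.
    WTF::discardSystemPages(m_buffer, blockSize * sizeof(Item));
}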
The CQ bit was checked by haraken@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from sigbjornf@opera.com
Link to the patchset: https://codereview.chromium.org/1686943002/#ps120001 (title: " ")
Description was changed: the title was updated from "Oilpan: Reduce 640 KB from CallbackStacks" to "Oilpan: Decommit backing storage of CallbackStacks"; the body is unchanged.
CQ is trying da patch.
Follow status at https://chromium-cq-status.appspot.com/patch-status/1686943002/120001
View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1686943002/120001
Message was sent while issue was closed.
Description was changed (the before and after text shown in the log are identical, so this was likely a whitespace-only edit).
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Description was changed: the Committed link (https://crrev.com/34393519671e62e5a66602736899b94ad33dd45a) and Cr-Commit-Position: refs/heads/master@{#374674} were appended after the CL landed; the body is unchanged.
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/34393519671e62e5a66602736899b94ad33dd45a Cr-Commit-Position: refs/heads/master@{#374674}
Message was sent while issue was closed.
Hmm, I'm beginning to think that the original approach (i.e., destructing the CallbackStacks every time) would be simpler and better.

PartitionAlloc never releases virtual address spaces that have been used before. So as long as we allocate CallbackStacks on PartitionAlloc, we should not hit out-of-address-space when allocating a new CallbackStack in future GCs. In addition, if we allocate CallbackStacks on PartitionAlloc, PartitionAlloc will decommit the underlying system pages only when needed.

In other words, PartitionAlloc is smart enough to meet our expectations for CallbackStacks, so I think it's better to just use it.
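To illustrate the argument, a minimal sketch of backing each Block with PartitionAlloc, assuming WTF::Partitions::fastMalloc/fastFree as the allocator entry points; the Block constructor and destructor shown here are illustrative, not the actual CallbackStack code.

// Hedged sketch: put the Block's buffer on PartitionAlloc so that destroying the
// CallbackStacks after a GC returns the memory to the partition, which keeps the
// address range and decommits unused pages itself.
CallbackStack::Block::Block()
    : m_buffer(static_cast<Item*>(
          WTF::Partitions::fastMalloc(blockSize * sizeof(Item), "blink::CallbackStack")))
{
}

CallbackStack::Block::~Block()
{
    // Freeing returns the slot to its bucket; the partition retains the address
    // space, so the next GC can reallocate it without growing the address space.
    WTF::Partitions::fastFree(m_buffer);
    m_buffer = nullptr;
}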
Message was sent while issue was closed.
On 2016/02/11 05:55:38, haraken wrote:
> So as long as we allocate CallbackStacks on PartitionAlloc, we should not hit out-of-address-space when allocating a new CallbackStack in future GCs.

But someone else might well take the CallbackStack's VM block that you released. And then you'll OOM when attempting to initiate the next garbage collection.
Message was sent while issue was closed.
On 2016/02/11 06:05:55, sof wrote:
> But someone else might well take the CallbackStack's VM block that you released. And then you'll OOM when attempting to initiate the next garbage collection.

Yes, but only if someone allocates an object that fits the same bucket (8192 * sizeof(Item) + \alpha).

I'm slightly concerned about the performance impact of decommitting because it will cause page faults at each marking phase (marking must be super fast). If we just use PartitionAlloc, the underlying system pages won't be decommitted in most cases.
Message was sent while issue was closed.
On 2016/02/11 06:17:37, haraken wrote:
> I'm slightly concerned about the performance impact of decommitting because it will cause page faults at each marking phase (marking must be super fast).

And super reliable. If that's a concern, then revert this? The original change here isn't something we should do (without going through yet another discussion loop on that).
Message was sent while issue was closed.
On 2016/02/11 06:28:26, sof wrote:
> If that's a concern, then revert this?

Yeah, let's revert this. And start yet another discussion :)
Message was sent while issue was closed.
A revert of this CL (patchset #7 id:120001) has been created in https://codereview.chromium.org/1689823003/ by haraken@chromium.org.

The reason for reverting is: This CL will cause page faults at every marking phase. Marking must be super fast, so we need to reconsider the approach.
Message was sent while issue was closed.
On 2016/02/11 06:29:59, haraken wrote:
> Yeah, let's revert this. And start yet another discussion :)

No, let's not. You're trading away Oilpan stability to avoid a 640k allocation showing up in a top-10 allocation breakdown. It's lacking in sense.
Message was sent while issue was closed.
On 2016/02/11 06:32:18, sof wrote:
> No, let's not. You're trading away Oilpan stability to avoid a 640k allocation showing up in a top-10 allocation breakdown.

Let's not what? Do you not want me to revert this CL? Or do you not want to reconsider how to address the 640k allocation (actually it's 2.3 MB of allocations, as I updated in the CL description)?
Message was sent while issue was closed.
On 2016/02/11 07:40:52, haraken wrote:
> Do you not want me to revert this CL? Or do you not want to reconsider how to address the 640k allocation?

I don't consider ps#1 worth discussing again.
Message was sent while issue was closed.
On 2016/02/11 07:53:34, sof wrote:
> I don't consider ps#1 worth discussing again.

I cannot leave the 2.3 MB of waste as it is anyway.

I'd still propose going with PS1 and seeing if we really fail at allocating CallbackStacks in the real world.

However, if you don't like that idea for some philosophical reason, another idea would be:

- Allocate 8192 slots only for Heap::s_markingStack.
- Allocate 64 slots for all other stacks.
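A minimal sketch of that sizing proposal; the block-size constructor parameter, the constants, and every name except Heap::s_markingStack are illustrative assumptions, not the actual follow-up CL.

// Hedged sketch: give only the hot marking stack a large block.
static const size_t largeBlockSize = 8192; // slots; the marking stack can get deep
static const size_t smallBlockSize = 64;   // slots; the other stacks rarely do

void Heap::init()
{
    s_markingStack = new CallbackStack(largeBlockSize);
    s_postMarkingCallbackStack = new CallbackStack(smallBlockSize); // assumed name
    s_globalWeakCallbackStack = new CallbackStack(smallBlockSize);  // assumed name
    s_ephemeronStack = new CallbackStack(smallBlockSize);           // assumed name
}

With sizeof(Item) = 16 bytes (8192 slots = 128 KB, per the description), a 64-slot block is 1 KB, so the always-reserved footprint would drop from 128 KB per stack to 128 KB for the marking stack plus roughly 1 KB for each of the other stacks and each thread.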
Message was sent while issue was closed.
On 2016/02/11 08:07:24, haraken wrote:
> However, if you don't like that idea for some philosophical reason, another idea would be:
> - Allocate 8192 slots only for Heap::s_markingStack.
> - Allocate 64 slots for all other stacks.

Philosophical is your assessment, not mine. I don't want to do this, so you'll have to find someone else to agree with your view of doing repeated allocations per GC instead.

(Tuning limits is what https://codereview.chromium.org/1115993003/ experimented with.)
Message was sent while issue was closed.
On 2016/02/11 08:10:58, sof wrote:
> Philosophical is your assessment, not mine. I don't want to do this, so you'll have to find someone else to agree with your view of doing repeated allocations per GC instead.

I'm saying "philosophical" in the sense that we don't yet have any data. That's why I'm proposing to try PS1 and get data about the real world.

> (Tuning limits is what https://codereview.chromium.org/1115993003/ experimented with.)

So I'm fine with this approach as well (at least it's much better than doing nothing). I'll experiment with the threshold tomorrow if you're okay with it.
Message was sent while issue was closed.
On 2016/02/11 08:16:12, haraken wrote:
> On 2016/02/11 08:10:58, sof wrote:
> > On 2016/02/11 08:07:24, haraken wrote:
> > > On 2016/02/11 07:53:34, sof wrote:
> > > > On 2016/02/11 07:40:52, haraken wrote:
> > > > > On 2016/02/11 06:32:18, sof wrote:
> > > > > > On 2016/02/11 06:29:59, haraken wrote:
> > > > > > > On 2016/02/11 06:28:26, sof wrote:
> > > > > > > > On 2016/02/11 06:17:37, haraken wrote:
> > > > > > > > > On 2016/02/11 06:05:55, sof wrote:
> > > > > > > > > > On 2016/02/11 05:55:38, haraken wrote:
> > > > > > > > > > > Hmm, I begin to think that the original approach (i.e., destruct CallbackStacks every time) would be simpler and better.
> > > > > > > > > > >
> > > > > > > > > > > PartitionAlloc never releases virtual address spaces that have been used before. So as long as we allocate CallbackStacks on PartitionAlloc, we should not hit out-of-address-space when allocating a new CallbackStack in future GCs. In addition, if we allocate CallbackStacks on PartitionAlloc, PartitionAlloc will decommit the underlying system pages only when needed.
> > > > > > > > > > >
> > > > > > > > > > > In other words, PartitionAlloc is smart enough to support our expectations on CallbackStacks, so I think it's better to just use it.
> > > > > > > > > >
> > > > > > > > > > But someone else might well take the CallbackStack's VM block that you released. And then you'll OOM when attempting to initiate the next garbage collection.
> > > > > > > > >
> > > > > > > > > Yes, but only if there is a guy that allocates an object that fits the same bucket (8192 * sizeof(Item) + \alpha).
> > > > > > > > >
> > > > > > > > > I'm slightly concerned about the performance impact of decommitting because it will cause page faults at each marking phase (marking must be super fast). If we just use PartitionAlloc, the underlying system pages won't be decommitted in most cases.
> > > > > > > >
> > > > > > > > And super reliable. If that's a concern, then revert this? The original change here isn't something we should do (without going through yet another discussion loop on that.)
> > > > > > >
> > > > > > > Yeah, let's revert this. And start yet another discussion :)
> > > > > >
> > > > > > No, let's not. You're trading away Oilpan stability to avoid a 640k allocation showing up in a top 10 allocation breakdown. It's lacking in sense.
> > > > >
> > > > > Let's not for what? Do you not want me to revert this CL? Or do you not want to reconsider how to address the 640k allocation (actually it's 2.3 MB of allocations as I updated the CL description)?
> > > >
> > > > I don't consider ps#1 worth discussing again.
> > >
> > > I cannot leave the waste of 2.3 MB anyway.
> > >
> > > I'd still propose going with PS1 and see if we really fail at allocating CallbackStacks in real worlds.
> > >
> > > However, if you don't like the idea for some philosophical reason, another idea would be:
> > >
> > > - Allocate 8192 slots only for Heap::s_markingStack.
> > > - Allocate 64 slots for all other stacks.
> >
> > Philosophical is your assessment, not mine. I don't want to do this, so you'll have to find someone else to agree with your view of doing repeated allocations per GC instead.
>
> I'm saying philosophical in a sense that we don't yet have any data. That's why I'm proposing to try PS1 and get data to understand a real world.
>
> > (Tuning limits is what https://codereview.chromium.org/1115993003/ experimented with.)
>
> So I'm fine with this approach as well (at least it's much better than doing nothing). I'll experiment with the threshold tomorrow if you're okay with it.

Updated a new CL: https://codereview.chromium.org/1695653004/
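For context on the patch set 1 ("destruct CallbackStacks every time") approach debated above, here is a minimal sketch assuming Oilpan-like naming; apart from Heap::s_markingStack, the stack names and the preGC()/postGC() hooks are illustrative assumptions, not the actual Oilpan code:

class CallbackStack {
    // Block management elided; in the real class one Block holds 8192 Items.
};

class Heap {
public:
    static void preGC() {
        // Allocate the stacks fresh for this collection. The cost is one
        // allocation per stack per GC; the risk is that an allocation can
        // fail under memory pressure, and a GC cannot tolerate failure here.
        s_markingStack = new CallbackStack();
        s_postMarkingCallbackStack = new CallbackStack();
        s_weakCallbackStack = new CallbackStack();
        s_ephemeronStack = new CallbackStack();
    }

    static void postGC() {
        // Release everything once marking and weak processing are done,
        // so no Block is retained between GCs.
        delete s_markingStack;
        delete s_postMarkingCallbackStack;
        delete s_weakCallbackStack;
        delete s_ephemeronStack;
        s_markingStack = nullptr;
        s_postMarkingCallbackStack = nullptr;
        s_weakCallbackStack = nullptr;
        s_ephemeronStack = nullptr;
    }

private:
    static CallbackStack* s_markingStack;
    static CallbackStack* s_postMarkingCallbackStack;
    static CallbackStack* s_weakCallbackStack;
    static CallbackStack* s_ephemeronStack;
};

CallbackStack* Heap::s_markingStack = nullptr;
CallbackStack* Heap::s_postMarkingCallbackStack = nullptr;
CallbackStack* Heap::s_weakCallbackStack = nullptr;
CallbackStack* Heap::s_ephemeronStack = nullptr;

The trade-off argued over in the thread is visible here: nothing is retained between collections, but every GC must perform allocations that are not allowed to fail.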
Message was sent while issue was closed.
https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/Heap.cpp (right):

https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/Heap.cpp:360: s_markingStack->decommit();
An alternative scheme would be for Heap to reserve memory for all the initial CallbackStacks in one chunk & then create the stacks with initial blocks set up to offset into that reserved block.

That way you could decommit the chunk in one operation, and free CallbackStack::Block from having to decommit themselves individually. Any extra blocks needed by a CallbackStack would then not be allocated in "decommittable-memory", which is "fine" (assuming any allocation like this is acceptable during marking) as they'll be deallocated shortly thereafter anyway.
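A minimal sketch of the reserve-one-chunk scheme described in this comment, assuming Linux-style page primitives (mmap/madvise stand in for whatever the platform layer actually exposes); the class name, block size, and stack count are illustrative assumptions, not the real Heap code:

#include <sys/mman.h>
#include <cassert>
#include <cstddef>

class CallbackStackArena {
public:
    static const std::size_t kBlockBytes = 8192 * 16;  // ~128 KB per initial Block (assumed Item size)
    static const std::size_t kStackCount = 5;          // e.g. 4 global stacks + 1 per-thread stack

    CallbackStackArena()
        : m_base(mmap(nullptr, kBlockBytes * kStackCount, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)) {
        assert(m_base != MAP_FAILED);
    }

    // Initial block for the i-th stack; any extra blocks a stack needs would
    // come from the normal allocator and be freed after the GC phase.
    void* initialBlock(std::size_t i) {
        assert(i < kStackCount);
        return static_cast<char*>(m_base) + i * kBlockBytes;
    }

    // One call releases the physical pages behind every initial block while
    // keeping the virtual mapping reserved for the next GC.
    void decommitAll() {
        madvise(m_base, kBlockBytes * kStackCount, MADV_DONTNEED);
    }

private:
    void* m_base;
};

With this layout, dropping the backing pages of all initial blocks is a single system call, which is the "less system call traffic" point raised later in the thread.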
Message was sent while issue was closed.
I confirmed that this CL doesn't regress frame_times on Linux: http://haraken.info/a/results.html

On 2016/02/24 09:29:39, sof wrote:
> https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
> File third_party/WebKit/Source/platform/heap/Heap.cpp (right):
>
> https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
> third_party/WebKit/Source/platform/heap/Heap.cpp:360: s_markingStack->decommit();
> An alternative scheme would be for Heap to reserve memory for all the initial CallbackStacks in one chunk & then create the stacks with initial blocks set up to offset into that reserved block.
>
> That way you could decommit the chunk in one operation, and free CallbackStack::Block from having to decommit themselves individually. Any extra blocks needed by a CallbackStack would then not be allocated in "decommittable-memory", which is "fine" (assuming any allocation like this is acceptable during marking) as they'll be deallocated shortly thereafter anyway.

Would there be any benefit of doing the decommit in one operation? Even if we do that in one operation, the number of system pages we have to decommit won't change.
Message was sent while issue was closed.
On 2016/02/24 09:42:22, haraken wrote:
> I confirmed that this CL doesn't regress frame_times on Linux: http://haraken.info/a/results.html
>
> On 2016/02/24 09:29:39, sof wrote:
> > https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
> > File third_party/WebKit/Source/platform/heap/Heap.cpp (right):
> >
> > https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
> > third_party/WebKit/Source/platform/heap/Heap.cpp:360: s_markingStack->decommit();
> > An alternative scheme would be for Heap to reserve memory for all the initial CallbackStacks in one chunk & then create the stacks with initial blocks set up to offset into that reserved block.
> >
> > That way you could decommit the chunk in one operation, and free CallbackStack::Block from having to decommit themselves individually. Any extra blocks needed by a CallbackStack would then not be allocated in "decommittable-memory", which is "fine" (assuming any allocation like this is acceptable during marking) as they'll be deallocated shortly thereafter anyway.
>
> Would there be any benefit of doing the decommit in one operation? Even if we do that in one operation, the number of system pages we have to decommit won't change.

Less system call "traffic".
Message was sent while issue was closed.
https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):

https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/CallbackStack.cpp:115: m_first->decommit();
Isn't this call redundant/unwanted if the stack itself is decommitted in the end?
Message was sent while issue was closed.
I'll implement the per-heap-reserved-stack-region if the number of system calls really matters, but I don't really want to introduce the complexity at this point.

https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):

https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/CallbackStack.cpp:115: m_first->decommit();
On 2016/02/24 09:50:05, sof wrote:
> Isn't this call redundant/unwanted if the stack itself is decommitted in the end?

This decommit is only enabled in assert builds where we want to explicitly null out the memory of the unused CallbackStack.
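A minimal sketch of the assert-build-only clearing this reply describes (which the follow-up below turns into plain clearing); Item, Block, and the NDEBUG guard are stand-ins for the real CallbackStack internals and Blink's assert configuration, not the actual code:

#include <cstddef>

struct Item {
    void* object;
    void (*callback)(void*);
};

class Block {
public:
    static const std::size_t blockSize = 8192;

    void clear() {
#ifndef NDEBUG
        // Debug builds only: poison the retained-but-unused block so a stale
        // callback left over from the previous GC phase is caught immediately
        // instead of being silently reused.
        for (Item& item : m_buffer)
            item = Item{nullptr, nullptr};
#endif
    }

private:
    Item m_buffer[blockSize];
};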
Message was sent while issue was closed.
Description was changed; the before and after description texts are identical.
https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):

https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/CallbackStack.cpp:115: m_first->decommit();
On 2016/02/24 10:06:19, haraken wrote:
> On 2016/02/24 09:50:05, sof wrote:
> > Isn't this call redundant/unwanted if the stack itself is decommitted in the end?
>
> This decommit is only enabled in assert builds where we want to explicitly null out the memory of the unused CallbackStack.

Shouldn't it just do the clearing then?
On 2016/02/24 10:06:20, haraken wrote:
> I'll implement the per-heap-reserved-stack-region if the number of system calls really matters, but I don't really want to introduce the complexity at this point.

ok, and how will you gauge if it matters to have a contiguous block or not?
https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):

https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/CallbackStack.cpp:115: m_first->decommit();
On 2016/02/24 10:07:39, sof wrote:
> On 2016/02/24 10:06:19, haraken wrote:
> > On 2016/02/24 09:50:05, sof wrote:
> > > Isn't this call redundant/unwanted if the stack itself is decommitted in the end?
> >
> > This decommit is only enabled in assert builds where we want to explicitly null out the memory of the unused CallbackStack.
>
> Shouldn't it just do the clearing then?

Done.
On 2016/02/24 10:11:46, sof wrote:
> On 2016/02/24 10:06:20, haraken wrote:
> > I'll implement the per-heap-reserved-stack-region if the number of system calls really matters, but I don't really want to introduce the complexity at this point.
>
> ok, and how will you gauge if it matters to have a contiguous block or not?

So the ideal long-term fix would be to implement a contiguous and reserved memory pool shared by all CallbackStacks. If this CL doesn't introduce any actual regression, I'd like to avoid introducing extra complexity in this CL.
On 2016/02/24 10:23:39, haraken wrote:
> On 2016/02/24 10:11:46, sof wrote:
> > On 2016/02/24 10:06:20, haraken wrote:
> > > I'll implement the per-heap-reserved-stack-region if the number of system calls really matters, but I don't really want to introduce the complexity at this point.
> >
> > ok, and how will you gauge if it matters to have a contiguous block or not?
>
> So the ideal long-term fix would be to implement a contiguous and reserved memory pool shared by all CallbackStacks. If this CL doesn't introduce any actual regression, I'd like to avoid introducing extra complexity in this CL.

Sounds like a sound plan; what about tuning down the size of the ephemeron stack, handled separately?
https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):

https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/platform/heap/CallbackStack.cpp:115: m_first->decommit();
On 2016/02/24 10:21:44, haraken wrote:
> On 2016/02/24 10:07:39, sof wrote:
> > On 2016/02/24 10:06:19, haraken wrote:
> > > On 2016/02/24 09:50:05, sof wrote:
> > > > Isn't this call redundant/unwanted if the stack itself is decommitted in the end?
> > >
> > > This decommit is only enabled in assert builds where we want to explicitly null out the memory of the unused CallbackStack.
> >
> > Shouldn't it just do the clearing then?
>
> Done.

Not yet uploaded?
On 2016/02/24 10:25:57, sof wrote:
> On 2016/02/24 10:23:39, haraken wrote:
> > On 2016/02/24 10:11:46, sof wrote:
> > > On 2016/02/24 10:06:20, haraken wrote:
> > > > I'll implement the per-heap-reserved-stack-region if the number of system calls really matters, but I don't really want to introduce the complexity at this point.
> > >
> > > ok, and how will you gauge if it matters to have a contiguous block or not?
> >
> > So the ideal long-term fix would be to implement a contiguous and reserved memory pool shared by all CallbackStacks. If this CL doesn't introduce any actual regression, I'd like to avoid introducing extra complexity in this CL.
>
> Sounds like a sound plan; what about tuning down the size of the ephemeron stack, handled separately?

What size do you prefer? :)

Another (more) important size is that of the thread-local weak-processing stack. There are multiple threads in the renderer, and each thread allocates 128 KB for its thread-local weak-processing stack. 128 KB would be a bit too much for non-main threads (e.g., the HTML parser thread).
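A hedged sketch of the size-tuning idea discussed here (to be handled in a separate CL); the constants and the blockSize constructor parameter are illustrative assumptions, not the actual CallbackStack interface:

#include <cstddef>

// Possible per-stack initial block sizes (in Items); purely illustrative numbers.
const std::size_t kMarkingStackBlockSize = 8192;         // hot during marking, keep it large
const std::size_t kDefaultStackBlockSize = 64;           // ephemeron / post-marking / global weak stacks
const std::size_t kThreadLocalWeakStackBlockSize = 256;  // one per thread, so keep the footprint small

class CallbackStack {
public:
    explicit CallbackStack(std::size_t blockSize) : m_blockSize(blockSize) {}
    std::size_t blockSize() const { return m_blockSize; }
    // Block allocation would size each Block from m_blockSize instead of a
    // single compile-time constant shared by every stack.
private:
    std::size_t m_blockSize;
};

// Hypothetical usage:
//   CallbackStack markingStack(kMarkingStackBlockSize);
//   CallbackStack threadLocalWeakStack(kThreadLocalWeakStackBlockSize);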
On 2016/02/24 10:32:23, sof wrote:
> https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
> File third_party/WebKit/Source/platform/heap/CallbackStack.cpp (right):
>
> https://codereview.chromium.org/1686943002/diff/120001/third_party/WebKit/Sou...
> third_party/WebKit/Source/platform/heap/CallbackStack.cpp:115: m_first->decommit();
> On 2016/02/24 10:21:44, haraken wrote:
> > On 2016/02/24 10:07:39, sof wrote:
> > > On 2016/02/24 10:06:19, haraken wrote:
> > > > On 2016/02/24 09:50:05, sof wrote:
> > > > > Isn't this call redundant/unwanted if the stack itself is decommitted in the end?
> > > >
> > > > This decommit is only enabled in assert builds where we want to explicitly null out the memory of the unused CallbackStack.
> > >
> > > Shouldn't it just do the clearing then?
> >
> > Done.
>
> Not yet uploaded?

Done.
lgtm. (tuning sizes separately makes more sense.)
On 2016/02/24 10:42:43, sof wrote:
> lgtm.
>
> (tuning sizes separately makes more sense.)

Thanks, I'll wait for yutak's results on mobile and land this.
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1686943002/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1686943002/160001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1686943002/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1686943002/160001
Message was sent while issue was closed.
Description was changed; the before and after description texts are identical.
Message was sent while issue was closed.
Committed patchset #9 (id:160001)
Message was sent while issue was closed.
Description was changed to append the second landing:

Committed: https://crrev.com/494e3b6b092a89e8987781e9fb4412f24c0f4bd8
Cr-Commit-Position: refs/heads/master@{#377575}
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/494e3b6b092a89e8987781e9fb4412f24c0f4bd8
Cr-Commit-Position: refs/heads/master@{#377575}