Created: 4 years, 10 months ago by haraken
Modified: 4 years, 9 months ago
Reviewers: sof, jochen (gone - plz use gerrit), esprehn, Hannes Payer (out of office), Yuta Kitamura, oilpan-reviews
CC: chromium-reviews, oilpan-reviews, Mads Ager (chromium), blink-reviews, kinuko+watch, kouhei+heap_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public.
Description
Oilpan: Reduce the reserved size of CallbackStacks
CallbackStacks are used only while Oilpan's GC is doing marking & weak processing.
However, each CallbackStack currently retains one Block forever,
which wastes a lot of memory. Oilpan has 4 global CallbackStacks and 1 CallbackStack per thread.
Each Block consumes 8192 * sizeof(Item) = 128 KB. This means that Oilpan wastes 128 KB * (4 + # of threads).
When I start Chrome's new tab page, it creates 18 CallbackStacks,
meaning that it wastes 128 KB * 18 = 2.3 MB of memory.
This CL reduces the reserved size down to 350 KB.
BUG=
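
(For reference, a minimal sketch of the sizing arithmetic above; the Item layout here is an assumption standing in for the real CallbackStack::Item, and the thread count is taken from the new-tab-page observation in the description.)

#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for CallbackStack::Item: one object pointer plus one
// callback pointer, i.e. 16 bytes on a 64-bit build.
struct Item {
    void* object;
    void (*callback)(void*, void*);
};

int main() {
    const size_t blockEntries = 8192;                         // entries per Block
    const size_t blockBytes = blockEntries * sizeof(Item);    // 8192 * 16 = 128 KB
    const size_t stacks = 4 + 14;                             // 4 global + ~14 per-thread stacks = 18
    std::printf("per-block: %zu KB, total: %zu KB\n",
                blockBytes / 1024, stacks * blockBytes / 1024);
    // per-block: 128 KB, total: 2304 KB (~2.3 MB)
    return 0;
}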
Patch Set 1
Patch Set 2
Total comments: 1
Messages
Total messages: 27 (4 generated)
Description was changed from

==========
Oilpan: Reduce the reserved size of CallbackStacks

CallbackStacks are used only while Oilpan's GC is doing marking & weak processing.
However, currently each CallbackStack continues retaining one Block forever.
This wastes memory a lot. Oilpan has 4 global CallbackStacks and 1 CallbackStack per thread.
Each Block consumes 8192 * sizeof(Item) = 128 KB. This means that Oilpan wastes 128 KB * (4 + # of threads).
When I start Chrome's new tab page, it creates 18 CallbackStacks,
meaning that it wastes 128 KB * 18 = 2.3 MB of memory.
This CL reduces the reserved size down to MB.

BUG=
==========

to

==========
Oilpan: Reduce the reserved size of CallbackStacks

CallbackStacks are used only while Oilpan's GC is doing marking & weak processing.
However, currently each CallbackStack continues retaining one Block forever.
This wastes memory a lot. Oilpan has 4 global CallbackStacks and 1 CallbackStack per thread.
Each Block consumes 8192 * sizeof(Item) = 128 KB. This means that Oilpan wastes 128 KB * (4 + # of threads).
When I start Chrome's new tab page, it creates 18 CallbackStacks,
meaning that it wastes 128 KB * 18 = 2.3 MB of memory.
This CL reduces the reserved size down to 350 KB.

BUG=
==========
haraken@chromium.org changed reviewers: + oilpan-reviews@chromium.org, sigbjornf@opera.com
PTAL

(Another idea would be to create a pool for Blocks, but I'm not happy about introducing that complexity until we have data that justifies it.)
I would argue that it misses the point somewhat to view these stacks as "wasted" memory -- we don't normally consider a thread stack (8M?) to be a wasted pre-allocation. If any of these stacks are oversized with respect to their actual use, then we should reduce them.

https://codereview.chromium.org/1695653004/diff/20001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/heap/Heap.cpp (right):

https://codereview.chromium.org/1695653004/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/heap/Heap.cpp:161: s_markingStack = new CallbackStack(4096);
These are fairly arbitrary numbers; are they based on experiments showing that they are reasonable averages / N-percentiles of what's commonly seen?
On 2016/02/14 21:47:04, sof wrote:
> If any of these stacks are oversized with respect to their actual use, then we should reduce them.
> These are fairly arbitrary numbers; are they based on experiments showing that they are reasonable averages / N-percentiles of what's commonly seen?

As far as I can tell from running telemetry benchmarks, it is common for the marking stack to exceed the 4096 limit, or even the current 8192 limit. If we really want to make sure that the CallbackStack's allocation doesn't fail, we need to pre-allocate a fairly large amount of memory for the CallbackStacks (e.g., 4 MB per CallbackStack). However, that increases the memory pressure when a GC is not running. We should consider a way to minimize the pressure.

Pre-allocating big CallbackStacks would be the most straightforward solution to prevent a near-OOM GC from failing. However, there are other solutions that would prevent a near-OOM GC from failing while consuming less memory. For example:

- (If you're concerned about running out of address space rather than out of physical memory,) reserve 4 MB of address space per CallbackStack. Keep only the first 4096 entries resident and decommit the rest (see the sketch below).
- Create a pool of Blocks that can be shared by the CallbackStacks.
- Near-OOM scenarios must be handled by the MemoryPressureListener (V8 GCs and Oilpan GCs are triggered by the memory pressure notifications). We can make the MemoryPressureListener purge other memory before triggering V8 GCs and Oilpan GCs. The purging will then release memory which can be used by the Oilpan GCs. For example, just calling partitionPurgeMemory will release a bunch of memory. That said, as I mentioned before, the MemoryPressureListener is not yet fully working. This means that we don't yet even have an infrastructure for triggering Oilpan GCs in a near-OOM scenario (the tab-discarding team is now working on this). That is the problem we should solve first before worrying about allocation failure in the CallbackStacks -- IMHO, it seems premature to worry about the allocation failure before fixing the larger problem and getting data that shows the allocation failure is a problem in the real world.

FWIW, V8 pre-allocates only 256 KB of memory for its marking stack.
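
(A rough sketch of that first option, reserving address space and committing only the head, written with plain POSIX calls as a stand-in for whatever allocator primitives Chromium would actually use; names and sizes are illustrative.)

#include <sys/mman.h>
#include <cstddef>

const size_t kReservedBytes  = 4 * 1024 * 1024;  // 4 MB of address space
const size_t kCommittedBytes = 64 * 1024;        // ~4096 16-byte entries kept resident

struct ReservedStack {
    char* base = nullptr;
    size_t committed = 0;

    bool init() {
        // Reserve the full range without making it accessible (no commit).
        void* p = mmap(nullptr, kReservedBytes, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return false;
        base = static_cast<char*>(p);
        return commit(kCommittedBytes);
    }

    // Make the first |bytes| of the reservation usable.
    bool commit(size_t bytes) {
        if (mprotect(base, bytes, PROT_READ | PROT_WRITE) != 0)
            return false;
        committed = bytes;
        return true;
    }

    // After a GC, drop everything beyond the small head back to the OS.
    void shrinkAfterGC() {
        madvise(base + kCommittedBytes, committed - kCommittedBytes, MADV_DONTNEED);
        mprotect(base + kCommittedBytes, kReservedBytes - kCommittedBytes, PROT_NONE);
        committed = kCommittedBytes;
    }
};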
On 2016/02/15 01:28:06, haraken wrote:
> FWIW, V8 pre-allocates only 256 KB of memory for its marking stack.

I've been looking at that implementation to understand it better; it commits 256 KB, but reserves 4 MB for its marking deque, with the not-unusual bailout handling on marking stack overflow of just giving up and considering the rest marked & reachable rather than attempting to extend the stack further. Oilpan could do something similar to avoid allocations during marking.

Reducing the initial size of the ephemeron stack seems reasonable without upsetting stability.
On 2016/02/16 13:48:37, sof wrote:
> Reducing the initial size of the ephemeron stack seems reasonable without upsetting stability.

I want to reduce the size of the CallbackStacks by the next branch cut. In conclusion, what's your proposal?
On 2016/02/22 11:38:00, haraken wrote:
> I want to reduce the size of the CallbackStacks by the next branch cut. In conclusion, what's your proposal?

Reduce the ephemeron stack a bit; anything else wouldn't be wise in my view.
On 2016/02/22 11:39:57, sof wrote:
> Reduce the ephemeron stack a bit; anything else wouldn't be wise in my view.

What about s_postMarkingCallbackStack?

Reserving 2.3 MB of memory just for Oilpan is not acceptable anyway. If you really don't want to just reduce the size and collect UMAs, I can implement whatever complicated mechanism is needed (e.g., block pools, bailing out of a GC when the marking stack overflows, etc.) to reduce the size of the CallbackStacks, but what do you want exactly?
On 2016/02/22 11:52:30, haraken wrote:
> What about s_postMarkingCallbackStack?

I've found the depths of that stack to be considerable; have you investigated what's typically encountered?

> Reserving 2.3 MB of memory just for Oilpan is not acceptable anyway. If you really don't want to just reduce the size and collect UMAs, I can implement whatever complicated mechanism is needed (e.g., block pools, bailing out of a GC when the marking stack overflows, etc.) to reduce the size of the CallbackStacks, but what do you want exactly?

I don't want to compromise system stability; the size you quote compares favorably to what V8 reserves. I honestly don't see the merit of "hiding" a fixed reservation by re-doing it for every GC, which is what the proposals here reduce to, afaict. But I've said that before.

Should there be hard limits on stack sizes at which the marking process just bails out or tries to recover? It's worth considering, but that's extra complexity that would be mostly orthogonal to your direct concerns here.

Have you consulted the V8 GC people about the nature of the proposed change? Their judgment/input would be valuable to have, regardless of the conclusion.
On 2016/02/22 12:26:16, sof wrote:
> I don't want to compromise system stability; the size you quote compares favorably to what V8 reserves. I honestly don't see the merit of "hiding" a fixed reservation by re-doing it for every GC, which is what the proposals here reduce to, afaict. But I've said that before.

If you're worried about system stability, I'd argue the other way. On Android, a (foreground) renderer may be forcibly killed without receiving any pressure notification. If we waste 2.3 MB of memory, we increase the risk of that force-kill. If this CL doesn't cause any allocation failure in reality, then this CL is the better option. That's why I'm saying we should get data to see whether allocation failure becomes a real issue or not.

> Should there be hard limits on stack sizes at which the marking process just bails out or tries to recover? It's worth considering, but that's extra complexity that would be mostly orthogonal to your direct concerns here.
>
> Have you consulted the V8 GC people about the nature of the proposed change? Their judgment/input would be valuable to have, regardless of the conclusion.

I'll ask the V8 folks for advice.
haraken@chromium.org changed reviewers: + esprehn@chromium.org, hpayer@chromium.org, jochen@chromium.org
I discussed this with Jochen and Hannes.

- V8 reserves 4 MB of address space and commits 256 KB of physical memory for its marking stack. V8 has special logic for the case where the marking stack overflows during marking (as described in #6).

- In the long term, Oilpan should do the same thing. Oilpan should implement a memory pool shared by all CallbackStacks in the renderer process. The memory pool should reserve X MB of address space and Y KB of physical memory, and Oilpan should have logic for the case where the stack overflows.

- In the short term, we need some band-aid fix to reduce the 2.3 MB of physical memory (in M50). One possible fix would be just to reduce the default size of the CallbackStacks (i.e., this CL) and add RELEASE_ASSERTs to understand how often a GC fails at expanding the marking stack.

Any thoughts?
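
(A rough sketch of what that short-term RELEASE_ASSERT band-aid could look like; Block and allocateBlockOrCrash are illustrative names and the RELEASE_ASSERT stand-in is simplified, not the actual CallbackStack code in platform/heap.)

#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for Blink's RELEASE_ASSERT (wtf/Assertions.h), which also crashes
// in release builds.
#define RELEASE_ASSERT(condition) \
    do { if (!(condition)) std::abort(); } while (0)

// Hypothetical Block type standing in for CallbackStack::Block; the real
// class looks different.
struct Block {
    explicit Block(size_t entryCount) : entryCount(entryCount) {}
    size_t entryCount;
};

// The band-aid idea: keep the small default reservation, but crash loudly if
// a GC ever fails to grow its marking stack, so crash reports show how often
// the small default is actually insufficient.
Block* allocateBlockOrCrash(size_t entryCount) {
    Block* block = new (std::nothrow) Block(entryCount);
    RELEASE_ASSERT(block);
    return block;
}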
On 2016/02/23 14:55:24, haraken wrote:
> - In the short term, we need some band-aid fix to reduce the 2.3 MB of physical memory (in M50). One possible fix would be just to reduce the default size of the CallbackStacks (i.e., this CL) and add RELEASE_ASSERTs to understand how often a GC fails at expanding the marking stack.
>
> Any thoughts?

Thanks for doing that. Dynamic allocations during marking are not something to rely on, or to do more of.

Unless we trust the OS's VMM to know how best to handle a periodically used area that's accessed in a stack-like fashion, having one block reserved for the CallbackStacks that's decommit-hinted upon GC finish is one thing to do (i.e., similar to what previously landed here). UMA recording would be preferable to release asserts if you want to gauge high-water marks on stack use?

(I don't understand what the term "physical memory" means in the above.)
On 2016/02/23 15:18:37, sof wrote:
> Unless we trust the OS's VMM to know how best to handle a periodically used area that's accessed in a stack-like fashion, having one block reserved for the CallbackStacks that's decommit-hinted upon GC finish is one thing to do (i.e., similar to what previously landed here). UMA recording would be preferable to release asserts if you want to gauge high-water marks on stack use?

I'm not sure how much we can rely on the OS (I suspect that if we explicitly say "decommit it!", the OS will just decommit it regardless of the historical access pattern), but I'm fine with trying the approach. Maybe we should measure the performance of the previous CL (https://codereview.chromium.org/1686943002/)?

> (I don't understand what the term "physical memory" means in the above.)

By "physical memory" I meant "committed memory".
haraken@chromium.org changed reviewers: + yutak@chromium.org
Yuta-san: Would you mind applying https://codereview.chromium.org/1686943002/ and measuring smoothness.tough_animation_cases on mobile? We're curious whether the CL regresses marking pause time (i.e., frame_times).

(We need to land some fix by Friday.)
On 2016/02/23 20:53:18, haraken wrote:
> Yuta-san: Would you mind applying https://codereview.chromium.org/1686943002/ and measuring smoothness.tough_animation_cases on mobile? We're curious whether the CL regresses marking pause time (i.e., frame_times).
>
> (We need to land some fix by Friday.)

I can see how it behaves on Windows tomorrow. The semantics of MADV_FREE fit best with the hinting I think you're after, which (still) means OS X only.
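
(For illustration, a small sketch of such a decommit hint; not Blink's actual code path, just the POSIX calls being referred to.)

#include <cstddef>
#include <sys/mman.h>

// Where MADV_FREE is available (OS X, and only much later Linux kernels), the
// kernel may reclaim the pages lazily under memory pressure; elsewhere
// MADV_DONTNEED drops them right away. Error handling omitted for brevity.
void decommitHint(void* start, size_t length) {
#if defined(MADV_FREE)
    madvise(start, length, MADV_FREE);
#else
    madvise(start, length, MADV_DONTNEED);
#endif
}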
On 2016/02/23 20:53:18, haraken wrote:
> Yuta-san: Would you mind applying https://codereview.chromium.org/1686943002/ and measuring smoothness.tough_animation_cases on mobile? We're curious whether the CL regresses marking pause time (i.e., frame_times).
>
> (We need to land some fix by Friday.)

Whoops, sorry, this message was buried in the flood of emails after a two-day offsite, and I've only just noticed it. I don't have time today, so I'll take a look tomorrow.
On 2016/02/24 09:51:17, Yuta Kitamura wrote:
> I don't have time today, so I'll take a look tomorrow.

Thanks!
Yuta-san: Did you get any results for the above? I need to land https://codereview.chromium.org/1686943002/ (or another CL if the performance isn't acceptable) by the end of today to push it to M50.
On 2016/02/25 09:22:23, haraken wrote:
> Yuta-san: Did you get any results for the above? I need to land https://codereview.chromium.org/1686943002/ (or another CL if the performance isn't acceptable) by the end of today to push it to M50.

Not yet. The measurement system has become quite unstable, and progress isn't good (< 10% right now). It's unlikely that I'll get meaningful results by the end of today.
On 2016/02/25 09:53:04, Yuta Kitamura wrote:
> Not yet. The measurement system has become quite unstable, and progress isn't good (< 10% right now). It's unlikely that I'll get meaningful results by the end of today.

OK, thanks. Then it looks better to land the CL and see what the perf bots say. I'll do that.
Something to consider: as Blink does most of its Oilpan allocation on the main thread, the thread-local callback stack reservation for other threads doesn't have to be equal to the main thread's.
On 2016/02/29 10:08:30, sof wrote:
> Something to consider: as Blink does most of its Oilpan allocation on the main thread, the thread-local callback stack reservation for other threads doesn't have to be equal to the main thread's.

What size do you prefer? Using CallbackStack::kMinimalBlockSize?

(As I said before, my preference is just removing the reservations and adding RELEASE_ASSERTs on OOM, but using CallbackStack::kMinimalBlockSize is also fine.)
On 2016/02/29 10:57:34, haraken wrote:
> What size do you prefer? Using CallbackStack::kMinimalBlockSize?
>
> (As I said before, my preference is just removing the reservations and adding RELEASE_ASSERTs on OOM, but using CallbackStack::kMinimalBlockSize is also fine.)

To evaluate an appropriate initial block size, you need to look at the use of weak collections + registerWeakMembers() off the main thread. That might be limited to LifecycleNotifier<>'s weak set for workers.
On 2016/02/29 12:22:43, sof wrote:
> To evaluate an appropriate initial block size, you need to look at the use of weak collections + registerWeakMembers() off the main thread. That might be limited to LifecycleNotifier<>'s weak set for workers.

https://codereview.chromium.org/1750553002/ does this.

With plenty of extra room, I'm now convinced.