Chromium Code Reviews
Created: 5 years, 7 months ago by haraken
Modified: 5 years, 6 months ago
CC: blink-reviews, oilpan-reviews, kouhei+heap_chromium.org, Mads Ager (chromium)
Base URL: svn://svn.chromium.org/blink/trunk
Target Ref: refs/heads/master
Project: blink
Visibility: Public
Description
Oilpan: Tweak a GC threshold to consider memory increase in PartitionAlloc
There is a regression in peak memory usage in the memory.top_7_stress benchmark.
The regression was (not entirely, but mostly) fixed by making Oilpan consider
the memory usage of PartitionAlloc (https://codereview.chromium.org/1063083002/).
However, it seems that we unintentionally disabled the fix by landing
https://codereview.chromium.org/1100283003/, so we're now hitting
the peak memory regression again. This CL fixes the regression again.
The problem is as follows:
1) The GC threshold in https://codereview.chromium.org/1063083002/ was problematic
because Oilpan could trigger a GC before allocating any object on the heap.
2) https://codereview.chromium.org/1100283003/ fixed that problem by
changing the GC threshold so that Oilpan doesn't trigger a GC until Oilpan has allocated
1 MB of objects.
3) However, that GC threshold was problematic in another way: if Oilpan allocates
800 KB of objects and those objects hold 30 MB of memory in PartitionAlloc,
Oilpan doesn't trigger a GC and thus increases the peak memory usage.
4) So this CL fixes the GC threshold so that:
- Oilpan triggers a GC if Oilpan and/or PartitionAlloc have allocated 1 MB of objects.
- But it doesn't trigger a GC if the total allocated space of Oilpan is less than 1 MB.
That way we avoid the problems in both 1) and 3). We confirmed that the peak
memory increase in memory.top_7_stress is (not completely, but meaningfully) fixed
with this CL. (The remaining regression seems to have another cause, which
needs more investigation.)
BUG=474470
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=195891
Patch Set 1 #
Total comments: 7
Patch Set 2 : #
Patch Set 3 : #
Total comments: 1
Messages
Total messages: 37 (2 generated)
haraken@chromium.org changed reviewers: + oilpan-reviews@chromium.org, sigbjornf@opera.com, yutak@chromium.org
Sigbjornf: PTAL. If you have no objection, I might want to land this. yutak@ confirmed that this CL fixes the peak memory regression (not entirely, but meaningfully). It seems that there are two causes of the peak memory increase in memory.top_7_stress. One cause is that Oilpan doesn't consider the memory usage in PartitionAlloc; this will be fixed by this CL. However, even after landing the CL, there will still be a 5% regression in the peak memory usage, which needs more investigation.
Thanks for the clear & detailed description. I think it is fine to impose a secondary condition on the initial threshold which also takes external allocation into account, but I don't understand how that is done here now. https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... Source/platform/heap/ThreadState.cpp:519: && currentObjectSize >= 1024 * 1024 // Oilpan or/and PartitionAlloc have allocated >1 MB since the last GC. If allocatedObjectSize >= 1024 * 1024, doesn't that imply that currentObjectSize is larger than that threshold also?
https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... Source/platform/heap/ThreadState.cpp:519: && currentObjectSize >= 1024 * 1024 // Oilpan or/and PartitionAlloc have allocated >1 MB since the last GC. On 2015/05/25 21:37:51, sof wrote: > If allocatedObjectSize >= 1024 * 1024, doesn't that imply that currentObjectSize > is larger than that threshold also? Yes. - allocatedObjectSize >= 1 MB implies currentObjectSize >= 1 MB. - But in this CL, I'm using Heap::allocatedSpace() >= 1 MB, which is different from allocatedObjectSize >= 1 MB. Note that Heap::allocatedSpace() is the total size of memory allocated in Oilpan.
lgtm to go ahead with this; it does run the risk of GCing rather quickly in the presence of some (unrelated?) increase in PartitionAlloc's reported allocation. https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... Source/platform/heap/ThreadState.cpp:519: && currentObjectSize >= 1024 * 1024 // Oilpan or/and PartitionAlloc have allocated >1 MB since the last GC. On 2015/05/25 23:31:53, haraken wrote: > On 2015/05/25 21:37:51, sof wrote: > > If allocatedObjectSize >= 1024 * 1024, doesn't that imply that > currentObjectSize > > is larger than that threshold also? > > Yes. > > - allocatedObjectSize >= 1 MB implies currentObjectSize >= 1 MB. > > - But in this CL, I'm using Heap::allocatedSpace() >= 1 MB, which is different > from allocatedObjectSize >= 1 MB. Note that Heap::allocatedSpace() is the total > size of memory allocated in Oilpan. Oops, quite right - I didn't pick the suffixes apart.
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1156863002/1
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://src.chromium.org/viewvc/blink?view=rev&revision=195891
Message was sent while issue was closed.
On 2015/05/26 05:10:59, sof wrote: > lgtm to go ahead with this; it does run the risk of GCing rather quickly in the > presence of some (unrelated?) increase in PartitionAlloc's reported allocation. > It looks as if this one caused some animation unit tests to start GCing slightly earlier, http://build.chromium.org/p/chromium.webkit/builders/WebKit%20Linux%20Oilpan%... The underlying issue of that crash is addressed by https://codereview.chromium.org/1052103004/ (which I will go ahead and land), but it is worth keeping an eye on changes to GC patterns & frequency with r195891 aboard.
Message was sent while issue was closed.
https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... Source/platform/heap/ThreadState.cpp:544: && currentObjectSize >= 1024 * 1024 // Oilpan or/and PartitionAlloc have allocated >1 MB since the last GC. Is this really true? totalSizeOfCommittedPages() doesn't measure a since-last-GC amount, nor does markedObjectSize().
Message was sent while issue was closed.
yuta-san: Would you re-confirm that the peak memory regression is (partially) gone with this CL? https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... Source/platform/heap/ThreadState.cpp:544: && currentObjectSize >= 1024 * 1024 // Oilpan or/and PartitionAlloc have allocated >1 MB since the last GC. On 2015/05/26 19:25:37, sof wrote: > Is this really true? totalSizeOfCommittedPages() doesn't measure a since-last-GC > amount, nor does markedObjectSize(). You're right; this comment is wrong. It should be: // The current memory usage in Oilpan and PartitionAlloc is >1 MB.
Message was sent while issue was closed.
https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... Source/platform/heap/ThreadState.cpp:544: && currentObjectSize >= 1024 * 1024 // Oilpan or/and PartitionAlloc have allocated >1 MB since the last GC. On 2015/05/27 00:39:00, haraken wrote: > On 2015/05/26 19:25:37, sof wrote: > > Is this really true? totalSizeOfCommittedPages() doesn't measure a > since-last-GC > > amount, nor does markedObjectSize(). > > You're right; this comment is wrong. It should be: > > // The current memory usage in Oilpan and PartitionAlloc is >1 MB. Not just the comment is wrong; this seems like curve fitting to satisfy a benchmark.
Message was sent while issue was closed.
(Either way, I'll wait for yutak's investigation before taking next actions.) https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/1/Source/platform/heap/Thread... Source/platform/heap/ThreadState.cpp:544: && currentObjectSize >= 1024 * 1024 // Oilpan or/and PartitionAlloc have allocated >1 MB since the last GC. On 2015/05/27 05:09:07, sof wrote: > On 2015/05/27 00:39:00, haraken wrote: > > On 2015/05/26 19:25:37, sof wrote: > > > Is this really true? totalSizeOfCommittedPages() doesn't measure a > > since-last-GC > > > amount, nor does markedObjectSize(). > > > > You're right; this comment is wrong. It should be: > > > > // The current memory usage in Oilpan and PartitionAlloc is >1 MB. > > Not just the comment is wrong; this seems like curve fitting to satisfy a > benchmark. There would be various opinions, but IMO this is a threshold that has a theoretical background (as explained in the old document). If Oilpan uses X MB, PartitionAlloc uses Y MB and Oilpan has strong pointers to PartitionAlloc, it would make sense to trigger Oilpan's GC based on X+Y.
Message was sent while issue was closed.
On 2015/05/27 05:50:03, haraken wrote: > [...] > There would be various opinions, but IMO this is a threshold that has a > theoretical background (as explained in the old document). If Oilpan uses X MB, > PartitionAlloc uses Y MB and Oilpan has strong pointers to PartitionAlloc, it > would make sense to trigger Oilpan's GC based on X+Y. Could you check when that above condition doesn't hold? I'm wondering if it is redundant.
Message was sent while issue was closed.
On 2015/05/27 05:57:28, sof wrote: > [...] > Could you check when that above condition doesn't hold? I'm wondering if it is > redundant. ahhh, sorry. I understand what you meant. I think you're right. (Will take a detailed look later.)
Message was sent while issue was closed.
On Wed, May 27, 2015 at 7:59 AM, <haraken@chromium.org> wrote: > [...] > ahhh, sorry. I understand what you meant. I think you're right. (Will take a > detailed look later.) I suggest considering a revert until there is time; this impacts GC frequency for non-Oilpan as well. --sigbjorn
Message was sent while issue was closed.
On 2015/05/27 21:47:02, sof1 wrote: > [...] > I suggest considering a revert until there is time; this impacts GC > frequency for non-Oilpan as well. Agreed. This is worth reverting.
Updated the CL. yuta-san: Would you confirm whether this CL (partially) fixes the peak memory increase in memory.top_7_stress? haraken will confirm that this CL won't regress performance using blink_perf.
On 2015/05/28 02:30:45, haraken wrote: > Updated the CL. > > yuta-san: Would you confirm if this CL (partially) fixes the peak memory > increase in memory.top_7_stress? > > haraken will confirm that this CL won't regress performance using blink_perf. This is a perf result for blink_perf: https://docs.google.com/spreadsheets/d/1qFZ5gca9okdtI1ndhw-v-tbU6sU9gmjY35fIl... The result looks flaky (as usual) but I don't see any improvement/regression caused by this CL. yuta-san: Can you put your data on memory.top_7_stress here?
On 2015/06/04 04:33:00, haraken wrote: > yuta-san: Can you put your data on memory.top_7_stress here? https://www.googledrive.com/host/0B6XhrzUCq5lhUjloNWR3b0xxRXM/2015/mem60.html v = vanilla, o = oilpan, o-mem = oilpan with this patch This patch definitely improves the peak memory usage of Blogger, and the other pages are unaffected.
On 2015/06/04 05:09:29, Yuta Kitamura wrote: > On 2015/06/04 04:33:00, haraken wrote: > > yuta-san: Can you put your data on memory.top_7_stress here? > > https://www.googledrive.com/host/0B6XhrzUCq5lhUjloNWR3b0xxRXM/2015/mem60.html > > v = vanilla, o = oilpan, o-mem = oilpan with this patch > > This patch definitely improves the peak memory usage of Blogger, > and the other pages are unaffected. Sigbjorn: are you ok to land this?
On 2015/06/04 05:19:32, haraken wrote: > On 2015/06/04 05:09:29, Yuta Kitamura wrote: > > On 2015/06/04 04:33:00, haraken wrote: > > > yuta-san: Can you put your data on memory.top_7_stress here? > > > > https://www.googledrive.com/host/0B6XhrzUCq5lhUjloNWR3b0xxRXM/2015/mem60.html > > > > v = vanilla, o = oilpan, o-mem = oilpan with this patch > > > > This patch definitely improves the peak memory usage of Blogger, > > and the other pages are unaffected. > > Sigbjorn: are you ok to land this? I'm seeing dom-modify.html OOM quite consistently with this GC policy (on win32.) It might be counter-intuitive, but if I keep the conservative GC policy as it is @ ToT, the OOMs are rare (but not gone completely.) Is the threshold used here for the conservative GC something you need for memory.top_7_stress, or was it done for consistency? Another question (sorry :) ), did the change to when idle GCs are scheduled make a difference?
On 2015/06/04 08:15:27, sof wrote: > [...] > Is the threshold used here for the conservative GC something you need for > memory.top_7_stress, or was it done for consistency? > > Another question (sorry :) ), did the change to when idle GCs are scheduled make > a difference? Thanks to Yuta-san's investigation, it turned out that this CL is not related to the crash of dom-modify.html. That crash will be taken care of by https://codereview.chromium.org/1174463003/.
On 2015/06/09 11:07:31, haraken wrote: > [...] > Thanks to Yuta-san's investigation, it turned out that this CL is not related to > the crash of dom-modify.html. That crash will be taken care of by > https://codereview.chromium.org/1174463003/. I'm seeing OOMs and other PartitionAlloc crashes on Win32, but there is a separate ImageLoader-specific issue that shows up later on (at times.)
On 2015/06/09 11:59:50, sof wrote: > [...] > I'm seeing OOMs and other PartitionAlloc crashes on Win32, but there is a > separate ImageLoader-specific issue that shows up later on (at times.) Thanks for the result! It seems strange that the OOM is caused by this CL, because this CL should increase the GC frequency. Maybe the OOM is already happening in ToT?
On 2015/06/09 12:19:17, haraken wrote: > [...] > It seems strange that the OOM is caused by this CL, because this CL should > increase the GC frequency. Maybe the OOM is already happening in ToT? As I said above, it happens more often with this CL (which isn't intuitive.)
After having poked at this a bit, I wonder if the near-OOM handling that triggers a conservative GC needs to be stronger. i.e., it also needs to force a v8 major GC. https://codereview.chromium.org/1156863002/diff/40001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1156863002/diff/40001/Source/platform/heap/Th... Source/platform/heap/ThreadState.cpp:591: return currentObjectSize > estimatedLiveObjectSize * 3 / 2; If size_t is 4 bytes wide (win32), the computation on the RHS runs the risk of overflowing.
On 2015/06/09 15:49:56, sof wrote: > After having poked at this a bit, I wonder if the near-OOM handling that > triggers a conservative GC needs to be stronger. i.e., it also needs to force a > v8 major GC. > Looked at it some more this evening, some observations: - createTextNode is the subtest that's the memory hog. - serious amount of allocation under-reporting; upon OOM, I see 2.5G of committed memory, but ~300M of recorded allocations across Oilpan and PA. - The PA super pages allocation size is about 50% larger than the committed page size, but smaller than the Oilpan heap(?) - Oilpan allocated space is continuously growing, despite the live object size fluctuating across GCs. - Not halving the estimated object size after v8 major GCs doesn't have an observable impact. Doing something about (or just understanding) the 2nd to last bullet seems key -- it is the biggest (reported) allocation source.
On 2015/06/09 21:49:07, sof wrote: > [...] > Looked at it some more this evening, some observations: > > - createTextNode is the subtest that's the memory hog. createTextNode creates a lot of Strings in PA without allocating many objects in Oilpan. Those Strings are not reclaimed until Oilpan schedules a GC. Maybe it is possible that outOfLineAllocate is not called and thus Oilpan doesn't have a chance to schedule a GC? Or outOfLineAllocate is called but scheduleGCIfNeeded is not called (because Oilpan succeeds in allocating objects before completeSweep())? > - serious amount of allocation under-reporting; upon OOM, I see 2.5G of > committed memory, but ~300M of recorded allocations across Oilpan and PA. Does this mean that currentObjectSize in shouldForceConservativeGC is 300 MB whereas you observe 2.5 GB of committed memory in the OS? > - The PA super pages allocation size is about 50% larger than the committed > page size This might make sense. PA allocates super pages very aggressively (which is another issue). > - Oilpan allocated space is continuously growing, despite the live object size > fluctuating across GCs. Does this mean that we have fragmentation in the heap? > - Not halving the estimated object size after v8 major GCs doesn't have an > observable impact. Either way, thanks for all the investigations. I'll try to reproduce the memory bloat on Linux and look into the issue.
On 2015/06/10 01:50:50, haraken wrote: > [...] I noticed that this would be a regression caused by https://codereview.chromium.org/1161903004/. Before the CL, estimatedLiveObjectSize is halved at each minor and major GC. After the CL, estimatedLiveObjectSize is halved only at each major GC.
In the createTextNode benchmark, most TextNode wrappers are collected in minor GCs. So, not halving estimatedLiveObjectSize at minor GCs will end up delaying Oilpan's GC too much, which can cause OOM. Hmm. Let me think about it a bit more.
On 2015/06/10 04:56:14, haraken wrote: > [...] > In the createTextNode benchmark, most TextNode wrappers are collected in minor > GCs. So, not halving estimatedLiveObjectSize at minor GCs will end up delaying > Oilpan's GC too much, which can cause OOM. > > Hmm. Let me think about it a bit more. The other way around? Aggressively reducing the estimate will delay GCs. If you enable halving for minor GCs also, the estimated heap sizes become quite clearly wrong & tiny. But even when doing it only for major GCs (i.e., what ToT does), it gets to be too low just before OOM hits, as the sequence of major GCs that v8 triggers then halves the heap sizes too much. It's almost as if the reduction factor should be a function of Oilpan allocation amounts since the last v8 GC. If no halving of the estimated heap size is done, it still OOMs, so the handling of that is not the main & only factor. (The structure of dom-modify.html is deceptive: it suggests a set of independent micro benchmarks, but there's an outer loop over them, so allocations made by all subtests play a part in triggering the OOM condition.)
On 2015/06/10 01:50:50, haraken wrote: > On 2015/06/09 21:49:07, sof wrote: > ... > > > - serious amount of allocation under-reporting; upon OOM, I see 2.5G of > > committed memory, but ~300M of recorded allocations across Oilpan and PA. > > Does this mean that currentObjectSize in shouldForceConservativeGC is 300 MB > whereas you observe 2.5 GB of committed memory in the OS? > Yes, 2.5G of committed memory (as reported by vmmap on a Release binary once OOM has struck). PA has ~120M of reported allocations at that point, so someone is either under-reporting or another allocator is claiming a lot. Regarding accurately reporting allocations, isn't some of that missing from NormalPageHeap::expandObject() (and shrinkObject())? > > - The PA super pages allocation size is about 50% larger than the committed > > page size > > This might make sense. PA allocates super pages very aggressively (which is > another issue). It's not unbearable in this case; just wanted to verify that there wasn't something pathological going on. > > > - Oilpan allocated space is continuously growing, despite the live object > size > > fluctuating across GCs. > > Does this mean that we have a fragmentation in the heap? > I need to spend my time/days on matters other than Oilpan, so will not be able to investigate this, but let me make available in a spreadsheet the sequence of values that shouldForceConservativeGC() works over.
Re: dom-modify.html structure,

The outer loop doesn't enclose the test functions below (note the missing
braces), so subtests aren't repeated in cyclic order. Effectively, the real
test flow (when you push the "Run" button from the web interface) is:

* Keep calling the test function of "createElement" for 1 second
* Then "createTextNode" for the next 1 second
* ...
* "insertBefore" for the last 1 second
* Test done
On 2015/06/10 07:23:33, sof wrote:
> ...
> I need to spend my time/days on matters other than Oilpan, so will not be able
> to investigate this, but let me make available in a spreadsheet the sequence of
> values that shouldForceConservativeGC() works over.

Not too interesting a bundle of raw data, but

https://docs.google.com/spreadsheets/d/1h7QCUl1-g40MgWIUSDjuo26X3G4zlGpKZiP4z...

shows how the amounts evolve.
> The other way around? Aggressively reducing the estimate will delay GCs.

I still think my description is correct, if I'm not missing something.

> If you enable halving for minor GCs also, the estimated heap sizes become quite
> clearly wrong & tiny. But even when doing it only for major GCs (i.e., what ToT
> does), it gets to be too low just before OOM hits, as the sequence of major GCs
> that v8 triggers then halves the heap sizes too much.

This doesn't happen in dom-modify because most TextNode wrappers are collected
in minor GCs and thus major GCs are rarely scheduled. As far as I can measure,
the memory usage of PA goes up to 2 GB without any major GC scheduled.

> If no halving of the estimated heap size is done, it still OOMs, so the handling
> of that is not the main&only factor.

I think that the didV8GC handling is the main factor of the OOM, but there is
another problem. When we finish completeSweep(), ideally we want to set the
estimated size to:

  Heap::markedObjectSize() + (the amount of memory in PA that was reachable
  from the marked objects in Oilpan)

However, we are currently setting the estimated size to:

  Heap::markedObjectSize() + Heap::externalObjectSizeAtLastGC()

This is not correct because Heap::externalObjectSizeAtLastGC() includes memory
that has been collected during the lazy sweeping. The difference can be very
big if the lazy sweeping has collected a lot of objects in PA.

We cannot use Partitions::totalSizeOfCommittedPages() either, because it
includes memory that has been allocated during the lazy sweeping. The
difference can be very big if PA allocates a lot of objects during the lazy
sweeping (in fact this happens in dom-modify).

We might want to use min(Heap::externalObjectSizeAtLastGC(),
Partitions::totalSizeOfCommittedPages()), but this is not correct either.

Hmm, I need more time to suggest something reasonable.
On 2015/06/10 08:16:01, haraken wrote:
> ...
> We might want to use min(Heap::externalObjectSizeAtLastGC(),
> Partitions::totalSizeOfCommittedPages()), but this is not correct either.
> Hmm, I need more time to suggest something reasonable.

Random thought: record the totalSizeOfCommittedPages() delta across a page
sweep & subtract that from the estimate?
On 2015/06/10 07:51:38, sof wrote:
> ...
> Not too interesting a bundle of raw data, but
>
> https://docs.google.com/spreadsheets/d/1h7QCUl1-g40MgWIUSDjuo26X3G4zlGpKZiP4z...
>
> shows how the amounts evolve.

That log stops where it gets interesting, but no Oilpan GCs end up being run
(or checked for).
A series of major v8 GCs do go ahead, though, as the memory pressure increases
rapidly until OOM hits. https://codereview.chromium.org/1174123002/ addresses
& explains this.
