Created: 5 years, 3 months ago by fedor.indutny
Modified: 5 years, 3 months ago
CC: v8-dev
Base URL: https://chromium.googlesource.com/v8/v8.git@master
Target Ref: refs/pending/heads/master
Project: v8
Visibility: Public.
Description: heap: make array buffer maps disjoint

Remove intersection from the `std::map`s representing currently live
ArrayBuffers. While simpler to understand, the intersecting maps pose a
significant performance issue for heavy ArrayBuffer users (like node.js).
Store the buffers separately, and process them together during the mark-sweep phase.
The results of benchmarks are:
$ ./node-slow bench && ./node-fast bench
4997.4 ns/op
4685.7 ns/op
NOTE: `fast` is the patched node.js, `slow` is unpatched node.js with vanilla V8.
BUG=
Committed: https://crrev.com/9e3676da9ab1aaf7de3e8582cb3fdefcc3dbaf33
Cr-Commit-Position: refs/heads/master@{#30495}
Patch Set 1 #
Total comments: 13
Patch Set 2 : inline everything #
Total comments: 9
Patch Set 3 : completely disjoint sets #
Total comments: 4
Patch Set 4 : fix last nits #
Patch Set 5 : fix typo #
Patch Set 6 : remove useless code #
Patch Set 7 : fix double free #
Patch Set 8 : rebase #
Total comments: 3
Messages
Total messages: 49 (10 generated)
fedor.indutny@gmail.com changed reviewers: + fedor.indutny@gmail.com, mlippautz@chromium.org
I'm no longer sure which account was used to submit this CL. Emailing from the second one, just in case.
mlippautz@chromium.org changed reviewers: + hpayer@chromium.org - fedor.indutny@gmail.com, svenpanne@google.com
-svenpanne, -indutny, +hpayer

Generally fine. How about factoring out the whole logic into an ArrayBufferTracker that is tied to the heap or isolate? The logic has no dependency on Heap, and as far as I see there are only calls into Isolate. The only call from Heap is to FreeDeadArrayBuffers.

Hannes, wdyt?
Thank you, Michael! I don't mind moving it into a separate class. But maybe I could do it in a separate CL?
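For reference, the core bookkeeping such an ArrayBufferTracker would own can be modeled in a few lines of Python (purely a sketch, in Python rather than the C++ it would actually be; the class, its methods, and the field names are hypothetical stand-ins for the heap fields this CL touches):

```python
class ArrayBufferTracker:
    """Toy model: two disjoint maps of backing-store pointer -> byte length,
    one for new-space (scavenge) buffers and one for old-space buffers."""

    def __init__(self):
        self.live_array_buffers = {}               # old space
        self.live_array_buffers_for_scavenge = {}  # new space

    def register_new(self, data, length, in_new_space):
        target = (self.live_array_buffers_for_scavenge
                  if in_new_space else self.live_array_buffers)
        target[data] = length

    def promote(self, data):
        # On promotion to old space the entry moves between maps,
        # so the two maps never intersect.
        length = self.live_array_buffers_for_scavenge.pop(data)
        self.live_array_buffers[data] = length

    def unregister(self, data, in_new_space):
        target = (self.live_array_buffers_for_scavenge
                  if in_new_space else self.live_array_buffers)
        del target[data]
```

The point of the model is the invariant: every registered buffer lives in exactly one of the two maps at any time.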
Hej,

Thanks for highlighting hiccups and taking the effort to improve on this! Let's leave the refactoring into a separate class for another issue.

Can you share a bit more on the bottleneck in node.js when this code is involved? Any workloads or benchmarks you are tracking? In any case, if tracking the buffers results in a performance issue we should definitely take the effort of having a closer look.

In general, the XXXHelper() functions don't make much sense, as they are either one-liners or hide additional loops. The FreeDeadArrayBuffers() method is particularly messy. During inlining I found a way to simplify this a lot, in the end resulting in only a single loop over not-yet-discovered buffers (which is a different map depending on the caller). I think we should go this way, as otherwise the logic is too complicated to see what's going on.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc
File src/heap/heap.cc (right):

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1739
src/heap/heap.cc:1739: void Heap::RegisterNewArrayBufferHelper(std::map<void*, size_t>& live_buffers,
Please get rid of this function and inline it into RegisterNewArrayBuffer.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1745
src/heap/heap.cc:1745: void Heap::UnregisterArrayBufferHelper(
Ditto, please inline the call manually.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1754
src/heap/heap.cc:1754: void Heap::RegisterLiveArrayBufferHelper(
Please get rid of this function and inline it into RegisterLiveArrayBuffer.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1760
src/heap/heap.cc:1760: size_t Heap::FreeDeadArrayBuffersHelper(
We should get rid of this one. See below.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1784
src/heap/heap.cc:1784: void Heap::TearDownArrayBuffersHelper(
Also inline please.
https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1834
src/heap/heap.cc:1834: void Heap::FreeDeadArrayBuffers(bool from_scavenge) {
The logic here is way too complicated, as the semantic is to maintain two distinct sets for array buffers.

I inlined, reshuffled and consolidated the calls. I think we should get rid of the helper method, as the actual logic seems to be pretty simple.

Please check the simplification. I am using .erase() with buffers (there exists an overload that takes a map). The loops (over maps) get unfolded into a single loop in the last step.

======== Step 0 (baseline; inlined FreeDeadArrayBuffersHelper call from patch) ========

FreeDeadArrayBuffers(from_scavenge):
  if from_scavenge:
    not_yet_discovered_array_buffers_.erase(not_yet_discovered_array_buffers_for_scavenge_)
  else:
    live_array_buffers_for_scavenge_.erase(not_yet_discovered_array_buffers_)

  # inlined FreeDeadArrayBuffersHelper
  # parameters are unfolded here
  if from_scavenge:
    not_yet_discovered_buffers = not_yet_discovered_array_buffers_for_scavenge_
  else:
    not_yet_discovered_buffers = not_yet_discovered_array_buffers_

  for buffer in not_yet_discovered_buffers:
    isolate.free_array_buffer(buffer)
    if (!from_scavenge):
      live_array_buffers_.erase(buffer)
    live_array_buffers_for_scavenge_.erase(buffer)

  if (from_scavenge):
    not_yet_discovered_buffers = live_array_buffers_for_scavenge_;
  else:
    not_yet_discovered_buffers = live_array_buffers_;
  not_yet_discovered_buffers.insert(live_array_buffers_for_scavenge_.begin(), ...end())

======== Step 1 (shuffle inlined stuff around) ========

FreeDeadArrayBuffers(from_scavenge):
  if from_scavenge:
    not_yet_discovered_array_buffers_.erase(not_yet_discovered_array_buffers_for_scavenge_)
    not_yet_discovered_buffers = not_yet_discovered_array_buffers_for_scavenge_
    live_array_buffers_for_scavenge_.erase(not_yet_discovered_buffers)
  else:
    live_array_buffers_for_scavenge_.erase(not_yet_discovered_array_buffers_)
    not_yet_discovered_buffers = not_yet_discovered_array_buffers_
    live_array_buffers_for_scavenge_.erase(not_yet_discovered_buffers)
    live_array_buffers_.erase(not_yet_discovered_buffers)

  # rest of inlined FreeDeadArrayBuffersHelper
  for buffer in not_yet_discovered_buffers:
    isolate.free_array_buffer(buffer)

  if (from_scavenge):
    not_yet_discovered_buffers = live_array_buffers_for_scavenge_;
  else:
    not_yet_discovered_buffers = live_array_buffers_;
  not_yet_discovered_buffers.insert(live_array_buffers_for_scavenge_)

======== Step 2 (re-order / remove duplicates) ========

FreeDeadArrayBuffers(from_scavenge):
  if from_scavenge:
    tmp_not_yet_discovered_buffers = not_yet_discovered_array_buffers_for_scavenge_
    not_yet_discovered_array_buffers_.erase(tmp_not_yet_discovered_buffers)
    live_array_buffers_for_scavenge_.erase(tmp_not_yet_discovered_buffers)
  else:
    tmp_not_yet_discovered_buffers = not_yet_discovered_array_buffers_
    live_array_buffers_for_scavenge_.erase(tmp_not_yet_discovered_buffers)
    live_array_buffers_.erase(tmp_not_yet_discovered_buffers)

  # rest of inlined FreeDeadArrayBuffersHelper
  for buffer in tmp_not_yet_discovered_buffers:
    isolate.free_array_buffer(buffer)

  if (from_scavenge):
    not_yet_discovered_buffers = live_array_buffers_for_scavenge_;
  else:
    not_yet_discovered_buffers = live_array_buffers_;
  not_yet_discovered_buffers.insert(live_array_buffers_for_scavenge_)

======== Step 3 (consolidate into a single loop) ========

FreeDeadArrayBuffers(from_scavenge):
  if from_scavenge:
    tmp_not_yet_discovered_buffers = not_yet_discovered_array_buffers_for_scavenge_
  else:
    tmp_not_yet_discovered_buffers = not_yet_discovered_array_buffers_

  for buffer in tmp_not_yet_discovered_buffers:
    isolate.free_array_buffer(buffer)
    live_array_buffers_for_scavenge_.erase(buffer)
    if (from_scavenge):
      not_yet_discovered_array_buffers_.erase(buffer)
    else:
      live_array_buffers_.erase(buffer)

  if (from_scavenge):
    not_yet_discovered_buffers = live_array_buffers_for_scavenge_;
  else:
    not_yet_discovered_buffers = live_array_buffers_;
  not_yet_discovered_buffers.insert(live_array_buffers_for_scavenge_)
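The Step 3 consolidation can be checked by running it almost verbatim as Python over plain dicts (a sketch; the `state` keys and the `free_array_buffer` callback are stand-ins for the heap members and the isolate call):

```python
def free_dead_array_buffers(state, from_scavenge, free_array_buffer):
    """state maps four hypothetical heap members to dicts of
    backing-store pointer -> byte length."""
    if from_scavenge:
        not_yet_discovered = state['not_yet_discovered_for_scavenge']
    else:
        not_yet_discovered = state['not_yet_discovered']

    # Single loop over the not-yet-discovered (i.e. dead) buffers.
    for buffer in list(not_yet_discovered):
        free_array_buffer(buffer)
        state['live_for_scavenge'].pop(buffer, None)
        if from_scavenge:
            state['not_yet_discovered'].pop(buffer, None)
        else:
            state['live'].pop(buffer, None)

    # Every survivor is "not yet discovered" for the next GC cycle.
    if from_scavenge:
        state['not_yet_discovered_for_scavenge'] = dict(state['live_for_scavenge'])
    else:
        merged = dict(state['live'])
        merged.update(state['live_for_scavenge'])
        state['not_yet_discovered'] = merged
```

This mirrors the last step above: one loop frees the dead buffers, and the trailing block rebuilds the not-yet-discovered map from the live maps.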
Hello again,

Thanks for the thorough review! Will fix everything mentioned as soon as possible, hopefully today.

We see two bottlenecks in node.js: one happens during buffer allocation, one during GC. Both add together and make things slightly slower than they could potentially be. Here is a benchmark that we are using to figure out exact numbers: https://gist.github.com/trevnorris/efe36274b0d0a23a78f5 .

This patch cuts down the performance overhead by 15% according to Trevor Norris (it was a bit bigger on my machine, so it probably fluctuates and depends on the speed of memory allocation too).

Is there a reason why there is no API to have ArrayBuffer contents allocated inline in the object itself? Not that we could solely use this API, but in many cases it could potentially give us better performance, because of the absence of bookkeeping.

Thank you again,
Fedor.
https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc
File src/heap/heap.cc (right):

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1784
src/heap/heap.cc:1784: void Heap::TearDownArrayBuffersHelper(
On 2015/08/28 08:05:02, Michael Lippautz wrote:
> Also inline please.

This one is going to result in duplicated code, isn't it?
On 2015/08/28 09:48:40, fedor.indutny wrote:
> Hello again,
>
> Thanks for the thorough review! Will fix everything mentioned as soon as
> possible, hopefully today.
>
> We see two bottlenecks in node.js: one happens during buffer allocation, one
> during GC. Both add together and make things slightly slower than they could
> potentially be. Here is a benchmark that we are using to figure out exact
> numbers: https://gist.github.com/trevnorris/efe36274b0d0a23a78f5 .
>
> This patch cuts down performance overhead by 15% according to Trevor Norris (it
> was a bit bigger on my machine, so it probably fluctuates and depends on the
> speed of memory allocation too).

Ugh, a microbenchmark. Did you also see it in real-world applications? The problem I see in general with this benchmark is that we allocate a lot of memory without actually using it. Upon allocating a buffer this large you should also use it (read/write), effectively dominating the bookkeeping overhead.

> Is there a reason why there is no API to have ArrayBuffer contents allocated
> inline in the object itself? Not that we could solely use this API, but in many
> cases it could potentially give us better performance, because of absence of
> bookkeeping.

Usually objects contain room for some elements themselves and are backed by some other store for larger sizes (and are thus tracked by the frontend object). As far as I see, the array buffer in question is already a backing store, so it requires extra handling to be used stand-alone. The reason why some stores are treated differently is often because of how memory can move between the embedder and V8.

-Michael
On 2015/08/28 10:02:37, fedor.indutny wrote:
> https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1784
> src/heap/heap.cc:1784: void Heap::TearDownArrayBuffersHelper(
> On 2015/08/28 08:05:02, Michael Lippautz wrote:
> > Also inline please.
>
> This one is going to result in duplicated code, isn't it?

Yes, I'd prefer two loops over two calls with single loops though. (Also we'd get rid of all the helpers.)
Inlined everything, thank you again!

(Sorry, going to sleep now. Hope it looks good now ;) )

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc
File src/heap/heap.cc (right):

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1739
src/heap/heap.cc:1739: void Heap::RegisterNewArrayBufferHelper(std::map<void*, size_t>& live_buffers,
On 2015/08/28 08:05:02, Michael Lippautz wrote:
> Please get rid of this function and inline it into RegisterNewArrayBuffer.

Acknowledged.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1745
src/heap/heap.cc:1745: void Heap::UnregisterArrayBufferHelper(
On 2015/08/28 08:05:02, Michael Lippautz wrote:
> Ditto, please inline the call manually.

Acknowledged.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1754
src/heap/heap.cc:1754: void Heap::RegisterLiveArrayBufferHelper(
On 2015/08/28 08:05:02, Michael Lippautz wrote:
> Please get rid of this function and inline it into RegisterLiveArrayBuffer.

Acknowledged.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1760
src/heap/heap.cc:1760: size_t Heap::FreeDeadArrayBuffersHelper(
On 2015/08/28 08:05:02, Michael Lippautz wrote:
> We should get rid of this one. See below.

Acknowledged.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1784
src/heap/heap.cc:1784: void Heap::TearDownArrayBuffersHelper(
On 2015/08/28 08:05:02, Michael Lippautz wrote:
> Also inline please.

Acknowledged.

https://codereview.chromium.org/1316873004/diff/1/src/heap/heap.cc#newcode1834
src/heap/heap.cc:1834: void Heap::FreeDeadArrayBuffers(bool from_scavenge) {
On 2015/08/28 08:05:02, Michael Lippautz wrote:
> The logic here is way too complicated as the semantic is to maintain two
> distinct sets for array buffers.
>
> I inlined, reshuffled and consolidated the calls. I think we should get rid of
> the helper method as the actual logic seems to be pretty simple.
> [Step 0-3 pseudocode from the previous message snipped]

Acknowledged.
Fedor,

Getting there! Now we can actually see what's happening :) The code will get much better (and I suspect much faster).

The main idea is that we only process the XXX_scavenge_ maps during scavenges and everything during a full GC. Let me also note that V8's new generation (scavenger) is a semi-space GC where we copy once before we promote to old space. As a result we should not move references from the scavenge map to the full map after the FreeDeadArrayBuffers() call, but only on the PromoteArrayBuffer() call.

So what we really want is
  live_array_buffers_
  not_yet_discovered_array_buffers_
and
  live_array_buffers_for_scavenge_
  not_yet_discovered_array_buffers_for_scavenge_
to be completely disjoint. Then we can process only XXX_scavenge_ during a scavenge and everything (including XXX_scavenge_) during a full GC.

Rest of the comments inlined. Can you then post the results of the changed behavior vs tip-of-tree? Also, please remove Sven from the review field (R=).

Thanks a lot!
-Michael

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc
File src/heap/heap.cc (right):

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1752: void Heap::UnregisterArrayBuffer(bool in_new_space, void* data) {
We can simplify the main part a lot:

  std::map<void*, size_t>* live_buffers =
      in_new_space ? &live_array_buffers_for_scavenge_ : &live_array_buffers_;
  std::map<void*, size_t>* not_yet_discovered_buffers =
      in_new_space ? &not_yet_discovered_array_buffers_for_scavenge_
                   : &not_yet_discovered_array_buffers_;
  DCHECK(live_buffers->count(data) > 0);
  live_buffers->erase(data);
  not_yet_discovered_buffers->erase(data);

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1761: not_yet_discovered_array_buffers_.erase(data);
This .erase() will not be part of the method as we keep the sets disjoint.

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1784: void Heap::FreeDeadArrayBuffers(bool from_scavenge) {
Since we keep the sets disjoint, we only need to visit XXX_scavenge_ for the new space part, and in the other case we visit all of them:

  for buffer in not_yet_discovered_buffers_for_scavenge_:
    isolate.free(buffer)
    freed_memory += …
    live_array_buffers_for_scavenge_.erase(buffer)
  if !from_scavenge:
    for buffer in not_yet_discovered_buffers_:
      isolate.free(buffer)
      freed_memory += …
      live_array_buffers_.erase(buffer)

  not_yet_discovered_buffers_for_scavenge_ = live_array_buffers_for_scavenge_
  if !from_scavenge:
    not_yet_discovered_buffers_ = live_array_buffers_

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1828: void Heap::TearDownArrayBuffers() {
nit: While practically not relevant, we should still tear down gently and record the freed memory (like in FreeDeadArrayBuffers), e.g.

  freed_memory += buffer.second;

throughout the loop, and

  AdjustAmountOfExternalAllocatedMemory(freed_memory);

in the end.
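Under the disjoint-sets invariant, the whole method reduces to the sketch above. As runnable Python (a toy model; the dict keys and the `free` callback are hypothetical stand-ins for the heap members and the isolate call):

```python
def free_dead_array_buffers(heap, from_scavenge, free):
    # With disjoint maps, a scavenge only sweeps the *_for_scavenge maps;
    # a full GC additionally sweeps the old-space maps.
    freed_memory = 0
    for data, length in heap['not_yet_discovered_for_scavenge'].items():
        free(data)
        freed_memory += length
        del heap['live_for_scavenge'][data]
    if not from_scavenge:
        for data, length in heap['not_yet_discovered'].items():
            free(data)
            freed_memory += length
            del heap['live'][data]
        heap['not_yet_discovered'] = dict(heap['live'])
    # Survivors become not-yet-discovered for the next cycle.
    heap['not_yet_discovered_for_scavenge'] = dict(heap['live_for_scavenge'])
    return freed_memory
```

Note how no cross-map erases remain: because the sets never intersect, each dead buffer is removed from exactly one live map.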
The results of benchmarks are:

./node-slow 1 && ./node-no-inline-fast 1 && ./node-fast 1
4997.4 ns/op
4701.6 ns/op
4685.7 ns/op

NOTE: `no-inline-fast` is the initial version of the patch, `fast` is the current one, `slow` is vanilla v8.

The other thing that I have noticed is that there are not many scavenges happening:

[65202:0x101804c00]  22 ms: Scavenge 1.9 (38.0) -> 1.8 (38.0) MB, 0.6 ms [allocation failure].
[65202:0x101804c00]  23 ms: Scavenge 2.0 (38.0) -> 2.0 (39.0) MB, 0.5 ms [allocation failure].
[65202:0x101804c00]  48 ms: Scavenge 3.5 (39.0) -> 3.1 (39.0) MB, 3.4 ms [allocation failure].
[65202:0x101804c00]  57 ms: Mark-sweep 3.7 (39.0) -> 2.9 (40.0) MB, 4.7 ms [external memory allocation limit reached.] [GC in old space requested].
[65202:0x101804c00]  68 ms: Mark-sweep 3.4 (40.0) -> 2.9 (40.0) MB, 9.4 ms [external memory allocation limit reached.] [GC in old space requested].
[65202:0x101804c00]  81 ms: Mark-sweep 2.9 (40.0) -> 2.9 (40.0) MB, 12.4 ms [external memory allocation limit reached.] [GC in old space requested].
[65202:0x101804c00]  82 ms: Mark-sweep 2.9 (40.0) -> 2.7 (40.0) MB, 1.7 ms [external memory allocation limit reached.] [GC in old space requested].
[65202:0x101804c00]  91 ms: Mark-sweep 3.3 (40.0) -> 2.7 (40.0) MB, 3.4 ms [external memory allocation limit reached.] [GC in old space requested].

I guess making them scavenges might be the next step to improve the performance.

Thanks for review!

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc
File src/heap/heap.cc (right):

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1752: void Heap::UnregisterArrayBuffer(bool in_new_space, void* data) {
On 2015/08/28 14:25:30, Michael Lippautz wrote:
> We can simplify the main part a lot: [C++ snippet snipped]

Acknowledged.

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1761: not_yet_discovered_array_buffers_.erase(data);
On 2015/08/28 14:25:30, Michael Lippautz wrote:
> This .erase() will not be part of the method as we keep the sets disjoint.

Acknowledged.

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1776: if (from_scavenge) {
This one will need to become unconditional.

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1784: void Heap::FreeDeadArrayBuffers(bool from_scavenge) {
On 2015/08/28 14:25:30, Michael Lippautz wrote:
> Since we keep the sets disjoint, we only need to visit XXX_scavenge_ for the
> new space part and in the other case we visit all of them: [pseudocode snipped]

Acknowledged.

https://codereview.chromium.org/1316873004/diff/20001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1828: void Heap::TearDownArrayBuffers() {
On 2015/08/28 14:25:30, Michael Lippautz wrote:
> nit: While practically not relevant, we should still tear down gently and
> record the freed memory (like in FreeDeadArrayBuffers).

Acknowledged.
On 2015/08/28 20:49:41, fedor.indutny wrote:
> The results of benchmarks are:
>
> ./node-slow 1 && ./node-no-inline-fast 1 && ./node-fast 1
> 4997.4 ns/op
> 4701.6 ns/op
> 4685.7 ns/op
>
> NOTE: `no-inline-fast` is the initial version of patch, `fast` - is current,
> `slow` - vanilla v8.

Please add the summary to the description.

> The other thing that I have noticed is that there are not much scavenges
> happening: [GC trace snipped]
>
> I guess making them a scavenges might be the next step to improve the
> performance.

I don't think there's much to improve on here with the current architecture.

The buffers in question are allocated externally (while the wrapping object JSArrayBuffer is only conditionally allocated externally). We start an incremental GC (see api.cc AdjustAmountOfExternalAllocatedMemory) once we hit the limit. For the scavenge map you need the scavenge information, and for the full map you need a full transitive closure.
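The external-memory trigger Michael refers to can be sketched as a toy accounting model in Python (purely illustrative; the names here are hypothetical, and the real logic lives behind V8's v8::Isolate::AdjustAmountOfExternalAllocatedMemory):

```python
class ExternalMemoryAccounting:
    """Toy model: the embedder reports allocation deltas, and crossing the
    configured limit requests a GC, which is what produces the
    'external memory allocation limit reached' lines in the trace above."""

    def __init__(self, limit_bytes, request_gc):
        self.amount = 0
        self.limit_bytes = limit_bytes
        self.request_gc = request_gc  # callback taking a reason string

    def adjust(self, delta_bytes):
        self.amount += delta_bytes
        if self.amount > self.limit_bytes:
            self.request_gc('external memory allocation limit reached')
        return self.amount
```

This is why a workload that churns externally backed ArrayBuffers sees mark-sweeps rather than scavenges: the trigger is the external-memory counter, not new-space exhaustion.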
lgtm with comments.

The next step would be moving the logic out of Heap into an ArrayBufferTracker that lives in heap/ and finally adding some tests. It would probably have to be a cctest that checks the transitions from one map into another.

https://codereview.chromium.org/1316873004/diff/40001/src/heap/heap.cc
File src/heap/heap.cc (right):

https://codereview.chromium.org/1316873004/diff/40001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1805: if (freed_memory) {
Let's make this explicit, i.e., freed_memory > 0

https://codereview.chromium.org/1316873004/diff/40001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1828: if (freed_memory) {
Ditto here.
The CQ bit was checked by fedor@indutny.com
The patchset sent to the CQ was uploaded after l-g-t-m from mlippautz@chromium.org Link to the patchset: https://codereview.chromium.org/1316873004/#ps60001 (title: "fix last nits")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1316873004/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1316873004/60001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: v8_mac_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_mac_rel/builds/9247) v8_win_nosnap_shared_compile_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_nosnap_shared_compil...)
On 2015/08/31 06:31:58, Michael Lippautz wrote:
> On 2015/08/28 20:49:41, fedor.indutny wrote:
> > NOTE: `no-inline-fast` is the initial version of patch, `fast` - is current,
> > `slow` - vanilla v8.
>
> Please add the summary to the description.

Argh, looks like I missed this one before hitting the commit button. Sorry!

> > I guess making them a scavenges might be the next step to improve the
> > performance.
>
> I don't think there's much to improve on here with the current architecture.
>
> The buffers in question are allocated externally (while the wrapping object
> JSArrayBuffer is only conditionally allocated externally). We start an
> incremental GC (see api.cc AdjustAmountOfExternalAllocatedMemory) once we hit
> the limit. For the scavenge map you need the scavenge information and for the
> full map you need a full transitive closure.

We are allocating `kInternalized` ArrayBuffers in node. Is there any other way to allocate them "internally"?

All of the allocated buffers are in the new space; may I ask you to elaborate a bit more on why the Scavenge is not possible in this case?

Thank you!
Made a typo in a debug-only line. Going to rebuild it on my machine and resubmit it again. Sorry about this! (Also sounds like a good chance to update the description.)
The CQ bit was checked by fedor@indutny.com
The patchset sent to the CQ was uploaded after l-g-t-m from mlippautz@chromium.org Link to the patchset: https://codereview.chromium.org/1316873004/#ps80001 (title: "fix typo")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1316873004/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1316873004/80001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: v8_linux_nodcheck_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_nodcheck_rel/build...)
On 2015/08/31 06:45:29, fedor.indutny wrote:
> We are allocating `kInternalized` ArrayBuffers in node. Is there any other way
> to allocate them "internally"?

With kExternalized we do not explicitly keep track of the buffers we are handed when creating a JSArrayBuffer object. We just ignore the buffer contents with respect to garbage collection. With kInternalized we use the functions we just modified to keep track of them.

As far as I see the api though, in both cases we actually use a special allocator (v8::ArrayBuffer::Allocator array_buffer_allocator) in the isolate to allocate those buffers. This seems to be external at all times for now, i.e., even for d8 we use a so-called ShellArrayBufferAllocator that essentially just wraps malloc. jochen@ (currently OOO) should know why the buffers are allocated externally at all times.

> All of the allocated buffers are in the new space, may I ask you to elaborate
> a bit more on why the Scavenge is not possible in this case?

JSTypedArray (the wrapping JS object) usually starts in new space. The buffers (as described above) already live on some external heap.
Michael,

Looks like there is some issue on the Linux buildbot. I will investigate it and try to submit a fix as soon as possible. So far it looks like a double-free error to me.
On 2015/08/31 07:16:39, Michael Lippautz wrote:
> On 2015/08/31 06:45:29, fedor.indutny wrote:
> > On 2015/08/31 06:31:58, Michael Lippautz wrote:
> > > On 2015/08/28 20:49:41, fedor.indutny wrote:
> > > > I guess making them a scavenges might be the next step to improve the
> > > > performance.
> > >
> > > I don't think there's much to improve on here with the current
> > > architecture.
> > >
> > > The buffers in question are allocated externally (while the wrapping
> > > object JSArrayBuffer is only conditionally allocated externally). We start
> > > an incremental GC (see api.cc AdjustAmountOfExternalAllocatedMemory) once
> > > we hit the limit. For the scavenge map you need the scavenge information
> > > and for the full map you need a full transitive closure.
> >
> > We are allocating `kInternalized` ArrayBuffers in node. Is there any other
> > way to allocate them "internally"?
>
> With kExternalized we do not explicitly keep track of the buffers we are handed
> when creating a JSArrayBuffer object. We just ignore the buffer contents wrt.
> to garbage collection. With kInternalized we use the functions we just modified
> to keep track of them.
>
> As far as I see the api though, in both cases we actually use a special
> allocator (v8::ArrayBuffer::Allocator array_buffer_allocator) in the isolate to
> allocate those buffers. This seems to be external at all times for now, i.e.,
> even for d8 we use a so-called ShellArrayBufferAllocator that essentially just
> wraps malloc. jochen@ (currently OOO) should know why the buffers are allocated
> externally at all times.

Great, would love to learn more about it! Thanks!

> > All of the allocated buffers are in the new space, may I ask you to
> > elaborate a bit more on why the Scavenge is not possible in this case?
>
> JSTypedArray (the wrapping JS object) usually starts in new space. The buffers
> (as described above) already live on some external heap.
I didn't mean the contents of ArrayBuffers; they are not a direct subject of scavenge anyway. What I meant is that all of these GCed buffers were actually in new space, so the Scavenge should be the fastest way to handle it. Running an incremental GC with no "dead" objects in Old Space seems to be a bit wasteful.

Is there any way to modify this behavior to make it better for both us and Chrome? It would be great to have some sort of overridable callback which will decide what to do when we hit external memory limits.
On 2015/08/31 07:21:23, fedor.indutny wrote:
> On 2015/08/31 07:16:39, Michael Lippautz wrote:
> > On 2015/08/31 06:45:29, fedor.indutny wrote:
> > > On 2015/08/31 06:31:58, Michael Lippautz wrote:
> > > > On 2015/08/28 20:49:41, fedor.indutny wrote:
> > > All of the allocated buffers are in the new space, may I ask you to
> > > elaborate a bit more on why the Scavenge is not possible in this case?
> >
> > JSTypedArray (the wrapping JS object) usually starts in new space. The
> > buffers (as described above) already live on some external heap.
>
> I didn't mean the contents of ArrayBuffers, they are not direct subject to
> scavenge anyway. What I meant is that all of these GCed buffers was actually in
> a new space, so the Scavenge should be the fastest way to handle it. Running
> incremental GC, with no "dead" objects in Old Space seems to be a bit wasteful.

Let's fix some terminology:

* JS object: the frontend JS object.
* Buffer: the buffer holding the actual contents.

The JS object (pretty small) holds a backend, our buffer. The buffer is externally allocated. The JS object starts in new space and the scavenger IS already handling it. However, the microbenchmark (that's why it's only a microbenchmark...) is only allocating these small JS objects on the V8 heap and thus only triggering a few scavenges (as you need to hit a limit to trigger a scavenge).

The incremental GC you see is triggered because the externally allocated buffers (our backends) hit a limit where we need to synchronize with the embedder (chrome, node, ...) using a GC. The process of synchronizing with the embedder is complex because we are dealing with a system where multiple GCs interfere with each other and need to find a global transitive closure (or at least a good enough approximation). This used to be a full GC, but we switched to an incremental one to keep the pause time small.

> Is there any way to modify this behavior to make it better for both us and
> Chrome?
> It would be great to have some sort of overridable callback which will decide,
> what to do when we external memory limits.

In general you don't know where the JS objects holding external memory live. Doing a scavenge here only helps your specific microbenchmark (as you know that most of your 65k allocated buffers are tied to objects in a tight loop). You don't gain general knowledge by doing a scavenge.

Strategies for hitting external allocation limits could also be discussed with jochen@. Probably a good idea to get him in the loop once he is back.

Having said all that, allocating a "SlowBuffer" of 65k in a tight loop is probably not the best use case :)
Besides fixing the double free, please also rebase on master. We had to land https://codereview.chromium.org/1325643002/ as a bugfix today.
On 2015/08/31 16:08:14, Michael Lippautz wrote:
> Besides fixing the double free, please also rebase on master. We had to land
> https://codereview.chromium.org/1325643002/ as a bugfix today.

Thanks for letting me know! I think that it has a bug too ;) Commented there.

It looks like it might be the cause of this double-free, actually. Going to reproduce it locally and see where the roots of the problem are in the case of this CL. If it is because of recursive GC, I'm going to just rebase and push it again.
On 2015/08/31 20:50:01, fedor.indutny wrote:
> On 2015/08/31 16:08:14, Michael Lippautz wrote:
> > Besides fixing the double free, please also rebase on master. We had to land
> > https://codereview.chromium.org/1325643002/ as a bugfix today.
>
> Thanks for letting me know! I think that it has a bug too ;) Commented there.
>
> It looks like it might be the cause for this double-free, actually. Going to
> reproduce it locally and see where are the roots of the problem in case of
> this CL. If it is because of recursive GC - I'm going to just rebase and push
> it again.

Looking more at it, it looks like `UnregisterArrayBuffer` should adjust the amount of used external memory too. What do you think?
register new_space 0x1031335d0
register new_space 0x103560110
live inc 0x102381a30
live inc 0x102381970
live inc 0x103560110
live inc 0x1031335d0
PrepareArrayBufferDiscovery
live scavenge 0x1031335d0
live scavenge 0x103560110
enter adjust scavenge
leave adjust scavenge
[53385:0x102287910]    71238 ms: Scavenge 553.2 (589.7) -> 551.5 (589.7) MB, 14.7 / 0 ms (+ 748.7 ms in 30 steps since last GC) [allocation failure].
free [s] inc 0x1031335d0
free [s] inc 0x103560110
enter adjust inc
leave adjust inc
promote 0x1031335d0
promote 0x103560110
[53385:0x102287910]    71387 ms: Mark-sweep 551.7 (589.7) -> 32.7 (70.6) MB, 147.0 / 0 ms (+ 749.7 ms in 31 steps since start of marking, biggest step 167.7 ms) [GC interrupt] [GC in old space requested].

NOTE:
`live inc` means RegisterLiveBuffer during Mark-Sweep, `live scavenge` during Scavenge.
`enter adjust`/`leave adjust` are before and after adjusting the external memory size.
`free [s]` means free from the scavenge list, `free [i]` free from the mark-sweep list.

It looks like it enters Scavenge from Mark-sweep. It doesn't look like there are any adjustments triggering the Scavenge, but it is still called recursively. Is this expected?
Michael,

I have figured out the problem. See my changes to mark-compact.cc. The buffers from new space were evacuated after being freed. It should be better now.

Going to run tests locally and submit the thing to CQ again. Thanks!
The CQ bit was checked by fedor@indutny.com
The patchset sent to the CQ was uploaded after l-g-t-m from mlippautz@chromium.org Link to the patchset: https://codereview.chromium.org/1316873004/#ps140001 (title: "rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1316873004/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1316873004/140001
Message was sent while issue was closed.
Committed patchset #8 (id:140001)
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/9e3676da9ab1aaf7de3e8582cb3fdefcc3dbaf33 Cr-Commit-Position: refs/heads/master@{#30495}
Message was sent while issue was closed.
Please wait next time when doing a change in another component, even if it was l-g-t-m-ed already. The order of invocations during GC is not something to just change. Also, I feel we should be able to call FreeDeadArrayBuffers at any time after marking. Please see the comment and elaborate a bit more on it.

https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact.cc
File src/heap/mark-compact.cc (right):

https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact....
src/heap/mark-compact.cc:4436: // NOTE: ArrayBuffers must be evacuated first, before freeing them. Otherwise
Can you elaborate on this comment a bit more? Why are they not yet discovered? We should have computed a transitive closure, marking ALL live objects, including new space ones. This would move them into the live map of buffers.
Message was sent while issue was closed.
Sorry for pushing it out without your ack. I hope this patch looks ok; if not, please feel free to revert it.

https://codereview.chromium.org/1316873004/diff/40001/src/heap/heap.cc
File src/heap/heap.cc (right):

https://codereview.chromium.org/1316873004/diff/40001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1805: if (freed_memory) {
On 2015/08/31 06:33:45, Michael Lippautz wrote:
> Let's make this explicit, i.e., freed_memory > 0

Acknowledged.

https://codereview.chromium.org/1316873004/diff/40001/src/heap/heap.cc#newcod...
src/heap/heap.cc:1828: if (freed_memory) {
On 2015/08/31 06:33:45, Michael Lippautz wrote:
> Ditto here.

Acknowledged.

https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact.cc
File src/heap/mark-compact.cc (right):

https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact....
src/heap/mark-compact.cc:4436: // NOTE: ArrayBuffers must be evacuated first, before freeing them. Otherwise
On 2015/09/01 08:30:06, Michael Lippautz wrote:
> Can you elaborate on this comment a bit more. Why are they not yet discovered?
> We should've computed a transitive closure, marking ALL live objects, including
> new space ones. This would move them into the live map of buffers.

I might not be understanding this correctly, but we are not really visiting the new space objects until we are doing the Sweep phase. So we erase ptrs in the `not_yet_discovered` map only on the `EvacuateNewSpaceAndCandidates` call.

The marking phase does not register them, because it is backed by `StaticMarkingVisitor`, and only `StaticNewSpaceVisitor` is capable of registering.
Message was sent while issue was closed.
https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact.cc
File src/heap/mark-compact.cc (right):

https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact....
src/heap/mark-compact.cc:4436: // NOTE: ArrayBuffers must be evacuated first, before freeing them. Otherwise
On 2015/09/01 08:50:08, fedor.indutny wrote:
> On 2015/09/01 08:30:06, Michael Lippautz wrote:
> > Can you elaborate on this comment a bit more. Why are they not yet
> > discovered? We should've computed a transitive closure, marking ALL live
> > objects, including new space ones. This would move them into the live map of
> > buffers.
>
> I might not be understanding this correctly, but we are not really visiting
> the new space objects until we are doing the Sweep phase. So we do erase ptrs
> in the `not_yet_discovered` only on `EvacuateNewSpaceAndCandidates` call.
>
> The marking phase do not register them, because it is backed by
> `StaticMarkingVisitor`, and only `StaticNewSpaceVisitor` is capable of
> registering.

StaticMarkingVisitor is also registering live arrays and should do the right thing. Mind investigating the root problem? It would be good to maintain the invariant that we can free the dead buffers anywhere after marking.
Message was sent while issue was closed.
On 2015/09/01 09:01:36, Michael Lippautz wrote:
> https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact.cc
> File src/heap/mark-compact.cc (right):
>
> https://codereview.chromium.org/1316873004/diff/140001/src/heap/mark-compact....
> src/heap/mark-compact.cc:4436: // NOTE: ArrayBuffers must be evacuated first,
> before freeing them. Otherwise
> On 2015/09/01 08:50:08, fedor.indutny wrote:
> > On 2015/09/01 08:30:06, Michael Lippautz wrote:
> > > Can you elaborate on this comment a bit more. Why are they not yet
> > > discovered? We should've computed a transitive closure, marking ALL live
> > > objects, including new space ones. This would move them into the live map
> > > of buffers.
> >
> > I might not be understanding this correctly, but we are not really visiting
> > the new space objects until we are doing the Sweep phase. So we do erase
> > ptrs in the `not_yet_discovered` only on `EvacuateNewSpaceAndCandidates`
> > call.
> >
> > The marking phase do not register them, because it is backed by
> > `StaticMarkingVisitor`, and only `StaticNewSpaceVisitor` is capable of
> > registering.
>
> StaticMarkingVisitor is also registering live arrays and should do the right
> thing. Mind investigating the root problem? It would be good to maintain the
> invariant that we can free the dead buffers anywhere after marking.

Argh, you are right. Looking into it.
Message was sent while issue was closed.
Ok, you are definitely right. Looking through the logs that I have collected, I see the following scenario:

1. Marking starts.
2. ArrayBuffers in NewSpace are visited and marked as live by removing them from not_yet_..._for_scavenge.
3. Scavenge starts, PrepareArrayBufferDiscovery is called, not_yet_..._for_scavenge is overwritten.
4. ArrayBuffers in NewSpace are visited again and removed from the not_yet_..._for_scavenge.
5. Scavenge ends, buffers are not freed, `not_yet_..._for_scavenge` is overwritten by `live_..._for_scavenge`.
6. Mark-Compact GC resumes.
7. `live_..._for_scavenge === not_yet_..._for_scavenge`.
8. Buffers are freed.
9. EvacuateNewSpaceAndCandidates is invoked, finds these buffers, and promotes them to Old Space.
10. Crash happens when Old Space is GCed.

I think that reordering `Evacuate` and `FreeDeadBuffers` fixes it, at the price of losing the invariant. The other choice is to disable Scavenge during Mark-Compact GC.

Thoughts?
Message was sent while issue was closed.
On 2015/09/01 09:15:46, fedor.indutny wrote:
> Ok, you are definitely right. Looking through the logs that I have collected I
> see the following scenario:
>
> 1. Marking starts
> 2. ArrayBuffers in NewSpace are visited and marked as live by removing them
>    from not_yet_..._for_scavenge
> 3. Scavenge starts, PrepareArrayBufferDiscovery is called,
>    not_yet_..._for_scavenge is overwritten
> 4. ArrayBuffers in NewSpace are visited again and removed from the
>    not_yet_..._for_scavenge
> 5. Scavenge ends, buffers are not freed, `not_yet_..._for_scavenge` is
>    overwritten by `live_..._for_scavenge`
> 6. Mark-Compact GC resumes
> 7. `live_..._for_scavenge === not_yet_..._for_scavenge`
> 8. Buffers are freed
> 9. EvacuateNewSpaceAndCandidates is invoked, it finds these buffers and
>    promotes them to Old Space
> 10. Crash happens when Old Space is GCed.
>
> I think the reordering `Evacuate` and `FreeDeadBuffers` fixes it with the price
> of losing the invariant. The other choice is to disable Scavenge during
> Mark-Compact GC.
>
> Thoughts?

As far as I see, this only mitigates the problem. The problem is resetting the buffers in your step 5. As soon as you do the mark-compact GC, you would delete all buffers in the not_yet_discovered set (which has been reset for the scavenge).

Moving it after Evacuate helps for fixing the state of promoted objects. However, objects could be allocated between a scavenge and the final GC, which would not make them eligible for promotion (objects are copied once before being promoted). As a result, these objects are live but freed in FreeDeadArrayBuffers, as their state is not fixed during PromoteArrayBuffer. The solution is to not only fix promoted objects, but also those that are only copied within new space during MarkCompactCollector::DiscoverAndEvacuateBlackObjectsOnPage.

I will precautionarily revert this as I would like it to be a single CL. You can send me another CL that includes the additional fix in mark-compact.cc.
The order of EvacuateNewSpaceAndCandidates and FreeDeadArrayBuffers can stay as in this CL.
Message was sent while issue was closed.
A revert of this CL (patchset #8 id:140001) has been created in https://codereview.chromium.org/1302233007/ by mlippautz@chromium.org.

The reason for reverting is: Precautionary revert. The change is incomplete.