Description
cc: Use pointers for heaps in tile manager raster tile priority queue.
This patch changes what we heapify in the tile manager raster tile priority
queue: instead of the objects themselves, we now heapify pointers to those
objects. The reasoning is that since make_heap et al. copy elements around to
maintain heap order, it is significantly cheaper to move pointers around than
whole objects.
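For readers skimming the CL, here is a minimal standalone sketch of the general technique (the type and field names below are simplified stand-ins, not the actual cc classes): the large objects stay in one owning vector, and a second vector of raw pointers is what gets heapified, so make_heap/push_heap/pop_heap only ever shuffle pointer-sized elements.

#include <algorithm>
#include <cstddef>
#include <vector>

// "Queue" stands in for PairedPictureLayerQueue; the real code compares tile
// priorities via RasterOrderComparator rather than a plain int.
struct Queue {
  int priority;
  char bulky_state[256];  // placeholder for the iterators/vectors inside
};

struct QueuePtrLess {
  bool operator()(const Queue* a, const Queue* b) const {
    return a->priority < b->priority;  // dereference; only pointers get moved
  }
};

int main() {
  std::vector<Queue> owned(100);  // owns the large objects; must not reallocate
                                  // while pointers into it are live
  std::vector<Queue*> heap;       // the heap holds pointers only
  heap.reserve(owned.size());
  for (std::size_t i = 0; i < owned.size(); ++i) {
    owned[i].priority = static_cast<int>(i);
    heap.push_back(&owned[i]);
  }

  QueuePtrLess less;
  std::make_heap(heap.begin(), heap.end(), less);
  std::pop_heap(heap.begin(), heap.end(), less);  // swaps pointer-sized elements
  Queue* top = heap.back();
  heap.pop_back();
  (void)top;
  return 0;
}

The caveat, which the discussion below touches on, is that the owning container must stay put (no reallocation) while the pointer heap is live.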
Perf results:
Before:
tile_manager_raster_tile_queue_construct:
2= 1348545 runs/s
10= 1297846.75 runs/s
50= 1098756.25 runs/s
tile_manager_raster_tile_queue_construct_and_iterate:
2_16= 74439.5546875 runs/s
2_32= 36854.30078125 runs/s
2_64= 17582.240234375 runs/s
2_128= 8018.2802734375 runs/s
After:
tile_manager_raster_tile_queue_construct:
2= 1294338.75 runs/s (-4%)
10= 1254695 runs/s (-3%)
50= 1047427.375 runs/s (-5%)
tile_manager_raster_tile_queue_construct_and_iterate:
2_16= 298292.03125 runs/s (+300%)
2_32= 176878.578125 runs/s (+380%)
2_64= 93562.8515625 runs/s (+430%)
2_128= 43368.1796875 runs/s (+440%)
R=reveman
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=287378
Patch Set 1
Patch Set 2: ScopedPtrVector
Messages
PTAL. Note that the tiling-level raster iterator doesn't have vectors, so the objects here should all be in contiguous memory.

I attempted to do this two other ways (unsuccessfully):

1. Using ScopedPtrVector and pushing make_scoped_ptr(new Paired...);
2. Using std::vector<scoped_refptr<Paired...> > and pushing make_scoped_refptr(new Paired...);

Both of those failed, since make_heap is quite particular about what elements it is willing to accept. ScopedPtrVector failed because make_heap didn't like ScopedPtrVector::iterator. The scoped_refptr version failed because (and I quote):

error: no matching conversion for functional-style cast from 'scoped_refptr<cc::RasterTilePriorityQueue::PairedPictureLayerQueue>' to '_ValueType' (aka 'scoped_refptr<cc::RasterTilePriorityQueue::PairedPictureLayerQueue>') :\

So with that, I think this is a fairly simple approach that gives us quite a bit of savings on iteration. Understandably, construction has a small regression.
https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
File cc/resources/raster_tile_priority_queue.cc (right):

https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
cc/resources/raster_tile_priority_queue.cc:80: TreePriority tree_priority) {
That this function isn't faster makes me very suspicious. Why is the greatest saving not coming from make_heap? Once you start iterating you'll end up with a non-empty PairedPictureLayerQueue::returned_shared_tiles. Is that a significant factor in why this patch makes iterating cheaper?

Another general observation about these results: the improvement is 300% to 440%. The reason this is faster would be that less data has to be copied when moving elements around in the heap. Reducing the size of PairedPictureLayerQueue to 1/4 of its current size should have the same effect. What's the current size of PairedPictureLayerQueue? It would be interesting to see a breakdown of each of the members. I would not be surprised if we could reduce its size significantly. That would not only give us the heap performance we're looking for but also improve the performance of these iterators in other respects, such as construction.
https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
File cc/resources/raster_tile_priority_queue.cc (right):

https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
cc/resources/raster_tile_priority_queue.cc:80: TreePriority tree_priority) {
On 2014/07/30 16:04:09, reveman wrote:
> That this function isn't faster makes me very suspicious. Why is the greatest
> saving not coming from make_heap? Once you start iterating you'll end up with
> a non-empty PairedPictureLayerQueue::returned_shared_tiles. Is that a
> significant factor in why this patch makes iterating cheaper?

One possible explanation is that reallocations in paired_queues_heap_, coupled with an extra iteration, are what prevents us from seeing savings here.

Another, more likely, explanation is that make_heap does no work in the perftests, since the layers initially have equal priority (every layer has a NOW-bin tile at the top). So the cost we're seeing here is just from the loop construction. I did an experiment where I kept the code the same and just added an extra loop to push paired layer pointers into a temp vector, and I see the same type of regression.

> Another general observation about these results: the improvement is 300% to
> 440%. The reason this is faster would be that less data has to be copied when
> moving elements around in the heap. Reducing the size of
> PairedPictureLayerQueue to 1/4 of its current size should have the same
> effect. What's the current size of PairedPictureLayerQueue? It would be
> interesting to see a breakdown of each of the members. I would not be
> surprised if we could reduce its size significantly. That would not only give
> us the heap performance we're looking for but also improve the performance of
> these iterators in other respects, such as construction.

I don't know that 1/4 of the current size is the magic number here. Just from moving objects around (ignoring pointer indirection when accessing elements), wouldn't we have to reduce the layer iterator to the size of a single pointer? I agree that reducing the size is always a good thing, but at the same time I'm wary of the fact that our performance profile would depend heavily on the size of the layer iterator (note that it has the tiling iterators as members as well, so the performance would depend on the size of all the iterator layers below this). That makes it difficult for us to change any meaningful part of the iterator (i.e. adding new variables, caching something) without negatively affecting performance.

In other words, I'd rather see us take a black-box approach to this performance and not have to assume anything about the size of the iterators we're using. Note that the same general statements we normally make when talking about iterators still apply: they are fairly cheap to create and are appropriate for stack creation, since they aren't large. IMO, talking about particular sizes and relying on the characteristic that if we halve the size, our performance doubles encourages micro-optimizations that favor tiny object sizes over clarity.
For reference, here's what the layer raster iterator contains:

enum IteratorType { LOW_RES, HIGH_RES, NUM_ITERATORS };
struct IterationStage {
  IteratorType iterator_type;
  TilePriority::PriorityBin tile_type;
};

PictureLayerImpl* layer_;
int current_stage_;
IterationStage stages_[4];
PictureLayerTiling::TilingRasterTileIterator iterators_[NUM_ITERATORS];

That's probably not the optimal size (we could, for instance, remove the stages_ array in favor of determining the next stage computationally), but for the sake of clarity I think this is pretty streamlined. Wdyt?
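To get the kind of member-by-member breakdown asked for above, a throwaway snippet along these lines can be compiled and run. The types here are stand-ins that only mirror the member list quoted above (in particular, FakeTilingIterator's size is a guess, and the real TilingRasterTileIterator is almost certainly larger), so the printed numbers are illustrative rather than the real cc sizes; substituting the actual cc types inside a unittest would give the real ones.

#include <cstdio>

// Stand-ins mirroring the member list above; NOT the real cc types.
enum IteratorType { LOW_RES, HIGH_RES, NUM_ITERATORS };
enum PriorityBin { NOW, SOON, EVENTUALLY };  // stand-in for TilePriority::PriorityBin

struct IterationStage {
  IteratorType iterator_type;
  PriorityBin tile_type;
};

struct FakeTilingIterator {  // size is a guess; the real iterator holds more state
  void* tiling;
  int phase;
  int index_x;
  int index_y;
};

struct FakeLayerRasterTileIterator {
  void* layer_;                                  // PictureLayerImpl*
  int current_stage_;
  IterationStage stages_[4];
  FakeTilingIterator iterators_[NUM_ITERATORS];
};

int main() {
  std::printf("stages_[4]:     %u bytes\n",
              static_cast<unsigned>(sizeof(IterationStage) * 4));
  std::printf("iterators_[%d]:  %u bytes\n", static_cast<int>(NUM_ITERATORS),
              static_cast<unsigned>(sizeof(FakeTilingIterator) * NUM_ITERATORS));
  std::printf("whole iterator: %u bytes\n",
              static_cast<unsigned>(sizeof(FakeLayerRasterTileIterator)));
  return 0;
}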
Btw, you probably need to add a make_heap function to ScopedPtrVector (just like the one we have for sorting) to use that.

https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
File cc/resources/raster_tile_priority_queue.cc (right):

https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
cc/resources/raster_tile_priority_queue.cc:80: TreePriority tree_priority) {
On 2014/07/30 16:45:51, vmpstr wrote:
> On 2014/07/30 16:04:09, reveman wrote:
> > That this function isn't faster makes me very suspicious. Why is the
> > greatest saving not coming from make_heap? Once you start iterating you'll
> > end up with a non-empty PairedPictureLayerQueue::returned_shared_tiles. Is
> > that a significant factor in why this patch makes iterating cheaper?
>
> One possible explanation is that reallocations in paired_queues_heap_, coupled
> with an extra iteration, are what prevents us from seeing savings here.
>
> Another, more likely, explanation is that make_heap does no work in the
> perftests, since the layers initially have equal priority (every layer has a
> NOW-bin tile at the top). So the cost we're seeing here is just from the loop
> construction. I did an experiment where I kept the code the same and just
> added an extra loop to push paired layer pointers into a temp vector, and I
> see the same type of regression.

Ok, let's start by fixing the perftests then.

> > Another general observation about these results: the improvement is 300% to
> > 440%. The reason this is faster would be that less data has to be copied
> > when moving elements around in the heap. Reducing the size of
> > PairedPictureLayerQueue to 1/4 of its current size should have the same
> > effect. What's the current size of PairedPictureLayerQueue? It would be
> > interesting to see a breakdown of each of the members. I would not be
> > surprised if we could reduce its size significantly. That would not only
> > give us the heap performance we're looking for but also improve the
> > performance of these iterators in other respects, such as construction.
>
> I don't know that 1/4 of the current size is the magic number here. Just from
> moving objects around (ignoring pointer indirection when accessing elements),
> wouldn't we have to reduce the layer iterator to the size of a single pointer?

Yes, no idea what the actual number is, but I'd guess that the difference becomes insignificant well before you reach the size of a pointer. It's hard to say, as moving some data around might be free unless it causes a cache miss. You need to access the data even if you use pointers, so we might get the same cache misses and the pointers might not buy us much at all.

> I agree that reducing the size is always a good thing, but at the same time
> I'm wary of the fact that our performance profile would depend heavily on the
> size of the layer iterator (note that it has the tiling iterators as members
> as well, so the performance would depend on the size of all the iterator
> layers below this). That makes it difficult for us to change any meaningful
> part of the iterator (i.e. adding new variables, caching something) without
> negatively affecting performance.
>
> In other words, I'd rather see us take a black-box approach to this
> performance and not have to assume anything about the size of the iterators
> we're using. Note that the same general statements we normally make when
> talking about iterators still apply: they are fairly cheap to create and are
> appropriate for stack creation, since they aren't large. IMO, talking about
> particular sizes and relying on the characteristic that if we halve the size,
> our performance doubles encourages micro-optimizations that favor tiny object
> sizes over clarity.

Not making assumptions about the size makes some sense. Although, I think that reducing the size of these objects is one of the most useful optimizations we can do: it improves performance not only when accessing these objects but also when constructing and moving them. That said, we should be careful about making changes that reduce readability.

> For reference, here's what the layer raster iterator contains:
>
> enum IteratorType { LOW_RES, HIGH_RES, NUM_ITERATORS };
> struct IterationStage {
>   IteratorType iterator_type;
>   TilePriority::PriorityBin tile_type;
> };
>
> PictureLayerImpl* layer_;
> int current_stage_;
> IterationStage stages_[4];
> PictureLayerTiling::TilingRasterTileIterator iterators_[NUM_ITERATORS];
>
> That's probably not the optimal size (we could, for instance, remove the
> stages_ array in favor of determining the next stage computationally), but
> for the sake of clarity I think this is pretty streamlined.

I actually think it's a great idea to remove that array and compute the stage as we iterate instead of at construction time. I think it can improve readability if done right.

> Wdyt?

I'm not completely against this patch. I just need some more data to be convinced that this is a good idea. After updating the perftests, can you do some profiling and post the results to this CL?
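As one possible reading of "compute the stage as we iterate": the per-instance stages_[4] member could be replaced by a lookup shared across all iterators (or a switch over the stage index), so each iterator no longer carries its own copy of the table. The sketch below is illustrative only; it reuses stand-in enums, and the fixed HIGH_RES/NOW, LOW_RES/NOW, HIGH_RES/SOON, HIGH_RES/EVENTUALLY order is an assumption, whereas the real order in cc depends on tree priority.

// Illustrative sketch with stand-in enums; the stage ordering is an assumption.
enum IteratorType { LOW_RES, HIGH_RES, NUM_ITERATORS };
enum PriorityBin { NOW, SOON, EVENTUALLY };

struct Stage {
  IteratorType iterator_type;
  PriorityBin tile_type;
};

// One shared table (or a switch) instead of a stages_[4] array copied into
// every LayerRasterTileIterator instance.
inline Stage StageForIndex(int index) {
  static const Stage kStages[] = {
    { HIGH_RES, NOW },
    { LOW_RES, NOW },
    { HIGH_RES, SOON },
    { HIGH_RES, EVENTUALLY }
  };
  return kStages[index];
}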
> I'm not completely against this patch. I just need some more data to be
> convinced that this is a good idea. After updating the perftests, can you do
> some profiling and post the results to this CL?

I reran the fixed perftests with and without this patch; here are the results:

Before:
[ RUN ] TileManagerPerfTest.RasterTileQueueConstruct
*RESULT tile_manager_raster_tile_queue_construct: 2= 1295742.375 runs/s
*RESULT tile_manager_raster_tile_queue_construct: 10= 271312.28125 runs/s
*RESULT tile_manager_raster_tile_queue_construct: 50= 33427.76171875 runs/s
[ OK ] TileManagerPerfTest.RasterTileQueueConstruct (6028 ms)
[ RUN ] TileManagerPerfTest.RasterTileQueueConstructAndIterate
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_16= 104820.0703125 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_32= 52762.65234375 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_64= 25451.34765625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_128= 11645.6328125 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_16= 65317.77734375 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_32= 36263.56640625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_64= 19423.873046875 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_128= 9491.0283203125 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_16= 21731.697265625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_32= 16152.537109375 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_64= 11045.8193359375 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_128= 6502.76611328125 runs/s
[ OK ] TileManagerPerfTest.RasterTileQueueConstructAndIterate (24199 ms)

After:
[ RUN ] TileManagerPerfTest.RasterTileQueueConstruct
*RESULT tile_manager_raster_tile_queue_construct: 2= 1360116.625 runs/s
*RESULT tile_manager_raster_tile_queue_construct: 10= 303830.90625 runs/s
*RESULT tile_manager_raster_tile_queue_construct: 50= 37095.8828125 runs/s
[ OK ] TileManagerPerfTest.RasterTileQueueConstruct (6027 ms)
[ RUN ] TileManagerPerfTest.RasterTileQueueConstructAndIterate
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_16= 337294.65625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_32= 204357.953125 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_64= 105868.625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_128= 46488.0234375 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_16= 137583 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_32= 89561.90625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_64= 52665.28515625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_128= 27036.296875 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_16= 28990.201171875 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_32= 24570.134765625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_64= 19634.06640625 runs/s
*RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_128= 12886.591796875 runs/s
[ OK ] TileManagerPerfTest.RasterTileQueueConstructAndIterate (24187 ms)

Construct now also improves, and construct-and-iterate is still a substantial improvement.
Can you explain what your objection to this patch is?
Here are the profile numbers for the ConstructAndIterate test (I included everything >1%, pruning at the layer raster tile iterator level):

Before:
RunConstructAndIterate 99%
RasterTilePriorityQueue::Pop 77.2%
adjust heap (from push/pop heap) 22.7%
PairedPictureLayerQueue::Pop 11.1%
LayerRasterTileIterator::operator++ 7.1%
std::vector::insert 1.1%
free 11.1%
memmove 9.4%
new 7.5%
std::vector::operator= 7.2%
RasterOrderComparator::operator() 3.1%
LayerRasterTileIterator::~LayerRasterTileIterator 1.3%
BuildRasterQueue 17.7%
RasterTilePriorityQueue::Build 16.8%
LayerRasterTileIterator::LayerRasterTileIterator 8.9%
adjust heap (from make heap) 3.0%
std::vector::insert 2.8%
RasterTilePriorityQueue::~RasterTilePriorityQueue 1.2%

After:
RunConstructAndIterate 100%
RasterTilePriorityQueue::Pop 56.3%
PairedPictureLayerQueue::Pop 33.6%
LayerRasterTileIterator::operator++ 19.8%
std::vector::insert 8.2%
LayerRasterTileIterator::operator bool 1.7%
LayerRasterTileIterator::operator* 1.2%
RasterOrderComparator::operator() 16.0%
LayerRasterTileIterator::operator bool 5.0%
LayerRasterTileIterator::operator* 3.7%
BuildRasterQueue 33.4%
RasterTilePriority::Build 31.8%
LayerRasterTileIterator::LayerRasterTileIterator 18.2%
std::vector<PairedPictureLayerQueue>::insert 5.8%
std::vector<PairedPictureLayerQueue*>::insert 2.3%
adjust heap (from make heap) 2.1%
GetPairedPictureLayers 1.3%
RasterTilePriorityQueue::~RasterTilePriorityQueue 3.0%
free 2.5%
RasterTilePriorityQueue::Top 1.9%
(LapTimer::NextLap 1.8%)
RasterTilePriorityQueue::IsEmpty 1.4%
On 2014/08/01 17:52:47, vmpstr wrote:
> Here are the profile numbers for the ConstructAndIterate test (I included
> everything >1%, pruning at the layer raster tile iterator level):
>
> Before:
> RunConstructAndIterate 99%
> RasterTilePriorityQueue::Pop 77.2%
> adjust heap (from push/pop heap) 22.7%
> PairedPictureLayerQueue::Pop 11.1%
> LayerRasterTileIterator::operator++ 7.1%
> std::vector::insert 1.1%
> free 11.1%
> memmove 9.4%
> new 7.5%

I assume free/new is from returned_tiles_?

> std::vector::operator= 7.2%
> RasterOrderComparator::operator() 3.1%
> LayerRasterTileIterator::~LayerRasterTileIterator 1.3%
> BuildRasterQueue 17.7%
> RasterTilePriorityQueue::Build 16.8%
> LayerRasterTileIterator::LayerRasterTileIterator 8.9%
> adjust heap (from make heap) 3.0%
> std::vector::insert 2.8%
> RasterTilePriorityQueue::~RasterTilePriorityQueue 1.2%
>
> After:
> RunConstructAndIterate 100%
> RasterTilePriorityQueue::Pop 56.3%
> PairedPictureLayerQueue::Pop 33.6%
> LayerRasterTileIterator::operator++ 19.8%
> std::vector::insert 8.2%
> LayerRasterTileIterator::operator bool 1.7%
> LayerRasterTileIterator::operator* 1.2%
> RasterOrderComparator::operator() 16.0%
> LayerRasterTileIterator::operator bool 5.0%
> LayerRasterTileIterator::operator* 3.7%
> BuildRasterQueue 33.4%
> RasterTilePriority::Build 31.8%
> LayerRasterTileIterator::LayerRasterTileIterator 18.2%
> std::vector<PairedPictureLayerQueue>::insert 5.8%
> std::vector<PairedPictureLayerQueue*>::insert 2.3%
> adjust heap (from make heap) 2.1%
> GetPairedPictureLayers 1.3%
> RasterTilePriorityQueue::~RasterTilePriorityQueue 3.0%
> free 2.5%
> RasterTilePriorityQueue::Top 1.9%
> (LapTimer::NextLap 1.8%)
> RasterTilePriorityQueue::IsEmpty 1.4%

I think of this change as a last resort to be applied once we're done with all the optimizations that don't add complexity. This might be a significant win right now, but it's not clear that it will be in the future. This patch actually adds some more work, too. That happens to work out for the better right now, but maybe it doesn't after we've made other optimizations. Maybe we end up with the increased complexity from this patch without any benefit.

ScopedPtrVector would be a bit better, but I'd prefer to hold off on that too for a bit longer.

Here's what I'd like us to do before we land this:
1. Remove returned_tiles_ if possible.
2. Get a breakdown of the LayerRasterTileIterator struct to see if any low-hanging fruit is left that would significantly reduce the size.
On 2014/08/01 19:21:41, reveman wrote:
> On 2014/08/01 17:52:47, vmpstr wrote:
> > Here are the profile numbers for the ConstructAndIterate test (I included
> > everything >1%, pruning at the layer raster tile iterator level):
> >
> > Before:
> > RunConstructAndIterate 99%
> > RasterTilePriorityQueue::Pop 77.2%
> > adjust heap (from push/pop heap) 22.7%
> > PairedPictureLayerQueue::Pop 11.1%
> > LayerRasterTileIterator::operator++ 7.1%
> > std::vector::insert 1.1%
> > free 11.1%
> > memmove 9.4%
> > new 7.5%
>
> I assume free/new is from returned_tiles_?

I'm not sure. After this change, new, memmove, and free don't show up at these high levels, but the patch doesn't touch the returned-tiles logic.

> > std::vector::operator= 7.2%
> > RasterOrderComparator::operator() 3.1%
> > LayerRasterTileIterator::~LayerRasterTileIterator 1.3%
> > BuildRasterQueue 17.7%
> > RasterTilePriorityQueue::Build 16.8%
> > LayerRasterTileIterator::LayerRasterTileIterator 8.9%
> > adjust heap (from make heap) 3.0%
> > std::vector::insert 2.8%
> > RasterTilePriorityQueue::~RasterTilePriorityQueue 1.2%
> >
> > After:
> > RunConstructAndIterate 100%
> > RasterTilePriorityQueue::Pop 56.3%
> > PairedPictureLayerQueue::Pop 33.6%
> > LayerRasterTileIterator::operator++ 19.8%
> > std::vector::insert 8.2%
> > LayerRasterTileIterator::operator bool 1.7%
> > LayerRasterTileIterator::operator* 1.2%
> > RasterOrderComparator::operator() 16.0%
> > LayerRasterTileIterator::operator bool 5.0%
> > LayerRasterTileIterator::operator* 3.7%
> > BuildRasterQueue 33.4%
> > RasterTilePriority::Build 31.8%
> > LayerRasterTileIterator::LayerRasterTileIterator 18.2%
> > std::vector<PairedPictureLayerQueue>::insert 5.8%
> > std::vector<PairedPictureLayerQueue*>::insert 2.3%
> > adjust heap (from make heap) 2.1%
> > GetPairedPictureLayers 1.3%
> > RasterTilePriorityQueue::~RasterTilePriorityQueue 3.0%
> > free 2.5%
> > RasterTilePriorityQueue::Top 1.9%
> > (LapTimer::NextLap 1.8%)
> > RasterTilePriorityQueue::IsEmpty 1.4%
>
> I think of this change as a last resort to be applied once we're done with all
> the optimizations that don't add complexity. This might be a significant win
> right now, but it's not clear that it will be in the future. This patch
> actually adds some more work, too. That happens to work out for the better
> right now, but maybe it doesn't after we've made other optimizations. Maybe we
> end up with the increased complexity from this patch without any benefit.

I'm afraid that this will end up being a necessary change. For instance, I applied the eviction version of this patch on top of the layer eviction patch (i.e. assuming everything on the layer side lands), and I still see about a 100% gain. In the end I think it's very hard to beat shuffling pointers around by reducing the size of the object.

> ScopedPtrVector would be a bit better, but I'd prefer to hold off on that too
> for a bit longer.
>
> Here's what I'd like us to do before we land this:
> 1. Remove returned_tiles_ if possible.
> 2. Get a breakdown of the LayerRasterTileIterator struct to see if any
> low-hanging fruit is left that would significantly reduce the size.

I think bringing the size of the layer iterator down will definitely help, but I think this patch will still be significantly better; see my previous comment about the eviction iterators.
We should find out why these new, memmove, and free calls go away with this change, as that might lead us to a more worthwhile optimization. You can land this for now if you like, but I'd like to make sure that we re-evaluate the usefulness later. LGTM.

https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
File cc/resources/raster_tile_priority_queue.h (right):

https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
cc/resources/raster_tile_priority_queue.h:51: std::vector<PairedPictureLayerQueue*> paired_queues_heap_;
Please use ScopedPtrVector unless that is significantly slower. Also, please add a note about this possibly being unnecessary.
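For context on the earlier "add a make_heap function to ScopedPtrVector" suggestion, here is a rough sketch of the shape such a container-plus-heap-helper could take. This is a simplified stand-in written purely for illustration, not cc::ScopedPtrVector's actual interface, and it assumes the comparator takes raw pointers (as RasterOrderComparator would after this patch).

#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-in for an owning pointer vector with heap helpers.
template <typename T>
class OwningPtrVector {
 public:
  OwningPtrVector() {}
  ~OwningPtrVector() {
    for (std::size_t i = 0; i < data_.size(); ++i)
      delete data_[i];
  }

  void push_back(T* item) { data_.push_back(item); }  // takes ownership
  T* front() const { return data_.front(); }
  bool empty() const { return data_.empty(); }

  // Heap helpers over the raw pointers; Compare is called with T* arguments.
  template <typename Compare>
  void make_heap(const Compare& comp) {
    std::make_heap(data_.begin(), data_.end(), comp);
  }
  template <typename Compare>
  void pop_heap_and_delete(const Compare& comp) {
    std::pop_heap(data_.begin(), data_.end(), comp);
    delete data_.back();
    data_.pop_back();
  }

 private:
  std::vector<T*> data_;

  // Copying is deliberately not supported (ownership is unique).
  OwningPtrVector(const OwningPtrVector&);
  OwningPtrVector& operator=(const OwningPtrVector&);
};

The point of wrapping the heap calls inside the container (rather than exposing iterators) is that ownership and heap order live in one place, which is roughly what the ScopedPtrVector patch set aims for.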
On 2014/08/04 14:37:16, reveman wrote:
> We should find out why these new, memmove, and free calls go away with this
> change, as that might lead us to a more worthwhile optimization. You can land
> this for now if you like, but I'd like to make sure that we re-evaluate the
> usefulness later. LGTM.

I've added a comment to ensure we revisit this.

The new/free/memmove operations happen because we're not actually heapifying layer iterators; we're heapifying PairedPictureLayerQueue objects. So I think it is the shared_tiles vector in those objects that causes these memory operations: basically, we keep copying a vector around. I think the two TODOs in the header (removing shared_tiles, and removing ScopedPtrVector) are definitely the next things to address. I would still like to move forward with landing this, since as a first pass I'd like to address the largest inefficiencies we have here and revisit the smaller, more correct gains later.
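To make the "we keep copying a vector around" point concrete, here is a small self-contained illustration (stand-in names again, not the real cc types): when a heap element contains a std::vector member, every element assignment performed by make_heap/push_heap/pop_heap drags that vector along. In a C++03-style build those are full copies, which is where allocation/free/memmove traffic like the profile above can come from; with C++11 moves the vectors are moved rather than reallocated, but the elements are still far larger than a pointer.

#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in for PairedPictureLayerQueue: a small key plus a vector member
// (analogous to returned_shared_tiles).
struct PairedQueue {
  int priority;
  std::vector<int> returned_tiles;
};

struct ByValueLess {
  bool operator()(const PairedQueue& a, const PairedQueue& b) const {
    return a.priority < b.priority;
  }
};

int main() {
  std::vector<PairedQueue> heap(64);
  for (std::size_t i = 0; i < heap.size(); ++i) {
    heap[i].priority = static_cast<int>((i * 7) % 64);
    heap[i].returned_tiles.resize(16);  // this vector rides along with every element
  }

  // Each sift-up/down in these calls assigns whole PairedQueue objects
  // (copies in C++03, moves in C++11), including their vector members;
  // with a heap of PairedQueue* only pointers would move.
  std::make_heap(heap.begin(), heap.end(), ByValueLess());
  std::pop_heap(heap.begin(), heap.end(), ByValueLess());
  heap.pop_back();
  return 0;
}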
https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
File cc/resources/raster_tile_priority_queue.h (right):

https://codereview.chromium.org/431433005/diff/1/cc/resources/raster_tile_pri...
cc/resources/raster_tile_priority_queue.h:51: std::vector<PairedPictureLayerQueue*> paired_queues_heap_;
On 2014/08/04 14:37:16, reveman wrote:
> Please use ScopedPtrVector unless that is significantly slower. Also, please
> add a note about this possibly being unnecessary.

Done.
The CQ bit was checked by vmpstr@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vmpstr@chromium.org/431433005/20001
Message was sent while issue was closed.
Change committed as 287378