Issue 144463012: cc: Reuse the same TaskGraph and completed tasks vector.

Issue 144463012: cc: Reuse the same TaskGraph and completed tasks vector. (Closed)

Created:
6 years, 10 months ago by reveman

Modified:
6 years, 10 months ago

Reviewers:
vmpstr

CC:
chromium-reviews, cc-bugs_chromium.org

Base URL:
svn://svn.chromium.org/chrome/trunk/src

Visibility:
Public.

More Reviews

Description

cc: Reuse the same TaskGraph and completed tasks vector. This improves performance by avoiding unnecessary resizing of vectors. RasterWorkerPoolPerfTest.ScheduleTasks is ~25% faster for ImageRasterWorkerPool with this change. BUG=246546 Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=250305

Patch Set 1 #

Total comments: 4

Patch Set 2 : update comments #

Created: 6 years, 10 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+34 lines, -24 lines)			Patch
M	cc/resources/image_raster_worker_pool.h	View	1	1 chunk	+5 lines, -0 lines	0 comments	Download
M	cc/resources/image_raster_worker_pool.cc	View		4 chunks	+12 lines, -12 lines	0 comments	Download
M	cc/resources/pixel_buffer_raster_worker_pool.h	View	1	1 chunk	+5 lines, -0 lines	0 comments	Download
M	cc/resources/pixel_buffer_raster_worker_pool.cc	View		8 chunks	+12 lines, -12 lines	0 comments	Download

Messages

Total messages: 11 (0 generated)

Expand Messages | Collapse Messages

vmpstr

https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_raster_worker_pool.h File cc/resources/pixel_buffer_raster_worker_pool.h (right): https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_raster_worker_pool.h#newcode92 cc/resources/pixel_buffer_raster_worker_pool.h:92: // Avoid unnecessary heap allocations by reusing the same ...

6 years, 10 months ago (2014-02-10 18:00:18 UTC) #2

reveman

6 years, 10 months ago (2014-02-10 19:13:26 UTC) #3

https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
File cc/resources/pixel_buffer_raster_worker_pool.h (right):

https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
cc/resources/pixel_buffer_raster_worker_pool.h:92: // Avoid unnecessary heap
allocations by reusing the same graph and
On 2014/02/10 18:00:19, vmpstr wrote:
> This comment kind of describes the patch, instead of describing what these
> variables do. Could you instead change it to something like "Graph to use when
> scheduling tasks, and  a vector to gather completed tasks"?

Done.

https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
cc/resources/pixel_buffer_raster_worker_pool.h:95: internal::Task::Vector
completed_tasks_;
On 2014/02/10 18:00:19, vmpstr wrote:
> I'm kind of conflicted about this change. What I mean to say is that I suspect
> this is optimizing the perftest, but makes little difference in practice. I
mean
> completed_tasks_ really saves a local std::vector, which isn't that horribly
> expensive. It's also called in CheckForCompletedTasks: something we normally
> call once every 6ms?
> 
> Do you know what the actual time savings are per call, at least approximately?

The performance tests reflect real usage quite well now and this optimization is
definitely worth it imo. The cost we're avoiding is having to create+grow+delete
all these vectors with each use. It's hard to give you a savings per call number
but what I can tell is that the heap allocation functions used to process these
vectors are number 2 after ref-counting when profiling this code. Eliminating
them makes a big difference and if it wasn't for the current ref-counting
bottleneck this would provide a much greater percentage improvement.

FYI, I have another patch that does the same at the TileManager to
RasterWorkerPool interaction level but decided it was better to do that as a
follow up.

vmpstr

On 2014/02/10 19:13:26, David Reveman wrote: > https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_raster_worker_pool.h > File cc/resources/pixel_buffer_raster_worker_pool.h (right): > > https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_raster_worker_pool.h#newcode92 ...

6 years, 10 months ago (2014-02-10 19:40:15 UTC) #4

On 2014/02/10 19:13:26, David Reveman wrote:
>
https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
> File cc/resources/pixel_buffer_raster_worker_pool.h (right):
> 
>
https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
> cc/resources/pixel_buffer_raster_worker_pool.h:92: // Avoid unnecessary heap
> allocations by reusing the same graph and
> On 2014/02/10 18:00:19, vmpstr wrote:
> > This comment kind of describes the patch, instead of describing what these
> > variables do. Could you instead change it to something like "Graph to use
when
> > scheduling tasks, and  a vector to gather completed tasks"?
> 
> Done.
> 
>
https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
> cc/resources/pixel_buffer_raster_worker_pool.h:95: internal::Task::Vector
> completed_tasks_;
> On 2014/02/10 18:00:19, vmpstr wrote:
> > I'm kind of conflicted about this change. What I mean to say is that I
suspect
> > this is optimizing the perftest, but makes little difference in practice. I
> mean
> > completed_tasks_ really saves a local std::vector, which isn't that horribly
> > expensive. It's also called in CheckForCompletedTasks: something we normally
> > call once every 6ms?
> > 
> > Do you know what the actual time savings are per call, at least
approximately?
> 
> The performance tests reflect real usage quite well now and this optimization
is
> definitely worth it imo. The cost we're avoiding is having to
create+grow+delete
> all these vectors with each use. It's hard to give you a savings per call
number
> but what I can tell is that the heap allocation functions used to process
these
> vectors are number 2 after ref-counting when profiling this code. Eliminating
> them makes a big difference and if it wasn't for the current ref-counting
> bottleneck this would provide a much greater percentage improvement.
> 

Ok, I can buy that this actually has savings in the fact that the vector retains
its capacity which allows us to allocate less (or nothing) next time around.
lgtm.

Re: time per call, I meant the perftest.. you said there's 25% increase. What's
the run_time / number_of_runs before and after? I'm curious if we're talking
about 1ms -> 0.75ms or something like 0.010ms -> 0.007ms (on whatever machine is
fine, I just want to get a gut feeling for the speed of the whole schedule)

> FYI, I have another patch that does the same at the TileManager to
> RasterWorkerPool interaction level but decided it was better to do that as a
> follow up.

reveman

On 2014/02/10 19:40:15, vmpstr wrote: > On 2014/02/10 19:13:26, David Reveman wrote: > > > ...

6 years, 10 months ago (2014-02-10 20:06:23 UTC) #5

On 2014/02/10 19:40:15, vmpstr wrote:
> On 2014/02/10 19:13:26, David Reveman wrote:
> >
>
https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
> > File cc/resources/pixel_buffer_raster_worker_pool.h (right):
> > 
> >
>
https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
> > cc/resources/pixel_buffer_raster_worker_pool.h:92: // Avoid unnecessary heap
> > allocations by reusing the same graph and
> > On 2014/02/10 18:00:19, vmpstr wrote:
> > > This comment kind of describes the patch, instead of describing what these
> > > variables do. Could you instead change it to something like "Graph to use
> when
> > > scheduling tasks, and  a vector to gather completed tasks"?
> > 
> > Done.
> > 
> >
>
https://codereview.chromium.org/144463012/diff/1/cc/resources/pixel_buffer_ra...
> > cc/resources/pixel_buffer_raster_worker_pool.h:95: internal::Task::Vector
> > completed_tasks_;
> > On 2014/02/10 18:00:19, vmpstr wrote:
> > > I'm kind of conflicted about this change. What I mean to say is that I
> suspect
> > > this is optimizing the perftest, but makes little difference in practice.
I
> > mean
> > > completed_tasks_ really saves a local std::vector, which isn't that
horribly
> > > expensive. It's also called in CheckForCompletedTasks: something we
normally
> > > call once every 6ms?
> > > 
> > > Do you know what the actual time savings are per call, at least
> approximately?
> > 
> > The performance tests reflect real usage quite well now and this
optimization
> is
> > definitely worth it imo. The cost we're avoiding is having to
> create+grow+delete
> > all these vectors with each use. It's hard to give you a savings per call
> number
> > but what I can tell is that the heap allocation functions used to process
> these
> > vectors are number 2 after ref-counting when profiling this code.
Eliminating
> > them makes a big difference and if it wasn't for the current ref-counting
> > bottleneck this would provide a much greater percentage improvement.
> > 
> 
> Ok, I can buy that this actually has savings in the fact that the vector
retains
> its capacity which allows us to allocate less (or nothing) next time around.
> lgtm.
> 
> Re: time per call, I meant the perftest.. you said there's 25% increase.
What's
> the run_time / number_of_runs before and after? I'm curious if we're talking
> about 1ms -> 0.75ms or something like 0.010ms -> 0.007ms (on whatever machine
is
> fine, I just want to get a gut feeling for the speed of the whole schedule)

~200000 -> 250000 run/s, so 0.005ms -> 0.004ms on my workstation. as it's heap
allocation related, I expect the same or better on Android.

> 
> > FYI, I have another patch that does the same at the TileManager to
> > RasterWorkerPool interaction level but decided it was better to do that as a
> > follow up.

vmpstr

> ~200000 -> 250000 run/s, so 0.005ms -> 0.004ms on my workstation. as it's heap ...

6 years, 10 months ago (2014-02-10 22:02:26 UTC) #6

reveman

On 2014/02/10 22:02:26, vmpstr wrote: > > ~200000 -> 250000 run/s, so 0.005ms -> 0.004ms ...

6 years, 10 months ago (2014-02-10 23:12:03 UTC) #7

On 2014/02/10 22:02:26, vmpstr wrote:
> > ~200000 -> 250000 run/s, so 0.005ms -> 0.004ms on my workstation. as it's
heap
> > allocation related, I expect the same or better on Android.
> > 
> 
> That's a 1us savings on a call that, again, is not that frequent in
non-perftest
> environment. I mean, I'm happy with the fact that it's faster, but I fear that
> this will lock us in into a particular micro-optimized solution. That is,
adding
> a trace event or something in this function will probably invalidate the
savings
> and make the perfbots graph look like it's a regression.
> 
> We had patches before that removed some heap allocations
> (https://codereview.chromium.org/38553002/ comes to mind), but the consensus
was
> that it might not be worth it. Why the change of heart? :)

Profiling. It shows that all these heap allocations add up as we're doing them
at every level and they are likely costing us more in a non-perftest environment
(where other heap allocations also happen) than the other way around. They are
also more expensive on mobile than desktop.

This is also not just about performance. I've generally been against this kind
of optimizations when they made the code more complicated and we had no data
demonstrating their usefulness. Profiling of our new "useful" perftests show
that they are definitely worth it and we now have a pattern of swapping between
vectors that grow to their desired size at the core TaskGraphRunner level that I
find clean and doesn't add additional complexity to the code. I think we should
apply the same pattern throughout the whole worker-pool/tile-manager system.
Even in places where it doesn't make a difference in performance, just for the
sake of consistency.

In general profiling is the key. ~90% of the CPU time spent in RWP and TGR used
to be heap allocations and ref counting. I'm starting at the bottom (TGR) and
removing it one level at a time. Each level makes a small difference but
together it changes everything.

> 
> > > 
> > > > FYI, I have another patch that does the same at the TileManager to
> > > > RasterWorkerPool interaction level but decided it was better to do that
as
> a
> > > > follow up.

vmpstr

On 2014/02/10 23:12:03, David Reveman wrote: > On 2014/02/10 22:02:26, vmpstr wrote: > > > ...

6 years, 10 months ago (2014-02-10 23:16:35 UTC) #8

On 2014/02/10 23:12:03, David Reveman wrote:
> On 2014/02/10 22:02:26, vmpstr wrote:
> > > ~200000 -> 250000 run/s, so 0.005ms -> 0.004ms on my workstation. as it's
> heap
> > > allocation related, I expect the same or better on Android.
> > > 
> > 
> > That's a 1us savings on a call that, again, is not that frequent in
> non-perftest
> > environment. I mean, I'm happy with the fact that it's faster, but I fear
that
> > this will lock us in into a particular micro-optimized solution. That is,
> adding
> > a trace event or something in this function will probably invalidate the
> savings
> > and make the perfbots graph look like it's a regression.
> > 
> > We had patches before that removed some heap allocations
> > (https://codereview.chromium.org/38553002/ comes to mind), but the consensus
> was
> > that it might not be worth it. Why the change of heart? :)
> 
> Profiling. It shows that all these heap allocations add up as we're doing them
> at every level and they are likely costing us more in a non-perftest
environment
> (where other heap allocations also happen) than the other way around. They are
> also more expensive on mobile than desktop.
> 
> This is also not just about performance. I've generally been against this kind
> of optimizations when they made the code more complicated and we had no data
> demonstrating their usefulness. Profiling of our new "useful" perftests show
> that they are definitely worth it and we now have a pattern of swapping
between
> vectors that grow to their desired size at the core TaskGraphRunner level that
I
> find clean and doesn't add additional complexity to the code. I think we
should
> apply the same pattern throughout the whole worker-pool/tile-manager system.
> Even in places where it doesn't make a difference in performance, just for the
> sake of consistency.
> 
> In general profiling is the key. ~90% of the CPU time spent in RWP and TGR
used
> to be heap allocations and ref counting. I'm starting at the bottom (TGR) and
> removing it one level at a time. Each level makes a small difference but
> together it changes everything.
> 
> > 
> > > > 
> > > > > FYI, I have another patch that does the same at the TileManager to
> > > > > RasterWorkerPool interaction level but decided it was better to do
that
> as
> > a
> > > > > follow up.

sgtm.

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/reveman@chromium.org/144463012/60001

6 years, 10 months ago (2014-02-11 00:06:48 UTC) #10

Message was sent while issue was closed.

Change committed as 250305

Expand Messages | Collapse Messages