Issue 1814103002: CC Animation: Use unordered_map instead of vector.

loyso (OOO)

Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization. ...

4 years, 9 months ago (2016-03-18 00:16:51 UTC) #1

loyso (OOO)

Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization. ...

4 years, 9 months ago (2016-03-18 00:18:50 UTC) #2

loyso (OOO)

loyso@chromium.org changed reviewers: + ajuma@chromium.org, vollick@chromium.org

4 years, 9 months ago (2016-03-18 00:18:51 UTC) #3

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1814103002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1814103002/20001

4 years, 9 months ago (2016-03-20 23:52:52 UTC) #8

commit-bot: I haz the power

Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization ...

4 years, 9 months ago (2016-03-21 01:02:01 UTC) #9

commit-bot: I haz the power

Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization ...

4 years, 9 months ago (2016-03-21 01:03:07 UTC) #11

commit-bot: I haz the power

Patchset 2 (id:??) landed as https://crrev.com/79f47b36532d07544d99c07ad06c8a45e3e62f92 Cr-Commit-Position: refs/heads/master@{#382241}

4 years, 9 months ago (2016-03-21 01:03:08 UTC) #12

danakj

danakj@chromium.org changed reviewers: + danakj@chromium.org

4 years, 9 months ago (2016-03-21 21:16:14 UTC) #13

danakj

> A performance optimization to align the implementation with > AnimationTimeline::id_to_player_map_ Why does this performance ...

4 years, 9 months ago (2016-03-21 21:16:15 UTC) #14

loyso (OOO)

On 2016/03/21 21:16:15, danakj wrote: > Why does this performance change include no performance numbers ...

4 years, 9 months ago (2016-03-21 23:41:19 UTC) #15

danakj

On 2016/03/21 23:41:19, loyso wrote: > On 2016/03/21 21:16:15, danakj wrote: > > Why does ...

4 years, 9 months ago (2016-03-21 23:43:48 UTC) #16

loyso (OOO)

On 2016/03/21 23:43:48, danakj wrote: > On 2016/03/21 23:41:19, loyso wrote: > > On 2016/03/21 ...

4 years, 9 months ago (2016-03-21 23:52:47 UTC) #17

loyso (OOO)

p.s. JFYI, I think that crbug/588172 is unrelated to this. And I'm looking for any ...

4 years, 9 months ago (2016-03-21 23:55:03 UTC) #18

danakj

On Mon, Mar 21, 2016 at 4:52 PM, <loyso@chromium.org> wrote: > On 2016/03/21 23:43:48, danakj ...

4 years, 9 months ago (2016-03-22 00:00:32 UTC) #19

loyso (OOO)

On 2016/03/22 00:00:32, danakj wrote: > > > Vector is faster for small numbers, so ...

4 years, 9 months ago (2016-03-22 01:25:46 UTC) #20

Message was sent while issue was closed.

On 2016/03/22 00:00:32, danakj wrote:

> > > Vector is faster for small numbers, so this sounds like it would be a
> > perf
> > > regression?
> > vector is (maybe) faster for n=[1-4]. But it wouldn't be scalable. One day
> > in
> > the future we can get a case with 100 timelines.
> Can you explain? What is changing?
> How often is 100 going to happen? Is this going to be a bottleneck and
> worth the perf change for potentially more common cases?
AnimationTimeline is basically a group of players. I don't see any sense in
limiting any clients of AnimationHost
in a number of timelines and keeping the algorithm as O(n^2) avg.
The assumption was that on small n (<20), vector and hashmap are different only
by some small constant.

> > I could put a DCHECK or additional logic to limit the number of timelines.
> > Alternatively, I have chosen O(n) instead of O(n^2) which scales with n.
> >
> > No problem, I'll add a perf test with 100 timelines. But I need to make it
> > short
> > (in time) to make it cheap.
> > Would it be ok?
> A test with realistic small number and large number could be nice.
https://codereview.chromium.org/1823773003/

Some stats for my z620 (averaged):
n=1000 vector-based PushProperties: 600 runs/sec, hashmap-based PushProperties:
17500 runs/sec
n=100 vector-based PushProperties: 58300 runs/sec, hashmap-based PushProperties:
186300 runs/sec
n=10 vector-based PushProperties: 2,420,000 runs/sec, hashmap-based
PushProperties: 1,767,400 runs/sec
n=4 vector-based PushProperties: 6,000,000 runs/sec, hashmap-based
PushProperties: 3,900,000 runs/sec

My Xenon E5-2680 has enormous 256kb L1 cache. Cache locality helps vector and
bubble-sort-O(n^2)-like algorithms.
I suspect that on typical ARM with 16 kb (leess associative) cache these two
approaches would be on par for n<10.

> Have you considered base::SmallMap?
No, I didn't know that it exist. Thanks for letting me know.

I think, we shouldn't optimize the case with 4 timelines. It doesn't matter,
whether we spend 100 or 200 CPU cycles per commit on this.
What matters in this case is realistic numbers like '100+ players' and overall
complexity of the algorithm.

By the way, I know how to make AnimationHost::PushProperties to be
O(n)-in-average AND O(n)-in-worst case (with hash map it still has the O(n^2)
worst case) plus how to remove some O(n) passes.
But the price would be a slightly more complicated implementation.

danakj

On Mon, Mar 21, 2016 at 6:25 PM, <loyso@chromium.org> wrote: > On 2016/03/22 00:00:32, danakj ...

4 years, 9 months ago (2016-03-22 01:33:38 UTC) #21

Message was sent while issue was closed.

On Mon, Mar 21, 2016 at 6:25 PM, <loyso@chromium.org> wrote:

> On 2016/03/22 00:00:32, danakj wrote:
>
> > > > Vector is faster for small numbers, so this sounds like it would be a
> > > perf
> > > > regression?
> > > vector is (maybe) faster for n=[1-4]. But it wouldn't be scalable. One
> day
> > > in
> > > the future we can get a case with 100 timelines.
> > Can you explain? What is changing?
> > How often is 100 going to happen? Is this going to be a bottleneck and
> > worth the perf change for potentially more common cases?
> AnimationTimeline is basically a group of players. I don't see any sense in
> limiting any clients of AnimationHost
> in a number of timelines and keeping the algorithm as O(n^2) avg.
> The assumption was that on small n (<20), vector and hashmap are different
> only
> by some small constant.
>
> > > I could put a DCHECK or additional logic to limit the number of
> timelines.
> > > Alternatively, I have chosen O(n) instead of O(n^2) which scales with
> n.
> > >
> > > No problem, I'll add a perf test with 100 timelines. But I need to
> make it
> > > short
> > > (in time) to make it cheap.
> > > Would it be ok?
> > A test with realistic small number and large number could be nice.
> https://codereview.chromium.org/1823773003/
>
> Some stats for my z620 (averaged):
> n=1000 vector-based PushProperties: 600 runs/sec, hashmap-based
> PushProperties:
> 17500 runs/sec
> n=100 vector-based PushProperties: 58300 runs/sec, hashmap-based
> PushProperties:
> 186300 runs/sec
> n=10 vector-based PushProperties: 2,420,000 runs/sec, hashmap-based
> PushProperties: 1,767,400 runs/sec
> n=4 vector-based PushProperties: 6,000,000 runs/sec, hashmap-based
> PushProperties: 3,900,000 runs/sec
>

Thanks for the numbers.


> My Xenon E5-2680 has enormous 256kb L1 cache. Cache locality helps vector
> and
> bubble-sort-O(n^2)-like algorithms.
> I suspect that on typical ARM with 16 kb (leess associative) cache these
> two
> approaches would be on par for n<10.
>

You can run the perf test on an android device. I think you should *always*
judge your perf measurements on such a device, and not trust your desktop
that much anyways. It's pretty simple to run gtest-based perftests on a
phone.

> Have you considered base::SmallMap?
> No, I didn't know that it exist. Thanks for letting me know.
>
> I think, we shouldn't optimize the case with 4 timelines. It doesn't
> matter,
> whether we spend 100 or 200 CPU cycles per commit on this.
> What matters in this case is realistic numbers like '100+ players' and
> overall
> complexity of the algorithm.
>

I somewhat agree, but this patch exists as a performance change, so it
requires looking at performance. I'm pretty strongly allergic to
performance-oriented changes that are not data driven.

And without data showing otherwise, I tend to prefer that we default to
vector or SmallMap over more-pointer-based structures.

By the way, I know how to make AnimationHost::PushProperties to be
> O(n)-in-average AND O(n)-in-worst case (with hash map it still has the
> O(n^2)
> worst case) plus how to remove some O(n) passes.
> But the price would be a slightly more complicated implementation.
>

If concrete numbers show it's worth it, then that'd be awesome. If the win
is inside of the noise margins, simpler code wins IMO.

-- 
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.

loyso (OOO)

On 2016/03/22 01:33:38, danakj wrote: > > My Xenon E5-2680 has enormous 256kb L1 cache. ...

4 years, 9 months ago (2016-03-22 02:11:36 UTC) #22

danakj

4 years, 9 months ago (2016-03-22 18:32:42 UTC) #23

Message was sent while issue was closed.

On Mon, Mar 21, 2016 at 7:11 PM, <loyso@chromium.org> wrote:

> On 2016/03/22 01:33:38, danakj wrote:
> > > My Xenon E5-2680 has enormous 256kb L1 cache. Cache locality helps
> vector
> > > and
> > > bubble-sort-O(n^2)-like algorithms.
> > > I suspect that on typical ARM with 16 kb (leess associative) cache
> these
> > > two
> > > approaches would be on par for n<10.
> > You can run the perf test on an android device. I think you should
> *always*
> > judge your perf measurements on such a device, and not trust your desktop
> > that much anyways. It's pretty simple to run gtest-based perftests on a
> > phone.
> Yup. The only blocker is to get that corp phone :)
>
> > > I think, we shouldn't optimize the case with 4 timelines. It doesn't
> > > matter,
> > > whether we spend 100 or 200 CPU cycles per commit on this.
> > > What matters in this case is realistic numbers like '100+ players' and
> > > overall
> > > complexity of the algorithm.
> > I somewhat agree, but this patch exists as a performance change, so it
> > requires looking at performance. I'm pretty strongly allergic to
> > performance-oriented changes that are not data driven.
> >
> > And without data showing otherwise, I tend to prefer that we default to
> > vector or SmallMap over more-pointer-based structures.
> I understand. By the way, I have another CL with perf change where I get
> rid of
> pointer-heap-alloc-based container:
> crrev.com/1806223002. I also don't introduce any tests there because
> that's a
> win-win change IMO.
> Any suggestions?
>

That's using a less pointery data structure, which looks awesome. Numbers
are good anyway - they're good for your perf and showing your impact, and
they're good practice to make sure you're doing the right thing.

-- 
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.

Issue 1814103002: CC Animation: Use unordered_map instead of vector. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Use move semantics on insertions. #

Messages