|
|
Chromium Code Reviews|
Created:
4 years, 9 months ago by loyso (OOO) Modified:
4 years, 9 months ago CC:
chromium-reviews, cc-bugs_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionCC Animation: Use unordered_map instead of vector.
A performance optimization to align the implementation with
AnimationTimeline::id_to_player_map_
BUG=595584
CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
Committed: https://crrev.com/79f47b36532d07544d99c07ad06c8a45e3e62f92
Cr-Commit-Position: refs/heads/master@{#382241}
Patch Set 1 #Patch Set 2 : Use move semantics on insertions. #
Messages
Total messages: 23 (7 generated)
Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization. BUG=595584 ========== to ========== CC Animation: Use unordered_map instead of vector. A performance optimization. BUG=595584 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ==========
Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization. BUG=595584 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ========== to ========== CC Animation: Use unordered_map instead of vector. A performance optimization to align the implementation with AnimationTimeline::id_to_player_map_ BUG=595584 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ==========
loyso@chromium.org changed reviewers: + ajuma@chromium.org, vollick@chromium.org
lgtml
On 2016/03/18 13:31:17, ajuma wrote: > lgtml lgtm even
The CQ bit was checked by loyso@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1814103002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1814103002/20001
Message was sent while issue was closed.
Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization to align the implementation with AnimationTimeline::id_to_player_map_ BUG=595584 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ========== to ========== CC Animation: Use unordered_map instead of vector. A performance optimization to align the implementation with AnimationTimeline::id_to_player_map_ BUG=595584 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001)
Message was sent while issue was closed.
Description was changed from ========== CC Animation: Use unordered_map instead of vector. A performance optimization to align the implementation with AnimationTimeline::id_to_player_map_ BUG=595584 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ========== to ========== CC Animation: Use unordered_map instead of vector. A performance optimization to align the implementation with AnimationTimeline::id_to_player_map_ BUG=595584 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel Committed: https://crrev.com/79f47b36532d07544d99c07ad06c8a45e3e62f92 Cr-Commit-Position: refs/heads/master@{#382241} ==========
Message was sent while issue was closed.
Patchset 2 (id:??) landed as https://crrev.com/79f47b36532d07544d99c07ad06c8a45e3e62f92 Cr-Commit-Position: refs/heads/master@{#382241}
Message was sent while issue was closed.
danakj@chromium.org changed reviewers: + danakj@chromium.org
Message was sent while issue was closed.
> A performance optimization to align the implementation with > AnimationTimeline::id_to_player_map_ Why does this performance change include no performance numbers to justify it?
Message was sent while issue was closed.
On 2016/03/21 21:16:15, danakj wrote: > Why does this performance change include no performance numbers to justify it? On the one hand, this replaces the average O(n^2) (std::vector) property update with the average O(n) (hash map). On the other hand, the number of timelines is very low so the n is 1-4. Main motivation was just to align the implementation with AnimationTimeline::id_to_player_map_, where we have a perf test. Yes, we could consider this a code health issue rather than a performance one.
Message was sent while issue was closed.
On 2016/03/21 23:41:19, loyso wrote: > On 2016/03/21 21:16:15, danakj wrote: > > Why does this performance change include no performance numbers to justify it? > On the one hand, this replaces the average O(n^2) (std::vector) property update > with the average O(n) (hash map). > On the other hand, the number of timelines is very low so the n is 1-4. > Main motivation was just to align the implementation with > AnimationTimeline::id_to_player_map_, where we have a perf test. > Yes, we could consider this a code health issue rather than a performance one. Vector is faster for small numbers, so this sounds like it would be a perf regression? Is a map better than a vector for AnimationTimeline::id_to_player_map_? If it is, I don't think that's a reason for using map everywhere is it? At least I'm not understanding why you'd want this because of that.
Message was sent while issue was closed.
On 2016/03/21 23:43:48, danakj wrote: > On 2016/03/21 23:41:19, loyso wrote: > > On 2016/03/21 21:16:15, danakj wrote: > Vector is faster for small numbers, so this sounds like it would be a perf > regression? vector is (maybe) faster for n=[1-4]. But it wouldn't be scalable. One day in the future we can get a case with 100 timelines. I could put a DCHECK or additional logic to limit the number of timelines. Alternatively, I have chosen O(n) instead of O(n^2) which scales with n. No problem, I'll add a perf test with 100 timelines. But I need to make it short (in time) to make it cheap. Would it be ok? > Is a map better than a vector for AnimationTimeline::id_to_player_map_? If it > is, I don't think that's a reason for using map everywhere is it? At least I'm > not understanding why you'd want this because of that. Agreed.
Message was sent while issue was closed.
p.s. JFYI, I think that crbug/588172 is unrelated to this. And I'm looking for any resource leaks (if any).
Message was sent while issue was closed.
On Mon, Mar 21, 2016 at 4:52 PM, <loyso@chromium.org> wrote: > On 2016/03/21 23:43:48, danakj wrote: > > On 2016/03/21 23:41:19, loyso wrote: > > > On 2016/03/21 21:16:15, danakj wrote: > > Vector is faster for small numbers, so this sounds like it would be a > perf > > regression? > vector is (maybe) faster for n=[1-4]. But it wouldn't be scalable. One day > in > the future we can get a case with 100 timelines. > Can you explain? What is changing? How often is 100 going to happen? Is this going to be a bottleneck and worth the perf change for potentially more common cases? > I could put a DCHECK or additional logic to limit the number of timelines. > Alternatively, I have chosen O(n) instead of O(n^2) which scales with n. > > No problem, I'll add a perf test with 100 timelines. But I need to make it > short > (in time) to make it cheap. > Would it be ok? > A test with realistic small number and large number could be nice. Have you considered base::SmallMap? > > > Is a map better than a vector for AnimationTimeline::id_to_player_map_? > If it > > is, I don't think that's a reason for using map everywhere is it? At > least I'm > > not understanding why you'd want this because of that. > Agreed. > > > https://codereview.chromium.org/1814103002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Message was sent while issue was closed.
On 2016/03/22 00:00:32, danakj wrote: > > > Vector is faster for small numbers, so this sounds like it would be a > > perf > > > regression? > > vector is (maybe) faster for n=[1-4]. But it wouldn't be scalable. One day > > in > > the future we can get a case with 100 timelines. > Can you explain? What is changing? > How often is 100 going to happen? Is this going to be a bottleneck and > worth the perf change for potentially more common cases? AnimationTimeline is basically a group of players. I don't see any sense in limiting any clients of AnimationHost in a number of timelines and keeping the algorithm as O(n^2) avg. The assumption was that on small n (<20), vector and hashmap are different only by some small constant. > > I could put a DCHECK or additional logic to limit the number of timelines. > > Alternatively, I have chosen O(n) instead of O(n^2) which scales with n. > > > > No problem, I'll add a perf test with 100 timelines. But I need to make it > > short > > (in time) to make it cheap. > > Would it be ok? > A test with realistic small number and large number could be nice. https://codereview.chromium.org/1823773003/ Some stats for my z620 (averaged): n=1000 vector-based PushProperties: 600 runs/sec, hashmap-based PushProperties: 17500 runs/sec n=100 vector-based PushProperties: 58300 runs/sec, hashmap-based PushProperties: 186300 runs/sec n=10 vector-based PushProperties: 2,420,000 runs/sec, hashmap-based PushProperties: 1,767,400 runs/sec n=4 vector-based PushProperties: 6,000,000 runs/sec, hashmap-based PushProperties: 3,900,000 runs/sec My Xenon E5-2680 has enormous 256kb L1 cache. Cache locality helps vector and bubble-sort-O(n^2)-like algorithms. I suspect that on typical ARM with 16 kb (leess associative) cache these two approaches would be on par for n<10. > Have you considered base::SmallMap? No, I didn't know that it exist. Thanks for letting me know. I think, we shouldn't optimize the case with 4 timelines. It doesn't matter, whether we spend 100 or 200 CPU cycles per commit on this. What matters in this case is realistic numbers like '100+ players' and overall complexity of the algorithm. By the way, I know how to make AnimationHost::PushProperties to be O(n)-in-average AND O(n)-in-worst case (with hash map it still has the O(n^2) worst case) plus how to remove some O(n) passes. But the price would be a slightly more complicated implementation.
Message was sent while issue was closed.
On Mon, Mar 21, 2016 at 6:25 PM, <loyso@chromium.org> wrote: > On 2016/03/22 00:00:32, danakj wrote: > > > > > Vector is faster for small numbers, so this sounds like it would be a > > > perf > > > > regression? > > > vector is (maybe) faster for n=[1-4]. But it wouldn't be scalable. One > day > > > in > > > the future we can get a case with 100 timelines. > > Can you explain? What is changing? > > How often is 100 going to happen? Is this going to be a bottleneck and > > worth the perf change for potentially more common cases? > AnimationTimeline is basically a group of players. I don't see any sense in > limiting any clients of AnimationHost > in a number of timelines and keeping the algorithm as O(n^2) avg. > The assumption was that on small n (<20), vector and hashmap are different > only > by some small constant. > > > > I could put a DCHECK or additional logic to limit the number of > timelines. > > > Alternatively, I have chosen O(n) instead of O(n^2) which scales with > n. > > > > > > No problem, I'll add a perf test with 100 timelines. But I need to > make it > > > short > > > (in time) to make it cheap. > > > Would it be ok? > > A test with realistic small number and large number could be nice. > https://codereview.chromium.org/1823773003/ > > Some stats for my z620 (averaged): > n=1000 vector-based PushProperties: 600 runs/sec, hashmap-based > PushProperties: > 17500 runs/sec > n=100 vector-based PushProperties: 58300 runs/sec, hashmap-based > PushProperties: > 186300 runs/sec > n=10 vector-based PushProperties: 2,420,000 runs/sec, hashmap-based > PushProperties: 1,767,400 runs/sec > n=4 vector-based PushProperties: 6,000,000 runs/sec, hashmap-based > PushProperties: 3,900,000 runs/sec > Thanks for the numbers. > My Xenon E5-2680 has enormous 256kb L1 cache. Cache locality helps vector > and > bubble-sort-O(n^2)-like algorithms. > I suspect that on typical ARM with 16 kb (leess associative) cache these > two > approaches would be on par for n<10. > You can run the perf test on an android device. I think you should *always* judge your perf measurements on such a device, and not trust your desktop that much anyways. It's pretty simple to run gtest-based perftests on a phone. > Have you considered base::SmallMap? > No, I didn't know that it exist. Thanks for letting me know. > > I think, we shouldn't optimize the case with 4 timelines. It doesn't > matter, > whether we spend 100 or 200 CPU cycles per commit on this. > What matters in this case is realistic numbers like '100+ players' and > overall > complexity of the algorithm. > I somewhat agree, but this patch exists as a performance change, so it requires looking at performance. I'm pretty strongly allergic to performance-oriented changes that are not data driven. And without data showing otherwise, I tend to prefer that we default to vector or SmallMap over more-pointer-based structures. By the way, I know how to make AnimationHost::PushProperties to be > O(n)-in-average AND O(n)-in-worst case (with hash map it still has the > O(n^2) > worst case) plus how to remove some O(n) passes. > But the price would be a slightly more complicated implementation. > If concrete numbers show it's worth it, then that'd be awesome. If the win is inside of the noise margins, simpler code wins IMO. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Message was sent while issue was closed.
On 2016/03/22 01:33:38, danakj wrote: > > My Xenon E5-2680 has enormous 256kb L1 cache. Cache locality helps vector > > and > > bubble-sort-O(n^2)-like algorithms. > > I suspect that on typical ARM with 16 kb (leess associative) cache these > > two > > approaches would be on par for n<10. > You can run the perf test on an android device. I think you should *always* > judge your perf measurements on such a device, and not trust your desktop > that much anyways. It's pretty simple to run gtest-based perftests on a > phone. Yup. The only blocker is to get that corp phone :) > > I think, we shouldn't optimize the case with 4 timelines. It doesn't > > matter, > > whether we spend 100 or 200 CPU cycles per commit on this. > > What matters in this case is realistic numbers like '100+ players' and > > overall > > complexity of the algorithm. > I somewhat agree, but this patch exists as a performance change, so it > requires looking at performance. I'm pretty strongly allergic to > performance-oriented changes that are not data driven. > > And without data showing otherwise, I tend to prefer that we default to > vector or SmallMap over more-pointer-based structures. I understand. By the way, I have another CL with perf change where I get rid of pointer-heap-alloc-based container: crrev.com/1806223002. I also don't introduce any tests there because that's a win-win change IMO. Any suggestions?
Message was sent while issue was closed.
On Mon, Mar 21, 2016 at 7:11 PM, <loyso@chromium.org> wrote: > On 2016/03/22 01:33:38, danakj wrote: > > > My Xenon E5-2680 has enormous 256kb L1 cache. Cache locality helps > vector > > > and > > > bubble-sort-O(n^2)-like algorithms. > > > I suspect that on typical ARM with 16 kb (leess associative) cache > these > > > two > > > approaches would be on par for n<10. > > You can run the perf test on an android device. I think you should > *always* > > judge your perf measurements on such a device, and not trust your desktop > > that much anyways. It's pretty simple to run gtest-based perftests on a > > phone. > Yup. The only blocker is to get that corp phone :) > > > > I think, we shouldn't optimize the case with 4 timelines. It doesn't > > > matter, > > > whether we spend 100 or 200 CPU cycles per commit on this. > > > What matters in this case is realistic numbers like '100+ players' and > > > overall > > > complexity of the algorithm. > > I somewhat agree, but this patch exists as a performance change, so it > > requires looking at performance. I'm pretty strongly allergic to > > performance-oriented changes that are not data driven. > > > > And without data showing otherwise, I tend to prefer that we default to > > vector or SmallMap over more-pointer-based structures. > I understand. By the way, I have another CL with perf change where I get > rid of > pointer-heap-alloc-based container: > crrev.com/1806223002. I also don't introduce any tests there because > that's a > win-win change IMO. > Any suggestions? > That's using a less pointery data structure, which looks awesome. Numbers are good anyway - they're good for your perf and showing your impact, and they're good practice to make sure you're doing the right thing. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
