|
|
Descriptioncc: Change Nodes collection to use base::ContiguousContainer instead of deque.
This patch changes the the underlying structure to be
base::ContiguousContainer. This provides better memory behavior
and impacts performance as demonstrated by the results below.
This is an acceptable change in performance with the benefit
of memory savings.
(N7v2)
Before:
[ RUN ] RTreePerfTest.Construct
*RESULT rtree_construct: 100= 91842.1953125 runs/s
*RESULT rtree_construct: 1000= 9855.638671875 runs/s
*RESULT rtree_construct: 10000= 747.183837890625 runs/s
*RESULT rtree_construct: 100000= 63.701416015625 runs/s
[ OK ] RTreePerfTest.Construct (8245 ms)
[ RUN ] RTreePerfTest.Search
*RESULT rtree_search: 100= 363610 runs/s
*RESULT rtree_search: 1000= 97453.4921875 runs/s
*RESULT rtree_search: 10000= 13022.0244140625 runs/s
*RESULT rtree_search: 100000= 848.8214111328125 runs/s
[ OK ] RTreePerfTest.Search (8093 ms)
After:
[ RUN ] RTreePerfTest.Construct
*RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%)
*RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%)
*RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%)
*RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%)
[ OK ] RTreePerfTest.Construct (8182 ms)
[ RUN ] RTreePerfTest.Search
*RESULT rtree_search: 100= 351755 runs/s
*RESULT rtree_search: 1000= 96155 runs/s
*RESULT rtree_search: 10000= 10232.65625 runs/s
*RESULT rtree_search: 100000= 863.5242309570312 runs/s
[ OK ] RTreePerfTest.Search (8176 ms)
Additionally this changes the union operation to be faster than
gfx::Rect::Union, since there are some assumptions that we can
make. Also, Union shows up as the hottest function on Linux profiles.
BUG=674169
R=danakj@chromium.org,dskiba@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel
Committed: https://crrev.com/ba09803911f3a0dc8d8e3b87f3fb8529baf4888d
Cr-Commit-Position: refs/heads/master@{#439258}
Patch Set 1 #Patch Set 2 : update #
Total comments: 2
Patch Set 3 : update #Patch Set 4 : win compile fix #Messages
Total messages: 31 (13 generated)
Description was changed from ========== cc: Change Nodes collection to use std::list instead of deque. This patch changes the the underlying structure to be std::list. This provides better memory behavior and only minimally impacts performance as demonstrated by the perftests. Before the change: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 425673.71875 runs/s *RESULT rtree_construct: 1000= 49820.3671875 runs/s *RESULT rtree_construct: 10000= 3905.325439453125 runs/s *RESULT rtree_construct: 100000= 369.26568603515625 runs/s [ OK ] RTreePerfTest.Construct (8057 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 3307098.25 runs/s *RESULT rtree_search: 1000= 658108.375 runs/s *RESULT rtree_search: 10000= 42578.42578125 runs/s *RESULT rtree_search: 100000= 3886.0595703125 runs/s [ OK ] RTreePerfTest.Search (8014 ms) After the change: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 390278.84375 runs/s *RESULT rtree_construct: 1000= 49568.08984375 runs/s *RESULT rtree_construct: 10000= 3539.865478515625 runs/s *RESULT rtree_construct: 100000= 306.250732421875 runs/s [ OK ] RTreePerfTest.Construct (8068 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 2986147 runs/s *RESULT rtree_search: 1000= 620176.875 runs/s *RESULT rtree_search: 10000= 41681.3515625 runs/s *RESULT rtree_search: 100000= 3994.408935546875 runs/s [ OK ] RTreePerfTest.Search (8012 ms) BUG=674169 R=danakj@chromium.org,dskiba@chromium.org ========== to ========== cc: Change Nodes collection to use std::list instead of deque. This patch changes the the underlying structure to be std::list. This provides better memory behavior and only minimally impacts performance as demonstrated by the perftests. Before the change: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 425673.71875 runs/s *RESULT rtree_construct: 1000= 49820.3671875 runs/s *RESULT rtree_construct: 10000= 3905.325439453125 runs/s *RESULT rtree_construct: 100000= 369.26568603515625 runs/s [ OK ] RTreePerfTest.Construct (8057 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 3307098.25 runs/s *RESULT rtree_search: 1000= 658108.375 runs/s *RESULT rtree_search: 10000= 42578.42578125 runs/s *RESULT rtree_search: 100000= 3886.0595703125 runs/s [ OK ] RTreePerfTest.Search (8014 ms) After the change: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 390278.84375 runs/s *RESULT rtree_construct: 1000= 49568.08984375 runs/s *RESULT rtree_construct: 10000= 3539.865478515625 runs/s *RESULT rtree_construct: 100000= 306.250732421875 runs/s [ OK ] RTreePerfTest.Construct (8068 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 2986147 runs/s *RESULT rtree_search: 1000= 620176.875 runs/s *RESULT rtree_search: 10000= 41681.3515625 runs/s *RESULT rtree_search: 100000= 3994.408935546875 runs/s [ OK ] RTreePerfTest.Search (8012 ms) BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
Behold, a patch. This does seem to change the performance a little bit, but I think it's worth the memory savings, since it still remains a fast build/search structure.
Looks good! But I wonder what is the perf impact on Android? Can you check please?
I am thoroughly amazed that searching a list can be at all comparable. Confirming on android would be nice. On Wed, Dec 14, 2016 at 3:01 PM, <dskiba@chromium.org> wrote: > Looks good! > > But I wonder what is the perf impact on Android? Can you check please? > > https://codereview.chromium.org/2570393002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2016/12/14 20:14:43, danakj_OOO_and_slow wrote: > I am thoroughly amazed that searching a list can be at all comparable. > Confirming on android would be nice. > > On Wed, Dec 14, 2016 at 3:01 PM, <mailto:dskiba@chromium.org> wrote: > > > Looks good! > > > > But I wonder what is the perf impact on Android? Can you check please? > > > > https://codereview.chromium.org/2570393002/ > > (On N7v2) Before the patch: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 95188.5703125 runs/s *RESULT rtree_construct: 1000= 9984.5458984375 runs/s *RESULT rtree_construct: 10000= 746.3670654296875 runs/s *RESULT rtree_construct: 100000= 64.21125793457031 runs/s [ OK ] RTreePerfTest.Construct (8169 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 356425 runs/s *RESULT rtree_search: 1000= 98105.484375 runs/s *RESULT rtree_search: 10000= 10095.6845703125 runs/s *RESULT rtree_search: 100000= 839.9487915039062 runs/s [ OK ] RTreePerfTest.Search (8068 ms) After the patch: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 65850.984375 runs/s *RESULT rtree_construct: 1000= 6763.86376953125 runs/s *RESULT rtree_construct: 10000= 538.56201171875 runs/s *RESULT rtree_construct: 100000= 37.05949401855469 runs/s [ OK ] RTreePerfTest.Construct (8465 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 356060 runs/s *RESULT rtree_search: 1000= 96958.5 runs/s *RESULT rtree_search: 10000= 10021.33203125 runs/s *RESULT rtree_search: 100000= 815.2977905273438 runs/s [ OK ] RTreePerfTest.Search (8127 ms) There is a notable difference in the build time but not in the search time. I'm not too surprised by the results, since the container of nodes here is only used as lifetime management. The actual structure is traversed using node pointers stored on the nodes. The performance problems come from the fact that a list would allocate an item at a time when doing a push, instead of block allocation that deque would provide. I'm open to suggestions. For the cc usage, the build operation would typically happen during paint/commit (both for display list rtree and the discardable image map), whereas the search would happen repeatedly with different queries. I wouldn't mind landing this if the memory savings are both a win and a priority. If not, then maybe we should consider a different structure. We should be able to change the build algorithm a bit and estimate a decent lower bound on the number of nodes we will need, then resize a vector to that bound... That's probably a bit more involved than s/deque/list/ though.
On Wed, Dec 14, 2016 at 4:02 PM, <vmpstr@chromium.org> wrote: > On 2016/12/14 20:14:43, danakj_OOO_and_slow wrote: > > I am thoroughly amazed that searching a list can be at all comparable. > > Confirming on android would be nice. > > > > On Wed, Dec 14, 2016 at 3:01 PM, <mailto:dskiba@chromium.org> wrote: > > > > > Looks good! > > > > > > But I wonder what is the perf impact on Android? Can you check please? > > > > > > https://codereview.chromium.org/2570393002/ > > > > > (On N7v2) > Before the patch: > [ RUN ] RTreePerfTest.Construct > *RESULT rtree_construct: 100= 95188.5703125 runs/s > *RESULT rtree_construct: 1000= 9984.5458984375 runs/s > *RESULT rtree_construct: 10000= 746.3670654296875 runs/s > *RESULT rtree_construct: 100000= 64.21125793457031 runs/s > [ OK ] RTreePerfTest.Construct (8169 ms) > [ RUN ] RTreePerfTest.Search > *RESULT rtree_search: 100= 356425 runs/s > *RESULT rtree_search: 1000= 98105.484375 runs/s > *RESULT rtree_search: 10000= 10095.6845703125 runs/s > *RESULT rtree_search: 100000= 839.9487915039062 runs/s > [ OK ] RTreePerfTest.Search (8068 ms) > > After the patch: > [ RUN ] RTreePerfTest.Construct > *RESULT rtree_construct: 100= 65850.984375 runs/s > *RESULT rtree_construct: 1000= 6763.86376953125 runs/s > *RESULT rtree_construct: 10000= 538.56201171875 runs/s > *RESULT rtree_construct: 100000= 37.05949401855469 runs/s > [ OK ] RTreePerfTest.Construct (8465 ms) > [ RUN ] RTreePerfTest.Search > *RESULT rtree_search: 100= 356060 runs/s > *RESULT rtree_search: 1000= 96958.5 runs/s > *RESULT rtree_search: 10000= 10021.33203125 runs/s > *RESULT rtree_search: 100000= 815.2977905273438 runs/s > [ OK ] RTreePerfTest.Search (8127 ms) > > There is a notable difference in the build time but not in the search > time. I'm > not too surprised by the results, since the container of nodes here is > only used > as lifetime management. The actual structure is traversed using node > pointers > stored on the nodes. The performance problems come from the fact that a > list > would allocate an item at a time when doing a push, instead of block > allocation > that deque would provide. > Did you consider moving the lifetime management to the node pointers on the nodes? > > I'm open to suggestions. For the cc usage, the build operation would > typically > happen during paint/commit (both for display list rtree and the discardable > image map), whereas the search would happen repeatedly with different > queries. I > wouldn't mind landing this if the memory savings are both a win and a > priority. > > If not, then maybe we should consider a different structure. We should be > able > to change the build algorithm a bit and estimate a decent lower bound on > the > number of nodes we will need, then resize a vector to that bound... That's > probably a bit more involved than s/deque/list/ though. > > https://codereview.chromium.org/2570393002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2016/12/14 21:27:16, danakj_OOO_and_slow wrote: > On Wed, Dec 14, 2016 at 4:02 PM, <mailto:vmpstr@chromium.org> wrote: > > > On 2016/12/14 20:14:43, danakj_OOO_and_slow wrote: > > > I am thoroughly amazed that searching a list can be at all comparable. > > > Confirming on android would be nice. > > > > > > On Wed, Dec 14, 2016 at 3:01 PM, <mailto:dskiba@chromium.org> wrote: > > > > > > > Looks good! > > > > > > > > But I wonder what is the perf impact on Android? Can you check please? > > > > > > > > https://codereview.chromium.org/2570393002/ > > > > > > > > (On N7v2) > > Before the patch: > > [ RUN ] RTreePerfTest.Construct > > *RESULT rtree_construct: 100= 95188.5703125 runs/s > > *RESULT rtree_construct: 1000= 9984.5458984375 runs/s > > *RESULT rtree_construct: 10000= 746.3670654296875 runs/s > > *RESULT rtree_construct: 100000= 64.21125793457031 runs/s > > [ OK ] RTreePerfTest.Construct (8169 ms) > > [ RUN ] RTreePerfTest.Search > > *RESULT rtree_search: 100= 356425 runs/s > > *RESULT rtree_search: 1000= 98105.484375 runs/s > > *RESULT rtree_search: 10000= 10095.6845703125 runs/s > > *RESULT rtree_search: 100000= 839.9487915039062 runs/s > > [ OK ] RTreePerfTest.Search (8068 ms) > > > > After the patch: > > [ RUN ] RTreePerfTest.Construct > > *RESULT rtree_construct: 100= 65850.984375 runs/s > > *RESULT rtree_construct: 1000= 6763.86376953125 runs/s > > *RESULT rtree_construct: 10000= 538.56201171875 runs/s > > *RESULT rtree_construct: 100000= 37.05949401855469 runs/s > > [ OK ] RTreePerfTest.Construct (8465 ms) > > [ RUN ] RTreePerfTest.Search > > *RESULT rtree_search: 100= 356060 runs/s > > *RESULT rtree_search: 1000= 96958.5 runs/s > > *RESULT rtree_search: 10000= 10021.33203125 runs/s > > *RESULT rtree_search: 100000= 815.2977905273438 runs/s > > [ OK ] RTreePerfTest.Search (8127 ms) > > > > There is a notable difference in the build time but not in the search > > time. I'm > > not too surprised by the results, since the container of nodes here is > > only used > > as lifetime management. The actual structure is traversed using node > > pointers > > stored on the nodes. The performance problems come from the fact that a > > list > > would allocate an item at a time when doing a push, instead of block > > allocation > > that deque would provide. > > > > Did you consider moving the lifetime management to the node pointers on the > nodes? I just wrote up a patch to do this and it looks like the performance is even worse than using a list: (from z620) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 421009.375 runs/s *RESULT rtree_construct: 1000= 46561.2734375 runs/s *RESULT rtree_construct: 10000= 3284.48095703125 runs/s *RESULT rtree_construct: 100000= 369.307373046875 runs/s [ OK ] RTreePerfTest.Construct (8029 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 3999955 runs/s *RESULT rtree_search: 1000= 819258.375 runs/s *RESULT rtree_search: 10000= 52963.2265625 runs/s *RESULT rtree_search: 100000= 5261.9189453125 runs/s [ OK ] RTreePerfTest.Search (8012 ms) after: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 228269.546875 runs/s *RESULT rtree_construct: 1000= 26848.1484375 runs/s *RESULT rtree_construct: 10000= 2003.7196044921875 runs/s *RESULT rtree_construct: 100000= 195.78868103027344 runs/s [ OK ] RTreePerfTest.Construct (8087 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 3261191.75 runs/s *RESULT rtree_search: 1000= 606021.6875 runs/s *RESULT rtree_search: 10000= 40588.9453125 runs/s *RESULT rtree_search: 100000= 4410.404296875 runs/s [ OK ] RTreePerfTest.Search (8027 ms) (the patch essentially replaces a raw ptr with a unique_ptr in the node and creates a new unique ptr anywhere we would've allocated a node). > > > > > > I'm open to suggestions. For the cc usage, the build operation would > > typically > > happen during paint/commit (both for display list rtree and the discardable > > image map), whereas the search would happen repeatedly with different > > queries. I > > wouldn't mind landing this if the memory savings are both a win and a > > priority. > > > > If not, then maybe we should consider a different structure. We should be > > able > > to change the build algorithm a bit and estimate a decent lower bound on > > the > > number of nodes we will need, then resize a vector to that bound... That's > > probably a bit more involved than s/deque/list/ though. > > > > https://codereview.chromium.org/2570393002/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
Interesting, is it that list's allocator ends up allocating all the nodes together in memory since thats the only allocations taking place at the time? But us calling new does not do the same? On Wed, Dec 14, 2016 at 2:03 PM, <vmpstr@chromium.org> wrote: > On 2016/12/14 21:27:16, danakj_OOO_and_slow wrote: > > > On Wed, Dec 14, 2016 at 4:02 PM, <mailto:vmpstr@chromium.org> wrote: > > > > > On 2016/12/14 20:14:43, danakj_OOO_and_slow wrote: > > > > I am thoroughly amazed that searching a list can be at all > comparable. > > > > Confirming on android would be nice. > > > > > > > > On Wed, Dec 14, 2016 at 3:01 PM, <mailto:dskiba@chromium.org> wrote: > > > > > > > > > Looks good! > > > > > > > > > > But I wonder what is the perf impact on Android? Can you check > please? > > > > > > > > > > https://codereview.chromium.org/2570393002/ > > > > > > > > > > > (On N7v2) > > > Before the patch: > > > [ RUN ] RTreePerfTest.Construct > > > *RESULT rtree_construct: 100= 95188.5703125 runs/s > > > *RESULT rtree_construct: 1000= 9984.5458984375 runs/s > > > *RESULT rtree_construct: 10000= 746.3670654296875 runs/s > > > *RESULT rtree_construct: 100000= 64.21125793457031 runs/s > > > [ OK ] RTreePerfTest.Construct (8169 ms) > > > [ RUN ] RTreePerfTest.Search > > > *RESULT rtree_search: 100= 356425 runs/s > > > *RESULT rtree_search: 1000= 98105.484375 runs/s > > > *RESULT rtree_search: 10000= 10095.6845703125 runs/s > > > *RESULT rtree_search: 100000= 839.9487915039062 runs/s > > > [ OK ] RTreePerfTest.Search (8068 ms) > > > > > > After the patch: > > > [ RUN ] RTreePerfTest.Construct > > > *RESULT rtree_construct: 100= 65850.984375 runs/s > > > *RESULT rtree_construct: 1000= 6763.86376953125 runs/s > > > *RESULT rtree_construct: 10000= 538.56201171875 runs/s > > > *RESULT rtree_construct: 100000= 37.05949401855469 runs/s > > > [ OK ] RTreePerfTest.Construct (8465 ms) > > > [ RUN ] RTreePerfTest.Search > > > *RESULT rtree_search: 100= 356060 runs/s > > > *RESULT rtree_search: 1000= 96958.5 runs/s > > > *RESULT rtree_search: 10000= 10021.33203125 runs/s > > > *RESULT rtree_search: 100000= 815.2977905273438 runs/s > > > [ OK ] RTreePerfTest.Search (8127 ms) > > > > > > There is a notable difference in the build time but not in the search > > > time. I'm > > > not too surprised by the results, since the container of nodes here is > > > only used > > > as lifetime management. The actual structure is traversed using node > > > pointers > > > stored on the nodes. The performance problems come from the fact that a > > > list > > > would allocate an item at a time when doing a push, instead of block > > > allocation > > > that deque would provide. > > > > > > > Did you consider moving the lifetime management to the node pointers on > the > > nodes? > > I just wrote up a patch to do this and it looks like the performance is > even > worse than using a list: > > (from z620) > Before: > [ RUN ] RTreePerfTest.Construct > *RESULT rtree_construct: 100= 421009.375 runs/s > *RESULT rtree_construct: 1000= 46561.2734375 runs/s > *RESULT rtree_construct: 10000= 3284.48095703125 runs/s > *RESULT rtree_construct: 100000= 369.307373046875 runs/s > [ OK ] RTreePerfTest.Construct (8029 ms) > [ RUN ] RTreePerfTest.Search > *RESULT rtree_search: 100= 3999955 runs/s > *RESULT rtree_search: 1000= 819258.375 runs/s > *RESULT rtree_search: 10000= 52963.2265625 runs/s > *RESULT rtree_search: 100000= 5261.9189453125 <(918)%20945-3125> runs/s > [ OK ] RTreePerfTest.Search (8012 ms) > > > after: > [ RUN ] RTreePerfTest.Construct > *RESULT rtree_construct: 100= 228269.546875 runs/s > *RESULT rtree_construct: 1000= 26848.1484375 runs/s > *RESULT rtree_construct: 10000= 2003.7196044921875 runs/s > *RESULT rtree_construct: 100000= 195.78868103027344 runs/s > [ OK ] RTreePerfTest.Construct (8087 ms) > [ RUN ] RTreePerfTest.Search > *RESULT rtree_search: 100= 3261191.75 runs/s > *RESULT rtree_search: 1000= 606021.6875 runs/s > *RESULT rtree_search: 10000= 40588.9453125 runs/s > *RESULT rtree_search: 100000= 4410.404296875 runs/s > [ OK ] RTreePerfTest.Search (8027 ms) > > (the patch essentially replaces a raw ptr with a unique_ptr in the node and > creates a new unique ptr anywhere we would've allocated a node). > > > > > > > > > > > I'm open to suggestions. For the cc usage, the build operation would > > > typically > > > happen during paint/commit (both for display list rtree and the > discardable > > > image map), whereas the search would happen repeatedly with different > > > queries. I > > > wouldn't mind landing this if the memory savings are both a win and a > > > priority. > > > > > > If not, then maybe we should consider a different structure. We should > be > > > able > > > to change the build algorithm a bit and estimate a decent lower bound > on > > > the > > > number of nodes we will need, then resize a vector to that bound... > That's > > > probably a bit more involved than s/deque/list/ though. > > > > > > https://codereview.chromium.org/2570393002/ > > > > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Chromium-reviews" group. > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > https://codereview.chromium.org/2570393002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2016/12/15 15:53:31, danakj_OOO_and_slow wrote: > Interesting, is it that list's allocator ends up allocating all the nodes > together in memory since thats the only allocations taking place at the > time? But us calling new does not do the same? > It's possible, but I would be surprised if that was the case. I don't think MakeUnique or whatever list does to allocate the node would differ significantly in the location of memory. Maybe list prefetches stuff, idk. I also had to remove the union from Node since it's now a std::unique_ptr. Maybe that's a contributing factor. Also, there's an extra indirection in accessing the contents (unique_ptr vs raw ptr), which could show up on these perftests.
Out of all permutations of "optimizing union" and using either list or contiguous container, the current CL produces the best results on android: Aefore: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22% ???) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms)
LGTM, update CL description pls https://codereview.chromium.org/2570393002/diff/20001/cc/base/rtree.cc File cc/base/rtree.cc (right): https://codereview.chromium.org/2570393002/diff/20001/cc/base/rtree.cc#newcode85 cc/base/rtree.cc:85: auto& bounds = (*branches)[current_branch].bounds; leave a comment why you're writing union out
Description was changed from ========== cc: Change Nodes collection to use std::list instead of deque. This patch changes the the underlying structure to be std::list. This provides better memory behavior and only minimally impacts performance as demonstrated by the perftests. Before the change: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 425673.71875 runs/s *RESULT rtree_construct: 1000= 49820.3671875 runs/s *RESULT rtree_construct: 10000= 3905.325439453125 runs/s *RESULT rtree_construct: 100000= 369.26568603515625 runs/s [ OK ] RTreePerfTest.Construct (8057 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 3307098.25 runs/s *RESULT rtree_search: 1000= 658108.375 runs/s *RESULT rtree_search: 10000= 42578.42578125 runs/s *RESULT rtree_search: 100000= 3886.0595703125 runs/s [ OK ] RTreePerfTest.Search (8014 ms) After the change: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 390278.84375 runs/s *RESULT rtree_construct: 1000= 49568.08984375 runs/s *RESULT rtree_construct: 10000= 3539.865478515625 runs/s *RESULT rtree_construct: 100000= 306.250732421875 runs/s [ OK ] RTreePerfTest.Construct (8068 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 2986147 runs/s *RESULT rtree_search: 1000= 620176.875 runs/s *RESULT rtree_search: 10000= 41681.3515625 runs/s *RESULT rtree_search: 100000= 3994.408935546875 runs/s [ OK ] RTreePerfTest.Search (8012 ms) BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== cc: Change Nodes collection to use std::list instead of deque. This patch changes the the underlying structure to be base::ContiguousContainer. This provides better memory behavior and impacts performance as demonstrated by the results below. This is an acceptable change in performance with the benefit of memory savings. (N7v2) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms) Additionally this changes the union operation to be faster than gfx::Rect::Union, since there are some assumptions that we can make. Also, Union shows up as the hottest function on Linux profiles. BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
Description was changed from ========== cc: Change Nodes collection to use std::list instead of deque. This patch changes the the underlying structure to be base::ContiguousContainer. This provides better memory behavior and impacts performance as demonstrated by the results below. This is an acceptable change in performance with the benefit of memory savings. (N7v2) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms) Additionally this changes the union operation to be faster than gfx::Rect::Union, since there are some assumptions that we can make. Also, Union shows up as the hottest function on Linux profiles. BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== cc: Change Nodes collection to use base::ContiguousContainer instead of deque. This patch changes the the underlying structure to be base::ContiguousContainer. This provides better memory behavior and impacts performance as demonstrated by the results below. This is an acceptable change in performance with the benefit of memory savings. (N7v2) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms) Additionally this changes the union operation to be faster than gfx::Rect::Union, since there are some assumptions that we can make. Also, Union shows up as the hottest function on Linux profiles. BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
https://codereview.chromium.org/2570393002/diff/20001/cc/base/rtree.cc File cc/base/rtree.cc (right): https://codereview.chromium.org/2570393002/diff/20001/cc/base/rtree.cc#newcode85 cc/base/rtree.cc:85: auto& bounds = (*branches)[current_branch].bounds; On 2016/12/16 14:36:22, danakj_OOO_and_slow wrote: > leave a comment why you're writing union out Done.
The CQ bit was checked by vmpstr@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from danakj@chromium.org Link to the patchset: https://codereview.chromium.org/2570393002/#ps40001 (title: "update")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
The CQ bit was checked by vmpstr@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from danakj@chromium.org Link to the patchset: https://codereview.chromium.org/2570393002/#ps60001 (title: "win compile fix")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by vmpstr@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 60001, "attempt_start_ts": 1481930055651470, "parent_rev": "ecdaaa67119095a9db3899c678cce6980cbeb06a", "commit_rev": "e44c1b4bb52d8406c4b4fa5025f1f6786684bfe5"}
Message was sent while issue was closed.
Description was changed from ========== cc: Change Nodes collection to use base::ContiguousContainer instead of deque. This patch changes the the underlying structure to be base::ContiguousContainer. This provides better memory behavior and impacts performance as demonstrated by the results below. This is an acceptable change in performance with the benefit of memory savings. (N7v2) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms) Additionally this changes the union operation to be faster than gfx::Rect::Union, since there are some assumptions that we can make. Also, Union shows up as the hottest function on Linux profiles. BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== cc: Change Nodes collection to use base::ContiguousContainer instead of deque. This patch changes the the underlying structure to be base::ContiguousContainer. This provides better memory behavior and impacts performance as demonstrated by the results below. This is an acceptable change in performance with the benefit of memory savings. (N7v2) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms) Additionally this changes the union operation to be faster than gfx::Rect::Union, since there are some assumptions that we can make. Also, Union shows up as the hottest function on Linux profiles. BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel Review-Url: https://codereview.chromium.org/2570393002 ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Description was changed from ========== cc: Change Nodes collection to use base::ContiguousContainer instead of deque. This patch changes the the underlying structure to be base::ContiguousContainer. This provides better memory behavior and impacts performance as demonstrated by the results below. This is an acceptable change in performance with the benefit of memory savings. (N7v2) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms) Additionally this changes the union operation to be faster than gfx::Rect::Union, since there are some assumptions that we can make. Also, Union shows up as the hottest function on Linux profiles. BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel Review-Url: https://codereview.chromium.org/2570393002 ========== to ========== cc: Change Nodes collection to use base::ContiguousContainer instead of deque. This patch changes the the underlying structure to be base::ContiguousContainer. This provides better memory behavior and impacts performance as demonstrated by the results below. This is an acceptable change in performance with the benefit of memory savings. (N7v2) Before: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 91842.1953125 runs/s *RESULT rtree_construct: 1000= 9855.638671875 runs/s *RESULT rtree_construct: 10000= 747.183837890625 runs/s *RESULT rtree_construct: 100000= 63.701416015625 runs/s [ OK ] RTreePerfTest.Construct (8245 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 363610 runs/s *RESULT rtree_search: 1000= 97453.4921875 runs/s *RESULT rtree_search: 10000= 13022.0244140625 runs/s *RESULT rtree_search: 100000= 848.8214111328125 runs/s [ OK ] RTreePerfTest.Search (8093 ms) After: [ RUN ] RTreePerfTest.Construct *RESULT rtree_construct: 100= 74468.8828125 runs/s (-19%) *RESULT rtree_construct: 1000= 12053.529296875 runs/s (+22%) *RESULT rtree_construct: 10000= 716.7188720703125 runs/s (-4%) *RESULT rtree_construct: 100000= 54.01991653442383 runs/s (-16%) [ OK ] RTreePerfTest.Construct (8182 ms) [ RUN ] RTreePerfTest.Search *RESULT rtree_search: 100= 351755 runs/s *RESULT rtree_search: 1000= 96155 runs/s *RESULT rtree_search: 10000= 10232.65625 runs/s *RESULT rtree_search: 100000= 863.5242309570312 runs/s [ OK ] RTreePerfTest.Search (8176 ms) Additionally this changes the union operation to be faster than gfx::Rect::Union, since there are some assumptions that we can make. Also, Union shows up as the hottest function on Linux profiles. BUG=674169 R=danakj@chromium.org,dskiba@chromium.org CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel Committed: https://crrev.com/ba09803911f3a0dc8d8e3b87f3fb8529baf4888d Cr-Commit-Position: refs/heads/master@{#439258} ==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/ba09803911f3a0dc8d8e3b87f3fb8529baf4888d Cr-Commit-Position: refs/heads/master@{#439258} |