|
|
Created:
7 years, 2 months ago by danakj Modified:
7 years, 1 month ago CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, cc-bugs_chromium.org, erikwright+watch_chromium.org, jam, rvargas (doing something else) Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionIPC pickling optimization for render passes.
Call Pickle::Reserve() for the size of the data in the shared quad
state and quad lists. This prevents the WriteFoo() invocations for
all of the quad/shared quad states from causing memory re-allocations
and moves.
This is based on https://codereview.chromium.org/34413002/ from piman@.
This is also based after https://codereview.chromium.org/30593005/ and
perf numbers (both before and after) include that CL also.
content_perftest results on linux chromeos release official build:
BEFORE
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_4000= 50 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_100000= 1888 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_4000_4000= 728 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_100000_100000= 23771 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyRenderPasses_10000_100= 24118 us
AFTER
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_4000= 48 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_100000= 1626 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_4000_4000= 460 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_100000_100000= 14771 us
*RESULT mean_frame_serialization_time: DelegatedFrame_ManyRenderPasses_10000_100= 15626 us
This gives a further ~1.5x improvement in serialization time for shared quad
states and render passes.
R=jar@chromium.org, piman@chromium.org, tsepez@chromium.org, piman
BUG=307480
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=231656
Patch Set 1 : ccmessagesperf-reserve: #Patch Set 2 : ccmessagesperf-reserve: one reserve #Patch Set 3 : ccmessagesperf-reserve: noextraquadwalk #Patch Set 4 : ccmessagesperf-reserve: add algorithm header #
Total comments: 10
Patch Set 5 : ccmessagesperf-reserve: rename #Patch Set 6 : ccmessagesperf-reserve: move ReserveSize out of header #Patch Set 7 : ccmessagesperf-reserve: undo-returnvalues #Patch Set 8 : ccmessagesperf-reserve: fixbuild #
Total comments: 4
Patch Set 9 : ccmessagesperf-reserve: nits #
Messages
Total messages: 41 (0 generated)
I originally wrote this in the description: BEFORE *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_4000= 50 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_100000= 1888 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_4000_4000= 728 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_100000_100000= 23771 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyRenderPasses_10000_100= 24118 us AFTER *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_4000= 58 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_100000= 1685 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_4000_4000= 503 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_100000_100000= 15362 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyRenderPasses_10000_100= 18014 us However patch set 2 does better by doing one reserve for the entire DelegatedFrameData: AFTER *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_4000= 48 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_1_100000= 1626 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_4000_4000= 460 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyQuads_100000_100000= 14771 us *RESULT mean_frame_serialization_time: DelegatedFrame_ManyRenderPasses_10000_100= 15626 us
https://codereview.chromium.org/35893002/diff/70002/content/common/cc_message... File content/common/cc_messages.cc (right): https://codereview.chromium.org/35893002/diff/70002/content/common/cc_message... content/common/cc_messages.cc:388: to_reserve += sizeof(cc::CheckerboardDrawQuad); note: a bunch of the objects serialize bigger than the class. Each serialized field is effectively aligned on 32 bits when serialized, so we lose a lot of packing (e.g. consecutive bools). Given that, I'm not sure it's meaningful to spend time being tight on the allocation - it's short lived, and it's still less than a few MB in the worst case, but if we miss by just one byte, we'll have to pay a full realloc.
On 2013/10/22 20:22:26, piman wrote: > https://codereview.chromium.org/35893002/diff/70002/content/common/cc_message... > File content/common/cc_messages.cc (right): > > https://codereview.chromium.org/35893002/diff/70002/content/common/cc_message... > content/common/cc_messages.cc:388: to_reserve += > sizeof(cc::CheckerboardDrawQuad); > note: a bunch of the objects serialize bigger than the class. Each serialized > field is effectively aligned on 32 bits when serialized, so we lose a lot of > packing (e.g. consecutive bools). > > Given that, I'm not sure it's meaningful to spend time being tight on the > allocation - it's short lived, and it's still less than a few MB in the worst > case, but if we miss by just one byte, we'll have to pay a full realloc. Fair enough, went back to patchset 2, which just uses sizeof(cc::RenderPass). Think that's reasonable?
On 2013/10/22 20:37:27, danakj wrote: > On 2013/10/22 20:22:26, piman wrote: > > > https://codereview.chromium.org/35893002/diff/70002/content/common/cc_message... > > File content/common/cc_messages.cc (right): > > > > > https://codereview.chromium.org/35893002/diff/70002/content/common/cc_message... > > content/common/cc_messages.cc:388: to_reserve += > > sizeof(cc::CheckerboardDrawQuad); > > note: a bunch of the objects serialize bigger than the class. Each serialized > > field is effectively aligned on 32 bits when serialized, so we lose a lot of > > packing (e.g. consecutive bools). > > > > Given that, I'm not sure it's meaningful to spend time being tight on the > > allocation - it's short lived, and it's still less than a few MB in the worst > > case, but if we miss by just one byte, we'll have to pay a full realloc. > > Fair enough, went back to patchset 2, which just uses sizeof(cc::RenderPass). > Think that's reasonable? Oh, sorry, I should have been explicit. The main thing I was thinking about is avoiding the switch on the quad type, and instead take the largest quad type, especially since it's likely some of these quad types will serialize larger than their C++ size, we can use some extra headroom. LGTM whichever way you think is best.
On 2013/10/22 23:54:38, piman wrote: > Oh, sorry, I should have been explicit. The main thing I was thinking about is > avoiding the switch on the quad type, and instead take the largest quad type, > especially since it's likely some of these quad types will serialize larger than > their C++ size, we can use some extra headroom. ...but if the largest quad type is common, and also serializes larger than its C++ size, aren't we setting ourselves up for lots of reallocations anyway?
So, my thought is that I don't like hardcoding the "largest" quad cuz if that changes one day, are we going to update this code? We'll forget I am sure. If the serialization is a bit bigger for each quad, it won't be more than 2x the size of the quad itself. So in the end no matter what we should only get 1 extra reallocation, since it always doubles. It would be nice to not realloc one extra time, but to do that I think we'd need to pack the structs on 32bit boundaries as well to make them the same size.. I don't have any other cool ideas. (And we could serialize bools as bitmasks.)
On Wed, Oct 23, 2013 at 8:19 AM, <danakj@chromium.org> wrote: > So, my thought is that I don't like hardcoding the "largest" quad cuz if > that > changes one day, are we going to update this code? We'll forget I am sure. > Maybe we can add a test for that :) > If the serialization is a bit bigger for each quad, it won't be more than > 2x the > size of the quad itself. So in the end no matter what we should only get 1 > extra > reallocation, since it always doubles. > > It would be nice to not realloc one extra time, but to do that I think > we'd need > to pack the structs on 32bit boundaries as well to make them the same > size.. I > don't have any other cool ideas. (And we could serialize bools as > bitmasks.) > > https://codereview.chromium.**org/35893002/<https://codereview.chromium.org/3... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2013/10/23 23:52:49, piman wrote: > On Wed, Oct 23, 2013 at 8:19 AM, <mailto:danakj@chromium.org> wrote: > > > So, my thought is that I don't like hardcoding the "largest" quad cuz if > > that > > changes one day, are we going to update this code? We'll forget I am sure. > > > > Maybe we can add a test for that :) Also, I measured the cost of walking the quad list and doing the switch to be about 10% on the benchmark. > > > > If the serialization is a bit bigger for each quad, it won't be more than > > 2x the > > size of the quad itself. So in the end no matter what we should only get 1 > > extra > > reallocation, since it always doubles. > > > > It would be nice to not realloc one extra time, but to do that I think > > we'd need > > to pack the structs on 32bit boundaries as well to make them the same > > size.. I > > don't have any other cool ideas. (And we could serialize bools as > > bitmasks.) > > > > > https://codereview.chromium.**org/35893002/%3Chttps://codereview.chromium.org...> > > > > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
On Thu, Oct 24, 2013 at 2:18 AM, <piman@chromium.org> wrote: > On 2013/10/23 23:52:49, piman wrote: > > On Wed, Oct 23, 2013 at 8:19 AM, <mailto:danakj@chromium.org> wrote: >> > > > So, my thought is that I don't like hardcoding the "largest" quad cuz if >> > that >> > changes one day, are we going to update this code? We'll forget I am >> sure. >> > >> > > Maybe we can add a test for that :) >> > > Also, I measured the cost of walking the quad list and doing the switch to > be > about 10% on the benchmark. Oh jeez, really! I hadn't tried/seen that. Thanks. A test sounds like a nice idea. > > > > > If the serialization is a bit bigger for each quad, it won't be more >> than >> > 2x the >> > size of the quad itself. So in the end no matter what we should only >> get 1 >> > extra >> > reallocation, since it always doubles. >> > >> > It would be nice to not realloc one extra time, but to do that I think >> > we'd need >> > to pack the structs on 32bit boundaries as well to make them the same >> > size.. I >> > don't have any other cool ideas. (And we could serialize bools as >> > bitmasks.) >> > >> > >> > > https://codereview.chromium.****org/35893002/%3Chttps://codere** > view.chromium.org/35893002/ <http://codereview.chromium.org/35893002/>> > >> > >> > > To unsubscribe from this group and stop receiving emails from it, send an >> > email > >> to mailto:chromium-reviews+**unsubscribe@chromium.org<chromium-reviews%2Bunsubscribe@chromium.org> >> . >> > > > > https://codereview.chromium.**org/35893002/<https://codereview.chromium.org/3... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2013/10/24 14:41:07, danakj wrote: > On Thu, Oct 24, 2013 at 2:18 AM, <mailto:piman@chromium.org> wrote: > > > On 2013/10/23 23:52:49, piman wrote: > > > > On Wed, Oct 23, 2013 at 8:19 AM, <mailto:danakj@chromium.org> wrote: > >> > > > > > So, my thought is that I don't like hardcoding the "largest" quad cuz if > >> > that > >> > changes one day, are we going to update this code? We'll forget I am > >> sure. > >> > > >> > > > > Maybe we can add a test for that :) > >> > > > > Also, I measured the cost of walking the quad list and doing the switch to > > be > > about 10% on the benchmark. > > > Oh jeez, really! I hadn't tried/seen that. Thanks. Hm I'm still not seeing this. Any change from doing the walk or not is completely imperceptible to me.
+jar for base/ OWNERS
+tsepez for cc_messages.h (no IPC message change, just reserving memory ahead of time)
On 2013/10/24 15:03:47, danakj wrote: > > > Also, I measured the cost of walking the quad list and doing the switch to > > > be > > > about 10% on the benchmark. > > > > > > Oh jeez, really! I hadn't tried/seen that. Thanks. > > Hm I'm still not seeing this. Any change from doing the walk or not is > completely imperceptible to me. Oh, I was looking at the tests dominated by SharedQuadStates, I see what you say on the DelegatedFrame_ManyQuads_1_100000 test. Changed it to not do the walk and added a unit test.
https://codereview.chromium.org/35893002/diff/460002/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/35893002/diff/460002/base/pickle.cc#newcode312 base/pickle.cc:312: Resize(capacity_ * 2 + needed_size); Why 2x? Also, this should be a bool return value, see below. https://codereview.chromium.org/35893002/diff/460002/base/pickle.cc#newcode340 base/pickle.cc:340: CHECK_NE(capacity_, kCapacityReadOnly); There's a minor problem here in that if realloc fails, the old memory block is not freed. So you'd leak here if you don't check the return codes as was done previously. https://codereview.chromium.org/35893002/diff/460002/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/35893002/diff/460002/base/pickle.h#newcode279 base/pickle.h:279: void Reserve(size_t size); Is |size| the additional amount of space to reserve, or the total space to reserve? I'd name the parameter |additional_capacity| or |new_capacity| accordingly to match Resize() below. https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... File content/common/cc_messages.h (right): https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... content/common/cc_messages.h:80: static size_t ReserveSizeForWrite(const param_type& p); I'm not sure making this a method of this particular specialization of the templated paramtraits class is the right thing to do since none of the other specializations will implement this. It doesn't need to be called oustide of the write() method, so I'd suggest using a static function or a function in an anonymous namespace in the .cc file. https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... File content/common/cc_messages_unittest.cc (right): https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... content/common/cc_messages_unittest.cc:606: EXPECT_EQ(sizeof(RenderPassDrawQuad), largest); Create a union consisting of all these types, and use sizeof() on it to get the biggest size.
https://codereview.chromium.org/35893002/diff/460002/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/35893002/diff/460002/base/pickle.cc#newcode312 base/pickle.cc:312: Resize(capacity_ * 2 + needed_size); On 2013/10/24 17:48:06, Tom Sepez wrote: > Why 2x? Also, this should be a bool return value, see below. Same reason as in BeginWrite, in order to amortize the cost of expanding the size of the array. https://codereview.chromium.org/35893002/diff/460002/base/pickle.cc#newcode340 base/pickle.cc:340: CHECK_NE(capacity_, kCapacityReadOnly); On 2013/10/24 17:48:06, Tom Sepez wrote: > There's a minor problem here in that if realloc fails, the old memory block is > not freed. So you'd leak here if you don't check the return codes as was done > previously. if realloc fails, we will crash inside tcmalloc no? https://codereview.chromium.org/35893002/diff/460002/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/35893002/diff/460002/base/pickle.h#newcode279 base/pickle.h:279: void Reserve(size_t size); On 2013/10/24 17:48:06, Tom Sepez wrote: > Is |size| the additional amount of space to reserve, or the total space to > reserve? I'd name the parameter |additional_capacity| or |new_capacity| > accordingly to match Resize() below. Additional space, I will rename it to additional_capacity. https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... File content/common/cc_messages.h (right): https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... content/common/cc_messages.h:80: static size_t ReserveSizeForWrite(const param_type& p); On 2013/10/24 17:48:06, Tom Sepez wrote: > I'm not sure making this a method of this particular specialization of the > templated paramtraits class is the right thing to do since none of the other > specializations will implement this. It doesn't need to be called oustide of the > write() method, so I'd suggest using a static function or a function in an > anonymous namespace in the .cc file. Alright. I wanted to keep it close to the Write() method so that people would more obviously see to update both together.. https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... File content/common/cc_messages_unittest.cc (right): https://codereview.chromium.org/35893002/diff/460002/content/common/cc_messag... content/common/cc_messages_unittest.cc:606: EXPECT_EQ(sizeof(RenderPassDrawQuad), largest); On 2013/10/24 17:48:06, Tom Sepez wrote: > Create a union consisting of all these types, and use sizeof() on it to get the > biggest size. That does not create a compiler error, however, if a new type is added, which was my goal here.
> if realloc fails, we will crash inside tcmalloc no? > Ideally, we want this to run against a bunch of different allocators, not just TCMalloc, since it could be replaced at any time. It's a minor point however, but I don't see a reason to change this -- its orthogonal to the purpose of the CL. > That does not create a compiler error, however, if a new type is added, which > was my goal here. Ok, given that it's only test code.
On Thu, Oct 24, 2013 at 2:00 PM, <tsepez@chromium.org> wrote: > > if realloc fails, we will crash inside tcmalloc no? >> > > Ideally, we want this to run against a bunch of different allocators, not > just > TCMalloc, > since it could be replaced at any time. It's a minor point however, but I > don't > see a reason to change this -- its orthogonal to the purpose of the CL. I can make it a separate CL, where we can measure the perf impact more objectively then. > > > That does not create a compiler error, however, if a new type is added, >> which >> was my goal here. >> > Ok, given that it's only test code. > > > > https://codereview.chromium.**org/35893002/<https://codereview.chromium.org/3... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
LGTM, but you still need an owner for base/
Thanks. jar@ please have a look at base/
On Thu, Oct 24, 2013 at 11:00 AM, <tsepez@chromium.org> wrote: > > if realloc fails, we will crash inside tcmalloc no? >> > > Ideally, we want this to run against a bunch of different allocators, not > just > TCMalloc, > since it could be replaced at any time. It's a minor point however, but I > don't > see a reason to change this -- its orthogonal to the purpose of the CL. In my testing, allowing the Write* to fail had a fairly high performance impact. Most places don't test it at all, so I don't think we can legitimately say we could change the allocator and allowing it to fail. > > > That does not create a compiler error, however, if a new type is added, >> which >> was my goal here. >> > Ok, given that it's only test code. > > > > https://codereview.chromium.**org/35893002/<https://codereview.chromium.org/3... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Thu, Oct 24, 2013 at 3:40 PM, Antoine Labour <piman@chromium.org> wrote: > > > > On Thu, Oct 24, 2013 at 11:00 AM, <tsepez@chromium.org> wrote: > >> >> if realloc fails, we will crash inside tcmalloc no? >>> >> >> Ideally, we want this to run against a bunch of different allocators, not >> just >> TCMalloc, >> since it could be replaced at any time. It's a minor point however, but >> I don't >> see a reason to change this -- its orthogonal to the purpose of the CL. > > > In my testing, allowing the Write* to fail had a fairly high performance > impact. > Most places don't test it at all, so I don't think we can legitimately say > we could change the allocator and allowing it to fail. > I'll put up another CL with it and some numbers to justify it alone. > >> >> That does not create a compiler error, however, if a new type is added, >>> which >>> was my goal here. >>> >> Ok, given that it's only test code. >> >> >> >> https://codereview.chromium.**org/35893002/<https://codereview.chromium.org/3... >> > > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Reading these CLs, I'm beginning to wonder about whether there is a better way of passing this data structure around, then the pickling of a bazillion tiny items. I'm assuming it is really hot in the profiler... and wondering if there is a shared memory solution that would work better (much faster overall)? https://codereview.chromium.org/35893002/diff/700001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/35893002/diff/700001/base/pickle.cc#newcode308 base/pickle.cc:308: // write at a uint32-aligned offset from the beginning of the header nit: Start with Upper case, end with a period. (may as well fix line 318 too). https://codereview.chromium.org/35893002/diff/700001/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(capacity_ * 2 + needed_size); Why did you double the existing capacity? Also... Are you at all concerned about overflows, especially since this is called as a public function, with presumably less care and scrutiny than other callers of Resize()? FWIW: Realloc already plays such games (exponential growth, by a factor of 2 as I recall) when increasing sizes. As a result, the number of calls to malloc (or certainly TCMalloc) during these Resize() operations is limited to about log2(final size). Was this really a perf issue?? TCMalloc is fast... and log2 of most things are small. It strikes me that perchance the benchmarks are run with a very cold TCMalloc cache, which needed to do page allocations afresh, and hence you're measuring TCMalloc initialization time. When you "skip" those intermediate allocations, you avoid having to warm up the cache (via OS calls). Can you verify that re-running the tests more than once still shows a the same perf numbers? In a live Chromium app, the cache is generally quite well warmed (sadly, having allocated lots of memory).
I don't think we need to avoid using the ipc pickle for these things if we can make it fast enough. Doing some shared memory thing should be a last resort IMO as it's a lot more work+code+surface area for bugs. I'd much prefer to make pickling fast enough to work for this use case. Doing a Reserve() up front is minimal and allows us to make use of all the normal pickle methods that verify each element of the message as usual and discard invalid messages. This it the value of using pickle, I feel. https://codereview.chromium.org/35893002/diff/700001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/35893002/diff/700001/base/pickle.cc#newcode308 base/pickle.cc:308: // write at a uint32-aligned offset from the beginning of the header On 2013/10/25 01:54:18, jar wrote: > nit: Start with Upper case, end with a period. (may as well fix line 318 too). Done. https://codereview.chromium.org/35893002/diff/700001/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(capacity_ * 2 + needed_size); On 2013/10/25 01:54:18, jar wrote: > Why did you double the existing capacity? For the same reason we double it in BeginWrite(). Pickling N equally sized things via reserve should not cause N reallocs to take place. This acts like BeginWrite but is a hint that there are more writes coming after it that will take up at least this much space. It does not try to be any more special than that. > Also... > > Are you at all concerned about overflows, especially since this is called as a > public function, with presumably less care and scrutiny than other callers of > Resize()? I'm no more concerned than we are in BeginWrite, and I'm not sure why I would want to be? If we don't Reserve() then we're going to just call WriteBytes()->BeginWrite() this many times anyways. I don't see the difference here, am I overlooking something? > FWIW: Realloc already plays such games (exponential growth, by a factor of 2 as > I recall) when increasing sizes. As a result, the number of calls to malloc (or > certainly TCMalloc) during these Resize() operations is limited to about > log2(final size). Right, but if we write 4000 objects with 10 fields each (and previously 2-16 elements in each field), we end up with a lot of calls to realloc anyways. Doing it once is far superior to log_2(n) times, which was I guess around 14 times. > Was this really a perf issue?? TCMalloc is fast... and log2 of most things are > small. Yeh, you can see the bug for more information if you like. We were spending 3ms just pickling a frame of data for the spaceport benchmark. This is a pathalogical case, but one we care about being fast for. > It strikes me that perchance the benchmarks are run with a very cold TCMalloc > cache, which needed to do page allocations afresh, and hence you're measuring > TCMalloc initialization time. > > When you "skip" those intermediate allocations, you avoid having to warm up the > cache (via OS calls). > > Can you verify that re-running the tests more than once still shows a the same > perf numbers? In a live Chromium app, the cache is generally quite well warmed > (sadly, having allocated lots of memory). The perf tests are here: https://code.google.com/p/chromium/codesearch#chromium/src/content/common/cc_... We write the ipc 10 times as a warm up, then write it 100 times and take the mean of those 100 times as the result. Is that what you are thinking of?
drive-by: I'd expect that its the repeated memcpy's as part of the reallocs that cost the most. Also, looking at the smallest possible time in each case is often insightful, as the mean can get blown out by other activity on the system.
On Fri, Oct 25, 2013 at 2:52 PM, <tsepez@chromium.org> wrote: > drive-by: I'd expect that its the repeated memcpy's as part of the > reallocs that > cost the most. Also, looking at the smallest possible time in each case is > often > insightful, as the mean can get blown out by other activity on the system. > Yes I agree about the memcpy completely. Also thanks for the min idea, I'll do that. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
The more I look at it, the more I like the idea of removing the bool return from resize and simply CHECK()ing for NULL returns from calloc, should we wind up in a world where such a thing is possible. Propogating this upward, it looks like we could remove checks from about two dozen other files. Did you (danakj) produce the follow-up?
We shouldn't CHECK the return value from allocation functions, that's a waste of cycles and code size. Chromium requires that all heap allocators either return a valid pointer or crash and always has. If some allocator is unable to do this we should either fix it or add a check inside the allocator or in allocator-specific code and not the common codepath. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Mon, Oct 28, 2013 at 1:00 PM, <tsepez@chromium.org> wrote: > The more I look at it, the more I like the idea of removing the bool > return from > resize and simply CHECK()ing for NULL returns from calloc, should we wind > up in > a world where such a thing is possible. Propogating this upward, it looks > like > we could remove checks from about two dozen other files. Did you (danakj) > produce the follow-up? > I think piman@ has a CL up to remove the return value checking but I'm having trouble finding it now. > > https://codereview.chromium.**org/35893002/<https://codereview.chromium.org/3... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Mon, Oct 28, 2013 at 1:35 PM, Dana Jansens <danakj@chromium.org> wrote: > On Mon, Oct 28, 2013 at 1:00 PM, <tsepez@chromium.org> wrote: > >> The more I look at it, the more I like the idea of removing the bool >> return from >> resize and simply CHECK()ing for NULL returns from calloc, should we wind >> up in >> a world where such a thing is possible. Propogating this upward, it looks >> like >> we could remove checks from about two dozen other files. Did you (danakj) >> produce the follow-up? >> > > I think piman@ has a CL up to remove the return value checking but I'm > having trouble finding it now. > Oh it's https://codereview.chromium.org/34413002/ but that needs to be rebased since this CL doesn't make Resize void anymore. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
jar: ping
LGTM
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/danakj@chromium.org/35893002/800001
Retried try job too often on win7_aura for step(s) browser_tests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win7_aura&...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/danakj@chromium.org/35893002/800001
One question I kept meaning to ask.... On what platform did you do the perf comparison (that you listed in the description of this patch)? As I recall, on Windows, all unit tests use the default Windows allocator, and not the much more efficient TCMalloc.
On Tue, Oct 29, 2013 at 11:56 AM, <jar@chromium.org> wrote: > One question I kept meaning to ask.... > > On what platform did you do the perf comparison (that you listed in the > description of this patch)? > > As I recall, on Windows, all unit tests use the default Windows allocator, > and > not the much more efficient TCMalloc. > Z620 Linux on a chromeos=1 buildtype=Official build. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2013/10/29 15:56:51, jar wrote: > One question I kept meaning to ask.... > > On what platform did you do the perf comparison (that you listed in the > description of this patch)? > > As I recall, on Windows, all unit tests use the default Windows allocator, and > not the much more efficient TCMalloc. If we aren't using TCMalloc on Windows or Android, shouldn't we be comparing against the slowest allocator, rather than the fastest? I'd love to get tcmalloc on Android for performance reasons, but the memory bloat is considered unacceptable because we're even more tightly RAM-limited than CPU-limited.
On Tue, Oct 29, 2013 at 12:06 PM, <tomhudson@chromium.org> wrote: > On 2013/10/29 15:56:51, jar wrote: > >> One question I kept meaning to ask.... >> > > On what platform did you do the perf comparison (that you listed in the >> description of this patch)? >> > > As I recall, on Windows, all unit tests use the default Windows >> allocator, and >> not the much more efficient TCMalloc. >> > > If we aren't using TCMalloc on Windows or Android, shouldn't we be > comparing > against the slowest allocator, rather than the fastest? > I'd love to get tcmalloc on Android for performance reasons, but the memory > bloat is considered unacceptable because we're even more tightly > RAM-limited > than CPU-limited. > If we get wins by avoiding allocation on a fast allocator, the wins should be larger with a slow allocator. What we should be doing is running perf tests on android bots, but I think the change demonstrated on linux justifies it for android as well. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/danakj@chromium.org/35893002/800001
Message was sent while issue was closed.
Committed patchset #9 manually as r231656 (presubmit successful). |