|
|
Created:
7 years, 2 months ago by piman Modified:
7 years, 1 month ago CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, cc-bugs_chromium.org, erikwright+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionPickle::Write* micro-optimizations
Pickle is hot in some benchmarks. This helps by reworking WriteBytes in a few of ways:
1- Fold BeginWrite and EndWrite into WriteBytes since it's the only caller now.
2- keep track of the write offset, always aligned, separately, so that we can do alignment checks on the field sizes rather than the payload size (for next point).
3- provides a template version of WriteBytes with specializations for predefined sizes, that inline/optimize away alignment checks, padding, and memcpy.
4- change the meaning of capacity_ to not take the header size into account. This simplifies some arithmetic.
BUG=307480
R=jar@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=231987
Patch Set 1 #
Total comments: 18
Patch Set 2 : rebase #
Total comments: 2
Patch Set 3 : Common WriteBytes definition #
Total comments: 9
Patch Set 4 : add overflow DCHECKs #
Total comments: 2
Patch Set 5 : Rebase on top of latest https://codereview.chromium.org/35893002/ , fold Resize-returns-void here #
Total comments: 2
Patch Set 6 : capacity_ -> capacity_after_header_ #Patch Set 7 : Fix windows #
Messages
Total messages: 28 (0 generated)
https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (left): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#oldcode314 base/pickle.cc:314: return NULL; behavior change: Resize can only fail when realloc fails, but in this case tcmalloc should abort anyway. It adds significant savings. https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode328 base/pickle.cc:328: header_->payload_size = new_size; behavior change: header_->payload_size now counts the padding bytes. That may be iffy, but I think I can fix it. https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode378 base/pickle.cc:378: size_t data_len = AlignInt(length, sizeof(uint32)); This gets optimized out for length%4==0 https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode381 base/pickle.cc:381: Resize(std::max(capacity_ * 2, new_size)); I tried to amortize the cost of this with the Reserve(), but just the if test has a cost. https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode386 base/pickle.cc:386: memset(write + length, 0, data_len - length); This gets optimized out too for length%4==0
Cool :D I have some questions.. https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(std::max(capacity_ * 2, needed_size)); This will generally cause 2 resizes instead of one, since the needed_size is going to be filled and there will usually be another field after? What about resizing larger than needed_size? https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode340 base/pickle.cc:340: bool Pickle::Resize(size_t new_capacity) { make it void? https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode381 base/pickle.cc:381: Resize(std::max(capacity_ * 2, new_size)); On 2013/10/22 06:05:58, piman wrote: > I tried to amortize the cost of this with the Reserve(), but just the if test > has a cost. If Reserve gave you a buffer larger than just what you requested, could you use that instead? https://codereview.chromium.org/34413002/diff/1/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.h#newcode379 base/pickle.h:379: DCHECK_LE(length, kuint32max); how come? and why ifdefed only on 64bit? https://codereview.chromium.org/34413002/diff/1/content/common/cc_messages.cc File content/common/cc_messages.cc (right): https://codereview.chromium.org/34413002/diff/1/content/common/cc_messages.cc... content/common/cc_messages.cc:358: m->Reserve(p.shared_quad_state_list.size() * sizeof(cc::SharedQuadState) + p.quad_list.size() * sizeof(cc::RenderPassDrawQuad)); We could walk the quad list and determine the size to reserve, wdyt? https://codereview.chromium.org/34413002/diff/1/ipc/ipc_message_utils.cc File ipc/ipc_message_utils.cc (right): https://codereview.chromium.org/34413002/diff/1/ipc/ipc_message_utils.cc#newc... ipc/ipc_message_utils.cc:328: m->WriteData(reinterpret_cast<const char*>(&p), sizeof(param_type)); This is funny too, should be WriteBytes
https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(std::max(capacity_ * 2, needed_size)); On 2013/10/22 17:22:05, danakj wrote: > This will generally cause 2 resizes instead of one, since the needed_size is > going to be filled and there will usually be another field after? What about > resizing larger than needed_size? Maybe add them instead of max?
https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(std::max(capacity_ * 2, needed_size)); On 2013/10/22 17:23:16, danakj wrote: > On 2013/10/22 17:22:05, danakj wrote: > > This will generally cause 2 resizes instead of one, since the needed_size is > > going to be filled and there will usually be another field after? What about > > resizing larger than needed_size? > > Maybe add them instead of max? I added another test with 1000 render passes instead of one. Doing max here: 19195us Doing + here: 16344
Ready for review now, I rebased on top of https://codereview.chromium.org/35893002/ and https://codereview.chromium.org/38693003/ , and extracted a separable part into https://codereview.chromium.org/39363002/ +jam for OWNERS https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(std::max(capacity_ * 2, needed_size)); On 2013/10/22 18:23:54, danakj wrote: > On 2013/10/22 17:23:16, danakj wrote: > > On 2013/10/22 17:22:05, danakj wrote: > > > This will generally cause 2 resizes instead of one, since the needed_size is > > > going to be filled and there will usually be another field after? What about > > > resizing larger than needed_size? > > > > Maybe add them instead of max? > > I added another test with 1000 render passes instead of one. > Doing max here: 19195us > Doing + here: 16344 I rebased on top of your patch (that does this). https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode328 base/pickle.cc:328: header_->payload_size = new_size; On 2013/10/22 06:05:58, piman wrote: > behavior change: header_->payload_size now counts the padding bytes. That may be > iffy, but I think I can fix it. I fixed this. https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode340 base/pickle.cc:340: bool Pickle::Resize(size_t new_capacity) { On 2013/10/22 17:22:05, danakj wrote: > make it void? rebased on top of your patch that does this. https://codereview.chromium.org/34413002/diff/1/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.h#newcode379 base/pickle.h:379: DCHECK_LE(length, kuint32max); On 2013/10/22 17:22:05, danakj wrote: > how come? and why ifdefed only on 64bit? This comes from the original BeginWriteData. I assume the ifdef is because the DCHECK is always true on 32 bits. https://codereview.chromium.org/34413002/diff/1/ipc/ipc_message_utils.cc File ipc/ipc_message_utils.cc (right): https://codereview.chromium.org/34413002/diff/1/ipc/ipc_message_utils.cc#newc... ipc/ipc_message_utils.cc:328: m->WriteData(reinterpret_cast<const char*>(&p), sizeof(param_type)); On 2013/10/22 17:22:05, danakj wrote: > This is funny too, should be WriteBytes Done, though I extracted this part to a separate CL - https://codereview.chromium.org/39363002/
https://codereview.chromium.org/34413002/diff/140001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/140001/base/pickle.h#newcode219 base/pickle.h:219: return WritePOD(value); note: these actually always return true, so we can change to void. There's a whole bunch of callers that test the return value, so I was going to do that as a follow-up.
https://codereview.chromium.org/34413002/diff/140001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/140001/base/pickle.cc#newcode273 base/pickle.cc:273: template <size_t length> void Pickle::WriteBytesStatic(const void* data) { BTW, I looked at inlining this one, it gives us a boost, but at the cost of ~150KB in the binary, so I don't think it's worth it.
i'm not an owner here
Oops, -jam +jar
New version shares the code between WriteBytes and WriteBytesStatic. I verified that the same code is generated before/after.
I'm curious... have you looked to see what optimizing compilers do to the pre-existing code? It might be too hard for some... but constants at calling sites, being propagated to short function via inlining, might already be producing some of this. With the modified code, it would be good <sadly> to see assembly on each platform. You could have carried the specialization lower, and then you might have forced your intended optimizations, but for some platforms, alignment is critical, and without such assurances (which I don't think you gave the compiler), it must use memcpy or equivalent (rather than faster POD move instructions). Finally.... have you benchmarked the resulting code (if it is different)? https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode254 base/pickle.cc:254: inline void Pickle::WriteBytesCommon(const void* data, size_t length) { Note that inline is a "hint" and is traditionally totally ignored by compilers, since the compiler can usually figure this out better than the programmer.... so they ignore it. https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode257 base/pickle.cc:257: DCHECK_LE(static_cast<size_t>(length), kuint32max); nit: length is already of type size_t Also... if you have concerns about overflow, wouldn't you want to be sure you were: DCHECK_LE(length, kuint32max - sizeof(uint32) + 1) since you're about to potentially increase length a bit? ... or you could check that RSN: DCHECK(data_len >= length); and/or maybe: DCHECK(static_cast<uint32>(data_len), static_cast<uint32>(length)); ...which makes me wonder why these are DCHECKs. Is there something in the code which assures we won't have such problems, and these are really fallback developer protection? https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode260 base/pickle.cc:260: size_t new_size = write_offset_ + data_len; How are you assured that this is not an overflow? Same question around line 287. https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode278 base/pickle.cc:278: WriteBytesCommon(data, length); For all the template magic... it appears that you changed calling with an argument: sizeof(foo) to calling a template which handled the type which forwarded to a template that specialized for the size that called into the common code taking length. I was *expecting* specialization to use type assignments such as moving across a uint32 when size was 4..... and using this carefully only when you had proper alignment... but that does not appear to be the case. Can you clarify why this is an optimization, rather than just a lot of use of templates? Is the hope that your explicit statement in lines 281-283 will cause the templates to be instantiated, and then all the code within them to be inlined? To be honest, I thought there was a good chance that a killer compiler could already have done most of the inlining and propagation of calling constants that you seem to be chasing. Have you looked at compiled assembly code after full optimization to see how the code already looked?
Perf-wise, using the DelegatedFrame_ManyQuads_100000_100000 benchmark which is driving this (modeled from the limiting factor in the spaceport benchmark regression cited in the bug): x86-64 gcc before: 17453 us after: 16604 us x86-64 clang before: 14447 us after: 13455 us (lower is better). So about 5 to 7% on the overall benchmark. Note, because of https://codereview.chromium.org/30593005/ that this is based on, we now skip the optimized WriteInt/Float/... in a large portion of the cases, so the optimization only benefits about 10% of the fields in this benchmark. https://codereview.chromium.org/39463002/ is a follow-up that touches the other cases. Assembly in follow-up. https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode254 base/pickle.cc:254: inline void Pickle::WriteBytesCommon(const void* data, size_t length) { On 2013/10/24 22:36:48, jar wrote: > Note that inline is a "hint" and is traditionally totally ignored by compilers, > since the compiler can usually figure this out better than the programmer.... so > they ignore it. Both clang and gcc failed to inline without it. Without inlining, no specialization on the size is possible. https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode257 base/pickle.cc:257: DCHECK_LE(static_cast<size_t>(length), kuint32max); On 2013/10/24 22:36:48, jar wrote: > nit: length is already of type size_t Yep, sorry, copy-and-paste from the previous version. > > Also... if you have concerns about overflow I don't particularly have concerns about overflow. We can add DCHECKs. This is not meant to be called with untrusted data. > , wouldn't you want to be sure you > were: > DCHECK_LE(length, kuint32max - sizeof(uint32) + 1) > since you're about to potentially increase length a bit? > > ... or you could check that RSN: > DCHECK(data_len >= length); > and/or maybe: > DCHECK(static_cast<uint32>(data_len), static_cast<uint32>(length)); > > ...which makes me wonder why these are DCHECKs. > > Is there something in the code which assures we won't have such problems, and > these are really fallback developer protection? So, this is no change wrt previous code. As I mentioned, it's not meant to be called with untrusted data, nor enough data that it would overflow (e.g. messages would go over the limit of 256MB before we get to overflow). I would severely push back against adding CHECKs, because that halves the performance. https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode260 base/pickle.cc:260: size_t new_size = write_offset_ + data_len; On 2013/10/24 22:36:48, jar wrote: > How are you assured that this is not an overflow? > > Same question around line 287. The previous code (in BeginWrite) didn't check for overflow. I can add DCHECKs if you want. https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode278 base/pickle.cc:278: WriteBytesCommon(data, length); On 2013/10/24 22:36:48, jar wrote: > For all the template magic... it appears that you changed calling with an > argument: > sizeof(foo) > > to calling a template which handled the type > which forwarded to a template that specialized for the size > that called into the common code taking length. > > I was *expecting* specialization to use type assignments such as moving across a > uint32 when size was 4..... and using this carefully only when you had proper > alignment... but that does not appear to be the case. It's possible to do, but then you need template instantiations for each of the types, whereas you actually only need the size of the type in the resulting code. In this version, we only need one instantiation for each of the sizes, so we have less bloat. WritePOD does the type->size translation, is fully inline and so doesn't take any space. > Can you clarify why this is an optimization, rather than just a lot of use of > templates? I indicated some of the savings in an earlier version of the patch. Essentially, if the size is known at compile-time, the compiler will: 1- elide the AlignInt logic and the memset it size is multiple of 4 (common case) 2- inline the memcpy > Is the hope that your explicit statement in lines 281-283 will cause the > templates to be instantiated, and then all the code within them to be inlined? Yes. I verified the code with objdump. At the callsites for, say, WriteInt, it doesn't have the definition for WriteBytes, so it won't inline it. Since it won't inline it, it will call the generic version, passing the size as a run-time parameter. It has no way of eliding/inlining what I mentioned above. I tried calling WriteBytes from WriteBytesStatic, but gcc failed to inline. I can't make WriteBytes itself inline because the callsites don't have the definition. So the solution was to add an intermediate inline WriteBytesCommon, that is only called where the definition is available. The compiler properly inlines/specializes (I verified with objdump). > To be honest, I thought there was a good chance that a killer compiler could > already have done most of the inlining and propagation of calling constants that > you seem to be chasing. Have you looked at compiled assembly code after full > optimization to see how the code already looked? I drove this work by looking at the output code in parallel with the performance measurements.
Assembly-wise, here's what I got. I don't have a windows machine at hand to dump there. The benefit on gcc is evident however (built with the Official settings): ======================================== x86-64 gcc ======================================== Before: 0000000000000000 <_ZN6Pickle10WriteBytesEPKvi>: 0: 4c 89 74 24 f8 mov %r14,-0x8(%rsp) 5: 4c 63 f2 movslq %edx,%r14 8: 48 89 5c 24 d8 mov %rbx,-0x28(%rsp) d: 48 89 6c 24 e0 mov %rbp,-0x20(%rsp) 12: 4c 89 64 24 e8 mov %r12,-0x18(%rsp) 17: 48 89 fd mov %rdi,%rbp 1a: 4c 89 6c 24 f0 mov %r13,-0x10(%rsp) 1f: 48 83 ec 28 sub $0x28,%rsp 23: 49 89 f5 mov %rsi,%r13 26: 4c 89 f6 mov %r14,%rsi 29: 41 89 d4 mov %edx,%r12d 2c: e8 00 00 00 00 callq 31 <_ZN6Pickle10WriteBytesEPKvi+0x31> 2d: R_X86_64_PC32 _ZN6Pickle10BeginWriteEm-0x4 31: 48 89 c3 mov %rax,%rbx 34: 31 c0 xor %eax,%eax 36: 48 85 db test %rbx,%rbx 39: 74 21 je 5c <_ZN6Pickle10WriteBytesEPKvi+0x5c> 3b: 4c 89 f2 mov %r14,%rdx 3e: 4c 89 ee mov %r13,%rsi 41: 48 89 df mov %rbx,%rdi 44: e8 00 00 00 00 callq 49 <_ZN6Pickle10WriteBytesEPKvi+0x49> 45: R_X86_64_PLT32 memcpy-0x4 49: 44 89 e2 mov %r12d,%edx 4c: 48 89 de mov %rbx,%rsi 4f: 48 89 ef mov %rbp,%rdi 52: e8 00 00 00 00 callq 57 <_ZN6Pickle10WriteBytesEPKvi+0x57> 53: R_X86_64_PC32 _ZN6Pickle8EndWriteEPci-0x4 57: b8 01 00 00 00 mov $0x1,%eax 5c: 48 8b 1c 24 mov (%rsp),%rbx 60: 48 8b 6c 24 08 mov 0x8(%rsp),%rbp 65: 4c 8b 64 24 10 mov 0x10(%rsp),%r12 6a: 4c 8b 6c 24 18 mov 0x18(%rsp),%r13 6f: 4c 8b 74 24 20 mov 0x20(%rsp),%r14 74: 48 83 c4 28 add $0x28,%rsp 78: c3 retq 0000000000000000 <_ZN6Pickle10BeginWriteEm>: 0: 48 89 5c 24 e8 mov %rbx,-0x18(%rsp) 5: 4c 89 64 24 f8 mov %r12,-0x8(%rsp) a: 48 89 fb mov %rdi,%rbx d: 48 89 6c 24 f0 mov %rbp,-0x10(%rsp) 12: 48 83 ec 18 sub $0x18,%rsp 16: 48 8b 57 08 mov 0x8(%rdi),%rdx 1a: 48 8b 4f 18 mov 0x18(%rdi),%rcx 1e: 8b 2a mov (%rdx),%ebp 20: 48 89 e8 mov %rbp,%rax 23: 48 f7 d8 neg %rax 26: 83 e0 03 and $0x3,%eax 29: 48 01 c5 add %rax,%rbp 2c: 4c 8d 64 35 00 lea 0x0(%rbp,%rsi,1),%r12 31: 4c 89 e0 mov %r12,%rax 34: 48 03 47 10 add 0x10(%rdi),%rax 38: 48 39 c8 cmp %rcx,%rax 3b: 76 16 jbe 53 <_ZN6Pickle10BeginWriteEm+0x53> 3d: 48 01 c9 add %rcx,%rcx 40: 48 39 c8 cmp %rcx,%rax 43: 48 89 ce mov %rcx,%rsi 46: 48 0f 43 f0 cmovae %rax,%rsi 4a: e8 00 00 00 00 callq 4f <_ZN6Pickle10BeginWriteEm+0x4f> 4b: R_X86_64_PC32 _ZN6Pickle6ResizeEm-0x4 4f: 48 8b 53 08 mov 0x8(%rbx),%rdx 53: 44 89 22 mov %r12d,(%rdx) 56: 48 8b 43 08 mov 0x8(%rbx),%rax 5a: 48 03 43 10 add 0x10(%rbx),%rax 5e: 4c 8b 64 24 10 mov 0x10(%rsp),%r12 63: 48 8b 1c 24 mov (%rsp),%rbx 67: 48 01 e8 add %rbp,%rax 6a: 48 8b 6c 24 08 mov 0x8(%rsp),%rbp 6f: 48 83 c4 18 add $0x18,%rsp 73: c3 retq 0000000000000000 <_ZN6Pickle8EndWriteEPci>: 0: 48 63 fa movslq %edx,%rdi 3: 48 89 f8 mov %rdi,%rax 6: 83 e0 03 and $0x3,%eax 9: 75 05 jne 10 <_ZN6Pickle8EndWriteEPci+0x10> b: f3 c3 repz retq d: 0f 1f 00 nopl (%rax) 10: ba 04 00 00 00 mov $0x4,%edx 15: 48 01 f7 add %rsi,%rdi 18: 31 f6 xor %esi,%esi 1a: 48 29 c2 sub %rax,%rdx 1d: e9 00 00 00 00 jmpq 22 <.LC1+0xa> 1e: R_X86_64_PLT32 memset-0x4 After: 0000000000000000 <_ZN6Pickle16WriteBytesStaticILm4EEEvPKv>: 0: 48 89 5c 24 e8 mov %rbx,-0x18(%rsp) 5: 48 89 6c 24 f0 mov %rbp,-0x10(%rsp) a: 48 89 fb mov %rdi,%rbx d: 4c 89 64 24 f8 mov %r12,-0x8(%rsp) 12: 48 83 ec 18 sub $0x18,%rsp 16: 48 8b 57 20 mov 0x20(%rdi),%rdx 1a: 49 89 f4 mov %rsi,%r12 1d: 48 8b 77 18 mov 0x18(%rdi),%rsi 21: 48 8d 6a 04 lea 0x4(%rdx),%rbp 25: 48 39 f5 cmp %rsi,%rbp 28: 76 13 jbe 3d <_ZN6Pickle16WriteBytesStaticILm4EEEvPKv+0x3d> 2a: 48 01 f6 add %rsi,%rsi 2d: 48 39 ee cmp %rbp,%rsi 30: 48 0f 42 f5 cmovb %rbp,%rsi 34: e8 00 00 00 00 callq 39 <_ZN6Pickle16WriteBytesStaticILm4EEEvPKv+0x39> 35: R_X86_64_PC32 _ZN6Pickle6ResizeEm-0x4 39: 48 8b 53 20 mov 0x20(%rbx),%rdx 3d: 48 8b 43 08 mov 0x8(%rbx),%rax 41: 48 03 43 10 add 0x10(%rbx),%rax 45: 41 8b 0c 24 mov (%r12),%ecx 49: 89 0c 10 mov %ecx,(%rax,%rdx,1) 4c: 48 8b 53 20 mov 0x20(%rbx),%rdx 50: 48 8b 43 08 mov 0x8(%rbx),%rax 54: 83 c2 04 add $0x4,%edx 57: 89 10 mov %edx,(%rax) 59: 48 89 6b 20 mov %rbp,0x20(%rbx) 5d: 4c 8b 64 24 10 mov 0x10(%rsp),%r12 62: 48 8b 1c 24 mov (%rsp),%rbx 66: 48 8b 6c 24 08 mov 0x8(%rsp),%rbp 6b: 48 83 c4 18 add $0x18,%rsp 6f: c3 retq
======================================== arm32 gcc ======================================== Before: 00000000 <_ZN6Pickle10WriteBytesEPKvi>: 0: b5f8 push {r3, r4, r5, r6, r7, lr} 2: 460f mov r7, r1 4: 4611 mov r1, r2 6: 4614 mov r4, r2 8: 4606 mov r6, r0 a: f7ff fffe bl 0 <_ZN6Pickle10WriteBytesEPKvi> a: R_ARM_THM_CALL _ZN6Pickle10BeginWriteEj e: 4605 mov r5, r0 10: b150 cbz r0, 28 <_ZN6Pickle10WriteBytesEPKvi+0x28> 12: 4639 mov r1, r7 14: 4622 mov r2, r4 16: f7ff fffe bl 0 <memcpy> 16: R_ARM_THM_CALL memcpy 1a: 4630 mov r0, r6 1c: 4629 mov r1, r5 1e: 4622 mov r2, r4 20: f7ff fffe bl 0 <_ZN6Pickle10WriteBytesEPKvi> 20: R_ARM_THM_CALL _ZN6Pickle8EndWriteEPci 24: 2001 movs r0, #1 26: bdf8 pop {r3, r4, r5, r6, r7, pc} 28: bdf8 pop {r3, r4, r5, r6, r7, pc} 2a: bf00 nop 00000000 <_ZN6Pickle10BeginWriteEj>: 0: b5f8 push {r3, r4, r5, r6, r7, lr} 2: 1d03 adds r3, r0, #4 4: 4604 mov r4, r0 6: e893 1088 ldmia.w r3, {r3, r7, ip} a: 681a ldr r2, [r3, #0] c: 4255 negs r5, r2 e: f005 0503 and.w r5, r5, #3 12: 18ad adds r5, r5, r2 14: 186e adds r6, r5, r1 16: 19f7 adds r7, r6, r7 18: 4567 cmp r7, ip 1a: d907 bls.n 2c <_ZN6Pickle10BeginWriteEj+0x2c> 1c: ea4f 014c mov.w r1, ip, lsl #1 20: 428f cmp r7, r1 22: bf28 it cs 24: 4639 movcs r1, r7 26: f7ff fffe bl 0 <_ZN6Pickle10BeginWriteEj> 26: R_ARM_THM_CALL _ZN6Pickle6ResizeEj 2a: 6863 ldr r3, [r4, #4] 2c: 601e str r6, [r3, #0] 2e: 6860 ldr r0, [r4, #4] 30: 68a3 ldr r3, [r4, #8] 32: 18c0 adds r0, r0, r3 34: 1940 adds r0, r0, r5 36: bdf8 pop {r3, r4, r5, r6, r7, pc} 00000000 <_ZN6Pickle8EndWriteEPci>: 0: f012 0303 ands.w r3, r2, #3 4: d100 bne.n 8 <_ZN6Pickle8EndWriteEPci+0x8> 6: 4770 bx lr 8: 1888 adds r0, r1, r2 a: 2100 movs r1, #0 c: f1c3 0204 rsb r2, r3, #4 10: f7ff bffe b.w 0 <memset> 10: R_ARM_THM_JUMP24 memset After: 00000000 <_ZN6Pickle16WriteBytesStaticILj4EEEvPKv>: 0: b570 push {r4, r5, r6, lr} 2: 4604 mov r4, r0 4: 6903 ldr r3, [r0, #16] 6: 460e mov r6, r1 8: 68c2 ldr r2, [r0, #12] a: 1d1d adds r5, r3, #4 c: 4295 cmp r5, r2 e: d906 bls.n 1e <_ZN6Pickle16WriteBytesStaticILj4EEEvPKv+0x1e> 10: 0051 lsls r1, r2, #1 12: 42a9 cmp r1, r5 14: bf38 it cc 16: 4629 movcc r1, r5 18: f7ff fffe bl 0 <_ZN6Pickle16WriteBytesStaticILj4EEEvPKv> 18: R_ARM_THM_CALL _ZN6Pickle6ResizeEj 1c: 6923 ldr r3, [r4, #16] 1e: 6860 ldr r0, [r4, #4] 20: 68a2 ldr r2, [r4, #8] 22: 6831 ldr r1, [r6, #0] 24: 1882 adds r2, r0, r2 26: 50d1 str r1, [r2, r3] 28: 6863 ldr r3, [r4, #4] 2a: 6922 ldr r2, [r4, #16] 2c: 3204 adds r2, #4 2e: 601a str r2, [r3, #0] 30: 6125 str r5, [r4, #16] 32: bd70 pop {r4, r5, r6, pc} You can see that, before the optimization: - no inlining was done - memcpy was called - memset was called. - 3 functions necessary - 75 instructions on x86-64, 50 on arm32 - on arm32, 5 registers are pushed, and twice at that. After: - fully inlined - no memcpy - no memset - 30 instructions on x86-64, 25 on arm32 - on arm32, only 3 registers are pushed, on only once.
https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode257 base/pickle.cc:257: DCHECK_LE(static_cast<size_t>(length), kuint32max); On 2013/10/24 22:36:48, jar wrote: > nit: length is already of type size_t Done. > > Also... if you have concerns about overflow, wouldn't you want to be sure you > were: > DCHECK_LE(length, kuint32max - sizeof(uint32) + 1) > since you're about to potentially increase length a bit? > > ... or you could check that RSN: > DCHECK(data_len >= length); > and/or maybe: > DCHECK(static_cast<uint32>(data_len), static_cast<uint32>(length)); > > ...which makes me wonder why these are DCHECKs. > > Is there something in the code which assures we won't have such problems, and > these are really fallback developer protection? I added some DCHECKs to ensure: length <= data_len <= kuint32max (i.e. no overflow in AlignInt) write_offset_ <= kuint32max - data_len <=> write_offset_ + data_len <= kuint32max (i.e. no overflow in new_size computation).
https://codereview.chromium.org/34413002/diff/310001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/310001/base/pickle.h#newcode215 base/pickle.h:215: bool WriteBool(bool value) { drive-by: The follow-up to this in a world where resize() returns void is to make all the write* methods void. Propagating this upwards would delete needless checks in about two dozen files.
https://codereview.chromium.org/34413002/diff/310001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/310001/base/pickle.h#newcode219 base/pickle.h:219: return WritePOD(value); drive-by: If I were trying to make this fast, I'd want to handle the normal (space available case) totally inline here, without any fn calls. Something along the lines of if (write_offset_ <= capacity_ - sizeof(unint32)) { *hairy-cast-expression = value; write_offset_ += sizeof(uint32) return; } slow_path(...);
On Mon, Oct 28, 2013 at 11:17 AM, <tsepez@chromium.org> wrote: > > https://codereview.chromium.**org/34413002/diff/310001/base/**pickle.h<https:... > File base/pickle.h (right): > > https://codereview.chromium.**org/34413002/diff/310001/base/** > pickle.h#newcode215<https://codereview.chromium.org/34413002/diff/310001/base/pickle.h#newcode215> > base/pickle.h:215: bool WriteBool(bool value) { > drive-by: The follow-up to this in a world where resize() returns void > is to make all the write* methods void. Propagating this upwards would > delete needless checks in about two dozen files. > Yes, that's absolutely the plan, as a follow up, as described in a previous PS. > > https://codereview.chromium.**org/34413002/<https://codereview.chromium.org/3... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Mon, Oct 28, 2013 at 12:09 PM, <tsepez@chromium.org> wrote: > > https://codereview.chromium.**org/34413002/diff/310001/base/**pickle.h<https:... > File base/pickle.h (right): > > https://codereview.chromium.**org/34413002/diff/310001/base/** > pickle.h#newcode219<https://codereview.chromium.org/34413002/diff/310001/base/pickle.h#newcode219> > base/pickle.h:219: return WritePOD(value); > drive-by: If I were trying to make this fast, I'd want to handle the > normal (space available case) totally inline here, without any fn calls. > Something along the lines of > if (write_offset_ <= capacity_ - sizeof(unint32)) { > *hairy-cast-expression = value; > write_offset_ += sizeof(uint32) > return; > } > slow_path(...); > I experimented with that, but it adds about 150KB to chrome (also mentioned in a previous PS :) ), which is a lot. I don't think the benefits outweigh that. > > https://codereview.chromium.**org/34413002/<https://codereview.chromium.org/3... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Rebased on top of Dana's latest version for https://codereview.chromium.org/35893002/ and I also folded Resize-returns-void here, PTAL. I don't feel strongly about whether we should explicitly CHECK after realloc or not. I added one in this version.
One thing to be careful here is: I've seen other suggestions for pickle changes recently. This change nicely adjusts the names of some internal fields, to have new meanings. I'm *really* worried that some other change will land, and incorrectly assume the field had the "old" meaning. As a result, rather than changing the meaning of an existing field, I'd rather see that slot (old name) removed, and a new slot (new name) added. This would cause a merge conflict with anyone using the old semantics. See suggestion below. https://codereview.chromium.org/34413002/diff/390001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/390001/base/pickle.h#newcode327 base/pickle.h:327: size_t capacity_; How about changing this name to: capacity_after_header_
https://codereview.chromium.org/34413002/diff/390001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/390001/base/pickle.h#newcode327 base/pickle.h:327: size_t capacity_; On 2013/10/29 16:50:48, jar wrote: > How about changing this name to: > capacity_after_header_ Done.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/piman@chromium.org/34413002/560001
Step "update" is always a major failure. Look at the try server FAQ for more details. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/piman@chromium.org/34413002/560001
Step "update" is always a major failure. Look at the try server FAQ for more details. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
Message was sent while issue was closed.
Committed patchset #7 manually as r231987 (presubmit successful). |