Issue 34413002: Pickle::Write* micro-optimizations

Issue 34413002: Pickle::Write* micro-optimizations (Closed)

Created:
7 years, 2 months ago by piman

Modified:
7 years, 1 month ago

Reviewers:
danakj, Tom Sepez, jar (doing other things)

CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, cc-bugs_chromium.org, erikwright+watch_chromium.org

Base URL:
svn://svn.chromium.org/chrome/trunk/src

Visibility:
Public.

More Reviews

Description

Pickle::Write* micro-optimizations Pickle is hot in some benchmarks. This helps by reworking WriteBytes in a few of ways: 1- Fold BeginWrite and EndWrite into WriteBytes since it's the only caller now. 2- keep track of the write offset, always aligned, separately, so that we can do alignment checks on the field sizes rather than the payload size (for next point). 3- provides a template version of WriteBytes with specializations for predefined sizes, that inline/optimize away alignment checks, padding, and memcpy. 4- change the meaning of capacity_ to not take the header size into account. This simplifies some arithmetic. BUG=307480 R=jar@chromium.org Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=231987

Patch Set 1 #

Total comments: 18

Patch Set 2 : rebase #

Total comments: 2

Patch Set 3 : Common WriteBytes definition #

Total comments: 9

Patch Set 4 : add overflow DCHECKs #

Total comments: 2

Patch Set 5 : Rebase on top of latest https://codereview.chromium.org/35893002/ , fold Resize-returns-void here #

Total comments: 2

Patch Set 6 : capacity_ -> capacity_after_header_ #

Patch Set 7 : Fix windows #

Created: 7 years, 1 month ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+91 lines, -91 lines)			Patch
M	base/pickle.h	View	1 2 3 4 5	5 chunks	+30 lines, -29 lines	0 comments	Download
M	base/pickle.cc	View	1 2 3 4 5 6	7 chunks	+58 lines, -59 lines	0 comments	Download
M	base/pickle_unittest.cc	View	1 2 3 4 5	1 chunk	+3 lines, -3 lines	0 comments	Download

Messages

Total messages: 28 (0 generated)

Expand Messages | Collapse Messages

piman

https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (left): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#oldcode314 base/pickle.cc:314: return NULL; behavior change: Resize can only fail when ...

7 years, 2 months ago (2013-10-22 06:05:58 UTC) #1

danakj

Cool :D I have some questions.. https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(std::max(capacity_ * 2, ...

7 years, 2 months ago (2013-10-22 17:22:05 UTC) #2

danakj

https://codereview.chromium.org/34413002/diff/1/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/1/base/pickle.cc#newcode314 base/pickle.cc:314: Resize(std::max(capacity_ * 2, needed_size)); On 2013/10/22 17:22:05, danakj wrote: ...

7 years, 2 months ago (2013-10-22 17:23:15 UTC) #3

danakj

7 years, 2 months ago (2013-10-22 18:23:54 UTC) #4

piman

Ready for review now, I rebased on top of https://codereview.chromium.org/35893002/ and https://codereview.chromium.org/38693003/ , and extracted ...

7 years, 2 months ago (2013-10-24 06:17:13 UTC) #5

piman

https://codereview.chromium.org/34413002/diff/140001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/140001/base/pickle.h#newcode219 base/pickle.h:219: return WritePOD(value); note: these actually always return true, so ...

7 years, 2 months ago (2013-10-24 06:24:00 UTC) #6

piman

https://codereview.chromium.org/34413002/diff/140001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/140001/base/pickle.cc#newcode273 base/pickle.cc:273: template <size_t length> void Pickle::WriteBytesStatic(const void* data) { BTW, ...

7 years, 2 months ago (2013-10-24 06:29:24 UTC) #7

piman

New version shares the code between WriteBytes and WriteBytesStatic. I verified that the same code ...

7 years, 2 months ago (2013-10-24 21:20:17 UTC) #10

jar (doing other things)

I'm curious... have you looked to see what optimizing compilers do to the pre-existing code? ...

7 years, 2 months ago (2013-10-24 22:36:48 UTC) #11

I'm curious... have you looked to see what optimizing compilers do to the
pre-existing code?  It might be too hard for some... but constants at calling
sites, being propagated to short function via inlining, might already be
producing some of this.

With the modified code, it would be good <sadly> to see assembly on each
platform.  You could have carried the specialization lower, and then you might
have forced your intended optimizations, but for some platforms, alignment is
critical, and without such assurances (which I don't think you gave the
compiler), it must use memcpy or equivalent (rather than faster POD move
instructions).

Finally.... have you benchmarked the resulting code (if it is different)?

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc
File base/pickle.cc (right):

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode254
base/pickle.cc:254: inline void Pickle::WriteBytesCommon(const void* data,
size_t length) {
Note that inline is a "hint" and is traditionally totally ignored by compilers,
since the compiler can usually figure this out better than the programmer.... so
they ignore it.

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode257
base/pickle.cc:257: DCHECK_LE(static_cast<size_t>(length), kuint32max);
nit: length is already of type size_t

Also... if you have concerns about overflow, wouldn't you want to be sure you
were:
DCHECK_LE(length, kuint32max - sizeof(uint32) + 1)
since you're about to potentially increase length a bit?

... or you could check that RSN:
DCHECK(data_len >= length);
and/or maybe:
DCHECK(static_cast<uint32>(data_len), static_cast<uint32>(length));

...which makes me wonder why these are DCHECKs.

Is there something in the code which assures we won't have such problems, and
these are really fallback developer protection?

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode260
base/pickle.cc:260: size_t new_size = write_offset_ + data_len;
How are you assured that this is not an overflow?

Same question around line 287.

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode278
base/pickle.cc:278: WriteBytesCommon(data, length);
For all the template magic... it appears that you changed calling with an
argument:
sizeof(foo)

to calling a template which handled the type
which forwarded to a template that specialized for the size
that called into the common code taking length.

I was *expecting* specialization to use type assignments such as moving across a
uint32 when size was 4..... and using this carefully only when you had proper
alignment... but that does not appear to be the case.

Can you clarify why this is an optimization, rather than just a lot of use of
templates?

Is the hope that your explicit statement in lines 281-283 will cause the
templates to be instantiated, and then all the code within them to be inlined?  

To be honest, I thought there was a good chance that a killer compiler could
already have done most of the inlining and propagation of calling constants that
you seem to be chasing.  Have you looked at compiled assembly code after full
optimization to see how the code already looked?

piman

Perf-wise, using the DelegatedFrame_ManyQuads_100000_100000 benchmark which is driving this (modeled from the limiting factor in ...

7 years, 2 months ago (2013-10-24 23:54:28 UTC) #12

Perf-wise, using the DelegatedFrame_ManyQuads_100000_100000 benchmark which is
driving this (modeled from the limiting factor in the spaceport benchmark
regression cited in the bug):

x86-64 gcc
before: 17453 us
after: 16604 us

x86-64 clang
before: 14447 us
after: 13455 us

(lower is better).
So about 5 to 7% on the overall benchmark.

Note, because of https://codereview.chromium.org/30593005/ that this is based
on, we now skip the optimized WriteInt/Float/... in a large portion of the
cases, so the optimization only benefits about 10% of the fields in this
benchmark. https://codereview.chromium.org/39463002/ is a follow-up that touches
the other cases.

Assembly in follow-up.

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc
File base/pickle.cc (right):

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode254
base/pickle.cc:254: inline void Pickle::WriteBytesCommon(const void* data,
size_t length) {
On 2013/10/24 22:36:48, jar wrote:
> Note that inline is a "hint" and is traditionally totally ignored by
compilers,
> since the compiler can usually figure this out better than the programmer....
so
> they ignore it.

Both clang and gcc failed to inline without it.
Without inlining, no specialization on the size is possible.

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode257
base/pickle.cc:257: DCHECK_LE(static_cast<size_t>(length), kuint32max);
On 2013/10/24 22:36:48, jar wrote:
> nit: length is already of type size_t

Yep, sorry, copy-and-paste from the previous version.

> 
> Also... if you have concerns about overflow

I don't particularly have concerns about overflow. We can add DCHECKs. This is
not meant to be called with untrusted data.

> , wouldn't you want to be sure you
> were:
> DCHECK_LE(length, kuint32max - sizeof(uint32) + 1)
> since you're about to potentially increase length a bit?
> 
> ... or you could check that RSN:
> DCHECK(data_len >= length);
> and/or maybe:
> DCHECK(static_cast<uint32>(data_len), static_cast<uint32>(length));
> 
> ...which makes me wonder why these are DCHECKs.
> 
> Is there something in the code which assures we won't have such problems, and
> these are really fallback developer protection?

So, this is no change wrt previous code. As I mentioned, it's not meant to be
called with untrusted data, nor enough data that it would overflow (e.g.
messages would go over the limit of 256MB before we get to overflow).

I would severely push back against adding CHECKs, because that halves the
performance.

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode260
base/pickle.cc:260: size_t new_size = write_offset_ + data_len;
On 2013/10/24 22:36:48, jar wrote:
> How are you assured that this is not an overflow?
> 
> Same question around line 287.

The previous code (in BeginWrite) didn't check for overflow. I can add DCHECKs
if you want.

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode278
base/pickle.cc:278: WriteBytesCommon(data, length);
On 2013/10/24 22:36:48, jar wrote:
> For all the template magic... it appears that you changed calling with an
> argument:
> sizeof(foo)
> 
> to calling a template which handled the type
> which forwarded to a template that specialized for the size
> that called into the common code taking length.
> 
> I was *expecting* specialization to use type assignments such as moving across
a
> uint32 when size was 4..... and using this carefully only when you had proper
> alignment... but that does not appear to be the case.

It's possible to do, but then you need template instantiations for each of the
types, whereas you actually only need the size of the type in the resulting
code.
In this version, we only need one instantiation for each of the sizes, so we
have less bloat. WritePOD does the type->size translation, is fully inline and
so doesn't take any space.

> Can you clarify why this is an optimization, rather than just a lot of use of
> templates?

I indicated some of the savings in an earlier version of the patch. Essentially,
if the size is known at compile-time, the compiler will:
1- elide the AlignInt logic and the memset it size is multiple of 4 (common
case)
2- inline the memcpy

> Is the hope that your explicit statement in lines 281-283 will cause the
> templates to be instantiated, and then all the code within them to be inlined?

Yes. I verified the code with objdump.
At the callsites for, say, WriteInt, it doesn't have the definition for
WriteBytes, so it won't inline it. Since it won't inline it, it will call the
generic version, passing the size as a run-time parameter. It has no way of
eliding/inlining what I mentioned above.
I tried calling WriteBytes from WriteBytesStatic, but gcc failed to inline. I
can't make WriteBytes itself inline because the callsites don't have the
definition. So the solution was to add an intermediate inline WriteBytesCommon,
that is only called where the definition is available. The compiler properly
inlines/specializes (I verified with objdump).

> To be honest, I thought there was a good chance that a killer compiler could
> already have done most of the inlining and propagation of calling constants
that
> you seem to be chasing.  Have you looked at compiled assembly code after full
> optimization to see how the code already looked?

I drove this work by looking at the output code in parallel with the performance
measurements.

piman

Assembly-wise, here's what I got. I don't have a windows machine at hand to dump ...

7 years, 2 months ago (2013-10-24 23:55:17 UTC) #13

Assembly-wise, here's what I got. I don't have a windows machine at hand to dump
there. The benefit on gcc is evident however (built with the Official settings):

======================================== x86-64 gcc
======================================== 

Before:

0000000000000000 <_ZN6Pickle10WriteBytesEPKvi>:
   0:   4c 89 74 24 f8          mov    %r14,-0x8(%rsp)
   5:   4c 63 f2                movslq %edx,%r14
   8:   48 89 5c 24 d8          mov    %rbx,-0x28(%rsp)
   d:   48 89 6c 24 e0          mov    %rbp,-0x20(%rsp)
  12:   4c 89 64 24 e8          mov    %r12,-0x18(%rsp)
  17:   48 89 fd                mov    %rdi,%rbp
  1a:   4c 89 6c 24 f0          mov    %r13,-0x10(%rsp)
  1f:   48 83 ec 28             sub    $0x28,%rsp
  23:   49 89 f5                mov    %rsi,%r13
  26:   4c 89 f6                mov    %r14,%rsi
  29:   41 89 d4                mov    %edx,%r12d
  2c:   e8 00 00 00 00          callq  31 <_ZN6Pickle10WriteBytesEPKvi+0x31>
                        2d: R_X86_64_PC32       _ZN6Pickle10BeginWriteEm-0x4
  31:   48 89 c3                mov    %rax,%rbx
  34:   31 c0                   xor    %eax,%eax
  36:   48 85 db                test   %rbx,%rbx
  39:   74 21                   je     5c <_ZN6Pickle10WriteBytesEPKvi+0x5c>
  3b:   4c 89 f2                mov    %r14,%rdx
  3e:   4c 89 ee                mov    %r13,%rsi
  41:   48 89 df                mov    %rbx,%rdi
  44:   e8 00 00 00 00          callq  49 <_ZN6Pickle10WriteBytesEPKvi+0x49>
                        45: R_X86_64_PLT32      memcpy-0x4
  49:   44 89 e2                mov    %r12d,%edx
  4c:   48 89 de                mov    %rbx,%rsi
  4f:   48 89 ef                mov    %rbp,%rdi
  52:   e8 00 00 00 00          callq  57 <_ZN6Pickle10WriteBytesEPKvi+0x57>
                        53: R_X86_64_PC32       _ZN6Pickle8EndWriteEPci-0x4
  57:   b8 01 00 00 00          mov    $0x1,%eax
  5c:   48 8b 1c 24             mov    (%rsp),%rbx
  60:   48 8b 6c 24 08          mov    0x8(%rsp),%rbp
  65:   4c 8b 64 24 10          mov    0x10(%rsp),%r12
  6a:   4c 8b 6c 24 18          mov    0x18(%rsp),%r13
  6f:   4c 8b 74 24 20          mov    0x20(%rsp),%r14
  74:   48 83 c4 28             add    $0x28,%rsp
  78:   c3                      retq 

0000000000000000 <_ZN6Pickle10BeginWriteEm>:
   0:   48 89 5c 24 e8          mov    %rbx,-0x18(%rsp)
   5:   4c 89 64 24 f8          mov    %r12,-0x8(%rsp)
   a:   48 89 fb                mov    %rdi,%rbx
   d:   48 89 6c 24 f0          mov    %rbp,-0x10(%rsp)
  12:   48 83 ec 18             sub    $0x18,%rsp
  16:   48 8b 57 08             mov    0x8(%rdi),%rdx
  1a:   48 8b 4f 18             mov    0x18(%rdi),%rcx
  1e:   8b 2a                   mov    (%rdx),%ebp
  20:   48 89 e8                mov    %rbp,%rax
  23:   48 f7 d8                neg    %rax
  26:   83 e0 03                and    $0x3,%eax
  29:   48 01 c5                add    %rax,%rbp
  2c:   4c 8d 64 35 00          lea    0x0(%rbp,%rsi,1),%r12
  31:   4c 89 e0                mov    %r12,%rax
  34:   48 03 47 10             add    0x10(%rdi),%rax
  38:   48 39 c8                cmp    %rcx,%rax
  3b:   76 16                   jbe    53 <_ZN6Pickle10BeginWriteEm+0x53>
  3d:   48 01 c9                add    %rcx,%rcx
  40:   48 39 c8                cmp    %rcx,%rax
  43:   48 89 ce                mov    %rcx,%rsi
  46:   48 0f 43 f0             cmovae %rax,%rsi
  4a:   e8 00 00 00 00          callq  4f <_ZN6Pickle10BeginWriteEm+0x4f>
                        4b: R_X86_64_PC32       _ZN6Pickle6ResizeEm-0x4
  4f:   48 8b 53 08             mov    0x8(%rbx),%rdx
  53:   44 89 22                mov    %r12d,(%rdx)
  56:   48 8b 43 08             mov    0x8(%rbx),%rax
  5a:   48 03 43 10             add    0x10(%rbx),%rax
  5e:   4c 8b 64 24 10          mov    0x10(%rsp),%r12
  63:   48 8b 1c 24             mov    (%rsp),%rbx
  67:   48 01 e8                add    %rbp,%rax
  6a:   48 8b 6c 24 08          mov    0x8(%rsp),%rbp
  6f:   48 83 c4 18             add    $0x18,%rsp
  73:   c3                      retq   

0000000000000000 <_ZN6Pickle8EndWriteEPci>:
   0:   48 63 fa                movslq %edx,%rdi
   3:   48 89 f8                mov    %rdi,%rax
   6:   83 e0 03                and    $0x3,%eax
   9:   75 05                   jne    10 <_ZN6Pickle8EndWriteEPci+0x10>
   b:   f3 c3                   repz retq 
   d:   0f 1f 00                nopl   (%rax)
  10:   ba 04 00 00 00          mov    $0x4,%edx
  15:   48 01 f7                add    %rsi,%rdi
  18:   31 f6                   xor    %esi,%esi
  1a:   48 29 c2                sub    %rax,%rdx
  1d:   e9 00 00 00 00          jmpq   22 <.LC1+0xa>
                        1e: R_X86_64_PLT32      memset-0x4



After:

0000000000000000 <_ZN6Pickle16WriteBytesStaticILm4EEEvPKv>:
   0:   48 89 5c 24 e8          mov    %rbx,-0x18(%rsp)
   5:   48 89 6c 24 f0          mov    %rbp,-0x10(%rsp)
   a:   48 89 fb                mov    %rdi,%rbx
   d:   4c 89 64 24 f8          mov    %r12,-0x8(%rsp)
  12:   48 83 ec 18             sub    $0x18,%rsp
  16:   48 8b 57 20             mov    0x20(%rdi),%rdx
  1a:   49 89 f4                mov    %rsi,%r12
  1d:   48 8b 77 18             mov    0x18(%rdi),%rsi
  21:   48 8d 6a 04             lea    0x4(%rdx),%rbp
  25:   48 39 f5                cmp    %rsi,%rbp
  28:   76 13                   jbe    3d
<_ZN6Pickle16WriteBytesStaticILm4EEEvPKv+0x3d>
  2a:   48 01 f6                add    %rsi,%rsi
  2d:   48 39 ee                cmp    %rbp,%rsi
  30:   48 0f 42 f5             cmovb  %rbp,%rsi
  34:   e8 00 00 00 00          callq  39
<_ZN6Pickle16WriteBytesStaticILm4EEEvPKv+0x39>
                        35: R_X86_64_PC32       _ZN6Pickle6ResizeEm-0x4
  39:   48 8b 53 20             mov    0x20(%rbx),%rdx
  3d:   48 8b 43 08             mov    0x8(%rbx),%rax
  41:   48 03 43 10             add    0x10(%rbx),%rax
  45:   41 8b 0c 24             mov    (%r12),%ecx
  49:   89 0c 10                mov    %ecx,(%rax,%rdx,1)
  4c:   48 8b 53 20             mov    0x20(%rbx),%rdx
  50:   48 8b 43 08             mov    0x8(%rbx),%rax
  54:   83 c2 04                add    $0x4,%edx
  57:   89 10                   mov    %edx,(%rax)
  59:   48 89 6b 20             mov    %rbp,0x20(%rbx)
  5d:   4c 8b 64 24 10          mov    0x10(%rsp),%r12
  62:   48 8b 1c 24             mov    (%rsp),%rbx
  66:   48 8b 6c 24 08          mov    0x8(%rsp),%rbp
  6b:   48 83 c4 18             add    $0x18,%rsp
  6f:   c3                      retq

piman

======================================== arm32 gcc ======================================== Before: 00000000 <_ZN6Pickle10WriteBytesEPKvi>: 0: b5f8 push {r3, r4, r5, r6, r7, ...

7 years, 2 months ago (2013-10-24 23:55:35 UTC) #14

======================================== arm32 gcc
======================================== 

Before:

00000000 <_ZN6Pickle10WriteBytesEPKvi>:
   0:   b5f8            push    {r3, r4, r5, r6, r7, lr}
   2:   460f            mov     r7, r1
   4:   4611            mov     r1, r2
   6:   4614            mov     r4, r2
   8:   4606            mov     r6, r0
   a:   f7ff fffe       bl      0 <_ZN6Pickle10WriteBytesEPKvi>
                        a: R_ARM_THM_CALL       _ZN6Pickle10BeginWriteEj
   e:   4605            mov     r5, r0
  10:   b150            cbz     r0, 28 <_ZN6Pickle10WriteBytesEPKvi+0x28>
  12:   4639            mov     r1, r7
  14:   4622            mov     r2, r4
  16:   f7ff fffe       bl      0 <memcpy>
                        16: R_ARM_THM_CALL      memcpy
  1a:   4630            mov     r0, r6
  1c:   4629            mov     r1, r5
  1e:   4622            mov     r2, r4
  20:   f7ff fffe       bl      0 <_ZN6Pickle10WriteBytesEPKvi>
                        20: R_ARM_THM_CALL      _ZN6Pickle8EndWriteEPci
  24:   2001            movs    r0, #1
  26:   bdf8            pop     {r3, r4, r5, r6, r7, pc}
  28:   bdf8            pop     {r3, r4, r5, r6, r7, pc}
  2a:   bf00            nop

00000000 <_ZN6Pickle10BeginWriteEj>:
   0:   b5f8            push    {r3, r4, r5, r6, r7, lr}
   2:   1d03            adds    r3, r0, #4
   4:   4604            mov     r4, r0
   6:   e893 1088       ldmia.w r3, {r3, r7, ip}
   a:   681a            ldr     r2, [r3, #0]
   c:   4255            negs    r5, r2
   e:   f005 0503       and.w   r5, r5, #3
  12:   18ad            adds    r5, r5, r2
  14:   186e            adds    r6, r5, r1
  16:   19f7            adds    r7, r6, r7
  18:   4567            cmp     r7, ip
  1a:   d907            bls.n   2c <_ZN6Pickle10BeginWriteEj+0x2c>
  1c:   ea4f 014c       mov.w   r1, ip, lsl #1
  20:   428f            cmp     r7, r1
  22:   bf28            it      cs
  24:   4639            movcs   r1, r7
  26:   f7ff fffe       bl      0 <_ZN6Pickle10BeginWriteEj>
                        26: R_ARM_THM_CALL      _ZN6Pickle6ResizeEj
  2a:   6863            ldr     r3, [r4, #4]
  2c:   601e            str     r6, [r3, #0]
  2e:   6860            ldr     r0, [r4, #4]
  30:   68a3            ldr     r3, [r4, #8]
  32:   18c0            adds    r0, r0, r3
  34:   1940            adds    r0, r0, r5
  36:   bdf8            pop     {r3, r4, r5, r6, r7, pc}

00000000 <_ZN6Pickle8EndWriteEPci>:
   0:   f012 0303       ands.w  r3, r2, #3
   4:   d100            bne.n   8 <_ZN6Pickle8EndWriteEPci+0x8>
   6:   4770            bx      lr
   8:   1888            adds    r0, r1, r2
   a:   2100            movs    r1, #0
   c:   f1c3 0204       rsb     r2, r3, #4
  10:   f7ff bffe       b.w     0 <memset>
                        10: R_ARM_THM_JUMP24    memset


After:

00000000 <_ZN6Pickle16WriteBytesStaticILj4EEEvPKv>:
   0:   b570            push    {r4, r5, r6, lr}
   2:   4604            mov     r4, r0
   4:   6903            ldr     r3, [r0, #16]
   6:   460e            mov     r6, r1
   8:   68c2            ldr     r2, [r0, #12]
   a:   1d1d            adds    r5, r3, #4
   c:   4295            cmp     r5, r2
   e:   d906            bls.n   1e
<_ZN6Pickle16WriteBytesStaticILj4EEEvPKv+0x1e>
  10:   0051            lsls    r1, r2, #1
  12:   42a9            cmp     r1, r5
  14:   bf38            it      cc
  16:   4629            movcc   r1, r5
  18:   f7ff fffe       bl      0 <_ZN6Pickle16WriteBytesStaticILj4EEEvPKv>
                        18: R_ARM_THM_CALL      _ZN6Pickle6ResizeEj
  1c:   6923            ldr     r3, [r4, #16]
  1e:   6860            ldr     r0, [r4, #4]
  20:   68a2            ldr     r2, [r4, #8]
  22:   6831            ldr     r1, [r6, #0]
  24:   1882            adds    r2, r0, r2
  26:   50d1            str     r1, [r2, r3]
  28:   6863            ldr     r3, [r4, #4]
  2a:   6922            ldr     r2, [r4, #16]
  2c:   3204            adds    r2, #4
  2e:   601a            str     r2, [r3, #0]
  30:   6125            str     r5, [r4, #16]
  32:   bd70            pop     {r4, r5, r6, pc}




You can see that, before the optimization:
- no inlining was done
- memcpy was called
- memset was called.
- 3 functions necessary
- 75 instructions on x86-64, 50 on arm32
- on arm32, 5 registers are pushed, and twice at that.

After:
- fully inlined
- no memcpy
- no memset
- 30 instructions on x86-64, 25 on arm32
- on arm32, only 3 registers are pushed, on only once.

piman

https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc File base/pickle.cc (right): https://codereview.chromium.org/34413002/diff/230001/base/pickle.cc#newcode257 base/pickle.cc:257: DCHECK_LE(static_cast<size_t>(length), kuint32max); On 2013/10/24 22:36:48, jar wrote: > nit: ...

7 years, 1 month ago (2013-10-25 03:11:40 UTC) #15

Tom Sepez

https://codereview.chromium.org/34413002/diff/310001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/310001/base/pickle.h#newcode215 base/pickle.h:215: bool WriteBool(bool value) { drive-by: The follow-up to this ...

7 years, 1 month ago (2013-10-28 18:17:38 UTC) #16

Tom Sepez

https://codereview.chromium.org/34413002/diff/310001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/310001/base/pickle.h#newcode219 base/pickle.h:219: return WritePOD(value); drive-by: If I were trying to make ...

7 years, 1 month ago (2013-10-28 19:09:05 UTC) #17

piman

On Mon, Oct 28, 2013 at 11:17 AM, <tsepez@chromium.org> wrote: > > https://codereview.chromium.**org/34413002/diff/310001/base/**pickle.h<https://codereview.chromium.org/34413002/diff/310001/base/pickle.h> > File ...

7 years, 1 month ago (2013-10-28 19:27:01 UTC) #18

piman

On Mon, Oct 28, 2013 at 12:09 PM, <tsepez@chromium.org> wrote: > > https://codereview.chromium.**org/34413002/diff/310001/base/**pickle.h<https://codereview.chromium.org/34413002/diff/310001/base/pickle.h> > File ...

7 years, 1 month ago (2013-10-28 19:28:51 UTC) #19

piman

Rebased on top of Dana's latest version for https://codereview.chromium.org/35893002/ and I also folded Resize-returns-void here, ...

7 years, 1 month ago (2013-10-29 04:53:09 UTC) #20

jar (doing other things)

One thing to be careful here is: I've seen other suggestions for pickle changes recently. ...

7 years, 1 month ago (2013-10-29 16:50:48 UTC) #21

piman

https://codereview.chromium.org/34413002/diff/390001/base/pickle.h File base/pickle.h (right): https://codereview.chromium.org/34413002/diff/390001/base/pickle.h#newcode327 base/pickle.h:327: size_t capacity_; On 2013/10/29 16:50:48, jar wrote: > How ...

7 years, 1 month ago (2013-10-30 20:29:05 UTC) #22

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/piman@chromium.org/34413002/560001

7 years, 1 month ago (2013-10-31 03:08:33 UTC) #24

commit-bot: I haz the power

Step "update" is always a major failure. Look at the try server FAQ for more ...

7 years, 1 month ago (2013-10-31 03:18:39 UTC) #25

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/piman@chromium.org/34413002/560001

7 years, 1 month ago (2013-10-31 03:26:38 UTC) #26

commit-bot: I haz the power

Step "update" is always a major failure. Look at the try server FAQ for more ...

7 years, 1 month ago (2013-10-31 03:47:24 UTC) #27

piman

7 years, 1 month ago (2013-10-31 04:03:25 UTC) #28

Message was sent while issue was closed.

Committed patchset #7 manually as r231987 (presubmit successful).

Expand Messages | Collapse Messages