Issue 2502953003: base: make CHECK macros trap at distinct addresses in official builds

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

4 years, 1 month ago (2016-11-15 15:26:57 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/1

4 years, 1 month ago (2016-11-15 15:27:12 UTC) #2

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 1 month ago (2016-11-15 15:35:20 UTC) #3

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presubmit/builds/304374)

4 years, 1 month ago (2016-11-15 15:35:21 UTC) #4

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

4 years, 1 month ago (2016-11-21 14:57:38 UTC) #5

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/20001

4 years, 1 month ago (2016-11-21 14:57:52 UTC) #6

Primiano Tucci (use gerrit)

Description was changed from ========== base: Ensure that CHECK macros trap at distinct addresses TODO ...

4 years, 1 month ago (2016-11-21 15:26:21 UTC) #7

Description was changed from

==========
base: Ensure that CHECK macros trap at distinct addresses

TODO CL desription, WORK IN PROGRESS

BUG=664209
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

A bit of history:
-----------------
Before crrev.com/2125923002 CHECK in official builds was essentially:
if (!condition) DebuggerBreak()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler. A more complete overview with
assembly is in crbug.com/664209, but the short story is that the most
efficient way to handle these checks seems to be when we end up
with the following situation:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...
Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    some_compare_opcode  cond1;
    jump_if_zero         prologue;
    some_compare_opcode  cond2;
    jump_if_zero         prologue;
    ...
 prologue:
    trap_opcode

Rather than something like:
    some_compare_opcode  cond1;
    jump_if_NOT_zero     next1;
    trap_opcode
next1:
    some_compare_opcode  cond2;
    jump_if_NOT_zero     next2;
    trap_opcode
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, as in the case of BreakDebugger, as
it bloats the binary and reduces i-cache hotness for the main flow.

crrev.com/2125923002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of 
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K

BUG=664209
==========

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

4 years, 1 month ago (2016-11-21 16:36:17 UTC) #8

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

A bit of history:
-----------------
Before crrev.com/2125923002 CHECK in official builds was essentially:
if (!condition) DebuggerBreak()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler. A more complete overview with
assembly is in crbug.com/664209, but the short story is that the most
efficient way to handle these checks seems to be when we end up
with the following situation:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...
Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    some_compare_opcode  cond1;
    jump_if_zero         prologue;
    some_compare_opcode  cond2;
    jump_if_zero         prologue;
    ...
 prologue:
    trap_opcode

Rather than something like:
    some_compare_opcode  cond1;
    jump_if_NOT_zero     next1;
    trap_opcode
next1:
    some_compare_opcode  cond2;
    jump_if_NOT_zero     next2;
    trap_opcode
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, as in the case of BreakDebugger, as
it bloats the binary and reduces i-cache hotness for the main flow.

crrev.com/2125923002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of 
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K

BUG=664209
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (~10-40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_opcode

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_opcode
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_opcode
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
s in the case of BreakDebugger(). That bloats the binary and reduces
i-cache hotness for the main flow.

crrev.com/1982123002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

4 years, 1 month ago (2016-11-21 16:43:28 UTC) #9

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (~10-40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_opcode

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_opcode
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_opcode
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
s in the case of BreakDebugger(). That bloats the binary and reduces
i-cache hotness for the main flow.

crrev.com/1982123002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (~10-40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_opcode

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_opcode
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_opcode
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
s in the case of BreakDebugger(). That bloats the binary and reduces
i-cache hotness for the main flow.

crrev.com/1982123002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

Primiano Tucci (use gerrit)

primiano@chromium.org changed reviewers: + thakis@chromium.org

4 years, 1 month ago (2016-11-21 16:46:23 UTC) #10

Primiano Tucci (use gerrit)

CHECK() is back! CL description should have all the details. I didn't manage to run ...

4 years, 1 month ago (2016-11-21 16:46:23 UTC) #11

Primiano Tucci (use gerrit)

primiano@chromium.org changed reviewers: + torne@chromium.org

4 years, 1 month ago (2016-11-21 16:46:52 UTC) #12

Primiano Tucci (use gerrit)

Had a chat offline with torne, he is suggesting that clobbering one register during the ...

4 years, 1 month ago (2016-11-21 17:18:18 UTC) #14

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 1 month ago (2016-11-21 18:18:42 UTC) #15

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 1 month ago (2016-11-21 18:18:43 UTC) #16

Primiano Tucci (use gerrit)

+tkent FYI as per discussion in "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"

4 years, 1 month ago (2016-11-22 14:58:15 UTC) #17

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

4 years, 1 month ago (2016-11-22 14:59:27 UTC) #18

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (~10-40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_opcode

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_opcode
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_opcode
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
s in the case of BreakDebugger(). That bloats the binary and reduces
i-cache hotness for the main flow.

crrev.com/1982123002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (~10-40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
s in the case of BreakDebugger(). That bloats the binary and reduces
i-cache hotness for the main flow.

crrev.com/1982123002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

Primiano Tucci (use gerrit)

primiano@chromium.org changed reviewers: + brettw@chromium.org, danakj@chromium.org

4 years ago (2016-11-23 18:24:14 UTC) #19

danakj

On Wed, Nov 23, 2016 at 10:24 AM, <primiano@chromium.org> wrote: > Brett / Dana: opinions ...

4 years ago (2016-11-23 22:22:22 UTC) #21

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

4 years ago (2016-11-29 19:35:29 UTC) #22

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (~10-40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
s in the case of BreakDebugger(). That bloats the binary and reduces
i-cache hotness for the main flow.

crrev.com/1982123002 eventually fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_opcode", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

Nico

Thanks for the detailed writeup. I think not clobbering registers is helpful if we can ...

4 years ago (2016-12-02 02:04:03 UTC) #23

Torne

On 2016/12/02 02:04:03, Nico wrote: > Thanks for the detailed writeup. I think not clobbering ...

4 years ago (2016-12-02 12:37:17 UTC) #24

Primiano Tucci (use gerrit)

All right I think then the best approach is to just manually put the various ...

4 years ago (2016-12-02 15:08:28 UTC) #25

All right I think then the best approach is to just manually put the various
declinations of __builtin_trap (ud2, & co) as #ifdef.
In this way we don't need neither LINE nor COUNTER and we avoid clobbering any
register. Also this would reduce by 50% at least the binary inflation.
I'll update the patchset soon, probably on Monday.

On 2016/12/02 12:37:17, Torne wrote:
> On 2016/12/02 02:04:03, Nico wrote:
> > How does Torne's suggestion help? Aren't these ud2s mergable too? Or do
> > compilers not try to reason about inline asm? (LLVM probably doesn't.) 
> I observed that the compilers I tested it on did not merge the asm volatile
> blocks. This is certainly not guaranteed, and a sufficiently adventurous
> compiler might, but it seems generally unlikely?

I think "volatile" is what prevents the compiler from merging them. I can
confirm i see what torne observed.
I also added a test to check that we don't regress that (See the change to the
unittest). So we should be safe-ish on this side (w.r.t. non-collpasing the trap
instruction)..
The only thing I can't easily test is whether the debugging symbols will
properly attribute each "ud2" instruction to the right line of code.
I can confirm symbols are correct right now (see CL description).

>This doesn't help on Windows, does it. Would
>  ((void)(*(volatile char*)0 = LINE))
> be something that works everywhere? x86 has up to 32bit immediates and
supports
> immediate->mem moves, but I guess on arm it'd still clobber a reg (or
several).
I honestly think the best solution is to use "ud2" and friend everywhere,
including windows.
I didn't want to touch windows in this CL because:
1) I still don't know how windows and crashpad react to ud2. Technically they
are "undefined instruction" and I expect them to raise the same kind of
exception as jumping to broken JIT-ed code. But I need to check that first.
2) The situation of CHECK on windows is complicated because of the completely
orthogonal discussion about CHECK not being as fast as RELEASE_ASSERT.

So I'd plan to fix first non-windows OS and then come back to windows once we
settle on the argument of CHECK vs RELEASE_ASSERT.

Nico

Maybe we should just call abort(). That's marked noreturn, takes 0 arguments, and is standard. ...

4 years ago (2016-12-02 15:29:25 UTC) #26

Primiano Tucci (use gerrit)

On 2016/12/02 15:29:25, Nico wrote: > Maybe we should just call abort(). That's marked noreturn, ...

4 years ago (2016-12-02 16:47:54 UTC) #27

Torne

Yeah. The problem there is that x86 doesn't have conditional calls, and so if the ...

4 years ago (2016-12-05 12:37:06 UTC) #28

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-10 16:49:21 UTC) #30

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/40001

3 years, 10 months ago (2017-02-10 16:49:49 UTC) #31

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-10 16:56:50 UTC) #32

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/60001

3 years, 10 months ago (2017-02-10 16:57:21 UTC) #33

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-10 18:41:10 UTC) #34

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/230169)

3 years, 10 months ago (2017-02-10 18:41:11 UTC) #35

Primiano Tucci (use gerrit)

On 2016/12/05 21:08:46, Nico wrote: > Ok, the per-arch ud2s sound like a reasonable way ...

3 years, 10 months ago (2017-02-13 15:24:11 UTC) #36

Mark Mentovai

https://codereview.chromium.org/2502953003/diff/60001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/60001/base/logging.h#newcode484 base/logging.h:484: // either a SIGTRAP or a SIGABRT depending on ...

3 years, 10 months ago (2017-02-13 16:22:09 UTC) #37

Torne

On 2017/02/13 15:24:11, Primiano Tucci wrote: > On 2016/12/05 21:08:46, Nico wrote: > > Ok, ...

3 years, 10 months ago (2017-02-13 17:10:05 UTC) #38

On 2017/02/13 15:24:11, Primiano Tucci wrote:
> On 2016/12/05 21:08:46, Nico wrote:
> > Ok, the per-arch ud2s sound like a reasonable way to go to me.
> 
> So I gave it another shoot to this. I also reworked the test to cover also
> non-official builds.
> Good news: the test strategy seems to work as intended.
> Bad news: the test actually highlighted that the new version isn't good
enough.
> The new trick isn't enough to fool the compiler, udf ff is still being folded
on
> android-arm-gcc.
> I manually reprod the test failure and disassembled locally:
> 
>    0x00170586 <+214>:   cmp.w   r9, #1
>    0x0017058a <+218>:   beq.n   0x17059e <CHECKNotAmbiguous_ChildMain()+238>
>    0x0017058c <+220>:   cmp.w   r9, #2
>    0x00170590 <+224>:   beq.n   0x17059e <CHECKNotAmbiguous_ChildMain()+238>
>    0x00170592 <+226>:   movs    r0, #10
>    0x00170594 <+228>:   blx     0x542d8 <putchar@plt>
>    0x00170598 <+232>:   cmp.w   r9, #3
>    0x0017059c <+236>:   bne.n   0x1705a0 <CHECKNotAmbiguous_ChildMain()+240>
>    0x0017059e <+238>:   udf     #255    ; 0xff
> 
> So I think we have to resort to the option of clobbering one register of the
> previous patchet.
> Unless I am doing something silly here without realizing.

So for ARM32 you can probably make this work by making the parameter to the udf
instruction be __COUNTER__ % 256, which won't cost any registers. The parameter
here doesn't matter to anything (and is only even available by reading back the
instruction and decoding it manually). It should probably be safe to do the same
thing on ARM64 with __COUNTER__ % 65536 but I'm not 100% certain there isn't
some significance to this value because it's suspicious that __builtin_trap()
uses a specific number, and ARM's lack of public documentation is annoying.

For x86, could you do it by having the asm block be two instructions, *first*
ud2 and *then* whatever the smallest thing with an immediate operand that can be
unique is?

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-16 19:19:27 UTC) #39

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/80001

3 years, 10 months ago (2017-02-16 19:20:13 UTC) #40

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

3 years, 10 months ago (2017-02-16 19:56:15 UTC) #41

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.
The assembly now looks as follows:
    some_compare_opcode  cond1;
    jump_if_zero         prologue1;
    some_compare_opcode  cond2;
    jump_if_zero         prologue2;
    ...
 prologue1:
    mov                  eax, LINE_OF_COND1
    trap_opcode
 prologue2:
    mov                  eax, LINE_OF_COND2
    trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:
extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:
Arm32:
  0x002f22d4 <+0>:     cbz     r0, 0x2f22e2 <foobar+14>  # CHECK(x)
  0x002f22d6 <+2>:     cbz     r1, 0x2f22ee <foobar+26>  # CHECK(y)
  0x002f22d8 <+4>:     movs    r3, #64 ; 0x40
  0x002f22da <+6>:     cmn     r0, r1
  0x002f22dc <+8>:     str     r0, [r3, #0]
  0x002f22de <+10>:    beq.n   0x2f22e8 <foobar+20>     # CHECK(x + y)
  0x002f22e0 <+12>:    bx      lr
  0x002f22e2 <+14>:    movw    r3, #934        ; 0x3a6  # Prologue for CHECK(x)
  0x002f22e6 <+18>:    udf     #255    ; 0xff
  0x002f22e8 <+20>:    movw    r3, #937        ; 0x3a9  # Prologue for CHECK(y)
  0x002f22ec <+24>:    udf     #255    ; 0xff
  0x002f22ee <+26>:    movw    r3, #935        ; 0x3a7  # Prologue for CHECK(x +
y)
  0x002f22f2 <+30>:    udf     #255    ; 0xff

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x002f22e6
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x002f22f2
  Line 935 of "../../foobar.cc"


Arm64:
  0x0083525c <+0>:     cbz     w0, 0x835278 <foobar()+28>
  0x00835260 <+4>:     cbz     w1, 0x835288 <foobar()+44>
  0x00835264 <+8>:     cmn     w0, w1
  0x00835268 <+12>:    mov     x1, #0x40                       // #64
  0x0083526c <+16>:    str     w0, [x1]
  0x00835270 <+20>:    b.eq    0x835280 <foobar()+36>
  0x00835274 <+24>:    ret
  0x00835278 <+28>:    mov     w0, #0x3a6                      // #934
  0x0083527c <+32>:    brk     #0x3e8
  0x00835280 <+36>:    mov     w0, #0x3a9                      // #937
  0x00835284 <+40>:    brk     #0x3e8
  0x00835288 <+44>:    mov     w0, #0x3a7                      // #935
  0x0083528c <+48>:    brk     #0x3e8

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x0083527c
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x00835284
  Line 935 of "../../foobar.cc"


Linux x86_64:
  0x020ad480 <+0>:     test   %edi,%edi
  0x020ad482 <+2>:     je     0x20ad494 <foobar(int, int)+20>  # CHECK(x)
  0x020ad484 <+4>:     test   %esi,%esi
  0x020ad486 <+6>:     je     0x20ad49b <foobar(int, int)+27>  # CHECK(y)
  0x020ad488 <+8>:     mov    %edi,0x40
  0x020ad48f <+15>:    add    %edi,%esi
  0x020ad491 <+17>:    je     0x20ad4a2 <foobar(int, int)+34>  # CHECK(x + y)
  0x020ad493 <+19>:    retq
  0x020ad494 <+20>:    mov    $0x3a6,%eax
  0x020ad499 <+25>:    ud2
  0x020ad49b <+27>:    mov    $0x3a7,%eax
  0x020ad4a0 <+32>:    ud2
  0x020ad4a2 <+34>:    mov    $0x3a9,%eax
  0x020ad4a7 <+39>:    ud2

  Are fault addresses correctly attributed? Yes, proof:
  (gdb) info line *0x020ad499
  Line 934 of "../../foobar.cc"
  (gdb) info line *0x020ad49b
  Line 935 of "../../foobar.cc"


Binary size inflation on official builds:
-----------------------------------------
Android arm:    48324152  -> 48336440  = 12 K
Android arm64:  85160872  -> 85177256  = 16 K
Linux   x86_64: 121030408 -> 121079560 = 48 K


BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  86992248  -> 86992248  = 0echo
Android x86: 104502640 -> 104506736 = 4 K
Linux x86_64: TBD


BUG=664209,599867
==========

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-16 20:49:58 UTC) #42

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/120744) mac_chromium_rel_ng on ...

3 years, 10 months ago (2017-02-16 20:49:59 UTC) #43

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-16 21:03:26 UTC) #44

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/100001

3 years, 10 months ago (2017-02-16 21:04:11 UTC) #45

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

3 years, 10 months ago (2017-02-16 21:09:25 UTC) #46

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  86992248  -> 86992248  = 0echo
Android x86: 104502640 -> 104506736 = 4 K
Linux x86_64: TBD


BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  86992248  -> 86992248  = 0
Android x86: 104502640 -> 104506736 = 4 K
Android x86_64:  93686832 -> 93686832 = 0
Linux x86_64: 124219488 -> 124346464 = 124 K


BUG=664209,599867
==========

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

3 years, 10 months ago (2017-02-16 21:31:50 UTC) #47

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  86992248  -> 86992248  = 0
Android x86: 104502640 -> 104506736 = 4 K
Android x86_64:  93686832 -> 93686832 = 0
Linux x86_64: 124219488 -> 124346464 = 124 K


BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  85611800  -> 85665048  = 53 K
Android x86_64:  91904944 -> 91933616 = 28 K
Linux x86_64: 124219488 -> 124346464 = 124 K
(Android build with -Os, hence why the difference between the two)

BUG=664209,599867
==========

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-16 21:35:28 UTC) #48

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/120001

3 years, 10 months ago (2017-02-16 21:36:34 UTC) #49

Primiano Tucci (use gerrit)

PTAL I think I finally have a version that keep all the constraints satisfied: - ...

3 years, 10 months ago (2017-02-16 21:38:45 UTC) #50

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-16 23:42:04 UTC) #52

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linux/builds/311520)

3 years, 10 months ago (2017-02-16 23:42:05 UTC) #53

Torne

Looks pretty reasonable. I'm mildly scared by using hlt on arm64 but I guess entering ...

3 years, 10 months ago (2017-02-17 11:06:31 UTC) #54

Torne

Also, the CL description is no longer quite accurate as it still mentions putting line ...

3 years, 10 months ago (2017-02-17 11:08:01 UTC) #55

Primiano Tucci (use gerrit)

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

3 years, 10 months ago (2017-02-17 11:13:47 UTC) #56

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before
the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the
  diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still
  applies no-return optimizations.

Which involves some extra bytes for each CHECKS, but not as many as
the non-inline case. Also by doing this the various prologue get
properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  85611800  -> 85665048  = 53 K
Android x86_64:  91904944 -> 91933616 = 28 K
Linux x86_64: 124219488 -> 124346464 = 124 K
(Android build with -Os, hence why the difference between the two)

BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+28/+128K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction after
the trap which creates an opcode with a unique counter. This prevents
the compiler from folding the trap instructions, still applying no-return
optimizations.
Also by doing this the various prologue get properly attributed to the
CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  85611800  -> 85665048  = 53 K
Android x86_64:  91904944 -> 91933616 = 28 K
Linux x86_64: 124219488 -> 124346464 = 124 K
(Android build with -Os, hence why the difference between the two)

BUG=664209,599867
==========

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-17 12:34:28 UTC) #57

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/140001

3 years, 10 months ago (2017-02-17 12:34:48 UTC) #58

Primiano Tucci (use gerrit)

> I'm mildly scared by using hlt on arm64 but I guess entering debug state ...

3 years, 10 months ago (2017-02-17 12:37:35 UTC) #59

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-17 12:53:26 UTC) #60

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_ozone_rel_ng/builds/324231)

3 years, 10 months ago (2017-02-17 12:53:28 UTC) #61

Mark Mentovai

mark@chromium.org changed reviewers: + mark@chromium.org

3 years, 10 months ago (2017-02-17 14:24:27 UTC) #62

Mark Mentovai

This is lovely! https://codereview.chromium.org/2502953003/diff/140001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ...

3 years, 10 months ago (2017-02-17 14:24:28 UTC) #63

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-17 14:47:11 UTC) #64

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/160001

3 years, 10 months ago (2017-02-17 14:47:30 UTC) #65

Primiano Tucci (use gerrit)

> This is lovely! Not quite the same reaction I had when I hit the ...

3 years, 10 months ago (2017-02-17 14:51:07 UTC) #66

> This is lovely!
Not quite the same reaction I had when I hit the various compiler optimizations.
The fact that we use -O2 on desktop and -Os on Android then added further levels
of amusement to all this.

https://codereview.chromium.org/2502953003/diff/140001/base/logging.h
File base/logging.h (right):

https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525
base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ %
128))
On 2017/02/17 14:24:28, Mark Mentovai wrote:
> You should write this as int3, which is a distinct mnemonic that always
> corresponds to a single-byte form of the instruction. I don’t trust every
> assembler to emit the one-byte form upon seeing int $3.

Ah right, I realized only now that there is a 1byte specialization of int3. 
FYI gcc was doing the right think, using the 1byte version even with int $3, but
make sense to be explicit.

https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525
base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ %
128))
On 2017/02/17 14:24:28, Mark Mentovai wrote:
> Can the push go behind the ud2?

Done.

https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525
base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ %
128))
On 2017/02/17 14:24:28, Mark Mentovai wrote:
> The push imm8 form of this instruction dedicates a whole byte to imm8, you
> shouldn’t need to take it mod 128. I bet you did this because "i"(128) (when
> __COUNTER__ reaches that) gives you (int)128 which overflows as an 8-bit
> quantity, so you’d wind up with a fatter variant of push immediate.
> 
> Instead of "i"(__COUNTER__ % 128), take "i"((signed char)__COUNTER__).
> 
> Now you can use twice as many of these in the same file without the compiler
> deciding that any two instances are identical!
> 

Spot on. I did % 128 because I observed that values > 127 did fall back on the 5
byte instruction (rather than 2).
However your suggestion does the trick. if I just ((uint8_t)__COUNTER__) I get
double the space and still the 2 bytes variant.

> For that matter, you can do "i"(__COUNTER__ - 128) and you’ll get the thin
imm8
> form for the first 256 occurrences in a file, and then you’ll get fatter forms
> for subsequent occurrences, but there’s a much stronger guarantee that there
> won’t be any overlap.

Hmm given that the overlapping problem seems bounded to  functions scope, I
think it is quite unlikely that a function will have > 255 CHECKs. Binary size
inflation is already in the ~hundreds kb ballpark.

https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode548
base/logging.h:548: TRAP_INSTRUCTION();      \
On 2017/02/17 14:24:28, Mark Mentovai wrote:
> nit: these are more TRAP_SEQUENCE now that none are really single
instructions.

good point, renamed.

Mark Mentovai

LGTM https://codereview.chromium.org/2502953003/diff/140001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % ...

3 years, 10 months ago (2017-02-17 15:04:33 UTC) #67

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-17 15:08:29 UTC) #68

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/180001

3 years, 10 months ago (2017-02-17 15:08:46 UTC) #69

Primiano Tucci (use gerrit)

https://codereview.chromium.org/2502953003/diff/160001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/160001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int3; ud2; push %0;" ::"i"((uint8_t)__COUNTER__)) On 2017/02/17 15:04:33, ...

3 years, 10 months ago (2017-02-17 15:08:47 UTC) #70

Primiano Tucci (use gerrit)

+wfh: I thought I added you various patchsets ago but that just realized that was ...

3 years, 10 months ago (2017-02-17 15:17:17 UTC) #72

Primiano Tucci (use gerrit)

May the waterfall and all the hidden devices/bots/configuration have mercy of this CL. (I have ...

3 years, 10 months ago (2017-02-17 18:28:14 UTC) #73

Will Harris

as long as this doesn't regress future CL https://codereview.chromium.org/2697423002/ I'm happy with this :)

3 years, 10 months ago (2017-02-17 18:46:47 UTC) #74

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-17 19:18:01 UTC) #75

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 10 months ago (2017-02-17 19:18:02 UTC) #76

Primiano Tucci (use gerrit)

The CQ bit was checked by primiano@chromium.org

3 years, 10 months ago (2017-02-17 19:46:54 UTC) #77

Primiano Tucci (use gerrit)

The patchset sent to the CQ was uploaded after l-g-t-m from mark@chromium.org Link to the ...

3 years, 10 months ago (2017-02-17 19:46:56 UTC) #78

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2502953003/180001

3 years, 10 months ago (2017-02-17 19:48:07 UTC) #79

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 180001, "attempt_start_ts": 1487360814997190, "parent_rev": "868be9f79e939e2839e6a9fee6a7629cb93380f4", "commit_rev": "8c972d0e190168b4b5621e81563f319563fd0af8"}

3 years, 10 months ago (2017-02-17 21:08:54 UTC) #80

commit-bot: I haz the power

Description was changed from ========== base: make CHECK macros trap at distinct addresses in official ...

3 years, 10 months ago (2017-02-17 21:09:40 UTC) #81

Message was sent while issue was closed.

Description was changed from

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+28/+128K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction after
the trap which creates an opcode with a unique counter. This prevents
the compiler from folding the trap instructions, still applying no-return
optimizations.
Also by doing this the various prologue get properly attributed to the
CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  85611800  -> 85665048  = 53 K
Android x86_64:  91904944 -> 91933616 = 28 K
Linux x86_64: 124219488 -> 124346464 = 124 K
(Android build with -Os, hence why the difference between the two)

BUG=664209,599867
==========

to

==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+28/+128K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction after
the trap which creates an opcode with a unique counter. This prevents
the compiler from folding the trap instructions, still applying no-return
optimizations.
Also by doing this the various prologue get properly attributed to the
CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  85611800  -> 85665048  = 53 K
Android x86_64:  91904944 -> 91933616 = 28 K
Linux x86_64: 124219488 -> 124346464 = 124 K
(Android build with -Os, hence why the difference between the two)

BUG=664209,599867

Review-Url: https://codereview.chromium.org/2502953003
Cr-Commit-Position: refs/heads/master@{#451381}
Committed:
https://chromium.googlesource.com/chromium/src/+/8c972d0e190168b4b5621e81563f...
==========

commit-bot: I haz the power

Committed patchset #10 (id:180001) as https://chromium.googlesource.com/chromium/src/+/8c972d0e190168b4b5621e81563f319563fd0af8

3 years, 10 months ago (2017-02-17 21:09:42 UTC) #82

alph

A revert of this CL (patchset #10 id:180001) has been created in https://codereview.chromium.org/2706453004/ by alph@chromium.org. ...

3 years, 10 months ago (2017-02-17 21:27:44 UTC) #83

Primiano Tucci (use gerrit)

On 2017/02/17 21:27:44, alph wrote: > A revert of this CL (patchset #10 id:180001) has ...

3 years, 10 months ago (2017-02-17 22:30:41 UTC) #84

Primiano Tucci (use gerrit)

3 years, 10 months ago (2017-02-21 11:05:06 UTC) #85

Message was sent while issue was closed.

This is relanding in https://codereview.chromium.org/2705053002/

Issue 2502953003: base: make CHECK macros trap at distinct addresses in official builds (Closed)

Description

Patch Set 1 #

Patch Set 2 : . #

Patch Set 3 : use inline asm #

Patch Set 4 : fix comments #

Patch Set 5 : . #

Patch Set 6 : . #

Patch Set 7 : . #

Patch Set 8 : Fix tests (an uncountable amount of tables have been flipped for this) #

Patch Set 9 : Address Mark's comment + fixtestsforrealz #

Patch Set 10 : use static_cast + primitive type #

Messages