Created: 4 years, 1 month ago by Primiano Tucci (use gerrit)
Modified: 3 years, 10 months ago
CC: chromium-reviews, danakj, brettw, esprehn, tkent, Mark Mentovai, scottmg
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public.
Description
base: make CHECK macros trap at distinct addresses in official builds
Abstract
--------
CHECK() is the macro used all over the place (~4K occurrences, not
counting dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast, as it is used in a lot of fast paths.
It needs to not emit too much code, as it is used in a lot of places.
It needs to guarantee that crash reports can pinpoint the right
location when hitting a CHECK.
- Back in the day CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny and fast, but not crash-friendly.
- This CL makes it a bit less tiny (+28K to +124K), but fast and crash-friendly.
The problem this CL deals with is the case of multiple CHECK()s within
the same function. It also adds a test that covers Mac, Linux and Android.
A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly, it was a regression
w.r.t. Blink's asserts, which were later switched to CHECK() (see
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation").
The major reason for this is BreakDebugger() being quite complex and
not treated as noreturn by the compiler.
It seems (see crbug.com/664209 for more) that the most efficient way to
handle these checks ends up looking like this:
Source code:
CHECK(cond1);
CHECK(cond2);
...
Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
compare_opcode cond1;
jump_if_zero prologue;
compare_opcode cond2;
jump_if_zero prologue;
...
prologue:
trap_instruction(s)
Rather than something like:
compare_opcode cond1;
jump_if_NOT_zero next1;
trap_instruction(s)
next1:
compare_opcode cond2;
jump_if_NOT_zero next2;
trap_instruction(s)
next2:
...
Here, essentially, the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with the attendant frame setup,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.
crrev.com/1982123002 recently fixed the situation, making the
assembly look like the ideal case above. Unfortunately this caused
another problem, due to the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which
condition caused the trap. In practice this translates into the
inability to tell which CHECK failed in a function that has more than
one check.
This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction after
the trap which creates an opcode with a unique counter. This prevents
the compiler from folding the trap instructions, while still allowing
no-return optimizations.
By doing this, the various prologues also get properly attributed to
the CHECK line in debugging symbols.
Binary size inflation on official builds:
-----------------------------------------
Android arm: 48684276 -> 48712948 = 28 K
Android arm64: 85611800 -> 85665048 = 53 K
Android x86_64: 91904944 -> 91933616 = 28 K
Linux x86_64: 124219488 -> 124346464 = 124 K
(The Android builds use -Os, hence the difference between the two.)
BUG=664209, 599867
Review-Url: https://codereview.chromium.org/2502953003
Cr-Commit-Position: refs/heads/master@{#451381}
Committed: https://chromium.googlesource.com/chromium/src/+/8c972d0e190168b4b5621e81563f319563fd0af8
Patch Set 1 #
Patch Set 2 : . #
Total comments: 1
Patch Set 3 : use inline asm #
Patch Set 4 : fix comments #
Total comments: 1
Patch Set 5 : . #
Patch Set 6 : . #
Patch Set 7 : . #
Total comments: 2
Patch Set 8 : Fix tests (an uncountable amount of tables have been flipped for this) #
Total comments: 9
Patch Set 9 : Address Mark's comment + fixtestsforrealz #
Total comments: 2
Patch Set 10 : use static_cast + primitive type #
Messages
Total messages: 85 (51 generated)
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from the initial work-in-progress stub to a full description. Earlier revisions of the description covered the approach used in the initial patch sets, which added an extra instruction before the trap to push the current line into a register:

This causes:
- The crash line to be in the register, as extra help for diagnosing tough crash cases.
- The compiler doesn't fold the trap instructions, while still applying no-return optimizations.

The assembly then looks as follows:

some_compare_opcode cond1;
jump_if_zero prologue1;
some_compare_opcode cond2;
jump_if_zero prologue2;
...
prologue1:
mov eax, LINE_OF_COND1
trap_opcode
prologue2:
mov eax, LINE_OF_COND2
trap_opcode

This involves some extra bytes for each CHECK, but not as many as the non-inlined case. Also by doing this the various prologues get properly attributed to the CHECK line in debugging symbols.

More concretely, given the following function:

extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced:

Arm32:
0x002f22d4 <+0>:  cbz r0, 0x2f22e2 <foobar+14>   # CHECK(x)
0x002f22d6 <+2>:  cbz r1, 0x2f22ee <foobar+26>   # CHECK(y)
0x002f22d8 <+4>:  movs r3, #64 ; 0x40
0x002f22da <+6>:  cmn r0, r1
0x002f22dc <+8>:  str r0, [r3, #0]
0x002f22de <+10>: beq.n 0x2f22e8 <foobar+20>     # CHECK(x + y)
0x002f22e0 <+12>: bx lr
0x002f22e2 <+14>: movw r3, #934 ; 0x3a6          # Prologue for CHECK(x)
0x002f22e6 <+18>: udf #255 ; 0xff
0x002f22e8 <+20>: movw r3, #937 ; 0x3a9          # Prologue for CHECK(y)
0x002f22ec <+24>: udf #255 ; 0xff
0x002f22ee <+26>: movw r3, #935 ; 0x3a7          # Prologue for CHECK(x + y)
0x002f22f2 <+30>: udf #255 ; 0xff

Are fault addresses correctly attributed? Yes, proof:
(gdb) info line *0x002f22e6
Line 934 of "../../foobar.cc"
(gdb) info line *0x002f22f2
Line 935 of "../../foobar.cc"

Arm64:
0x0083525c <+0>:  cbz w0, 0x835278 <foobar()+28>
0x00835260 <+4>:  cbz w1, 0x835288 <foobar()+44>
0x00835264 <+8>:  cmn w0, w1
0x00835268 <+12>: mov x1, #0x40 // #64
0x0083526c <+16>: str w0, [x1]
0x00835270 <+20>: b.eq 0x835280 <foobar()+36>
0x00835274 <+24>: ret
0x00835278 <+28>: mov w0, #0x3a6 // #934
0x0083527c <+32>: brk #0x3e8
0x00835280 <+36>: mov w0, #0x3a9 // #937
0x00835284 <+40>: brk #0x3e8
0x00835288 <+44>: mov w0, #0x3a7 // #935
0x0083528c <+48>: brk #0x3e8

Are fault addresses correctly attributed? Yes, proof:
(gdb) info line *0x0083527c
Line 934 of "../../foobar.cc"
(gdb) info line *0x00835284
Line 935 of "../../foobar.cc"

Linux x86_64:
0x020ad480 <+0>:  test %edi,%edi
0x020ad482 <+2>:  je 0x20ad494 <foobar(int, int)+20>   # CHECK(x)
0x020ad484 <+4>:  test %esi,%esi
0x020ad486 <+6>:  je 0x20ad49b <foobar(int, int)+27>   # CHECK(y)
0x020ad488 <+8>:  mov %edi,0x40
0x020ad48f <+15>: add %edi,%esi
0x020ad491 <+17>: je 0x20ad4a2 <foobar(int, int)+34>   # CHECK(x + y)
0x020ad493 <+19>: retq
0x020ad494 <+20>: mov $0x3a6,%eax
0x020ad499 <+25>: ud2
0x020ad49b <+27>: mov $0x3a7,%eax
0x020ad4a0 <+32>: ud2
0x020ad4a2 <+34>: mov $0x3a9,%eax
0x020ad4a7 <+39>: ud2

Are fault addresses correctly attributed? Yes, proof:
(gdb) info line *0x020ad499
Line 934 of "../../foobar.cc"
(gdb) info line *0x020ad49b
Line 935 of "../../foobar.cc"

Binary size inflation on official builds at those patch sets:
Android arm: 48324152 -> 48336440 = 12 K
Android arm64: 85160872 -> 85177256 = 16 K
Linux x86_64: 121030408 -> 121079560 = 48 K
Description was changed again: the abstract now also notes "Also, adds a test that covers Mac, Linux and Android."
primiano@chromium.org changed reviewers: + thakis@chromium.org
CHECK() is back! The CL description should have all the details. I didn't manage to run any perf tests on this, but looking at the assembly before and after, I am quite confident this shouldn't have any visible impact (worst case the perf waterfall will prove me wrong). FTR I tested this also on OS X and it works there as well. I have no idea if this is a problem at all on Windows; I don't have the bandwidth nor enough knowledge to fix and cover it with a test there. But Windows already seems to have its own special code for CHECK().
primiano@chromium.org changed reviewers: + torne@chromium.org
+torne
Had a chat offline with torne; he suggests that clobbering one register during the crash might not be ideal, as the register might hold useful data for reconstructing the reason of the crash. This could be addressed with an arch-specific variant of this CL, such as:

#if defined(ARCH_X86)
#define LOGGING_CRASH() ({ asm volatile("ud2"); __builtin_unreachable(); })
#elif ARM
#define LOGGING_CRASH() ({ asm volatile("udf 0xff"); __builtin_unreachable(); })
...

This would have the same properties as the current patch set, and the actual binary inflation would be smaller as the "mov eax, LINE" instructions would be avoided, but it would require arch-specific code. I'm quite open to both options: the current one honestly makes me a bit more confident that the compiler won't do funny things, but on the other side there is now a test for that. Open to suggestions.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
+tkent FYI as per discussion in "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
Description was changed again.
Yes, proof: (gdb) info line *0x0083527c Line 934 of "../../foobar.cc" (gdb) info line *0x00835284 Line 935 of "../../foobar.cc" Linux x86_64: 0x020ad480 <+0>: test %edi,%edi 0x020ad482 <+2>: je 0x20ad494 <foobar(int, int)+20> # CHECK(x) 0x020ad484 <+4>: test %esi,%esi 0x020ad486 <+6>: je 0x20ad49b <foobar(int, int)+27> # CHECK(y) 0x020ad488 <+8>: mov %edi,0x40 0x020ad48f <+15>: add %edi,%esi 0x020ad491 <+17>: je 0x20ad4a2 <foobar(int, int)+34> # CHECK(x + y) 0x020ad493 <+19>: retq 0x020ad494 <+20>: mov $0x3a6,%eax 0x020ad499 <+25>: ud2 0x020ad49b <+27>: mov $0x3a7,%eax 0x020ad4a0 <+32>: ud2 0x020ad4a2 <+34>: mov $0x3a9,%eax 0x020ad4a7 <+39>: ud2 Are fault addresses correctly attributed? Yes, proof: (gdb) info line *0x020ad499 Line 934 of "../../foobar.cc" (gdb) info line *0x020ad49b Line 935 of "../../foobar.cc" Binary size inflation on official builds: ----------------------------------------- Android arm: 48324152 -> 48336440 = 12 K Android arm64: 85160872 -> 85177256 = 16 K Linux x86_64: 121030408 -> 121079560 = 48 K BUG=664209,599867 ========== to ========== base: make CHECK macros trap at distinct addresses in official builds Abstract -------- CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions. It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK. - Back in the days CHECK was fat, slow, but crash-friendly. - After crrev.com/1982123002 it became tiny, fast but not crash friendly. - This CL is making it a bit less tiny (~10-40K), fast and crash friendly. The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android. 
A bit of history: ----------------- Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially: if (!condition) BreakDebugger() It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with: Source code: CHECK(cond1); CHECK(cond2); ... Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... prologue: trap_instruction(s) Rather than something like: compare_opcode cond1; jump_if_NOT_zero next1; trap_instruction(s) next1: compare_opcode cond2; jump_if_NOT_zero next2; trap_instruction(s) next2: ... Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, s in the case of BreakDebugger(). That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 eventually fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_opcode", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check. This CL: -------- Re-addresses crrev.com/2125923002, adding an extra instruction before the trap which pushes the current line into a register. 
This causes: - The crash line to be in the register as an extra help for the diagnosis of tough crash cases. - The compiler doesn't fold the trap instructions, although still applies no-return optimizations. The assembly now looks as follows: some_compare_opcode cond1; jump_if_zero prologue1; some_compare_opcode cond2; jump_if_zero prologue2; ... prologue1: mov eax, LINE_OF_COND1 trap_opcode prologue2: mov eax, LINE_OF_COND2 trap_opcode Which involves some extra bytes for each CHECKS, but not as many as the non-inline case. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols. More concretely, given the following function: extern "C" void foobar(int x, int y) { CHECK(x); CHECK(y); *((volatile int *)0x40) = x; CHECK(x + y); } This is the assembly produced after this CL: Arm32: 0x002f22d4 <+0>: cbz r0, 0x2f22e2 <foobar+14> # CHECK(x) 0x002f22d6 <+2>: cbz r1, 0x2f22ee <foobar+26> # CHECK(y) 0x002f22d8 <+4>: movs r3, #64 ; 0x40 0x002f22da <+6>: cmn r0, r1 0x002f22dc <+8>: str r0, [r3, #0] 0x002f22de <+10>: beq.n 0x2f22e8 <foobar+20> # CHECK(x + y) 0x002f22e0 <+12>: bx lr 0x002f22e2 <+14>: movw r3, #934 ; 0x3a6 # Prologue for CHECK(x) 0x002f22e6 <+18>: udf #255 ; 0xff 0x002f22e8 <+20>: movw r3, #937 ; 0x3a9 # Prologue for CHECK(y) 0x002f22ec <+24>: udf #255 ; 0xff 0x002f22ee <+26>: movw r3, #935 ; 0x3a7 # Prologue for CHECK(x + y) 0x002f22f2 <+30>: udf #255 ; 0xff Are fault addresses correctly attributed? 
Yes, proof: (gdb) info line *0x002f22e6 Line 934 of "../../foobar.cc" (gdb) info line *0x002f22f2 Line 935 of "../../foobar.cc" Arm64: 0x0083525c <+0>: cbz w0, 0x835278 <foobar()+28> 0x00835260 <+4>: cbz w1, 0x835288 <foobar()+44> 0x00835264 <+8>: cmn w0, w1 0x00835268 <+12>: mov x1, #0x40 // #64 0x0083526c <+16>: str w0, [x1] 0x00835270 <+20>: b.eq 0x835280 <foobar()+36> 0x00835274 <+24>: ret 0x00835278 <+28>: mov w0, #0x3a6 // #934 0x0083527c <+32>: brk #0x3e8 0x00835280 <+36>: mov w0, #0x3a9 // #937 0x00835284 <+40>: brk #0x3e8 0x00835288 <+44>: mov w0, #0x3a7 // #935 0x0083528c <+48>: brk #0x3e8 Are fault addresses correctly attributed? Yes, proof: (gdb) info line *0x0083527c Line 934 of "../../foobar.cc" (gdb) info line *0x00835284 Line 935 of "../../foobar.cc" Linux x86_64: 0x020ad480 <+0>: test %edi,%edi 0x020ad482 <+2>: je 0x20ad494 <foobar(int, int)+20> # CHECK(x) 0x020ad484 <+4>: test %esi,%esi 0x020ad486 <+6>: je 0x20ad49b <foobar(int, int)+27> # CHECK(y) 0x020ad488 <+8>: mov %edi,0x40 0x020ad48f <+15>: add %edi,%esi 0x020ad491 <+17>: je 0x20ad4a2 <foobar(int, int)+34> # CHECK(x + y) 0x020ad493 <+19>: retq 0x020ad494 <+20>: mov $0x3a6,%eax 0x020ad499 <+25>: ud2 0x020ad49b <+27>: mov $0x3a7,%eax 0x020ad4a0 <+32>: ud2 0x020ad4a2 <+34>: mov $0x3a9,%eax 0x020ad4a7 <+39>: ud2 Are fault addresses correctly attributed? Yes, proof: (gdb) info line *0x020ad499 Line 934 of "../../foobar.cc" (gdb) info line *0x020ad49b Line 935 of "../../foobar.cc" Binary size inflation on official builds: ----------------------------------------- Android arm: 48324152 -> 48336440 = 12 K Android arm64: 85160872 -> 85177256 = 16 K Linux x86_64: 121030408 -> 121079560 = 48 K BUG=664209,599867 ==========
primiano@chromium.org changed reviewers: + brettw@chromium.org, danakj@chromium.org
Brett / Dana: opinions here (see #14) ?
On Wed, Nov 23, 2016 at 10:24 AM, <primiano@chromium.org> wrote:
> Brett / Dana: opinions here (see #14) ?

Sorry I'm behind on reviews so I'm slow. There's a few things sitting in queue before this just FYI. If you can try another base owner that might help you move forward.

> https://codereview.chromium.org/2502953003/
Description was changed. The edit tweaked a few phrases in the otherwise-unchanged text quoted in the previous entry: the binary-size note "(~10-40K)" became "(+10/+40K)"; "with annex frame initialization, s in the case of BreakDebugger()" became "as in the case of BreakDebugger()"; "crrev.com/1982123002 eventually fixed the situation" became "recently fixed the situation"; and "once the program crashes in \"trap_opcode\"" became "crashes in \"trap_instruction(s)\"".
Thanks for the detailed writeup. I think not clobbering registers is helpful if we can have it.

A random idea below, but it might not work well on arm. Now that I think of it, __COUNTER__ would probably be better in that suggestion since it's more likely to fit in an immediate. But iirc arm can't do immediate->memory stores, so it'd still burn a register?

https://codereview.chromium.org/2502953003/diff/20001/base/logging.h
File base/logging.h (right):

https://codereview.chromium.org/2502953003/diff/20001/base/logging.h#newcode491
base/logging.h:491: #define LOGGING_CRASH(LINE) ((void)(*(volatile char*)0 = 0))
This doesn't help on Windows, does it. Would

((void)(*(volatile char*)0 = LINE))

be something that works everywhere? x86 has up to 32-bit immediates and supports immediate->mem moves, but I guess on arm it'd still clobber a reg (or several).

How does Torne's suggestion help? Aren't these ud2s mergeable too? Or do compilers not try to reason about inline asm? (LLVM probably doesn't.)

(https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/Other-Builtins.html claims that __builtin_unreachable would generate a return, but at least clang seems to not do this.)
On 2016/12/02 02:04:03, Nico wrote:
> Thanks for the detailed writeup. I think not clobbering registers is helpful if
> we can have it.
>
> A random idea below, but it might not work well on arm. Now that I think of it,
> __COUNTER__ would probably better in that suggestion since it's more likely to
> fit in an immediate. But iirc arm can't do immediate->memory stores, so it'd
> still burn a register?

Yes, you remember correctly, ARM can only store register->memory.

> https://codereview.chromium.org/2502953003/diff/20001/base/logging.h
> File base/logging.h (right):
>
> https://codereview.chromium.org/2502953003/diff/20001/base/logging.h#newcode491
> base/logging.h:491: #define LOGGING_CRASH(LINE) ((void)(*(volatile char*)0 = 0))
> This doesn't help on Windows, does it. Would
>
> ((void)(*(volatile char*)0 = LINE))
>
> be something that works everywhere? x86 has up to 32bit immediates and supports
> immediate->mem moves, but I guess on arm it'd still clobber a reg (or several).
>
> How does Torne's suggestion help? Aren't these ud2s mergable too? Or do
> compilers not try to reason about inline asm? (LLVM probably doesn't.)

I observed that the compilers I tested it on did not merge the asm volatile blocks. This is certainly not guaranteed, and a sufficiently adventurous compiler might, but it seems generally unlikely?
All right, I think then the best approach is to just manually put the various per-arch variants of __builtin_trap (ud2 & co.) behind #ifdefs. In this way we need neither LINE nor COUNTER, and we avoid clobbering any register. Also, this would reduce the binary inflation by at least 50%. I'll update the patchset soon, probably on Monday.

On 2016/12/02 12:37:17, Torne wrote:
> On 2016/12/02 02:04:03, Nico wrote:
> > How does Torne's suggestion help? Aren't these ud2s mergable too? Or do
> > compilers not try to reason about inline asm? (LLVM probably doesn't.)
> I observed that the compilers I tested it on did not merge the asm volatile
> blocks. This is certainly not guaranteed, and a sufficiently adventurous
> compiler might, but it seems generally unlikely?

I think "volatile" is what prevents the compiler from merging them. I can confirm I see what Torne observed. I also added a test to check that we don't regress that (see the change to the unittest). So we should be safe-ish on this side (w.r.t. not collapsing the trap instruction).

The only thing I can't easily test is whether the debugging symbols will properly attribute each "ud2" instruction to the right line of code. I can confirm symbols are correct right now (see CL description).

> This doesn't help on Windows, does it. Would
> ((void)(*(volatile char*)0 = LINE))
> be something that works everywhere? x86 has up to 32bit immediates and supports
> immediate->mem moves, but I guess on arm it'd still clobber a reg (or several).

I honestly think the best solution is to use "ud2" and friends everywhere, including Windows. I didn't want to touch Windows in this CL because:
1) I still don't know how Windows and crashpad react to ud2. Technically it is an "undefined instruction" and I expect it to raise the same kind of exception as jumping to broken JIT-ed code. But I need to check that first.
2) The situation of CHECK on Windows is complicated by the completely orthogonal discussion about CHECK not being as fast as RELEASE_ASSERT. So I plan to fix the non-Windows OSes first and come back to Windows once we settle the argument of CHECK vs RELEASE_ASSERT.
Maybe we should just call abort(). That's marked noreturn, takes 0 arguments, and is standard. What was the problem with that again? (I guess the compiler could still collect all calls to abort in a function in one basic block and just jump to that from everywhere, like with __builtin_trap?)
On 2016/12/02 15:29:25, Nico wrote:
> Maybe we should just call abort(). That's marked noreturn, takes 0 arguments,
> and is standard. What was the problem with that again? (I guess the compiler
> could still collect all calls to abort in a function in one basic block and just
> jump to that from everywhere, like with __builtin_trap?)

Precisely, it will merge the callq abort. Just tried:

---
#include <stdlib.h>
#include <stdio.h>

#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define CHECK(n) (UNLIKELY(!(n)) ? abort() : (void)0)

void __attribute__((visibility("default"))) foo(int n) {
  CHECK(n != 2);
  CHECK(n != 3);
  *((volatile int*)0x42) = 3;
  CHECK(n != 4);
}
---

$ g++ -c test.cc --std=c++11 -O1 -S -o -

## BB#0:
        pushq %rbp
Ltmp0:
        .cfi_def_cfa_offset 16
Ltmp1:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
Ltmp2:
        .cfi_def_cfa_register %rbp
        cmpl $3, %edi
        je LBB0_4
## BB#1:
        cmpl $2, %edi
        je LBB0_4
## BB#2:
        movl $3, 66
        cmpl $4, %edi
        je LBB0_4
## BB#3:
        popq %rbp
        retq
LBB0_4:
        callq _abort
        .cfi_endproc

which means that frame #0 of the crash report will be abort(), and frame #1 will always be attributed to one of the three CHECK lines regardless of the actual triggering condition.
Yeah. The problem there is that x86 doesn't have conditional calls, and so if the function is noreturn it's more compact to conditionally jump to a single shared call instruction. (on ARM you can do conditional calls, but we build everything as thumb2 which puts various restrictions/size tradeoffs on the code and means what you get is not trivially predictable still). The "old" version of CHECK() which called BreakDebugger *didn't* have this problem, because BreakDebugger isn't noreturn (since it was explicitly intended that when a debugger was attached you could choose to continue), and so it has to be able to generate a codepath where it can "carry on" after BreakDebugger returns. But allowing for BreakDebugger to return worsens code generation in general, since it has to preserve registers/etc around it. I think the architecture-specific ifdef'ed trap instructions with __builtin_unreachable is a reasonable tradeoff here, though it's ugly that we have to do it that way: it means we get unique addresses, but it doesn't have to allow for calling a return-capable function and so the runtime cost is *relatively* low: the ideal output would be the compiler putting a bunch of "branch forward if condition failed" inline in the code, with N trap instructions in a row after the end of the function, one for each branch target, and that's what appears to happen when Primiano was testing it in many cases. If it manages to generate that, then the performance cost *should* be no worse than __builtin_trap, and the only added cost for the unique addresses is the bytes for one extra trap instruction per CHECK().
Ok, the per-arch ud2s sound like a reasonable way to go to me.
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
On 2016/12/05 21:08:46, Nico wrote:
> Ok, the per-arch ud2s sound like a reasonable way to go to me.

So I gave this another shot. I also reworked the test to cover non-official builds too.
Good news: the test strategy seems to work as intended.
Bad news: the test actually highlighted that the new version isn't good enough. The new trick isn't enough to fool the compiler: udf #255 is still being folded on android-arm-gcc.
I manually repro'd the test failure and disassembled locally:

0x00170586 <+214>: cmp.w r9, #1
0x0017058a <+218>: beq.n 0x17059e <CHECKNotAmbiguous_ChildMain()+238>
0x0017058c <+220>: cmp.w r9, #2
0x00170590 <+224>: beq.n 0x17059e <CHECKNotAmbiguous_ChildMain()+238>
0x00170592 <+226>: movs r0, #10
0x00170594 <+228>: blx 0x542d8 <putchar@plt>
0x00170598 <+232>: cmp.w r9, #3
0x0017059c <+236>: bne.n 0x1705a0 <CHECKNotAmbiguous_ChildMain()+240>
0x0017059e <+238>: udf #255 ; 0xff

So I think we have to resort to the option from the previous patchset of clobbering one register. Unless I am doing something silly here without realizing.
https://codereview.chromium.org/2502953003/diff/60001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/60001/base/logging.h#newcode484 base/logging.h:484: // either a SIGTRAP or a SIGABRT depending on the platform. On x86 and x86_64, ud2 will be SIGILL. That’s good, but it’s not distinct in that true bad codegen will also be SIGILL. SIGABRT or SIGTRAP, as this comment says, would be much better. On the Windows equivalent (https://codereview.chromium.org/2676483002/), we’re discussing int3 with a ud2 backstop. int3 will be SIGTRAP which is somewhat distinct. The ud2 backstop is there to provide a non-continuable barrier to further execution, otherwise it’s not really noreturn because it’s trivial to continue past int3. The reserved illegal opcode backstop could also work for ARM. I don’t know that it’ll solve multiple CHECK()s jumping to a single target that you’re seeing, but it’s a further refinement.
On 2017/02/13 15:24:11, Primiano Tucci wrote:
> On 2016/12/05 21:08:46, Nico wrote:
> > Ok, the per-arch ud2s sound like a reasonable way to go to me.
>
> So I gave it another shoot to this. I also reworked the test to cover also
> non-official builds.
> Good news: the test strategy seems to work as intended.
> Bad news: the test actually highlighted that the new version isn't good enough.
> The new trick isn't enough to fool the compiler, udf ff is still being folded on
> android-arm-gcc.
> I manually reprod the test failure and disassembled locally:
>
> 0x00170586 <+214>: cmp.w r9, #1
> 0x0017058a <+218>: beq.n 0x17059e <CHECKNotAmbiguous_ChildMain()+238>
> 0x0017058c <+220>: cmp.w r9, #2
> 0x00170590 <+224>: beq.n 0x17059e <CHECKNotAmbiguous_ChildMain()+238>
> 0x00170592 <+226>: movs r0, #10
> 0x00170594 <+228>: blx 0x542d8 <putchar@plt>
> 0x00170598 <+232>: cmp.w r9, #3
> 0x0017059c <+236>: bne.n 0x1705a0 <CHECKNotAmbiguous_ChildMain()+240>
> 0x0017059e <+238>: udf #255 ; 0xff
>
> So I think we have to resort to the option of clobbering one register of the
> previous patchet.
> Unless I am doing something silly here without realizing.

So for ARM32 you can probably make this work by making the parameter to the udf instruction be __COUNTER__ % 256, which won't cost any registers. The parameter here doesn't matter to anything (and is only even available by reading back the instruction and decoding it manually). It should probably be safe to do the same thing on ARM64 with __COUNTER__ % 65536 but I'm not 100% certain there isn't some significance to this value because it's suspicious that __builtin_trap() uses a specific number, and ARM's lack of public documentation is annoying. For x86, could you do it by having the asm block be two instructions, *first* ud2 and *then* whatever the smallest thing with an immediate operand that can be unique is?
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== base: make CHECK macros trap at distinct addresses in official builds Abstract -------- CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions. It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK. - Back in the days CHECK was fat, slow, but crash-friendly. - After crrev.com/1982123002 it became tiny, fast but not crash friendly. - This CL is making it a bit less tiny (+10/+40K), fast and crash friendly. The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android. A bit of history: ----------------- Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially: if (!condition) BreakDebugger() It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with: Source code: CHECK(cond1); CHECK(cond2); ... Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... 
prologue:
trap_instruction(s)

Rather than something like:

compare_opcode cond1;
jump_if_NOT_zero next1;
trap_instruction(s)
next1:
compare_opcode cond2;
jump_if_NOT_zero next2;
trap_instruction(s)
next2:
...

Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, as in the case of BreakDebugger(). That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 recently fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_instruction(s)", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction before the trap which pushes the current line into a register. This causes:
- The crash line to be in the register as an extra help for the diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still applies no-return optimizations.

The assembly now looks as follows:

some_compare_opcode cond1;
jump_if_zero prologue1;
some_compare_opcode cond2;
jump_if_zero prologue2;
...
prologue1:
mov eax, LINE_OF_COND1
trap_opcode
prologue2:
mov eax, LINE_OF_COND2
trap_opcode

Which involves some extra bytes for each CHECKS, but not as many as the non-inline case. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols.
More concretely, given the following function:

extern "C" void foobar(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile int *)0x40) = x;
  CHECK(x + y);
}

This is the assembly produced after this CL:

Arm32:
0x002f22d4 <+0>:  cbz r0, 0x2f22e2 <foobar+14>   # CHECK(x)
0x002f22d6 <+2>:  cbz r1, 0x2f22ee <foobar+26>   # CHECK(y)
0x002f22d8 <+4>:  movs r3, #64 ; 0x40
0x002f22da <+6>:  cmn r0, r1
0x002f22dc <+8>:  str r0, [r3, #0]
0x002f22de <+10>: beq.n 0x2f22e8 <foobar+20>     # CHECK(x + y)
0x002f22e0 <+12>: bx lr
0x002f22e2 <+14>: movw r3, #934 ; 0x3a6          # Prologue for CHECK(x)
0x002f22e6 <+18>: udf #255 ; 0xff
0x002f22e8 <+20>: movw r3, #937 ; 0x3a9          # Prologue for CHECK(y)
0x002f22ec <+24>: udf #255 ; 0xff
0x002f22ee <+26>: movw r3, #935 ; 0x3a7          # Prologue for CHECK(x + y)
0x002f22f2 <+30>: udf #255 ; 0xff

Are fault addresses correctly attributed? Yes, proof:
(gdb) info line *0x002f22e6
Line 934 of "../../foobar.cc"
(gdb) info line *0x002f22f2
Line 935 of "../../foobar.cc"

Arm64:
0x0083525c <+0>:  cbz w0, 0x835278 <foobar()+28>
0x00835260 <+4>:  cbz w1, 0x835288 <foobar()+44>
0x00835264 <+8>:  cmn w0, w1
0x00835268 <+12>: mov x1, #0x40 // #64
0x0083526c <+16>: str w0, [x1]
0x00835270 <+20>: b.eq 0x835280 <foobar()+36>
0x00835274 <+24>: ret
0x00835278 <+28>: mov w0, #0x3a6 // #934
0x0083527c <+32>: brk #0x3e8
0x00835280 <+36>: mov w0, #0x3a9 // #937
0x00835284 <+40>: brk #0x3e8
0x00835288 <+44>: mov w0, #0x3a7 // #935
0x0083528c <+48>: brk #0x3e8

Are fault addresses correctly attributed?
Yes, proof:
(gdb) info line *0x0083527c
Line 934 of "../../foobar.cc"
(gdb) info line *0x00835284
Line 935 of "../../foobar.cc"

Linux x86_64:
0x020ad480 <+0>:  test %edi,%edi
0x020ad482 <+2>:  je 0x20ad494 <foobar(int, int)+20>   # CHECK(x)
0x020ad484 <+4>:  test %esi,%esi
0x020ad486 <+6>:  je 0x20ad49b <foobar(int, int)+27>   # CHECK(y)
0x020ad488 <+8>:  mov %edi,0x40
0x020ad48f <+15>: add %edi,%esi
0x020ad491 <+17>: je 0x20ad4a2 <foobar(int, int)+34>   # CHECK(x + y)
0x020ad493 <+19>: retq
0x020ad494 <+20>: mov $0x3a6,%eax
0x020ad499 <+25>: ud2
0x020ad49b <+27>: mov $0x3a7,%eax
0x020ad4a0 <+32>: ud2
0x020ad4a2 <+34>: mov $0x3a9,%eax
0x020ad4a7 <+39>: ud2

Are fault addresses correctly attributed? Yes, proof:
(gdb) info line *0x020ad499
Line 934 of "../../foobar.cc"
(gdb) info line *0x020ad49b
Line 935 of "../../foobar.cc"

Binary size inflation on official builds:
-----------------------------------------
Android arm: 48324152 -> 48336440 = 12 K
Android arm64: 85160872 -> 85177256 = 16 K
Linux x86_64: 121030408 -> 121079560 = 48 K

BUG=664209,599867
==========
to
==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions. It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK.
- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+10/+40K), fast and crash friendly.
The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android.
A bit of history: ----------------- Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially: if (!condition) BreakDebugger() It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with: Source code: CHECK(cond1); CHECK(cond2); ... Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... prologue: trap_instruction(s) Rather than something like: compare_opcode cond1; jump_if_NOT_zero next1; trap_instruction(s) next1: compare_opcode cond2; jump_if_NOT_zero next2; trap_instruction(s) next2: ... Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, as in the case of BreakDebugger(). That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 recently fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_instruction(s)", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check. This CL: -------- Re-addresses crrev.com/2125923002, adding an extra instruction before the trap which pushes the current line into a register. 
This causes:
- The crash line to be in the register as an extra help for the diagnosis of tough crash cases.
- The compiler doesn't fold the trap instructions, although still applies no-return optimizations.

Which involves some extra bytes for each CHECKS, but not as many as the non-inline case. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm: 48684276 -> 48712948 = 28 K
Android arm64: 86992248 -> 86992248 = 0
Android x86: 104502640 -> 104506736 = 4 K
Linux x86_64: TBD

BUG=664209,599867
==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed to:
==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK. - Back in the days CHECK was fat, slow, but crash-friendly. - After crrev.com/1982123002 it became tiny, fast but not crash friendly. - This CL is making it a bit less tiny (+10/+40K), fast and crash friendly. The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android. A bit of history: ----------------- Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially: if (!condition) BreakDebugger() It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with: Source code: CHECK(cond1); CHECK(cond2); ... Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... prologue: trap_instruction(s) Rather than something like: compare_opcode cond1; jump_if_NOT_zero next1; trap_instruction(s) next1: compare_opcode cond2; jump_if_NOT_zero next2; trap_instruction(s) next2: ... Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, as in the case of BreakDebugger(). 
That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 recently fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_instruction(s)", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check. This CL: -------- Re-addresses crrev.com/2125923002, adding an extra instruction before the trap which pushes the current line into a register. This causes: - The crash line to be in the register as an extra help for the diagnosis of tough crash cases. - The compiler doesn't fold the trap instructions, although still applies no-return optimizations. Which involves some extra bytes for each CHECKS, but not as many as the non-inline case. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols. Binary size inflation on official builds: ----------------------------------------- Android arm: 48684276 -> 48712948 = 28 K Android arm64: 86992248 -> 86992248 = 0 Android x86: 104502640 -> 104506736 = 4 K Android x86_64: 93686832 -> 93686832 = 0 Linux x86_64: 124219488 -> 124346464 = 124 K BUG=664209,599867 ==========
Description was changed from ========== base: make CHECK macros trap at distinct addresses in official builds Abstract -------- CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions. It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK. - Back in the days CHECK was fat, slow, but crash-friendly. - After crrev.com/1982123002 it became tiny, fast but not crash friendly. - This CL is making it a bit less tiny (+10/+40K), fast and crash friendly. The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android. A bit of history: ----------------- Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially: if (!condition) BreakDebugger() It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with: Source code: CHECK(cond1); CHECK(cond2); ... Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... 
prologue: trap_instruction(s) Rather than something like: compare_opcode cond1; jump_if_NOT_zero next1; trap_instruction(s) next1: compare_opcode cond2; jump_if_NOT_zero next2; trap_instruction(s) next2: ... Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, as in the case of BreakDebugger(). That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 recently fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_instruction(s)", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check. This CL: -------- Re-addresses crrev.com/2125923002, adding an extra instruction before the trap which pushes the current line into a register. This causes: - The crash line to be in the register as an extra help for the diagnosis of tough crash cases. - The compiler doesn't fold the trap instructions, although still applies no-return optimizations. Which involves some extra bytes for each CHECKS, but not as many as the non-inline case. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols. 
Binary size inflation on official builds: ----------------------------------------- Android arm: 48684276 -> 48712948 = 28 K Android arm64: 86992248 -> 86992248 = 0 Android x86: 104502640 -> 104506736 = 4 K Android x86_64: 93686832 -> 93686832 = 0 Linux x86_64: 124219488 -> 124346464 = 124 K BUG=664209,599867 ========== to ========== base: make CHECK macros trap at distinct addresses in official builds Abstract -------- CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions. It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK. - Back in the days CHECK was fat, slow, but crash-friendly. - After crrev.com/1982123002 it became tiny, fast but not crash friendly. - This CL is making it a bit less tiny (+10/+40K), fast and crash friendly. The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android. A bit of history: ----------------- Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially: if (!condition) BreakDebugger() It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with: Source code: CHECK(cond1); CHECK(cond2); ... 
Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... prologue: trap_instruction(s) Rather than something like: compare_opcode cond1; jump_if_NOT_zero next1; trap_instruction(s) next1: compare_opcode cond2; jump_if_NOT_zero next2; trap_instruction(s) next2: ... Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, as in the case of BreakDebugger(). That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 recently fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_instruction(s)", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check. This CL: -------- Re-addresses crrev.com/2125923002, adding an extra instruction before the trap which pushes the current line into a register. This causes: - The crash line to be in the register as an extra help for the diagnosis of tough crash cases. - The compiler doesn't fold the trap instructions, although still applies no-return optimizations. Which involves some extra bytes for each CHECKS, but not as many as the non-inline case. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols. Binary size inflation on official builds: ----------------------------------------- Android arm: 48684276 -> 48712948 = 28 K Android arm64: 85611800 -> 85665048 = 53 K Android x86_64: 91904944 -> 91933616 = 28 K Linux x86_64: 124219488 -> 124346464 = 124 K (Android build with -Os, hence why the difference between the two) BUG=664209,599867 ==========
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL. I think I finally have a version that keeps all the constraints satisfied:
- generates as little binary as possible;
- keeps crash addresses distinct in the case of several CHECKs;
- cannot be stepped through during debugging;
- generates a SIGTRAP in *most* cases (and, in the case of arm32, something which is not just a too-generic SIGSEGV).
Apologies for the delay, but I ended up spending an incredible amount of time finding the proper way to keep 4 different compilers happy.
+scottmg FYI
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
Looks pretty reasonable. I'm mildly scared by using hlt on arm64 but I guess entering debug state is normally just fatal? (and we couldn't find a sane way to hit undef). https://codereview.chromium.org/2502953003/diff/120001/base/logging_unittest.cc File base/logging_unittest.cc (right): https://codereview.chromium.org/2502953003/diff/120001/base/logging_unittest.... base/logging_unittest.cc:282: ASSERT_EQ(0, sigaction(SIGILL, &act, NULL)); Don't you need to handle SIGBUS here as well for arm32?
Also, the CL description is no longer quite accurate as it still mentions putting line numbers into a register.
Description was changed to:
==========
base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions. It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK.
- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+28/+128K), fast and crash friendly.
The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with:

Source code:
CHECK(cond1);
CHECK(cond2);
...
Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... prologue: trap_instruction(s) Rather than something like: compare_opcode cond1; jump_if_NOT_zero next1; trap_instruction(s) next1: compare_opcode cond2; jump_if_NOT_zero next2; trap_instruction(s) next2: ... Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, as in the case of BreakDebugger(). That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 recently fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_instruction(s)", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check. This CL: -------- Re-addresses crrev.com/2125923002, adding an extra instruction after the trap which creates an opcode with a unique counter. This prevents the compiler from folding the trap instructions, still applying no-return optimizations. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols. Binary size inflation on official builds: ----------------------------------------- Android arm: 48684276 -> 48712948 = 28 K Android arm64: 85611800 -> 85665048 = 53 K Android x86_64: 91904944 -> 91933616 = 28 K Linux x86_64: 124219488 -> 124346464 = 124 K (Android build with -Os, hence why the difference between the two) BUG=664209,599867 ==========
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
> I'm mildly scared by using hlt on arm64 but I guess entering debug state is normally just fatal? (and we couldn't find a sane way to hit undef). It causes a SIGILL. I guess it is either not implemented in production HW or disabled by the kernel. In any case, the hlt is just a 2nd stop for debugging sessions; production code will die at the brk instruction before it. > Also, the CL description is no longer quite accurate as it still mentions putting line numbers into a register. Oh right, I forgot to fix that part. Fixed. https://codereview.chromium.org/2502953003/diff/120001/base/logging_unittest.cc File base/logging_unittest.cc (right): https://codereview.chromium.org/2502953003/diff/120001/base/logging_unittest.... base/logging_unittest.cc:282: ASSERT_EQ(0, sigaction(SIGILL, &act, NULL)); On 2017/02/17 11:06:31, Torne wrote: > Don't you need to handle SIGBUS here as well for arm32? Ehm, err, you are definitely right. It was green because all the Android bots on the CQ are 64-bit. Verified that it failed locally on a 32-bit device and that adding SIGBUS fixed it.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
mark@chromium.org changed reviewers: + mark@chromium.org
This is lovely! https://codereview.chromium.org/2502953003/diff/140001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % 128)) Can the push go behind the ud2? https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % 128)) You should write this as int3, which is a distinct mnemonic that always corresponds to a single-byte form of the instruction. I don’t trust every assembler to emit the one-byte form upon seeing int $3. https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % 128)) The push imm8 form of this instruction dedicates a whole byte to imm8, you shouldn’t need to take it mod 128. I bet you did this because "i"(128) (when __COUNTER__ reaches that) gives you (int)128 which overflows as an 8-bit quantity, so you’d wind up with a fatter variant of push immediate. Instead of "i"(__COUNTER__ % 128), take "i"((signed char)__COUNTER__). Now you can use twice as many of these in the same file without the compiler deciding that any two instances are identical! For that matter, you can do "i"(__COUNTER__ - 128) and you’ll get the thin imm8 form for the first 256 occurrences in a file, and then you’ll get fatter forms for subsequent occurrences, but there’s a much stronger guarantee that there won’t be any overlap. https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode548 base/logging.h:548: TRAP_INSTRUCTION(); \ nit: these are more TRAP_SEQUENCE now that none are really single instructions.
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
> This is lovely! Not quite the same reaction I had when I hit the various compiler optimizations. The fact that we use -O2 on desktop and -Os on Android then added further levels of amusement to all this. https://codereview.chromium.org/2502953003/diff/140001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % 128)) On 2017/02/17 14:24:28, Mark Mentovai wrote: > You should write this as int3, which is a distinct mnemonic that always > corresponds to a single-byte form of the instruction. I don’t trust every > assembler to emit the one-byte form upon seeing int $3. Ah right, I realized only now that there is a 1-byte specialization of int3. FYI gcc was doing the right thing, using the 1-byte version even with int $3, but it makes sense to be explicit. https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % 128)) On 2017/02/17 14:24:28, Mark Mentovai wrote: > Can the push go behind the ud2? Done. https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % 128)) On 2017/02/17 14:24:28, Mark Mentovai wrote: > The push imm8 form of this instruction dedicates a whole byte to imm8, you > shouldn’t need to take it mod 128. I bet you did this because "i"(128) (when > __COUNTER__ reaches that) gives you (int)128 which overflows as an 8-bit > quantity, so you’d wind up with a fatter variant of push immediate. > > Instead of "i"(__COUNTER__ % 128), take "i"((signed char)__COUNTER__). > > Now you can use twice as many of these in the same file without the compiler > deciding that any two instances are identical! > Spot on. I did % 128 because I observed that values > 127 fell back to the 5-byte instruction (rather than 2). 
However, your suggestion does the trick: if I just use ((uint8_t)__COUNTER__) I get double the counter space and still the 2-byte variant. > For that matter, you can do "i"(__COUNTER__ - 128) and you’ll get the thin imm8 > form for the first 256 occurrences in a file, and then you’ll get fatter forms > for subsequent occurrences, but there’s a much stronger guarantee that there > won’t be any overlap. Hmm, given that the overlapping problem seems bounded to function scope, I think it is quite unlikely that a function will have > 255 CHECKs. Binary size inflation is already in the ~hundreds-of-KB ballpark. https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode548 base/logging.h:548: TRAP_INSTRUCTION(); \ On 2017/02/17 14:24:28, Mark Mentovai wrote: > nit: these are more TRAP_SEQUENCE now that none are really single instructions. Good point, renamed.
LGTM https://codereview.chromium.org/2502953003/diff/140001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/140001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int $3; push %0; ud2;" ::"i"(__COUNTER__ % 128)) Primiano Tucci wrote: > Hmm given that the overlapping problem seems bounded to functions scope, I > think it is quite unlikely that a function will have > 255 CHECKs. Binary size > inflation is already in the ~hundreds kb ballpark. >255 is no problem. You only need to start sweating when you’re >256. :) https://codereview.chromium.org/2502953003/diff/160001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/160001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int3; ud2; push %0;" ::"i"((uint8_t)__COUNTER__)) I suggested a fundamental type because this header didn’t already #include <stdint.h> or anything else that would give you this type. So either #include that or go with a fundamental type here. P.S. Google/Chrome style demands a C++-style static_cast<uint8_t>() instead of a (uint8_t).
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2502953003/diff/160001/base/logging.h File base/logging.h (right): https://codereview.chromium.org/2502953003/diff/160001/base/logging.h#newcode525 base/logging.h:525: asm volatile("int3; ud2; push %0;" ::"i"((uint8_t)__COUNTER__)) On 2017/02/17 15:04:33, Mark Mentovai wrote: > I suggested a fundamental type because this header didn’t already #include > <stdint.h> or anything else that would give you this type. So either #include > that or go with a fundamental type here. > > P.S. Google/Chrome style demands a C++-style static_cast<uint8_t>() instead of a > (uint8_t). Ah right. Fixed both the cast and switched to unsigned char. I always feel bad adding more includes to logging.h (although I suspect something else already includes it up there).
superlgtm
+wfh: I thought I added you various patchsets ago but just realized that was not the case.
May the waterfall and all the hidden devices/bots/configurations have mercy on this CL. (I have been in this project long enough to be confident that this will be reverted at least once by some bot that cross-builds NaCl on MIPS and runs tests on an ARM emulator that subtly breaks the kernel ABI.)
as long as this doesn't regress future CL https://codereview.chromium.org/2697423002/ I'm happy with this :)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by primiano@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mark@chromium.org Link to the patchset: https://codereview.chromium.org/2502953003/#ps180001 (title: "use static_cast + primitive type")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 180001, "attempt_start_ts": 1487360814997190, "parent_rev": "868be9f79e939e2839e6a9fee6a7629cb93380f4", "commit_rev": "8c972d0e190168b4b5621e81563f319563fd0af8"}
Message was sent while issue was closed.
Description was changed: the commit metadata below was appended (the description text is otherwise identical to the final version above).

Review-Url: https://codereview.chromium.org/2502953003
Cr-Commit-Position: refs/heads/master@{#451381}
Committed: https://chromium.googlesource.com/chromium/src/+/8c972d0e190168b4b5621e81563f...
Message was sent while issue was closed.
Committed patchset #10 (id:180001) as https://chromium.googlesource.com/chromium/src/+/8c972d0e190168b4b5621e81563f...
Message was sent while issue was closed.
A revert of this CL (patchset #10 id:180001) has been created in https://codereview.chromium.org/2706453004/ by alph@chromium.org. The reason for reverting is: Broke compile https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Chrom....
Message was sent while issue was closed.
On 2017/02/17 21:27:44, alph wrote: > A revert of this CL (patchset #10 id:180001) has been created in > https://codereview.chromium.org/2706453004/ by mailto:alph@chromium.org. > > The reason for reverting is: Broke compile > https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Chrom.... As I expected, this broke NaCl on an internal CrOS bot. I give up for this week.
Message was sent while issue was closed.
This is relanding in https://codereview.chromium.org/2705053002/ |