|
|
Created:
5 years, 10 months ago by Derek Schuff Modified:
5 years, 9 months ago Reviewers:
Mark Seaborn CC:
native-client-reviews_googlegroups.com, shyamsundarr Base URL:
https://chromium.googlesource.com/native_client/nacl-newlib.git@master Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionAvoid writing the sandbox base address to memory in memcpy
PNaCl x86-64 code avoids revealing the sandbox base address to untrusted code,
to make vulnerabilities harder to exploit. For this to be effective, the IRT
also needs to have this property. Currently the IRT is built by PNaCl and so
is safe in this way, but we want to switch to nacl-clang. nacl-clang has the
same defaults for codegen, but includes assembly versions of memcpy, memset,
and setjmp.
This CL fixes memcpy in several ways
- It avoids writing the sandbox base to memory. (It also avoids the movs string instructions.)
- It uses L-prefix symbol names to keep them out of the object symbol table by default.
- It uses rbp as a base pointer instead of rsi/rdi to avoid extra masking.
- It removes use of nontemporal store and prefetch instructions. (It also removes the fence instruction, which was there because movnti has weak ordering with respect to other stores).
R=mseaborn@chromium.org
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4088
Committed: https://git.chromium.org/gitweb?p=native_client/nacl-newlib.git;a=commit;h=fe55b309fd67115ffe77ed59609506cbd611b210
Patch Set 1 #
Total comments: 11
Patch Set 2 : review #Patch Set 3 : remove movs, use local labels #Patch Set 4 : remove nontemporals and setjmp modifications #
Total comments: 11
Patch Set 5 : review #
Total comments: 10
Patch Set 6 : review #Messages
Total messages: 16 (1 generated)
I thought about just using PNaCl's setjmp implementation directly, but discovered that it uses a different jmp_buf layout. I figured it would be better to keep nacl-clang matching existing nacl-gcc code.
Would it be worth running the nacl-clang Scons tests with Victor's prototype validator patch applied (from https://code.google.com/p/nativeclient/issues/detail?id=3596) to make sure we've caught all the cases? I'm not sure if that patch still applies cleanly, though. https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... newlib/libc/machine/x86_64/memcpy.S:61: movq %nacl: (r15,rsi), rax Aside: if we want to optimise this, we could use rbp as the address register, so that the masking only happens once. https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... newlib/libc/machine/x86_64/memcpy.S:117: rep movsb %nacl:(rsi), %nacl:(rdi), r15 This leaves base-address-extended addresses in registers. You'll need to remove use of REP instructions too. See https://docs.google.com/a/chromium.org/document/d/1eskaI4353XdsJQFJLRnZzb_YIE... https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/s... File newlib/libc/machine/x86_64/setjmp.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/s... newlib/libc/machine/x86_64/setjmp.S:30: movl (rsp), eax Note: shorter as: popq %rax movl eax, %nacl: 32 (r15,rdi) ... movl esp, %nacl: 44 (r15,rdi) (no leal needed) https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/s... newlib/libc/machine/x86_64/setjmp.S:40: movl esi, eax /* Return value */ I'm not sure why you're changing this. It looks like you're fixing this longjmp() implementation to follow the standard. If so, shouldn't that be a separate change, with a test in the Scons build?
https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... newlib/libc/machine/x86_64/memcpy.S:117: rep movsb %nacl:(rsi), %nacl:(rdi), r15 On 2015/02/11 02:23:45, Mark Seaborn wrote: > This leaves base-address-extended addresses in registers. You'll need to remove > use of REP instructions too. > > See > https://docs.google.com/a/chromium.org/document/d/1eskaI4353XdsJQFJLRnZzb_YIE... Would it be sufficient just to clear rsi and rdi after the movsb instruction? The addresses are never written to memory in this case, and jumping to any of these instructions would not result in the attacker getting the values from the registers. https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/s... File newlib/libc/machine/x86_64/setjmp.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/s... newlib/libc/machine/x86_64/setjmp.S:40: movl esi, eax /* Return value */ On 2015/02/11 02:23:45, Mark Seaborn wrote: > I'm not sure why you're changing this. It looks like you're fixing this > longjmp() implementation to follow the standard. If so, shouldn't that be a > separate change, with a test in the Scons build? Yes, I noticed that it didn't conform and decided to fix that, but I'll make it a separate CL.
https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... newlib/libc/machine/x86_64/memcpy.S:117: rep movsb %nacl:(rsi), %nacl:(rdi), r15 On 2015/02/11 18:19:11, Derek Schuff wrote: > On 2015/02/11 02:23:45, Mark Seaborn wrote: > > This leaves base-address-extended addresses in registers. You'll need to > remove > > use of REP instructions too. > > > > See > > > https://docs.google.com/a/chromium.org/document/d/1eskaI4353XdsJQFJLRnZzb_YIE... > > Would it be sufficient just to clear rsi and rdi after the movsb instruction? > The addresses are never written to memory in this case, and jumping to any of > these instructions would not result in the attacker getting the values from the > registers. ...except I guess if the movsb causes a fault, the user can catch it and get the register state from the exception context.
https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... newlib/libc/machine/x86_64/memcpy.S:117: rep movsb %nacl:(rsi), %nacl:(rdi), r15 On 2015/02/11 18:19:11, Derek Schuff wrote: > On 2015/02/11 02:23:45, Mark Seaborn wrote: > > This leaves base-address-extended addresses in registers. You'll need to > remove > > use of REP instructions too. > > > > See > > > https://docs.google.com/a/chromium.org/document/d/1eskaI4353XdsJQFJLRnZzb_YIE... > > Would it be sufficient just to clear rsi and rdi after the movsb instruction? > The addresses are never written to memory in this case, and jumping to any of > these instructions would not result in the attacker getting the values from the > registers. In principle, yes. As you say, the instructions that clear rsi and rdi wouldn't even need to be bundle-locked to the REP instruction. However, checking for this is potentially more complicated to implement in the validator. It would be simpler just to disallow REP. What would be the reason to keep the use of REP? Is it just for ease of porting, or do you think REP is faster than a loop these days? > ...except I guess if the movsb causes a fault, the user can catch it and get the > register state from the exception context. Nope, that's not a problem, because we don't enable fault handling under PNaCl.
https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... newlib/libc/machine/x86_64/memcpy.S:61: movq %nacl: (r15,rsi), rax On 2015/02/11 02:23:45, Mark Seaborn wrote: > Aside: if we want to optimise this, we could use rbp as the address register, so > that the masking only happens once. Done. https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/m... newlib/libc/machine/x86_64/memcpy.S:117: rep movsb %nacl:(rsi), %nacl:(rdi), r15 On 2015/02/11 19:21:51, Mark Seaborn wrote: > On 2015/02/11 18:19:11, Derek Schuff wrote: > > On 2015/02/11 02:23:45, Mark Seaborn wrote: > > > This leaves base-address-extended addresses in registers. You'll need to > > remove > > > use of REP instructions too. > > > > > > See > > > > > > https://docs.google.com/a/chromium.org/document/d/1eskaI4353XdsJQFJLRnZzb_YIE... > > > > Would it be sufficient just to clear rsi and rdi after the movsb instruction? > > The addresses are never written to memory in this case, and jumping to any of > > these instructions would not result in the attacker getting the values from > the > > registers. > > In principle, yes. As you say, the instructions that clear rsi and rdi wouldn't > even need to be bundle-locked to the REP instruction. > > However, checking for this is potentially more complicated to implement in the > validator. It would be simpler just to disallow REP. > > What would be the reason to keep the use of REP? Is it just for ease of > porting, or do you think REP is faster than a loop these days? > > > > ...except I guess if the movsb causes a fault, the user can catch it and get > the > > register state from the exception context. > > Nope, that's not a problem, because we don't enable fault handling under PNaCl. OK, as we discussed I just made them all explicit movs loops, and clear rsi and rdi after exit. https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/s... File newlib/libc/machine/x86_64/setjmp.S (right): https://codereview.chromium.org/917463003/diff/1/newlib/libc/machine/x86_64/s... newlib/libc/machine/x86_64/setjmp.S:30: movl (rsp), eax On 2015/02/11 02:23:45, Mark Seaborn wrote: > Note: shorter as: > > popq %rax > movl eax, %nacl: 32 (r15,rdi) > ... > movl esp, %nacl: 44 (r15,rdi) > > (no leal needed) Done.
removed movs, PTAL
On 2015/02/12 22:44:06, Derek Schuff wrote: > removed movs, PTAL moved setjmp changes to another CL, removed nontemporals, and updated CL description.
Any chance of splitting this into multiple CLs? If you do multiple changes in one CL, it takes me longer to review and I'm more likely to make a mistake. https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... File newlib/libc/machine/x86_64/memcpy.S (left): https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:90: sfence Please note the removal of this sfence in the commit message. Presumably the sfence was there because non-temporals have less ordering guarantees? https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:19: jb .Lbyte_copy > - It uses local symbol names to avoid polluting the user's symbol namespace. FWIW, they already *are* local as long as there's no ".global" decl. The issue with using "byte_copy:" is just that GDB tends to treat it as starting a new function. (Just pointing this out. No change required except perhaps to the commit description.) https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:48: /* Avoid revealing the sandbox base address. Nit: Should this use the NaCl style for multiline comments, with "/*" on a separate line? https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:63: pushq rdx Nit: fix operand's indentation alignment https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:113: popq rcx Earlier you push rdx but here you pop rcx. Not sure whether this is correct.
https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... File newlib/libc/machine/x86_64/memcpy.S (left): https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:90: sfence On 2015/02/18 20:03:38, Mark Seaborn wrote: > Please note the removal of this sfence in the commit message. > > Presumably the sfence was there because non-temporals have less ordering > guarantees? correct, and done. https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:19: jb .Lbyte_copy On 2015/02/18 20:03:38, Mark Seaborn wrote: > > - It uses local symbol names to avoid polluting the user's symbol namespace. > > FWIW, they already *are* local as long as there's no ".global" decl. The issue > with using "byte_copy:" is just that GDB tends to treat it as starting a new > function. (Just pointing this out. No change required except perhaps to the > commit description.) OK, I guess technically the effect that I wanted is that they don't end up in the symbol table of the object file. https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:48: /* Avoid revealing the sandbox base address. On 2015/02/18 20:03:38, Mark Seaborn wrote: > Nit: Should this use the NaCl style for multiline comments, with "/*" on a > separate line? Done. https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:63: pushq rdx On 2015/02/18 20:03:38, Mark Seaborn wrote: > Nit: fix operand's indentation alignment Done. https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:113: popq rcx On 2015/02/18 20:03:38, Mark Seaborn wrote: > Earlier you push rdx but here you pop rcx. Not sure whether this is correct. This replaces line 91 of the original which just copies edx (memcpy's size argument) into ecx to be modified.
https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/60001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:113: popq rcx On 2015/02/18 20:47:40, Derek Schuff wrote: > On 2015/02/18 20:03:38, Mark Seaborn wrote: > > Earlier you push rdx but here you pop rcx. Not sure whether this is correct. > > This replaces line 91 of the original which just copies edx (memcpy's size > argument) into ecx to be modified. (I forgot to add "and rdx is not used anymore in this codepath, which just returns in line 131")
LGTM https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:45: movl edx, ecx /* Copy 128 bytes at a time with minimum cache polution */ Remove "with minimum cache polution"? https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:64: pushq rdx Maybe add "/* Save byte count */" https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:114: popq rcx and "/* Restore byte count */" https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:117: andl $127, ecx add "/* Copy the remaining bytes */" https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:163: movl edx, ecx You could use edx from here to the end, and avoid the "mov".
New patchsets have been uploaded after l-g-t-m from mseaborn@chromium.org
Message was sent while issue was closed.
Committed patchset #6 (id:100001) manually as fe55b309fd67115ffe77ed59609506cbd611b210 (presubmit successful).
Message was sent while issue was closed.
oops, forgot to mail all the 'Done's https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... File newlib/libc/machine/x86_64/memcpy.S (right): https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:45: movl edx, ecx /* Copy 128 bytes at a time with minimum cache polution */ On 2015/02/20 00:43:45, Mark Seaborn wrote: > Remove "with minimum cache polution"? Done. https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:64: pushq rdx On 2015/02/20 00:43:45, Mark Seaborn wrote: > Maybe add "/* Save byte count */" Done. https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:114: popq rcx On 2015/02/20 00:43:45, Mark Seaborn wrote: > and "/* Restore byte count */" Done. https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:117: andl $127, ecx On 2015/02/20 00:43:45, Mark Seaborn wrote: > add "/* Copy the remaining bytes */" Done. https://codereview.chromium.org/917463003/diff/80001/newlib/libc/machine/x86_... newlib/libc/machine/x86_64/memcpy.S:163: movl edx, ecx On 2015/02/20 00:43:45, Mark Seaborn wrote: > You could use edx from here to the end, and avoid the "mov". Done.
Message was sent while issue was closed.
Now that the rowhammer blog post is published (http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-t...), I can add some context: The reason for removing the non-temporal memory accesses here is just in case they can be used for row hammering. |