Chromium Code Reviews

Side by Side Diff: native_client_sdk/src/doc/reference/sandbox_internals/arm-32-bit-sandbox.rst

Issue 147803003: NaCl docs: add ARM 32-bit sandbox (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src
Patch Set: Created 6 years, 10 months ago
1 ==================
2 ARM 32-bit Sandbox
3 ==================
4
5 Native Client for ARM is a method for running programs---even malicious
6 ones---safely, on computers that use 32-bit ARM processors. It's an
7 extension of earlier work on Native Client for x86 processors. This
8 security is provided with a low performance overhead of about 10% over
9 regular ARM code, and as you'll see in this document the sandbox model
10 is beautifully simple, meaning that the trusted codebase is much easier
11 to validate.
12
13 As an implementation detail, the Native Client 32-bit ARM sandbox is
14 currently used by Portable Native Client to execute code on 32-bit ARM
15 machines in a safe manner. The portable bitcode contained in a **pexe**
16 is translated to a 32-bit ARM **nexe** before execution. This may change
17 in the future: Portable Native Client doesn't necessarily need this
18 sandbox to execute code on ARM. Note that the Portable Native Client
19 compiler itself is also untrusted: it too runs in the ARM sandbox
20 described in this document.
21
22 On this page, we describe how Native Client works on 32-bit ARM. We
23 assume no prior knowledge about the internals of Native Client, on x86
24 or any other architecture, but we do assume some familiarity with
25 assembly languages in general.
26
27 .. contents::
28 :local:
29 :backlinks: none
30 :depth: 3
31
32 An Introduction to the ARM Architecture
33 =======================================
34
35 In this section, we summarize the relevant parts of the ARM processor
36 architecture.
37
38 About ARM and ARMv7-A
39 ---------------------
40
41 ARM is one of the older commercial "RISC" processor designs, dating back
42 to the early 1980s. Today, it is used primarily in embedded systems:
43 everything from toys, to home automation, to automobiles. However, its
44 most visible use is in cellular phones, tablets and some
45 laptops.
46
47 Through the years, there have been many revisions of the ARM
48 architecture, written as ARMv\ *X* for some version *X*. Native Client
49 specifically targets the ARMv7-A architecture commonly used in high-end
50 phones and smartbooks. This revision, defined in the mid-2000s, adds a
51 number of useful instructions, and specifies some portions of the system
52 that used to be left to individual chip manufacturers. Critically,
53 ARMv7-A specifies the "eXecute Never" bit, or *XN*. This pagetable
54 attribute lets us mark memory as non-executable. Our security relies on
55 the presence of this feature.
56
57 ARMv8 adds a new 64-bit instruction set architecture called A64, while
58 also enhancing the 32-bit A32 ISA. For Native Client's purposes the A32
59 ISA is equivalent to the ARMv7 ARM ISA, albeit with a few new
60 instructions. This document only discusses the 32-bit A32 instruction
61 set: A64 would require a different sandboxing model.
62
63 ARM Programmer's Model
64 ----------------------
65
66 While modern ARM chips support several instruction encodings, 32-bit
67 Native Client on ARM focuses on a single one: a fixed-width encoding
68 where every instruction is 32 bits wide, called A32 (previously, and
69 confusingly, called simply ARM). Thumb, Thumb2 (now confusingly called
70 T32), Jazelle, ThumbEE and such aren't supported by Native Client. This
71 dramatically simplifies some of our analyses, as we'll see later. Nearly
72 every instruction can be conditionally executed based on the contents of
73 a dedicated condition code register.
74
75 ARM processors have 16 general-purpose registers used for integer and
76 memory operations, written ``r0`` through ``r15``. Of these, two have
77 special roles baked into the hardware:
78
79 * ``r14`` is the Link Register. The ARM *call* instruction
80 (*branch-with-link*) doesn't use the stack directly. Instead, it
81 stashes the return address in ``r14``. In other circumstances, ``r14``
82 can be (and is!) used as a general-purpose register. When ``r14`` is
83 playing its Link Register role, it's referred to as ``lr``.
84 * ``r15`` is the Program Counter. While it can be read and written like
85 any other register, setting it to a new value will cause execution to
86 jump to a new address. Using it in some circumstances is also
87 undefined by the ARM architecture. Because of this, ``r15`` is never
88 used for anything else, and is referred to as ``pc``.
89
90 Other registers are given roles by convention. The only important
91 registers to Native Client are ``r9`` and ``r13``, which are used as the
92 Thread Pointer location and Stack Pointer, respectively. When playing
93 these roles, they're referred to as ``tp`` and ``sp``.
94
95 Like other RISC-inspired designs, ARM programs use explicit *load* and
96 *store* instructions to access memory. All other instructions operate
97 only on registers, or on registers and small constants called
98 immediates. Because both instructions and data words are 32 bits wide, we
99 can't simply embed a 32-bit number into an instruction. ARM programs use
100 three methods to work around this, all of which Native Client exploits:
101
102 1. Many instructions can encode a modified immediate, which is an 8-bit
103 number rotated right by an even number of bits.
104 2. The ``movw`` and ``movt`` instructions set the bottom and top
105 16 bits of a register, respectively, and can therefore encode any 32-bit
106 immediate.
107 3. For values that can't be represented as modified immediates, ARM
108 programs use ``pc``-relative loads to load data from inside the
109 code. This data is hidden in "constant pools", placed where it won't
110 be executed, such as just past the final return of a function.
111
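To make the first two methods concrete, here is a small Python model. It is
only a sketch: the helper names ``ror32``, ``is_modified_immediate`` and
``movw_movt`` are ours, chosen for illustration, and are not part of any
Native Client tool.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_modified_immediate(value):
    """True if `value` is an 8-bit number rotated right by an even amount."""
    value &= 0xFFFFFFFF
    # Undo each even right-rotation by rotating right by (32 - rot),
    # which is the same as rotating left by rot.
    return any(ror32(value, 32 - rot) <= 0xFF for rot in range(0, 32, 2))


def movw_movt(value):
    """Split a 32-bit value into the halves movw (bottom) and movt (top) set."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


print(is_modified_immediate(0xC0000000))  # the sandbox mask fits
print(is_modified_immediate(0xDEADBEEF))  # an arbitrary word usually does not
print([hex(half) for half in movw_movt(0xDEADBEEF)])
```

Values that fail the first check are the ones that need either a
``movw``/``movt`` pair or a ``pc``-relative load.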
112 We'll introduce more details of the ARM instruction set later, as we
113 walk through the system.
114
115 The Native Client Approach
116 ==========================
117
118 Native Client runs an untrusted program, potentially from an unknown or
119 malicious source, inside a sandbox created by a trusted runtime. The
120 trusted runtime allows the untrusted program to "call-out" and perform
121 certain actions, such as drawing graphics, but prevents it from
122 accessing the operating system directly. This "call-out" facility,
123 called a trampoline, looks like a standard function call to the
124 untrusted program, but it allows control to escape from the sandbox in a
125 controlled way.
126
127 The untrusted program and trusted runtime inhabit the same process, or
128 virtual address space, maintained by the operating system. To keep the
129 trusted runtime behaving the way we expect, we must prevent the
130 untrusted program from accessing and modifying its internals. Since they
131 share a virtual address space, we can't rely on the operating system for
132 this. Instead, we isolate the untrusted program from the trusted
133 runtime.
134
135 Unlike modern operating systems, we use a cooperative isolation
136 method. Native Client can't run any off-the-shelf program compiled for
137 an off-the-shelf operating system. The program must be compiled to
138 comply with Native Client's rules. The details vary on each platform,
139 but in general, the untrusted program:
140
141 * Must not attempt to use certain forbidden instructions, such as system
142 calls.
143 * Must not attempt to modify its own code without abiding by Native
144 Client's code modification rules.
145 * Must not jump into the middle of an instruction group, or otherwise do
146 tricky things to cause instructions to be interpreted multiple ways.
147 * Must use special, strictly-defined instruction sequences to perform
148 permitted but potentially dangerous actions. We call these sequences
149 pseudo-instructions.
150
151 We can't simply take the program's word that it complies with these
152 rules---we call it "untrusted" for a reason! Nor do we require it to be
153 produced by a special compiler; in practice, we don't trust our
154 compilers either. Instead, we apply a load-time validator that
155 disassembles the program. The validator either proves that the program
156 complies with our rules, or rejects it as unsafe. By keeping the rules
157 simple, we keep the validator simple, small, and fast. We like to put
158 our trust in small, simple things, and the validator is key to the
159 system's security.
160
161 .. Note::
162 :class: note
163
164 For the computationally-inclined, all our validators scale linearly in
165 the size of the program.
166
167 NaCl/ARM: Pure Software Fault Isolation
168 ---------------------------------------
169
170 In the original Native Client system for the x86, we used unusual
171 hardware features of that processor (the segment registers) to isolate
172 untrusted programs. This was simple and fast, but won't work on ARM,
173 which has nothing equivalent. Instead, we use pure software fault
174 isolation.
175
176 We use a fixed address space layout: the untrusted program gets the
177 lowest gigabyte, addresses ``0`` through ``0x3FFFFFFF``. The rest of the
178 address space holds the trusted runtime and the operating system. We
179 isolate the program by requiring every *load*, *store*, and *indirect
180 branch* (to an address in a register) to use a pseudo-instruction. The
181 pseudo-instructions ensure that the address stays within the
182 sandbox. The *indirect branch* pseudo-instruction, in turn, ensures that
183 such branches won't split up other pseudo-instructions.
184
185 At either side of the sandbox, we place small (8KiB) guard
186 regions. These are simply areas in the process's address space that are
187 mapped without read, write, or execute permissions, so any attempt to
188 access them for any reason---*load*, *store*, or *jump*---will cause a
189 fault.
190
191 Finally, we ban the use of certain instructions, notably direct system
192 calls. This is to ensure that the untrusted program can be run on any
193 operating system supported by Native Client, and to prevent access to
194 certain system features that might be used to subvert the sandbox. As a
195 side effect, it helps to prevent programs from exploiting buggy
196 operating system APIs.
197
198 Let's walk through the details, starting with the simplest part: *load*
199 and *store*.
200
201 Loads and Stores
202 ^^^^^^^^^^^^^^^^
203
204 All access to memory must be through *load* and *store*
205 pseudo-instructions. These are simply a native *load* or *store*
206 instruction, preceded by a guard instruction.
207
208 Each *load* or *store* pseudo-instruction is similar to the *load* shown
209 below. We use abstract "placeholder" registers instead of specific
210 numbered registers for the sake of discussion. ``rA`` is the register
211 holding the address to load from. ``rD`` is the destination for the
212 loaded data.
213
214 .. naclcode::
215
216 bic rA, #0xC0000000
217 ldr rD, [rA]
218
219 The first instruction, ``bic``, clears the top two bits of ``rA``. In
220 this case, that means that the value in ``rA`` is forced to an address
221 inside our sandbox, between ``0`` and ``0x3FFFFFFF``, inclusive.
222
223 The second instruction, ``ldr``, uses the previously-sandboxed address
224 to load a value. This address might not be the address that the program
225 intended, and might cause an access to an unmapped memory location
226 within the sandbox: ``bic`` forces the address to be valid, by clearing
227 the top two bits. This is a no-op in a correct program.
228
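The effect of the guard can be modelled in a few lines of Python. This is a
sketch of the arithmetic only: ``bic`` and ``sandboxed_load_address`` are
illustrative names, not functions from the Native Client code base.

```python
SANDBOX_MASK = 0xC0000000  # the immediate used by the bic guard


def bic(value, mask):
    """Model of ARM bic: clear in `value` every bit that is set in `mask`."""
    return value & ~mask & 0xFFFFFFFF


def sandboxed_load_address(rA):
    """Address actually used by the ldr once the guard has run."""
    return bic(rA, SANDBOX_MASK)


# A valid address passes through unchanged (the guard is a no-op).
print(hex(sandboxed_load_address(0x12345678)))
# An address outside the sandbox is forced back inside it.
print(hex(sandboxed_load_address(0x80001000)))
```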
229 This illustrates a common property of all Native Client systems: we aim
230 for safety, not correctness. A program using an invalid address in
231 ``rA`` here is simply broken, so we are free to do whatever we want to
232 preserve safety. In this case the program might load an invalid (but
233 safe) value, or cause a segmentation fault limited to the untrusted
234 code.
235
236 Now, if we allowed arbitrary branches within the program, a malicious
237 program could set up carefully-crafted values in ``rA``, and then jump
238 straight to the ``ldr``. This is why we validate that programs never
239 split pseudo-instructions.
240
241 Alternative Sandboxing
binji 2014/02/06 21:23:39 It's not immediately clear to me if this sandboxin
JF 2014/02/06 23:26:36 It's currently disallowed because NaCl can't reall
242 """"""""""""""""""""""
243
244 .. naclcode::
245
246 tst rA, #0xC0000000
247 ldreq rD, [rA]
248
249 The first instruction, ``tst``, performs a bitwise-\ ``AND`` of ``rA`` and
250 the modified immediate literal, ``0xC0000000``. It sets the condition
251 flags based on the result, but does not write the result to a
252 register. In particular, it sets the ``Z`` condition flag if the result
253 was zero---if the two values had no set bits in common. In this case,
binji 2014/02/06 21:23:39 nit: "no one-bits" reads like "no one" bits to me.
JF 2014/02/06 23:26:36 Changed to "had no set bits in common".
254 that means that the value in ``rA`` was an address inside our sandbox,
255 between ``0`` and ``0x3FFFFFFF``, inclusive.
256
257 The second instruction, ``ldreq``, is a conditional load if equal. As we
258 mentioned before, nearly all ARM instructions can be made
259 conditional. In assembly language, we simply stick the desired condition
260 on the end of the instruction's mnemonic name. Here, the condition is
261 ``EQ``, which causes the instruction to execute only if the ``Z`` flag
262 is set.
263
264 Thus, when the pseudo-instruction executes, the ``tst`` sets ``Z`` if
265 (and only if) the value in ``rA`` is an address within the bounds of the
266 sandbox, and then the ``ldreq`` loads if (and only if) it was. If ``rA``
267 held an invalid address, the *load* does not execute, and ``rD`` is
268 unchanged.
269
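The difference from the ``bic``-based guard is easiest to see in a model.
The sketch below assumes a toy memory represented as a Python dictionary;
``tst_ldreq`` is a hypothetical helper name used only for this illustration.

```python
def tst_ldreq(rA, rD, memory):
    """Model of the tst/ldreq pair; returns the new value of rD.

    `memory` is a toy dict mapping addresses to words (an assumption of
    this sketch, not how a real processor or the validator sees memory).
    """
    z_flag = (rA & 0xC0000000) == 0  # tst: set Z if no top bits are set
    if z_flag:                       # ldreq: execute only when Z is set
        rD = memory.get(rA, 0)
    return rD


memory = {0x1000: 42}
print(tst_ldreq(0x1000, 7, memory))      # in bounds: the load happens
print(tst_ldreq(0x80001000, 7, memory))  # out of bounds: rD keeps its value
```

Unlike ``bic``, this sequence never rewrites ``rA``: an out-of-range address
simply suppresses the load.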
270 Addressing Modes
271 """"""""""""""""
272
273 ARM has an unusually rich set of addressing modes. We allow all but one:
274 register-indexed, where two registers are added to determine the
275 address.
276
277 We permit simple *load* and *store*, as shown above. We also permit
278 displacement, pre-index, and post-index memory operations:
279
280 .. naclcode::
281
282 bic rA, #0xC0000000
283 ldr rD, [rA, #1234] ; this is fine
284
285 bic rA, #0xC0000000
286 ldr rD, [rA, #1234]! ; also fine
287
288 bic rA, #0xC0000000
289 ldr rD, [rA], #1234 ; looking good
290
291 In each case, we know ``rA`` points into the sandbox when the ``ldr``
292 executes. We allow adding an immediate displacement to ``rA`` to
293 determine the final address (as in the first two examples here) because
294 the largest immediate displacement is ±4095 bytes, while our guard pages
295 are 8192 bytes wide.
296
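The bound can be checked with simple arithmetic. The sketch below models
addresses as plain signed integers relative to the bottom of the sandbox,
ignoring 32-bit wraparound; that simplification, and the names used, are
assumptions of this example.

```python
SANDBOX_SIZE = 0x40000000   # untrusted code owns addresses 0 .. 0x3FFFFFFF
GUARD_SIZE = 8192           # width of each guard region, in bytes
MAX_DISPLACEMENT = 4095     # largest immediate displacement in ldr/str


def worst_case_addresses():
    """Lowest and highest addresses reachable from a masked base register."""
    lowest = 0 - MAX_DISPLACEMENT
    highest = (SANDBOX_SIZE - 1) + MAX_DISPLACEMENT
    return lowest, highest


def lands_in_sandbox_or_guard(address):
    """True if the address is inside the sandbox or one of its guards."""
    return -GUARD_SIZE <= address < SANDBOX_SIZE + GUARD_SIZE


lowest, highest = worst_case_addresses()
print(lands_in_sandbox_or_guard(lowest), lands_in_sandbox_or_guard(highest))
```

Both worst cases overshoot the sandbox by at most 4095 bytes, which is less
than the 8192 bytes of guard, so the access either succeeds inside the
sandbox or faults in a guard region.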
297 We also allow ARM's more unusual *load* and *store* instructions, such
298 as *load-multiple* and *store-multiple*, etc.
299
300 Conditional *Load* and *Store*
301 """"""""""""""""""""""""""""""
302
303 There's one problem with the pseudo-instructions shown above: they are
304 unconditional (assuming ``rA`` is valid). ARM compilers regularly use
305 conditional *load* and *store*, so we should support this in Native
306 Client. We do so by defining alternate, predicable
binji 2014/02/06 21:23:39 sp: predictable
JF 2014/02/06 23:26:36 Done.
307 pseudo-instructions. Here is a conditional *store*
308 (store-if-greater-than) using this pseudo-instruction sequence:
309
310 .. naclcode::
311
312 bicgt rA, #0xC0000000
313 strgt rX, [rA, #123]
314
315 .. Note::
316 :class: note
317
318 The ``tst``-based sequence is faster than the ``bic``-based sequence
319 on modern ARM chips. It avoids a data dependency in the address
320 register. This is why we keep both around. The ``tst``-based sequence
321 unfortunately leaks information on some processors, and is therefore
322 forbidden in these cases.
binji 2014/02/06 21:23:39 This is a bit unclear. What happens if you try to
JF 2014/02/06 23:26:36 I moved this note up and clarified.
323
324 The Stack Pointer, Thread Pointer, and Program Counter
325 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
326
327 Stack Pointer
328 """""""""""""
329
330 In C-like languages, the stack is used to store return addresses during
331 function calls, as well as any local variables that won't fit in
332 registers. This makes stack operations very common.
333
334 Native Client does not require guard instructions on any *load* or
335 *store* involving the stack pointer, ``sp``. This improves performance
336 and reduces code size. However, ARM's stack pointer isn't special: it's
337 just another register, called ``sp`` only by convention. To make it safe
338 to use this register as a *load* or *store* address without guards, we
339 add a rule: ``sp`` must always contain a valid address.
340
341 We enforce this rule by restricting the sorts of operations that
342 programs can use to alter ``sp``. Programs can alter ``sp`` by adding or
343 subtracting an immediate, as a side-effect of a *load* or *store*:
344
345 .. naclcode::
346
347 ldr rX, [sp], #4 ; loads from stack, then adds 4 to sp
348
349 pop {rX} ; equivalent
350
351 str rX, [sp, #1234]! ; adds 1234 to sp, then stores to stack
352
353 These are safe because, as we mentioned before, the largest immediate
354 available in a *load* or *store* is ±4095. Even after adding or
355 subtracting 4095, the stack pointer will still be within the sandbox or
356 guard regions.
binji 2014/02/06 21:23:39 But what happens when you execute one of these sp-
JF 2014/02/06 23:26:36 Correct, but the key here is that you also perform
357
358 Any other operation that alters ``sp`` must be followed by a guard
359 instruction. The most common alterations, in practice, are addition and
360 subtraction of arbitrary integers:
361
362 .. naclcode::
363
364 add sp, rX
365 bic sp, #0xC0000000
366
367 The ``bic`` is similar to the one we used for conditional *load* and
368 *store*, and serves exactly the same purpose: after it completes, ``sp``
369 is a valid address.
370
371 .. Note::
372 :class: note
373
374 Clever assembly programmers and compilers may want to use this
375 "trusted" property of ``sp`` to emit more efficient code: in a hot
376 loop instead of using ``sp`` as a stack pointer it can be temporarily
377 used as an index pointer (e.g. to traverse an array). This avoids the
378 extra ``bic`` whenever the pointer is updated in the loop.
379
380 Thread Pointer Loads
381 """"""""""""""""""""
382
383 The thread pointer and IRT thread pointer are stored in the trusted
384 address space. All uses and definitions of ``r9`` from untrusted code
385 are forbidden except as follows:
386
387 .. naclcode::
388
389 ldr Rn, [r9] ; load user thread pointer.
binji 2014/02/06 21:23:39 load use?
JF 2014/02/06 23:26:36 "user", already fixed.
390 ldr Rn, [r9, #4] ; load IRT thread pointer.
391
392 ``pc``-relative Loads
393 """""""""""""""""""""
394
395 By extension, we also allow *load* through the ``pc`` without a
396 mask. The explanation is quite similar:
397
398 * Our control-flow isolation rules mean that the ``pc`` will always
399 point into the sandbox.
400 * The maximum immediate displacement that can be used in a
401 ``pc``-relative *load* is smaller than the width of the guard pages.
402
403 We do not allow ``pc``-relative stores, because they look suspiciously
404 like self-modifying code. Nor do we allow any addressing mode that would
405 alter the ``pc`` as a side effect of the *load*.
406
407 Indirect Branches
408 ^^^^^^^^^^^^^^^^^
409
410 There are two types of control flow on ARM: direct and indirect. Direct
411 control flow instructions have an embedded target address or
412 offset. Indirect control flow instructions take their destination
413 address from a register. The ``b`` (*branch*) and ``bl``
binji 2014/02/06 21:23:39 (*branch*)? What is the (* *) syntax doing anyway?
JF 2014/02/06 23:26:36 The parens are just part of the text, no ReST mean
414 (*branch-with-link*) instructions are *direct branch* and *call*,
415 respectively. The ``bx`` (*branch-exchange*) and ``blx``
416 (*branch-with-link-exchange*) are the indirect equivalents.
417
418 Because the program counter ``pc`` is simply another register, ARM also
419 has many implicit indirect control flow instructions. Programs can
420 operate on the ``pc`` using *add* or *load*, or even outlandish (and
421 often specified as having unpredictable-behavior) things like multiply!
422 In Native Client we ban all such instructions. Indirect control flow is
423 exclusively through ``bx`` and ``blx``. Because all of ARM's control
424 flow instructions are called *branch* instructions, we'll use the term
425 *indirect branch* from here on, even though this includes things like
426 *virtual call*, *return*, and the like.
427
428 The Trouble with Indirection
429 """"""""""""""""""""""""""""
430
431 *Indirect branches* present two problems for Native Client:
432
433 * We must ensure that they don't send execution outside the sandbox.
434 * We must ensure that they don't break up the instructions inside a
435 pseudo-instruction, by landing on the second one.
436
437 .. Note::
438 :class: note
439
440 On the x86 architectures we must also ensure that they don't land
441 inside an instruction. This is unnecessary on ARM, where all
442 instructions are 32 bits wide.
443
444 Checking both of these for *direct branch* is easy: the validator just
445 pulls the (fixed) target address out of the instruction and checks what
446 it points to.
447
448 The Native Client Solution: "Bundles"
449 """""""""""""""""""""""""""""""""""""
450
451 For *indirect branch*, we can address the first problem by simply
452 masking some high-order bits off the address, like we did for *load* and
453 *store*. The second problem is more subtle. Detecting every possible
454 route that every *indirect branch* might take is difficult. Instead, we
455 take the approach pioneered by the original Native Client: we restrict
456 the possible places that any *indirect branch* can land. On Native
457 Client for ARM, *indirect branch* can target any address that has its
458 bottom four bits clear---any address that's ``0 mod 16``. We call these
459 16-byte chunks of code "bundles". The validator makes sure that no
460 pseudo-instruction straddles a bundle boundary. Compilers must pad with
461 ``nop``\ s to ensure that every pseudo-instruction fits entirely inside
462 one bundle.
463
464 Here is the *indirect branch* pseudo-instruction. As you can see, it
465 clears the top two and bottom four bits of the address:
466
467 .. naclcode::
468
469 bic rA, #0xC000000F
470 bx rA
471
472 This particular pseudo-instruction (a ``bic`` followed by a ``bx``) is
473 used for computed jumps in switch tables and returning from functions,
474 among other uses. Recall that, under ARM's modified immediate rules, we
475 can fit the constant ``0xC000000F`` into the ``bic`` instruction's
476 immediate field: ``0xC000000F`` is the 8-bit constant ``0xFC``, rotated
477 right by 4 bits.
478
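As a quick check of the mask arithmetic, the following Python sketch models
what the ``bic`` does to a branch target. The function name
``sandboxed_branch_target`` is ours and purely illustrative.

```python
BRANCH_MASK = 0xC000000F  # top two bits and bottom four bits
BUNDLE_SIZE = 16


def sandboxed_branch_target(rA):
    """Address that bx/blx jumps to after the bic guard has run."""
    return rA & ~BRANCH_MASK & 0xFFFFFFFF


for candidate in (0x00012345, 0xFFFFFFFF, 0x00020000):
    target = sandboxed_branch_target(candidate)
    # Every result is inside the sandbox and starts a bundle.
    print(hex(candidate), "->", hex(target), target % BUNDLE_SIZE == 0)
```

Whatever value the program puts in ``rA``, the resulting target is below
``0x40000000`` and is ``0 mod 16``.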
479 The other useful variant is the *indirect branch-with-link*, which is
480 the ARM equivalent to *call*:
481
482 .. naclcode::
483
484 bic rA, #0xC000000F
485 blx rA
486
487 This is used for indirect function calls---commonly seen in C++ programs
488 as virtual calls, but also for calling function pointers in C.
489
490 Note that both *indirect branch* pseudo-instructions use ``bic``, rather
491 than the ``tst`` instruction we allow for *load* and *store*. There are
492 two reasons for this:
493
494 1. Conditional *branch* is very common. Much more common than
495 conditional *load* and *store*. If we supported an alternative
496 ``tst``-based sequence for *branch*, it would be rare.
497 2. There's no performance benefit to using ``tst`` here on modern ARM
498 chips. *Branch* consumes its operands later in the pipeline than
499 *load* and *store* (since they don't have to generate an address,
500 etc) so this sequence doesn't stall.
501
502 .. Note::
503 :class: note
504
505 At this point astute readers are wondering what the ``x`` in ``bx``
506 and ``blx`` means. We told you it stood for "exchange", but exchange
507 to what? ARM, for all the reduced-ness of its instruction set, can
508 change execution mode from A32 (ARM) to T32 (Thumb) and back with
509 these *branch* instructions, called *interworking branch*. Recall that
510 A32 instructions are 32 bits wide, and T32 instructions are a mix of
511 16-bit and 32-bit instructions. The destination address given to a
512 *branch* therefore cannot sensibly have its bottom bit set in either
513 instruction set: that would be an unaligned instruction in both cases,
514 and ARM simply doesn't support this. The bottom bit for the *indirect
515 branch* was therefore cleverly recycled by the ARM architecture to
516 mean "switch to T32 mode" when set!
517
518 As you've figured out by now, Native Client's sandbox wouldn't be very
519 happy if A32 instructions were to be executed as T32 instructions: who
520 knows what they correspond to? A malicious person could craft valid
521 A32 code that's actually very naughty T32 code, somewhat like forming
522 a sentence that happens to be valid in English and French but with
523 completely different meanings, complimenting the reader in one
524 language and insulting them in the other.
binji 2014/02/06 21:23:39 I think I need an example of this sentence. :)
JF 2014/02/06 23:26:36 I know right!!! I actually have a G+ post about th
525
526 You've figured out by now that the bundle alignment restrictions of
527 the Native Client sandbox already take care of making this travesty
528 impossible: by masking off the bottom 4 bits of the destination the
529 interworking nature of ARM's *indirect branch* is completely avoided.
530
531 *Call* and *Return*
532 """""""""""""""""""
533
534 On ARM, there is no *call* or *return* instruction. A *call* is simply a
535 *branch* that just happens to load a return address into ``lr``, the link
536 register. If the called function is a leaf (that is, if it calls no
537 other functions before returning), it simply branches to the address
538 stored in ``lr``:
binji 2014/02/06 21:23:39 This is describing returning, correct? The way I r
JF 2014/02/06 23:26:36 Yes, I clarified.
539
540 .. naclcode::
541
542 bic lr, #0xC000000F
543 bx lr
544
545 If the function called other functions, however, it had to spill ``lr``
546 onto the stack. On x86, this is done implicitly, but it is explicit on
547 ARM:
548
549 .. naclcode::
550
551 push { lr }
552 ; ... some code here ...
553 pop { lr }
554 bic lr, #0xC000000F
555 bx lr
556
557 There are two things to note about this code.
558
559 1. As we mentioned before, we don't allow arbitrary instructions to
560 write to the Program Counter, ``pc``. Thus, while a traditional ARM
561 program might have popped directly into ``pc`` to end the function,
562 we require a pop into a register, followed by a pseudo-instruction.
563 2. Function returns really are just *indirect branch*, with the same
564 restrictions. This means that functions can only return to addresses
565 that are bundle-aligned: ``0 mod 16``.
566
567 The implication here is that a *call*\ ---the *branch* that enters
binji 2014/02/06 21:23:39 s/enter/enters/?
JF 2014/02/06 23:26:36 Done.
568 functions---must be placed at the end of the bundle, so that the return
569 address they generate is ``0 mod 16``. Otherwise, when we clear the
570 bottom four bits, the program would enter an infinite loop! (Native
571 Client doesn't try to prevent infinite loops, but the validator actually
572 does check the alignment of calls. This is because, when we were writing
573 the compiler, it was annoying to find out our calls were in the wrong
574 place by having the program run forever!)
binji 2014/02/06 21:23:39 "!)." looks weird to me. Isn't it supposed to be j
JF 2014/02/06 23:26:36 Done.
575
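The placement rule follows from the mask arithmetic, as this Python sketch
shows. The addresses are made up for the example, and ``return_address`` and
``masked_return`` are illustrative names.

```python
RETURN_MASK = 0xC000000F  # same mask as the indirect branch guard


def return_address(call_address):
    """bl/blx put the address of the next instruction in lr."""
    return call_address + 4


def masked_return(call_address):
    """Where `bic lr, #0xC000000F; bx lr` actually returns to."""
    return return_address(call_address) & ~RETURN_MASK & 0xFFFFFFFF


# Call in the last slot of the bundle at 0x10000: returns to the next bundle.
print(hex(masked_return(0x1000C)))
# Call in the second slot: the masked return lands back at 0x10000,
# before the call itself, so the call would run again.
print(hex(masked_return(0x10004)))
```

Only a call in the last slot of a bundle produces a return address that
survives masking unchanged.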
576 .. Note::
577 :class: note
578
579 Properly balancing the CPU's *call*/*return* actually allows it to
580 perform much better by allowing it to speculatively execute the return
581 address' code. For more information on ARM's *call*/*return* stack see
binji 2014/02/06 21:23:39 I've usually seen address's, but I defer to Andy o
JF 2014/02/06 23:26:36 Leaving as-is for now.
582 ARM's technical reference manual.
583
584 Literal Pools and Data Bundles
585 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
586
587 In the section where we described the ARM architecture, we mentioned
588 ARM's unusual immediate forms. To restate:
589
590 * ARM instructions are fixed-length, 32 bits wide, so we can't have an
591 instruction that includes an arbitrary 32-bit constant.
592 * Many ARM instructions can include a modified immediate constant, which
593 is flexible, but limited.
594 * For any other value (particularly addresses), ARM programs explicitly
595 load constants from inside the code itself.
596
597 .. Note::
598 :class: note
599
600 ARMv7 introduces some instructions, ``movw`` and ``movt``, that try to
601 address this by letting us directly load larger constants. Our
602 toolchain uses this capability in some cases.
603
604 Here's a typical example of the use of a literal pool. ARM assemblers
605 typically hide the details---this is the sort of code you'd see produced
606 by a disassembler, but with more comments.
607
608 .. naclcode::
609
610 ; C equivalent: "table[3] = 4"
611 ; 'table' is a static array of bytes.
612 ldr r0, [pc, #124] ; load the address of the 'table'
613 ; ("124" is the offset from here to the constant below)
614 add r0, #3 ; add the immediate array index
615 mov r1, #4 ; get the constant '4' into a register
616 bic r0, #0xC0000000 ; mask our array address
617 strb r1, [r0] ; store one byte
618
619 ...
620
621 .word table ; constant referenced above
622
623 Because table is a static array, the compiler knew its address at
624 compile-time---but the address didn't fit in a modified immediate. (Most
625 don't.) So, instead of loading an immediate into ``r0`` with a ``mov``,
626 we stashed the address in the code, generated its address using ``pc``,
627 and loaded the constant. ARM compilers will typically group all the
628 embedded data together into a literal pool. These typically live just
629 past the end of functions, where they won't be executed.
630
631 This is an important trick in ARM code, so it's important to support it
632 in Native Client... but there's a potential flaw. If we let programs
633 contain arbitrary data, mingled in with the code, couldn't they hide
634 malicious instructions this way?
635
636 The answer is no, because the validator disassembles the entire
637 executable region of the program, without regard to whether the
638 programmer said a certain chunk was code or data. But this brings the
639 opposite problem: what if the program needs to contain a certain
640 constant that just happens to encode a malicious instruction? We want
641 to allow this, but we have to be certain it will never be executed as
642 code!
643
644 Data Bundles to the Rescue
645 """"""""""""""""""""""""""
646
647 As we discussed in the last section, ARM code in Native Client is
648 structured in 16-byte bundles. We allow literal pools by putting them in
649 special bundles, called data bundles. Each data bundle can contain 12
650 bytes of arbitrary data, and the program can have as many data bundles
651 as it likes.
652
653 Each data bundle starts with a breakpoint instruction, ``bkpt``. This
654 way, if an *indirect branch* tries to enter the data bundle, the process
655 will take a fault and the trusted runtime will intervene (by terminating
656 the program). For example:
657
658 .. naclcode::
659
660 bkpt #0x5BE0 ; must be aligned 0 mod 16!
661 .word 0xDEADBEEF ; arbitrary constants are A-ok
662 svc #30 ; trying to make a syscall? ok!
663 str r0, [r1] ; unmasked stores are fine too
664
665 So, we have a way for programs to create an arbitrary, even dangerous,
666 chunk of data within their code. We can prevent an *indirect branch* from
667 entering it, and the leading ``bkpt`` also stops fall-through from the
668 code just before it. But what about a *direct branch* straight into the
669 middle?
670
671 The validator detects all data bundles (because this ``bkpt`` has a
672 special encoding) and marks them as off-limits for *direct branch*. If
673 it finds a *direct branch* into a data bundle, the entire program is
674 rejected as unsafe. Because *direct branch* cannot be modified at
675 runtime, the data bundles cannot be executed.
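
The two-step check described above (find the data bundles, then reject
any *direct branch* that lands in one) can be modelled in a few lines of
Python. This is only an illustrative sketch, not the code in
``validator.cc``: the function names are invented for this example, it
assumes the whole data bundle (including its leading ``bkpt``) is
off-limits, and it only recognizes the A32 ``b``/``bl`` encodings.

.. naclcode::

  BUNDLE_BYTES = 16
  DATA_BUNDLE_MARKER = 0xE125BE70  # A32 encoding of bkpt #0x5BE0

  def data_bundle_addresses(words, base=0):
      """Start addresses of bundles whose first word is the marker.

      `words` is a list of 32-bit instruction words; `base` is assumed
      to be 16-byte aligned.
      """
      per_bundle = BUNDLE_BYTES // 4
      return {
          base + 4 * i
          for i in range(0, len(words), per_bundle)
          if words[i] == DATA_BUNDLE_MARKER
      }

  def direct_branch_target(word, address):
      """Target of an A32 B/BL instruction, or None for anything else."""
      is_b_or_bl = (word & 0x0E000000) == 0x0A000000
      if not is_b_or_bl or (word >> 28) == 0xF:  # cond 1111 is BLX (immediate)
          return None
      offset = (word & 0x00FFFFFF) << 2
      if offset & 0x02000000:  # sign-extend the 26-bit byte offset
          offset -= 0x04000000
      return address + 8 + offset  # pc reads as instruction address + 8

  def branches_into_data(words, base=0):
      """List (branch address, target) pairs that land in a data bundle."""
      data = data_bundle_addresses(words, base)
      bad = []
      for i, word in enumerate(words):
          address = base + 4 * i
          if (address & ~(BUNDLE_BYTES - 1)) in data:
              continue  # contents of data bundles are not treated as code
          target = direct_branch_target(word, address)
          if target is not None and (target & ~(BUNDLE_BYTES - 1)) in data:
              bad.append((address, target))
      return bad

In this model a program is acceptable only if ``branches_into_data``
returns an empty list. A real validator has many more rules to enforce
on the same pass.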
676
677 .. Note::
678 :class: note
679
680 Clever readers may wonder: why use ``bkpt #0x5BE0``, that seems
681 awfully specific when you just need a special "roadblock" instruction!
682 Quite true, young Padawan! It happens that this odd ``bkpt``
683 instruction is encoded as ``0xE125BE70`` in A32, and in T32 the
684 ``bkpt`` instruction is encoded as ``0xBExx`` (where ``xx`` could be
685 any 8-bit immediate, say ``0x70``) and ``0xE125`` encodes the *branch*
686 instruction ``b.n #0x250``. The special roadblock instruction
687 therefore doubles as a roadblock in T32, if anything were to go so
688 awry that we tried to execute it as a T32 instruction! Much defense,
689 such depth, wow!
690
691 Trampolines and Memory Layout
692 -----------------------------
693
694 So far, the rules we've described make for boring programs: they can't
695 communicate with the outside world!
696
697 * The program can't call an external library, or the operating system,
698 even to do something simple like draw some pixels on the screen.
699 * It also can't read or write memory outside of its dedicated sandbox,
700 so communicating that way is right out.
701
702 We fix this by allowing the untrusted program to call into the trusted
703 runtime using a trampoline. A trampoline is simply a short stretch of
704 code, placed by the trusted runtime at a known location within the
705 sandbox, that is permitted to do things the untrusted program can't.
706
707 Even though trampolines are inside the sandbox, the untrusted program
708 can't modify them: the trusted runtime marks them read-only. It also
709 can't do anything clever with the special instructions inside the
710 trampoline---for example, call it at a slightly offset address to bypass
711 some checks---because the validator only allows trampolines to be
712 reached by *indirect branch* (or *branch-with-link*). We structure the
713 trampolines carefully so that they're safe to enter at any ``0 mod 16``
714 address.
715
716 The validator can detect attempts to use the trampolines because they're
717 loaded at a fixed location in memory. Let's look at the memory map of
718 the Native Client sandbox.
719
720 Memory Map
721 ^^^^^^^^^^
722
723 The ARM sandbox is always at virtual address ``0``, and is exactly 1GiB
724 in size. This includes the untrusted program's code and data, the
725 trampolines, and a small guard region to detect null pointer
726 dereferences. In practice, the untrusted program takes up a bit more
727 room than this, because of the need for additional guard regions at
728 either end of the sandbox.
729
730 +----------------+-------+-------------------+--------------------------------------------------------------------+
731 | Address        | Size  | Name              | Purpose                                                            |
732 +================+=======+===================+====================================================================+
733 | ``-0x2000``    | 8KiB  | Bottom Guard      | Keeps negative-displacement *load* or *store* from escaping.       |
734 +----------------+-------+-------------------+--------------------------------------------------------------------+
735 | ``0``          | 64KiB | Null Guard        | Catches null pointer dereferences, guards against kernel exploits. |
736 +----------------+-------+-------------------+--------------------------------------------------------------------+
737 | ``0x10000``    | 64KiB | Trampolines       | Up to 2048 unique syscall entry points.                            |
738 +----------------+-------+-------------------+--------------------------------------------------------------------+
739 | ``0x20000``    | ~1GiB | Untrusted Sandbox | Contains untrusted code, followed by its heap/stack/memory.        |
740 +----------------+-------+-------------------+--------------------------------------------------------------------+
741 | ``0x40000000`` | 8KiB  | Top Guard         | Keeps positive-displacement *load* or *store* from escaping.       |
742 +----------------+-------+-------------------+--------------------------------------------------------------------+
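
To see how the guard regions and address masking fit together, here is
a small Python model of the table above. It is a sketch under stated
assumptions: the names are invented for illustration, the negative
address of the bottom guard is kept as a negative integer rather than
wrapped to 32 bits, and the largest displacement is assumed to be the
12-bit immediate offset (4095 bytes) of an A32 ``ldr``/``str``.

.. naclcode::

  REGIONS = [  # (start, size, name), copied from the memory map
      (-0x2000, 0x2000, "Bottom Guard"),
      (0x0, 0x10000, "Null Guard"),
      (0x10000, 0x10000, "Trampolines"),
      (0x20000, 0x40000000 - 0x20000, "Untrusted Sandbox"),
      (0x40000000, 0x2000, "Top Guard"),
  ]

  MAX_DISPLACEMENT = 4095  # assumption: 12-bit immediate offset

  def region_of(address):
      """Name of the region containing `address`, or None if outside."""
      for start, size, name in REGIONS:
          if start <= address < start + size:
              return name
      return None

  def mask_data_address(address):
      """Model of ``bic rX, #0xC0000000`` on a 32-bit register."""
      return address & 0xFFFFFFFF & ~0xC0000000

  def worst_case_regions():
      """Regions reached from the lowest and highest masked addresses."""
      lowest = mask_data_address(0x00000000) - MAX_DISPLACEMENT
      highest = mask_data_address(0xFFFFFFFF) + MAX_DISPLACEMENT
      return region_of(lowest), region_of(highest)

Under these assumptions, ``worst_case_regions()`` lands in the bottom
and top guards respectively, which is the job those two regions do: a
masked base address plus a displacement never reaches memory outside
the sandbox and its guards.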
743
744 Within the trampolines, the untrusted program can call any address
745 that's ``0 mod 16``. However, only even slots are used, so useful
746 trampolines are always ``0 mod 32``. If the program calls an odd slot,
747 it will fault, and the trusted runtime will shut it down.
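
The slot arithmetic is easy to check by hand. The following Python
sketch uses the trampoline base and size from the memory map; the
helper names are hypothetical and exist only for this illustration.

.. naclcode::

  TRAMPOLINE_BASE = 0x10000  # from the memory map
  TRAMPOLINE_SIZE = 0x10000  # 64KiB
  BUNDLE_BYTES = 16
  SLOT_BYTES = 2 * BUNDLE_BYTES  # an even bundle plus the odd one after it

  def trampoline_count():
      """Number of useful (even-slot) trampolines that fit in the region."""
      return TRAMPOLINE_SIZE // SLOT_BYTES

  def trampoline_address(index):
      """Entry address of the useful trampoline with the given index."""
      if not 0 <= index < trampoline_count():
          raise ValueError("no such trampoline")
      return TRAMPOLINE_BASE + index * SLOT_BYTES

  def is_useful_entry(address):
      """True if `address` is a 0 mod 32 address inside the trampolines."""
      in_region = TRAMPOLINE_BASE <= address < TRAMPOLINE_BASE + TRAMPOLINE_SIZE
      return in_region and address % SLOT_BYTES == 0

``trampoline_count()`` gives the 2048 entry points listed in the table,
and an address such as ``0x10010`` is ``0 mod 16`` but not a useful
entry, because it falls on an odd slot.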
748
749 .. Note::
750 :class: note
751
752 This is a bit of speculative flexibility. While the current bundle
753 size of Native Client on ARM is 16 bytes, we've considered the
754 possibility of optional 32-byte bundles, to enable certain compiler
755 improvements. While this option isn't available to untrusted programs
756 today, we're trying to keep the system "32-byte clean".
757
758 Inside a Trampoline
759 ^^^^^^^^^^^^^^^^^^^
760
761 When we introduced trampolines, we mentioned that they can do things
762 that untrusted programs can't. To be more specific, trampolines can jump
763 to locations outside the sandbox. On ARM, this is all they do. Here's a
764 typical trampoline fragment on ARM:
765
766 .. naclcode::
767
768 ; Even trampoline bundle:
769 push { r0-r3 } ; Save arguments that may be in registers.
770 push { lr } ; Save the untrusted return address.
771 ; (This is a separate step because it must be on top.)
772 ldr r0, [pc, #4] ; Load the destination address from the next bundle.
773 blx r0 ; Go!
774
775 ; The odd trampoline that immediately follows:
776 bkpt #0x5BE0 ; Prevent entry to this data bundle.
777 .word address_of_routine
778
779 The only odd thing here is that we push the incoming value of ``lr``,
780 and then use ``blx``---not ``bx``---to escape the sandbox. This is
781 because, in practice, all trampolines jump to the same routine in the
782 trusted runtime, called the syscall hook. It uses the return address
783 produced by the final ``blx`` instruction to determine which trampoline
784 was called.
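
As a rough illustration of why the return address is enough, the
Python sketch below recovers a trampoline index from the value that
``blx`` leaves in ``lr``. It assumes the layout shown in the fragment
above (``blx`` is the fourth instruction of the even bundle) and the
trampoline base from the memory map. How the real syscall hook does
this is not described here, so the function names and the arithmetic
are assumptions made for this example.

.. naclcode::

  TRAMPOLINE_BASE = 0x10000  # from the memory map
  SLOT_BYTES = 32            # even bundle plus the odd bundle after it
  BLX_OFFSET = 12            # blx is the fourth 4-byte instruction

  def return_address_for(index):
      """Value blx leaves in lr: the address of the instruction after it."""
      return TRAMPOLINE_BASE + index * SLOT_BYTES + BLX_OFFSET + 4

  def trampoline_index(return_address):
      """Which trampoline produced this return address."""
      return (return_address - TRAMPOLINE_BASE) // SLOT_BYTES

Because every trampoline has a distinct address, every ``blx`` produces
a distinct return address, so a single shared routine can tell the
trampolines apart.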
785
786 Loose Ends
787 ----------
788
789 Forbidden Instructions
790 ^^^^^^^^^^^^^^^^^^^^^^
791
792 To complete the sandbox, the validator ensures that the program does not
793 try to use certain forbidden instructions.
794
795 * We forbid instructions that directly interact with the operating
796 system by going around the trusted runtime. We prevent this to limit
797 the functionality of the untrusted program, and to ensure portability
798 across operating systems.
799 * We forbid instructions that change the processor's execution mode to
800 Thumb, ThumbEE, or Jazelle. This would cause the code to be
801 interpreted differently than the validator's original 32-bit ARM
802 disassembly, so the validator results might be invalidated.
803 * We forbid instructions that aren't available to user code (i.e. that
804 can only be used by an operating system kernel). This is purely out of
805 paranoia, because the hardware should prevent the instructions from
806 working. Essentially, we consider it "suspicious" if a program
807 contains these instructions---it might be trying to exploit a hardware
808 bug.
809 * We forbid instructions, or variants of instructions, that are
810 implementation-defined ("unpredictable") or deprecated in the ARMv7-A
811 architecture manual.
812 * Finally, we forbid a small number of instructions, such as ``setend``,
813 purely out of paranoia. It's easier to loosen the validator's
814 restrictions than to tighten them, so we err on the side of rejecting
815 safe instructions.
816
817 If an instruction can't be decoded at all within the ARMv7-A instruction
818 set specification, it is forbidden.
819
820 .. Note::
821 :class: note
822
823 Here is a list of instructions currently forbidden for security
824 reasons (that is, excluding deprecated or undefined instructions):
825
826 * ``BLX`` (immediate): always changes to Thumb mode.
827 * ``BXJ``: always changes to Jazelle mode.
828 * ``CPS``: not available to user code.
829 * ``LDM``, exception return version: not available to user code.
830 * ``LDM``, kernel version: not available to user code.
831 * ``LDR*T`` (unprivileged load operations): theoretically harmless,
832 but suspicious when found in user code. Use ``LDR`` instead.
833 * ``MSR``, kernel version: not available to user code.
834 * ``RFE``: not available to user code.
835 * ``SETEND``: theoretically harmless, but suspicious when found in
836 user code. May make some future validator extensions difficult.
837 * ``SMC``: not available to user code.
838 * ``SRS``: not available to user code.
839 * ``STM``, kernel version: not available to user code.
840 * ``STR*T`` (unprivileged store operations): theoretically harmless,
841 but suspicious when found in user code. Use ``STR`` instead.
842 * ``SVC``/``SWI``: allows direct operating system interaction.
843 * Any unassigned hint instruction: difficult to reason about, so
844 treated as suspicious.
845
846 More details are available in the `ARMv7 instruction table definition
847 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table>`_.
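
A few of the entries in this list can be recognized with simple bit
masks on the 32-bit instruction word. The Python sketch below shows
the idea for four of them. It is deliberately incomplete and is not how
the real validator is organized (that is driven by the instruction
table linked above); the function name is hypothetical, and the masks
are the A32 encodings as we understand them from the ARMv7-A manual.

.. naclcode::

  def forbidden_reason(word):
      """Why a 32-bit A32 instruction word is rejected, or None.

      Only covers BLX (immediate), BXJ, SETEND and SVC/SWI.
      """
      cond = word >> 28
      if (word & 0xFE000000) == 0xFA000000:
          return "BLX (immediate): always changes to Thumb mode"
      if (word & 0x0FFFFFF0) == 0x012FFF20 and cond != 0xF:
          return "BXJ: always changes to Jazelle mode"
      if (word & 0xFFFFFDFF) == 0xF1010000:  # bit 9 selects the endianness
          return "SETEND: suspicious in user code"
      if (word & 0x0F000000) == 0x0F000000 and cond != 0xF:
          return "SVC/SWI: allows direct operating system interaction"
      return None

For example, the ``svc #30`` from the data bundle example encodes as
``0xEF00001E`` and is rejected by the last test, while an ordinary
``bx lr`` (``0xE12FFF1E``) passes all four.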
848
849 Coprocessors
850 ^^^^^^^^^^^^
851
852 ARM has traditionally added new instruction set features through
853 coprocessors. Coprocessors are accessed through a small set of
854 instructions, and often have their own register files. Floating point
855 and the NEON vector extensions are both implemented as coprocessors, as
856 is the MMU.
857
858 We're confident that the side-effects of coprocessors in slots 10 and 11
859 (that is, floating point, NEON, etc.) are well-understood. These are in
860 the coprocessor space reserved by ARM Ltd. for their own extensions
861 (``CP8``--\ ``CP15``), and are unlikely to change significantly. So, we
862 allow untrusted code to use coprocessors 10 and 11, and we mandate the
863 presence of at least VFPv3 and NEON/AdvancedSIMD. Multiprocessor
864 Extension, VFPv4, FP16 and other extensions are allowed but not
865 required, and may fail on processors that do not support them. It is
866 therefore the program's responsibility to check that they are available
867 before executing them.
868
869 We don't allow access to any other ARM-reserved coprocessor
870 (``CP8``--\ ``CP9`` or ``CP12``--\ ``CP15``). It's possible that read
871 access to ``CP15`` might be useful, and we might allow it in the
872 future---but again, it's easier to loosen the restrictions than tighten
873 them, so we ban it for now.
874
875 We do not, and probably never will, allow access to the vendor-specific
876 coprocessor space, ``CP0``--\ ``CP7``. We're simply not confident in our
877 ability to model the operations on these coprocessors, given that
878 vendors often leave them poorly-specified. Unfortunately this eliminates
879 some legacy floating point and vector implementations, but these are
880 superseded on ARMv7-A parts anyway.
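
The coprocessor policy above boils down to reading one 4-bit field.
The Python sketch below models it for the generic A32 coprocessor
encodings (``cdp``, ``mcr``/``mrc``, ``ldc``/``stc`` and friends),
where the coprocessor number sits in bits 11:8. The function names are
invented for this example, and the classification by bits 27:24 is a
simplification of the full decode.

.. naclcode::

  ALLOWED_COPROCESSORS = {10, 11}  # floating point and NEON

  def coprocessor_number(word):
      """Coprocessor number of an A32 coprocessor instruction, else None."""
      top = (word >> 24) & 0xF
      # Bits 27:25 == 110 covers ldc/stc-style forms;
      # bits 27:24 == 1110 covers cdp/mcr/mrc-style forms.
      is_coprocessor = (top >> 1) == 0b110 or top == 0b1110
      return (word >> 8) & 0xF if is_coprocessor else None

  def coprocessor_allowed(word):
      """True unless the word addresses a coprocessor other than 10 or 11."""
      number = coprocessor_number(word)
      return number is None or number in ALLOWED_COPROCESSORS

With this model, ``vadd.f32 s0, s0, s1`` (``0xEE300A20``, coprocessor
10) is allowed, while ``mrc p15, 0, r0, c13, c0, 3`` (``0xEE1D0F70``)
is rejected because it addresses ``CP15``.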
881
882 Validator Code
883 ^^^^^^^^^^^^^^
884
885 By now you're itching to see the sandbox validator's code and dissect
886 it. You'll have a disappointing read: at fewer than 500 lines of code,
887 `validator.cc
888 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/validator.cc>`_
889 is quite simple to understand and much shorter than this document. It's
890 of course dependent on the `ARMv7 instruction table definition
891 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table>`_,
892 which teaches it about the ARMv7 instruction set.