Chromium Code Reviews

| OLD | NEW |
|---|---|
| (Empty) | |
| 1 ================== | |
| 2 ARM 32-bit Sandbox | |
| 3 ================== | |
| 4 | |
| 5 Native Client for ARM is a method for running programs---even malicious | |
|
Andy
2014/02/07 19:41:13
"method" --> "sandboxing technology"
JF
2014/02/07 21:15:14
Done.
| |
| 6 ones---safely, on computers that use 32-bit ARM processors. It's an | |
|
Andy
2014/02/07 19:41:13
"It's" --> "The ARM sandbox is"
JF
2014/02/07 21:15:14
Done.
| |
| 7 extension of earlier work on Native Client for x86 processors. This | |
|
Andy
2014/02/07 19:41:13
"This security" --> "Security"
JF
2014/02/07 21:15:14
Done.
| |
| 8 security is provided with a low performance overhead of about 10% over | |
| 9 regular ARM code, and as you'll see in this document the sandbox model | |
| 10 is beautifully simple, meaning that the trusted codebase is much easier | |
| 11 to validate. | |
| 12 | |
| 13 As an implementation detail, the Native Client 32-bit ARM sandbox is | |
| 14 currently used by Portable Native Client to execute code on 32-bit ARM | |
| 15 machines in a safe manner. The portable bitcode contained in a **pexe** | |
| 16 is translated to a 32-bit ARM **nexe** before execution. This may change | |
| 17 at some point: Portable Native Client doesn't necessarily need this | |
| 18 sandbox to execute code on ARM. Note that the Portable Native Client | |
| 19 compiler itself is also untrusted: it too runs in the ARM sandbox | |
| 20 described in this document. | |
| 21 | |
| 22 On this page, we describe how Native Client works on 32-bit ARM. We | |
| 23 assume no prior knowledge about the internals of Native Client, on x86 | |
| 24 or any other architecture, but we do assume some familiarity with | |
| 25 assembly languages in general. | |
| 26 | |
| 27 .. contents:: | |
| 28 :local: | |
| 29 :backlinks: none | |
| 30 :depth: 3 | |
| 31 | |
| 32 An Introduction to the ARM Architecture | |
| 33 ======================================= | |
| 34 | |
| 35 In this section, we summarize the relevant parts of the ARM processor | |
| 36 architecture. | |
| 37 | |
| 38 About ARM and ARMv7-A | |
| 39 --------------------- | |
| 40 | |
| 41 ARM is one of the older commercial "RISC" processor designs, dating back | |
| 42 to the early 1980s. Today, it is used primarily in embedded systems: | |
| 43 everything from toys, to home automation, to automobiles. However, its | |
| 44 most visible use is in cellular phones, tablets and some | |
| 45 laptops. | |
| 46 | |
| 47 Through the years, there have been many revisions of the ARM | |
| 48 architecture, written as ARMv\ *X* for some version *X*. Native Client | |
| 49 specifically targets the ARMv7-A architecture commonly used in high-end | |
| 50 phones and smartbooks. This revision, defined in the mid-2000s, adds a | |
| 51 number of useful instructions, and specifies some portions of the system | |
| 52 that used to be left to individual chip manufacturers. Critically, | |
| 53 ARMv7-A specifies the "eXecute Never" bit, or *XN*. This pagetable | |
| 54 attribute lets us mark memory as non-executable. Our security relies on | |
| 55 the presence of this feature. | |
| 56 | |
| 57 ARMv8 adds a new 64-bit instruction set architecture called A64, while | |
| 58 also enhancing the 32-bit A32 ISA. For Native Client's purposes the A32 | |
| 59 ISA is equivalent to the ARMv7 ARM ISA, albeit with a few new | |
| 60 instructions. This document only discusses the 32-bit A32 instruction | |
| 61 set: A64 would require a different sandboxing model. | |
| 62 | |
| 63 ARM Programmer's Model | |
| 64 ---------------------- | |
| 65 | |
| 66 While modern ARM chips support several instruction encodings, 32-bit | |
| 67 Native Client on ARM focuses on a single one: a fixed-width encoding | |
| 68 called A32 (previously, and confusingly, called simply ARM), in which | |
| 69 every instruction is 32 bits wide. Thumb, Thumb2 (now confusingly called | |
| 70 T32), Jazelle, ThumbEE and such aren't supported by Native Client. This | |
| 71 dramatically simplifies some of our analyses, as we'll see later. Nearly | |
| 72 every instruction can be conditionally executed based on the contents of | |
| 73 a dedicated condition code register. | |
| 74 | |
| 75 ARM processors have 16 general-purpose registers used for integer and | |
| 76 memory operations, written ``r0`` through ``r15``. Of these, two have | |
| 77 special roles baked in to the hardware: | |
| 78 | |
| 79 * ``r14`` is the Link Register. The ARM *call* instruction | |
| 80 (*branch-with-link*) doesn't use the stack directly. Instead, it | |
| 81 stashes the return address in ``r14``. In other circumstances, ``r14`` | |
| 82 can be (and is!) used as a general-purpose register. When ``r14`` is | |
| 83 playing its Link Register role, it's referred to as ``lr``. | |
| 84 * ``r15`` is the Program Counter. While it can be read and written like | |
| 85 any other register, setting it to a new value will cause execution to | |
| 86 jump to a new address. Using it in some circumstances is also | |
| 87 undefined by the ARM architecture. Because of this, ``r15`` is never | |
| 88 used for anything else, and is referred to as ``pc``. | |
| 89 | |
| 90 Other registers are given roles by convention. The only important | |
| 91 registers to Native Client are ``r9`` and ``r13``, which are used as the | |
| 92 Thread Pointer location and Stack Pointer, respectively. When playing | |
| 93 these roles, they're referred to as ``tp`` and ``sp``. | |
| 94 | |
| 95 Like other RISC-inspired designs, ARM programs use explicit *load* and | |
| 96 *store* instructions to access memory. All other instructions operate | |
| 97 only on registers, or on registers and small constants called | |
| 98 immediates. Because both instructions and data words are 32 bits wide, we | |
| 99 can't simply embed a 32-bit number into an instruction. ARM programs use | |
| 100 three methods to work around this, all of which Native Client exploits: | |
| 101 | |
| 102 1. Many instructions can encode a modified immediate, which is an 8-bit | |
| 103 number rotated right by an even number of bits. | |
| 104 2. The ``movw`` and ``movt`` instructions can be used to set the bottom | |
| 105 and top 16 bits of a register, respectively, and can therefore encode | |
| 106 any 32-bit immediate. | |
| 107 3. For values that can't be represented as modified immediates, ARM | |
| 108 programs use ``pc``-relative loads to load data from inside the | |
| 109 code---hidden in a place where it won't be executed, such as a "constant | |
| 110 pool" just past the final return of a function. | |
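The first two methods in the list above can be sketched in a few lines of Python. This is only an illustrative model, not code from Native Client or any ARM toolchain: the helper names `ror32`, `encode_modified_immediate` and `split_movw_movt` are made up for this example, and the search simply returns the first even rotation that works.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_modified_immediate(value):
    """Return (imm8, rotation) if `value` is an 8-bit number rotated right
    by an even number of bits, or None if no such encoding exists."""
    for rotation in range(0, 32, 2):
        # Undo a right-rotation by `rotation` (rotate left by the same amount).
        imm8 = ror32(value, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None

def split_movw_movt(value):
    """Return the 16-bit halves for movw (bottom) and movt (top)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF
```

With these helpers, `0xC0000000` has a modified-immediate encoding, whereas an arbitrary value such as `0x12345678` does not and would need `movw`/`movt` or a `pc`-relative load.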
| 111 | |
| 112 We'll introduce more details of the ARM instruction set later, as we | |
| 113 walk through the system. | |
| 114 | |
| 115 The Native Client Approach | |
| 116 ========================== | |
| 117 | |
| 118 Native Client runs an untrusted program, potentially from an unknown or | |
| 119 malicious source, inside a sandbox created by a trusted runtime. The | |
| 120 trusted runtime allows the untrusted program to "call-out" and perform | |
| 121 certain actions, such as drawing graphics, but prevents it from | |
| 122 accessing the operating system directly. This "call-out" facility, | |
| 123 called a trampoline, looks like a standard function call to the | |
| 124 untrusted program, but it allows control to escape from the sandbox in a | |
| 125 controlled way. | |
| 126 | |
| 127 The untrusted program and trusted runtime inhabit the same process, or | |
| 128 virtual address space, maintained by the operating system. To keep the | |
| 129 trusted runtime behaving the way we expect, we must prevent the | |
| 130 untrusted program from accessing and modifying its internals. Since they | |
| 131 share a virtual address space, we can't rely on the operating system for | |
| 132 this. Instead, we isolate the untrusted program from the trusted | |
| 133 runtime. | |
| 134 | |
| 135 Unlike modern operating systems, we use a cooperative isolation | |
| 136 method. Native Client can't run any off-the-shelf program compiled for | |
| 137 an off-the-shelf operating system. The program must be compiled to | |
| 138 comply with Native Client's rules. The details vary on each platform, | |
| 139 but in general, the untrusted program: | |
| 140 | |
| 141 * Must not attempt to use certain forbidden instructions, such as system | |
| 142 calls. | |
| 143 * Must not attempt to modify its own code without abiding by Native | |
| 144 Client's code modification rules. | |
| 145 * Must not jump into the middle of an instruction group, or otherwise do | |
| 146 tricky things to cause instructions to be interpreted multiple ways. | |
| 147 * Must use special, strictly-defined instruction sequences to perform | |
| 148 permitted but potentially dangerous actions. We call these sequences | |
| 149 pseudo-instructions. | |
| 150 | |
| 151 We can't simply take the program's word that it complies with these | |
| 152 rules---we call it "untrusted" for a reason! Nor do we require it to be | |
| 153 produced by a special compiler; in practice, we don't trust our | |
| 154 compilers either. Instead, we apply a load-time validator that | |
| 155 disassembles the program. The validator either proves that the program | |
| 156 complies with our rules, or rejects it as unsafe. By keeping the rules | |
| 157 simple, we keep the validator simple, small, and fast. We like to put | |
| 158 our trust in small, simple things, and the validator is key to the | |
| 159 system's security. | |
| 160 | |
| 161 .. Note:: | |
| 162 :class: note | |
| 163 | |
| 164 For the computationally-inclined, all our validators scale linearly in | |
| 165 the size of the program. | |
| 166 | |
| 167 NaCl/ARM: Pure Software Fault Isolation | |
| 168 --------------------------------------- | |
| 169 | |
| 170 In the original Native Client system for the x86, we used unusual | |
| 171 hardware features of that processor (the segment registers) to isolate | |
| 172 untrusted programs. This was simple and fast, but won't work on ARM, | |
| 173 which has nothing equivalent. Instead, we use pure software fault | |
| 174 isolation. | |
| 175 | |
| 176 We use a fixed address space layout: the untrusted program gets the | |
| 177 lowest gigabyte, addresses ``0`` through ``0x3FFFFFFF``. The rest of the | |
| 178 address space holds the trusted runtime and the operating system. We | |
| 179 isolate the program by requiring every *load*, *store*, and *indirect | |
| 180 branch* (to an address in a register) to use a pseudo-instruction. The | |
| 181 pseudo-instructions ensure that the address stays within the | |
| 182 sandbox. The *indirect branch* pseudo-instruction, in turn, ensures that | |
| 183 such branches won't split up other pseudo-instructions. | |
| 184 | |
| 185 At either side of the sandbox, we place small (8KiB) guard | |
| 186 regions. These are simply areas in the process's address space that are | |
| 187 mapped without read, write, or execute permissions, so any attempt to | |
| 188 access them for any reason---*load*, *store*, or *jump*---will cause a | |
| 189 fault. | |
| 190 | |
| 191 Finally, we ban the use of certain instructions, notably direct system | |
| 192 calls. This is to ensure that the untrusted program can be run on any | |
| 193 operating system supported by Native Client, and to prevent access to | |
| 194 certain system features that might be used to subvert the sandbox. As a | |
| 195 side effect, it helps to prevent programs from exploiting buggy | |
| 196 operating system APIs. | |
| 197 | |
| 198 Let's walk through the details, starting with the simplest part: *load* | |
| 199 and *store*. | |
| 200 | |
| 201 *Load* and *Store* | |
| 202 ^^^^^^^^^^^^^^^^^^ | |
| 203 | |
| 204 All access to memory must be through *load* and *store* | |
| 205 pseudo-instructions. These are simply a native *load* or *store* | |
| 206 instruction, preceded by a guard instruction. | |
| 207 | |
| 208 Each *load* or *store* pseudo-instruction is similar to the *load* shown | |
| 209 below. We use abstract "placeholder" registers instead of specific | |
| 210 numbered registers for the sake of discussion. ``rA`` is the register | |
| 211 holding the address to load from. ``rD`` is the destination for the | |
| 212 loaded data. | |
| 213 | |
| 214 .. naclcode:: | |
| 215 :prettyprint: 0 | |
| 216 | |
| 217 bic rA, #0xC0000000 | |
| 218 ldr rD, [rA] | |
| 219 | |
| 220 The first instruction, ``bic``, clears the top two bits of ``rA``. In | |
| 221 this case, that means that the value in ``rA`` is forced to an address | |
| 222 inside our sandbox, between ``0`` and ``0x3FFFFFFF``, inclusive. | |
| 223 | |
| 224 The second instruction, ``ldr``, uses the previously-sandboxed address | |
| 225 to load a value. This address might not be the address that the program | |
| 226 intended, and might cause an access to an unmapped memory location | |
| 227 within the sandbox: ``bic`` forces the address to be valid, by clearing | |
| 228 the top two bits. This is a no-op in a correct program. | |
| 229 | |
| 230 This illustrates a common property of all Native Client systems: we aim | |
| 231 for safety, not correctness. A program using an invalid address in | |
| 232 ``rA`` here is simply broken, so we are free to do whatever we want to | |
| 233 preserve safety. In this case the program might load an invalid (but | |
| 234 safe) value, or cause a segmentation fault limited to the untrusted | |
| 235 code. | |
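As a rough illustration of the "safety, not correctness" point, the effect of the ``bic`` guard can be modelled in Python. The function name `bic` and the constant `SANDBOX_MASK` are names chosen for this sketch; only the mask value comes from the text above.

```python
SANDBOX_MASK = 0xC0000000  # the top two bits, as in `bic rA, #0xC0000000`

def bic(value, mask):
    """Model ARM's bit-clear: clear every bit of `value` that is set in `mask`."""
    return value & ~mask & 0xFFFFFFFF

# A valid sandbox address passes through unchanged (a no-op).
valid = bic(0x12345678, SANDBOX_MASK)

# An out-of-sandbox address is forced back inside; the program reads the
# "wrong" location, but never one outside the lowest gigabyte.
forced = bic(0x80001000, SANDBOX_MASK)
```

Here `valid` stays `0x12345678`, while `forced` becomes `0x00001000`.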
| 236 | |
| 237 Now, if we allowed arbitrary branches within the program, a malicious | |
| 238 program could set up carefully-crafted values in ``rA``, and then jump | |
| 239 straight to the ``ldr``. This is why we validate that programs never | |
| 240 split pseudo-instructions. | |
| 241 | |
| 242 Alternative Sandboxing | |
| 243 """""""""""""""""""""" | |
| 244 | |
| 245 .. naclcode:: | |
| 246 :prettyprint: 0 | |
| 247 | |
| 248 tst rA, #0xC0000000 | |
| 249 ldreq rD, [rA] | |
| 250 | |
| 251 The first instruction, ``tst``, performs a bitwise-\ ``AND`` of ``rA`` | |
| 252 and the modified immediate literal, ``0xC0000000``. It sets the | |
| 253 condition flags based on the result, but does not write the result to a | |
| 254 register. In particular, it sets the ``Z`` condition flag if the result | |
| 255 was zero---if the two values had no set bits in common. In this case, | |
| 256 that means that the value in ``rA`` was an address inside our sandbox, | |
| 257 between ``0`` and ``0x3FFFFFFF``, inclusive. | |
| 258 | |
| 259 The second instruction, ``ldreq``, is a conditional load if equal. As we | |
| 260 mentioned before, nearly all ARM instructions can be made | |
| 261 conditional. In assembly language, we simply stick the desired condition | |
| 262 on the end of the instruction's mnemonic name. Here, the condition is | |
| 263 ``EQ``, which causes the instruction to execute only if the ``Z`` flag | |
| 264 is set. | |
| 265 | |
| 266 Thus, when the pseudo-instruction executes, the ``tst`` sets ``Z`` if | |
| 267 (and only if) the value in ``rA`` is an address within the bounds of the | |
| 268 sandbox, and then the ``ldreq`` loads if (and only if) it was. If ``rA`` | |
| 269 held an invalid address, the *load* does not execute, and ``rD`` is | |
| 270 unchanged. | |
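The behaviour of the ``tst``/``ldreq`` pair can be sketched the same way. This is a simplified model under stated assumptions: memory is a plain dictionary, and `tst_ldreq` is a hypothetical helper that returns the new value of ``rD``.

```python
SANDBOX_MASK = 0xC0000000

def tst_ldreq(memory, rA, rD):
    """Model `tst rA, #0xC0000000` followed by `ldreq rD, [rA]`.

    `memory` is a dict from address to value (an assumption of this sketch).
    Returns the value of rD after the pseudo-instruction.
    """
    z_flag = (rA & SANDBOX_MASK) == 0   # tst: set Z if no masked bit is set
    if z_flag:                          # ldreq: load only when Z is set
        rD = memory.get(rA, 0)
    return rD
```

Unlike the ``bic`` form, an invalid address is not redirected: the load is skipped and ``rD`` keeps its old value.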
| 271 | |
| 272 .. Note:: | |
| 273 :class: note | |
| 274 | |
| 275 The ``tst``-based sequence is faster than the ``bic``-based sequence | |
| 276 on modern ARM chips. It avoids a data dependency in the address | |
| 277 register. This is why we keep both around. The ``tst``-based sequence | |
| 278 unfortunately leaks information on some processors, and is therefore | |
| 279 forbidden on certain processors. This effectively means that it cannot | |
| 280 be used for regular Native Client **nexe** files, but can be used with | |
| 281 Portable Native Client because the target processor is known at | |
| 282 translation time from **pexe** to **nexe**. | |
| 283 | |
| 284 Addressing Modes | |
| 285 """""""""""""""" | |
| 286 | |
| 287 ARM has an unusually rich set of addressing modes. We allow all but one: | |
| 288 register-indexed, where two registers are added to determine the | |
| 289 address. | |
| 290 | |
| 291 We permit simple *load* and *store*, as shown above. We also permit | |
| 292 displacement, pre-index, and post-index memory operations: | |
| 293 | |
| 294 .. naclcode:: | |
| 295 :prettyprint: 0 | |
| 296 | |
| 297 bic rA, #0xC0000000 | |
| 298 ldr rD, [rA, #1234] ; This is fine. | |
| 299 bic rA, #0xC0000000 | |
| 300 ldr rD, [rA, #1234]! ; Also fine. | |
| 301 bic rA, #0xC0000000 | |
| 302 ldr rD, [rA], #1234 ; Looking good. | |
| 303 | |
| 304 In each case, we know ``rA`` points into the sandbox when the ``ldr`` | |
| 305 executes. We allow adding an immediate displacement to ``rA`` to | |
| 306 determine the final address (as in the first two examples here) because | |
| 307 the largest immediate displacement is ±4095 bytes, while our guard pages | |
| 308 are 8192 bytes wide. | |
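The arithmetic behind this argument is easy to check. The sketch below only models the upper edge of the sandbox, using the numbers quoted in the text; the constant and function names are invented for the example.

```python
SANDBOX_TOP = 0x3FFFFFFF   # last valid sandbox address
GUARD_SIZE = 8192          # width of a guard region, in bytes
MAX_DISPLACEMENT = 4095    # largest immediate offset in a load or store

def highest_reachable_address():
    """Worst case: rA is the last sandbox address and the offset is maximal."""
    return SANDBOX_TOP + MAX_DISPLACEMENT

def upper_guard_range():
    """First and last address of the guard region just above the sandbox."""
    start = SANDBOX_TOP + 1
    return start, start + GUARD_SIZE - 1
```

The worst-case address still falls inside the guard region, so the access faults instead of reaching trusted memory.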
| 309 | |
| 310 We also allow ARM's more unusual *load* and *store* instructions, such | |
| 311 as *load-multiple* and *store-multiple*, etc. | |
| 312 | |
| 313 Conditional *Load* and *Store* | |
| 314 """""""""""""""""""""""""""""" | |
| 315 | |
| 316 There's one problem with the pseudo-instructions shown above: they are | |
| 317 unconditional (assuming ``rA`` is valid). ARM compilers regularly use | |
| 318 conditional *load* and *store*, so we should support this in Native | |
| 319 Client. We do so by defining alternate, predictable | |
| 320 pseudo-instructions. Here is a conditional *store* | |
| 321 (*store-if-greater-than*) using this pseudo-instruction sequence: | |
| 322 | |
| 323 .. naclcode:: | |
| 324 :prettyprint: 0 | |
| 325 | |
| 326 bicgt rA, #0xC0000000 | |
| 327 strgt rX, [rA, #123] | |
| 328 | |
| 329 The Stack Pointer, Thread Pointer, and Program Counter | |
| 330 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| 331 | |
| 332 Stack Pointer | |
| 333 """"""""""""" | |
| 334 | |
| 335 In C-like languages, the stack is used to store return addresses during | |
| 336 function calls, as well as any local variables that won't fit in | |
| 337 registers. This makes stack operations very common. | |
| 338 | |
| 339 Native Client does not require guard instructions on any *load* or | |
| 340 *store* involving the stack pointer, ``sp``. This improves performance | |
| 341 and reduces code size. However, ARM's stack pointer isn't special: it's | |
| 342 just another register, called ``sp`` only by convention. To make it safe | |
| 343 to use this register as a *load* or *store* address without guards, we | |
| 344 add a rule: ``sp`` must always contain a valid address. | |
| 345 | |
| 346 We enforce this rule by restricting the sorts of operations that | |
| 347 programs can use to alter ``sp``. Programs can alter ``sp`` by adding or | |
| 348 subtracting an immediate, as a side-effect of a *load* or *store*: | |
| 349 | |
| 350 .. naclcode:: | |
| 351 :prettyprint: 0 | |
| 352 | |
| 353 ldr rX, [sp], #4 ; Load from stack, then add 4 to sp. | |
| 354 str rX, [sp, #1234]! ; Add 1234 to sp, then store to stack. | |
| 355 | |
| 356 These are safe because, as we mentioned before, the largest immediate | |
| 357 available in a *load* or *store* is ±4095. Even after adding or | |
| 358 subtracting 4095, the stack pointer will still be within the sandbox or | |
| 359 guard regions. | |
| 360 | |
| 361 Any other operation that alters ``sp`` must be followed by a guard | |
| 362 instruction. The most common alterations, in practice, are addition and | |
| 363 subtraction of arbitrary integers: | |
| 364 | |
| 365 .. naclcode:: | |
| 366 :prettyprint: 0 | |
| 367 | |
| 368 add sp, rX | |
| 369 bic sp, #0xC0000000 | |
| 370 | |
| 371 The ``bic`` is similar to the one we used for conditional *load* and | |
| 372 *store*, and serves exactly the same purpose: after it completes, ``sp`` | |
| 373 is a valid address. | |
| 374 | |
| 375 .. Note:: | |
| 376 :class: note | |
| 377 | |
| 378 Clever assembly programmers and compilers may want to use this | |
| 379 "trusted" property of ``sp`` to emit more efficient code: in a hot | |
| 380 loop instead of using ``sp`` as a stack pointer it can be temporarily | |
| 381 used as an index pointer (e.g. to traverse an array). This avoids the | |
| 382 extra ``bic`` whenever the pointer is updated in the loop. | |
| 383 | |
| 384 Thread Pointer Loads | |
| 385 """""""""""""""""""" | |
| 386 | |
| 387 The thread pointer and IRT thread pointer are stored in the trusted | |
| 388 address space. All uses and definitions of ``r9`` from untrusted code | |
| 389 are forbidden except as follows: | |
| 390 | |
| 391 .. naclcode:: | |
| 392 :prettyprint: 0 | |
| 393 | |
| 394 ldr Rn, [r9] ; Load user thread pointer. | |
| 395 ldr Rn, [r9, #4] ; Load IRT thread pointer. | |
| 396 | |
| 397 ``pc``-relative Loads | |
| 398 """"""""""""""""""""" | |
| 399 | |
| 400 By extension, we also allow *load* through the ``pc`` without a | |
| 401 mask. The explanation is quite similar: | |
| 402 | |
| 403 * Our control-flow isolation rules mean that the ``pc`` will always | |
| 404 point into the sandbox. | |
| 405 * The maximum immediate displacement that can be used in a | |
| 406 ``pc``-relative *load* is smaller than the width of the guard pages. | |
| 407 | |
| 408 We do not allow ``pc``-relative stores, because they look suspiciously | |
| 409 like self-modifying code. We also forbid any addressing mode that would | |
| 410 alter the ``pc`` as a side effect of the *load*. | |
| 411 | |
| 412 *Indirect Branch* | |
| 413 ^^^^^^^^^^^^^^^^^ | |
| 414 | |
| 415 There are two types of control flow on ARM: direct and indirect. Direct | |
| 416 control flow instructions have an embedded target address or | |
| 417 offset. Indirect control flow instructions take their destination | |
| 418 address from a register. The ``b`` (branch) and ``bl`` | |
| 419 (*branch-with-link*) instructions are *direct branch* and *call*, | |
| 420 respectively. The ``bx`` (*branch-exchange*) and ``blx`` | |
| 421 (*branch-with-link-exchange*) are the indirect equivalents. | |
| 422 | |
| 423 Because the program counter ``pc`` is simply another register, ARM also | |
| 424 has many implicit indirect control flow instructions. Programs can | |
| 425 operate on the ``pc`` using *add* or *load*, or even outlandish (and | |
| 426 often specified as having unpredictable-behavior) things like multiply! | |
| 427 In Native Client we ban all such instructions. Indirect control flow is | |
| 428 exclusively through ``bx`` and ``blx``. Because all of ARM's control | |
| 429 flow instructions are called *branch* instructions, we'll use the term | |
| 430 *indirect branch* from here on, even though this includes things like | |
| 431 *virtual call*, *return*, and the like. | |
| 432 | |
| 433 The Trouble with Indirection | |
| 434 """""""""""""""""""""""""""" | |
| 435 | |
| 436 *Indirect branch* presents two problems for Native Client: | |
| 437 | |
| 438 * We must ensure that they don't send execution outside the sandbox. | |
| 439 * We must ensure that they don't break up the instructions inside a | |
| 440 pseudo-instruction, by landing on the second one. | |
| 441 | |
| 442 .. Note:: | |
| 443 :class: note | |
| 444 | |
| 445 On the x86 architectures we must also ensure that a branch doesn't land | |
| 446 inside an instruction. This is unnecessary on ARM, where all | |
| 447 instructions are 32 bits wide. | |
| 448 | |
| 449 Checking both of these for *direct branch* is easy: the validator just | |
| 450 pulls the (fixed) target address out of the instruction and checks what | |
| 451 it points to. | |
| 452 | |
| 453 The Native Client Solution: "Bundles" | |
| 454 """"""""""""""""""""""""""""""""""""" | |
| 455 | |
| 456 For *indirect branch*, we can address the first problem by simply | |
| 457 masking some high-order bits off the address, like we did for *load* and | |
| 458 *store*. The second problem is more subtle. Detecting every possible | |
| 459 route that every *indirect branch* might take is difficult. Instead, we | |
| 460 take the approach pioneered by the original Native Client: we restrict | |
| 461 the possible places that any *indirect branch* can land. On Native | |
| 462 Client for ARM, *indirect branch* can target any address that has its | |
| 463 bottom four bits clear---any address that's ``0 mod 16``. We call these | |
| 464 16-byte chunks of code "bundles". The validator makes sure that no | |
| 465 pseudo-instruction straddles a bundle boundary. Compilers must pad with | |
| 466 ``nop``\ s to ensure that every pseudo-instruction fits entirely inside | |
| 467 one bundle. | |
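A minimal sketch of the straddling check, assuming 4-byte instructions and 16-byte bundles as described above. The helpers `straddles_bundle` and `nops_needed` are hypothetical and are not taken from the actual validator or compiler.

```python
BUNDLE_SIZE = 16       # bytes per bundle
INSTRUCTION_SIZE = 4   # every A32 instruction is 4 bytes

def bundle_of(address):
    """Index of the bundle that contains `address`."""
    return address // BUNDLE_SIZE

def straddles_bundle(start, instruction_count):
    """True if a pseudo-instruction starting at `start` crosses a boundary."""
    last = start + (instruction_count - 1) * INSTRUCTION_SIZE
    return bundle_of(start) != bundle_of(last)

def nops_needed(start, instruction_count):
    """Number of nops to insert so the pseudo-instruction starts a new bundle."""
    if not straddles_bundle(start, instruction_count):
        return 0
    next_bundle = (bundle_of(start) + 1) * BUNDLE_SIZE
    return (next_bundle - start) // INSTRUCTION_SIZE
```

For a two-instruction pseudo-instruction, a start address of `0x1008` is fine, but `0x100C` would put the second instruction in the next bundle, so one ``nop`` is needed.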
| 468 | |
| 469 Here is the *indirect branch* pseudo-instruction. As you can see, it | |
| 470 clears the top two and bottom four bits of the address: | |
| 471 | |
| 472 .. naclcode:: | |
| 473 :prettyprint: 0 | |
| 474 | |
| 475 bic rA, #0xC000000F | |
| 476 bx rA | |
| 477 | |
| 478 This particular pseudo-instruction (a ``bic`` followed by a ``bx``) is | |
| 479 used for computed jumps in switch tables and returning from functions, | |
| 480 among other uses. Recall that, under ARM's modified immediate rules, we | |
| 481 can fit the constant ``0xC000000F`` into the ``bic`` instruction's | |
| 482 immediate field: ``0xC000000F`` is the 8-bit constant ``0xFC``, rotated | |
| 483 right by 4 bits. | |
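The combined effect of this mask can be modelled in Python. As before, this is an illustrative sketch: `sandbox_branch_target` is a name made up for the example, and only the mask value is taken from the text.

```python
BRANCH_MASK = 0xC000000F  # top two bits and bottom four bits

def sandbox_branch_target(rA):
    """Model `bic rA, #0xC000000F`: the address that `bx rA` will jump to."""
    return rA & ~BRANCH_MASK & 0xFFFFFFFF

# Whatever value the program supplies, the target is inside the lowest
# gigabyte and sits at the start of a 16-byte bundle.
target = sandbox_branch_target(0xFFFFFFFF)
```

Here `target` is `0x3FFFFFF0`; an already valid, bundle-aligned address is left unchanged.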
| 484 | |
| 485 The other useful variant is the *indirect branch-with-link*, which is | |
| 486 the ARM equivalent to *call*: | |
| 487 | |
| 488 .. naclcode:: | |
| 489 :prettyprint: 0 | |
| 490 | |
| 491 bic rA, #0xC000000F | |
| 492 blx rA | |
| 493 | |
| 494 This is used for indirect function calls---commonly seen in C++ programs | |
| 495 as virtual calls, but also for calling function pointers in C. | |
| 496 | |
| 497 Note that both *indirect branch* pseudo-instructions use ``bic``, rather | |
| 498 than the ``tst`` instruction we allow for *load* and *store*. There are | |
| 499 two reasons for this: | |
| 500 | |
| 501 1. Conditional *branch* is very common, much more common than | |
| 502 conditional *load* and *store*. A ``tst``-based sequence uses the | |
| 503 condition flags itself, so it would rarely be usable for *branch*. | |
| 504 2. There's no performance benefit to using ``tst`` here on modern ARM | |
| 505 chips. *Branch* consumes its operands later in the pipeline than | |
| 506 *load* and *store* (since it doesn't have to generate an address, | |
| 507 etc.) so this sequence doesn't stall. | |
| 508 | |
| 509 .. Note:: | |
| 510 :class: note | |
| 511 | |
| 512 At this point astute readers are wondering what the ``x`` in ``bx`` | |
| 513 and ``blx`` means. We told you it stood for "exchange", but exchange | |
| 514 to what? ARM, for all the reduced-ness of its instruction set, can | |
| 515 change execution mode from A32 (ARM) to T32 (Thumb) and back with | |
| 516 these *branch* instructions, called *interworking branch*. Recall that | |
| 517 A32 instructions are 32 bits wide, and T32 instructions are a mix of | |
| 518 16-bit and 32-bit wide encodings. The destination address given to a | |
| 519 *branch* therefore cannot sensibly have its bottom bit set in either | |
| 520 instruction set: that would be an unaligned instruction in both cases, | |
| 521 and ARM simply doesn't support this. The bottom bit for the *indirect | |
| 522 branch* was therefore cleverly recycled by the ARM architecture to | |
| 523 mean "switch to T32 mode" when set! | |
| 524 | |
| 525 As you've figured out by now, Native Client's sandbox wouldn't be very | |
| 526 happy if A32 instructions were to be executed as T32 instructions: who | |
| 527 knows what they correspond to? A malicious person could craft valid | |
| 528 A32 code that's actually very naughty T32 code, somewhat like forming | |
| 529 a sentence that happens to be valid in English and French but with | |
| 530 completely different meanings, complimenting the reader in one | |
| 531 language and insulting them in the other. | |
| 532 | |
| 533 Fortunately, the bundle alignment restrictions of | |
| 534 the Native Client sandbox already take care of making this travesty | |
| 535 impossible: by masking off the bottom 4 bits of the destination, the | |
| 536 interworking nature of ARM's *indirect branch* is completely avoided. | |
| 537 | |
| 538 *Call* and *Return* | |
| 539 """"""""""""""""""" | |
| 540 | |
| 541 On ARM, there is no *call* or *return* instruction. A *call* is simply a | |
| 542 *branch* that just happens to load a return address into ``lr``, the link | |
| 543 register. If the called function is a leaf (that is, if it calls no | |
| 544 other functions before returning), it simply branches to the address | |
| 545 stored in ``lr`` to *return* to its caller: | |
| 546 | |
| 547 .. naclcode:: | |
| 548 :prettyprint: 0 | |
| 549 | |
| 550 bic lr, #0xC000000F | |
| 551 bx lr | |
| 552 | |
| 553 If the function called other functions, however, it had to spill ``lr`` | |
| 554 onto the stack. On x86, this is done implicitly, but it is explicit on | |
| 555 ARM: | |
| 556 | |
| 557 .. naclcode:: | |
| 558 :prettyprint: 0 | |
| 559 | |
| 560 push { lr } | |
| 561 ; Some code here... | |
| 562 pop { lr } | |
| 563 bic lr, #0xC000000F | |
| 564 bx lr | |
| 565 | |
| 566 There are two things to note about this code. | |
| 567 | |
| 568 1. As we mentioned before, we don't allow arbitrary instructions to | |
| 569 write to the Program Counter, ``pc``. Thus, while a traditional ARM | |
| 570 program might have popped directly into ``pc`` to end the function, | |
| 571 we require a pop into a register, followed by a pseudo-instruction. | |
| 572 2. Function returns really are just *indirect branch*, with the same | |
| 573 restrictions. This means that functions can only return to addresses | |
| 574 that are bundle-aligned: ``0 mod 16``. | |
| 575 | |
| 576 The implication here is that a *call*\ ---the *branch* that enters | |
| 577 functions---must be placed at the end of the bundle, so that the return | |
| 578 address it generates is ``0 mod 16``. Otherwise, when we clear the | |
| 579 bottom four bits, the program would enter an infinite loop! (Native | |
| 580 Client doesn't try to prevent infinite loops, but the validator actually | |
| 581 does check the alignment of calls. This is because, when we were writing | |
| 582 the compiler, it was annoying to find out our calls were in the wrong | |
| 583 place by having the program run forever!) | |
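To see why a misplaced *call* loops forever, consider this small model. It relies on the A32 behaviour that a *branch-with-link* stores the address of the following instruction (the call address plus 4) in ``lr``; the function names are invented for the sketch.

```python
BRANCH_MASK = 0xC000000F

def return_address(call_address):
    """A32 branch-with-link stores the address of the next instruction in lr."""
    return call_address + 4

def call_is_well_placed(call_address):
    """True if the call sits in the last slot of its bundle."""
    return return_address(call_address) % 16 == 0

def masked_return(call_address):
    """Where `bic lr, #0xC000000F; bx lr` actually lands after this call."""
    return return_address(call_address) & ~BRANCH_MASK & 0xFFFFFFFF
```

A call at `0x100C` returns to `0x1010`, as intended. A call at `0x1004` would return to `0x1000`, which is before the call itself, so the same call executes again and again.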
| 584 | |
| 585 .. Note:: | |
| 586 :class: note | |
| 587 | |
| 588 Properly balancing the CPU's *call*/*return* actually allows it to | |
| 589 perform much better by allowing it to speculatively execute the return | |
| 590 address' code. For more information on ARM's *call*/*return* stack see | |
| 591 ARM's technical reference manual. | |
| 592 | |
| 593 Literal Pools and Data Bundles | |
| 594 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| 595 | |
| 596 In the section where we described the ARM architecture, we mentioned | |
| 597 ARM's unusual immediate forms. To restate: | |
| 598 | |
| 599 * ARM instructions are fixed-length, 32-bits, so we can't have an | |
| 600 instruction that includes an arbitrary 32-bit constant. | |
| 601 * Many ARM instructions can include a modified immediate constant, which | |
| 602 is flexible, but limited. | |
| 603 * For any other value (particularly addresses), ARM programs explicitly | |
| 604 load constants from inside the code itself. | |
| 605 | |
| 606 .. Note:: | |
| 607 :class: note | |
| 608 | |
| 609 ARMv7 introduces some instructions, ``movw`` and ``movt``, that try to | |
| 610 address this by letting us directly load larger constants. Our | |
| 611 toolchain uses this capability in some cases. | |
| 612 | |
| 613 Here's a typical example of the use of a literal pool. ARM assemblers | |
| 614 typically hide the details---this is the sort of code you'd see produced | |
| 615 by a disassembler, but with more comments. | |
| 616 | |
| 617 .. naclcode:: | |
| 618 :prettyprint: 0 | |
| 619 | |
| 620 ; C equivalent: "table[3] = 4" | |
| 621 ; 'table' is a static array of bytes. | |
| 622 ldr r0, [pc, #124] ; Load the address of the 'table', | |
| 623 ; "124" is the offset from here | |
| 624 ; to the constant below. | |
| 625 add r0, #3 ; Add the immediate array index. | |
| 626 mov r1, #4 ; Get the constant '4' into a register. | |
| 627 bic r0, #0xC0000000 ; Mask our array address. | |
| 628 strb r1, [r0] ; Store one byte. | |
| 629 ; ... | |
| 630 .word table ; Constant referenced above. | |
| 631 | |
| 632 Because ``table`` is a static array, the compiler knew its address at | |
| 633 compile-time---but the address didn't fit in a modified immediate. (Most | |
| 634 don't.) So, instead of loading an immediate into ``r0`` with a ``mov``, | |
| 635 we stashed the address in the code, located it relative to ``pc``, | |
| 636 and loaded the constant. ARM compilers will typically group all the | |
| 637 embedded data together into a literal pool. These typically live just | |
| 638 past the end of functions, where they won't be executed. | |
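
The address arithmetic in the example can be sketched as follows. This is a minimal illustration with hypothetical helper names. It relies on the A32 rule that ``pc`` reads as the address of the current instruction plus 8, and it assumes the instruction is word-aligned.

```python
PC_READ_OFFSET = 8   # in A32, pc reads as the current instruction address + 8


def literal_address(instruction_address, offset):
    """Address read by `ldr rX, [pc, #offset]` at instruction_address."""
    return instruction_address + PC_READ_OFFSET + offset


def mask_data_address(address):
    """Effect of `bic r0, #0xC0000000` on a 32-bit address: clear the top two bits."""
    return address & ~0xC0000000 & 0xFFFFFFFF
```

If the ``ldr`` above sat at ``0x20000``, it would read the constant stored at ``0x20084``. The ``bic`` then leaves any address below 1GiB unchanged and folds every other address back below 1GiB.
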
| 639 | |
| 640 This is a common trick in ARM code, so it's important to support it | |
| 641 in Native Client... but there's a potential flaw. If we let programs | |
| 642 contain arbitrary data, mingled in with the code, couldn't they hide | |
| 643 malicious instructions this way? | |
| 644 | |
| 645 The answer is no, because the validator disassembles the entire | |
| 646 executable region of the program, without regard to whether the | |
| 647 programmer said a certain chunk was code or data. But this brings the | |
| 648 opposite problem: what if the program needs to contain a certain | |
| 649 constant that just happens to encode a malicious instruction? We want | |
| 650 to allow this, but we have to be certain it will never be executed as | |
| 651 code! | |
| 652 | |
| 653 Data Bundles to the Rescue | |
| 654 """""""""""""""""""""""""" | |
| 655 | |
| 656 As we discussed in the last section, ARM code in Native Client is | |
| 657 structured in 16-byte bundles. We allow literal pools by putting them in | |
| 658 special bundles, called data bundles. Each data bundle can contain 12 | |
| 659 bytes of arbitrary data, and the program can have as many data bundles | |
| 660 as it likes. | |
| 661 | |
| 662 Each data bundle starts with a breakpoint instruction, ``bkpt``. This | |
| 663 way, if an *indirect branch* tries to enter the data bundle, the process | |
| 664 will take a fault and the trusted runtime will intervene (by terminating | |
| 665 the program). For example: | |
| 666 | |
| 667 .. naclcode:: | |
| 668 :prettyprint: 0 | |
| 669 | |
| 670 bkpt #0x5BE0 ; Must be aligned 0 mod 16! | |
| 671 .word 0xDEADBEEF ; Arbitrary constants are A-OK. | |
| 672 svc #30 ; Trying to make a syscall? OK! | |
| 673 str r0, [r1] ; Unmasked stores are fine too. | |
| 674 | |
| 675 So, we have a way for programs to create an arbitrary, even dangerous, | |
| 676 chunk of data within their code. We can prevent an *indirect branch* | |
| 677 from entering it, and the ``bkpt`` also stops fall-through from the | |
| 678 code just before it. But what about a *direct branch* straight into the | |
| 679 middle? | |
| 680 | |
| 681 The validator detects all data bundles (because this ``bkpt`` has a | |
| 682 special encoding) and marks them as off-limits for any *direct branch*. If | |
| 683 it finds a *direct branch* into a data bundle, the entire program is | |
| 684 rejected as unsafe. Because the target of a *direct branch* cannot be | |
| 685 modified at runtime, the data bundles cannot be executed. | |
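
The check just described can be sketched roughly as below. This is not the real validator: the function names are hypothetical, the code segment is assumed to start at a ``0 mod 16`` address, and only A32 ``b``/``bl`` with an immediate offset are treated as *direct branch*. The marker value is the A32 encoding this document gives for ``bkpt #0x5BE0``.

```python
BUNDLE_WORDS = 4                 # a 16-byte bundle holds four 32-bit words
BUNDLE_MASK = ~0xF               # clears the offset within a bundle
DATA_BUNDLE_MARKER = 0xE125BE70  # A32 encoding of "bkpt #0x5BE0"


def is_direct_branch(word):
    """A32 B/BL (immediate): bits 27..25 are 0b101, condition is not 0b1111."""
    return (word >> 25) & 0b111 == 0b101 and (word >> 28) != 0b1111


def branch_target(address, word):
    """Target of a B/BL at `address`: pc (address + 8) plus the signed offset."""
    imm24 = word & 0x00FFFFFF
    if imm24 & 0x00800000:       # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return address + 8 + (imm24 << 2)


def find_data_bundles(base, words):
    """Start addresses of bundles whose first word is the roadblock marker."""
    return {base + 4 * i
            for i in range(0, len(words), BUNDLE_WORDS)
            if words[i] == DATA_BUNDLE_MARKER}


def branches_into_data(base, words):
    """List (branch address, target) pairs that land inside a data bundle."""
    data = find_data_bundles(base, words)
    violations = []
    for i, word in enumerate(words):
        address = base + 4 * i
        if (address & BUNDLE_MASK) in data:
            continue             # data bundle contents are never treated as code
        if is_direct_branch(word):
            target = branch_target(address, word)
            if (target & BUNDLE_MASK) in data:
                violations.append((address, target))
    return violations
```

A program for which ``branches_into_data`` returns a non-empty list would be rejected as a whole under the rule above.
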
| 686 | |
| 687 .. Note:: | |
| 688 :class: note | |
| 689 | |
| 690 Clever readers may wonder: why use ``bkpt #0x5BE0``? That seems | |
| 691 awfully specific when you just need a special "roadblock" instruction! | |
| 692 Quite true, young Padawan! It happens that this odd ``bkpt`` | |
| 693 instruction is encoded as ``0xE125BE70`` in A32, and in T32 the | |
| 694 ``bkpt`` instruction is encoded as ``0xBExx`` (where ``xx`` could be | |
| 695 any 8-bit immediate, say ``0x70``) and ``0xE125`` encodes the *branch* | |
| 696 instruction ``b.n #0x250``. The special roadblock instruction | |
| 697 therefore doubles as a roadblock in T32, if anything were to go so | |
| 698 awry that we tried to execute it as a T32 instruction! Much defense, | |
| 699 such depth, wow! | |
| 700 | |
| 701 Trampolines and Memory Layout | |
| 702 ----------------------------- | |
| 703 | |
| 704 So far, the rules we've described make for boring programs: they can't | |
| 705 communicate with the outside world! | |
| 706 | |
| 707 * The program can't call an external library, or the operating system, | |
| 708 even to do something simple like draw some pixels on the screen. | |
| 709 * It also can't read or write memory outside of its dedicated sandbox, | |
| 710 so communicating that way is right out. | |
| 711 | |
| 712 We fix this by allowing the untrusted program to call into the trusted | |
| 713 runtime using a trampoline. A trampoline is simply a short stretch of | |
| 714 code, placed by the trusted runtime at a known location within the | |
| 715 sandbox, that is permitted to do things the untrusted program can't. | |
| 716 | |
| 717 Even though trampolines are inside the sandbox, the untrusted program | |
| 718 can't modify them: the trusted runtime marks them read-only. It also | |
| 719 can't do anything clever with the special instructions inside the | |
| 720 trampoline---for example, call it at a slightly offset address to bypass | |
| 721 some checks---because the validator only allows trampolines to be | |
| 722 reached by *indirect branch* (or *branch-with-link*). We structure the | |
| 723 trampolines carefully so that they're safe to enter at any ``0 mod 16`` | |
| 724 address. | |
| 725 | |
| 726 The validator can detect attempts to use the trampolines because they're | |
| 727 loaded at a fixed location in memory. Let's look at the memory map of | |
| 728 the Native Client sandbox. | |
| 729 | |
| 730 Memory Map | |
| 731 ^^^^^^^^^^ | |
| 732 | |
| 733 The ARM sandbox is always at virtual address ``0``, and is exactly 1GiB | |
| 734 in size. This includes the untrusted program's code and data, the | |
| 735 trampolines, and a small guard region to detect null pointer | |
| 736 dereferences. In practice, the untrusted program takes up a bit more | |
| 737 room than this, because of the need for additional guard regions at | |
| 738 either end of the sandbox. | |
| 739 | |
| 740 +----------------+-------+-------------------+--------------------------------------------------------------------+ | |
| 741 | Address        | Size  | Name              | Purpose                                                            | | |
| 742 +================+=======+===================+====================================================================+ | |
| 743 | ``-0x2000``    | 8KiB  | Bottom Guard      | Keeps negative-displacement *load* or *store* from escaping.       | | |
| 744 +----------------+-------+-------------------+--------------------------------------------------------------------+ | |
| 745 | ``0``          | 64KiB | Null Guard        | Catches null pointer dereferences, guards against kernel exploits. | | |
| 746 +----------------+-------+-------------------+--------------------------------------------------------------------+ | |
| 747 | ``0x10000``    | 64KiB | Trampolines       | Up to 2048 unique syscall entry points.                            | | |
| 748 +----------------+-------+-------------------+--------------------------------------------------------------------+ | |
| 749 | ``0x20000``    | ~1GiB | Untrusted Sandbox | Contains untrusted code, followed by its heap/stack/memory.        | | |
| 750 +----------------+-------+-------------------+--------------------------------------------------------------------+ | |
| 751 | ``0x40000000`` | 8KiB  | Top Guard         | Keeps positive-displacement *load* or *store* from escaping.       | | |
| 752 +----------------+-------+-------------------+--------------------------------------------------------------------+ | |
| 753 | |
| 754 Within the trampolines, the untrusted program can call any address | |
| 755 that's ``0 mod 16``. However, only even slots are used, so useful | |
| 756 trampolines are always ``0 mod 32``. If the program calls an odd slot, | |
| 757 it will fault, and the trusted runtime will shut it down. | |
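
The memory map and the trampoline slot rule can be sketched as a small classifier. The constants come straight from the table above. The function names and the returned labels are hypothetical, chosen only for this illustration.

```python
GUARD_SIZE = 0x2000            # 8KiB guard at either end
TRAMPOLINES_START = 0x10000
UNTRUSTED_START = 0x20000
SANDBOX_END = 0x40000000       # 1GiB
SLOT_SIZE = 16                 # every bundle-aligned address is a slot
TRAMPOLINE_STRIDE = 32         # only even slots hold a trampoline

# 64KiB of trampoline space at one entry per 32 bytes.
TRAMPOLINE_COUNT = (UNTRUSTED_START - TRAMPOLINES_START) // TRAMPOLINE_STRIDE


def region(address):
    """Name of the memory-map region containing a (signed) address."""
    if -GUARD_SIZE <= address < 0:
        return "Bottom Guard"
    if 0 <= address < TRAMPOLINES_START:
        return "Null Guard"
    if TRAMPOLINES_START <= address < UNTRUSTED_START:
        return "Trampolines"
    if UNTRUSTED_START <= address < SANDBOX_END:
        return "Untrusted Sandbox"
    if SANDBOX_END <= address < SANDBOX_END + GUARD_SIZE:
        return "Top Guard"
    return "Outside"


def trampoline_slot(address):
    """Classify a call target that falls inside the trampoline region."""
    if region(address) != "Trampolines" or address % SLOT_SIZE != 0:
        return "not callable"
    if address % TRAMPOLINE_STRIDE == 0:
        return "even slot (usable)"
    return "odd slot (faults)"
```

The count works out to 2048, matching the "Up to 2048 unique syscall entry points" figure in the table.
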
| 758 | |
| 759 .. Note:: | |
| 760 :class: note | |
| 761 | |
| 762 This is a bit of speculative flexibility. While the current bundle | |
| 763 size of Native Client on ARM is 16 bytes, we've considered the | |
| 764 possibility of optional 32-byte bundles, to enable certain compiler | |
| 765 improvements. While this option isn't available to untrusted programs | |
| 766 today, we're trying to keep the system "32-byte clean". | |
| 767 | |
| 768 Inside a Trampoline | |
| 769 ^^^^^^^^^^^^^^^^^^^ | |
| 770 | |
| 771 When we introduced trampolines, we mentioned that they can do things | |
| 772 that untrusted programs can't. To be more specific, trampolines can jump | |
| 773 to locations outside the sandbox. On ARM, this is all they do. Here's a | |
| 774 typical trampoline fragment on ARM: | |
| 775 | |
| 776 .. naclcode:: | |
| 777 :prettyprint: 0 | |
| 778 | |
| 779 ; Even trampoline bundle: | |
| 780 push { r0-r3 } ; Save arguments that may be in registers. | |
| 781 push { lr } ; Save the untrusted return address, | |
| 782 ; separate step because it must be on top. | |
| 783 ldr r0, [pc, #4] ; Load the destination address from | |
| 784 ; the next bundle. | |
| 785 blx r0 ; Go! | |
| 786 ; The odd trampoline that immediately follows: | |
| 787 bkpt #0x5BE0 ; Prevent entry to this data bundle. | |
| 788 .word address_of_routine | |
| 789 | |
| 790 The only odd thing here is that we push the incoming value of ``lr``, | |
| 791 and then use ``blx``---not ``bx``---to escape the sandbox. This is | |
| 792 because, in practice, all trampolines jump to the same routine in the | |
| 793 trusted runtime, called the syscall hook. It uses the return address | |
| 794 produced by the final ``blx`` instruction to determine which trampoline | |
| 795 was called. | |
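
One way a hook could recover the trampoline number from that return address is sketched below. This is an assumption based on the layout shown above, not the trusted runtime's actual code: the function name is hypothetical, and it relies on ``blx`` being the last instruction of a 16-byte even bundle, so the return address is the start of the odd bundle that follows.

```python
TRAMPOLINES_START = 0x10000
TRAMPOLINES_END = 0x20000
BUNDLE_SIZE = 16
TRAMPOLINE_STRIDE = 32


def trampoline_index(return_address):
    """Map the lr value left by a trampoline's final blx to a trampoline number."""
    # blx occupies the last slot of the even bundle, so lr points at the odd bundle.
    entry = return_address - BUNDLE_SIZE
    if not TRAMPOLINES_START <= entry < TRAMPOLINES_END:
        raise ValueError("return address is not from a trampoline")
    if entry % TRAMPOLINE_STRIDE != 0:
        raise ValueError("return address is not from an even trampoline slot")
    return (entry - TRAMPOLINES_START) // TRAMPOLINE_STRIDE
```

Under these assumptions the first trampoline returns ``0x10010`` and maps to index 0, and the last one maps to index 2047.
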
| 796 | |
| 797 Loose Ends | |
| 798 ---------- | |
| 799 | |
| 800 Forbidden Instructions | |
| 801 ^^^^^^^^^^^^^^^^^^^^^^ | |
| 802 | |
| 803 To complete the sandbox, the validator ensures that the program does not | |
| 804 try to use certain forbidden instructions. | |
| 805 | |
| 806 * We forbid instructions that directly interact with the operating | |
| 807 system by going around the trusted runtime. We prevent this to limit | |
| 808 the functionality of the untrusted program, and to ensure portability | |
| 809 across operating systems. | |
| 810 * We forbid instructions that change the processor's execution mode to | |
| 811 Thumb, ThumbEE, or Jazelle. This would cause the code to be | |
| 812 interpreted differently than the validator's original 32-bit ARM | |
| 813 disassembly, so the validator results might be invalidated. | |
| 814 * We forbid instructions that aren't available to user code (i.e. that | |
| 815 can only be used by an operating system kernel). This is purely out of | |
| 816 paranoia, because the hardware should prevent the instructions from | |
| 817 working. Essentially, we consider it "suspicious" if a program | |
| 818 contains these instructions---it might be trying to exploit a hardware | |
| 819 bug. | |
| 820 * We forbid instructions, or variants of instructions, that are | |
| 821 implementation-defined ("unpredictable") or deprecated in the ARMv7-A | |
| 822 architecture manual. | |
| 823 * Finally, we forbid a small number of instructions, such as ``setend``, | |
| 824 purely out of paranoia. It's easier to loosen the validator's | |
| 825 restrictions than to tighten them, so we err on the side of rejecting | |
| 826 safe instructions. | |
| 827 | |
| 828 If an instruction can't be decoded at all within the ARMv7-A instruction | |
| 829 set specification, it is forbidden. | |
| 830 | |
| 831 .. Note:: | |
| 832 :class: note | |
| 833 | |
| 834 Here is a list of instructions currently forbidden for security | |
| 835 reasons (that is, excluding deprecated or undefined instructions): | |
| 836 | |
| 837 * ``BLX`` (immediate): always changes to Thumb mode. | |
| 838 * ``BXJ``: always changes to Jazelle mode. | |
| 839 * ``CPS``: not available to user code. | |
| 840 * ``LDM``, exception return version: not available to user code. | |
| 841 * ``LDM``, kernel version: not available to user code. | |
| 842 * ``LDR*T`` (unprivileged load operations): theoretically harmless, | |
| 843 but suspicious when found in user code. Use ``LDR`` instead. | |
| 844 * ``MSR``, kernel version: not available to user code. | |
| 845 * ``RFE``: not available to user code. | |
| 846 * ``SETEND``: theoretically harmless, but suspicious when found in | |
| 847 user code. May make some future validator extensions difficult. | |
| 848 * ``SMC``: not available to user code. | |
| 849 * ``SRS``: not available to user code. | |
| 850 * ``STM``, kernel version: not available to user code. | |
| 851 * ``STR*T`` (unprivileged store operations): theoretically harmless, | |
| 852 but suspicious when found in user code. Use ``STR`` instead. | |
| 853 * ``SVC``/``SWI``: allows direct operating system interaction. | |
| 854 * Any unassigned hint instruction: difficult to reason about, so | |
| 855 treated as suspicious. | |
| 856 | |
| 857 More details are available in the `ARMv7 instruction table definition | |
| 858 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table>`_. | |
| 859 | |
| 860 Coprocessors | |
| 861 ^^^^^^^^^^^^ | |
| 862 | |
| 863 ARM has traditionally added new instruction set features through | |
| 864 coprocessors. Coprocessors are accessed through a small set of | |
| 865 instructions, and often have their own register files. Floating point | |
| 866 and the NEON vector extensions are both implemented as coprocessors, as | |
| 867 is the MMU. | |
| 868 | |
| 869 We're confident that the side-effects of coprocessors in slots 10 and 11 | |
| 870 (that is, floating point, NEON, etc.) are well-understood. These are in | |
| 871 the coprocessor space reserved by ARM Ltd. for their own extensions | |
| 872 (``CP8``--\ ``CP15``), and are unlikely to change significantly. So, we | |
| 873 allow untrusted code to use coprocessors 10 and 11, and we mandate the | |
| 874 presence of at least VFPv3 and NEON/AdvancedSIMD. Multiprocessor | |
| 875 Extension, VFPv4, FP16 and other extensions are allowed but not | |
| 876 required, and may fail on processors that do not support them. It is | |
| 877 therefore the program's responsibility to check that they are available | |
| 878 before using them. | |
| 879 | |
| 880 We don't allow access to any other ARM-reserved coprocessor | |
| 881 (``CP8``--\ ``CP9`` or ``CP12``--\ ``CP15``). It's possible that read | |
| 882 access to ``CP15`` might be useful, and we might allow it in the | |
| 883 future---but again, it's easier to loosen the restrictions than tighten | |
| 884 them, so we ban it for now. | |
| 885 | |
| 886 We do not, and probably never will, allow access to the vendor-specific | |
| 887 coprocessor space, ``CP0``--\ ``CP7``. We're simply not confident in our | |
| 888 ability to model the operations on these coprocessors, given that | |
| 889 vendors often leave them poorly-specified. Unfortunately this eliminates | |
| 890 some legacy floating point and vector implementations, but these are | |
| 891 superseded on ARMv7-A parts anyway. | |
| 892 | |
| 893 Validator Code | |
| 894 ^^^^^^^^^^^^^^ | |
| 895 | |
| 896 By now you're itching to see the sandbox validator's code and dissect | |
| 897 it. You'll have a disappointing read: at fewer than 500 lines of code, | |
| 898 `validator.cc | |
| 899 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/validator.cc>`_ | |
| 900 is quite simple to understand and much shorter than this document. It's | |
| 901 of course dependent on the `ARMv7 instruction table definition | |
| 902 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table>`_, | |
| 903 which teaches it about the ARMv7 instruction set. | |