OLD | NEW |
(Empty) | |
| 1 {{+bindTo:partials.standard_nacl_article}} |
| 2 |
| 3 <section id="arm-32-bit-sandbox"> |
| 4 <h1 id="arm-32-bit-sandbox">ARM 32-bit Sandbox</h1> |
| 5 <p>Native Client for ARM is a method for running programs—even malicious |
| 6 ones—safely, on computers that use 32-bit ARM processors. It’s an |
| 7 extension of earlier work on Native Client for x86 processors. This |
| 8 security is provided with a low performance overhead of about 10% over |
| 9 regular ARM code, and as you’ll see in this document the sandbox model |
| 10 is beautifully simple, meaning that the trusted codebase is much easier |
| 11 to validate.</p> |
| 12 <p>As an implementation detail, the Native Client 32-bit ARM sandbox is |
| 13 currently used by Portable Native Client to execute code on 32-bit ARM |
| 14 machines in a safe manner. The portable bitcode contained in a <strong>pexe</str
ong> |
| 15 is translated to a 32-bit ARM <strong>nexe</strong> before execution. This may change |
| 16 at some point: Portable Native Client doesn’t necessarily need this |
| 17 sandbox to execute code on ARM. Note that the Portable Native Client |
| 18 compiler itself is also untrusted: it too runs in the ARM sandbox |
| 19 described in this document.</p> |
| 20 <p>On this page, we describe how Native Client works on 32-bit ARM. We |
| 21 assume no prior knowledge about the internals of Native Client, on x86 |
| 22 or any other architecture, but we do assume some familiarity with |
| 23 assembly languages in general.</p> |
| 24 <div class="contents local" id="contents" style="display: none"> |
| 25 <ul class="small-gap"> |
| 26 <li><p class="first"><a class="reference internal" href="#an-introduction-to-the
-arm-architecture" id="id2">An Introduction to the ARM Architecture</a></p> |
| 27 <ul class="small-gap"> |
| 28 <li><a class="reference internal" href="#about-arm-and-armv7-a" id="id3">About A
RM and ARMv7-A</a></li> |
| 29 <li><a class="reference internal" href="#arm-programmer-s-model" id="id4">ARM Pr
ogrammer’s Model</a></li> |
| 30 </ul> |
| 31 </li> |
| 32 <li><p class="first"><a class="reference internal" href="#the-native-client-appr
oach" id="id5">The Native Client Approach</a></p> |
| 33 <ul class="small-gap"> |
| 34 <li><p class="first"><a class="reference internal" href="#nacl-arm-pure-software
-fault-isolation" id="id6">NaCl/ARM: Pure Software Fault Isolation</a></p> |
| 35 <ul class="small-gap"> |
| 36 <li><a class="reference internal" href="#load-and-store" id="id7"><em>Load</em>
and <em>Store</em></a></li> |
| 37 <li><a class="reference internal" href="#the-stack-pointer-thread-pointer-and-pr
ogram-counter" id="id8">The Stack Pointer, Thread Pointer, and Program Counter</
a></li> |
| 38 <li><a class="reference internal" href="#indirect-branch" id="id9"><em>Indirect
Branch</em></a></li> |
| 39 <li><a class="reference internal" href="#literal-pools-and-data-bundles" id="id1
0">Literal Pools and Data Bundles</a></li> |
| 40 </ul> |
| 41 </li> |
| 42 <li><p class="first"><a class="reference internal" href="#trampolines-and-memory
-layout" id="id11">Trampolines and Memory Layout</a></p> |
| 43 <ul class="small-gap"> |
| 44 <li><a class="reference internal" href="#memory-map" id="id12">Memory Map</a></l
i> |
| 45 <li><a class="reference internal" href="#inside-a-trampoline" id="id13">Inside a
Trampoline</a></li> |
| 46 </ul> |
| 47 </li> |
| 48 <li><p class="first"><a class="reference internal" href="#loose-ends" id="id14">
Loose Ends</a></p> |
| 49 <ul class="small-gap"> |
| 50 <li><a class="reference internal" href="#forbidden-instructions" id="id15">Forbi
dden Instructions</a></li> |
| 51 <li><a class="reference internal" href="#coprocessors" id="id16">Coprocessors</a
></li> |
| 52 <li><a class="reference internal" href="#validator-code" id="id17">Validator Cod
e</a></li> |
| 53 </ul> |
| 54 </li> |
| 55 </ul> |
| 56 </li> |
| 57 </ul> |
| 58 |
| 59 </div><section id="an-introduction-to-the-arm-architecture"> |
| 60 <h2 id="an-introduction-to-the-arm-architecture">An Introduction to the ARM Arch
itecture</h2> |
| 61 <p>In this section, we summarize the relevant parts of the ARM processor |
| 62 architecture.</p> |
| 63 <section id="about-arm-and-armv7-a"> |
| 64 <h3 id="about-arm-and-armv7-a">About ARM and ARMv7-A</h3> |
| 65 <p>ARM is one of the older commercial “RISC” processor designs, dati
ng back |
| 66 to the early 1980s. Today, it is used primarily in embedded systems: |
| 67 everything from toys, to home automation, to automobiles. However, its |
| 68 most visible use is in cellular phones, tablets and some |
| 69 laptops.</p> |
| 70 <p>Through the years, there have been many revisions of the ARM |
| 71 architecture, written as ARMv<em>X</em> for some version <em>X</em>. Native Clie
nt |
| 72 specifically targets the ARMv7-A architecture commonly used in high-end |
| 73 phones and smartbooks. This revision, defined in the mid-2000s, adds a |
| 74 number of useful instructions, and specifies some portions of the system |
| 75 that used to be left to individual chip manufacturers. Critically, |
| 76 ARMv7-A specifies the “eXecute Never” bit, or <em>XN</em>. This page
table |
| 77 attribute lets us mark memory as non-executable. Our security relies on |
| 78 the presence of this feature.</p> |
| 79 <p>ARMv8 adds a new 64-bit instruction set architecture called A64, while |
| 80 also enhancing the 32-bit A32 ISA. For Native Client’s purposes the A32 |
| 81 ISA is equivalent to the ARMv7 ARM ISA, albeit with a few new |
| 82 instructions. This document only discusses the 32-bit A32 instruction |
| 83 set: A64 would require a different sandboxing model.</p> |
| 84 </section><section id="arm-programmer-s-model"> |
| 85 <h3 id="arm-programmer-s-model">ARM Programmer’s Model</h3> |
| 86 <p>While modern ARM chips support several instruction encodings, 32-bit |
| 87 Native Client on ARM focuses on a single one: a fixed-width encoding |
| 88 where every instruction is 32 bits wide, called A32 (previously, and |
| 89 confusingly, called simply ARM). Thumb, Thumb2 (now confusingly called |
| 90 T32), Jazelle, ThumbEE and such aren’t supported by Native Client. This |
| 91 dramatically simplifies some of our analyses, as we’ll see later. Nearly |
| 92 every instruction can be conditionally executed based on the contents of |
| 93 a dedicated condition code register.</p> |
| 94 <p>ARM processors have 16 general-purpose registers used for integer and |
| 95 memory operations, written <code>r0</code> through <code>r15</code>. Of these, t
wo have |
| 96 special roles baked into the hardware:</p> |
| 97 <ul class="small-gap"> |
| 98 <li><code>r14</code> is the Link Register. The ARM <em>call</em> instruction |
| 99 (<em>branch-with-link</em>) doesn’t use the stack directly. Instead, it |
| 100 stashes the return address in <code>r14</code>. In other circumstances, <code>r1
4</code> |
| 101 can be (and is!) used as a general-purpose register. When <code>r14</code> is |
| 102 playing its Link Register role, it’s referred to as <code>lr</code>.</li> |
| 103 <li><code>r15</code> is the Program Counter. While it can be read and written li
ke |
| 104 any other register, setting it to a new value will cause execution to |
| 105 jump to a new address. Using it in some circumstances is also |
| 106 undefined by the ARM architecture. Because of this, <code>r15</code> is never |
| 107 used for anything else, and is referred to as <code>pc</code>.</li> |
| 108 </ul> |
| 109 <p>Other registers are given roles by convention. The only important |
| 110 registers to Native Client are <code>r9</code> and <code>r13</code>, which are u
sed as the |
| 111 Thread Pointer location and Stack Pointer, respectively. When playing these roles, |
| 112 they’re referred to as <code>tp</code> and <code>sp</code>.</p> |
| 113 <p>Like other RISC-inspired designs, ARM programs use explicit <em>load</em> and |
| 114 <em>store</em> instructions to access memory. All other instructions operate |
| 115 only on registers, or on registers and small constants called |
| 116 immediates. Because both instructions and data words are 32 bits wide, we |
| 117 can’t simply embed a 32-bit number into an instruction. ARM programs use |
| 118 three methods to work around this, all of which Native Client exploits:</p> |
| 119 <ol class="arabic simple"> |
| 120 <li>Many instructions can encode a modified immediate, which is an 8-bit |
| 121 number rotated right by an even number of bits.</li> |
| 122 <li>The <code>movw</code> and <code>movt</code> instructions can be used to set the bottom and |
| 123 top 16 bits of a register, respectively, and can therefore encode any 32-bit |
| 124 immediate.</li> |
| 125 <li>For values that can’t be represented as modified immediates, ARM |
| 126 programs use <code>pc</code>-relative loads to load data from inside the |
| 127 code—hidden in a place where it won’t be executed such as “con
stant |
| 128 pools”, just past the final return of a function.</li> |
| 129 </ol> |
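<p>To make the modified-immediate rule concrete, here is a minimal Python sketch that searches for an 8-bit value and an even right-rotation producing a given 32-bit constant. The helper names <code>ror32</code> and <code>encode_modified_immediate</code> are hypothetical and are not part of Native Client or any ARM toolchain. A constant may have more than one valid encoding; this sketch returns the one with the smallest rotation.</p>

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_modified_immediate(value):
    """Return (imm8, rotation) if `value` is an 8-bit number rotated right
    by an even number of bits, or None if no such encoding exists.

    Hypothetical helper for illustration only.
    """
    for rotation in range(0, 32, 2):
        # Undo a right-rotation by `rotation` by rotating right by the rest.
        imm8 = ror32(value, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


# The sandbox masks used throughout this page are both encodable.
print(encode_modified_immediate(0xC0000000))
print(encode_modified_immediate(0x12345678))
```

<p>Constants for which the search fails must be built with <code>movw</code>/<code>movt</code> or loaded <code>pc</code>-relative, as the list above describes.</p>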
| 130 <p>We’ll introduce more details of the ARM instruction set later, as we |
| 131 walk through the system.</p> |
| 132 </section></section><section id="the-native-client-approach"> |
| 133 <h2 id="the-native-client-approach">The Native Client Approach</h2> |
| 134 <p>Native Client runs an untrusted program, potentially from an unknown or |
| 135 malicious source, inside a sandbox created by a trusted runtime. The |
| 136 trusted runtime allows the untrusted program to “call-out” and perfo
rm |
| 137 certain actions, such as drawing graphics, but prevents it from |
| 138 accessing the operating system directly. This “call-out” facility, |
| 139 called a trampoline, looks like a standard function call to the |
| 140 untrusted program, but it allows control to escape from the sandbox in a |
| 141 controlled way.</p> |
| 142 <p>The untrusted program and trusted runtime inhabit the same process, or |
| 143 virtual address space, maintained by the operating system. To keep the |
| 144 trusted runtime behaving the way we expect, we must prevent the |
| 145 untrusted program from accessing and modifying its internals. Since they |
| 146 share a virtual address space, we can’t rely on the operating system for |
| 147 this. Instead, we isolate the untrusted program from the trusted |
| 148 runtime.</p> |
| 149 <p>Unlike modern operating systems, we use a cooperative isolation |
| 150 method. Native Client can’t run any off-the-shelf program compiled for |
| 151 an off-the-shelf operating system. The program must be compiled to |
| 152 comply with Native Client’s rules. The details vary on each platform, |
| 153 but in general, the untrusted program:</p> |
| 154 <ul class="small-gap"> |
| 155 <li>Must not attempt to use certain forbidden instructions, such as system |
| 156 calls.</li> |
| 157 <li>Must not attempt to modify its own code without abiding by Native |
| 158 Client’s code modification rules.</li> |
| 159 <li>Must not jump into the middle of an instruction group, or otherwise do |
| 160 tricky things to cause instructions to be interpreted multiple ways.</li> |
| 161 <li>Must use special, strictly-defined instruction sequences to perform |
| 162 permitted but potentially dangerous actions. We call these sequences |
| 163 pseudo-instructions.</li> |
| 164 </ul> |
| 165 <p>We can’t simply take the program’s word that it complies with the
se |
| 166 rules—we call it “untrusted” for a reason! Nor do we require i
t to be |
| 167 produced by a special compiler; in practice, we don’t trust our |
| 168 compilers either. Instead, we apply a load-time validator that |
| 169 disassembles the program. The validator either proves that the program |
| 170 complies with our rules, or rejects it as unsafe. By keeping the rules |
| 171 simple, we keep the validator simple, small, and fast. We like to put |
| 172 our trust in small, simple things, and the validator is key to the |
| 173 system’s security.</p> |
| 174 <aside class="note"> |
| 175 For the computationally-inclined, all our validators scale linearly in |
| 176 the size of the program. |
| 177 </aside> |
| 178 <section id="nacl-arm-pure-software-fault-isolation"> |
| 179 <h3 id="nacl-arm-pure-software-fault-isolation">NaCl/ARM: Pure Software Fault Is
olation</h3> |
| 180 <p>In the original Native Client system for the x86, we used unusual |
| 181 hardware features of that processor (the segment registers) to isolate |
| 182 untrusted programs. This was simple and fast, but won’t work on ARM, |
| 183 which has nothing equivalent. Instead, we use pure software fault |
| 184 isolation.</p> |
| 185 <p>We use a fixed address space layout: the untrusted program gets the |
| 186 lowest gigabyte, addresses <code>0</code> through <code>0x3FFFFFFF</code>. The r
est of the |
| 187 address space holds the trusted runtime and the operating system. We |
| 188 isolate the program by requiring every <em>load</em>, <em>store</em>, and <em>in
direct |
| 189 branch</em> (to an address in a register) to use a pseudo-instruction. The |
| 190 pseudo-instructions ensure that the address stays within the |
| 191 sandbox. The <em>indirect branch</em> pseudo-instruction, in turn, ensures that |
| 192 such branches won’t split up other pseudo-instructions.</p> |
| 193 <p>At either side of the sandbox, we place small (8KiB) guard |
| 194 regions. These are simply areas in the process’s address space that are |
| 195 mapped without read, write, or execute permissions, so any attempt to |
| 196 access them for any reason—<em>load</em>, <em>store</em>, or <em>jump</em>
—will cause a |
| 197 fault.</p> |
| 198 <p>Finally, we ban the use of certain instructions, notably direct system |
| 199 calls. This is to ensure that the untrusted program can be run on any |
| 200 operating system supported by Native Client, and to prevent access to |
| 201 certain system features that might be used to subvert the sandbox. As a |
| 202 side effect, it helps to prevent programs from exploiting buggy |
| 203 operating system APIs.</p> |
| 204 <p>Let’s walk through the details, starting with the simplest part: <em>lo
ad</em> |
| 205 and <em>store</em>.</p> |
| 206 <section id="load-and-store"> |
| 207 <h4 id="load-and-store"><em>Load</em> and <em>Store</em></h4> |
| 208 <p>All access to memory must be through <em>load</em> and <em>store</em> |
| 209 pseudo-instructions. These are simply a native <em>load</em> or <em>store</em> |
| 210 instruction, preceded by a guard instruction.</p> |
| 211 <p>Each <em>load</em> or <em>store</em> pseudo-instruction is similar to the <em
>load</em> shown |
| 212 below. We use abstract “placeholder” registers instead of specific |
| 213 numbered registers for the sake of discussion. <code>rA</code> is the register |
| 214 holding the address to load from. <code>rD</code> is the destination for the |
| 215 loaded data.</p> |
| 216 <pre> |
| 217 bic rA, #0xC0000000 |
| 218 ldr rD, [rA] |
| 219 </pre> |
| 220 <p>The first instruction, <code>bic</code>, clears the top two bits of <code>rA<
/code>. In |
| 221 this case, that means that the value in <code>rA</code> is forced to an address |
| 222 inside our sandbox, between <code>0</code> and <code>0x3FFFFFFF</code>, inclusiv
e.</p> |
| 223 <p>The second instruction, <code>ldr</code>, uses the previously-sandboxed addre
ss |
| 224 to load a value. This address might not be the address that the program |
| 225 intended, and might cause an access to an unmapped memory location |
| 226 within the sandbox: <code>bic</code> forces the address to be valid, by clearing |
| 227 the top two bits. This is a no-op in a correct program.</p> |
| 228 <p>This illustrates a common property of all Native Client systems: we aim |
| 229 for safety, not correctness. A program using an invalid address in |
| 230 <code>rA</code> here is simply broken, so we are free to do whatever we want to |
| 231 preserve safety. In this case the program might load an invalid (but |
| 232 safe) value, or cause a segmentation fault limited to the untrusted |
| 233 code.</p> |
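<p>The effect of the guard instruction can be modelled in a few lines of Python. This is only a sketch of the arithmetic, assuming 32-bit registers; <code>bic</code> and <code>sandboxed_load_address</code> are hypothetical names chosen for illustration.</p>

```python
SANDBOX_MASK = 0xC0000000  # top two bits, as in `bic rA, #0xC0000000`


def bic(value, mask):
    """Model of ARM `bic`: clear the bits of `value` that are set in `mask`."""
    return value & ~mask & 0xFFFFFFFF


def sandboxed_load_address(r_a):
    """Address the `ldr` sees after the guard instruction has run."""
    return bic(r_a, SANDBOX_MASK)


# A well-behaved address is unchanged; a hostile one is forced into range.
print(hex(sandboxed_load_address(0x00001000)))
print(hex(sandboxed_load_address(0xFFFFFFFF)))
```

<p>Whatever value the program places in <code>rA</code>, the result is at most <code>0x3FFFFFFF</code>, which is the safety property the text describes. It says nothing about whether the loaded value is the one the program wanted.</p>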
| 234 <p>Now, if we allowed arbitrary branches within the program, a malicious |
| 235 program could set up carefully-crafted values in <code>rA</code>, and then jump |
| 236 straight to the <code>ldr</code>. This is why we validate that programs never |
| 237 split pseudo-instructions.</p> |
| 238 <section id="alternative-sandboxing"> |
| 239 <h5 id="alternative-sandboxing">Alternative Sandboxing</h5> |
| 240 <pre> |
| 241 tst rA, #0xC0000000 |
| 242 ldreq rD, [rA] |
| 243 </pre> |
| 244 <p>The first instruction, <code>tst</code>, performs a bitwise-<code>AND</code>
of <code>rA</code> |
| 245 and the modified immediate literal, <code>0xC0000000</code>. It sets the |
| 246 condition flags based on the result, but does not write the result to a |
| 247 register. In particular, it sets the <code>Z</code> condition flag if the result |
| 248 was zero—if the two values had no set bits in common. In this case, |
| 249 that means that the value in <code>rA</code> was an address inside our sandbox, |
| 250 between <code>0</code> and <code>0x3FFFFFFF</code>, inclusive.</p> |
| 251 <p>The second instruction, <code>ldreq</code>, is a conditional load if equal. A
s we |
| 252 mentioned before, nearly all ARM instructions can be made |
| 253 conditional. In assembly language, we simply stick the desired condition |
| 254 on the end of the instruction’s mnemonic name. Here, the condition is |
| 255 <code>EQ</code>, which causes the instruction to execute only if the <code>Z</co
de> flag |
| 256 is set.</p> |
| 257 <p>Thus, when the pseudo-instruction executes, the <code>tst</code> sets <code>Z
</code> if |
| 258 (and only if) the value in <code>rA</code> is an address within the bounds of th
e |
| 259 sandbox, and then the <code>ldreq</code> loads if (and only if) it was. If <code
>rA</code> |
| 260 held an invalid address, the <em>load</em> does not execute, and <code>rD</code>
is |
| 261 unchanged.</p> |
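<p>As a rough illustration, the <code>tst</code>/<code>ldreq</code> pair behaves like the following Python sketch. Memory is modelled as a plain dictionary and unmapped locations read as zero; both are simplifying assumptions, and the function name <code>tst_ldreq</code> is hypothetical.</p>

```python
SANDBOX_MASK = 0xC0000000


def tst_ldreq(r_a, r_d, memory):
    """Model `tst rA, #0xC0000000` followed by `ldreq rD, [rA]`.

    Returns the new value of rD. `memory` is a dict standing in for
    sandbox memory (an assumption made for this sketch).
    """
    z_flag = (r_a & SANDBOX_MASK) == 0  # tst sets Z when no masked bit is set
    if z_flag:                          # ldreq executes only when Z is set
        return memory.get(r_a, 0)
    return r_d                          # load skipped, rD unchanged


memory = {0x1000: 42}
print(tst_ldreq(0x1000, 7, memory))      # in-sandbox address: load happens
print(tst_ldreq(0x80001000, 7, memory))  # out-of-sandbox address: rD kept
```

<p>Unlike the <code>bic</code>-based sequence, the address register is never modified here; the load is simply skipped when the address is out of range.</p>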
| 262 <aside class="note"> |
| 263 The <code>tst</code>-based sequence is faster than the <code>bic</code>-based se
quence |
| 264 on modern ARM chips. It avoids a data dependency in the address |
| 265 register. This is why we keep both around. The <code>tst</code>-based sequence |
| 266 unfortunately leaks information on some processors, and is therefore |
| 267 forbidden on certain processors. This effectively means that it cannot |
| 268 be used for regular Native Client <strong>nexe</strong> files, but can be used w
ith |
| 269 Portable Native Client because the target processor is known at |
| 270 translation time from <strong>pexe</strong> to <strong>nexe</strong>. |
| 271 </aside> |
| 272 </section><section id="addressing-modes"> |
| 273 <h5 id="addressing-modes">Addressing Modes</h5> |
| 274 <p>ARM has an unusually rich set of addressing modes. We allow all but one: |
| 275 register-indexed, where two registers are added to determine the |
| 276 address.</p> |
| 277 <p>We permit simple <em>load</em> and <em>store</em>, as shown above. We also pe
rmit |
| 278 displacement, pre-index, and post-index memory operations:</p> |
| 279 <pre> |
| 280 bic rA, #0xC0000000 |
| 281 ldr rD, [rA, #1234] ; This is fine. |
| 282 bic rA, #0xC0000000 |
| 283 ldr rD, [rA, #1234]! ; Also fine. |
| 284 bic rA, #0xC0000000 |
| 285 ldr rD, [rA], #1234 ; Looking good. |
| 286 </pre> |
| 287 <p>In each case, we know <code>rA</code> points into the sandbox when the <code>
ldr</code> |
| 288 executes. We allow adding an immediate displacement to <code>rA</code> to |
| 289 determine the final address (as in the first two examples here) because |
| 290 the largest immediate displacement is ±4095 bytes, while our guard pages |
| 291 are 8192 bytes wide.</p> |
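<p>The guard-region argument is simple arithmetic, sketched below in Python. The constants come from the text; the helper names and the default 4-byte access width are assumptions made for illustration.</p>

```python
GUARD_SIZE = 8192        # guard region width from the text (8KiB)
MAX_DISPLACEMENT = 4095  # largest immediate displacement from the text


def worst_case_overshoot(access_bytes=4):
    """Furthest byte past the sandbox edge that a guarded access can touch.

    The base register is already inside the sandbox, so the worst case is
    a base on the very edge plus the maximum displacement plus the width
    of the access itself.
    """
    return MAX_DISPLACEMENT + access_bytes - 1


def lands_in_guard(access_bytes=4):
    """True if even the worst-case access stays inside the guard region."""
    return worst_case_overshoot(access_bytes) <= GUARD_SIZE


print(worst_case_overshoot(), lands_in_guard())
```

<p>Because the worst-case overshoot is well under 8192 bytes, any access that leaves the sandbox faults in a guard region instead of reaching trusted memory.</p>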
| 292 <p>We also allow ARM’s more unusual <em>load</em> and <em>store</em> instr
uctions, such |
| 293 as <em>load-multiple</em> and <em>store-multiple</em>, etc.</p> |
| 294 </section><section id="conditional-load-and-store"> |
| 295 <h5 id="conditional-load-and-store">Conditional <em>Load</em> and <em>Store</em>
</h5> |
| 296 <p>There’s one problem with the pseudo-instructions shown above: they are |
| 297 unconditional (assuming <code>rA</code> is valid). ARM compilers regularly use |
| 298 conditional <em>load</em> and <em>store</em>, so we should support this in Nativ
e |
| 299 Client. We do so by defining alternate, predicated |
| 300 pseudo-instructions. Here is a conditional <em>store</em> |
| 301 (<em>store-if-greater-than</em>) using this pseudo-instruction sequence:</p> |
| 302 <pre> |
| 303 bicgt rA, #0xC0000000 |
| 304 strgt rX, [rA, #123] |
| 305 </pre> |
| 306 </section></section><section id="the-stack-pointer-thread-pointer-and-program-co
unter"> |
| 307 <h4 id="the-stack-pointer-thread-pointer-and-program-counter">The Stack Pointer,
Thread Pointer, and Program Counter</h4> |
| 308 <section id="stack-pointer"> |
| 309 <h5 id="stack-pointer">Stack Pointer</h5> |
| 310 <p>In C-like languages, the stack is used to store return addresses during |
| 311 function calls, as well as any local variables that won’t fit in |
| 312 registers. This makes stack operations very common.</p> |
| 313 <p>Native Client does not require guard instructions on any <em>load</em> or |
| 314 <em>store</em> involving the stack pointer, <code>sp</code>. This improves perfo
rmance |
| 315 and reduces code size. However, ARM’s stack pointer isn’t special: i
t’s |
| 316 just another register, called <code>sp</code> only by convention. To make it saf
e |
| 317 to use this register as a <em>load</em> or <em>store</em> address without guards
, we |
| 318 add a rule: <code>sp</code> must always contain a valid address.</p> |
| 319 <p>We enforce this rule by restricting the sorts of operations that |
| 320 programs can use to alter <code>sp</code>. Programs can alter <code>sp</code> by
adding or |
| 321 subtracting an immediate, as a side-effect of a <em>load</em> or <em>store</em>:
</p> |
| 322 <pre> |
| 323 ldr rX, [sp], #4 ; Load from stack, then add 4 to sp. |
| 324 str rX, [sp, #1234]! ; Add 1234 to sp, then store to stack. |
| 325 </pre> |
| 326 <p>These are safe because, as we mentioned before, the largest immediate |
| 327 available in a <em>load</em> or <em>store</em> is ±4095. Even after adding or |
| 328 subtracting 4095, the stack pointer will still be within the sandbox or |
| 329 guard regions.</p> |
| 330 <p>Any other operation that alters <code>sp</code> must be followed by a guard |
| 331 instruction. The most common alterations, in practice, are addition and |
| 332 subtraction of arbitrary integers:</p> |
| 333 <pre> |
| 334 add sp, rX |
| 335 bic sp, #0xC0000000 |
| 336 </pre> |
| 337 <p>The <code>bic</code> is similar to the one we used for conditional <em>load</
em> and |
| 338 <em>store</em>, and serves exactly the same purpose: after it completes, <code>s
p</code> |
| 339 is a valid address.</p> |
| 340 <aside class="note"> |
| 341 Clever assembly programmers and compilers may want to use this |
| 342 “trusted” property of <code>sp</code> to emit more efficient code: i
n a hot |
| 343 loop instead of using <code>sp</code> as a stack pointer it can be temporarily |
| 344 used as an index pointer (e.g. to traverse an array). This avoids the |
| 345 extra <code>bic</code> whenever the pointer is updated in the loop. |
| 346 </aside> |
| 347 </section><section id="thread-pointer-loads"> |
| 348 <h5 id="thread-pointer-loads">Thread Pointer Loads</h5> |
| 349 <p>The thread pointer and IRT thread pointer are stored in the trusted |
| 350 address space. All uses and definitions of <code>r9</code> from untrusted code |
| 351 are forbidden except as follows:</p> |
| 352 <pre> |
| 353 ldr Rn, [r9] ; Load user thread pointer. |
| 354 ldr Rn, [r9, #4] ; Load IRT thread pointer. |
| 355 </pre> |
| 356 </section><section id="pc-relative-loads"> |
| 357 <h5 id="pc-relative-loads"><code>pc</code>-relative Loads</h5> |
| 358 <p>By extension, we also allow <em>load</em> through the <code>pc</code> without
a |
| 359 mask. The explanation is quite similar:</p> |
| 360 <ul class="small-gap"> |
| 361 <li>Our control-flow isolation rules mean that the <code>pc</code> will always |
| 362 point into the sandbox.</li> |
| 363 <li>The maximum immediate displacement that can be used in a |
| 364 <code>pc</code>-relative <em>load</em> is smaller than the width of the guard pa
ges.</li> |
| 365 </ul> |
| 366 <p>We do not allow <code>pc</code>-relative stores, because they look suspicious
ly |
| 367 like self-modifying code, or any addressing mode that would alter the |
| 368 <code>pc</code> as a side effect of the <em>load</em>.</p> |
| 369 </section></section><section id="indirect-branch"> |
| 370 <h4 id="indirect-branch"><em>Indirect Branch</em></h4> |
| 371 <p>There are two types of control flow on ARM: direct and indirect. Direct |
| 372 control flow instructions have an embedded target address or |
| 373 offset. Indirect control flow instructions take their destination |
| 374 address from a register. The <code>b</code> (branch) and <code>bl</code> |
| 375 (<em>branch-with-link</em>) instructions are <em>direct branch</em> and <em>call
</em>, |
| 376 respectively. The <code>bx</code> (<em>branch-exchange</em>) and <code>blx</code
> |
| 377 (<em>branch-with-link-exchange</em>) are the indirect equivalents.</p> |
| 378 <p>Because the program counter <code>pc</code> is simply another register, ARM a
lso |
| 379 has many implicit indirect control flow instructions. Programs can |
| 380 operate on the <code>pc</code> using <em>add</em> or <em>load</em>, or even outl
andish (and |
| 381 often specified as having unpredictable-behavior) things like multiply! |
| 382 In Native Client we ban all such instructions. Indirect control flow is |
| 383 exclusively through <code>bx</code> and <code>blx</code>. Because all of ARMR
17;s control |
| 384 flow instructions are called <em>branch</em> instructions, we’ll use the t
erm |
| 385 <em>indirect branch</em> from here on, even though this includes things like |
| 386 <em>virtual call</em>, <em>return</em>, and the like.</p> |
| 387 <section id="the-trouble-with-indirection"> |
| 388 <h5 id="the-trouble-with-indirection">The Trouble with Indirection</h5> |
| 389 <p><em>Indirect branch</em> instructions present two problems for Native Client:</p> |
| 390 <ul class="small-gap"> |
| 391 <li>We must ensure that they don’t send execution outside the sandbox.</li
> |
| 392 <li>We must ensure that they don’t break up the instructions inside a |
| 393 pseudo-instruction, by landing on the second one.</li> |
| 394 </ul> |
| 395 <aside class="note"> |
| 396 On the x86 architectures we must also ensure that it doesn’t land |
| 397 inside an instruction. This is unnecessary on ARM, where all |
| 398 instructions are 32 bits wide. |
| 399 </aside> |
| 400 <p>Checking both of these for <em>direct branch</em> is easy: the validator just |
| 401 pulls the (fixed) target address out of the instruction and checks what |
| 402 it points to.</p> |
| 403 </section><section id="the-native-client-solution-bundles"> |
| 404 <h5 id="the-native-client-solution-bundles">The Native Client Solution: “B
undles”</h5> |
| 405 <p>For <em>indirect branch</em>, we can address the first problem by simply |
| 406 masking some high-order bits off the address, like we did for <em>load</em> and |
| 407 <em>store</em>. The second problem is more subtle. Detecting every possible |
| 408 route that every <em>indirect branch</em> might take is difficult. Instead, we |
| 409 take the approach pioneered by the original Native Client: we restrict |
| 410 the possible places that any <em>indirect branch</em> can land. On Native |
| 411 Client for ARM, <em>indirect branch</em> can target any address that has its |
| 412 bottom four bits clear—any address that’s <code>0 mod 16</code>. We
call these |
| 413 16-byte chunks of code “bundles”. The validator makes sure that no |
| 414 pseudo-instruction straddles a bundle boundary. Compilers must pad with |
| 415 <code>nop</code>s to ensure that every pseudo-instruction fits entirely inside |
| 416 one bundle.</p> |
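<p>The padding rule can be sketched as a small layout routine in Python. This is not the Native Client toolchain's algorithm, only an illustration of the constraint: each inner list is a group of instructions that must stay in one bundle, and <code>lay_out</code> is a hypothetical name.</p>

```python
BUNDLE_INSTRUCTIONS = 4  # a 16-byte bundle holds four 4-byte A32 instructions


def lay_out(groups):
    """Flatten `groups` into one instruction stream, inserting `nop` so
    that no group straddles a bundle boundary.

    Each group is a list of instruction strings that must stay together,
    for example the two halves of a pseudo-instruction.
    """
    stream = []
    for group in groups:
        if len(group) > BUNDLE_INSTRUCTIONS:
            raise ValueError("group cannot fit in one bundle")
        used = len(stream) % BUNDLE_INSTRUCTIONS
        if used + len(group) > BUNDLE_INSTRUCTIONS:
            # Fill the rest of the current bundle and start a fresh one.
            stream.extend(["nop"] * (BUNDLE_INSTRUCTIONS - used))
        stream.extend(group)
    return stream


program = [
    ["add r0, r1"],
    ["mov r2, r3"],
    ["sub r4, r5"],
    ["bic r0, #0xC0000000", "ldr r1, [r0]"],  # must not be split
]
for index, instruction in enumerate(lay_out(program)):
    print(hex(index * 4), instruction)
```

<p>In this example the guarded load would otherwise start in the last slot of the first bundle, so one <code>nop</code> is emitted and the pair begins at the next bundle boundary.</p>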
| 417 <p>Here is the <em>indirect branch</em> pseudo-instruction. As you can see, it |
| 418 clears the top two and bottom four bits of the address:</p> |
| 419 <pre> |
| 420 bic rA, #0xC000000F |
| 421 bx rA |
| 422 </pre> |
| 423 <p>This particular pseudo-instruction (a <code>bic</code> followed by a <code>bx
</code>) is |
| 424 used for computed jumps in switch tables and returning from functions, |
| 425 among other uses. Recall that, under ARM’s modified immediate rules, we |
| 426 can fit the constant <code>0xC000000F</code> into the <code>bic</code> instructi
on’s |
| 427 immediate field: <code>0xC000000F</code> is the 8-bit constant <code>0xFC</code>
, rotated |
| 428 right by 4 bits.</p> |
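<p>Both claims in this paragraph can be checked with a short Python sketch: that <code>0xFC</code> rotated right by 4 bits gives <code>0xC000000F</code>, and that clearing those bits yields a bundle-aligned address inside the sandbox. The helper names are hypothetical and 32-bit registers are assumed.</p>

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


BRANCH_MASK = ror32(0xFC, 4)  # the modified immediate described in the text


def sandboxed_branch_target(r_a):
    """Address that `bx rA` jumps to after `bic rA, #0xC000000F`."""
    return r_a & ~BRANCH_MASK & 0xFFFFFFFF


print(hex(BRANCH_MASK))
print(hex(sandboxed_branch_target(0xFFFFFFFF)))
print(hex(sandboxed_branch_target(0x00010007)))
```

<p>Every result has its top two and bottom four bits clear, so the target is always in the lowest gigabyte and always on a 16-byte boundary.</p>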
| 429 <p>The other useful variant is the <em>indirect branch-with-link</em>, which is |
| 430 the ARM equivalent to <em>call</em>:</p> |
| 431 <pre> |
| 432 bic rA, #0xC000000F |
| 433 blx rA |
| 434 </pre> |
| 435 <p>This is used for indirect function calls—commonly seen in C++ programs |
| 436 as virtual calls, but also for calling function pointers in C.</p> |
| 437 <p>Note that both <em>indirect branch</em> pseudo-instructions use <code>bic</co
de>, rather |
| 438 than the <code>tst</code> instruction we allow for <em>load</em> and <em>store</
em>. There are |
| 439 two reasons for this:</p> |
| 440 <ol class="arabic simple"> |
| 441 <li>Conditional <em>branch</em> is very common. Much more common than |
| 442 conditional <em>load</em> and <em>store</em>. If we supported an alternative |
| 443 <code>tst</code>-based sequence for <em>branch</em>, it would be rare.</li> |
| 444 <li>There’s no performance benefit to using <code>tst</code> here on moder
n ARM |
| 445 chips. <em>Branch</em> consumes its operands later in the pipeline than |
| 446 <em>load</em> and <em>store</em> (since they don’t have to generate an add
ress, |
| 447 etc) so this sequence doesn’t stall.</li> |
| 448 </ol> |
| 449 <aside class="note"> |
| 450 <p>At this point astute readers are wondering what the <code>x</code> in <code>b
x</code> |
| 451 and <code>blx</code> means. We told you it stood for “exchange”, but
exchange |
| 452 to what? ARM, for all the reduced-ness of its instruction set, can |
| 453 change execution mode from A32 (ARM) to T32 (Thumb) and back with |
| 454 these <em>branch</em> instructions, called <em>interworking branch</em>. Recall
that |
| 455 A32 instructions are 32 bits wide, and T32 instructions are a mix of |
| 456 16-bit and 32-bit instructions. The destination address given to a |
| 457 <em>branch</em> therefore cannot sensibly have its bottom bit set in either |
| 458 instruction set: that would be an unaligned instruction in both cases, |
| 459 and ARM simply doesn’t support this. The bottom bit for the <em>indirect |
| 460 branch</em> was therefore cleverly recycled by the ARM architecture to |
| 461 mean “switch to T32 mode” when set!</p> |
| 462 <p>As you’ve figured out by now, Native Client’s sandbox won’t
be very |
| 463 happy if A32 instructions were to be executed as T32 instructions: who |
| 464 knows what they correspond to? A malicious person could craft valid |
| 465 A32 code that’s actually very naughty T32 code, somewhat like forming |
| 466 a sentence that happens to be valid in English and French but with |
| 467 completely different meanings, complimenting the reader in one |
| 468 language and insulting them in the other.</p> |
| 469 <p>You’ve figured out by now that the bundle alignment restrictions of |
| 470 the Native Client sandbox already take care of making this travesty |
| 471 impossible: by masking off the bottom 4 bits of the destination the |
| 472 interworking nature of ARM’s <em>indirect branch</em> is completely avoide
d.</p> |
| 473 |
| 474 </aside> |
| 475 </section><section id="call-and-return"> |
| 476 <h5 id="call-and-return"><em>Call</em> and <em>Return</em></h5> |
| 477 <p>On ARM, there is no <em>call</em> or <em>return</em> instruction. A <em>call<
/em> is simply a |
| 478 <em>branch</em> that just happens to load a return address into <code>lr</code>,
the link |
| 479 register. If the called function is a leaf (that is, if it calls no |
| 480 other functions before returning), it simply branches to the address |
| 481 stored in <code>lr</code> to <em>return</em> to its caller:</p> |
| 482 <pre> |
| 483 bic lr, #0xC000000F |
| 484 bx lr |
| 485 </pre> |
| 486 <p>If the function called other functions, however, it had to spill <code>lr</co
de> |
| 487 onto the stack. On x86, this is done implicitly, but it is explicit on |
| 488 ARM:</p> |
| 489 <pre> |
| 490 push { lr } |
| 491 ; Some code here... |
| 492 pop { lr } |
| 493 bic lr, #0xC000000F |
| 494 bx lr |
| 495 </pre> |
| 496 <p>There are two things to note about this code.</p> |
| 497 <ol class="arabic simple"> |
| 498 <li>As we mentioned before, we don’t allow arbitrary instructions to |
| 499 write to the Program Counter, <code>pc</code>. Thus, while a traditional ARM |
| 500 program might have popped directly into <code>pc</code> to end the function, |
| 501 we require a pop into a register, followed by a pseudo-instruction.</li> |
| 502 <li>Function returns really are just <em>indirect branch</em>, with the same |
| 503 restrictions. This means that functions can only return to addresses |
| 504 that are bundle-aligned: <code>0 mod 16</code>.</li> |
| 505 </ol> |
| 506 <p>The implication here is that a <em>call</em>—the <em>branch</em> that e
nters |
| 507 functions—must be placed at the end of the bundle, so that the return |
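| 508 address it generates is <code>0 mod 16</code>. Otherwise, when we clear the |
| 509 bottom four bits, the return would land back at the start of the bundle that contains the call, run the call again, and the program would enter an infinite loop! (Native |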
| 510 Client doesn’t try to prevent infinite loops, but the validator actually |
| 511 does check the alignment of calls. This is because, when we were writing |
| 512 the compiler, it was annoying to find out our calls were in the wrong |
| 513 place by having the program run forever!)</p> |
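<p>The alignment argument can be checked with a little arithmetic. The following is a minimal Python sketch, not part of Native Client; the names <code>masked_return</code> and <code>call_loops_forever</code> are made up for illustration. It applies the same mask as the <code>bic</code> above to the return address produced by a call placed in each of the four slots of a bundle.</p>

```python
BRANCH_MASK = 0xC000000F   # bits cleared by "bic lr, #0xC000000F"
INSN_SIZE = 4              # bytes per A32 instruction


def masked_return(call_addr):
    """Address the callee actually returns to once lr has been masked."""
    return_addr = call_addr + INSN_SIZE          # lr as set by the call
    return return_addr & ~BRANCH_MASK & 0xFFFFFFFF


def call_loops_forever(call_addr):
    """True if the masked return lands at or before the call itself."""
    return masked_return(call_addr) <= call_addr


bundle = 0x20000           # any bundle-aligned address inside the sandbox
for slot in range(4):
    call_addr = bundle + slot * INSN_SIZE
    print(slot, hex(masked_return(call_addr)), call_loops_forever(call_addr))
```

<p>Only the call in the last slot returns to the following bundle; the other three return to the start of their own bundle and reach the same call again.</p>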
| 514 <aside class="note"> |
| 515 Properly balancing the CPU’s <em>call</em>/<em>return</em> pairs lets it |
| 516 perform much better, because it can speculatively execute the code at the |
| 517 return address. For more information on ARM’s <em>call</em>/<em>return</em> stack, see |
| 518 ARM’s technical reference manual. |
| 519 </aside> |
| 520 </section></section><section id="literal-pools-and-data-bundles"> |
| 521 <h4 id="literal-pools-and-data-bundles">Literal Pools and Data Bundles</h4> |
| 522 <p>In the section where we described the ARM architecture, we mentioned |
| 523 ARM’s unusual immediate forms. To restate:</p> |
| 524 <ul class="small-gap"> |
| 525 <li>ARM instructions are fixed-length, 32 bits wide, so we can’t have an |
| 526 instruction that includes an arbitrary 32-bit constant.</li> |
| 527 <li>Many ARM instructions can include a modified immediate constant, which |
| 528 is flexible, but limited.</li> |
| 529 <li>For any other value (particularly addresses), ARM programs explicitly |
| 530 load constants from inside the code itself.</li> |
| 531 </ul> |
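<p>To make "flexible, but limited" concrete: an A32 modified immediate is an 8-bit value rotated right by an even number of bits. The Python sketch below tests whether a constant fits that form. It is an illustration only; <code>ror32</code> and <code>is_modified_immediate</code> are hypothetical helper names, and <code>0x00021234</code> is an arbitrary stand-in for an address.</p>

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_modified_immediate(value):
    """True if `value` is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating left by `rotation` undoes the encoding's right-rotation.
        if ror32(value, (32 - rotation) % 32) <= 0xFF:
            return True
    return False


for constant in (4, 0xC0000000, 0xC000000F, 0xDEADBEEF, 0x00021234):
    print(hex(constant), is_modified_immediate(constant))
```

<p>The sandbox masks <code>0xC0000000</code> and <code>0xC000000F</code> both fit, which is why they can appear directly in a <code>bic</code>, whereas a typical address does not.</p>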
| 532 <aside class="note"> |
| 533 ARMv7 introduces some instructions, <code>movw</code> and <code>movt</code>, tha
t try to |
| 534 address this by letting us directly load larger constants. Our |
| 535 toolchain uses this capability in some cases. |
| 536 </aside> |
| 537 <p>Here’s a typical example of the use of a literal pool. ARM assemblers |
| 538 typically hide the details—this is the sort of code you’d see produc
ed |
| 539 by a disassembler, but with more comments.</p> |
| 540 <pre> |
| 541 ; C equivalent: "table[3] = 4" |
| 542 ; 'table' is a static array of bytes. |
| 543 ldr r0, [pc, #124] ; Load the address of the 'table', |
| 544 ; "124" is the offset from here |
| 545 ; to the constant below. |
| 546 add r0, #3 ; Add the immediate array index. |
| 547 mov r1, #4 ; Get the constant '4' into a register. |
| 548 bic r0, #0xC0000000 ; Mask our array address. |
| 549 strb r1, [r0] ; Store one byte. |
| 550 ; ... |
| 551 .word table ; Constant referenced above. |
| 552 </pre> |
| 553 <p>Because <code>table</code> is a static array, the compiler knew its address at |
| 554 compile-time—but the address didn’t fit in a modified immediate. (Mo
st |
| 555 don’t.) So, instead of loading an immediate into <code>r0</code> with a <code>mov</code>, |
| 556 we stashed the address in the code, generated its address using <code>pc</code>, |
| 557 and loaded the constant. ARM compilers will typically group all the |
| 558 embedded data together into a literal pool. These typically live just |
| 559 past the end of functions, where they won’t be executed.</p> |
| 560 <p>This is an important trick in ARM code, so we need to support it |
| 561 in Native Client... but there’s a potential flaw. If we let programs |
| 562 contain arbitrary data, mingled in with the code, couldn’t they hide |
| 563 malicious instructions this way?</p> |
| 564 <p>The answer is no, because the validator disassembles the entire |
| 565 executable region of the program, without regard to whether the |
| 566 programmer said a certain chunk was code or data. But this brings the |
| 567 opposite problem: what if the program needs to contain a certain |
| 568 constant that just happens to encode a malicious instruction? We want |
| 569 to allow this, but we have to be certain it will never be executed as |
| 570 code!</p> |
| 571 <section id="data-bundles-to-the-rescue"> |
| 572 <h5 id="data-bundles-to-the-rescue">Data Bundles to the Rescue</h5> |
| 573 <p>As we discussed in the last section, ARM code in Native Client is |
| 574 structured in 16-byte bundles. We allow literal pools by putting them in |
| 575 special bundles, called data bundles. Each data bundle can contain 12 |
| 576 bytes of arbitrary data, and the program can have as many data bundles |
| 577 as it likes.</p> |
| 578 <p>Each data bundle starts with a breakpoint instruction, <code>bkpt</code>. Thi
s |
| 579 way, if an <em>indirect branch</em> tries to enter the data bundle, the process |
| 580 will take a fault and the trusted runtime will intervene (by terminating |
| 581 the program). For example:</p> |
| 582 <pre> |
| 583 bkpt #0x5BE0 ; Must be aligned 0 mod 16! |
| 584 .word 0xDEADBEEF ; Arbitrary constants are A-OK. |
| 585 svc #30 ; Trying to make a syscall? OK! |
| 586 str r0, [r1] ; Unmasked stores are fine too. |
| 587 </pre> |
| 588 <p>So, we have a way for programs to create an arbitrary, even dangerous, |
| 589 chunk of data within their code. We can prevent <em>indirect branch</em> from |
| 590 entering it. We can also prevent fall-through from the code just before |
| 591 it, by the <code>bkpt</code>. But what about <em>direct branch</em> straight int
o the |
| 592 middle?</p> |
| 593 <p>The validator detects all data bundles (because this <code>bkpt</code> has a |
| 594 special encoding) and marks them as off-limits for <em>direct branch</em>. If |
| 595 it finds a <em>direct branch</em> into a data bundle, the entire program is |
| 596 rejected as unsafe. Because <em>direct branch</em> cannot be modified at |
| 597 runtime, the data bundles cannot be executed.</p> |
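<p>The rule for <em>direct branch</em> can be sketched as a toy check in Python. This is not the real validator, which enforces many more rules and may be structured quite differently; the function names are invented for illustration. The sketch assumes the program is given as a list of 32-bit words starting at a bundle boundary, and only looks at A32 <code>b</code>/<code>bl</code> instructions.</p>

```python
DATA_BUNDLE_MARKER = 0xE125BE70   # A32 encoding of "bkpt #0x5BE0"
BUNDLE_WORDS = 4                  # a 16-byte bundle is four 32-bit words


def data_bundles(words):
    """Indices of bundles whose first word is the roadblock bkpt."""
    return {i // BUNDLE_WORDS
            for i in range(0, len(words), BUNDLE_WORDS)
            if words[i] == DATA_BUNDLE_MARKER}


def direct_branch_target(word, index):
    """Word index targeted by an A32 b/bl at `index`, or None otherwise."""
    if (word >> 25) & 0x7 != 0b101 or (word >> 28) == 0xF:
        return None
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit word offset
        imm24 -= 1 << 24
    return index + 2 + imm24      # pc reads as this instruction + 8 bytes


def validate(words):
    """Reject any direct branch whose target lies inside a data bundle."""
    data = data_bundles(words)
    for index, word in enumerate(words):
        if index // BUNDLE_WORDS in data:
            continue              # contents of data bundles are not code
        target = direct_branch_target(word, index)
        if target is not None and target // BUNDLE_WORDS in data:
            return False
    return True


NOP = 0xE1A00000                  # mov r0, r0
pool = [DATA_BUNDLE_MARKER, 0xDEADBEEF, 0, 0]
print(validate([0xEA000003, NOP, NOP, NOP] + pool))              # b into the pool
print(validate([0xEA000006, NOP, NOP, NOP] + pool + [NOP] * 4))  # b past the pool
```

<p>The first program branches to the second word of the data bundle and is rejected; the second branches over it and passes this particular check.</p>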
| 598 <aside class="note"> |
| 599 Clever readers may wonder: why use <code>bkpt #0x5BE0</code>? That seems |
| 600 awfully specific when you just need a special “roadblock” instructio
n! |
| 601 Quite true, young Padawan! It happens that this odd <code>bkpt</code> |
| 602 instruction is encoded as <code>0xE125BE70</code> in A32, and in T32 the |
| 603 <code>bkpt</code> instruction is encoded as <code>0xBExx</code> (where <code>xx<
/code> could be |
| 604 any 8-bit immediate, say <code>0x70</code>) and <code>0xE125</code> encodes the
<em>branch</em> |
| 605 instruction <code>b.n #0x250</code>. The special roadblock instruction |
| 606 therefore doubles as a roadblock in T32, if anything were to go so |
| 607 awry that we tried to execute it as a T32 instruction! Much defense, |
| 608 such depth, wow! |
| 609 </aside> |
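<p>The double reading of the roadblock word can be reproduced by hand. The Python sketch below decodes <code>0xE125BE70</code> both ways, assuming little-endian memory and the standard A32 <code>bkpt</code>, 16-bit T32 <code>bkpt</code> and 16-bit T32 unconditional branch encodings. The helper names are made up for illustration.</p>

```python
import struct

ROADBLOCK = 0xE125BE70

# A32 view: imm12 sits in bits 19:8 and imm4 in bits 3:0 of the bkpt encoding.
a32_imm16 = (((ROADBLOCK >> 8) & 0xFFF) << 4) | (ROADBLOCK & 0xF)

# T32 view: the same four little-endian bytes read as two 16-bit halfwords.
first, second = struct.unpack("<HH", struct.pack("<I", ROADBLOCK))


def t32_bkpt_imm(halfword):
    """Immediate of a 16-bit T32 bkpt (0xBExx), or None if it is not one."""
    return halfword & 0xFF if halfword >> 8 == 0xBE else None


def t32_branch_target(halfword, address):
    """Target of a 16-bit T32 unconditional branch (11100 imm11), or None."""
    if halfword >> 11 != 0b11100:
        return None
    offset = halfword & 0x7FF
    if offset & 0x400:                    # sign-extend imm11
        offset -= 0x800
    return address + 4 + (offset << 1)    # T32 pc reads as instruction + 4


print(hex(a32_imm16))                      # immediate of the A32 bkpt
print(hex(first), hex(t32_bkpt_imm(first)))
print(hex(second), hex(t32_branch_target(second, 2)))
```

<p>With the branch halfword sitting 2 bytes into the bundle, its target comes out as <code>0x250</code> relative to the start of the bundle, which is consistent with the <code>b.n #0x250</code> quoted above.</p>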
| 610 </section></section></section><section id="trampolines-and-memory-layout"> |
| 611 <h3 id="trampolines-and-memory-layout">Trampolines and Memory Layout</h3> |
| 612 <p>So far, the rules we’ve described make for boring programs: they can’t |
| 613 communicate with the outside world!</p> |
| 614 <ul class="small-gap"> |
| 615 <li>The program can’t call an external library, or the operating system, |
| 616 even to do something simple like draw some pixels on the screen.</li> |
| 617 <li>It also can’t read or write memory outside of its dedicated sandbox, |
| 618 so communicating that way is right out.</li> |
| 619 </ul> |
| 620 <p>We fix this by allowing the untrusted program to call into the trusted |
| 621 runtime using a trampoline. A trampoline is simply a short stretch of |
| 622 code, placed by the trusted runtime at a known location within the |
| 623 sandbox, that is permitted to do things the untrusted program can’t.</p> |
| 624 <p>Even though trampolines are inside the sandbox, the untrusted program |
| 625 can’t modify them: the trusted runtime marks them read-only. It also |
| 626 can’t do anything clever with the special instructions inside the |
| 627 trampoline—for example, call it at a slightly offset address to bypass |
| 628 some checks—because the validator only allows trampolines to be |
| 629 reached by <em>indirect branch</em> (or <em>branch-with-link</em>). We structure
the |
| 630 trampolines carefully so that they’re safe to enter at any <code>0 mod 16<
/code> |
| 631 address.</p> |
| 632 <p>The validator can detect attempts to use the trampolines because they’r
e |
| 633 loaded at a fixed location in memory. Let’s look at the memory map of |
| 634 the Native Client sandbox.</p> |
| 635 <section id="memory-map"> |
| 636 <h4 id="memory-map">Memory Map</h4> |
| 637 <p>The ARM sandbox is always at virtual address <code>0</code>, and is exactly 1
GiB |
| 638 in size. This includes the untrusted program’s code and data, the |
| 639 trampolines, and a small guard region to detect null pointer |
| 640 dereferences. In practice, the reserved address range is a bit larger |
| 641 than this, because of the need for additional guard regions at |
| 642 either end of the sandbox.</p> |
| 643 <table border="1" class="docutils"> |
| 644 <colgroup> |
| 645 </colgroup> |
| 646 <thead valign="bottom"> |
| 647 <tr class="row-odd"><th class="head">Address</th> |
| 648 <th class="head">Size</th> |
| 649 <th class="head">Name</th> |
| 650 <th class="head">Purpose</th> |
| 651 </tr> |
| 652 </thead> |
| 653 <tbody valign="top"> |
| 654 <tr class="row-even"><td><code>-0x2000</code></td> |
| 655 <td>8KiB</td> |
| 656 <td>Bottom Guard</td> |
| 657 <td>Keeps negative-displacement <em>load</em> or <em>store</em> from escaping.</
td> |
| 658 </tr> |
| 659 <tr class="row-odd"><td><code>0</code></td> |
| 660 <td>64KiB</td> |
| 661 <td>Null Guard</td> |
| 662 <td>Catches null pointer dereferences, guards against kernel exploits.</td> |
| 663 </tr> |
| 664 <tr class="row-even"><td><code>0x10000</code></td> |
| 665 <td>64KiB</td> |
| 666 <td>Trampolines</td> |
| 667 <td>Up to 2048 unique syscall entry points.</td> |
| 668 </tr> |
| 669 <tr class="row-odd"><td><code>0x20000</code></td> |
| 670 <td>~1GiB</td> |
| 671 <td>Untrusted Sandbox</td> |
| 672 <td>Contains untrusted code, followed by its heap/stack/memory.</td> |
| 673 </tr> |
| 674 <tr class="row-even"><td><code>0x40000000</code></td> |
| 675 <td>8KiB</td> |
| 676 <td>Top Guard</td> |
| 677 <td>Keeps positive-displacement <em>load</em> or <em>store</em> from escaping.</
td> |
| 678 </tr> |
| 679 </tbody> |
| 680 </table> |
| 681 <p>Within the trampolines, the untrusted program can call any address |
| 682 that’s <code>0 mod 16</code>. However, only even slots are used, so useful |
| 683 trampolines are always <code>0 mod 32</code>. If the program calls an odd slot, |
| 684 it will fault, and the trusted runtime will shut it down.</p> |
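<p>The slot arithmetic behind the table can be written out in a few lines of Python. This is only a sketch derived from the numbers above; <code>classify</code> and the constant names are hypothetical and do not come from the Native Client sources.</p>

```python
TRAMPOLINE_BASE = 0x10000
TRAMPOLINE_SIZE = 64 * 1024
BUNDLE_SIZE = 16
SLOT_STRIDE = 32              # only even bundles hold trampoline code


def trampoline_count():
    """Number of usable entry points in the trampoline region."""
    return TRAMPOLINE_SIZE // SLOT_STRIDE


def classify(address):
    """Describe what calling `address` in the trampoline region would do."""
    if not TRAMPOLINE_BASE <= address < TRAMPOLINE_BASE + TRAMPOLINE_SIZE:
        return "outside trampolines"
    if address % BUNDLE_SIZE != 0:
        return "unreachable"      # indirect branches are masked to 0 mod 16
    if address % SLOT_STRIDE != 0:
        return "odd slot: faults"
    return "trampoline %d" % ((address - TRAMPOLINE_BASE) // SLOT_STRIDE)


print(trampoline_count())
for address in (0x10000, 0x10010, 0x10020, 0x10024):
    print(hex(address), classify(address))
```

<p>Dividing the 64KiB region by the 32-byte stride gives the 2048 entry points listed in the table.</p>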
| 685 <aside class="note"> |
| 686 This is a bit of speculative flexibility. While the current bundle |
| 687 size of Native Client on ARM is 16 bytes, we’ve considered the |
| 688 possibility of optional 32-byte bundles, to enable certain compiler |
| 689 improvements. While this option isn’t available to untrusted programs |
| 690 today, we’re trying to keep the system “32-byte clean”. |
| 691 </aside> |
| 692 </section><section id="inside-a-trampoline"> |
| 693 <h4 id="inside-a-trampoline">Inside a Trampoline</h4> |
| 694 <p>When we introduced trampolines, we mentioned that they can do things |
| 695 that untrusted programs can’t. To be more specific, trampolines can jump |
| 696 to locations outside the sandbox. On ARM, this is all they do. Here’s a |
| 697 typical trampoline fragment on ARM:</p> |
| 698 <pre> |
| 699 ; Even trampoline bundle: |
| 700 push { r0-r3 } ; Save arguments that may be in registers. |
| 701 push { lr } ; Save the untrusted return address, |
| 702 ; separate step because it must be on top. |
| 703 ldr r0, [pc, #4] ; Load the destination address from |
| 704 ; the next bundle. |
| 705 blx r0 ; Go! |
| 706 ; The odd trampoline that immediately follows: |
| 707 bkpt 0x5be0 ; Prevent entry to this data bundle. |
| 708 .word address_of_routine |
| 709 </pre> |
| 710 <p>The only odd thing here is that we push the incoming value of <code>lr</code>
, |
| 711 and then use <code>blx</code>—not <code>bx</code>—to escape the sand
box. This is |
| 712 because, in practice, all trampolines jump to the same routine in the |
| 713 trusted runtime, called the syscall hook. It uses the return address |
| 714 produced by the final <code>blx</code> instruction to determine which trampoline |
| 715 was called.</p> |
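<p>One way the syscall hook could recover the trampoline number from that return address is sketched below in Python. This is an assumption for illustration: the trusted runtime's actual computation may differ, and the function names are invented. It relies only on the layout shown above, where <code>blx</code> is the fourth instruction of an even bundle and trampolines start at <code>0x10000</code> with a 32-byte stride.</p>

```python
TRAMPOLINE_BASE = 0x10000
SLOT_STRIDE = 32
BLX_OFFSET = 12      # blx is the fourth 4-byte instruction of the even bundle


def return_address_of(trampoline_index):
    """Value that "blx r0" leaves in lr for a given trampoline."""
    blx_addr = TRAMPOLINE_BASE + trampoline_index * SLOT_STRIDE + BLX_OFFSET
    return blx_addr + 4              # lr points just past the blx


def trampoline_index_from(lr):
    """Recover the trampoline number from the lr value seen by the hook."""
    return (lr - 4 - BLX_OFFSET - TRAMPOLINE_BASE) // SLOT_STRIDE


for index in (0, 1, 2047):
    lr = return_address_of(index)
    print(index, hex(lr), trampoline_index_from(lr))
```

<p>Note that the return address is the <code>bkpt</code> at the start of the following data bundle, so it identifies the trampoline without ever being a valid place to resume untrusted execution.</p>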
| 716 </section></section><section id="loose-ends"> |
| 717 <h3 id="loose-ends">Loose Ends</h3> |
| 718 <section id="forbidden-instructions"> |
| 719 <h4 id="forbidden-instructions">Forbidden Instructions</h4> |
| 720 <p>To complete the sandbox, the validator ensures that the program does not |
| 721 try to use certain forbidden instructions.</p> |
| 722 <ul class="small-gap"> |
| 723 <li>We forbid instructions that directly interact with the operating |
| 724 system by going around the trusted runtime. We prevent this to limit |
| 725 the functionality of the untrusted program, and to ensure portability |
| 726 across operating systems.</li> |
| 727 <li>We forbid instructions that change the processor’s execution mode to |
| 728 Thumb, ThumbEE, or Jazelle. This would cause the code to be |
| 729 interpreted differently than the validator’s original 32-bit ARM |
| 730 disassembly, so the validator results might be invalidated.</li> |
| 731 <li>We forbid instructions that aren’t available to user code (i.e. that |
| 732 can only be used by an operating system kernel). This is purely out of |
| 733 paranoia, because the hardware should prevent the instructions from |
| 734 working. Essentially, we consider it “suspicious” if a program |
| 735 contains these instructions—it might be trying to exploit a hardware |
| 736 bug.</li> |
| 737 <li>We forbid instructions, or variants of instructions, that are |
| 738 implementation-defined (“unpredictable”) or deprecated in the ARMv7-
A |
| 739 architecture manual.</li> |
| 740 <li>Finally, we forbid a small number of instructions, such as <code>setend</cod
e>, |
| 741 purely out of paranoia. It’s easier to loosen the validator’s |
| 742 restrictions than to tighten them, so we err on the side of rejecting |
| 743 safe instructions.</li> |
| 744 </ul> |
| 745 <p>If an instruction can’t be decoded at all within the ARMv7-A instructio
n |
| 746 set specification, it is forbidden.</p> |
| 747 <aside class="note"> |
| 748 <p>Here is a list of instructions currently forbidden for security |
| 749 reasons (that is, excluding deprecated or undefined instructions):</p> |
| 750 <ul class="small-gap"> |
| 751 <li><code>BLX</code> (immediate): always changes to Thumb mode.</li> |
| 752 <li><code>BXJ</code>: always changes to Jazelle mode.</li> |
| 753 <li><code>CPS</code>: not available to user code.</li> |
| 754 <li><code>LDM</code>, exception return version: not available to user code.</li> |
| 755 <li><code>LDM</code>, kernel version: not available to user code.</li> |
| 756 <li><code>LDR*T</code> (unprivileged load operations): theoretically harmless, |
| 757 but suspicious when found in user code. Use <code>LDR</code> instead.</li> |
| 758 <li><code>MSR</code>, kernel version: not available to user code.</li> |
| 759 <li><code>RFE</code>: not available to user code.</li> |
| 760 <li><code>SETEND</code>: theoretically harmless, but suspicious when found in |
| 761 user code. May make some future validator extensions difficult.</li> |
| 762 <li><code>SMC</code>: not available to user code.</li> |
| 763 <li><code>SRS</code>: not available to user code.</li> |
| 764 <li><code>STM</code>, kernel version: not available to user code.</li> |
| 765 <li><code>STR*T</code> (unprivileged store operations): theoretically harmless, |
| 766 but suspicious when found in user code. Use <code>STR</code> instead.</li> |
| 767 <li><code>SVC</code>/<code>SWI</code>: allows direct operating system interactio
n.</li> |
| 768 <li>Any unassigned hint instruction: difficult to reason about, so |
| 769 treated as suspicious.</li> |
| 770 </ul> |
| 771 <p>More details are available in the <a class="reference external" href="http://
src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/valida
tor_arm/armv7.table">ARMv7 instruction table definition</a>.</p> |
| 772 |
| 773 </aside> |
| 774 </section><section id="coprocessors"> |
| 775 <h4 id="coprocessors">Coprocessors</h4> |
| 776 <p>ARM has traditionally added new instruction set features through |
| 777 coprocessors. Coprocessors are accessed through a small set of |
| 778 instructions, and often have their own register files. Floating point |
| 779 and the NEON vector extensions are both implemented as coprocessors, as |
| 780 is the MMU.</p> |
| 781 <p>We’re confident that the side-effects of coprocessors in slots 10 and 1
1 |
| 782 (that is, floating point, NEON, etc.) are well-understood. These are in |
| 783 the coprocessor space reserved by ARM Ltd. for their own extensions |
| 784 (<code>CP8</code>–<code>CP15</code>), and are unlikely to change significa
ntly. So, we |
| 785 allow untrusted code to use coprocessors 10 and 11, and we mandate the |
| 786 presence of at least VFPv3 and NEON/AdvancedSIMD. Multiprocessor |
| 787 Extension, VFPv4, FP16 and other extensions are allowed but not |
| 788 required, and may fail on processors that do not support them. It is |
| 789 therefore the program’s responsibility to validate their availability |
| 790 before executing them.</p> |
| 791 <p>We don’t allow access to any other ARM-reserved coprocessor |
| 792 (<code>CP8</code>–<code>CP9</code> or <code>CP12</code>–<code>CP15</
code>). It’s possible that read |
| 793 access to <code>CP15</code> might be useful, and we might allow it in the |
| 794 future—but again, it’s easier to loosen the restrictions than tighte
n |
| 795 them, so we ban it for now.</p> |
| 796 <p>We do not, and probably never will, allow access to the vendor-specific |
| 797 coprocessor space, <code>CP0</code>–<code>CP7</code>. We’re simply n
ot confident in our |
| 798 ability to model the operations on these coprocessors, given that |
| 799 vendors often leave them poorly-specified. Unfortunately this eliminates |
| 800 some legacy floating point and vector implementations, but these are |
| 801 superseded on ARMv7-A parts anyway.</p> |
| 802 </section><section id="validator-code"> |
| 803 <h4 id="validator-code">Validator Code</h4> |
| 804 <p>By now you’re itching to see the sandbox validator’s code and dis
sect |
| 805 it. You’ll have a disappointing read: at fewer than 500 lines of code, |
| 806 <a class="reference external" href="http://src.chromium.org/viewvc/native_client
/trunk/src/native_client/src/trusted/validator_arm/validator.cc">validator.cc</a
> |
| 807 is quite simple to understand and much shorter than this document. It’s |
| 808 of course dependent on the <a class="reference external" href="http://src.chromi
um.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/ar
mv7.table">ARMv7 instruction table definition</a>, |
| 809 which teaches it about the ARMv7 instruction set.</p> |
| 810 </section></section></section></section> |
| 811 |
| 812 {{/partials.standard_nacl_article}} |