Chromium Code Reviews

Side by Side Diff: native_client_sdk/doc_generated/reference/sandbox_internals/arm-32-bit-sandbox.html

Issue 147803003: NaCl docs: add ARM 32-bit sandbox (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src
Patch Set: Address binji's comments. Created 6 years, 10 months ago
1 {{+bindTo:partials.standard_nacl_article}}
2
3 <section id="arm-32-bit-sandbox">
4 <h1 id="arm-32-bit-sandbox">ARM 32-bit Sandbox</h1>
5 <p>Native Client for ARM is a method for running programs&#8212;even malicious
6 ones&#8212;safely, on computers that use 32-bit ARM processors. It&#8217;s an
7 extension of earlier work on Native Client for x86 processors. This
8 security is provided with a low performance overhead of about 10% over
9 regular ARM code, and as you&#8217;ll see in this document the sandbox model
10 is beautifully simple, meaning that the trusted codebase is much easier
11 to validate.</p>
12 <p>As an implementation detail, the Native Client 32-bit ARM sandbox is
13 currently used by Portable Native Client to execute code on 32-bit ARM
14 machines in a safe manner. The portable bitcode contained in a <strong>pexe</strong>
15 is translated to a 32-bit ARM <strong>nexe</strong> before execution. This may change
16 in the future: Portable Native Client doesn&#8217;t necessarily need this
17 sandbox to execute code on ARM. Note that the Portable Native Client
18 compiler itself is also untrusted: it too runs in the ARM sandbox
19 described in this document.</p>
20 <p>On this page, we describe how Native Client works on 32-bit ARM. We
21 assume no prior knowledge about the internals of Native Client, on x86
22 or any other architecture, but we do assume some familiarity with
23 assembly languages in general.</p>
24 <div class="contents local" id="contents" style="display: none">
25 <ul class="small-gap">
26 <li><p class="first"><a class="reference internal" href="#an-introduction-to-the-arm-architecture" id="id2">An Introduction to the ARM Architecture</a></p>
27 <ul class="small-gap">
28 <li><a class="reference internal" href="#about-arm-and-armv7-a" id="id3">About ARM and ARMv7-A</a></li>
29 <li><a class="reference internal" href="#arm-programmer-s-model" id="id4">ARM Programmer&#8217;s Model</a></li>
30 </ul>
31 </li>
32 <li><p class="first"><a class="reference internal" href="#the-native-client-approach" id="id5">The Native Client Approach</a></p>
33 <ul class="small-gap">
34 <li><p class="first"><a class="reference internal" href="#nacl-arm-pure-software-fault-isolation" id="id6">NaCl/ARM: Pure Software Fault Isolation</a></p>
35 <ul class="small-gap">
36 <li><a class="reference internal" href="#load-and-store" id="id7"><em>Load</em> and <em>Store</em></a></li>
37 <li><a class="reference internal" href="#the-stack-pointer-thread-pointer-and-program-counter" id="id8">The Stack Pointer, Thread Pointer, and Program Counter</a></li>
38 <li><a class="reference internal" href="#indirect-branch" id="id9"><em>Indirect Branch</em></a></li>
39 <li><a class="reference internal" href="#literal-pools-and-data-bundles" id="id10">Literal Pools and Data Bundles</a></li>
40 </ul>
41 </li>
42 <li><p class="first"><a class="reference internal" href="#trampolines-and-memory-layout" id="id11">Trampolines and Memory Layout</a></p>
43 <ul class="small-gap">
44 <li><a class="reference internal" href="#memory-map" id="id12">Memory Map</a></li>
45 <li><a class="reference internal" href="#inside-a-trampoline" id="id13">Inside a Trampoline</a></li>
46 </ul>
47 </li>
48 <li><p class="first"><a class="reference internal" href="#loose-ends" id="id14">Loose Ends</a></p>
49 <ul class="small-gap">
50 <li><a class="reference internal" href="#forbidden-instructions" id="id15">Forbidden Instructions</a></li>
51 <li><a class="reference internal" href="#coprocessors" id="id16">Coprocessors</a></li>
52 <li><a class="reference internal" href="#validator-code" id="id17">Validator Code</a></li>
53 </ul>
54 </li>
55 </ul>
56 </li>
57 </ul>
58
59 </div><section id="an-introduction-to-the-arm-architecture">
60 <h2 id="an-introduction-to-the-arm-architecture">An Introduction to the ARM Architecture</h2>
61 <p>In this section, we summarize the relevant parts of the ARM processor
62 architecture.</p>
63 <section id="about-arm-and-armv7-a">
64 <h3 id="about-arm-and-armv7-a">About ARM and ARMv7-A</h3>
65 <p>ARM is one of the older commercial &#8220;RISC&#8221; processor designs, dating back
66 to the early 1980s. Today, it is used primarily in embedded systems:
67 everything from toys, to home automation, to automobiles. However, its
68 most visible use is in cellular phones, tablets and some
69 laptops.</p>
70 <p>Through the years, there have been many revisions of the ARM
71 architecture, written as ARMv<em>X</em> for some version <em>X</em>. Native Client
72 specifically targets the ARMv7-A architecture commonly used in high-end
73 phones and smartbooks. This revision, defined in the mid-2000s, adds a
74 number of useful instructions, and specifies some portions of the system
75 that used to be left to individual chip manufacturers. Critically,
76 ARMv7-A specifies the &#8220;eXecute Never&#8221; bit, or <em>XN</em>. This page table
77 attribute lets us mark memory as non-executable. Our security relies on
78 the presence of this feature.</p>
79 <p>ARMv8 adds a new 64-bit instruction set architecture called A64, while
80 also enhancing the 32-bit A32 ISA. For Native Client&#8217;s purposes the A32
81 ISA is equivalent to the ARMv7 ARM ISA, albeit with a few new
82 instructions. This document only discusses the 32-bit A32 instruction
83 set: A64 would require a different sandboxing model.</p>
84 </section><section id="arm-programmer-s-model">
85 <h3 id="arm-programmer-s-model">ARM Programmer&#8217;s Model</h3>
86 <p>While modern ARM chips support several instruction encodings, 32-bit
87 Native Client on ARM focuses on a single one: a fixed-width encoding
88 where every instruction is 32 bits wide, called A32 (previously, and
89 confusingly, called simply ARM). Thumb, Thumb2 (now confusingly called
90 T32), Jazelle, ThumbEE and such aren&#8217;t supported by Native Client. This
91 dramatically simplifies some of our analyses, as we&#8217;ll see later. Nearly
92 every instruction can be conditionally executed based on the contents of
93 a dedicated condition code register.</p>
94 <p>ARM processors have 16 general-purpose registers used for integer and
95 memory operations, written <code>r0</code> through <code>r15</code>. Of these, two have
96 special roles baked into the hardware:</p>
97 <ul class="small-gap">
98 <li><code>r14</code> is the Link Register. The ARM <em>call</em> instruction
99 (<em>branch-with-link</em>) doesn&#8217;t use the stack directly. Instead, it
100 stashes the return address in <code>r14</code>. In other circumstances, <code>r14</code>
101 can be (and is!) used as a general-purpose register. When <code>r14</code> is
102 playing its Link Register role, it&#8217;s referred to as <code>lr</code>.</li>
103 <li><code>r15</code> is the Program Counter. While it can be read and written like
104 any other register, setting it to a new value will cause execution to
105 jump to a new address. Using it in some circumstances is also
106 undefined by the ARM architecture. Because of this, <code>r15</code> is never
107 used for anything else, and is referred to as <code>pc</code>.</li>
108 </ul>
109 <p>Other registers are given roles by convention. The only important
110 registers to Native Client are <code>r9</code> and <code>r13</code>, which are used as the
111 Thread Pointer location and Stack Pointer, respectively. When playing these roles,
112 they&#8217;re referred to as <code>tp</code> and <code>sp</code>.</p>
113 <p>Like other RISC-inspired designs, ARM programs use explicit <em>load</em> and
114 <em>store</em> instructions to access memory. All other instructions operate
115 only on registers, or on registers and small constants called
116 immediates. Because both instructions and data words are 32 bits wide, we
117 can&#8217;t simply embed a 32-bit number into an instruction. ARM programs use
118 three methods to work around this, all of which Native Client exploits:</p>
119 <ol class="arabic simple">
120 <li>Many instructions can encode a modified immediate, which is an 8-bit
121 number rotated right by an even number of bits.</li>
122 <li>The <code>movw</code> and <code>movt</code> instructions can be used to set the bottom and
123 top 16 bits of a register, respectively, and can therefore encode any 32-bit
124 immediate.</li>
125 <li>For values that can&#8217;t be represented as modified immediates, ARM
126 programs use <code>pc</code>-relative loads to load data from inside the
127 code&#8212;hidden in a place where it won&#8217;t be executed such as &#8220;constant
128 pools&#8221;, just past the final return of a function.</li>
129 </ol>
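<p>The first method above can be sketched in a few lines of Python. This is only an illustration of the &#8220;8-bit number rotated right by an even number of bits&#8221; rule; the function names <code>ror32</code> and <code>encode_modified_immediate</code> are hypothetical and are not part of Native Client or any ARM toolchain.</p>

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_modified_immediate(value):
    """Return (imm8, rotation) if `value` is an 8-bit number rotated right
    by an even amount, or None if no such encoding exists.

    Hypothetical helper for illustration only.
    """
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Undo a rotate-right by `rotation` by rotating right by the rest
        # of the way around the 32-bit word.
        imm8 = ror32(value, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


if __name__ == "__main__":
    for candidate in (0xFF, 0xC0000000, 0x12345678):
        print(hex(candidate), encode_modified_immediate(candidate))
```

<p>Values for which the helper returns <code>None</code> are the ones that need <code>movw</code>/<code>movt</code> or a <code>pc</code>-relative load.</p>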
130 <p>We&#8217;ll introduce more details of the ARM instruction set later, as we
131 walk through the system.</p>
132 </section></section><section id="the-native-client-approach">
133 <h2 id="the-native-client-approach">The Native Client Approach</h2>
134 <p>Native Client runs an untrusted program, potentially from an unknown or
135 malicious source, inside a sandbox created by a trusted runtime. The
136 trusted runtime allows the untrusted program to &#8220;call-out&#8221; and perform
137 certain actions, such as drawing graphics, but prevents it from
138 accessing the operating system directly. This &#8220;call-out&#8221; facility,
139 called a trampoline, looks like a standard function call to the
140 untrusted program, but it allows control to escape from the sandbox in a
141 controlled way.</p>
142 <p>The untrusted program and trusted runtime inhabit the same process, or
143 virtual address space, maintained by the operating system. To keep the
144 trusted runtime behaving the way we expect, we must prevent the
145 untrusted program from accessing and modifying its internals. Since they
146 share a virtual address space, we can&#8217;t rely on the operating system for
147 this. Instead, we isolate the untrusted program from the trusted
148 runtime.</p>
149 <p>Unlike modern operating systems, we use a cooperative isolation
150 method. Native Client can&#8217;t run any off-the-shelf program compiled for
151 an off-the-shelf operating system. The program must be compiled to
152 comply with Native Client&#8217;s rules. The details vary on each platform,
153 but in general, the untrusted program:</p>
154 <ul class="small-gap">
155 <li>Must not attempt to use certain forbidden instructions, such as system
156 calls.</li>
157 <li>Must not attempt to modify its own code without abiding by Native
158 Client&#8217;s code modification rules.</li>
159 <li>Must not jump into the middle of an instruction group, or otherwise do
160 tricky things to cause instructions to be interpreted multiple ways.</li>
161 <li>Must use special, strictly-defined instruction sequences to perform
162 permitted but potentially dangerous actions. We call these sequences
163 pseudo-instructions.</li>
164 </ul>
165 <p>We can&#8217;t simply take the program&#8217;s word that it complies with these
166 rules&#8212;we call it &#8220;untrusted&#8221; for a reason! Nor do we require it to be
167 produced by a special compiler; in practice, we don&#8217;t trust our
168 compilers either. Instead, we apply a load-time validator that
169 disassembles the program. The validator either proves that the program
170 complies with our rules, or rejects it as unsafe. By keeping the rules
171 simple, we keep the validator simple, small, and fast. We like to put
172 our trust in small, simple things, and the validator is key to the
173 system&#8217;s security.</p>
174 <aside class="note">
175 For the computationally-inclined, all our validators scale linearly in
176 the size of the program.
177 </aside>
178 <section id="nacl-arm-pure-software-fault-isolation">
179 <h3 id="nacl-arm-pure-software-fault-isolation">NaCl/ARM: Pure Software Fault Isolation</h3>
180 <p>In the original Native Client system for the x86, we used unusual
181 hardware features of that processor (the segment registers) to isolate
182 untrusted programs. This was simple and fast, but won&#8217;t work on ARM,
183 which has nothing equivalent. Instead, we use pure software fault
184 isolation.</p>
185 <p>We use a fixed address space layout: the untrusted program gets the
186 lowest gigabyte, addresses <code>0</code> through <code>0x3FFFFFFF</code>. The rest of the
187 address space holds the trusted runtime and the operating system. We
188 isolate the program by requiring every <em>load</em>, <em>store</em>, and <em>indirect
189 branch</em> (to an address in a register) to use a pseudo-instruction. The
190 pseudo-instructions ensure that the address stays within the
191 sandbox. The <em>indirect branch</em> pseudo-instruction, in turn, ensures that
192 such branches won&#8217;t split up other pseudo-instructions.</p>
193 <p>At either side of the sandbox, we place small (8KiB) guard
194 regions. These are simply areas in the process&#8217;s address space that are
195 mapped without read, write, or execute permissions, so any attempt to
196 access them for any reason&#8212;<em>load</em>, <em>store</em>, or <em>jump</em>&#8212;will cause a
197 fault.</p>
198 <p>Finally, we ban the use of certain instructions, notably direct system
199 calls. This is to ensure that the untrusted program can be run on any
200 operating system supported by Native Client, and to prevent access to
201 certain system features that might be used to subvert the sandbox. As a
202 side effect, it helps to prevent programs from exploiting buggy
203 operating system APIs.</p>
204 <p>Let&#8217;s walk through the details, starting with the simplest part: <em>load</em>
205 and <em>store</em>.</p>
206 <section id="load-and-store">
207 <h4 id="load-and-store"><em>Load</em> and <em>Store</em></h4>
208 <p>All access to memory must be through <em>load</em> and <em>store</em>
209 pseudo-instructions. These are simply a native <em>load</em> or <em>store</em>
210 instruction, preceded by a guard instruction.</p>
211 <p>Each <em>load</em> or <em>store</em> pseudo-instruction is similar to the <em>load</em> shown
212 below. We use abstract &#8220;placeholder&#8221; registers instead of specific
213 numbered registers for the sake of discussion. <code>rA</code> is the register
214 holding the address to load from. <code>rD</code> is the destination for the
215 loaded data.</p>
216 <pre>
217 bic rA, #0xC0000000
218 ldr rD, [rA]
219 </pre>
220 <p>The first instruction, <code>bic</code>, clears the top two bits of <code>rA</code>. In
221 this case, that means that the value in <code>rA</code> is forced to an address
222 inside our sandbox, between <code>0</code> and <code>0x3FFFFFFF</code>, inclusive.</p>
223 <p>The second instruction, <code>ldr</code>, uses the previously-sandboxed address
224 to load a value. This address might not be the address that the program
225 intended, and might cause an access to an unmapped memory location
226 within the sandbox: <code>bic</code> forces the address to be valid, by clearing
227 the top two bits. This is a no-op in a correct program.</p>
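<p>As a rough model of what the <code>bic</code> guard does to an address, the following Python sketch applies the same mask. The names <code>SANDBOX_MASK</code> and <code>bic_guard</code> are made up for this illustration; the real guard is the single ARM instruction shown above.</p>

```python
SANDBOX_MASK = 0xC0000000  # the top two bits, as in `bic rA, #0xC0000000`
SANDBOX_TOP = 0x3FFFFFFF   # highest address inside the sandbox


def bic_guard(address):
    """Model of `bic rA, #0xC0000000`: clear the top two bits of a 32-bit address."""
    return address & ~SANDBOX_MASK & 0xFFFFFFFF


if __name__ == "__main__":
    # An in-sandbox address is left alone: the guard is a no-op.
    print(hex(bic_guard(0x12345678)))
    # An out-of-sandbox address is forced back inside: safe, but not
    # necessarily what the program intended.
    print(hex(bic_guard(0xDEADBEEF)))
```

<p>Whatever value goes in, the result never exceeds <code>0x3FFFFFFF</code>, which is the safety property the sandbox needs.</p>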
228 <p>This illustrates a common property of all Native Client systems: we aim
229 for safety, not correctness. A program using an invalid address in
230 <code>rA</code> here is simply broken, so we are free to do whatever we want to
231 preserve safety. In this case the program might load an invalid (but
232 safe) value, or cause a segmentation fault limited to the untrusted
233 code.</p>
234 <p>Now, if we allowed arbitrary branches within the program, a malicious
235 program could set up carefully-crafted values in <code>rA</code>, and then jump
236 straight to the <code>ldr</code>. This is why we validate that programs never
237 split pseudo-instructions.</p>
238 <section id="alternative-sandboxing">
239 <h5 id="alternative-sandboxing">Alternative Sandboxing</h5>
240 <pre>
241 tst rA, #0xC0000000
242 ldreq rD, [rA]
243 </pre>
244 <p>The first instruction, <code>tst</code>, performs a bitwise-<code>AND</code> of <code>rA</code>
245 and the modified immediate literal, <code>0xC0000000</code>. It sets the
246 condition flags based on the result, but does not write the result to a
247 register. In particular, it sets the <code>Z</code> condition flag if the result
248 was zero&#8212;if the two values had no set bits in common. In this case,
249 that means that the value in <code>rA</code> was an address inside our sandbox,
250 between <code>0</code> and <code>0x3FFFFFFF</code>, inclusive.</p>
251 <p>The second instruction, <code>ldreq</code>, is a conditional load if equal. As we
252 mentioned before, nearly all ARM instructions can be made
253 conditional. In assembly language, we simply stick the desired condition
254 on the end of the instruction&#8217;s mnemonic name. Here, the condition is
255 <code>EQ</code>, which causes the instruction to execute only if the <code>Z</code> flag
256 is set.</p>
257 <p>Thus, when the pseudo-instruction executes, the <code>tst</code> sets <code>Z</code> if
258 (and only if) the value in <code>rA</code> is an address within the bounds of the
259 sandbox, and then the <code>ldreq</code> loads if (and only if) it was. If <code>rA</code>
260 held an invalid address, the <em>load</em> does not execute, and <code>rD</code> is
261 unchanged.</p>
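<p>The behavior of the <code>tst</code>/<code>ldreq</code> pair can be modeled as follows. This is a simplified sketch: memory is represented as a Python dictionary, and the function name <code>tst_ldreq</code> is hypothetical.</p>

```python
SANDBOX_MASK = 0xC0000000


def tst_ldreq(memory, r_a, r_d):
    """Model of `tst rA, #0xC0000000` followed by `ldreq rD, [rA]`.

    `memory` is a dict standing in for sandbox memory. Returns the new
    value of rD: the loaded word if rA is inside the sandbox, otherwise
    the old value of rD (the load is skipped).
    """
    # tst: AND the operands and set Z if the result is zero; the result
    # itself is discarded.
    z_flag = (r_a & SANDBOX_MASK) == 0
    # ldreq: only executes when Z is set.
    if z_flag:
        return memory[r_a]
    return r_d


if __name__ == "__main__":
    memory = {0x1000: 42}
    print(tst_ldreq(memory, 0x1000, r_d=7))      # address in sandbox: load happens
    print(tst_ldreq(memory, 0xC0001000, r_d=7))  # address outside: rD unchanged
```

<p>Unlike the <code>bic</code>-based sequence, the address register is never modified here; the load is simply suppressed.</p>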
262 <aside class="note">
263 The <code>tst</code>-based sequence is faster than the <code>bic</code>-based sequence
264 on modern ARM chips. It avoids a data dependency in the address
265 register. This is why we keep both around. The <code>tst</code>-based sequence
266 unfortunately leaks information on some processors, and is therefore
267 forbidden on certain processors. This effectively means that it cannot
268 be used for regular Native Client <strong>nexe</strong> files, but can be used with
269 Portable Native Client because the target processor is known at
270 translation time from <strong>pexe</strong> to <strong>nexe</strong>.
271 </aside>
272 </section><section id="addressing-modes">
273 <h5 id="addressing-modes">Addressing Modes</h5>
274 <p>ARM has an unusually rich set of addressing modes. We allow all but one:
275 register-indexed, where two registers are added to determine the
276 address.</p>
277 <p>We permit simple <em>load</em> and <em>store</em>, as shown above. We also permit
278 displacement, pre-index, and post-index memory operations:</p>
279 <pre>
280 bic rA, #0xC0000000
281 ldr rD, [rA, #1234] ; This is fine.
282 bic rA, #0xC0000000
283 ldr rD, [rA, #1234]! ; Also fine.
284 bic rA, #0xC0000000
285 ldr rD, [rA], #1234 ; Looking good.
286 </pre>
287 <p>In each case, we know <code>rA</code> points into the sandbox when the <code>ldr</code>
288 executes. We allow adding an immediate displacement to <code>rA</code> to
289 determine the final address (as in the first two examples here) because
290 the largest immediate displacement is ±4095 bytes, while our guard pages
291 are 8192 bytes wide.</p>
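<p>The arithmetic behind this argument is small enough to check directly. The sketch below assumes, for illustration, that the guard regions sit immediately below address <code>0</code> and immediately above <code>0x3FFFFFFF</code>, and it treats addresses as plain integers rather than wrapping 32-bit values. All names are hypothetical.</p>

```python
SANDBOX_BOTTOM = 0x00000000
SANDBOX_TOP = 0x3FFFFFFF
GUARD_SIZE = 8 * 1024    # 8 KiB guard region on either side (assumed adjacent)
MAX_DISPLACEMENT = 4095  # largest immediate displacement in a load or store


def worst_case_addresses():
    """Lowest and highest addresses reachable from a sandboxed rA plus a displacement."""
    lowest = SANDBOX_BOTTOM - MAX_DISPLACEMENT
    highest = SANDBOX_TOP + MAX_DISPLACEMENT
    return lowest, highest


def in_sandbox_or_guard(address):
    """True if the address falls in the sandbox or in one of the guard regions."""
    return SANDBOX_BOTTOM - GUARD_SIZE <= address <= SANDBOX_TOP + GUARD_SIZE


if __name__ == "__main__":
    lowest, highest = worst_case_addresses()
    print(in_sandbox_or_guard(lowest), in_sandbox_or_guard(highest))
```

<p>Both worst cases land in a guard region, so the access faults instead of reaching trusted memory.</p>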
292 <p>We also allow ARM&#8217;s more unusual <em>load</em> and <em>store</em> instructions, such
293 as <em>load-multiple</em> and <em>store-multiple</em>, etc.</p>
294 </section><section id="conditional-load-and-store">
295 <h5 id="conditional-load-and-store">Conditional <em>Load</em> and <em>Store</em></h5>
296 <p>There&#8217;s one problem with the pseudo-instructions shown above: they are
297 unconditional (assuming <code>rA</code> is valid). ARM compilers regularly use
298 conditional <em>load</em> and <em>store</em>, so we should support this in Native
299 Client. We do so by defining alternate, predictable
300 pseudo-instructions. Here is a conditional <em>store</em>
301 (<em>store-if-greater-than</em>) using this pseudo-instruction sequence:</p>
302 <pre>
303 bicgt rA, #0xC0000000
304 strgt rX, [rA, #123]
305 </pre>
306 </section></section><section id="the-stack-pointer-thread-pointer-and-program-counter">
307 <h4 id="the-stack-pointer-thread-pointer-and-program-counter">The Stack Pointer, Thread Pointer, and Program Counter</h4>
308 <section id="stack-pointer">
309 <h5 id="stack-pointer">Stack Pointer</h5>
310 <p>In C-like languages, the stack is used to store return addresses during
311 function calls, as well as any local variables that won&#8217;t fit in
312 registers. This makes stack operations very common.</p>
313 <p>Native Client does not require guard instructions on any <em>load</em> or
314 <em>store</em> involving the stack pointer, <code>sp</code>. This improves performance
315 and reduces code size. However, ARM&#8217;s stack pointer isn&#8217;t special: it&#8217;s
316 just another register, called <code>sp</code> only by convention. To make it safe
317 to use this register as a <em>load</em> or <em>store</em> address without guards, we
318 add a rule: <code>sp</code> must always contain a valid address.</p>
319 <p>We enforce this rule by restricting the sorts of operations that
320 programs can use to alter <code>sp</code>. Programs can alter <code>sp</code> by adding or
321 subtracting an immediate, as a side-effect of a <em>load</em> or <em>store</em>:</p>
322 <pre>
323 ldr rX, [sp], #4 ; Load from stack, then add 4 to sp.
324 str rX, [sp, #1234]! ; Add 1234 to sp, then store to stack.
325 </pre>
326 <p>These are safe because, as we mentioned before, the largest immediate
327 available in a <em>load</em> or <em>store</em> is ±4095. Even after adding or
328 subtracting 4095, the stack pointer will still be within the sandbox or
329 guard regions.</p>
330 <p>Any other operation that alters <code>sp</code> must be followed by a guard
331 instruction. The most common alterations, in practice, are addition and
332 subtraction of arbitrary integers:</p>
333 <pre>
334 add sp, rX
335 bic sp, #0xC0000000
336 </pre>
337 <p>The <code>bic</code> is similar to the one we used for conditional <em>load</em> and
338 <em>store</em>, and serves exactly the same purpose: after it completes, <code>sp</code>
339 is a valid address.</p>
340 <aside class="note">
341 Clever assembly programmers and compilers may want to use this
342 &#8220;trusted&#8221; property of <code>sp</code> to emit more efficient code: in a hot
343 loop instead of using <code>sp</code> as a stack pointer it can be temporarily
344 used as an index pointer (e.g. to traverse an array). This avoids the
345 extra <code>bic</code> whenever the pointer is updated in the loop.
346 </aside>
347 </section><section id="thread-pointer-loads">
348 <h5 id="thread-pointer-loads">Thread Pointer Loads</h5>
349 <p>The thread pointer and IRT thread pointer are stored in the trusted
350 address space. All uses and definitions of <code>r9</code> from untrusted code
351 are forbidden except as follows:</p>
352 <pre>
353 ldr Rn, [r9] ; Load user thread pointer.
354 ldr Rn, [r9, #4] ; Load IRT thread pointer.
355 </pre>
356 </section><section id="pc-relative-loads">
357 <h5 id="pc-relative-loads"><code>pc</code>-relative Loads</h5>
358 <p>By extension, we also allow <em>load</em> through the <code>pc</code> without a
359 mask. The explanation is quite similar:</p>
360 <ul class="small-gap">
361 <li>Our control-flow isolation rules mean that the <code>pc</code> will always
362 point into the sandbox.</li>
363 <li>The maximum immediate displacement that can be used in a
364 <code>pc</code>-relative <em>load</em> is smaller than the width of the guard pages.</li>
365 </ul>
366 <p>We do not allow <code>pc</code>-relative stores, because they look suspiciously
367 like self-modifying code, nor any addressing mode that would alter the
368 <code>pc</code> as a side effect of the <em>load</em>.</p>
369 </section></section><section id="indirect-branch">
370 <h4 id="indirect-branch"><em>Indirect Branch</em></h4>
371 <p>There are two types of control flow on ARM: direct and indirect. Direct
372 control flow instructions have an embedded target address or
373 offset. Indirect control flow instructions take their destination
374 address from a register. The <code>b</code> (branch) and <code>bl</code>
375 (<em>branch-with-link</em>) instructions are <em>direct branch</em> and <em>call</em>,
376 respectively. The <code>bx</code> (<em>branch-exchange</em>) and <code>blx</code>
377 (<em>branch-with-link-exchange</em>) are the indirect equivalents.</p>
378 <p>Because the program counter <code>pc</code> is simply another register, ARM also
379 has many implicit indirect control flow instructions. Programs can
380 operate on the <code>pc</code> using <em>add</em> or <em>load</em>, or even outlandish (and
381 often specified as having unpredictable behavior) things like multiply!
382 In Native Client we ban all such instructions. Indirect control flow is
383 exclusively through <code>bx</code> and <code>blx</code>. Because all of ARM&#8217;s control
384 flow instructions are called <em>branch</em> instructions, we&#8217;ll use the term
385 <em>indirect branch</em> from here on, even though this includes things like
386 <em>virtual call</em>, <em>return</em>, and the like.</p>
387 <section id="the-trouble-with-indirection">
388 <h5 id="the-trouble-with-indirection">The Trouble with Indirection</h5>
389 <p><em>Indirect branch</em> instructions present two problems for Native Client:</p>
390 <ul class="small-gap">
391 <li>We must ensure that they don&#8217;t send execution outside the sandbox.</li>
392 <li>We must ensure that they don&#8217;t break up the instructions inside a
393 pseudo-instruction, by landing on the second one.</li>
394 </ul>
395 <aside class="note">
396 On the x86 architectures we must also ensure that it doesn&#8217;t land
397 inside an instruction. This is unnecessary on ARM, where all
398 instructions are 32 bits wide.
399 </aside>
400 <p>Checking both of these for <em>direct branch</em> is easy: the validator just
401 pulls the (fixed) target address out of the instruction and checks what
402 it points to.</p>
403 </section><section id="the-native-client-solution-bundles">
404 <h5 id="the-native-client-solution-bundles">The Native Client Solution: &#8220;Bundles&#8221;</h5>
405 <p>For <em>indirect branch</em>, we can address the first problem by simply
406 masking some high-order bits off the address, like we did for <em>load</em> and
407 <em>store</em>. The second problem is more subtle. Detecting every possible
408 route that every <em>indirect branch</em> might take is difficult. Instead, we
409 take the approach pioneered by the original Native Client: we restrict
410 the possible places that any <em>indirect branch</em> can land. On Native
411 Client for ARM, <em>indirect branch</em> can target any address that has its
412 bottom four bits clear&#8212;any address that&#8217;s <code>0 mod 16</code>. We call these
413 16-byte chunks of code &#8220;bundles&#8221;. The validator makes sure that no
414 pseudo-instruction straddles a bundle boundary. Compilers must pad with
415 <code>nop</code>s to ensure that every pseudo-instruction fits entirely inside
416 one bundle.</p>
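<p>A minimal sketch of the bundle rule, assuming 4-byte instructions and 16-byte bundles as described above. The helpers <code>straddles_bundle</code> and <code>nops_needed</code> are hypothetical names used only for this example; they are not taken from the Native Client validator or compiler.</p>

```python
BUNDLE_SIZE = 16      # bytes per bundle
INSTRUCTION_SIZE = 4  # every A32 instruction is 4 bytes


def straddles_bundle(start_address, instruction_count):
    """True if a pseudo-instruction starting at `start_address` and made of
    `instruction_count` consecutive instructions crosses a bundle boundary."""
    last_address = start_address + (instruction_count - 1) * INSTRUCTION_SIZE
    return start_address // BUNDLE_SIZE != last_address // BUNDLE_SIZE


def nops_needed(start_address, instruction_count):
    """Number of nop instructions to insert before the pseudo-instruction so
    that it starts at the next bundle boundary (0 if it already fits).

    Assumes the pseudo-instruction is no longer than one bundle.
    """
    if not straddles_bundle(start_address, instruction_count):
        return 0
    return (BUNDLE_SIZE - start_address % BUNDLE_SIZE) // INSTRUCTION_SIZE


if __name__ == "__main__":
    # A two-instruction guard + load at offset 8 of a bundle fits.
    print(straddles_bundle(0x1008, 2), nops_needed(0x1008, 2))
    # At offset 12 the second instruction would fall in the next bundle.
    print(straddles_bundle(0x100C, 2), nops_needed(0x100C, 2))
```

<p>With the padding in place, an <em>indirect branch</em> to any bundle-aligned address can only land on the first instruction of a pseudo-instruction, never in its middle.</p>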
417 <p>Here is the <em>indirect branch</em> pseudo-instruction. As you can see, it
418 clears the top two and bottom four bits of the address:</p>
419 <pre>
420 bic rA, #0xC000000F
421 bx rA
422 </pre>
423 <p>This particular pseudo-instruction (a <code>bic</code> followed by a <code>bx</code>) is
424 used for computed jumps in switch tables and returning from functions,
425 among other uses. Recall that, under ARM&#8217;s modified immediate rules, we
426 can fit the constant <code>0xC000000F</code> into the <code>bic</code> instruction&#8217;s
427 immediate field: <code>0xC000000F</code> is the 8-bit constant <code>0xFC</code>, rotated
428 right by 4 bits.</p>
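<p>The rotation claim and the effect of the mask can be checked with a short Python sketch. The names <code>ror32</code>, <code>BRANCH_MASK</code> and <code>bic_branch_guard</code> are illustrative only.</p>

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


# The 8-bit constant 0xFC rotated right by 4 bits.
BRANCH_MASK = ror32(0xFC, 4)


def bic_branch_guard(target):
    """Model of `bic rA, #0xC000000F`: clear the top two and bottom four bits."""
    return target & ~BRANCH_MASK & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(BRANCH_MASK))
    print(hex(bic_branch_guard(0xDEADBEEF)))
```

<p>The masked target is always inside the sandbox and always <code>0 mod 16</code>, so its bottom bit is clear as well.</p>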
429 <p>The other useful variant is the <em>indirect branch-with-link</em>, which is
430 the ARM equivalent to <em>call</em>:</p>
431 <pre>
432 bic rA, #0xC000000F
433 blx rA
434 </pre>
435 <p>This is used for indirect function calls&#8212;commonly seen in C++ programs
436 as virtual calls, but also for calling function pointers in C.</p>
437 <p>Note that both <em>indirect branch</em> pseudo-instructions use <code>bic</code>, rather
438 than the <code>tst</code> instruction we allow for <em>load</em> and <em>store</em>. There are
439 two reasons for this:</p>
440 <ol class="arabic simple">
441 <li>Conditional <em>branch</em> is very common. Much more common than
442 conditional <em>load</em> and <em>store</em>. If we supported an alternative
443 <code>tst</code>-based sequence for <em>branch</em>, it would be rare.</li>
444 <li>There&#8217;s no performance benefit to using <code>tst</code> here on modern ARM
445 chips. <em>Branch</em> consumes its operands later in the pipeline than
446 <em>load</em> and <em>store</em> (since it doesn&#8217;t have to generate an address,
447 etc.) so this sequence doesn&#8217;t stall.</li>
448 </ol>
449 <aside class="note">
450 <p>At this point astute readers are wondering what the <code>x</code> in <code>bx</code>
451 and <code>blx</code> means. We told you it stood for &#8220;exchange&#8221;, but exchange
452 to what? ARM, for all the reduced-ness of its instruction set, can
453 change execution mode from A32 (ARM) to T32 (Thumb) and back with
454 these <em>branch</em> instructions, called <em>interworking branch</em>. Recall that
455 A32 instructions are 32 bits wide, and T32 instructions are a mix of
456 16-bit and 32-bit widths. The destination address given to a
457 <em>branch</em> therefore cannot sensibly have its bottom bit set in either
458 instruction set: that would be an unaligned instruction in both cases,
459 and ARM simply doesn&#8217;t support this. The bottom bit for the <em>indirect
460 branch</em> was therefore cleverly recycled by the ARM architecture to
461 mean &#8220;switch to T32 mode&#8221; when set!</p>
462 <p>As you&#8217;ve figured out by now, Native Client&#8217;s sandbox wouldn&#8217;t be very
463 happy if A32 instructions were to be executed as T32 instructions: who
464 knows what they correspond to? A malicious person could craft valid
465 A32 code that&#8217;s actually very naughty T32 code, somewhat like forming
466 a sentence that happens to be valid in English and French but with
467 completely different meanings, complimenting the reader in one
468 language and insulting them in the other.</p>
469 <p>You&#8217;ve figured out by now that the bundle alignment restrictions of
470 the Native Client sandbox already take care of making this travesty
471 impossible: by masking off the bottom 4 bits of the destination the
472 interworking nature of ARM&#8217;s <em>indirect branch</em> is completely avoided.</p>
473
474 </aside>
475 </section><section id="call-and-return">
476 <h5 id="call-and-return"><em>Call</em> and <em>Return</em></h5>
477 <p>On ARM, there is no <em>call</em> or <em>return</em> instruction. A <em>call</em> is simply a
478 <em>branch</em> that just happens to load a return address into <code>lr</code>, the link
479 register. If the called function is a leaf (that is, if it calls no
480 other functions before returning), it simply branches to the address
481 stored in <code>lr</code> to <em>return</em> to its caller:</p>
482 <pre>
483 bic lr, #0xC000000F
484 bx lr
485 </pre>
486 <p>If the function called other functions, however, it had to spill <code>lr</code>
487 onto the stack. On x86, this is done implicitly, but it is explicit on
488 ARM:</p>
489 <pre>
490 push { lr }
491 ; Some code here...
492 pop { lr }
493 bic lr, #0xC000000F
494 bx lr
495 </pre>
496 <p>There are two things to note about this code.</p>
497 <ol class="arabic simple">
498 <li>As we mentioned before, we don&#8217;t allow arbitrary instructions to
499 write to the Program Counter, <code>pc</code>. Thus, while a traditional ARM
500 program might have popped directly into <code>pc</code> to end the function,
501 we require a pop into a register, followed by a pseudo-instruction.</li>
502 <li>Function returns really are just <em>indirect branch</em>, with the same
503 restrictions. This means that functions can only return to addresses
504 that are bundle-aligned: <code>0 mod 16</code>.</li>
505 </ol>
506 <p>The implication here is that a <em>call</em>&#8212;the <em>branch</em> that enters
507 functions&#8212;must be placed at the end of the bundle, so that the return
508 address it generates is <code>0 mod 16</code>. Otherwise, when we clear the
509 bottom four bits, the program would enter an infinite loop! (Native
510 Client doesn&#8217;t try to prevent infinite loops, but the validator actually
511 does check the alignment of calls. This is because, when we were writing
512 the compiler, it was annoying to find out our calls were in the wrong
513 place by having the program run forever!)</p>
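<p>The alignment rule can be sketched in a few lines of Python. This is only an illustration, not the validator&#8217;s code: the names <code>mask_return</code> and <code>call_is_bundle_final</code> are made up for this example, and the mask is the one used by the <code>bic</code> in the return sequences above.</p>

```python
BUNDLE_SIZE = 16          # bytes per bundle, as described in this document
INSN_SIZE = 4             # every A32 instruction is 4 bytes
RETURN_MASK = 0xC000000F  # bits cleared by "bic lr, #0xC000000F"


def mask_return(lr):
    """Model the bic before "bx lr": clear the top 2 and bottom 4 bits."""
    return lr & ~RETURN_MASK & 0xFFFFFFFF


def call_is_bundle_final(call_addr):
    """True if the instruction after the call starts a new bundle."""
    return (call_addr + INSN_SIZE) % BUNDLE_SIZE == 0


# A call in the last slot of a bundle: the return address survives masking.
good_call = 0x2000C
assert call_is_bundle_final(good_call)
assert mask_return(good_call + INSN_SIZE) == 0x20010

# A call in the first slot: masking sends the return back to the start of
# the same bundle, which runs the call again (the infinite loop).
bad_call = 0x20000
assert not call_is_bundle_final(bad_call)
assert mask_return(bad_call + INSN_SIZE) == bad_call
```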
514 <aside class="note">
515 Properly balancing the CPU&#8217;s <em>call</em>/<em>return</em> actually allows it to
516 perform much better by allowing it to speculatively execute the return
517 address&#8217; code. For more information on ARM&#8217;s <em>call</em>/<em>return</em> stack see
518 ARM&#8217;s technical reference manual.
519 </aside>
520 </section></section><section id="literal-pools-and-data-bundles">
521 <h4 id="literal-pools-and-data-bundles">Literal Pools and Data Bundles</h4>
522 <p>In the section where we described the ARM architecture, we mentioned
523 ARM&#8217;s unusual immediate forms. To restate:</p>
524 <ul class="small-gap">
525 <li>ARM instructions are fixed-length, 32-bits, so we can&#8217;t have an
526 instruction that includes an arbitrary 32-bit constant.</li>
527 <li>Many ARM instructions can include a modified immediate constant, which
528 is flexible, but limited.</li>
529 <li>For any other value (particularly addresses), ARM programs explicitly
530 load constants from inside the code itself.</li>
531 </ul>
532 <aside class="note">
533 ARMv7 introduces some instructions, <code>movw</code> and <code>movt</code>, that try to
534 address this by letting us directly load larger constants. Our
535 toolchain uses this capability in some cases.
536 </aside>
537 <p>Here&#8217;s a typical example of the use of a literal pool. ARM assemblers
538 typically hide the details&#8212;this is the sort of code you&#8217;d see produced
539 by a disassembler, but with more comments.</p>
540 <pre>
541 ; C equivalent: &quot;table[3] = 4&quot;
542 ; 'table' is a static array of bytes.
543 ldr r0, [pc, #124] ; Load the address of the 'table',
544 ; &quot;124&quot; is the offset from here
545 ; to the constant below.
546 add r0, #3 ; Add the immediate array index.
547 mov r1, #4 ; Get the constant '4' into a register.
548 bic r0, #0xC0000000 ; Mask our array address.
549 strb r1, [r0] ; Store one byte.
550 ; ...
551 .word table ; Constant referenced above.
552 </pre>
553 <p>Because table is a static array, the compiler knew its address at
554 compile-time&#8212;but the address didn&#8217;t fit in a modified immediate. (Most
555 don&#8217;t.) So, instead of loading an immediate into <code>r0</code> with a <code>mov</code>,
556 we stashed the address in the code, generated its address using <code>pc</code>,
557 and loaded the constant. ARM compilers will typically group all the
558 embedded data together into a literal pool. These typically live just
559 past the end of functions, where they won&#8217;t be executed.</p>
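<p>As a rough model of the address arithmetic in the listing above: in A32 code, reading <code>pc</code> yields the address of the current instruction plus 8, so the literal sits 8 bytes further away than the offset in the <code>ldr</code>. The function names and the example addresses below are hypothetical; only the <code>+8</code> rule and the <code>0xC0000000</code> mask come from the architecture and the listing.</p>

```python
PC_READ_OFFSET = 8         # in A32, pc reads as "this instruction + 8"
DATA_MASK = 0xC0000000     # bits cleared by "bic r0, #0xC0000000"


def literal_address(ldr_addr, offset):
    """Address read by "ldr rX, [pc, #offset]" located at ldr_addr."""
    return ldr_addr + PC_READ_OFFSET + offset


def mask_data_address(addr):
    """Model the bic that precedes the store: clear the top two bits."""
    return addr & ~DATA_MASK & 0xFFFFFFFF


# An ldr at 0x20000 with offset 124 reads the word at 0x20084.
assert literal_address(0x20000, 124) == 0x20084

# Whatever value ends up in r0, the masked address stays below 1 GiB.
assert mask_data_address(0xC0001234) == 0x00001234
assert mask_data_address(0x30000003) == 0x30000003
```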
560 <p>This is an important trick in ARM code, so it&#8217;s important to support it
561 in Native Client... but there&#8217;s a potential flaw. If we let programs
562 contain arbitrary data, mingled in with the code, couldn&#8217;t they hide
563 malicious instructions this way?</p>
564 <p>The answer is no, because the validator disassembles the entire
565 executable region of the program, without regard to whether the
566 programmer said a certain chunk was code or data. But this brings the
567 opposite problem: what if the program needs to contain a certain
568 constant that just happens to encode a malicious instruction? We want
569 to allow this, but we have to be certain it will never be executed as
570 code!</p>
571 <section id="data-bundles-to-the-rescue">
572 <h5 id="data-bundles-to-the-rescue">Data Bundles to the Rescue</h5>
573 <p>As we discussed in the last section, ARM code in Native Client is
574 structured in 16-byte bundles. We allow literal pools by putting them in
575 special bundles, called data bundles. Each data bundle can contain 12
576 bytes of arbitrary data, and the program can have as many data bundles
577 as it likes.</p>
578 <p>Each data bundle starts with a breakpoint instruction, <code>bkpt</code>. This
579 way, if an <em>indirect branch</em> tries to enter the data bundle, the process
580 will take a fault and the trusted runtime will intervene (by terminating
581 the program). For example:</p>
582 <pre>
583 bkpt #0x5BE0 ; Must be aligned 0 mod 16!
584 .word 0xDEADBEEF ; Arbitrary constants are A-OK.
585 svc #30 ; Trying to make a syscall? OK!
586 str r0, [r1] ; Unmasked stores are fine too.
587 </pre>
588 <p>So, we have a way for programs to create an arbitrary, even dangerous,
589 chunk of data within their code. We can prevent <em>indirect branch</em> from
590 entering it. We can also prevent fall-through from the code just before
591 it, by the <code>bkpt</code>. But what about <em>direct branch</em> straight into the
592 middle?</p>
593 <p>The validator detects all data bundles (because this <code>bkpt</code> has a
594 special encoding) and marks them as off-limits for <em>direct branch</em>. If
595 it finds a <em>direct branch</em> into a data bundle, the entire program is
596 rejected as unsafe. Because <em>direct branch</em> cannot be modified at
597 runtime, the data bundles cannot be executed.</p>
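<p>The two checks described above can be sketched as follows. This is a simplified model under stated assumptions, not the code in <code>validator.cc</code>: it assumes little-endian code whose length is a multiple of 16, it treats the head of a data bundle as off-limits too, and the names <code>find_data_bundles</code> and <code>direct_branch_allowed</code> are invented for the example.</p>

```python
import struct

BUNDLE_SIZE = 16
DATA_BUNDLE_HEAD = 0xE125BE70  # A32 encoding of "bkpt #0x5BE0"


def find_data_bundles(code, base):
    """Return start addresses of bundles whose first word is the roadblock.

    Assumes little-endian code and len(code) being a multiple of 16.
    """
    starts = set()
    for offset in range(0, len(code), BUNDLE_SIZE):
        (head,) = struct.unpack_from("<I", code, offset)
        if head == DATA_BUNDLE_HEAD:
            starts.add(base + offset)
    return starts


def direct_branch_allowed(target, data_bundles):
    """Reject a direct branch that lands anywhere inside a data bundle."""
    return (target & ~(BUNDLE_SIZE - 1)) not in data_bundles


# One code bundle of "mov r0, r0" followed by one data bundle.
NOP = 0xE1A00000
code = struct.pack("<8I", NOP, NOP, NOP, NOP,
                   DATA_BUNDLE_HEAD, 0xDEADBEEF, 0, 0)
data_bundles = find_data_bundles(code, base=0x20000)

assert data_bundles == {0x20010}
assert direct_branch_allowed(0x20004, data_bundles)
assert not direct_branch_allowed(0x20014, data_bundles)
```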
598 <aside class="note">
599 Clever readers may wonder: why use <code>bkpt #0x5BE0</code>? That seems
600 awfully specific when you just need a special &#8220;roadblock&#8221; instruction!
601 Quite true, young Padawan! It happens that this odd <code>bkpt</code>
602 instruction is encoded as <code>0xE125BE70</code> in A32, and in T32 the
603 <code>bkpt</code> instruction is encoded as <code>0xBExx</code> (where <code>xx</code> could be
604 any 8-bit immediate, say <code>0x70</code>) and <code>0xE125</code> encodes the <em>branch</em>
605 instruction <code>b.n #0x250</code>. The special roadblock instruction
606 therefore doubles as a roadblock in T32, if anything were to go so
607 awry that we tried to execute it as a T32 instruction! Much defense,
608 such depth, wow!
609 </aside>
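<p>The encoding in the note can be checked by hand. The sketch below builds the A32 <code>bkpt</code> word from its 16-bit immediate (the immediate is split into a 12-bit and a 4-bit field) and then reads the same four little-endian bytes as two 16-bit T32 halfwords. The helper name <code>encode_a32_bkpt</code> is made up for this example.</p>

```python
import struct


def encode_a32_bkpt(imm16):
    """Encode A32 "bkpt #imm16" with the always (AL) condition.

    The immediate is split: its top 12 bits go to bits 19:8 of the
    instruction, its bottom 4 bits go to bits 3:0.
    """
    return 0xE1200070 | ((imm16 >> 4) << 8) | (imm16 & 0xF)


word = encode_a32_bkpt(0x5BE0)
assert word == 0xE125BE70

# The same bytes in memory (little-endian), read as two T32 halfwords.
first, second = struct.unpack("<2H", struct.pack("<I", word))
assert (first, second) == (0xBE70, 0xE125)

assert first >> 8 == 0xBE        # T32 bkpt, 8-bit immediate 0x70
assert second >> 11 == 0b11100   # T32 16-bit unconditional branch
```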
610 </section></section></section><section id="trampolines-and-memory-layout">
611 <h3 id="trampolines-and-memory-layout">Trampolines and Memory Layout</h3>
612 <p>So far, the rules we&#8217;ve described make for boring programs: they can&#8217;t
613 communicate with the outside world!</p>
614 <ul class="small-gap">
615 <li>The program can&#8217;t call an external library, or the operating system,
616 even to do something simple like draw some pixels on the screen.</li>
617 <li>It also can&#8217;t read or write memory outside of its dedicated sandbox,
618 so communicating that way is right out.</li>
619 </ul>
620 <p>We fix this by allowing the untrusted program to call into the trusted
621 runtime using a trampoline. A trampoline is simply a short stretch of
622 code, placed by the trusted runtime at a known location within the
623 sandbox, that is permitted to do things the untrusted program can&#8217;t.</p>
624 <p>Even though trampolines are inside the sandbox, the untrusted program
625 can&#8217;t modify them: the trusted runtime marks them read-only. It also
626 can&#8217;t do anything clever with the special instructions inside the
627 trampoline&#8212;for example, call it at a slightly offset address to bypass
628 some checks&#8212;because the validator only allows trampolines to be
629 reached by <em>indirect branch</em> (or <em>branch-with-link</em>). We structure the
630 trampolines carefully so that they&#8217;re safe to enter at any <code>0 mod 16</code>
631 address.</p>
632 <p>The validator can detect attempts to use the trampolines because they&#8217;re
633 loaded at a fixed location in memory. Let&#8217;s look at the memory map of
634 the Native Client sandbox.</p>
635 <section id="memory-map">
636 <h4 id="memory-map">Memory Map</h4>
637 <p>The ARM sandbox is always at virtual address <code>0</code>, and is exactly 1 GiB
638 in size. This includes the untrusted program&#8217;s code and data, the
639 trampolines, and a small guard region to detect null pointer
640 dereferences. In practice, the untrusted program takes up a bit more
641 room than this, because of the need for additional guard regions at
642 either end of the sandbox.</p>
643 <table border="1" class="docutils">
644 <colgroup>
645 </colgroup>
646 <thead valign="bottom">
647 <tr class="row-odd"><th class="head">Address</th>
648 <th class="head">Size</th>
649 <th class="head">Name</th>
650 <th class="head">Purpose</th>
651 </tr>
652 </thead>
653 <tbody valign="top">
654 <tr class="row-even"><td><code>-0x2000</code></td>
655 <td>8KiB</td>
656 <td>Bottom Guard</td>
657 <td>Keeps negative-displacement <em>load</em> or <em>store</em> from escaping.</td>
658 </tr>
659 <tr class="row-odd"><td><code>0</code></td>
660 <td>64KiB</td>
661 <td>Null Guard</td>
662 <td>Catches null pointer dereferences, guards against kernel exploits.</td>
663 </tr>
664 <tr class="row-even"><td><code>0x10000</code></td>
665 <td>64KiB</td>
666 <td>Trampolines</td>
667 <td>Up to 2048 unique syscall entry points.</td>
668 </tr>
669 <tr class="row-odd"><td><code>0x20000</code></td>
670 <td>~1GiB</td>
671 <td>Untrusted Sandbox</td>
672 <td>Contains untrusted code, followed by its heap/stack/memory.</td>
673 </tr>
674 <tr class="row-even"><td><code>0x40000000</code></td>
675 <td>8KiB</td>
676 <td>Top Guard</td>
677 <td>Keeps positive-displacement <em>load</em> or <em>store</em> from escaping.</td>
678 </tr>
679 </tbody>
680 </table>
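<p>For readers who prefer code to tables, the memory map can be written down as data. The sketch below copies the addresses and sizes from the table and uses signed Python integers so that the bottom guard can sit at <code>-0x2000</code>; the function name <code>region_of</code> is hypothetical.</p>

```python
KIB = 1024

# (start address, size in bytes, name), copied from the table above.
REGIONS = [
    (-0x2000, 8 * KIB, "Bottom Guard"),
    (0x0, 64 * KIB, "Null Guard"),
    (0x10000, 64 * KIB, "Trampolines"),
    (0x20000, 0x40000000 - 0x20000, "Untrusted Sandbox"),
    (0x40000000, 8 * KIB, "Top Guard"),
]


def region_of(addr):
    """Return the name of the region containing addr, or None."""
    for start, size, name in REGIONS:
        if start <= addr < start + size:
            return name
    return None


# The regions are contiguous: each one ends where the next one begins.
for (start, size, _), (next_start, _, _) in zip(REGIONS, REGIONS[1:]):
    assert start + size == next_start

assert region_of(0) == "Null Guard"
assert region_of(0x10000) == "Trampolines"
assert region_of(0x3FFFFFFF) == "Untrusted Sandbox"
```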
681 <p>Within the trampolines, the untrusted program can call any address
682 that&#8217;s <code>0 mod 16</code>. However, only even slots are used, so useful
683 trampolines are always <code>0 mod 32</code>. If the program calls an odd slot,
684 it will fault, and the trusted runtime will shut it down.</p>
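<p>The slot arithmetic works out as follows. This is a sketch using the numbers from the memory map; the names <code>trampoline_address</code> and <code>is_useful_trampoline</code> are invented for illustration.</p>

```python
TRAMPOLINE_BASE = 0x10000
TRAMPOLINE_REGION_SIZE = 64 * 1024
ENTRY_STRIDE = 32  # one even (code) slot plus the odd slot after it
MAX_ENTRIES = TRAMPOLINE_REGION_SIZE // ENTRY_STRIDE


def trampoline_address(index):
    """Address of the even slot for entry point number index."""
    if not 0 <= index < MAX_ENTRIES:
        raise ValueError("no such trampoline: %d" % index)
    return TRAMPOLINE_BASE + index * ENTRY_STRIDE


def is_useful_trampoline(addr):
    """True for addresses in the trampoline region that are 0 mod 32."""
    in_region = (TRAMPOLINE_BASE <= addr
                 < TRAMPOLINE_BASE + TRAMPOLINE_REGION_SIZE)
    return in_region and addr % ENTRY_STRIDE == 0


# 64 KiB of 32-byte entries gives the 2048 entry points from the table.
assert MAX_ENTRIES == 2048

assert trampoline_address(1) == 0x10020
assert is_useful_trampoline(0x10020)       # even slot
assert not is_useful_trampoline(0x10030)   # odd slot: 0 mod 16 only
```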
685 <aside class="note">
686 This is a bit of speculative flexibility. While the current bundle
687 size of Native Client on ARM is 16 bytes, we&#8217;ve considered the
688 possibility of optional 32-byte bundles, to enable certain compiler
689 improvements. While this option isn&#8217;t available to untrusted programs
690 today, we&#8217;re trying to keep the system &#8220;32-byte clean&#8221;.
691 </aside>
692 </section><section id="inside-a-trampoline">
693 <h4 id="inside-a-trampoline">Inside a Trampoline</h4>
694 <p>When we introduced trampolines, we mentioned that they can do things
695 that untrusted programs can&#8217;t. To be more specific, trampolines can jump
696 to locations outside the sandbox. On ARM, this is all they do. Here&#8217;s a
697 typical trampoline fragment on ARM:</p>
698 <pre>
699 ; Even trampoline bundle:
700 push { r0-r3 } ; Save arguments that may be in registers.
701 push { lr } ; Save the untrusted return address,
702 ; separate step because it must be on top.
703 ldr r0, [pc, #4] ; Load the destination address from
704 ; the next bundle.
705 blx r0 ; Go!
706 ; The odd trampoline that immediately follows:
707 bkpt 0x5be0 ; Prevent entry to this data bundle.
708 .word address_of_routine
709 </pre>
710 <p>The only odd thing here is that we push the incoming value of <code>lr</code>,
711 and then use <code>blx</code>&#8212;not <code>bx</code>&#8212;to escape the sandbox. This is
712 because, in practice, all trampolines jump to the same routine in the
713 trusted runtime, called the syscall hook. It uses the return address
714 produced by the final <code>blx</code> instruction to determine which trampoline
715 was called.</p>
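<p>The offsets in the trampoline listing can be checked with a little arithmetic. The first function below follows directly from the listing (four 4-byte instructions, and <code>pc</code> reading as the current instruction plus 8). The second, <code>trampoline_index</code>, is purely hypothetical: it shows one way a hook could recover the trampoline number from the return address, and is not taken from the trusted runtime.</p>

```python
TRAMPOLINE_BASE = 0x10000
BUNDLE_SIZE = 16
ENTRY_STRIDE = 2 * BUNDLE_SIZE  # even code bundle + odd data bundle


def trampoline_layout(bundle_addr):
    """Return (address read by the ldr, value blx leaves in lr)."""
    ldr_addr = bundle_addr + 8     # third instruction of the even bundle
    blx_addr = bundle_addr + 12    # fourth instruction
    literal_addr = ldr_addr + 8 + 4  # pc reads as ldr_addr + 8, then "#4"
    return_addr = blx_addr + 4     # address of the instruction after blx
    return literal_addr, return_addr


def trampoline_index(return_addr):
    """Hypothetical: map the hook's return address to an entry number."""
    return (return_addr - TRAMPOLINE_BASE) // ENTRY_STRIDE


literal_addr, return_addr = trampoline_layout(0x10020)

# The ldr reads the word right after the bkpt in the odd bundle.
assert literal_addr == 0x10020 + BUNDLE_SIZE + 4

# blx leaves the start of the odd bundle in lr.
assert return_addr == 0x10030
assert trampoline_index(return_addr) == 1
```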
716 </section></section><section id="loose-ends">
717 <h3 id="loose-ends">Loose Ends</h3>
718 <section id="forbidden-instructions">
719 <h4 id="forbidden-instructions">Forbidden Instructions</h4>
720 <p>To complete the sandbox, the validator ensures that the program does not
721 try to use certain forbidden instructions.</p>
722 <ul class="small-gap">
723 <li>We forbid instructions that directly interact with the operating
724 system by going around the trusted runtime. We prevent this to limit
725 the functionality of the untrusted program, and to ensure portability
726 across operating systems.</li>
727 <li>We forbid instructions that change the processor&#8217;s execution mode to
728 Thumb, ThumbEE, or Jazelle. This would cause the code to be
729 interpreted differently than the validator&#8217;s original 32-bit ARM
730 disassembly, so the validator results might be invalidated.</li>
731 <li>We forbid instructions that aren&#8217;t available to user code (i.e. have
732 to be used by an operating system kernel). This is purely out of
733 paranoia, because the hardware should prevent the instructions from
734 working. Essentially, we consider it &#8220;suspicious&#8221; if a program
735 contains these instructions&#8212;it might be trying to exploit a hardware
736 bug.</li>
737 <li>We forbid instructions, or variants of instructions, that are
738 implementation-defined (&#8220;unpredictable&#8221;) or deprecated in the ARMv7-A
739 architecture manual.</li>
740 <li>Finally, we forbid a small number of instructions, such as <code>setend</code>,
741 purely out of paranoia. It&#8217;s easier to loosen the validator&#8217;s
742 restrictions than to tighten them, so we err on the side of rejecting
743 safe instructions.</li>
744 </ul>
745 <p>If an instruction can&#8217;t be decoded at all within the ARMv7-A instruction
746 set specification, it is forbidden.</p>
747 <aside class="note">
748 <p>Here is a list of instructions currently forbidden for security
749 reasons (that is, excluding deprecated or undefined instructions):</p>
750 <ul class="small-gap">
751 <li><code>BLX</code> (immediate): always changes to Thumb mode.</li>
752 <li><code>BXJ</code>: always changes to Jazelle mode.</li>
753 <li><code>CPS</code>: not available to user code.</li>
754 <li><code>LDM</code>, exception return version: not available to user code.</li>
755 <li><code>LDM</code>, kernel version: not available to user code.</li>
756 <li><code>LDR*T</code> (unprivileged load operations): theoretically harmless,
757 but suspicious when found in user code. Use <code>LDR</code> instead.</li>
758 <li><code>MSR</code>, kernel version: not available to user code.</li>
759 <li><code>RFE</code>: not available to user code.</li>
760 <li><code>SETEND</code>: theoretically harmless, but suspicious when found in
761 user code. May make some future validator extensions difficult.</li>
762 <li><code>SMC</code>: not available to user code.</li>
763 <li><code>SRS</code>: not available to user code.</li>
764 <li><code>STM</code>, kernel version: not available to user code.</li>
765 <li><code>STR*T</code> (unprivileged store operations): theoretically harmless,
766 but suspicious when found in user code. Use <code>STR</code> instead.</li>
767 <li><code>SVC</code>/<code>SWI</code>: allows direct operating system interaction.</li>
768 <li>Any unassigned hint instruction: difficult to reason about, so
769 treated as suspicious.</li>
770 </ul>
771 <p>More details are available in the <a class="reference external" href="http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table">ARMv7 instruction table definition</a>.</p>
772
773 </aside>
774 </section><section id="coprocessors">
775 <h4 id="coprocessors">Coprocessors</h4>
776 <p>ARM has traditionally added new instruction set features through
777 coprocessors. Coprocessors are accessed through a small set of
778 instructions, and often have their own register files. Floating point
779 and the NEON vector extensions are both implemented as coprocessors, as
780 is the MMU.</p>
781 <p>We&#8217;re confident that the side-effects of coprocessors in slots 10 and 11
782 (that is, floating point, NEON, etc.) are well-understood. These are in
783 the coprocessor space reserved by ARM Ltd. for their own extensions
784 (<code>CP8</code>&#8211;<code>CP15</code>), and are unlikely to change significantly. So, we
785 allow untrusted code to use coprocessors 10 and 11, and we mandate the
786 presence of at least VFPv3 and NEON/AdvancedSIMD. Multiprocessor
787 Extension, VFPv4, FP16 and other extensions are allowed but not
788 required, and may fail on processors that do not support them; it is
789 therefore the program&#8217;s responsibility to validate their availability
790 before executing them.</p>
791 <p>We don&#8217;t allow access to any other ARM-reserved coprocessor
792 (<code>CP8</code>&#8211;<code>CP9</code> or <code>CP12</code>&#8211;<code>CP15</code>). It&#8217;s possible that read
793 access to <code>CP15</code> might be useful, and we might allow it in the
794 future&#8212;but again, it&#8217;s easier to loosen the restrictions than tighten
795 them, so we ban it for now.</p>
796 <p>We do not, and probably never will, allow access to the vendor-specific
797 coprocessor space, <code>CP0</code>&#8211;<code>CP7</code>. We&#8217;re simply not confident in our
798 ability to model the operations on these coprocessors, given that
799 vendors often leave them poorly-specified. Unfortunately this eliminates
800 some legacy floating point and vector implementations, but these are
801 superseded on ARMv7-A parts anyway.</p>
802 </section><section id="validator-code">
803 <h4 id="validator-code">Validator Code</h4>
804 <p>By now you&#8217;re itching to see the sandbox validator&#8217;s code and dissect
805 it. You&#8217;ll have a disappointing read: at fewer than 500 lines of code,
806 <a class="reference external" href="http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/validator.cc">validator.cc</a>
807 is quite simple to understand and much shorter than this document. It&#8217;s
808 of course dependent on the <a class="reference external" href="http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table">ARMv7 instruction table definition</a>,
809 which teaches it about the ARMv7 instruction set.</p>
810 </section></section></section></section>
811
812 {{/partials.standard_nacl_article}}