Chromium Code Reviews

Unified Diff: native_client_sdk/doc_generated/reference/sandbox_internals/arm-32-bit-sandbox.html

Issue 147803003: NaCl docs: add ARM 32-bit sandbox (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src
Patch Set: Address binji's comments. Created 6 years, 10 months ago
Index: native_client_sdk/doc_generated/reference/sandbox_internals/arm-32-bit-sandbox.html
diff --git a/native_client_sdk/doc_generated/reference/sandbox_internals/arm-32-bit-sandbox.html b/native_client_sdk/doc_generated/reference/sandbox_internals/arm-32-bit-sandbox.html
new file mode 100644
index 0000000000000000000000000000000000000000..a43d829d7e7ca1a473f043e9d471a08fc6762f2a
--- /dev/null
+++ b/native_client_sdk/doc_generated/reference/sandbox_internals/arm-32-bit-sandbox.html
@@ -0,0 +1,812 @@
+{{+bindTo:partials.standard_nacl_article}}
+
+<section id="arm-32-bit-sandbox">
+<h1 id="arm-32-bit-sandbox">ARM 32-bit Sandbox</h1>
+<p>Native Client for ARM is a method for running programs&#8212;even malicious
+ones&#8212;safely, on computers that use 32-bit ARM processors. It&#8217;s an
+extension of earlier work on Native Client for x86 processors. This
+security is provided with a low performance overhead of about 10% over
+regular ARM code, and as you&#8217;ll see in this document the sandbox model
+is beautifully simple, meaning that the trusted codebase is much easier
+to validate.</p>
+<p>As an implementation detail, the Native Client 32-bit ARM sandbox is
+currently used by Portable Native Client to execute code on 32-bit ARM
+machines in a safe manner. The portable bitcode contained in a <strong>pexe</strong>
+is translated to a 32-bit ARM <strong>nexe</strong> before execution. This may change
+in the future: Portable Native Client doesn&#8217;t necessarily need this
+sandbox to execute code on ARM. Note that the Portable Native Client
+compiler itself is also untrusted: it too runs in the ARM sandbox
+described in this document.</p>
+<p>On this page, we describe how Native Client works on 32-bit ARM. We
+assume no prior knowledge about the internals of Native Client, on x86
+or any other architecture, but we do assume some familiarity with
+assembly languages in general.</p>
+<div class="contents local" id="contents" style="display: none">
+<ul class="small-gap">
+<li><p class="first"><a class="reference internal" href="#an-introduction-to-the-arm-architecture" id="id2">An Introduction to the ARM Architecture</a></p>
+<ul class="small-gap">
+<li><a class="reference internal" href="#about-arm-and-armv7-a" id="id3">About ARM and ARMv7-A</a></li>
+<li><a class="reference internal" href="#arm-programmer-s-model" id="id4">ARM Programmer&#8217;s Model</a></li>
+</ul>
+</li>
+<li><p class="first"><a class="reference internal" href="#the-native-client-approach" id="id5">The Native Client Approach</a></p>
+<ul class="small-gap">
+<li><p class="first"><a class="reference internal" href="#nacl-arm-pure-software-fault-isolation" id="id6">NaCl/ARM: Pure Software Fault Isolation</a></p>
+<ul class="small-gap">
+<li><a class="reference internal" href="#load-and-store" id="id7"><em>Load</em> and <em>Store</em></a></li>
+<li><a class="reference internal" href="#the-stack-pointer-thread-pointer-and-program-counter" id="id8">The Stack Pointer, Thread Pointer, and Program Counter</a></li>
+<li><a class="reference internal" href="#indirect-branch" id="id9"><em>Indirect Branch</em></a></li>
+<li><a class="reference internal" href="#literal-pools-and-data-bundles" id="id10">Literal Pools and Data Bundles</a></li>
+</ul>
+</li>
+<li><p class="first"><a class="reference internal" href="#trampolines-and-memory-layout" id="id11">Trampolines and Memory Layout</a></p>
+<ul class="small-gap">
+<li><a class="reference internal" href="#memory-map" id="id12">Memory Map</a></li>
+<li><a class="reference internal" href="#inside-a-trampoline" id="id13">Inside a Trampoline</a></li>
+</ul>
+</li>
+<li><p class="first"><a class="reference internal" href="#loose-ends" id="id14">Loose Ends</a></p>
+<ul class="small-gap">
+<li><a class="reference internal" href="#forbidden-instructions" id="id15">Forbidden Instructions</a></li>
+<li><a class="reference internal" href="#coprocessors" id="id16">Coprocessors</a></li>
+<li><a class="reference internal" href="#validator-code" id="id17">Validator Code</a></li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+
+</div><section id="an-introduction-to-the-arm-architecture">
+<h2 id="an-introduction-to-the-arm-architecture">An Introduction to the ARM Architecture</h2>
+<p>In this section, we summarize the relevant parts of the ARM processor
+architecture.</p>
+<section id="about-arm-and-armv7-a">
+<h3 id="about-arm-and-armv7-a">About ARM and ARMv7-A</h3>
+<p>ARM is one of the older commercial &#8220;RISC&#8221; processor designs, dating back
+to the early 1980s. Today, it is used primarily in embedded systems:
+everything from toys, to home automation, to automobiles. However, its
+most visible use is in cellular phones, tablets and some
+laptops.</p>
+<p>Through the years, there have been many revisions of the ARM
+architecture, written as ARMv<em>X</em> for some version <em>X</em>. Native Client
+specifically targets the ARMv7-A architecture commonly used in high-end
+phones and smartbooks. This revision, defined in the mid-2000s, adds a
+number of useful instructions, and specifies some portions of the system
+that used to be left to individual chip manufacturers. Critically,
+ARMv7-A specifies the &#8220;eXecute Never&#8221; bit, or <em>XN</em>. This pagetable
+attribute lets us mark memory as non-executable. Our security relies on
+the presence of this feature.</p>
+<p>ARMv8 adds a new 64-bit instruction set architecture called A64, while
+also enhancing the 32-bit A32 ISA. For Native Client&#8217;s purposes the A32
+ISA is equivalent to the ARMv7 ARM ISA, albeit with a few new
+instructions. This document only discusses the 32-bit A32 instruction
+set: A64 would require a different sandboxing model.</p>
+</section><section id="arm-programmer-s-model">
+<h3 id="arm-programmer-s-model">ARM Programmer&#8217;s Model</h3>
+<p>While modern ARM chips support several instruction encodings, 32-bit
+Native Client on ARM focuses on a single one: a fixed-width encoding
+called A32 (previously, and confusingly, called simply ARM) in which
+every instruction is 32 bits wide. Thumb, Thumb2 (now confusingly called
+T32), Jazelle, ThumbEE and such aren&#8217;t supported by Native Client. This
+dramatically simplifies some of our analyses, as we&#8217;ll see later. Nearly
+every instruction can be conditionally executed based on the contents of
+a dedicated condition code register.</p>
+<p>ARM processors have 16 general-purpose registers used for integer and
+memory operations, written <code>r0</code> through <code>r15</code>. Of these, two have
+special roles baked in to the hardware:</p>
+<ul class="small-gap">
+<li><code>r14</code> is the Link Register. The ARM <em>call</em> instruction
+(<em>branch-with-link</em>) doesn&#8217;t use the stack directly. Instead, it
+stashes the return address in <code>r14</code>. In other circumstances, <code>r14</code>
+can be (and is!) used as a general-purpose register. When <code>r14</code> is
+playing its Link Register role, it&#8217;s referred to as <code>lr</code>.</li>
+<li><code>r15</code> is the Program Counter. While it can be read and written like
+any other register, setting it to a new value will cause execution to
+jump to a new address. Using it in some circumstances is also
+undefined by the ARM architecture. Because of this, <code>r15</code> is never
+used for anything else, and is referred to as <code>pc</code>.</li>
+</ul>
+<p>Other registers are given roles by convention. The only important
+registers to Native Client are <code>r9</code> and <code>r13</code>, which are used as the
+Thread Pointer location and Stack Pointer, respectively. When playing
+these roles, they&#8217;re referred to as <code>tp</code> and <code>sp</code>.</p>
+<p>Like other RISC-inspired designs, ARM programs use explicit <em>load</em> and
+<em>store</em> instructions to access memory. All other instructions operate
+only on registers, or on registers and small constants called
+immediates. Because both instructions and data words are 32 bits, we
+can&#8217;t simply embed a 32-bit number into an instruction. ARM programs use
+three methods to work around this, all of which Native Client exploits:</p>
+<ol class="arabic simple">
+<li>Many instructions can encode a modified immediate, which is an 8-bit
+number rotated right by an even number of bits.</li>
+<li>The <code>movw</code> and <code>movt</code> instructions set the bottom and top
+16 bits of a register, respectively, and together can therefore encode
+any 32-bit immediate.</li>
+<li>For values that can&#8217;t be represented as modified immediates, ARM
+programs use <code>pc</code>-relative loads to load data from inside the
+code&#8212;hidden in a place where it won&#8217;t be executed such as &#8220;constant
+pools&#8221;, just past the final return of a function.</li>
+</ol>
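+<p>The first two methods can be sketched in C. This is only an
+illustration, not part of Native Client: the helper names
+<code>rotl32</code>, <code>is_modified_immediate</code>, <code>movw_half</code> and
+<code>movt_half</code> are hypothetical. The check tries every even rotation
+and asks whether the value collapses to 8 bits.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rotl32(uint32_t v, unsigned n) {
    return (v << n) | (v >> ((32u - n) & 31u));
}

/* True if v can be written as an 8-bit value rotated right by an even
   amount, i.e. if it fits an A32 modified immediate. */
static bool is_modified_immediate(uint32_t v) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by `rot` and see whether 8 bits remain. */
        if (rotl32(v, rot) <= 0xFFu) {
            return true;
        }
    }
    return false;
}

/* Halves of a 32-bit constant as movw (low) and movt (high) carry them. */
static uint16_t movw_half(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }
static uint16_t movt_half(uint32_t v) { return (uint16_t)(v >> 16); }
```

+<p>Under this sketch, <code>0xC0000000</code> qualifies as a modified immediate,
+while a constant such as <code>0x12345678</code> does not and would need
+<code>movw</code>/<code>movt</code> or a <code>pc</code>-relative load.</p>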
+<p>We&#8217;ll introduce more details of the ARM instruction set later, as we
+walk through the system.</p>
+</section></section><section id="the-native-client-approach">
+<h2 id="the-native-client-approach">The Native Client Approach</h2>
+<p>Native Client runs an untrusted program, potentially from an unknown or
+malicious source, inside a sandbox created by a trusted runtime. The
+trusted runtime allows the untrusted program to &#8220;call-out&#8221; and perform
+certain actions, such as drawing graphics, but prevents it from
+accessing the operating system directly. This &#8220;call-out&#8221; facility,
+called a trampoline, looks like a standard function call to the
+untrusted program, but it allows control to escape from the sandbox in a
+controlled way.</p>
+<p>The untrusted program and trusted runtime inhabit the same process, or
+virtual address space, maintained by the operating system. To keep the
+trusted runtime behaving the way we expect, we must prevent the
+untrusted program from accessing and modifying its internals. Since they
+share a virtual address space, we can&#8217;t rely on the operating system for
+this. Instead, we isolate the untrusted program from the trusted
+runtime.</p>
+<p>Unlike modern operating systems, we use a cooperative isolation
+method. Native Client can&#8217;t run any off-the-shelf program compiled for
+an off-the-shelf operating system. The program must be compiled to
+comply with Native Client&#8217;s rules. The details vary on each platform,
+but in general, the untrusted program:</p>
+<ul class="small-gap">
+<li>Must not attempt to use certain forbidden instructions, such as system
+calls.</li>
+<li>Must not attempt to modify its own code without abiding by Native
+Client&#8217;s code modification rules.</li>
+<li>Must not jump into the middle of an instruction group, or otherwise do
+tricky things to cause instructions to be interpreted multiple ways.</li>
+<li>Must use special, strictly-defined instruction sequences to perform
+permitted but potentially dangerous actions. We call these sequences
+pseudo-instructions.</li>
+</ul>
+<p>We can&#8217;t simply take the program&#8217;s word that it complies with these
+rules&#8212;we call it &#8220;untrusted&#8221; for a reason! Nor do we require it to be
+produced by a special compiler; in practice, we don&#8217;t trust our
+compilers either. Instead, we apply a load-time validator that
+disassembles the program. The validator either proves that the program
+complies with our rules, or rejects it as unsafe. By keeping the rules
+simple, we keep the validator simple, small, and fast. We like to put
+our trust in small, simple things, and the validator is key to the
+system&#8217;s security.</p>
+<aside class="note">
+For the computationally-inclined, all our validators scale linearly in
+the size of the program.
+</aside>
+<section id="nacl-arm-pure-software-fault-isolation">
+<h3 id="nacl-arm-pure-software-fault-isolation">NaCl/ARM: Pure Software Fault Isolation</h3>
+<p>In the original Native Client system for the x86, we used unusual
+hardware features of that processor (the segment registers) to isolate
+untrusted programs. This was simple and fast, but won&#8217;t work on ARM,
+which has nothing equivalent. Instead, we use pure software fault
+isolation.</p>
+<p>We use a fixed address space layout: the untrusted program gets the
+lowest gigabyte, addresses <code>0</code> through <code>0x3FFFFFFF</code>. The rest of the
+address space holds the trusted runtime and the operating system. We
+isolate the program by requiring every <em>load</em>, <em>store</em>, and <em>indirect
+branch</em> (to an address in a register) to use a pseudo-instruction. The
+pseudo-instructions ensure that the address stays within the
+sandbox. The <em>indirect branch</em> pseudo-instruction, in turn, ensures that
+such branches won&#8217;t split up other pseudo-instructions.</p>
+<p>At either side of the sandbox, we place small (8KiB) guard
+regions. These are simply areas in the process&#8217;s address space that are
+mapped without read, write, or execute permissions, so any attempt to
+access them for any reason&#8212;<em>load</em>, <em>store</em>, or <em>jump</em>&#8212;will cause a
+fault.</p>
+<p>Finally, we ban the use of certain instructions, notably direct system
+calls. This is to ensure that the untrusted program can be run on any
+operating system supported by Native Client, and to prevent access to
+certain system features that might be used to subvert the sandbox. As a
+side effect, it helps to prevent programs from exploiting buggy
+operating system APIs.</p>
+<p>Let&#8217;s walk through the details, starting with the simplest part: <em>load</em>
+and <em>store</em>.</p>
+<section id="load-and-store">
+<h4 id="load-and-store"><em>Load</em> and <em>Store</em></h4>
+<p>All access to memory must be through <em>load</em> and <em>store</em>
+pseudo-instructions. These are simply a native <em>load</em> or <em>store</em>
+instruction, preceded by a guard instruction.</p>
+<p>Each <em>load</em> or <em>store</em> pseudo-instruction is similar to the <em>load</em> shown
+below. We use abstract &#8220;placeholder&#8221; registers instead of specific
+numbered registers for the sake of discussion. <code>rA</code> is the register
+holding the address to load from. <code>rD</code> is the destination for the
+loaded data.</p>
+<pre>
+bic rA, #0xC0000000
+ldr rD, [rA]
+</pre>
+<p>The first instruction, <code>bic</code>, clears the top two bits of <code>rA</code>. In
+this case, that means that the value in <code>rA</code> is forced to an address
+inside our sandbox, between <code>0</code> and <code>0x3FFFFFFF</code>, inclusive.</p>
+<p>The second instruction, <code>ldr</code>, uses the previously-sandboxed address
+to load a value. Because <code>bic</code> forces the address to be valid by
+clearing the top two bits, this address might not be the address that
+the program intended, and the access might hit an unmapped memory
+location within the sandbox. In a correct program the <code>bic</code> is a
+no-op.</p>
+<p>This illustrates a common property of all Native Client systems: we aim
+for safety, not correctness. A program using an invalid address in
+<code>rA</code> here is simply broken, so we are free to do whatever we want to
+preserve safety. In this case the program might load an invalid (but
+safe) value, or cause a segmentation fault limited to the untrusted
+code.</p>
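+<p>The effect of the guard instruction can be modeled in a few lines of
+C. This is a sketch for illustration only; the names
+<code>SANDBOX_HIGH_BITS</code> and <code>sandbox_address</code> are hypothetical and do
+not come from the Native Client sources.</p>

```c
#include <stdint.h>

/* Bits that must be zero for an address to lie in the lowest gigabyte. */
#define SANDBOX_HIGH_BITS 0xC0000000u

/* Models `bic rA, #0xC0000000`: clear the top two bits of the address. */
static uint32_t sandbox_address(uint32_t addr) {
    return addr & ~SANDBOX_HIGH_BITS;
}
```

+<p>An address that is already inside the sandbox passes through
+unchanged, while any other address is folded back into the range
+<code>0</code> through <code>0x3FFFFFFF</code>: safe, though probably not what the
+program wanted.</p>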
+<p>Now, if we allowed arbitrary branches within the program, a malicious
+program could set up carefully-crafted values in <code>rA</code>, and then jump
+straight to the <code>ldr</code>. This is why we validate that programs never
+split pseudo-instructions.</p>
+<section id="alternative-sandboxing">
+<h5 id="alternative-sandboxing">Alternative Sandboxing</h5>
+<pre>
+tst rA, #0xC0000000
+ldreq rD, [rA]
+</pre>
+<p>The first instruction, <code>tst</code>, performs a bitwise-<code>AND</code> of <code>rA</code>
+and the modified immediate literal, <code>0xC0000000</code>. It sets the
+condition flags based on the result, but does not write the result to a
+register. In particular, it sets the <code>Z</code> condition flag if the result
+was zero&#8212;if the two values had no set bits in common. In this case,
+that means that the value in <code>rA</code> was an address inside our sandbox,
+between <code>0</code> and <code>0x3FFFFFFF</code>, inclusive.</p>
+<p>The second instruction, <code>ldreq</code>, is a conditional load if equal. As we
+mentioned before, nearly all ARM instructions can be made
+conditional. In assembly language, we simply stick the desired condition
+on the end of the instruction&#8217;s mnemonic name. Here, the condition is
+<code>EQ</code>, which causes the instruction to execute only if the <code>Z</code> flag
+is set.</p>
+<p>Thus, when the pseudo-instruction executes, the <code>tst</code> sets <code>Z</code> if
+(and only if) the value in <code>rA</code> is an address within the bounds of the
+sandbox, and then the <code>ldreq</code> loads if (and only if) it was. If <code>rA</code>
+held an invalid address, the <em>load</em> does not execute, and <code>rD</code> is
+unchanged.</p>
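+<p>As a rough C model of this behavior (an illustration under stated
+assumptions, not validator code): <code>tst_sets_z</code> stands for the flag
+computation, <code>guarded_load</code> for the conditional load, and
+<code>fake_memory</code> is a hypothetical stand-in for the actual memory
+access.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `tst rA, #0xC0000000`: Z is set when neither high bit is set. */
static bool tst_sets_z(uint32_t addr) {
    return (addr & 0xC0000000u) == 0;
}

/* Hypothetical memory: returns a value derived from the address. */
static uint32_t fake_memory(uint32_t addr) {
    return addr ^ 0x5A5A5A5Au;
}

/* Models `ldreq rD, [rA]` following the tst. Returns true if the load
   executed; otherwise *rd is left unchanged. */
static bool guarded_load(uint32_t ra, uint32_t *rd,
                         uint32_t (*load_word)(uint32_t)) {
    if (!tst_sets_z(ra)) {
        return false;  /* Condition EQ fails: the load is skipped. */
    }
    *rd = load_word(ra);
    return true;
}
```

+<p>Unlike the <code>bic</code>-based sequence, an out-of-range address is not
+rewritten here; the access simply does not happen.</p>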
+<aside class="note">
+The <code>tst</code>-based sequence is faster than the <code>bic</code>-based sequence
+on modern ARM chips. It avoids a data dependency in the address
+register. This is why we keep both around. The <code>tst</code>-based sequence
+unfortunately leaks information on some processors, and is therefore
+forbidden on those processors. This effectively means that it cannot
+be used for regular Native Client <strong>nexe</strong> files, but can be used with
+Portable Native Client because the target processor is known at
+translation time from <strong>pexe</strong> to <strong>nexe</strong>.
+</aside>
+</section><section id="addressing-modes">
+<h5 id="addressing-modes">Addressing Modes</h5>
+<p>ARM has an unusually rich set of addressing modes. We allow all but one:
+register-indexed, where two registers are added to determine the
+address.</p>
+<p>We permit simple <em>load</em> and <em>store</em>, as shown above. We also permit
+displacement, pre-index, and post-index memory operations:</p>
+<pre>
+bic rA, #0xC0000000
+ldr rD, [rA, #1234] ; This is fine.
+bic rA, #0xC0000000
+ldr rD, [rA, #1234]! ; Also fine.
+bic rA, #0xC0000000
+ldr rD, [rA], #1234 ; Looking good.
+</pre>
+<p>In each case, we know <code>rA</code> points into the sandbox when the <code>ldr</code>
+executes. We allow adding an immediate displacement to <code>rA</code> to
+determine the final address (as in the first two examples here) because
+the largest immediate displacement is ±4095 bytes, while our guard pages
+are 8192 bytes wide.</p>
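+<p>The arithmetic behind this argument can be checked with a small C
+sketch. It is illustrative only: the names are hypothetical, and the
+guard region below address <code>0</code> is modeled as negative 64-bit
+addresses, which is an assumption made to keep the arithmetic simple.</p>

```c
#include <stdbool.h>
#include <stdint.h>

#define SANDBOX_SIZE   0x40000000LL  /* 1 GiB: addresses 0..0x3FFFFFFF. */
#define GUARD_SIZE     8192LL        /* Guard region on each side. */
#define MAX_IMMEDIATE  4095          /* Largest load/store displacement. */

/* Effective address of `ldr rD, [rA, #disp]` after `bic rA, #0xC0000000`,
   computed in 64 bits so that going below zero shows up as negative. */
static int64_t effective_address(uint32_t ra, int32_t disp) {
    int64_t masked = (int64_t)(ra & ~0xC0000000u);
    return masked + disp;
}

/* True if the address is inside the sandbox or one of its guard regions. */
static bool in_sandbox_or_guard(int64_t addr) {
    return addr >= -GUARD_SIZE && addr < SANDBOX_SIZE + GUARD_SIZE;
}
```

+<p>With the largest displacement in either direction, the worst cases
+(<code>0x3FFFFFFF + 4095</code> and <code>0 - 4095</code>) still fall inside a guard
+region, so the access faults instead of reaching trusted memory.</p>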
+<p>We also allow ARM&#8217;s more unusual <em>load</em> and <em>store</em> instructions, such
+as <em>load-multiple</em> and <em>store-multiple</em>, etc.</p>
+</section><section id="conditional-load-and-store">
+<h5 id="conditional-load-and-store">Conditional <em>Load</em> and <em>Store</em></h5>
+<p>There&#8217;s one problem with the pseudo-instructions shown above: they are
+unconditional (assuming <code>rA</code> is valid). ARM compilers regularly use
+conditional <em>load</em> and <em>store</em>, so we should support this in Native
+Client. We do so by defining alternate, predicated
+pseudo-instructions. Here is a conditional <em>store</em>
+(<em>store-if-greater-than</em>) using this pseudo-instruction sequence:</p>
+<pre>
+bicgt rA, #0xC0000000
+strgt rX, [rA, #123]
+</pre>
+</section></section><section id="the-stack-pointer-thread-pointer-and-program-counter">
+<h4 id="the-stack-pointer-thread-pointer-and-program-counter">The Stack Pointer, Thread Pointer, and Program Counter</h4>
+<section id="stack-pointer">
+<h5 id="stack-pointer">Stack Pointer</h5>
+<p>In C-like languages, the stack is used to store return addresses during
+function calls, as well as any local variables that won&#8217;t fit in
+registers. This makes stack operations very common.</p>
+<p>Native Client does not require guard instructions on any <em>load</em> or
+<em>store</em> involving the stack pointer, <code>sp</code>. This improves performance
+and reduces code size. However, ARM&#8217;s stack pointer isn&#8217;t special: it&#8217;s
+just another register, called <code>sp</code> only by convention. To make it safe
+to use this register as a <em>load</em> or <em>store</em> address without guards, we
+add a rule: <code>sp</code> must always contain a valid address.</p>
+<p>We enforce this rule by restricting the sorts of operations that
+programs can use to alter <code>sp</code>. Programs can alter <code>sp</code> by adding or
+subtracting an immediate, as a side-effect of a <em>load</em> or <em>store</em>:</p>
+<pre>
+ldr rX, [sp], #4 ; Load from stack, then add 4 to sp.
+str rX, [sp, #1234]! ; Add 1234 to sp, then store to stack.
+</pre>
+<p>These are safe because, as we mentioned before, the largest immediate
+available in a <em>load</em> or <em>store</em> is ±4095. Even after adding or
+subtracting 4095, the stack pointer will still be within the sandbox or
+guard regions.</p>
+<p>Any other operation that alters <code>sp</code> must be followed by a guard
+instruction. The most common alterations, in practice, are addition and
+subtraction of arbitrary integers:</p>
+<pre>
+add sp, rX
+bic sp, #0xC0000000
+</pre>
+<p>The <code>bic</code> is similar to the one we used for conditional <em>load</em> and
+<em>store</em>, and serves exactly the same purpose: after it completes, <code>sp</code>
+is a valid address.</p>
+<aside class="note">
+Clever assembly programmers and compilers may want to use this
+&#8220;trusted&#8221; property of <code>sp</code> to emit more efficient code: in a hot
+loop instead of using <code>sp</code> as a stack pointer it can be temporarily
+used as an index pointer (e.g. to traverse an array). This avoids the
+extra <code>bic</code> whenever the pointer is updated in the loop.
+</aside>
+</section><section id="thread-pointer-loads">
+<h5 id="thread-pointer-loads">Thread Pointer Loads</h5>
+<p>The thread pointer and IRT thread pointer are stored in the trusted
+address space. All uses and definitions of <code>r9</code> from untrusted code
+are forbidden except as follows:</p>
+<pre>
+ldr Rn, [r9] ; Load user thread pointer.
+ldr Rn, [r9, #4] ; Load IRT thread pointer.
+</pre>
+</section><section id="pc-relative-loads">
+<h5 id="pc-relative-loads"><code>pc</code>-relative Loads</h5>
+<p>By extension, we also allow <em>load</em> through the <code>pc</code> without a
+mask. The explanation is quite similar:</p>
+<ul class="small-gap">
+<li>Our control-flow isolation rules mean that the <code>pc</code> will always
+point into the sandbox.</li>
+<li>The maximum immediate displacement that can be used in a
+<code>pc</code>-relative <em>load</em> is smaller than the width of the guard pages.</li>
+</ul>
+<p>We do not allow <code>pc</code>-relative stores, because they look suspiciously
+like self-modifying code. Nor do we allow any addressing mode that would
+alter the <code>pc</code> as a side effect of the <em>load</em>.</p>
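+<p>To make the address computation concrete, here is a small C sketch.
+It relies on the A32 rule that reading <code>pc</code> yields the address of
+the current instruction plus 8; the names <code>A32_PC_READ_OFFSET</code> and
+<code>pc_relative_target</code> are hypothetical.</p>

```c
#include <stdint.h>

/* In A32, reading pc yields the address of the current instruction plus 8. */
#define A32_PC_READ_OFFSET 8u

/* Address accessed by `ldr rD, [pc, #offset]` located at instr_addr. */
static uint32_t pc_relative_target(uint32_t instr_addr, int32_t offset) {
    return instr_addr + A32_PC_READ_OFFSET + (uint32_t)offset;
}
```

+<p>Since the instruction itself lies inside the sandbox and the offset
+is at most 4095 bytes in either direction, the target can stray no
+further than the guard regions.</p>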
+</section></section><section id="indirect-branch">
+<h4 id="indirect-branch"><em>Indirect Branch</em></h4>
+<p>There are two types of control flow on ARM: direct and indirect. Direct
+control flow instructions have an embedded target address or
+offset. Indirect control flow instructions take their destination
+address from a register. The <code>b</code> (branch) and <code>bl</code>
+(<em>branch-with-link</em>) instructions are <em>direct branch</em> and <em>call</em>,
+respectively. The <code>bx</code> (<em>branch-exchange</em>) and <code>blx</code>
+(<em>branch-with-link-exchange</em>) are the indirect equivalents.</p>
+<p>Because the program counter <code>pc</code> is simply another register, ARM also
+has many implicit indirect control flow instructions. Programs can
+operate on the <code>pc</code> using <em>add</em> or <em>load</em>, or even outlandish (and
+often specified as having unpredictable-behavior) things like multiply!
+In Native Client we ban all such instructions. Indirect control flow is
+exclusively through <code>bx</code> and <code>blx</code>. Because all of ARM&#8217;s control
+flow instructions are called <em>branch</em> instructions, we&#8217;ll use the term
+<em>indirect branch</em> from here on, even though this includes things like
+<em>virtual call</em>, <em>return</em>, and the like.</p>
+<section id="the-trouble-with-indirection">
+<h5 id="the-trouble-with-indirection">The Trouble with Indirection</h5>
+<p><em>Indirect branch</em> presents two problems for Native Client:</p>
+<ul class="small-gap">
+<li>We must ensure that they don&#8217;t send execution outside the sandbox.</li>
+<li>We must ensure that they don&#8217;t break up the instructions inside a
+pseudo-instruction, by landing on the second one.</li>
+</ul>
+<aside class="note">
+On the x86 architectures we must also ensure that they don&#8217;t land
+inside an instruction. This is unnecessary on ARM, where all
+instructions are 32 bits wide.
+</aside>
+<p>Checking both of these for <em>direct branch</em> is easy: the validator just
+pulls the (fixed) target address out of the instruction and checks what
+it points to.</p>
+</section><section id="the-native-client-solution-bundles">
+<h5 id="the-native-client-solution-bundles">The Native Client Solution: &#8220;Bundles&#8221;</h5>
+<p>For <em>indirect branch</em>, we can address the first problem by simply
+masking some high-order bits off the address, like we did for <em>load</em> and
+<em>store</em>. The second problem is more subtle. Detecting every possible
+route that every <em>indirect branch</em> might take is difficult. Instead, we
+take the approach pioneered by the original Native Client: we restrict
+the possible places that any <em>indirect branch</em> can land. On Native
+Client for ARM, <em>indirect branch</em> can target any address that has its
+bottom four bits clear&#8212;any address that&#8217;s <code>0 mod 16</code>. We call these
+16-byte chunks of code &#8220;bundles&#8221;. The validator makes sure that no
+pseudo-instruction straddles a bundle boundary. Compilers must pad with
+<code>nop</code>s to ensure that every pseudo-instruction fits entirely inside
+one bundle.</p>
+<p>Here is the <em>indirect branch</em> pseudo-instruction. As you can see, it
+clears the top two and bottom four bits of the address:</p>
+<pre>
+bic rA, #0xC000000F
+bx rA
+</pre>
+<p>This particular pseudo-instruction (a <code>bic</code> followed by a <code>bx</code>) is
+used for computed jumps in switch tables and returning from functions,
+among other uses. Recall that, under ARM&#8217;s modified immediate rules, we
+can fit the constant <code>0xC000000F</code> into the <code>bic</code> instruction&#8217;s
+immediate field: <code>0xC000000F</code> is the 8-bit constant <code>0xFC</code>, rotated
+right by 4 bits.</p>
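+<p>The branch mask and the bundle rule can be modeled in C as follows.
+This is a sketch for illustration, not the validator&#8217;s implementation;
+<code>mask_branch_target</code> and <code>fits_in_one_bundle</code> are hypothetical
+names.</p>

```c
#include <stdbool.h>
#include <stdint.h>

#define BUNDLE_SIZE       16u  /* Bytes per bundle. */
#define INSTRUCTION_SIZE  4u   /* Every A32 instruction is 4 bytes. */

/* Models `bic rA, #0xC000000F`: keep the target in the sandbox and
   force it onto a bundle boundary. */
static uint32_t mask_branch_target(uint32_t addr) {
    return addr & ~0xC000000Fu;
}

/* True if a pseudo-instruction of `count` instructions starting at
   `start` lies entirely within one bundle. */
static bool fits_in_one_bundle(uint32_t start, uint32_t count) {
    uint32_t last = start + (count - 1u) * INSTRUCTION_SIZE;
    return (start / BUNDLE_SIZE) == (last / BUNDLE_SIZE);
}
```

+<p>A two-instruction pseudo-instruction starting at offset 8 of a bundle
+fits; one starting at offset 12 would straddle the boundary, so a
+compiler would have to insert a <code>nop</code> before it.</p>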
+<p>The other useful variant is the <em>indirect branch-with-link</em>, which is
+the ARM equivalent to <em>call</em>:</p>
+<pre>
+bic rA, #0xC000000F
+blx rA
+</pre>
+<p>This is used for indirect function calls&#8212;commonly seen in C++ programs
+as virtual calls, but also for calling function pointers in C.</p>
+<p>Note that both <em>indirect branch</em> pseudo-instructions use <code>bic</code>, rather
+than the <code>tst</code> instruction we allow for <em>load</em> and <em>store</em>. There are
+two reasons for this:</p>
+<ol class="arabic simple">
+<li>Conditional <em>branch</em> is very common, much more common than
+conditional <em>load</em> and <em>store</em>. If we supported an alternative
+<code>tst</code>-based sequence for <em>branch</em>, it would rarely be used.</li>
+<li>There&#8217;s no performance benefit to using <code>tst</code> here on modern ARM
+chips. <em>Branch</em> consumes its operands later in the pipeline than
+<em>load</em> and <em>store</em> (since it doesn&#8217;t have to generate an address,
+etc.) so this sequence doesn&#8217;t stall.</li>
+</ol>
+<aside class="note">
+<p>At this point astute readers are wondering what the <code>x</code> in <code>bx</code>
+and <code>blx</code> means. We told you it stood for &#8220;exchange&#8221;, but exchange
+to what? ARM, for all the reduced-ness of its instruction set, can
+change execution mode from A32 (ARM) to T32 (Thumb) and back with
+these <em>branch</em> instructions, called <em>interworking branch</em>. Recall that
+A32 instructions are 32 bits wide, and T32 instructions are a mix of
+16-bit and 32-bit wide instructions. The destination address given to a
+<em>branch</em> therefore cannot sensibly have its bottom bit set in either
+instruction set: that would be an unaligned instruction in both cases,
+and ARM simply doesn&#8217;t support this. The bottom bit for the <em>indirect
+branch</em> was therefore cleverly recycled by the ARM architecture to
+mean &#8220;switch to T32 mode&#8221; when set!</p>
+<p>As you&#8217;ve figured out by now, Native Client&#8217;s sandbox wouldn&#8217;t be very
+happy if A32 instructions were to be executed as T32 instructions: who
+knows what they correspond to? A malicious person could craft valid
+A32 code that&#8217;s actually very naughty T32 code, somewhat like forming
+a sentence that happens to be valid in English and French but with
+completely different meanings, complimenting the reader in one
+language and insulting them in the other.</p>
+<p>You&#8217;ve figured out by now that the bundle alignment restrictions of
+the Native Client sandbox already take care of making this travesty
+impossible: by masking off the bottom 4 bits of the destination the
+interworking nature of ARM&#8217;s <em>indirect branch</em> is completely avoided.</p>
+
+</aside>
+</section><section id="call-and-return">
+<h5 id="call-and-return"><em>Call</em> and <em>Return</em></h5>
+<p>On ARM, there is no <em>call</em> or <em>return</em> instruction. A <em>call</em> is simply a
+<em>branch</em> that just happens to load a return address into <code>lr</code>, the link
+register. If the called function is a leaf (that is, if it calls no
+other functions before returning), it simply branches to the address
+stored in <code>lr</code> to <em>return</em> to its caller:</p>
+<pre>
+bic lr, #0xC000000F
+bx lr
+</pre>
+<p>If the function called other functions, however, it had to spill <code>lr</code>
+onto the stack. On x86, this is done implicitly, but it is explicit on
+ARM:</p>
+<pre>
+push { lr }
+; Some code here...
+pop { lr }
+bic lr, #0xC000000F
+bx lr
+</pre>
+<p>There are two things to note about this code.</p>
+<ol class="arabic simple">
+<li>As we mentioned before, we don&#8217;t allow arbitrary instructions to
+write to the Program Counter, <code>pc</code>. Thus, while a traditional ARM
+program might have popped directly into <code>pc</code> to end the function,
+we require a pop into a register, followed by a pseudo-instruction.</li>
+<li>Function returns really are just <em>indirect branch</em>, with the same
+restrictions. This means that functions can only return to addresses
+that are bundle-aligned: <code>0 mod 16</code>.</li>
+</ol>
+<p>The implication here is that a <em>call</em> (the <em>branch</em> that enters a
+function) must be placed at the end of a bundle, so that the return
+address it generates is <code>0 mod 16</code>. Otherwise, when we clear the
+bottom four bits, the return would land back at the start of the
+bundle and the program would enter an infinite loop! (Native
+Client doesn&#8217;t try to prevent infinite loops, but the validator actually
+does check the alignment of calls. This is because, when we were writing
+the compiler, it was annoying to find out our calls were in the wrong
+place by having the program run forever!)</p>
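+<p>The alignment rule can be sketched in a few lines of Python. This is only
+an illustration of the arithmetic, not the validator&#8217;s code; the function
+names are hypothetical, and the mask is the one used in the <em>return</em>
+sequences shown earlier.</p>

```python
BUNDLE_SIZE = 16          # bytes per bundle
INSTRUCTION_SIZE = 4      # every A32 instruction is 4 bytes long
RETURN_MASK = 0xC000000F  # bits cleared by "bic lr, #0xC000000F"


def return_address(call_address):
    """Address that a branch-with-link at call_address stores in lr."""
    return call_address + INSTRUCTION_SIZE


def masked_return_target(call_address):
    """Where the masked return really lands (hypothetical helper)."""
    return return_address(call_address) & ~RETURN_MASK & 0xFFFFFFFF


def call_is_well_placed(call_address):
    """True when the call sits in the last slot of its bundle."""
    return return_address(call_address) % BUNDLE_SIZE == 0
```

+<p>A call in the last slot of a bundle returns to the next bundle. A call
+placed anywhere else returns to the start of its own bundle, which is how
+the infinite loop arises.</p>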
+<aside class="note">
+Properly balancing the CPU&#8217;s <em>call</em>/<em>return</em> actually allows it to
+perform much better by allowing it to speculatively execute the return
+address&#8217; code. For more information on ARM&#8217;s <em>call</em>/<em>return</em> stack see
+ARM&#8217;s technical reference manual.
+</aside>
+</section></section><section id="literal-pools-and-data-bundles">
+<h4 id="literal-pools-and-data-bundles">Literal Pools and Data Bundles</h4>
+<p>In the section where we described the ARM architecture, we mentioned
+ARM&#8217;s unusual immediate forms. To restate:</p>
+<ul class="small-gap">
+<li>ARM instructions have a fixed length of 32 bits, so we can&#8217;t have an
+instruction that includes an arbitrary 32-bit constant.</li>
+<li>Many ARM instructions can include a modified immediate constant, which
+is flexible, but limited.</li>
+<li>For any other value (particularly addresses), ARM programs explicitly
+load constants from inside the code itself.</li>
+</ul>
+<aside class="note">
+ARMv7 introduces some instructions, <code>movw</code> and <code>movt</code>, that try to
+address this by letting us directly load larger constants. Our
+toolchain uses this capability in some cases.
+</aside>
+<p>Here&#8217;s a typical example of the use of a literal pool. ARM assemblers
+typically hide the details&#8212;this is the sort of code you&#8217;d see produced
+by a disassembler, but with more comments.</p>
+<pre>
+; C equivalent: &quot;table[3] = 4&quot;
+; 'table' is a static array of bytes.
+ldr r0, [pc, #124] ; Load the address of the 'table',
+ ; &quot;124&quot; is the offset from here
+ ; to the constant below.
+add r0, #3 ; Add the immediate array index.
+mov r1, #4 ; Get the constant '4' into a register.
+bic r0, #0xC0000000 ; Mask our array address.
+strb r1, [r0] ; Store one byte.
+; ...
+.word table ; Constant referenced above.
+</pre>
+<p>Because <code>table</code> is a static array, the compiler knew its address at
+compile-time&#8212;but the address didn&#8217;t fit in a modified immediate. (Most
+don&#8217;t.) So, instead of loading an immediate into <code>r0</code> with a <code>mov</code>,
+we stashed the address in the code, generated its address using <code>pc</code>,
+and loaded the constant. ARM compilers will typically group all the
+embedded data together into a literal pool. These typically live just
+past the end of functions, where they won&#8217;t be executed.</p>
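+<p>As a rough illustration of the <code>pc</code>-relative load above: in A32
+state, reading <code>pc</code> yields the address of the current instruction
+plus 8, so the literal lives at the instruction address plus 8 plus the
+offset. The helper name and the example address are assumptions made for
+this sketch.</p>

```python
PC_READ_AHEAD = 8  # in A32 state, pc reads as the instruction address + 8


def literal_address(ldr_address, offset):
    """Address loaded by "ldr rX, [pc, #offset]" located at ldr_address."""
    return ldr_address + PC_READ_AHEAD + offset


# Assumed placement: the ldr from the example sits at 0x20000.
print(hex(literal_address(0x20000, 124)))
```

+<p>With the <code>ldr</code> assumed to be at <code>0x20000</code>, the constant would
+be read from <code>0x20084</code>.</p>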
+<p>This is an important trick in ARM code, so Native Client needs to support
+it... but there&#8217;s a potential flaw. If we let programs
+contain arbitrary data, mingled in with the code, couldn&#8217;t they hide
+malicious instructions this way?</p>
+<p>The answer is no, because the validator disassembles the entire
+executable region of the program, without regard to whether the
+programmer said a certain chunk was code or data. But this brings the
+opposite problem: what if the program needs to contain a certain
+constant that just happens to encode a malicious instruction? We want
+to allow this, but we have to be certain it will never be executed as
+code!</p>
+<section id="data-bundles-to-the-rescue">
+<h5 id="data-bundles-to-the-rescue">Data Bundles to the Rescue</h5>
+<p>As we discussed in the last section, ARM code in Native Client is
+structured in 16-byte bundles. We allow literal pools by putting them in
+special bundles, called data bundles. Each data bundle can contain 12
+bytes of arbitrary data, and the program can have as many data bundles
+as it likes.</p>
+<p>Each data bundle starts with a breakpoint instruction, <code>bkpt</code>. This
+way, if an <em>indirect branch</em> tries to enter the data bundle, the process
+will take a fault and the trusted runtime will intervene (by terminating
+the program). For example:</p>
+<pre>
+bkpt #0x5BE0 ; Must be aligned 0 mod 16!
+.word 0xDEADBEEF ; Arbitrary constants are A-OK.
+svc #30 ; Trying to make a syscall? OK!
+str r0, [r1] ; Unmasked stores are fine too.
+</pre>
+<p>So, we have a way for programs to create an arbitrary, even dangerous,
+chunk of data within their code. We can prevent <em>indirect branch</em> from
+entering it. We can also prevent fall-through from the code just before
+it, by the <code>bkpt</code>. But what about <em>direct branch</em> straight into the
+middle?</p>
+<p>The validator detects all data bundles (because this <code>bkpt</code> has a
+special encoding) and marks them as off-limits for <em>direct branch</em>. If
+it finds a <em>direct branch</em> into a data bundle, the entire program is
+rejected as unsafe. Because <em>direct branch</em> cannot be modified at
+runtime, the data bundles cannot be executed.</p>
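+<p>The check described above might look roughly like the following Python
+sketch. It is not taken from the validator: the function names are
+hypothetical, the code is modeled as a list of 32-bit words, and the sketch
+assumes that every address inside a data bundle, including its
+<code>bkpt</code>, is off-limits.</p>

```python
BUNDLE_SIZE = 16
WORDS_PER_BUNDLE = BUNDLE_SIZE // 4
DATA_BUNDLE_HEAD = 0xE125BE70  # A32 encoding of "bkpt #0x5BE0"


def data_bundle_starts(words, base):
    """Start addresses of all data bundles in the code region.

    words: 32-bit instruction words; base: bundle-aligned load address.
    """
    starts = set()
    for i in range(0, len(words), WORDS_PER_BUNDLE):
        if words[i] == DATA_BUNDLE_HEAD:
            starts.add(base + 4 * i)
    return starts


def direct_branch_is_safe(target, data_starts):
    """False when a direct branch target lies inside a data bundle."""
    bundle_start = target - (target % BUNDLE_SIZE)
    return bundle_start not in data_starts
```

+<p>A real validator would also have to decode each <em>direct branch</em> to
+find its target; that step is left out here.</p>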
+<aside class="note">
+Clever readers may wonder: why use <code>bkpt #0x5BE0</code>? That seems
+awfully specific when you just need a special &#8220;roadblock&#8221; instruction!
+Quite true, young Padawan! It happens that this odd <code>bkpt</code>
+instruction is encoded as <code>0xE125BE70</code> in A32. In T32, the
+<code>bkpt</code> instruction is encoded as <code>0xBExx</code> (where <code>xx</code> can be
+any 8-bit immediate, say <code>0x70</code>), and <code>0xE125</code> encodes the <em>branch</em>
+instruction <code>b.n #0x250</code>. The special roadblock instruction
+therefore doubles as a roadblock in T32, if anything were to go so
+awry that we tried to execute it as a T32 instruction! Much defense,
+such depth, wow!
+</aside>
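+<p>The encoding trick can be checked with a short Python sketch. It assumes
+little-endian instruction fetch, so the low halfword of the A32 word is the
+first T32 instruction executed; the helper names are hypothetical.</p>

```python
def encode_a32_bkpt(imm16):
    """A32 BKPT with condition AL: 0xE12, imm12, 0x7, imm4."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("bkpt immediate must fit in 16 bits")
    imm12 = imm16 >> 4
    imm4 = imm16 & 0xF
    return 0xE1200070 | (imm12 << 8) | imm4


def t32_halfwords(word):
    """Split a 32-bit word into two halfwords, in little-endian fetch order."""
    return word & 0xFFFF, word >> 16


def is_t32_bkpt(halfword):
    """T32 BKPT is 0xBExx, where xx is an 8-bit immediate."""
    return halfword >> 8 == 0xBE
```

+<p>Encoding <code>0x5BE0</code> gives <code>0xE125BE70</code>, whose first halfword,
+<code>0xBE70</code>, is itself a T32 <code>bkpt</code>.</p>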
+</section></section></section><section id="trampolines-and-memory-layout">
+<h3 id="trampolines-and-memory-layout">Trampolines and Memory Layout</h3>
+<p>So far, the rules we&#8217;ve described make for boring programs: they can&#8217;t
+communicate with the outside world!</p>
+<ul class="small-gap">
+<li>The program can&#8217;t call an external library, or the operating system,
+even to do something simple like draw some pixels on the screen.</li>
+<li>It also can&#8217;t read or write memory outside of its dedicated sandbox,
+so communicating that way is right out.</li>
+</ul>
+<p>We fix this by allowing the untrusted program to call into the trusted
+runtime using a trampoline. A trampoline is simply a short stretch of
+code, placed by the trusted runtime at a known location within the
+sandbox, that is permitted to do things the untrusted program can&#8217;t.</p>
+<p>Even though trampolines are inside the sandbox, the untrusted program
+can&#8217;t modify them: the trusted runtime marks them read-only. It also
+can&#8217;t do anything clever with the special instructions inside the
+trampoline&#8212;for example, call it at a slightly offset address to bypass
+some checks&#8212;because the validator only allows trampolines to be
+reached by <em>indirect branch</em> (or <em>branch-with-link</em>). We structure the
+trampolines carefully so that they&#8217;re safe to enter at any <code>0 mod 16</code>
+address.</p>
+<p>The validator can detect attempts to use the trampolines because they&#8217;re
+loaded at a fixed location in memory. Let&#8217;s look at the memory map of
+the Native Client sandbox.</p>
+<section id="memory-map">
+<h4 id="memory-map">Memory Map</h4>
+<p>The ARM sandbox is always at virtual address <code>0</code>, and is exactly 1GiB
+in size. This includes the untrusted program&#8217;s code and data, the
+trampolines, and a small guard region to detect null pointer
+dereferences. In practice, the sandbox reserves a bit more address
+space than this, because of the need for additional guard regions at
+either end of the sandbox.</p>
+<table border="1" class="docutils">
+<colgroup>
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Address</th>
+<th class="head">Size</th>
+<th class="head">Name</th>
+<th class="head">Purpose</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><code>-0x2000</code></td>
+<td>8KiB</td>
+<td>Bottom Guard</td>
+<td>Keeps negative-displacement <em>load</em> or <em>store</em> from escaping.</td>
+</tr>
+<tr class="row-odd"><td><code>0</code></td>
+<td>64KiB</td>
+<td>Null Guard</td>
+<td>Catches null pointer dereferences, guards against kernel exploits.</td>
+</tr>
+<tr class="row-even"><td><code>0x10000</code></td>
+<td>64KiB</td>
+<td>Trampolines</td>
+<td>Up to 2048 unique syscall entry points.</td>
+</tr>
+<tr class="row-odd"><td><code>0x20000</code></td>
+<td>~1GiB</td>
+<td>Untrusted Sandbox</td>
+<td>Contains untrusted code, followed by its heap/stack/memory.</td>
+</tr>
+<tr class="row-even"><td><code>0x40000000</code></td>
+<td>8KiB</td>
+<td>Top Guard</td>
+<td>Keeps positive-displacement <em>load</em> or <em>store</em> from escaping.</td>
+</tr>
+</tbody>
+</table>
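+<p>To see how the guard regions fit together with address masking, here is
+a small Python model. It rests on assumptions not spelled out in the table:
+that the base register has been masked with <code>0xC0000000</code> as in the
+earlier store example, and that the displacement is at most the 12-bit
+immediate of an A32 <code>ldr</code>/<code>str</code>. Addresses are modeled as
+signed integers rather than wrapping 32-bit values.</p>

```python
SANDBOX_SIZE = 0x40000000  # 1GiB
GUARD_SIZE = 0x2000        # 8KiB guard on each side
DATA_MASK = 0xC0000000     # bits cleared by "bic rX, #0xC0000000"
MAX_DISPLACEMENT = 4095    # largest 12-bit immediate offset


def mask_address(address):
    """Clear the top two bits, forcing the address below 1GiB."""
    return address & ~DATA_MASK & 0xFFFFFFFF


def access_stays_contained(base, displacement):
    """True when masked base + displacement hits the sandbox or a guard."""
    if abs(displacement) > MAX_DISPLACEMENT:
        raise ValueError("displacement exceeds the assumed 12-bit range")
    effective = mask_address(base) + displacement
    return -GUARD_SIZE <= effective < SANDBOX_SIZE + GUARD_SIZE
```

+<p>Under these assumptions, every masked access either stays inside the
+sandbox or faults in one of the guards.</p>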
+<p>Within the trampolines, the untrusted program can call any address
+that&#8217;s <code>0 mod 16</code>. However, only even slots are used, so useful
+trampolines are always <code>0 mod 32</code>. If the program calls an odd slot,
+it will fault, and the trusted runtime will shut it down.</p>
+<aside class="note">
+This is a bit of speculative flexibility. While the current bundle
+size of Native Client on ARM is 16 bytes, we&#8217;ve considered the
+possibility of optional 32-byte bundles, to enable certain compiler
+improvements. While this option isn&#8217;t available to untrusted programs
+today, we&#8217;re trying to keep the system &#8220;32-byte clean&#8221;.
+</aside>
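+<p>The slot arithmetic can be sketched as follows, using the base address
+and size from the memory map. The function names are hypothetical and the
+sketch says nothing about which slots the runtime actually populates.</p>

```python
TRAMPOLINE_BASE = 0x10000
TRAMPOLINE_REGION_SIZE = 64 * 1024
BUNDLE_SIZE = 16
SLOT_STRIDE = 2 * BUNDLE_SIZE  # only even bundles hold trampolines
TRAMPOLINE_COUNT = TRAMPOLINE_REGION_SIZE // SLOT_STRIDE


def trampoline_address(index):
    """Entry address of the trampoline with the given index."""
    if not 0 <= index < TRAMPOLINE_COUNT:
        raise ValueError("no such trampoline")
    return TRAMPOLINE_BASE + index * SLOT_STRIDE


def is_useful_trampoline(address):
    """True for addresses in the trampoline region that are 0 mod 32."""
    end = TRAMPOLINE_BASE + TRAMPOLINE_REGION_SIZE
    return TRAMPOLINE_BASE <= address < end and address % SLOT_STRIDE == 0
```

+<p>Dividing the 64KiB region into 32-byte strides gives the 2048 entry
+points listed in the table.</p>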
+</section><section id="inside-a-trampoline">
+<h4 id="inside-a-trampoline">Inside a Trampoline</h4>
+<p>When we introduced trampolines, we mentioned that they can do things
+that untrusted programs can&#8217;t. To be more specific, trampolines can jump
+to locations outside the sandbox. On ARM, this is all they do. Here&#8217;s a
+typical trampoline fragment on ARM:</p>
+<pre>
+; Even trampoline bundle:
+push { r0-r3 } ; Save arguments that may be in registers.
+push { lr } ; Save the untrusted return address,
+ ; separate step because it must be on top.
+ldr r0, [pc, #4] ; Load the destination address from
+ ; the next bundle.
+blx r0 ; Go!
+; The odd trampoline that immediately follows:
+bkpt #0x5BE0 ; Prevent entry to this data bundle.
+.word address_of_routine
+</pre>
+<p>The only odd thing here is that we push the incoming value of <code>lr</code>,
+and then use <code>blx</code>&#8212;not <code>bx</code>&#8212;to escape the sandbox. This is
+because, in practice, all trampolines jump to the same routine in the
+trusted runtime, called the syscall hook. It uses the return address
+produced by the final <code>blx</code> instruction to determine which trampoline
+was called.</p>
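+<p>One way the syscall hook could recover the trampoline from the return
+address is sketched below. This is an assumption-laden illustration, not
+the trusted runtime&#8217;s code: it relies on <code>blx</code> being the last
+instruction of the even bundle, so that <code>lr</code> points at the odd bundle
+that follows, and the function name is hypothetical.</p>

```python
TRAMPOLINE_BASE = 0x10000
TRAMPOLINE_REGION_SIZE = 64 * 1024
BUNDLE_SIZE = 16
SLOT_STRIDE = 2 * BUNDLE_SIZE


def trampoline_index_from_lr(lr):
    """Map the return address left by the final blx to a trampoline index."""
    # blx occupies the last slot of the even bundle, so lr is the
    # address of the odd (data) bundle immediately after it.
    even_bundle = lr - BUNDLE_SIZE
    offset = even_bundle - TRAMPOLINE_BASE
    if not 0 <= offset < TRAMPOLINE_REGION_SIZE or offset % SLOT_STRIDE:
        raise ValueError("return address does not belong to a trampoline")
    return offset // SLOT_STRIDE
```

+<p>For example, a return address of <code>0x10070</code> would correspond to the
+trampoline at <code>0x10060</code>, which is index 3.</p>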
+</section></section><section id="loose-ends">
+<h3 id="loose-ends">Loose Ends</h3>
+<section id="forbidden-instructions">
+<h4 id="forbidden-instructions">Forbidden Instructions</h4>
+<p>To complete the sandbox, the validator ensures that the program does not
+try to use certain forbidden instructions.</p>
+<ul class="small-gap">
+<li>We forbid instructions that directly interact with the operating
+system by going around the trusted runtime. We prevent this to limit
+the functionality of the untrusted program, and to ensure portability
+across operating systems.</li>
+<li>We forbid instructions that change the processor&#8217;s execution mode to
+Thumb, ThumbEE, or Jazelle. This would cause the code to be
+interpreted differently than the validator&#8217;s original 32-bit ARM
+disassembly, so the validator results might be invalidated.</li>
+<li>We forbid instructions that aren&#8217;t available to user code (i.e. that can
+only be used by an operating system kernel). This is purely out of
+paranoia, because the hardware should prevent the instructions from
+working. Essentially, we consider it &#8220;suspicious&#8221; if a program
+contains these instructions&#8212;it might be trying to exploit a hardware
+bug.</li>
+<li>We forbid instructions, or variants of instructions, that are
+implementation-defined (&#8220;unpredictable&#8221;) or deprecated in the ARMv7-A
+architecture manual.</li>
+<li>Finally, we forbid a small number of instructions, such as <code>setend</code>,
+purely out of paranoia. It&#8217;s easier to loosen the validator&#8217;s
+restrictions than to tighten them, so we err on the side of rejecting
+safe instructions.</li>
+</ul>
+<p>If an instruction can&#8217;t be decoded at all within the ARMv7-A instruction
+set specification, it is forbidden.</p>
+<aside class="note">
+<p>Here is a list of instructions currently forbidden for security
+reasons (that is, excluding deprecated or undefined instructions):</p>
+<ul class="small-gap">
+<li><code>BLX</code> (immediate): always changes to Thumb mode.</li>
+<li><code>BXJ</code>: always changes to Jazelle mode.</li>
+<li><code>CPS</code>: not available to user code.</li>
+<li><code>LDM</code>, exception return version: not available to user code.</li>
+<li><code>LDM</code>, kernel version: not available to user code.</li>
+<li><code>LDR*T</code> (unprivileged load operations): theoretically harmless,
+but suspicious when found in user code. Use <code>LDR</code> instead.</li>
+<li><code>MSR</code>, kernel version: not available to user code.</li>
+<li><code>RFE</code>: not available to user code.</li>
+<li><code>SETEND</code>: theoretically harmless, but suspicious when found in
+user code. May make some future validator extensions difficult.</li>
+<li><code>SMC</code>: not available to user code.</li>
+<li><code>SRS</code>: not available to user code.</li>
+<li><code>STM</code>, kernel version: not available to user code.</li>
+<li><code>STR*T</code> (unprivileged store operations): theoretically harmless,
+but suspicious when found in user code. Use <code>STR</code> instead.</li>
+<li><code>SVC</code>/<code>SWI</code>: allows direct operating system interaction.</li>
+<li>Any unassigned hint instruction: difficult to reason about, so
+treated as suspicious.</li>
+</ul>
+<p>More details are available in the <a class="reference external" href="http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table">ARMv7 instruction table definition</a>.</p>
+
+</aside>
+</section><section id="coprocessors">
+<h4 id="coprocessors">Coprocessors</h4>
+<p>ARM has traditionally added new instruction set features through
+coprocessors. Coprocessors are accessed through a small set of
+instructions, and often have their own register files. Floating point
+and the NEON vector extensions are both implemented as coprocessors, as
+is the MMU.</p>
+<p>We&#8217;re confident that the side-effects of coprocessors in slots 10 and 11
+(that is, floating point, NEON, etc.) are well-understood. These are in
+the coprocessor space reserved by ARM Ltd. for their own extensions
+(<code>CP8</code>&#8211;<code>CP15</code>), and are unlikely to change significantly. So, we
+allow untrusted code to use coprocessors 10 and 11, and we mandate the
+presence of at least VFPv3 and NEON/AdvancedSIMD. Multiprocessor
+Extension, VFPv4, FP16 and other extensions are allowed but not
+required, and may fail on processors that do not support them. It is
+therefore the program&#8217;s responsibility to check their availability
+before executing them.</p>
+<p>We don&#8217;t allow access to any other ARM-reserved coprocessor
+(<code>CP8</code>&#8211;<code>CP9</code> or <code>CP12</code>&#8211;<code>CP15</code>). It&#8217;s possible that read
+access to <code>CP15</code> might be useful, and we might allow it in the
+future&#8212;but again, it&#8217;s easier to loosen the restrictions than tighten
+them, so we ban it for now.</p>
+<p>We do not, and probably never will, allow access to the vendor-specific
+coprocessor space, <code>CP0</code>&#8211;<code>CP7</code>. We&#8217;re simply not confident in our
+ability to model the operations on these coprocessors, given that
+vendors often leave them poorly-specified. Unfortunately this eliminates
+some legacy floating point and vector implementations, but these are
+superseded on ARMv7-A parts anyway.</p>
+</section><section id="validator-code">
+<h4 id="validator-code">Validator Code</h4>
+<p>By now you&#8217;re itching to see the sandbox validator&#8217;s code and dissect
+it. You&#8217;ll have a disappointing read: at less than 500 lines of code,
+<a class="reference external" href="http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/validator.cc">validator.cc</a>
+is quite simple to understand and much shorter than this document. It&#8217;s
+of course dependent on the <a class="reference external" href="http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table">ARMv7 instruction table definition</a>,
+which teaches it about the ARMv7 instruction set.</p>
+</section></section></section></section>
+
+{{/partials.standard_nacl_article}}
