native_client_sdk/src/doc/reference/sandbox_internals/arm-32-bit-sandbox.rst - Issue 147803003: NaCl docs: add ARM 32-bit sandbox

Issue 147803003: NaCl docs: add ARM 32-bit sandbox (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src

Patch Set: Created 6 years, 10 months ago

OLD	NEW
(Empty)
	1 ==================

	2 ARM 32-bit Sandbox

	3 ==================

	4

	5 Native Client for ARM is a method for running programs---even malicious

	6 ones---safely, on computers that use 32-bit ARM processors. It's an

	7 extension of earlier work on Native Client for x86 processors. This

	8 security is provided with a low performance overhead of about 10% over

	9 regular ARM code, and as you'll see in this document the sandbox model

	10 is beautifully simple, meaning that the trusted codebase is much easier

	11 to validate.

	12

	13 As an implementation detail, the Native Client 32-bit ARM sandbox is

	14 currently used by Portable Native Client to execute code on 32-bit ARM

	15 machines in a safe manner. The portable bitcode contained in a pexe

	16 is translated to a 32-bit ARM nexe before execution. This may change

	17 at some point: Portable Native Client doesn't necessarily need this

	18 sandbox to execute code on ARM. Note that the Portable Native Client

	19 compiler itself is also untrusted: it too runs in the ARM sandbox

	20 described in this document.

	21

	22 On this page, we describe how Native Client works on 32-bit ARM. We

	23 assume no prior knowledge about the internals of Native Client, on x86

	24 or any other architecture, but we do assume some familiarity with

	25 assembly languages in general.

	26

	27 .. contents::

	28 :local:

	29 :backlinks: none

	30 :depth: 3

	31

	32 An Introduction to the ARM Architecture

	33 =======================================

	34

	35 In this section, we summarize the relevant parts of the ARM processor

	36 architecture.

	37

	38 About ARM and ARMv7-A

	39 ---------------------

	40

	41 ARM is one of the older commercial "RISC" processor designs, dating back

	42 to the early 1980s. Today, it is used primarily in embedded systems:

	43 everything from toys, to home automation, to automobiles. However, its

	44 most visible use is in cellular phones, tablets and some

	45 laptops.

	46

	47 Through the years, there have been many revisions of the ARM

	48 architecture, written as ARMv\ X for some version X. Native Client

	49 specifically targets the ARMv7-A architecture commonly used in high-end

	50 phones and smartbooks. This revision, defined in the mid-2000s, adds a

	51 number of useful instructions, and specifies some portions of the system

	52 that used to be left to individual chip manufacturers. Critically,

	53 ARMv7-A specifies the "eXecute Never" bit, or XN. This pagetable

	54 attribute lets us mark memory as non-executable. Our security relies on

	55 the presence of this feature.

	56

	57 ARMv8 adds a new 64-bit instruction set architecture called A64, while

	58 also enhancing the 32-bit A32 ISA. For Native Client's purposes the A32

	59 ISA is equivalent to the ARMv7 ARM ISA, albeit with a few new

	60 instructions. This document only discusses the 32-bit A32 instruction

	61 set: A64 would require a different sandboxing model.

	62

	63 ARM Programmer's Model

	64 ----------------------

	65

	66 While modern ARM chips support several instruction encodings, 32-bit

	67 Native Client on ARM focuses on a single one: a fixed-width encoding

	68 called A32 (previously, and confusingly, called simply ARM) in which

	69 every instruction is 32 bits wide. Thumb, Thumb2 (now confusingly called

	70 T32), Jazelle, ThumbEE and such aren't supported by Native Client. This

	71 dramatically simplifies some of our analyses, as we'll see later. Nearly

	72 every instruction can be conditionally executed based on the contents of

	73 a dedicated condition code register.

	74

	75 ARM processors have 16 general-purpose registers used for integer and

	76 memory operations, written ``r0`` through ``r15``. Of these, two have

	77 special roles baked in to the hardware:

	78

	79 * ``r14`` is the Link Register. The ARM call instruction

	80 (branch-with-link) doesn't use the stack directly. Instead, it

	81 stashes the return address in ``r14``. In other circumstances, ``r14``

	82 can be (and is!) used as a general-purpose register. When ``r14`` is

	83 playing its Link Register role, it's referred to as ``lr``.

	84 * ``r15`` is the Program Counter. While it can be read and written like

	85 any other register, setting it to a new value will cause execution to

	86 jump to a new address. Using it in some circumstances is also

	87 undefined by the ARM architecture. Because of this, ``r15`` is never

	88 used for anything else, and is referred to as ``pc``.

	89

	90 Other registers are given roles by convention. The only important

	91 registers to Native Client are ``r9`` and ``r13``, which are used as the

	92 Thread Pointer location and Stack Pointer. When playing this role,

	93 they're referred to as ``tp`` and ``sp``.

	94

	95 Like other RISC-inspired designs, ARM programs use explicit load and

	96 store instructions to access memory. All other instructions operate

	97 only on registers, or on registers and small constants called

	98 immediates. Because both instructions and data words are 32 bits wide, we

	99 can't simply embed a 32-bit number into an instruction. ARM programs use

	100 three methods to work around this, all of which Native Client exploits:

	101

	102 1. Many instructions can encode a modified immediate, which is an 8-bit

	103 number rotated right by an even number of bits.

	104 2. The ``movw`` and ``movt`` instructions can be used to set the bottom and

	105 top 16 bits of a register, respectively, and can therefore encode any 32-bit

	106 immediate.

	107 3. For values that can't be represented as modified immediates, ARM

	108 programs use ``pc``-relative loads to load data from inside the

	109 code---hidden in a place where it won't be executed such as "constant

	110 pools", just past the final return of a function.
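The first two methods above can be modeled in a few lines. This is only an illustrative sketch: the helper names (``ror32``, ``encode_modified_immediate``, ``split_movw_movt``) are hypothetical and not part of Native Client or any ARM toolchain. It checks whether a 32-bit value is an 8-bit number rotated right by an even amount, and splits a value into the two halves that ``movw`` and ``movt`` would carry.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_modified_immediate(value):
    """Return (imm8, rotation) if `value` is an 8-bit number rotated right
    by an even number of bits, else None."""
    for rotation in range(0, 32, 2):
        # Rotating right by (32 - rotation) undoes a right-rotation by `rotation`.
        imm8 = ror32(value, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None

def split_movw_movt(value):
    """Split a 32-bit value into the (bottom, top) 16-bit halves that
    movw and movt would load, respectively."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF
```

A value such as ``0xC0000000`` fits the modified-immediate form, while an arbitrary constant such as ``0x12345678`` does not and would need ``movw``/``movt`` or a ``pc``-relative load.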

	111

	112 We'll introduce more details of the ARM instruction set later, as we

	113 walk through the system.

	114

	115 The Native Client Approach

	116 ==========================

	117

	118 Native Client runs an untrusted program, potentially from an unknown or

	119 malicious source, inside a sandbox created by a trusted runtime. The

	120 trusted runtime allows the untrusted program to "call-out" and perform

	121 certain actions, such as drawing graphics, but prevents it from

	122 accessing the operating system directly. This "call-out" facility,

	123 called a trampoline, looks like a standard function call to the

	124 untrusted program, but it allows control to escape from the sandbox in a

	125 controlled way.

	126

	127 The untrusted program and trusted runtime inhabit the same process, or

	128 virtual address space, maintained by the operating system. To keep the

	129 trusted runtime behaving the way we expect, we must prevent the

	130 untrusted program from accessing and modifying its internals. Since they

	131 share a virtual address space, we can't rely on the operating system for

	132 this. Instead, we isolate the untrusted program from the trusted

	133 runtime.

	134

	135 Unlike modern operating systems, we use a cooperative isolation

	136 method. Native Client can't run any off-the-shelf program compiled for

	137 an off-the-shelf operating system. The program must be compiled to

	138 comply with Native Client's rules. The details vary on each platform,

	139 but in general, the untrusted program:

	140

	141 * Must not attempt to use certain forbidden instructions, such as system

	142 calls.

	143 * Must not attempt to modify its own code without abiding by Native

	144 Client's code modification rules.

	145 * Must not jump into the middle of an instruction group, or otherwise do

	146 tricky things to cause instructions to be interpreted multiple ways.

	147 * Must use special, strictly-defined instruction sequences to perform

	148 permitted but potentially dangerous actions. We call these sequences

	149 pseudo-instructions.

	150

	151 We can't simply take the program's word that it complies with these

	152 rules---we call it "untrusted" for a reason! Nor do we require it to be

	153 produced by a special compiler; in practice, we don't trust our

	154 compilers either. Instead, we apply a load-time validator that

	155 disassembles the program. The validator either proves that the program

	156 complies with our rules, or rejects it as unsafe. By keeping the rules

	157 simple, we keep the validator simple, small, and fast. We like to put

	158 our trust in small, simple things, and the validator is key to the

	159 system's security.

	160

	161 .. Note::

	162 :class: note

	163

	164 For the computationally-inclined, all our validators scale linearly in

	165 the size of the program.

	166

	167 NaCl/ARM: Pure Software Fault Isolation

	168 ---------------------------------------

	169

	170 In the original Native Client system for the x86, we used unusual

	171 hardware features of that processor (the segment registers) to isolate

	172 untrusted programs. This was simple and fast, but won't work on ARM,

	173 which has nothing equivalent. Instead, we use pure software fault

	174 isolation.

	175

	176 We use a fixed address space layout: the untrusted program gets the

	177 lowest gigabyte, addresses ``0`` through ``0x3FFFFFFF``. The rest of the

	178 address space holds the trusted runtime and the operating system. We

	179 isolate the program by requiring every load, store, and *indirect

	180 branch* (to an address in a register) to use a pseudo-instruction. The

	181 pseudo-instructions ensure that the address stays within the

	182 sandbox. The indirect branch pseudo-instruction, in turn, ensures that

	183 such branches won't split up other pseudo-instructions.

	184

	185 At either side of the sandbox, we place small (8KiB) guard

	186 regions. These are simply areas in the process's address space that are

	187 mapped without read, write, or execute permissions, so any attempt to

	188 access them for any reason---load, store, or jump---will cause a

	189 fault.

	190

	191 Finally, we ban the use of certain instructions, notably direct system

	192 calls. This is to ensure that the untrusted program can be run on any

	193 operating system supported by Native Client, and to prevent access to

	194 certain system features that might be used to subvert the sandbox. As a

	195 side effect, it helps to prevent programs from exploiting buggy

	196 operating system APIs.

	197

	198 Let's walk through the details, starting with the simplest part: load

	199 and store.

	200

	201 Loads and Stores

	202 ^^^^^^^^^^^^^^^^

	203

	204 All access to memory must be through load and store

	205 pseudo-instructions. These are simply a native load or store

	206 instruction, preceded by a guard instruction.

	207

	208 Each load or store pseudo-instruction is similar to the load shown

	209 below. We use abstract "placeholder" registers instead of specific

	210 numbered registers for the sake of discussion. ``rA`` is the register

	211 holding the address to load from. ``rD`` is the destination for the

	212 loaded data.

	213

	214 .. naclcode::

	215

	216 bic rA, #0xC0000000

	217 ldr rD, [rA]

	218

	219 The first instruction, ``bic``, clears the top two bits of ``rA``. In

	220 this case, that means that the value in ``rA`` is forced to an address

	221 inside our sandbox, between ``0`` and ``0x3FFFFFFF``, inclusive.

	222

	223 The second instruction, ``ldr``, uses the previously-sandboxed address

	224 to load a value. This address might not be the address that the program

	225 intended, and might cause an access to an unmapped memory location

	226 within the sandbox: ``bic`` forces the address to be valid, by clearing

	227 the top two bits. This is a no-op in a correct program.

	228

	229 This illustrates a common property of all Native Client systems: we aim

	230 for safety, not correctness. A program using an invalid address in

	231 ``rA`` here is simply broken, so we are free to do whatever we want to

	232 preserve safety. In this case the program might load an invalid (but

	233 safe) value, or cause a segmentation fault limited to the untrusted

	234 code.
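The effect of the guard instruction can be sketched as plain bit arithmetic. The function names below are hypothetical; the sketch only models what ``bic rA, #0xC0000000`` does to the address register, not the load itself.

```python
SANDBOX_MASK = 0xC0000000   # bits cleared by `bic rA, #0xC0000000`
SANDBOX_TOP = 0x3FFFFFFF    # highest address inside the 1 GiB sandbox

def bic(value, mask):
    """Model the ARM `bic` (bit clear) instruction on a 32-bit register."""
    return value & ~mask & 0xFFFFFFFF

def sandboxed_load_address(rA):
    """Address actually used by the `ldr` after the guard has run."""
    return bic(rA, SANDBOX_MASK)
```

For an address that is already inside the sandbox the mask changes nothing; for any other address the result is some (probably unintended) address inside the sandbox, which is safe even though the program is broken.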

	235

	236 Now, if we allowed arbitrary branches within the program, a malicious

	237 program could set up carefully-crafted values in ``rA``, and then jump

	238 straight to the ``ldr``. This is why we validate that programs never

	239 split pseudo-instructions.

	240

	241 Alternative Sandboxing
	binji 2014/02/06 21:23:39 It's not immediately clear to me if this sandboxing is allowed, or just a potentially different way to do it. I assume it is allowed, but maybe a quick sentence like "Another valid way to load..."
	JF 2014/02/06 23:26:36 It's currently disallowed because NaCl can't really half-allow it. PNaCl could, so the feature is still in. I'll update the documentation to make that clear by moving the note up and enhancing it.
	242 """"""""""""""""""""""

	243

	244 .. naclcode::

	245

	246 tst rA, #0xC0000000

	247 ldreq rD, [rA]

	248

	249 The first instruction, ``tst``, performs a bitwise-\ ``AND`` of ``rA`` and

	250 the modified immediate literal, ``0xC0000000``. It sets the condition

	251 flags based on the result, but does not write the result to a

	252 register. In particular, it sets the ``Z`` condition flag if the result

	253 was zero---if the two values had no one-bits in common. In this case,
	binji 2014/02/06 21:23:39 nit: "no one-bits" reads like "no one" bits to me. Maybe reword?
	JF 2014/02/06 23:26:36 Changed to "had no set bits in common".
	254 that means that the value in ``rA`` was an address inside our sandbox,

	255 between ``0`` and ``0x3FFFFFFF``, inclusive.

	256

	257 The second instruction, ``ldreq``, is a conditional load if equal. As we

	258 mentioned before, nearly all ARM instructions can be made

	259 conditional. In assembly language, we simply stick the desired condition

	260 on the end of the instruction's mnemonic name. Here, the condition is

	261 ``EQ``, which causes the instruction to execute only if the ``Z`` flag

	262 is set.

	263

	264 Thus, when the pseudo-instruction executes, the ``tst`` sets ``Z`` if

	265 (and only if) the value in ``rA`` is an address within the bounds of the

	266 sandbox, and then the ``ldreq`` loads if (and only if) it was. If ``rA``

	267 held an invalid address, the load does not execute, and ``rD`` is

	268 unchanged.
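The contrast with the ``bic``-based sequence is easier to see in a small model. This is a sketch under stated assumptions: ``memory`` is a hypothetical dictionary standing in for sandbox memory, and the function names are made up for illustration.

```python
SANDBOX_MASK = 0xC0000000

def tst_sets_z(rA, mask=SANDBOX_MASK):
    """Model `tst rA, #mask`: Z is set when rA and mask share no set bits."""
    return (rA & mask) == 0

def tst_ldreq(rA, rD, memory):
    """Model `tst rA, #0xC0000000` followed by `ldreq rD, [rA]`.

    `memory` is a hypothetical dict mapping addresses to words.
    Returns the new value of rD.
    """
    if tst_sets_z(rA):
        return memory[rA]   # condition EQ holds: the load executes
    return rD               # condition fails: rD keeps its old value
```

Unlike ``bic``, this sequence never rewrites the address: an out-of-sandbox address simply suppresses the load.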

	269

	270 Addressing Modes

	271 """"""""""""""""

	272

	273 ARM has an unusually rich set of addressing modes. We allow all but one:

	274 register-indexed, where two registers are added to determine the

	275 address.

	276

	277 We permit simple load and store, as shown above. We also permit

	278 displacement, pre-index, and post-index memory operations:

	279

	280 .. naclcode::

	281

	282 bic rA, #0xC0000000

	283 ldr rD, [rA, #1234] ; this is fine

	284

	285 bic rA, #0xC0000000

	286 ldr rD, [rA, #1234]! ; also fine

	287

	288 bic rA, #0xC0000000

	289 ldr rD, [rA], #1234 ; looking good

	290

	291 In each case, we know ``rA`` points into the sandbox when the ``ldr``

	292 executes. We allow adding an immediate displacement to ``rA`` to

	293 determine the final address (as in the first two examples here) because

	294 the largest immediate displacement is ±4095 bytes, while our guard pages

	295 are 8192 bytes wide.
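The guard-region argument is just arithmetic, and can be checked directly. This sketch assumes addresses are modeled as unbounded integers (32-bit wraparound is deliberately ignored) and that the guard regions sit immediately below and above the sandbox; the constant and function names are hypothetical.

```python
SANDBOX_SIZE = 1 << 30        # untrusted code owns 0 .. 0x3FFFFFFF
GUARD_SIZE = 8 * 1024         # 8 KiB guard region on each side
MAX_DISPLACEMENT = 4095       # largest immediate offset in a load or store

# Extreme addresses reachable from a masked base register plus a displacement.
highest = (SANDBOX_SIZE - 1) + MAX_DISPLACEMENT
lowest = 0 - MAX_DISPLACEMENT

def lands_in_sandbox_or_guard(address):
    """True if the address is inside the sandbox or one of its guard regions."""
    return -GUARD_SIZE <= address < SANDBOX_SIZE + GUARD_SIZE
```

Both extremes fall inside a guard region, so the worst case for a displaced access is a fault rather than a read or write of trusted memory.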

	296

	297 We also allow ARM's more unusual load and store instructions, such

	298 as load-multiple and store-multiple, etc.

	299

	300 Conditional Load and Store

	301 """"""""""""""""""""""""""""""

	302

	303 There's one problem with the pseudo-instructions shown above: they are

	304 unconditional (assuming ``rA`` is valid). ARM compilers regularly use

	305 conditional load and store, so we should support this in Native

	306 Client. We do so by defining alternate, predicable
	binji 2014/02/06 21:23:39 sp: predictable
	JF 2014/02/06 23:26:36 Done.
	307 pseudo-instructions. Here is a conditional store

	308 (store-if-greater-than) using this pseudo-instruction sequence:

	309

	310 .. naclcode::

	311

	312 bicgt rA, #0xC0000000

	313 strgt rX, [rA, #123]

	314

	315 .. Note::

	316 :class: note

	317

	318 The ``tst``-based sequence is faster than the ``bic``-based sequence

	319 on modern ARM chips. It avoids a data dependency in the address

	320 register. This is why we keep both around. The ``tst``-based sequence

	321 unfortunately leaks information on some processors, and is therefore

	322 forbidden in these cases.
	binji 2014/02/06 21:23:39 This is a bit unclear. What happens if you try to use the tst-based sequence on these processors? Does it just not run?
	JF 2014/02/06 23:26:36 I moved this note up and clarified.
	323

	324 The Stack Pointer, Thread Pointer, and Program Counter

	325 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	326

	327 Stack Pointer

	328 """""""""""""

	329

	330 In C-like languages, the stack is used to store return addresses during

	331 function calls, as well as any local variables that won't fit in

	332 registers. This makes stack operations very common.

	333

	334 Native Client does not require guard instructions on any load or

	335 store involving the stack pointer, ``sp``. This improves performance

	336 and reduces code size. However, ARM's stack pointer isn't special: it's

	337 just another register, called ``sp`` only by convention. To make it safe

	338 to use this register as a load or store address without guards, we

	339 add a rule: ``sp`` must always contain a valid address.

	340

	341 We enforce this rule by restricting the sorts of operations that

	342 programs can use to alter ``sp``. Programs can alter ``sp`` by adding or

	343 subtracting an immediate, as a side-effect of a load or store:

	344

	345 .. naclcode::

	346

	347 ldr rX, [sp], #4 ; loads from stack, then adds 4 to sp

	348

	349 pop {rX} ; equivalent

	350

	351 str rX, [sp, #1234]! ; adds 1234 to sp, then stores to stack

	352

	353 These are safe because, as we mentioned before, the largest immediate

	354 available in a load or store is ±4095. Even after adding or

	355 subtracting 4095, the stack pointer will still be within the sandbox or

	356 guard regions.
	binji 2014/02/06 21:23:39 But what happens when you execute one of these sp-updating operations three times? Isn't it possible that sp is out of bounds in that case? Or am I misunderstanding: ldr rX, [sp], #4! ; loads from stack, then adds 4 to sp
	JF 2014/02/06 23:26:36 Correct, but the key here is that you also perform an access when you do this (either before or after the update), so doing these updates in a loop is perfectly fine because you won't be able to jump over the red zones.
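The point made in the reply can be illustrated with a toy simulation. It is a sketch only: it models the upper guard region, treats any access to it as a fault, and uses hypothetical names throughout. It repeats a pre-indexed store with writeback (``str rX, [sp, #4095]!``) until something faults.

```python
SANDBOX_END = 1 << 30         # first address past the sandbox
GUARD_SIZE = 8 * 1024         # width of the guard region above the sandbox
MAX_STEP = 4095               # largest immediate in a load or store

class GuardFault(Exception):
    """Raised when an access touches the guard region."""

def access(address):
    """Model a memory access: fault in the guard region, fail if beyond it."""
    if SANDBOX_END <= address < SANDBOX_END + GUARD_SIZE:
        raise GuardFault(hex(address))
    if address >= SANDBOX_END + GUARD_SIZE:
        raise AssertionError("escaped past the guard region")

def walk_sp_upward(sp, step=MAX_STEP):
    """Repeat `str rX, [sp, #step]!` until it faults; return the faulting sp."""
    while True:
        sp += step            # writeback happens as part of the instruction
        try:
            access(sp)        # ...and so does the access to the new address
        except GuardFault:
            return sp
```

Because every update is paired with an access, and each step is smaller than the guard region, the walk always faults inside the guard region before ``sp`` can get past it.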
	357

	358 Any other operation that alters ``sp`` must be followed by a guard

	359 instruction. The most common alterations, in practice, are addition and

	360 subtraction of arbitrary integers:

	361

	362 .. naclcode::

	363

	364 add sp, rX

	365 bic sp, #0xC0000000

	366

	367 The ``bic`` is similar to the one we used for conditional load and

	368 store, and serves exactly the same purpose: after it completes, ``sp``

	369 is a valid address.

	370

	371 .. Note::

	372 :class: note

	373

	374 Clever assembly programmers and compilers may want to use this

	375 "trusted" property of ``sp`` to emit more efficient code: in a hot

	376 loop instead of using ``sp`` as a stack pointer it can be temporarily

	377 used as an index pointer (e.g. to traverse an array). This avoids the

	378 extra ``bic`` whenever the pointer is updated in the loop.

	379

	380 Thread Pointer Loads

	381 """"""""""""""""""""

	382

	383 The thread pointer and IRT thread pointer are stored in the trusted

	384 address space. All uses and definitions of ``r9`` from untrusted code

	385 are forbidden except as follows:

	386

	387 .. naclcode::

	388

	389 ldr Rn, [r9] ; load use thread pointer.
	binji 2014/02/06 21:23:39 load use?
	JF 2014/02/06 23:26:36 "user", already fixed.
	390 ldr Rn, [r9, #4] ; load IRT thread pointer.

	391

	392 ``pc``-relative Loads

	393 """""""""""""""""""""

	394

	395 By extension, we also allow load through the ``pc`` without a

	396 mask. The explanation is quite similar:

	397

	398 * Our control-flow isolation rules mean that the ``pc`` will always

	399 point into the sandbox.

	400 * The maximum immediate displacement that can be used in a

	401 ``pc``-relative load is smaller than the width of the guard pages.

	402

	403 We do not allow ``pc``-relative stores, because they look suspiciously

	404 like self-modifying code, or any addressing mode that would alter the

	405 ``pc`` as a side effect of the load.

	406

	407 Indirect Branches

	408 ^^^^^^^^^^^^^^^^^

	409

	410 There are two types of control flow on ARM: direct and indirect. Direct

	411 control flow instructions have an embedded target address or

	412 offset. Indirect control flow instructions take their destination

	413 address from a register. The ``b`` (branch) and ``bl``
	binji 2014/02/06 21:23:39 (branch)? What is the (* ) syntax doing anyway?
	JF 2014/02/06 23:26:36 The parens are just part of the text, no ReST meaning. The stars put emphasis, I'm using it consistently for load/store/branch/call/return and such in this document when I talk about a conceptual CPU instruction (so, not an actual assembly instruction, but a concept that corresponds to CPU instruction(s)).
	414 (branch-with-link) instructions are direct branch and call,

	415 respectively. The ``bx`` (branch-exchange) and ``blx``

	416 (branch-with-link-exchange) are the indirect equivalents.

	417

	418 Because the program counter ``pc`` is simply another register, ARM also

	419 has many implicit indirect control flow instructions. Programs can

	420 operate on the ``pc`` using add or load, or even outlandish (and

	421 often specified as having unpredictable-behavior) things like multiply!

	422 In Native Client we ban all such instructions. Indirect control flow is

	423 exclusively through ``bx`` and ``blx``. Because all of ARM's control

	424 flow instructions are called branch instructions, we'll use the term

	425 indirect branch from here on, even though this includes things like

	426 virtual call, return, and the like.

	427

	428 The Trouble with Indirection

	429 """"""""""""""""""""""""""""

	430

	431 Indirect branches present two problems for Native Client:

	432

	433 * We must ensure that they don't send execution outside the sandbox.

	434 * We must ensure that they don't break up the instructions inside a

	435 pseudo-instruction, by landing on the second one.

	436

	437 .. Note::

	438 :class: note

	439

	440 On the x86 architectures we must also ensure that they don't land

	441 inside an instruction. This is unnecessary on ARM, where all

	442 instructions are 32-bit wide.

	443

	444 Checking both of these for direct branch is easy: the validator just

	445 pulls the (fixed) target address out of the instruction and checks what

	446 it points to.

	447

	448 The Native Client Solution: "Bundles"

	449 """""""""""""""""""""""""""""""""""""

	450

	451 For indirect branch, we can address the first problem by simply

	452 masking some high-order bits off the address, like we did for load and

	453 store. The second problem is more subtle. Detecting every possible

	454 route that every indirect branch might take is difficult. Instead, we

	455 take the approach pioneered by the original Native Client: we restrict

	456 the possible places that any indirect branch can land. On Native

	457 Client for ARM, indirect branch can target any address that has its

	458 bottom four bits clear---any address that's ``0 mod 16``. We call these

	459 16-byte chunks of code "bundles". The validator makes sure that no

	460 pseudo-instruction straddles a bundle boundary. Compilers must pad with

	461 ``nop``\ s to ensure that every pseudo-instruction fits entirely inside

	462 one bundle.
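The bundle rule can be expressed as a small check. This is an illustrative sketch, not the validator's actual code: it assumes 4-byte-aligned addresses and pseudo-instructions of at most four instructions, and the function names are hypothetical.

```python
BUNDLE_SIZE = 16        # bytes per bundle
INSTRUCTION_SIZE = 4    # every A32 instruction is 4 bytes

def straddles_bundle(address, num_instructions):
    """True if a pseudo-instruction starting at `address` crosses a bundle boundary."""
    first_bundle = address // BUNDLE_SIZE
    last_byte = address + num_instructions * INSTRUCTION_SIZE - 1
    return first_bundle != last_byte // BUNDLE_SIZE

def nops_needed(address, num_instructions):
    """Number of nops to emit before the pseudo-instruction so it fits in one bundle."""
    if not straddles_bundle(address, num_instructions):
        return 0
    return (BUNDLE_SIZE - address % BUNDLE_SIZE) // INSTRUCTION_SIZE
```

For example, a two-instruction pseudo-instruction starting in the last slot of a bundle needs one ``nop`` in front of it so that both instructions land at the start of the next bundle.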

	463

	464 Here is the indirect branch pseudo-instruction. As you can see, it

	465 clears the top two and bottom four bits of the address:

	466

	467 .. naclcode::

	468

	469 bic rA, #0xC000000F

	470 bx rA

	471

	472 This particular pseudo-instruction (a ``bic`` followed by a ``bx``) is

	473 used for computed jumps in switch tables and returning from functions,

	474 among other uses. Recall that, under ARM's modified immediate rules, we

	475 can fit the constant ``0xC000000F`` into the ``bic`` instruction's

	476 immediate field: ``0xC000000F`` is the 8-bit constant ``0xFC``, rotated

	477 right by 4 bits.
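A short model of the branch guard shows both properties at once. The function name is hypothetical and the sketch covers only the masking step, not the branch.

```python
BRANCH_MASK = 0xC000000F    # bits cleared by `bic rA, #0xC000000F`
BUNDLE_SIZE = 16

def sandboxed_branch_target(rA):
    """Address that `bx rA` jumps to after the guard has run."""
    return rA & ~BRANCH_MASK & 0xFFFFFFFF
```

Whatever value the register held, the result is below ``0x40000000`` and is a multiple of 16, so the branch lands on the first instruction of some bundle inside the sandbox. Bit 0 of the target is always clear as a consequence.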

	478

	479 The other useful variant is the indirect branch-with-link, which is

	480 the ARM equivalent to call:

	481

	482 .. naclcode::

	483

	484 bic rA, #0xC000000F

	485 blx rA

	486

	487 This is used for indirect function calls---commonly seen in C++ programs

	488 as virtual calls, but also for calling function pointers in C.

	489

	490 Note that both indirect branch pseudo-instructions use ``bic``, rather

	491 than the ``tst`` instruction we allow for load and store. There are

	492 two reasons for this:

	493

	494 1. Conditional branch is very common. Much more common than

	495 conditional load and store. If we supported an alternative

	496 ``tst``-based sequence for branch, it would be rare.

	497 2. There's no performance benefit to using ``tst`` here on modern ARM

	498 chips. Branch consumes its operands later in the pipeline than

	499 load and store (since they don't have to generate an address,

	500 etc) so this sequence doesn't stall.

	501

	502 .. Note::

	503 :class: note

	504

	505 At this point astute readers are wondering what the ``x`` in ``bx``

	506 and ``blx`` means. We told you it stood for "exchange", but exchange

	507 to what? ARM, for all the reduced-ness of its instruction set, can

	508 change execution mode from A32 (ARM) to T32 (Thumb) and back with

	509 these branch instructions, called interworking branch. Recall that

	510 A32 instructions are 32-bit wide, and T32 instructions are a mix of

	511 16-bit and 32-bit encodings. The destination address given to a

	512 branch therefore cannot sensibly have its bottom bit set in either

	513 instruction set: that would be an unaligned instruction in both cases,

	514 and ARM simply doesn't support this. The bottom bit for the *indirect

	515 branch* was therefore cleverly recycled by the ARM architecture to

	516 mean "switch to T32 mode" when set!

	517

	518 As you've figured out by now, Native Client's sandbox won't be very

	519 happy if A32 instructions were to be executed as T32 instructions: who

	520 knows what they correspond to? A malicious person could craft valid

	521 A32 code that's actually very naughty T32 code, somewhat like forming

	522 a sentence that happens to be valid in English and French but with

	523 completely different meanings, complimenting the reader in one

	524 language and insulting them in the other.
	binji 2014/02/06 21:23:39 I think I need an example of this sentence. :)
	JF 2014/02/06 23:26:36 I know right!!! I actually have a G+ post about this, you should read it ;-) https://plus.google.com/+JFBastien/posts/eeRQWfUTEK2 Actually screw it I'll copy/paste it here for legacy, I'd like to have a better explanation of this but couldn't figure one out:
	Lazy +: is there a word to describe "a sentence that has one meaning in a language, and another meaning in a different language"? It doesn't necessarily have to be written, it could be phonetic. For programming languages this is apparently called a polyglot (since the program "speaks" multiple languages), but I'm asking specifically for natural languages. The only example I can remember at the moment won't make sense unless you're French Canadian and know Latin: "alacris apis hostis" would usually translate to "cheerful bee enemy", but phonetically it sounds like a fairly colorful French Canadian swear.
	For context, I'm trying to explain mode-switching attacks on ARM where innocuous-looking ARM instructions are actually malicious Thumb instructions. This isn't unlike hiding instructions within constants or within sub-instructions in x86. This matters because ARM has the wonderful BX and BLX indirect branch instructions which, when the bottom bit of the destination register is set, switch from ARM to Thumb mode. FWIW the ARM NaCl sandbox defends against this in a few ways. I've never heard of people doing such attacks because usually arbitrary code execution is its own ROPy means, but in the NaCl context an attacker may have extra very theoretical hoops to jump through, and may want to switch to Thumb to execute carefully-crafted malicious code spray that otherwise passed ARM validation.
	525

	526 You've figured out by now that the bundle alignment restrictions of

	527 the Native Client sandbox already take care of making this travesty

	528 impossible: by masking off the bottom 4 bits of the destination the

	529 interworking nature of ARM's indirect branch is completely avoided.

	530

	531 Call and Return

	532 """""""""""""""""""

	533

	534 On ARM, there is no call or return instruction. A call is simply a

	535 branch that just happens to load a return address into ``lr``, the link

	536 register. If the called function is a leaf (that is, if it calls no

	537 other functions before returning), it simply branches to the address

	538 stored in ``lr``:
	binji 2014/02/06 21:23:39 This is describing returning, correct? The way I read this it is not clear.
	JF 2014/02/06 23:26:36 Yes, I clarified.
	539

	540 .. naclcode::

	541

	542 bic lr, #0xC000000F

	543 bx lr

	544

	545 If the function called other functions, however, it had to spill ``lr``

	546 onto the stack. On x86, this is done implicitly, but it is explicit on

	547 ARM:

	548

	549 .. naclcode::

	550

	551 push { lr }

	552 ; ... some code here ...

	553 pop { lr }

	554 bic lr, #0xC000000F

	555 bx lr

	556

	557 There are two things to note about this code.

	558

	559 1. As we mentioned before, we don't allow arbitrary instructions to

	560 write to the Program Counter, ``pc``. Thus, while a traditional ARM

	561 program might have popped directly into ``pc`` to end the function,

	562 we require a pop into a register, followed by a pseudo-instruction.

	563 2. Function returns really are just indirect branches, with the same

	564 restrictions. This means that functions can only return to addresses

	565 that are bundle-aligned: ``0 mod 16``.

	566

	567 The implication here is that a call\ ---the branch that enters
	binji 2014/02/06 21:23:39 s/enter/enters/?
	JF 2014/02/06 23:26:36 Done.
	568 functions---must be placed at the end of the bundle, so that the return

	569 address it generates is ``0 mod 16``. Otherwise, when we clear the

	570 bottom four bits, the program would enter an infinite loop! (Native

	571 Client doesn't try to prevent infinite loops, but the validator actually

	572 does check the alignment of calls. This is because, when we were writing

	573 the compiler, it was annoying to find out our calls were in the wrong

	574 place by having the program run forever!)
	binji 2014/02/06 21:23:39 "!)." looks weird to me. Isn't it supposed to be just "!)"
	JF 2014/02/06 23:26:36 Done.
	575
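
To see why a misplaced call loops forever, it helps to work through the addresses. This sketch assumes 4-byte instructions, 16-byte bundles and the ``0xC000000F`` return mask shown above; the addresses are hypothetical and the helper names are ours.

```python
BUNDLE_SIZE = 16
INSN_SIZE = 4
BRANCH_MASK = 0xC000000F


def return_address(call_addr):
    """A branch-with-link stores the address of the next instruction in lr."""
    return call_addr + INSN_SIZE


def masked_return(call_addr):
    """Where the callee's ``bic lr`` / ``bx lr`` sequence actually lands."""
    return return_address(call_addr) & ~BRANCH_MASK & 0xFFFFFFFF


def call_is_well_placed(call_addr):
    """True when the call sits in the last slot of its bundle."""
    return return_address(call_addr) % BUNDLE_SIZE == 0


# Hypothetical bundle at 0x20000: slots at +0, +4, +8 and +12.
for call_addr in (0x2000C, 0x20004):
    print(hex(call_addr), call_is_well_placed(call_addr),
          hex(masked_return(call_addr)))
```

The call at ``0x2000C`` returns to ``0x20010``, the start of the next bundle. The call at ``0x20004`` returns to ``0x20000``, the start of its own bundle, so the call is executed again and the program never makes progress.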

	576 .. Note::

	577 :class: note

	578

	579 Properly balancing the CPU's call/return actually allows it to

	580 perform much better by allowing it to speculatively execute the return

	581 address' code. For more information on ARM's call/return stack see
	binji 2014/02/06 21:23:39 I've usually seen address's, but I defer to Andy on this. :)
	JF 2014/02/06 23:26:36 Leaving as-is for now.
	582 ARM's technical reference manual.

	583

	584 Literal Pools and Data Bundles

	585 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	586

	587 In the section where we described the ARM architecture, we mentioned

	588 ARM's unusual immediate forms. To restate:

	589

	590 * ARM instructions are fixed-length (32 bits), so we can't have an

	591 instruction that includes an arbitrary 32-bit constant.

	592 * Many ARM instructions can include a modified immediate constant, which

	593 is flexible, but limited.

	594 * For any other value (particularly addresses), ARM programs explicitly

	595 load constants from inside the code itself.

	596

	597 .. Note::

	598 :class: note

	599

	600 ARMv7 introduces some instructions, ``movw`` and ``movt``, that try to

	601 address this by letting us directly load larger constants. Our

	602 toolchain uses this capability in some cases.

	603

	604 Here's a typical example of the use of a literal pool. ARM assemblers

	605 typically hide the details---this is the sort of code you'd see produced

	606 by a disassembler, but with more comments.

	607

	608 .. naclcode::

	609

	610 ; C equivalent: "table[3] = 4"

	611 ; 'table' is a static array of bytes.

	612 ldr r0, [pc, #124] ; load the address of the 'table'

	613 ; ("124" is the offset from here to the constant below)

	614 add r0, #3 ; add the immediate array index

	615 mov r1, #4 ; get the constant '4' into a register

	616 bic r0, #0xC0000000 ; mask our array address

	617 strb r1, [r0] ; store one byte

	618

	619 ...

	620

	621 .word table ; constant referenced above

	622

	623 Because table is a static array, the compiler knew its address at

	624 compile-time---but the address didn't fit in a modified immediate. (Most

	625 don't.) So, instead of loading an immediate into ``r0`` with a ``mov``,

	626 we stashed the address in the code, generated its address using ``pc``,

	627 and loaded the constant. ARM compilers will typically group all the

	628 embedded data together into a literal pool. These typically live just

	629 past the end of functions, where they won't be executed.
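
The two facts behind this example can be sketched numerically. We assume here that a modified immediate is an 8-bit value rotated right by an even amount, and that reading ``pc`` in ARM state yields the address of the current instruction plus 8. The load address ``0x20000`` is invented for illustration.

```python
ARM_PC_READ_OFFSET = 8  # in ARM state, pc reads as instruction address + 8


def rotl32(value, amount):
    """Rotate a 32-bit value left by ``amount`` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def fits_modified_immediate(value):
    """True if ``value`` is an 8-bit constant rotated right by an even amount."""
    return any(rotl32(value, rot) < 256 for rot in range(0, 32, 2))


def literal_address(insn_addr, offset):
    """Address read by ``ldr rX, [pc, #offset]`` located at ``insn_addr``."""
    return insn_addr + ARM_PC_READ_OFFSET + offset


# Small constants and the sandbox masks fit in an immediate...
print(fits_modified_immediate(4), fits_modified_immediate(0xC0000000))
# ...but a typical address does not, so it must come from a literal pool.
pool_slot = literal_address(0x20000, 124)
print(hex(pool_slot), fits_modified_immediate(pool_slot))
```

With the ``ldr`` at ``0x20000``, the constant lives at ``0x20084``, and that address is itself an example of a value that has no modified-immediate encoding.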

	630

	631 This is an important trick in ARM code, so it's important to support it

	632 in Native Client... but there's a potential flaw. If we let programs

	633 contain arbitrary data, mingled in with the code, couldn't they hide

	634 malicious instructions this way?

	635

	636 The answer is no, because the validator disassembles the entire

	637 executable region of the program, without regard to whether the

	638 programmer said a certain chunk was code or data. But this brings the

	639 opposite problem: what if the program needs to contain a certain

	640 constant that just happens to encode a malicious instruction? We want

	641 to allow this, but we have to be certain it will never be executed as

	642 code!

	643

	644 Data Bundles to the Rescue

	645 """"""""""""""""""""""""""

	646

	647 As we discussed in the last section, ARM code in Native Client is

	648 structured in 16-byte bundles. We allow literal pools by putting them in

	649 special bundles, called data bundles. Each data bundle can contain 12

	650 bytes of arbitrary data, and the program can have as many data bundles

	651 as it likes.

	652

	653 Each data bundle starts with a breakpoint instruction, ``bkpt``. This

	654 way, if an indirect branch tries to enter the data bundle, the process

	655 will take a fault and the trusted runtime will intervene (by terminating

	656 the program). For example:

	657

	658 .. naclcode::

	659

	660 bkpt #0x5BE0 ; must be aligned 0 mod 16!

	661 .word 0xDEADBEEF ; arbitrary constants are A-ok

	662 svc #30 ; trying to make a syscall? ok!

	663 str r0, [r1] ; unmasked stores are fine too

	664

	665 So, we have a way for programs to create an arbitrary, even dangerous,

	666 chunk of data within their code. We can prevent indirect branches from

	667 entering it. We can also prevent fall-through from the code just before

	668 it, by the ``bkpt``. But what about a direct branch straight into the

	669 middle?

	670

	671 The validator detects all data bundles (because this ``bkpt`` has a

	672 special encoding) and marks them as off-limits for direct branches. If

	673 it finds a direct branch into a data bundle, the entire program is

	674 rejected as unsafe. Because direct branches cannot be modified at

	675 runtime, the data bundles cannot be executed.
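
The data bundle check can be illustrated with a toy validator pass. This is a sketch under simplifying assumptions, not the logic of the real validator: it only understands A32 ``b``/``bl``, it treats every word of a data bundle (including the ``bkpt``) as off-limits, it skips the contents of data bundles entirely, and all names and addresses are hypothetical.

```python
import struct

BUNDLE_SIZE = 16
DATA_BUNDLE_MARKER = 0xE125BE70  # A32 encoding of ``bkpt #0x5BE0``


def unpack_words(code):
    """Split little-endian machine code into 32-bit words."""
    return struct.unpack("<%dI" % (len(code) // 4), code)


def data_bundles(code, base):
    """Addresses of bundles whose first word is the roadblock ``bkpt``."""
    ws = unpack_words(code)
    return {base + 4 * i for i in range(0, len(ws), 4)
            if ws[i] == DATA_BUNDLE_MARKER}


def direct_branch_target(word, addr):
    """Target of an A32 ``b``/``bl`` at ``addr``, or None for other words."""
    if (word >> 28) == 0xF or ((word >> 25) & 0b111) != 0b101:
        return None
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit offset
        imm24 -= 1 << 24
    return (addr + 8 + (imm24 << 2)) & 0xFFFFFFFF


def validate(code, base):
    """Reject code containing a direct branch into any data bundle."""
    data = data_bundles(code, base)
    for i, word in enumerate(unpack_words(code)):
        addr = base + 4 * i
        if (addr & -BUNDLE_SIZE) in data:
            continue              # data bundle contents are not code
        target = direct_branch_target(word, addr)
        if target is not None and (target & -BUNDLE_SIZE) in data:
            return False
    return True


NOP = 0xE1A00000                  # mov r0, r0
DATA = [DATA_BUNDLE_MARKER, 0xDEADBEEF, 0xEF00001E, 0xE5810000]
BASE = 0x20000                    # hypothetical load address

# ``b 0x20014`` jumps into the middle of the data bundle at 0x20010.
bad = struct.pack("<8I", 0xEA000003, NOP, NOP, NOP, *DATA)
# ``b 0x20020`` jumps over the data bundle instead.
good = struct.pack("<12I", 0xEA000006, NOP, NOP, NOP, *DATA,
                   NOP, NOP, NOP, NOP)
print(validate(bad, BASE), validate(good, BASE))
```

The first program is rejected and the second is accepted, even though both contain the same "dangerous" words inside the data bundle.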

	676

	677 .. Note::

	678 :class: note

	679

	680 Clever readers may wonder: why use ``bkpt #0x5BE0``? That seems

	681 awfully specific when you just need a special "roadblock" instruction!

	682 Quite true, young Padawan! It happens that this odd ``bkpt``

	683 instruction is encoded as ``0xE125BE70`` in A32, and in T32 the

	684 ``bkpt`` instruction is encoded as ``0xBExx`` (where ``xx`` could be

	685 any 8-bit immediate, say ``0x70``) and ``0xE125`` encodes the branch

	686 instruction ``b.n #0x250``. The special roadblock instruction

	687 therefore doubles as a roadblock in T32, if anything were to go so

	688 awry that we tried to execute it as a T32 instruction! Much defense,

	689 such depth, wow!
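
The encodings in this note can be double-checked with a few lines of bit twiddling. The sketch assumes a little-endian instruction stream, so a Thumb fetch sees the low halfword of the A32 word first; the helper names are ours.

```python
import struct

A32_ROADBLOCK = 0xE125BE70


def is_a32_bkpt(word):
    """A32 ``bkpt`` with condition AL matches 0xE12xxx7x."""
    return (word & 0xFFF000F0) == 0xE1200070


def a32_bkpt_imm16(word):
    """The 16-bit immediate is split across bits 19..8 and bits 3..0."""
    return (((word >> 8) & 0xFFF) << 4) | (word & 0xF)


def t32_halfwords(word):
    """The two halfwords in the order a little-endian Thumb fetch sees them."""
    return struct.unpack("<2H", struct.pack("<I", word))


def is_t32_bkpt(halfword):
    """T32 ``bkpt`` is 0xBExx, with any 8-bit immediate."""
    return (halfword >> 8) == 0xBE


def is_t32_branch(halfword):
    """16-bit unconditional ``b`` has 0b11100 in its top five bits."""
    return (halfword >> 11) == 0b11100


first, second = t32_halfwords(A32_ROADBLOCK)
print(hex(a32_bkpt_imm16(A32_ROADBLOCK)), hex(first), hex(second))
print(is_t32_bkpt(first), is_t32_branch(second))
```

Read as T32, the word starts with ``0xBE70``, a ``bkpt``, so execution stops before the trailing ``0xE125`` branch is ever reached.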

	690

	691 Trampolines and Memory Layout

	692 -----------------------------

	693

	694 So far, the rules we've described make for boring programs: they can't

	695 communicate with the outside world!

	696

	697 * The program can't call an external library, or the operating system,

	698 even to do something simple like draw some pixels on the screen.

	699 * It also can't read or write memory outside of its dedicated sandbox,

	700 so communicating that way is right out.

	701

	702 We fix this by allowing the untrusted program to call into the trusted

	703 runtime using a trampoline. A trampoline is simply a short stretch of

	704 code, placed by the trusted runtime at a known location within the

	705 sandbox, that is permitted to do things the untrusted program can't.

	706

	707 Even though trampolines are inside the sandbox, the untrusted program

	708 can't modify them: the trusted runtime marks them read-only. It also

	709 can't do anything clever with the special instructions inside the

	710 trampoline---for example, call it at a slightly offset address to bypass

	711 some checks---because the validator only allows trampolines to be

	712 reached by an indirect branch (or branch-with-link). We structure the

	713 trampolines carefully so that they're safe to enter at any ``0 mod 16``

	714 address.

	715

	716 The validator can detect attempts to use the trampolines because they're

	717 loaded at a fixed location in memory. Let's look at the memory map of

	718 the Native Client sandbox.

	719

	720 Memory Map

	721 ^^^^^^^^^^

	722

	723 The ARM sandbox is always at virtual address ``0``, and is exactly 1GiB

	724 in size. This includes the untrusted program's code and data, the

	725 trampolines, and a small guard region to detect null pointer

	726 dereferences. In practice, the untrusted program takes up a bit more

	727 room than this, because of the need for additional guard regions at

	728 either end of the sandbox.

	729

	730 +----------------+-------+-------------------+--------------------------------------------------------------------+

	731 | Address        | Size  | Name              | Purpose                                                            |

	732 +================+=======+===================+====================================================================+

	733 | ``-0x2000``    | 8KiB  | Bottom Guard      | Keeps negative-displacement load or store from escaping.           |

	734 +----------------+-------+-------------------+--------------------------------------------------------------------+

	735 | ``0``          | 64KiB | Null Guard        | Catches null pointer dereferences, guards against kernel exploits. |

	736 +----------------+-------+-------------------+--------------------------------------------------------------------+

	737 | ``0x10000``    | 64KiB | Trampolines       | Up to 2048 unique syscall entry points.                            |

	738 +----------------+-------+-------------------+--------------------------------------------------------------------+

	739 | ``0x20000``    | ~1GiB | Untrusted Sandbox | Contains untrusted code, followed by its heap/stack/memory.        |

	740 +----------------+-------+-------------------+--------------------------------------------------------------------+

	741 | ``0x40000000`` | 8KiB  | Top Guard         | Keeps positive-displacement load or store from escaping.           |

	742 +----------------+-------+-------------------+--------------------------------------------------------------------+

	743

	744 Within the trampolines, the untrusted program can call any address

	745 that's ``0 mod 16``. However, only even slots are used, so useful

	746 trampolines are always ``0 mod 32``. If the program calls an odd slot,

	747 it will fault, and the trusted runtime will shut it down.

	748

	749 .. Note::

	750 :class: note

	751

	752 This is a bit of speculative flexibility. While the current bundle

	753 size of Native Client on ARM is 16 bytes, we've considered the

	754 possibility of optional 32-byte bundles, to enable certain compiler

	755 improvements. While this option isn't available to untrusted programs

	756 today, we're trying to keep the system "32-byte clean".
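
The slot arithmetic follows directly from the memory map: a 64KiB region divided into 32-byte pairs of bundles gives the 2048 entry points listed in the table. Here is a small sketch of that layout. The idea of numbering entry points with an ``index`` is our own assumption for illustration, as are the function names.

```python
TRAMPOLINE_BASE = 0x10000
TRAMPOLINE_REGION_SIZE = 64 * 1024
BUNDLE_SIZE = 16
SLOT_STRIDE = 2 * BUNDLE_SIZE     # one even slot plus one odd slot
MAX_TRAMPOLINES = TRAMPOLINE_REGION_SIZE // SLOT_STRIDE


def trampoline_address(index):
    """Address of the even slot for a hypothetical entry point number."""
    if not 0 <= index < MAX_TRAMPOLINES:
        raise ValueError("no such trampoline: %d" % index)
    return TRAMPOLINE_BASE + index * SLOT_STRIDE


def classify(addr):
    """Describe what happens when untrusted code calls ``addr``."""
    end = TRAMPOLINE_BASE + TRAMPOLINE_REGION_SIZE
    if not TRAMPOLINE_BASE <= addr < end:
        return "not a trampoline"
    if addr % BUNDLE_SIZE:
        return "unreachable: indirect branches are bundle-aligned"
    if addr % SLOT_STRIDE:
        return "odd slot: faults"
    return "even slot: usable entry point"


print(MAX_TRAMPOLINES, hex(trampoline_address(MAX_TRAMPOLINES - 1)))
print(classify(0x10020))
print(classify(0x10030))
```

The last usable entry point sits at ``0x1FFE0``, one 32-byte stride below the start of the untrusted code at ``0x20000``.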

	757

	758 Inside a Trampoline

	759 ^^^^^^^^^^^^^^^^^^^

	760

	761 When we introduced trampolines, we mentioned that they can do things

	762 that untrusted programs can't. To be more specific, trampolines can jump

	763 to locations outside the sandbox. On ARM, this is all they do. Here's a

	764 typical trampoline fragment on ARM:

	765

	766 .. naclcode::

	767

	768 ; Even trampoline bundle:

	769 push { r0-r3 } ; Save arguments that may be in registers.

	770 push { lr } ; Save the untrusted return address.

	771 ; (This is a separate step because it must be on top.)

	772 ldr r0, [pc, #4] ; Load the destination address from the next bundle.

	773 blx r0 ; Go!

	774

	775 ; The odd trampoline that immediately follows:

	776 bkpt 0x5be0 ; Prevent entry to this data bundle.

	777 .word address_of_routine

	778

	779 The only odd thing here is that we push the incoming value of ``lr``,

	780 and then use ``blx``---not ``bx``---to escape the sandbox. This is

	781 because, in practice, all trampolines jump to the same routine in the

	782 trusted runtime, called the syscall hook. It uses the return address

	783 produced by the final ``blx`` instruction to determine which trampoline

	784 was called.
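
The address arithmetic in this fragment can be worked through as follows. This is a sketch of one plausible way to recover the trampoline from the return address, not the syscall hook's actual code. It assumes the four-instruction layout shown above, a trampoline region starting at ``0x10000`` with a 32-byte stride, and ``pc`` reading as the instruction address plus 8.

```python
TRAMPOLINE_BASE = 0x10000
TRAMPOLINE_REGION_SIZE = 64 * 1024
BUNDLE_SIZE = 16
SLOT_STRIDE = 2 * BUNDLE_SIZE
INSN_SIZE = 4


def literal_address(bundle_addr):
    """Word read by ``ldr r0, [pc, #4]``, the third instruction in the bundle."""
    ldr_addr = bundle_addr + 2 * INSN_SIZE
    return ldr_addr + 8 + 4       # pc reads as ldr_addr + 8


def lr_after_blx(bundle_addr):
    """``blx`` is the fourth instruction, so lr points just past the bundle."""
    return bundle_addr + 3 * INSN_SIZE + INSN_SIZE


def trampoline_from_lr(lr):
    """Recover a hypothetical trampoline number from the ``blx`` return address."""
    offset = lr - BUNDLE_SIZE - TRAMPOLINE_BASE
    if not 0 <= offset < TRAMPOLINE_REGION_SIZE or offset % SLOT_STRIDE:
        raise ValueError("lr does not come from a trampoline: %#x" % lr)
    return offset // SLOT_STRIDE


bundle = 0x10040                  # hypothetical even trampoline bundle
print(hex(literal_address(bundle)), hex(lr_after_blx(bundle)),
      trampoline_from_lr(lr_after_blx(bundle)))
```

For the bundle at ``0x10040``, the ``ldr`` reads the word at ``0x10054``, right after the ``bkpt`` of the following data bundle, and the ``blx`` leaves ``0x10050`` in ``lr``, which identifies the third trampoline (number 2 when counting from zero).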

	785

	786 Loose Ends

	787 ----------

	788

	789 Forbidden Instructions

	790 ^^^^^^^^^^^^^^^^^^^^^^

	791

	792 To complete the sandbox, the validator ensures that the program does not

	793 try to use certain forbidden instructions.

	794

	795 * We forbid instructions that directly interact with the operating

	796 system by going around the trusted runtime. We prevent this to limit

	797 the functionality of the untrusted program, and to ensure portability

	798 across operating systems.

	799 * We forbid instructions that change the processor's execution mode to

	800 Thumb, ThumbEE, or Jazelle. This would cause the code to be

	801 interpreted differently than the validator's original 32-bit ARM

	802 disassembly, so the validator results might be invalidated.

	803 * We forbid instructions that aren't available to user code (i.e. have

	804 to be used by an operating system kernel). This is purely out of

	805 paranoia, because the hardware should prevent the instructions from

	806 working. Essentially, we consider it "suspicious" if a program

	807 contains these instructions---it might be trying to exploit a hardware

	808 bug.

	809 * We forbid instructions, or variants of instructions, that are

	810 implementation-defined ("unpredictable") or deprecated in the ARMv7-A

	811 architecture manual.

	812 * Finally, we forbid a small number of instructions, such as ``setend``,

	813 purely out of paranoia. It's easier to loosen the validator's

	814 restrictions than to tighten them, so we err on the side of rejecting

	815 safe instructions.

	816

	817 If an instruction can't be decoded at all within the ARMv7-A instruction

	818 set specification, it is forbidden.

	819

	820 .. Note::

	821 :class: note

	822

	823 Here is a list of instructions currently forbidden for security

	824 reasons (that is, excluding deprecated or undefined instructions):

	825

	826 * ``BLX`` (immediate): always changes to Thumb mode.

	827 * ``BXJ``: always changes to Jazelle mode.

	828 * ``CPS``: not available to user code.

	829 * ``LDM``, exception return version: not available to user code.

	830 * ``LDM``, kernel version: not available to user code.

	831 * ``LDR*T`` (unprivileged load operations): theoretically harmless,

	832 but suspicious when found in user code. Use ``LDR`` instead.

	833 * ``MSR``, kernel version: not available to user code.

	834 * ``RFE``: not available to user code.

	835 * ``SETEND``: theoretically harmless, but suspicious when found in

	836 user code. May make some future validator extensions difficult.

	837 * ``SMC``: not available to user code.

	838 * ``SRS``: not available to user code.

	839 * ``STM``, kernel version: not available to user code.

	840 * ``STR*T`` (unprivileged store operations): theoretically harmless,

	841 but suspicious when found in user code. Use ``STR`` instead.

	842 * ``SVC``/``SWI``: allows direct operating system interaction.

	843 * Any unassigned hint instruction: difficult to reason about, so

	844 treated as suspicious.

	845

	846 More details are available in the `ARMv7 instruction table definition

	847 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table>`_.

	848

	849 Coprocessors

	850 ^^^^^^^^^^^^

	851

	852 ARM has traditionally added new instruction set features through

	853 coprocessors. Coprocessors are accessed through a small set of

	854 instructions, and often have their own register files. Floating point

	855 and the NEON vector extensions are both implemented as coprocessors, as

	856 is the MMU.

	857

	858 We're confident that the side-effects of coprocessors in slots 10 and 11

	859 (that is, floating point, NEON, etc.) are well-understood. These are in

	860 the coprocessor space reserved by ARM Ltd. for their own extensions

	861 (``CP8``--\ ``CP15``), and are unlikely to change significantly. So, we

	862 allow untrusted code to use coprocessors 10 and 11, and we mandate the

	863 presence of at least VFPv3 and NEON/AdvancedSIMD. Multiprocessor

	864 Extension, VFPv4, FP16 and other extensions are allowed but not

	865 required, and may fail on processors that do not support them; it is

	866 therefore the program's responsibility to validate their availability

	867 before executing them.

	868

	869 We don't allow access to any other ARM-reserved coprocessor

	870 (``CP8``--\ ``CP9`` or ``CP12``--\ ``CP15``). It's possible that read

	871 access to ``CP15`` might be useful, and we might allow it in the

	872 future---but again, it's easier to loosen the restrictions than tighten

	873 them, so we ban it for now.

	874

	875 We do not, and probably never will, allow access to the vendor-specific

	876 coprocessor space, ``CP0``--\ ``CP7``. We're simply not confident in our

	877 ability to model the operations on these coprocessors, given that

	878 vendors often leave them poorly-specified. Unfortunately this eliminates

	879 some legacy floating point and vector implementations, but these are

	880 superseded on ARMv7-A parts anyway.
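
The coprocessor rules above boil down to a small decision table. The sketch below restates them as a function. The function name and the wording of the results are ours, and the real validator expresses this policy through its instruction tables rather than a lookup like this.

```python
ALLOWED_COPROCESSORS = frozenset({10, 11})   # floating point and NEON


def coprocessor_policy(cp):
    """Summarize how untrusted access to coprocessor ``cp`` is treated."""
    if not 0 <= cp <= 15:
        raise ValueError("ARM coprocessor numbers run from 0 to 15")
    if cp in ALLOWED_COPROCESSORS:
        return "allowed"
    if cp <= 7:
        return "forbidden: vendor-specific space"
    return "forbidden: ARM-reserved, not yet allowed"


for cp in (0, 9, 10, 11, 15):
    print("CP%d: %s" % (cp, coprocessor_policy(cp)))
```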

	881

	882 Validator Code

	883 ^^^^^^^^^^^^^^

	884

	885 By now you're itching to see the sandbox validator's code and dissect

	886 it. You'll have a disappointing read: at fewer than 500 lines of code,

	887 `validator.cc

	888 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/validator.cc>`_

	889 is quite simple to understand and much shorter than this document. It's

	890 of course dependent on the `ARMv7 instruction table definition

	891 <http://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/validator_arm/armv7.table>`_,

	892 which teaches it about the ARMv7 instruction set.
