OLD | NEW |
(Empty) | |
| 1 .. _x86-64-sandbox: |
| 2 |
| 3 ================================ |
| 4 NaCl SFI model on x86-64 systems |
| 5 ================================ |
| 6 |
| 7 .. contents:: |
| 8 :local: |
| 9 :backlinks: none |
| 10 :depth: 2 |
| 11 |
| 12 Summary |
| 13 ======= |
| 14 |
| 15 This document addresses the details of the Software Fault Isolation |
| 16 (SFI) model for executable code that can be run in Native Client on an |
| 17 x86-64 system. An overview of this model can be found in the paper: |
| 18 `Adapting Software Fault Isolation to Contemporary CPU Architectures |
| 19 <https://research.google.com/pubs/archive/35649.pdf>`_. |
| 20 The primary focus of the SFI model is a Windows x86-64 system but the |
| 21 same techniques can be applied to run identical x86-64 binaries on |
| 22 other x86-64 systems such as Linux, Mac, FreeBSD, etc., so the |
| 23 description of the SFI model tries to abstract away system |
| 24 dependencies when possible. |
| 25 |
| 26 Please note: throughout this document we use the AT&T notation for |
| 27 assembler syntax, in which the target operand appears last, e.g. ``mov |
| 28 src, dst``. |
| 29 |
| 30 Binary Format |
| 31 ============= |
| 32 |
| 33 The format of Native Client executable binaries is identical to the |
| 34 x86-64 ELF binary format (`[0] |
| 35 <http://en.wikipedia.org/wiki/Executable_and_Linkable_Format>`_, `[1] |
| 36 <http://www.sco.com/developers/devspecs/gabi41.pdf>`_, `[2] |
| 37 <http://www.sco.com/developers/gabi/latest/contents.html>`_, `[3] |
| 38 <http://downloads.openwatcom.org/ftp/devel/docs/elf-64-gen.pdf>`_) for |
| 39 Linux or BSD with a few extra requirements. The additional rules that |
| 40 a Native Client ELF binary must follow are: |
| 41 |
| 42 * The ELF magic OS ABI field must be 123. |
| 43 * The ELF magic OS ABI VERSION field must be 5. |
| 44 * The ELF e_flags field must be 0x200000 (32-byte alignment). |
| 45 * There must be exactly one PT_LOAD text segment. It must begin at |
| 46 0x20000 (128 kB) and be marked RX (no W). The contents of the text |
| 47 segment must follow :ref:`Text Segment Rules <x86-64-text-segment-rules>`. |
| 48 * There can be at most one PT_LOAD data segment marked R. |
| 49 * There can be at most one PT_LOAD data segment marked RW. |
| 50 * There can be at most one PT_GNU_STACK segment. It must be marked RW. |
| 51 * All segments must end before the limit address (4 GiB). |
| 52 |
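The header-level rules above can be checked mechanically. The following is a minimal sketch, not the loader's actual code: it assumes the standard little-endian ELF64 header layout (``EI_OSABI`` at byte 7, ``EI_ABIVERSION`` at byte 8, ``e_flags`` at offset 48), and the names ``check_nacl_elf_header`` and ``make_header`` are hypothetical helpers introduced only for illustration. Segment rules are not covered.

```python
import struct

# Values taken from the Binary Format rules above.
NACL_OSABI = 123
NACL_ABIVERSION = 5
NACL_E_FLAGS = 0x200000  # 32-byte alignment

# Byte offsets in a standard ELF64 header.
EI_CLASS, EI_OSABI, EI_ABIVERSION = 4, 7, 8
ELFCLASS64 = 2
E_FLAGS_OFFSET = 48


def check_nacl_elf_header(header):
    """Return a list of rule violations found in a 64-byte ELF64 header."""
    if len(header) < 64 or header[:4] != b"\x7fELF":
        return ["not an ELF file"]
    problems = []
    if header[EI_CLASS] != ELFCLASS64:
        problems.append("not a 64-bit ELF")
    if header[EI_OSABI] != NACL_OSABI:
        problems.append("OS ABI field is not 123")
    if header[EI_ABIVERSION] != NACL_ABIVERSION:
        problems.append("OS ABI VERSION field is not 5")
    (e_flags,) = struct.unpack_from("<I", header, E_FLAGS_OFFSET)
    if e_flags != NACL_E_FLAGS:
        problems.append("e_flags is not 0x200000")
    return problems


def make_header(osabi=123, abiversion=5, e_flags=0x200000, e_entry=0x20000):
    """Build a synthetic 64-byte header (hypothetical, for demonstration only)."""
    ident = b"\x7fELF" + bytes([ELFCLASS64, 1, 1, osabi, abiversion]) + bytes(7)
    rest = struct.pack("<HHIQQQIHHHHHH",
                       2, 62, 1, e_entry, 0, 0, e_flags, 64, 0, 0, 0, 0, 0)
    return ident + rest
```

Calling ``check_nacl_elf_header(make_header())`` returns an empty list, while a header built with ``osabi=0`` is reported as violating the OS ABI rule.
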
| 53 Runtime Invariants |
| 54 ================== |
| 55 |
| 56 To ensure fault isolation at runtime, the system must maintain a |
| 57 number of runtime *invariants* across the lifetime of the running |
| 58 program. Both the *Validator* and the *Service Runtime* are |
| 59 responsible for maintaining the invariants. See the paper for the |
| 60 rationale for the invariants: |
| 61 |
| 62 * ``RIP`` always points to a valid instruction boundary (the validator must |
| 63 ensure this with direct jumps and direct calls). |
| 64 * ``R15`` (aka ``RBASE`` and ``RZP``) is never modified by code (the |
| 65 validator must ensure this). The low 32 bits of ``RZP`` are all zero |
| 66 (the loader must ensure this). |
| 67 * ``RIP``, ``RBP`` and ``RSP`` are always in the **safe zone**: between |
| 68 ``R15`` and ``R15+4GiB``. |
| 69 |
| 70 * Exception: ``RSP`` and ``RBP`` are allowed to be in the range of |
| 71 ``0..4GiB`` inside *pseudo-instructions*: ``naclrestbp``, |
| 72 ``naclrestsp``, ``naclspadj``, ``naclasp``, ``naclssp``. |
| 73 |
| 74 * 84GiB are allocated for the NaCl module (i.e. the **untrusted region**): |
| 75 |
| 76 * ``R15-40GiB..R15`` and ``R15+4GiB..R15+44GiB`` are buffer zones with |
| 77 PROT_NONE flags. |
| 78 * The 4GiB *safe zone* has pages with either PROT_WRITE or PROT_EXEC |
| 79 but must not have PROT_WRITE+PROT_EXEC pages. |
| 80 * All executable code in PROT_EXEC pages is validatable and |
| 81 guaranteed to obey the invariants. |
| 82 |
| 83 * Trampoline/springboard code is mapped to a non-writable region in |
| 84 the *untrusted 84GiB region*; each trampoline/springboard is 32-byte |
| 85 aligned and fits within a single *bundle* (a 32-byte-aligned, 32-byte block of code). |
| 86 * The OS must not put any internal structures/code into the untrusted |
| 87 region at any time (for example, the OS dynamic linker must not be used). |
| 88 |
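The address arithmetic behind these invariants can be sketched as follows. This is an illustrative model only: the function names are hypothetical, the sample base address is an arbitrary assumption, and the safe zone is treated as the half-open range ``[R15, R15+4GiB)``.

```python
GIB = 1 << 30
SAFE_ZONE_SIZE = 4 * GIB   # the safe zone described above
GUARD_SIZE = 40 * GIB      # each PROT_NONE buffer zone


def untrusted_layout(r15):
    """Return the (start, end) ranges of the untrusted region, end exclusive."""
    if r15 & 0xFFFFFFFF:
        raise ValueError("low 32 bits of R15 must be zero")
    safe_end = r15 + SAFE_ZONE_SIZE
    return {
        "lower_guard": (r15 - GUARD_SIZE, r15),
        "safe_zone": (r15, safe_end),
        "upper_guard": (safe_end, safe_end + GUARD_SIZE),
    }


def in_safe_zone(addr, r15):
    """True if addr (e.g. a value of RIP, RBP or RSP) lies in the safe zone."""
    return r15 <= addr < r15 + SAFE_ZONE_SIZE


# Hypothetical base address chosen only for the example.
layout = untrusted_layout(0x7F00 << 32)
total = layout["upper_guard"][1] - layout["lower_guard"][0]
```

With this layout, ``total`` equals ``84 * GIB``, matching the 40GiB + 4GiB + 40GiB split listed above.
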
| 89 .. _x86-64-text-segment-rules: |
| 90 |
| 91 Text Segment Rules |
| 92 ================== |
| 93 |
| 94 * The validation process must ensure that the text segment complies |
| 95 with the following rules. The validation process must complete |
| 96 successfully strictly before executing any instruction of the |
| 97 untrusted code. |
| 98 * The following instructions are illegal and must be rejected by the |
| 99 validator (the list is not exhaustive as the validator uses a |
| 100 whitelist, not a blacklist; this means there is a large but finite |
| 101 list of instructions the validator allows, not a small list of |
| 102 instructions the validator rejects): |
| 103 |
| 104 * any privileged instructions |
| 105 * ``mov`` to/from segment registers |
| 106 * ``int`` |
| 107 * ``pusha``/``popa`` (not dangerous but not needed for GCC) |
| 108 |
| 109 * There must be space for at least 32 bytes after the text segment and |
| 110 before the next segment in ELF (towards higher addresses) that ends |
| 111 strictly at a 64K boundary (a minimum page size for untrusted |
| 112 code). This space will be padded with HLT instructions as part of |
| 113 the validation process, along with the optional 64K page. |
| 114 * Neither instructions nor *pseudo-instructions* are permitted to span |
| 115 a 32-byte boundary. |
| 116 * The ELF entry address must be 32-byte aligned. |
| 117 * Direct ``CALL``/``JMP`` targets: |
| 118 |
| 119 * must point to a valid instruction boundary |
| 120 * must not point into a *pseudo-instruction* |
| 121 * must not point between a *restricted register* (see below for |
| 122 definition) producer instruction and its corresponding restricted |
| 123 register consumer instruction. |
| 124 |
| 125 * ``CALL`` instructions must end exactly at a 32-byte boundary (5 bytes |
| 126 before it for a direct ``CALL``), so that the return address will be 32-byte aligned. |
| 127 * Indirect call targets must be 32-byte aligned. Instead of indirect |
| 128 ``CALL``/``JMP`` x, use ``nacljmp`` and ``naclcall`` (see below for |
| 129 definitions of these *pseudo-instructions*). |
| 130 * All instructions that **read** or **write** from/to memory must use |
| 131 one of the four registers ``RZP``, ``RIP``, ``RBP`` or ``RSP`` as a |
| 132 base, restricted (see below) register index (multiplied by 0, 1, 2, |
| 133 4 or 8) and constant displacement (optional). |
| 134 |
| 135 * Exception to this rule: string instructions are allowed if used in |
| 136 the following sequences (the sequences should not cross *bundle* |
| 137 boundaries; segment overrides are disallowed): |
| 138 |
| 139 .. naclcode:: |
| 140 :prettyprint: 0 |
| 141 |
| 142 mov %edi, %edi |
| 143 lea (%rZP,%rdi),%rdi |
| 144 [rep] stos ; other string instructions can be used here |
| 145 |
| 146 Note: this is identical to the *pseudo-instruction*: ``[rep] stos |
| 147 %?ax, %nacl:(%rdi),%rZP`` |
| 148 |
| 149 * An operand of an instruction is said to be a **restricted register** iff |
| 150 it is a register that is the target of a 32-bit move in the |
| 151 immediately preceding instruction in the same *bundle* (consider the |
| 152 previous instruction as an additional sandboxing prefix): |
| 153 |
| 154 .. naclcode:: |
| 155 :prettyprint: 0 |
| 156 |
| 157 ; any 32-bit register can be used here; the first operand is |
| 158 ; unrestricted but often is the same register |
| 159 mov ..., %eXX |
| 160 |
| 161 * Instructions capable of changing ``%RBP`` and ``%RSP`` are |
| 162 forbidden, except the instruction sequences in the whitelist below, |
| 163 which must not cross *bundle* boundaries: |
| 164 |
| 165 .. naclcode:: |
| 166 :prettyprint: 0 |
| 167 |
| 168 mov %rbp, %rsp |
| 169 mov %rsp, %rbp |
| 170 mov ..., %ebp |
| 171 ; restoration of %RBP from memory, register or stack - keeps the |
| 172 ; invariant intact |
| 173 add %rZP, %rbp |
| 174 mov ..., %esp |
| 175 ; restoration of %RSP from memory, register or stack - keeps the |
| 176 ; invariant intact |
| 177 add %rZP, %rsp |
| 178 lea xxx(%rbp), %esp |
| 179 add %rZP, %rsp ; restoration of %RSP from %RBP with adjust |
| 180 sub ..., %esp |
| 181 add %rZP, %rsp ; stack space allocation |
| 182 add ..., %esp |
| 183 add %rZP, %rsp ; stack space deallocation |
| 184 and $XX, %rsp ; alignment; XX must be between -128 and -1 |
| 185 pushq ... |
| 186 popq ... ; except pop %RSP, pop %RBP |
| 187 |
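The bundle-related rules above reduce to simple offset arithmetic. The sketch below is not the validator's implementation: the function names are hypothetical, instruction decoding is assumed to have happened already, and the ``interior`` set (offsets that fall inside a pseudo-instruction or between a restricted register producer and its consumer) is assumed to be precomputed.

```python
BUNDLE_SIZE = 32  # bytes, per the rules above


def crosses_bundle(start, length):
    """True if bytes [start, start + length) span a 32-byte boundary."""
    return start // BUNDLE_SIZE != (start + length - 1) // BUNDLE_SIZE


def call_ends_at_bundle(start, length=5):
    """True if a CALL ends at a bundle boundary, so the return address is aligned."""
    return (start + length) % BUNDLE_SIZE == 0


def direct_target_ok(target, boundaries, interior):
    """Check a direct CALL/JMP target against the three target rules.

    boundaries: offsets at which decoded instructions start.
    interior:   offsets inside a pseudo-instruction, or between a restricted
                register producer and its consumer.
    """
    return target in boundaries and target not in interior


# Example: `mov %edi,%edi` at offset 0, the store it sandboxes at offset 2,
# and the next instruction at offset 6 (sizes assumed for illustration).
boundaries = {0, 2, 6}
interior = {2}
```

Here ``direct_target_ok(2, boundaries, interior)`` is false because offset 2 separates the restricted register producer from its consumer, while offsets 0 and 6 are acceptable targets.
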
| 188 List of Pseudo-instructions |
| 189 =========================== |
| 190 |
| 191 Pseudo-instructions were introduced to let the compiler maintain the |
| 192 invariants without needing to know the code alignment rules. The |
| 193 assembler guarantees that no *pseudo-instruction* in the table below |
| 194 crosses a 32-byte *bundle* boundary. In addition to the pseudo-instructions, one |
| 195 pseudo-operand prefix is introduced: ``%nacl``. Presence of the |
| 196 ``%nacl`` operand prefix ensures that: |
| 197 |
| 198 * The instruction ``mov %eXX, %eXX`` is added immediately before the |
| 199 actual instruction using the ``%nacl`` prefix (where ``%eXX`` is the 32-bit |
| 200 part of the index register of the actual instruction; for example, in the |
| 201 operand ``%nacl:(,%r11)``, the notation ``%eXX`` refers to |
| 202 ``%r11d``). |
| 203 * The resulting sequence of two instructions does not cross the |
| 204 *bundle* boundary. |
| 205 |
| 206 For example, the instruction: |
| 207 |
| 208 .. naclcode:: |
| 209 :prettyprint: 0 |
| 210 |
| 211 mov %eax,%nacl:(%r15,%rdi,2) |
| 212 |
| 213 is translated by the assembler to: |
| 214 |
| 215 .. naclcode:: |
| 216 :prettyprint: 0 |
| 217 |
| 218 mov %edi,%edi |
| 219 mov %eax,(%r15,%rdi,2) |
| 220 |
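To see why the inserted 32-bit move is sufficient, the effective address of the translated pair can be modelled numerically. This is a sketch under stated assumptions: ``sandboxed_address`` is a hypothetical helper, and it relies on the x86-64 behaviour that a 32-bit register write zero-extends into the full 64-bit register.

```python
GIB = 1 << 30


def sandboxed_address(r15, index, scale, disp=0):
    """Model `mov %eXX,%eXX` followed by an access to disp(%r15,%rXX,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    restricted = index & 0xFFFFFFFF  # upper 32 bits are cleared by the 32-bit mov
    return r15 + restricted * scale + disp


# Hypothetical base address with the low 32 bits zero.
r15 = 0x7F00 << 32

# Garbage in the upper half of %rdi is discarded before the access.
example = sandboxed_address(r15, 0xDEADBEEF00000010, 2)

# Largest offset reachable with a 32-bit index, scale 8 and a 32-bit displacement.
worst = sandboxed_address(r15, 0xFFFFFFFF, 8, 2**31 - 1) - r15
```

In this model ``example`` is ``r15 + 0x20``, and ``worst`` stays just under 34GiB, so even the largest reachable offset lands either in the safe zone or in the ``PROT_NONE`` buffer zone that ends at ``R15+44GiB``.
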
| 221 The complete list of introduced *pseudo-instructions* is as follows: |
| 222 |
| 223 .. TODO(hamaji): Use rst's table instead of the raw HTML below. |
| 224 |
| 225 .. raw:: html |
| 226 |
| 227 <table border=1> |
| 228 <tbody> |
| 229 <tr> |
| 230 <td>Pseudo-instruction</td> |
| 231 <td>Is translated to<br/> |
| 232 </td> |
| 233 </tr> |
| 234 <tr> |
| 235 <td>[rep] cmps %nacl:(%rsi),%nacl:(%rdi),%rZP<br/> |
| 236 <i>(sandboxed cmps)</i><br/> |
| 237 </td> |
| 238 <td>mov %esi,%esi<br/> |
| 239 lea (%rZP,%rsi,1),%rsi<br/> |
| 240 mov %edi,%edi<br/> |
| 241 lea (%rZP,%rdi,1),%rdi<br/> |
| 242 [rep] cmps (%rsi),(%rdi)<i><br/> |
| 243 </i> |
| 244 </td> |
| 245 </tr> |
| 246 <tr> |
| 247 <td>[rep] movs %nacl:(%rsi),%nacl:(%rdi),%rZP<br/> |
| 248 <i>(sandboxed movs)</i><br/> |
| 249 </td> |
| 250 <td>mov %esi,%esi<br/> |
| 251 lea (%rZP,%rsi,1),%rsi<br/> |
| 252 mov %edi,%edi<br/> |
| 253 lea (%rZP,%rdi,1),%rdi<br/> |
| 254 [rep] movs (%rsi),(%rdi)<i><br/> |
| 255 </i> |
| 256 </td> |
| 257 </tr> |
| 258 <tr> |
| 259 <td>naclasp ...,%rZP<br/> |
| 260 <i>(sandboxed stack increment)</i></td> |
| 261 <td>add ...,%esp<br/> |
| 262 add %rZP,%rsp</td> |
| 263 </tr> |
| 264 <tr> |
| 265 <td>naclcall %eXX,%rZP<br/> |
| 266 <i>(sandboxed indirect call)</i></td> |
| 267 <td>and $-32, %eXX<br/> |
| 268 add %rZP, %rXX<br/> |
| 269 call *%rXX<br/> |
| 270 <i>Note: the assembler ensures all calls (including |
| 271 naclcall) will end at the bundle boundary.</i></td> |
| 272 </tr> |
| 273 <tr> |
| 274 <td>nacljmp %eXX,%rZP<br/> |
| 275 <i>(sandboxed indirect jump)</i></td> |
| 276 <td>and $-32,%eXX<br/> |
| 277 add %rZP,%rXX<br/> |
| 278 jmp *%rXX<br/> |
| 279 </td> |
| 280 </tr> |
| 281 <tr> |
| 282 <td>naclrestbp ...,%rZP<br/> |
| 283 <i>(sandboxed %ebp/rbp restore)</i></td> |
| 284 <td>mov ...,%ebp<br/> |
| 285 add %rZP,%rbp</td> |
| 286 </tr> |
| 287 <tr> |
| 288 <td>naclrestsp ...,%rZP<br/> |
| 289 <i>(sandboxed %esp/rsp restore)</i></td> |
| 290 <td>mov ...,%esp<br/> |
| 291 add %rZP,%rsp</td> |
| 292 </tr> |
| 293 <tr> |
| 294 <td>naclrestsp_noflags ...,%rZP<br/> |
| 295 <i>(sandboxed %esp/rsp restore that leaves the flags unmodified)</i></td> |
| 296 <td>mov ...,%esp<br/> |
| 297 lea (%rsp,%rZP,1),%rsp</td> |
| 298 </tr> |
| 299 <tr> |
| 300 <td>naclspadj $N,%rZP<br/> |
| 301 <i>(sandboxed %esp/rsp restore from %rbp; includes $N offset)</i></td> |
| 302 <td>lea N(%rbp),%esp<br/> |
| 303 add %rZP,%rsp</td> |
| 304 </tr> |
| 305 <tr> |
| 306 <td>naclssp ...,%rZP<br/> |
| 307 <i>(sandboxed stack decrement)</i></td> |
| 308 <td>sub ...,%esp<br/> |
| 309 add %rZP,%rsp</td> |
| 310 </tr> |
| 311 <tr> |
| 312 <td>[rep] scas %nacl:(%rdi),%?ax,%rZP<br/> |
| 313 <i>(sandboxed stos)</i></td> |
| 314 <td>mov %edi,%edi<br/> |
| 315 lea (%rZP,%rdi,1),%rdi<br/> |
| 316 [rep] scas (%rdi),%?ax<br/> |
| 317 </td> |
| 318 </tr> |
| 319 <tr> |
| 320 <td>[rep] stos %?ax,%nacl:(%rdi),%rZP<br/> |
| 321 <i>(sandboxed stos)</i></td> |
| 322 <td>mov %edi,%edi<br/> |
| 323 lea (%rZP,%rdi,1),%rdi<br/> |
| 324 [rep] stos %?ax,(%rdi)<br/> |
| 325 </td> |
| 326 </tr> |
| 327 </tbody> |
| 328 </table> |
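
The register arithmetic performed by several of the expansions in the table can be sketched as below. The function names mirror the pseudo-instructions but are hypothetical Python models, not assembler behaviour; registers are represented as plain integers, and 32-bit operations are assumed to zero-extend into the 64-bit register.

```python
MASK32 = 0xFFFFFFFF
GIB = 1 << 30


def nacljmp_target(r15, reg):
    """Model `and $-32,%eXX; add %rZP,%rXX` (also used by naclcall)."""
    masked = reg & MASK32 & ~31  # keep the low 32 bits, clear the low 5 bits
    return r15 + masked


def naclrestsp(r15, value):
    """Model `mov ...,%esp; add %rZP,%rsp` (naclrestbp is analogous)."""
    return r15 + (value & MASK32)


def naclasp(r15, rsp, n):
    """Model `add ...,%esp; add %rZP,%rsp`; the 32-bit add wraps modulo 2**32."""
    return r15 + ((rsp + n) & MASK32)


def naclssp(r15, rsp, n):
    """Model `sub ...,%esp; add %rZP,%rsp`; the 32-bit sub wraps modulo 2**32."""
    return r15 + ((rsp - n) & MASK32)


# Hypothetical base address with the low 32 bits zero.
r15 = 0x7F00 << 32
target = nacljmp_target(r15, 0xFFFFFFFFFFFFFFFF)
new_rsp = naclssp(r15, r15 + 0x1000, 0x40)
```

Whatever value the input register holds, ``target`` is 32-byte aligned and lies between ``r15`` and ``r15 + 4 * GIB``, which is the property the indirect-branch rules depend on.
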