OLD | NEW |
---|---|
(Empty) | |
1 .. _x86-64-sandbox: | |
2 | |
3 ================================ | |
4 NaCl SFI model on x86-64 systems | |
5 ================================ | |
6 | |
7 .. contents:: | |
8 :local: | |
9 :backlinks: none | |
10 :depth: 2 | |
11 | |
12 Summary | |
13 ======= | |
14 | |
15 This document addresses the details of the Software Fault Isolation | |
16 (SFI) model for executable code that can be run in Native Client on an | |
17 x86-64 system. An overview of this model can be found in the paper: | |
18 `Adapting Software Fault Isolation to Contemporary CPU Architectures | |
19 <https://research.google.com/pubs/archive/35649.pdf>`_. | |
20 The primary focus of the SFI model is a Windows x86-64 system but the | |
21 same techniques can be applied to run identical x86-64 binaries on | |
22 other x86-64 systems such as Linux, Mac, FreeBSD, etc., so the | |
23 description of the SFI model tries to abstract away system | |
24 dependencies when possible. | |
25 | |
26 Please note: throughout this document we use the AT&T notation for | |
27 assembler syntax, in which the target operand appears last, e.g. ``mov | |
28 src, dst``. | |
29 | |
30 Binary Format | |
31 ============= | |
32 | |
33 The format of Native Client executable binaries is identical to the | |
34 x86-64 ELF binary format (`[0] | |
35 <http://en.wikipedia.org/wiki/Executable_and_Linkable_Format>`_, `[1] | |
36 <http://www.sco.com/developers/devspecs/gabi41.pdf>`_, `[2] | |
37 <http://www.sco.com/developers/gabi/latest/contents.html>`_, `[3] | |
38 <http://downloads.openwatcom.org/ftp/devel/docs/elf-64-gen.pdf>`_) for | |
39 Linux or BSD with a few extra requirements. The additional rules that | |
40 a Native Client ELF binary must follow are: | |
41 | |
42 * The ELF magic OS ABI field must be 123. | |
43 * The ELF magic OS ABI VERSION field must be 5. | |
44 * The ELF e_flags field must be 0x200000 (32-byte alignment). | |
45 * There must be exactly one PT_LOAD text segment. It must begin at | |
46 0x20000 (128 kB) and be marked RX (no W). The contents of the text | |
47 segment must follow :ref:`Text Segment Rules <x86-64-text-segment-rules>`. | |
48 * There can be at most one PT_LOAD data segment marked R. | |
49 * There can be at most one PT_LOAD data segment marked RW. | |
50 * There can be at most one PT_GNU_STACK segment. It must be marked RW. | |
51 * All segments must end before the limit address (4 GiB). | |
52 | |
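The header-level rules above can be checked mechanically. The following is a minimal sketch, not the actual Native Client loader: it assumes a little-endian ELF64 file header, and the names ``nacl_header_violations``, ``make_header`` and the ``NACL_*`` constants are hypothetical labels chosen for this illustration. It also checks the 32-byte entry alignment required by the Text Segment Rules; segment-level rules (PT_LOAD placement and permissions) are not covered.

```python
import struct

# Little-endian ELF64 file header: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, then six 16-bit size/count fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

# Values taken from the rule list above; the constant names are hypothetical.
NACL_OSABI = 123
NACL_ABIVERSION = 5
NACL_FLAGS_ALIGN_32 = 0x200000


def nacl_header_violations(header):
    """Return a list describing which header rules the given bytes break."""
    fields = ELF64_HEADER.unpack(header[:ELF64_HEADER.size])
    ident, entry, flags = fields[0], fields[4], fields[7]
    problems = []
    if ident[:4] != b"\x7fELF":
        problems.append("bad ELF magic")
    if ident[7] != NACL_OSABI:          # e_ident[EI_OSABI]
        problems.append("OS ABI field is not 123")
    if ident[8] != NACL_ABIVERSION:     # e_ident[EI_ABIVERSION]
        problems.append("OS ABI VERSION field is not 5")
    if flags != NACL_FLAGS_ALIGN_32:
        problems.append("e_flags is not 0x200000")
    if entry % 32 != 0:
        problems.append("entry address is not 32-byte aligned")
    return problems


def make_header(osabi=NACL_OSABI, abiversion=NACL_ABIVERSION,
                flags=NACL_FLAGS_ALIGN_32, entry=0x20000):
    """Build a synthetic 64-byte header for experimentation (hypothetical helper)."""
    ident = b"\x7fELF" + bytes([2, 1, 1, osabi, abiversion]) + bytes(7)
    return ELF64_HEADER.pack(ident, 2, 62, 1, entry, 64, 0, flags,
                             64, 56, 0, 0, 0, 0)
```

Calling ``nacl_header_violations(make_header())`` yields an empty list, while a header built with ``osabi=0`` reports a single violation.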
53 Runtime Invariants | |
54 ================== | |
55 | |
56 To ensure fault isolation at runtime, the system must maintain a | |
57 number of runtime *invariants* across the lifetime of the running | |
58 program. Both the *Validator* and the *Service Runtime* are | |
59 responsible for maintaining the invariants. See the paper for the | |
60 rationale for the invariants: | |
61 | |
62 * ``RIP`` always points to a valid instruction boundary (the validator must | |
63 ensure this with direct jumps and direct calls). | |
64 * ``R15`` (aka ``RBASE`` and ``RZP``) is never modified by code (the | |
65 validator must ensure this). The low 32 bits of ``RZP`` are all zero | |
66 (the loader must ensure this). | |
67 * ``RIP``, ``RBP`` and ``RSP`` are always in the **safe zone**: between | |
68 ``R15`` and ``R15+4GiB``. | |
69 | |
70 * Exception: ``RSP`` and ``RBP`` are allowed to be in the range of | |
71 ``0..4GiB`` inside *pseudo-instructions*: ``naclrestbp``, | |
72 ``naclrestsp``, ``naclspadj``, ``naclasp``, ``naclssp``. | |
73 | |
74 * 84GiB are allocated for the NaCl module (i.e. **untrusted region**): | |
75 | |
76 * ``R15-40GiB..R15`` and ``R15+4GiB..R15+44GiB`` are buffer zones with | |
77 PROT_NONE flags. | |
78 * The 4GiB *safe zone* has pages with either PROT_WRITE or PROT_EXEC | |
79 but must not have PROT_WRITE+PROT_EXEC pages. | |
80 * All executable code in PROT_EXEC pages is validatable and | |
81 guaranteed to obey the invariants. | |
82 | |
83 * Trampoline/springboard code is mapped to a non-writable region in | |
84 the *untrusted 84GiB region*; each trampoline/springboard is 32-byte | |
85 aligned and fits within a single *bundle*. | |
86 * The OS must not put any internal structures/code into the untrusted | |
87 region at any time (not using the OS dynamic linker, etc.). | |
88 | |
89 .. _x86-64-text-segment-rules: | |
90 | |
91 Text Segment Rules | |
92 ================== | |
93 | |
94 * The validation process must ensure that the text segment complies | |
95 with the following rules. The validation process must complete | |
96 successfully strictly before executing any instruction of the | |
97 untrusted code. | |
98 * The following instructions are illegal and must be rejected by the | |
99 validator (the list is not exhaustive as the validator uses a | |
100 whitelist, not a blacklist; this means there is a large but finite | |
101 list of instructions the validator allows, not a small list of | |
102 instructions the validator rejects): | |
103 | |
104 * any privileged instructions | |
105 * ``mov`` to/from segment registers | |
106 * ``int`` | |
107 * ``pusha``/``popa`` (not dangerous but not needed for GCC) | |
108 | |
109 * There must be space for at least 32 bytes after the text segment and | |
110 before the next segment in ELF (towards higher addresses) that ends | |
111 strictly at a 64K boundary (a minimum page size for untrusted | |
112 code). This space will be padded with HLT instructions as part of | |
113 the validation process, along with the optional 64K page. | |
114 * Neither instructions nor *pseudo-instructions* are permitted to span | |
115 a 32-byte boundary. | |
116 * The ELF entry address must be 32-byte aligned. | |
117 * Direct ``CALL``/``JMP`` targets: | |
118 | |
119 * must point to a valid instruction boundary | |
120 * must not point into a *pseudo-instruction* | |
121 * must not point between a *restricted register* (see below for | |
122 definition) producer instruction and its corresponding restricted | |
123 register consumer instruction. | |
124 | |
125 * ``CALL`` instructions must be 5 bytes before a 32-byte boundary, so | |
126 that the return address will be 32-byte aligned. | |
127 * Indirect call targets must be 32-byte aligned. Instead of indirect | |
128 ``CALL``/``JMP`` x, use ``nacljmp`` and ``naclcall`` (see below for | |
129 definitions of these *pseudo-instructions*). | |
130 * All instructions that **read** or **write** from/to memory must use | |
131 one of the four registers ``RZP``, ``RIP``, ``RBP`` or ``RSP`` as a | |
132 base, a restricted register (see below) as an index (multiplied by | |
133 0, 1, 2, 4 or 8) and an optional constant displacement. | |
134 | |
135 * Exception to this rule: string instructions are allowed if used in | |
136 the following sequences (the sequences should not cross *bundle* | |
137 boundaries; segment overrides are disallowed): | |
138 | |
139 .. naclcode:: | |
140 :prettyprint: 0 | |
141 | |
142 mov %edi, %edi | |
143 lea (%rZP,%rdi),%rdi | |
144 [rep] stos ; (other string instructions can be used here) | |
JF
2014/06/13 06:56:05
Can you remove the parenthesis here.
hamaji
2014/06/13 14:55:25
Done.
| |
145 | |
146 Note: this is identical to the *pseudo-instruction*: ``[rep] stos | |
147 %?ax, %nacl:(%rdi),%rZP`` | |
148 | |
149 * An operand of a command is said to be a **restricted register** iff | |
150 it is a register that is the target of a 32-bit move in the | |
151 immediately-preceding command in the same *bundle* (consider the | |
152 previous command as an additional sandboxing prefix): | |
153 | |
154 .. naclcode:: | |
155 :prettyprint: 0 | |
156 | |
157 ; any 32-bit register can be used here; the first operand is | |
158 ; unrestricted but often is the same register | |
JF
2014/06/13 06:56:05
Same.
hamaji
2014/06/13 14:55:25
Done.
| |
159 mov ..., %eXX | |
160 | |
161 * Instructions capable of changing ``%RBP`` and ``%RSP`` are | |
162 forbidden, except the instruction sequences in the whitelist below, | |
163 which must not cross *bundle* boundaries: | |
164 | |
165 .. naclcode:: | |
166 :prettyprint: 0 | |
167 | |
168 mov %rbp, %rsp | |
169 mov %rsp, %rbp | |
170 mov ..., %ebp | |
171 add %rZP, %rbp ; (restoration of %RBP from memory, register or stack - keeps the invariant intact) | |
172 mov ..., %esp | |
173 add %rZP, %rsp ; (restoration of %RSP from memory, register or stack - keeps the invariant intact) | |
JF
2014/06/13 06:56:05
Could you line-wrap the two lines above, and remov
hamaji
2014/06/13 14:55:25
Done.
| |
174 lea xxx(%rbp), %esp | |
175 add %rZP, %rsp ; (restoration of %RSP from %RBP with adjust) | |
176 sub ..., %esp | |
177 add %rZP, %rsp ; (stack space allocation) | |
178 add ..., %esp | |
179 add %rZP, %rsp ; (stack space deallocation) | |
180 and $XX, %rsp ; (alignment; XX must be between -128 and -1) | |
181 pushq ... | |
182 popq ... ; (except pop %RSP, pop %RBP) | |
183 | |
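The alignment rules in this section reduce to simple address arithmetic. The sketch below illustrates three of them under stated assumptions: instruction addresses and lengths are already known from a decoder (not shown), a direct ``CALL`` is 5 bytes long as the rule above implies, and the function names are hypothetical. It is not the validator's actual implementation.

```python
BUNDLE_SIZE = 32  # bytes per bundle, as stated in the rules above


def spans_bundle_boundary(address, length):
    """True if an instruction of `length` bytes at `address` straddles two bundles."""
    first = address // BUNDLE_SIZE
    last = (address + length - 1) // BUNDLE_SIZE
    return first != last


def call_return_is_aligned(address, length=5):
    """True if the byte after the call (the return address) starts a bundle."""
    return (address + length) % BUNDLE_SIZE == 0


def is_valid_direct_target(target, instruction_starts, interior_addresses):
    """True if `target` is an instruction start that is not inside a
    pseudo-instruction or between a restricted-register producer and consumer.

    `instruction_starts` and `interior_addresses` are sets of addresses that a
    real validator would compute while decoding; here they are plain inputs.
    """
    return target in instruction_starts and target not in interior_addresses
```

For instance, a 4-byte instruction at address 30 spans a boundary, whereas a 5-byte call placed at address 27 ends exactly at address 32.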
184 List of Pseudo-instructions | |
185 =========================== | |
186 | |
187 Pseudo-instructions were introduced to let the compiler maintain the | |
188 invariants without needing to know the code alignment rules. The | |
189 assembler guarantees that none of the *pseudo-instructions* in the | |
190 table below crosses a 32-byte *bundle* boundary. In addition to the | |
191 pseudo-instructions, one pseudo-operand prefix is introduced: | |
192 ``%nacl``. Presence of the ``%nacl`` operand prefix ensures that: | |
193 | |
194 * The instruction ``mov %eXX, %eXX`` is added immediately before the | |
195 actual command using prefix ``%nacl`` (where ``%eXX`` is the 32-bit | |
196 part of the index register of the actual command, for example: in | |
197 operand ``%nacl:(,%r11)``, the notation ``%eXX`` refers to | |
198 ``%r11d``). | |
199 * The resulting sequence of two instructions does not cross the | |
200 *bundle* boundary. | |
201 | |
202 For example, the instruction: | |
203 | |
204 .. naclcode:: | |
205 :prettyprint: 0 | |
206 | |
207 mov %eax,%nacl:(%r15,%rdi,2) | |
208 | |
209 is translated by the assembler to: | |
210 | |
211 .. naclcode:: | |
212 :prettyprint: 0 | |
213 | |
214 mov %edi,%edi | |
215 mov %eax,(%r15,%rdi,2) | |
216 | |
217 The complete list of introduced *pseudo-instructions* is as follows: | |
218 | |
219 .. TODO(hamaji): Use rst's table instead of the raw HTML below. | |
220 | |
221 .. raw:: html | |
222 | |
223 <table border=1> | |
224 <tbody> | |
225 <tr> | |
226 <td>Pseudo-instruction</td> | |
227 <td>Is translated to<br/> | |
228 </td> | |
229 </tr> | |
230 <tr> | |
231 <td>[rep] cmps %nacl:(%rsi),%nacl:(%rdi),%rZP<br/> | |
232 <i>(sandboxed cmps)</i><br/> | |
233 </td> | |
234 <td>mov %esi,%esi<br/> | |
235 lea (%rZP,%rsi,1),%rsi<br/> | |
236 mov %edi,%edi<br/> | |
237 lea (%rZP,%rdi,1),%rdi<br/> | |
238 [rep] cmps (%rsi),(%rdi)<i><br/> | |
239 </i> | |
240 </td> | |
241 </tr> | |
242 <tr> | |
243 <td>[rep] movs %nacl:(%rsi),%nacl:(%rdi),%rZP<br/> | |
244 <i>(sandboxed movs)</i><br/> | |
245 </td> | |
246 <td>mov %esi,%esi<br/> | |
247 lea (%rZP,%rsi,1),%rsi<br/> | |
248 mov %edi,%edi<br/> | |
249 lea (%rZP,%rdi,1),%rdi<br/> | |
250 [rep] movs (%rsi),(%rdi)<i><br/> | |
251 </i> | |
252 </td> | |
253 </tr> | |
254 <tr> | |
255 <td>naclasp ...,%rZP<br/> | |
256 <i>(sandboxed stack increment)</i></td> | |
257 <td>add ...,%esp<br/> | |
258 add %rZP,%rsp</td> | |
259 </tr> | |
260 <tr> | |
261 <td>naclcall %eXX,%rZP<br/> | |
262 <i>(sandboxed indirect call)</i></td> | |
263 <td>and $-32, %eXX<br/> | |
264 add %rZP, %rXX<br/> | |
265 call *%rXX<br/> | |
266 <i>Note: the assembler ensures all calls (including | |
267 naclcall) will end at the bundle boundary.</i></td> | |
268 </tr> | |
269 <tr> | |
270 <td>nacljmp %eXX,%rZP<br/> | |
271 <i>(sandboxed indirect jump)</i></td> | |
272 <td>and $-32,%eXX<br/> | |
273 add %rZP,%rXX<br/> | |
274 jmp *%rXX<br/> | |
275 </td> | |
276 </tr> | |
277 <tr> | |
278 <td>naclrestbp ...,%rZP<br/> | |
279 <i>(sandboxed %ebp/rbp restore)</i></td> | |
280 <td>mov ...,%ebp<br/> | |
281 add %rZP,%rbp</td> | |
282 </tr> | |
283 <tr> | |
284 <td>naclrestsp ...,%rZP<br/> | |
285 <i>(sandboxed %esp/rsp restore)</i></td> | |
286 <td>mov ...,%esp<br/> | |
287 add %rZP,%rsp</td> | |
288 </tr> | |
289 <tr> | |
290 <td>naclrestsp_noflags ...,%rZP<br/> | |
291 <i>(sandboxed %esp/rsp restore that leaves the flags untouched)</i></td> | |
292 <td>mov ...,%esp<br/> | |
293 lea (%rsp,%rZP,1),%rsp</td> | |
294 </tr> | |
295 <tr> | |
296 <td>naclspadj $N,%rZP<br/> | |
297 <i>(sandboxed %esp/rsp restore from %rbp; includes $N offset)</i></td> | |
298 <td>lea N(%rbp),%esp<br/> | |
299 add %rZP,%rsp</td> | |
300 </tr> | |
301 <tr> | |
302 <td>naclssp ...,%rZP<br/> | |
303 <i>(sandboxed stack decrement)</i></td> | |
304 <td>sub ...,%esp<br/> | |
305 add %rZP,%rsp</td> | |
306 </tr> | |
307 <tr> | |
308 <td>[rep] scas %nacl:(%rdi),%?ax,%rZP<br/> | |
309 <i>(sandboxed scas)</i></td> | |
310 <td>mov %edi,%edi<br/> | |
311 lea (%rZP,%rdi,1),%rdi<br/> | |
312 [rep] scas (%rdi),%?ax<br/> | |
313 </td> | |
314 </tr> | |
315 <tr> | |
316 <td>[rep] stos %?ax,%nacl:(%rdi),%rZP<br/> | |
317 <i>(sandboxed stos)</i></td> | |
318 <td>mov %edi,%edi<br/> | |
319 lea (%rZP,%rdi,1),%rdi<br/> | |
320 [rep] stos %?ax,(%rdi)<br/> | |
321 </td> | |
322 </tr> | |
323 </tbody> | |
324 </table> | |
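The arithmetic behind the expansions in the table can be modelled with plain integers. The following is an illustrative sketch only: it treats registers as unsigned 64-bit values, uses hypothetical function names, and relies on the invariant stated earlier that the low 32 bits of ``R15`` are zero. It shows why a truncated 32-bit value added to ``R15`` always lands in the safe zone, and why ``nacljmp``/``naclcall`` targets are 32-byte aligned.

```python
MASK32 = 0xFFFFFFFF
FOUR_GIB = 1 << 32


def sandbox_address(r15, reg):
    """Model `mov %eXX,%eXX` followed by `lea (%r15,%rXX),%rXX`."""
    truncated = reg & MASK32      # a 32-bit mov zero-extends into the 64-bit register
    return r15 + truncated        # base plus an offset below 4 GiB


def nacljmp_target(r15, reg):
    """Model `and $-32,%eXX` followed by `add %r15,%rXX`; return the jump target."""
    aligned = reg & MASK32 & ~31  # 32-bit AND with -32 clears the low five bits
    return r15 + aligned


def in_safe_zone(r15, address):
    """True if `address` lies in the range R15 .. R15+4GiB (upper bound exclusive)."""
    return r15 <= address < r15 + FOUR_GIB
```

With a hypothetical base such as ``r15 = 0x7F0000000000``, any 64-bit value passed to ``sandbox_address`` produces an address for which ``in_safe_zone`` holds, and ``nacljmp_target`` always returns a multiple of 32.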