| Index: native_client_sdk/src/doc/reference/sandbox_internals/x86-64-sandbox.rst
|
| diff --git a/native_client_sdk/src/doc/reference/sandbox_internals/x86-64-sandbox.rst b/native_client_sdk/src/doc/reference/sandbox_internals/x86-64-sandbox.rst
|
| new file mode 100644
|
| index 0000000000000000000000000000000000000000..7061d9378ddf6b347b113c3efdb0ff2dae50d91d
|
| --- /dev/null
|
| +++ b/native_client_sdk/src/doc/reference/sandbox_internals/x86-64-sandbox.rst
|
| @@ -0,0 +1,328 @@
|
| +.. _x86-64-sandbox:
|
| +
|
| +================================
|
| +NaCl SFI model on x86-64 systems
|
| +================================
|
| +
|
| +.. contents::
|
| + :local:
|
| + :backlinks: none
|
| + :depth: 2
|
| +
|
| +Summary
|
| +=======
|
| +
|
| +This document addresses the details of the Software Fault Isolation
|
| +(SFI) model for executable code that can be run in Native Client on an
|
| +x86-64 system. An overview of this model can be found in the paper:
|
| +`Adapting Software Fault Isolation to Contemporary CPU Architectures
|
| +<https://research.google.com/pubs/archive/35649.pdf>`_.
|
| +The primary focus of the SFI model is a Windows x86-64 system but the
|
| +same techniques can be applied to run identical x86-64 binaries on
|
| +other x86-64 systems such as Linux, Mac, and FreeBSD, so the
|
| +description of the SFI model tries to abstract away system
|
| +dependencies when possible.
|
| +
|
| +Please note: throughout this document we use the AT&T notation for
|
| +assembler syntax, in which the target operand appears last, e.g. ``mov
|
| +src, dst``.
|
| +
|
| +Binary Format
|
| +=============
|
| +
|
| +The format of Native Client executable binaries is identical to the
|
| +x86-64 ELF binary format (`[0]
|
| +<http://en.wikipedia.org/wiki/Executable_and_Linkable_Format>`_, `[1]
|
| +<http://www.sco.com/developers/devspecs/gabi41.pdf>`_, `[2]
|
| +<http://www.sco.com/developers/gabi/latest/contents.html>`_, `[3]
|
| +<http://downloads.openwatcom.org/ftp/devel/docs/elf-64-gen.pdf>`_) for
|
| +Linux or BSD with a few extra requirements. The additional rules that
|
| +a Native Client ELF binary must follow are:
|
| +
|
| +* The ELF magic OS ABI field must be 123.
|
| +* The ELF magic OS ABI VERSION field must be 5.
|
| +* The ELF e_flags field must be 0x200000 (32-byte alignment).
|
| +* There must be exactly one PT_LOAD text segment. It must begin at
|
| + 0x20000 (128 kB) and be marked RX (no W). The contents of the text
|
| + segment must follow :ref:`Text Segment Rules <x86-64-text-segment-rules>`.
|
| +* There can be at most one PT_LOAD data segment marked R.
|
| +* There can be at most one PT_LOAD data segment marked RW.
|
| +* There can be at most one PT_GNU_STACK segment. It must be marked RW.
|
| +* All segments must end before the limit address (4 GiB).
|
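| +
|
| +As a rough illustration of the header rules above, the following Python
|
| +sketch checks the OS ABI, OS ABI version and ``e_flags`` fields of a
|
| +little-endian ELF64 header, plus the 32-byte entry alignment required by
|
| +the text segment rules. The names ``check_nacl_elf_header`` and
|
| +``make_header`` are hypothetical helpers, not part of the NaCl loader, and
|
| +the segment rules are not covered here.
|
```python
import struct

NACL_OSABI = 123          # required OS ABI value (e_ident[7])
NACL_ABIVERSION = 5       # required OS ABI version value (e_ident[8])
NACL_E_FLAGS = 0x200000   # required e_flags value (32-byte alignment)
BUNDLE_SIZE = 32


def check_nacl_elf_header(header: bytes) -> list:
    """Return a list of rule violations for a 64-byte ELF64 header."""
    if len(header) < 64 or header[:4] != b"\x7fELF":
        return ["not an ELF file"]
    problems = []
    if header[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        problems.append("not a 64-bit ELF")
    if header[7] != NACL_OSABI:
        problems.append("bad OS ABI")
    if header[8] != NACL_ABIVERSION:
        problems.append("bad OS ABI version")
    # In a little-endian ELF64 header, e_entry is at offset 24
    # and e_flags is at offset 48.
    (e_entry,) = struct.unpack_from("<Q", header, 24)
    (e_flags,) = struct.unpack_from("<I", header, 48)
    if e_flags != NACL_E_FLAGS:
        problems.append("bad e_flags")
    if e_entry % BUNDLE_SIZE != 0:
        problems.append("entry not 32-byte aligned")
    return problems


def make_header(osabi=123, abiversion=5, e_flags=0x200000, e_entry=0x20000):
    """Build a synthetic ELF64 header, for demonstration only."""
    ident = b"\x7fELF" + bytes([2, 1, 1, osabi, abiversion]) + bytes(7)
    # e_type=2 (ET_EXEC), e_machine=62 (EM_X86_64), e_version=1, then
    # e_entry, e_phoff, e_shoff, e_flags and the six 16-bit size fields.
    rest = struct.pack("<HHIQQQIHHHHHH",
                       2, 62, 1, e_entry, 0, 0, e_flags, 64, 56, 0, 64, 0, 0)
    return ident + rest
```
|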
| +
|
| +Runtime Invariants
|
| +==================
|
| +
|
| +To ensure fault isolation at runtime, the system must maintain a
|
| +number of runtime *invariants* across the lifetime of the running
|
| +program. Both the *Validator* and the *Service Runtime* are
|
| +responsible for maintaining the invariants. See the paper for the
|
| +rationale for the invariants:
|
| +
|
| +* ``RIP`` always points to a valid instruction boundary (the validator must
|
| + ensure this with direct jumps and direct calls).
|
| +* ``R15`` (aka ``RBASE`` and ``RZP``) is never modified by code (the
|
| + validator must ensure this). The low 32 bits of ``RZP`` are all zero
|
| + (the loader must ensure this).
|
| +* ``RIP``, ``RBP`` and ``RSP`` are always in the **safe zone**: between
|
| + ``R15`` and ``R15+4GiB``.
|
| +
|
| + * Exception: ``RSP`` and ``RBP`` are allowed to be in the range of
|
| + ``0..4GiB`` inside *pseudo-instructions*: ``naclrestbp``,
|
| + ``naclrestsp``, ``naclspadj``, ``naclasp``, ``naclssp``.
|
| +
|
| +* 84GiB are allocated for the NaCl module (i.e. the **untrusted region**):
|
| +
|
| + * ``R15-40GiB..R15`` and ``R15+4GiB..R15+44GiB`` are buffer zones with
|
| + PROT_NONE flags.
|
| + * The 4GiB *safe zone* has pages with either PROT_WRITE or PROT_EXEC
|
| + but must not have PROT_WRITE+PROT_EXEC pages.
|
| + * All executable code in PROT_EXEC pages is validatable and
|
| + guaranteed to obey the invariants.
|
| +
|
| +* Trampoline/springboard code is mapped to a non-writable region in
|
| + the *untrusted 84GiB region*; each trampoline/springboard is 32-byte
|
| + aligned and fits within a single *bundle*.
|
| +* The OS must not put any internal structures/code into the untrusted
|
| + region at any time (for example, the OS dynamic linker is not used).
|
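| +
|
| +The address-space layout described by these invariants can be sketched in
|
| +Python as follows. This is a back-of-the-envelope check rather than loader
|
| +code: the function names are hypothetical, and the bound computation
|
| +assumes a base register inside the safe zone, a zero-extended 32-bit index
|
| +scaled by at most 8, and a signed 32-bit displacement. Under those
|
| +assumptions every reachable address stays inside the 84GiB region.
|
```python
GIB = 1 << 30
SAFE_ZONE = 4 * GIB     # R15 .. R15+4GiB
GUARD_ZONE = 40 * GIB   # PROT_NONE buffer on each side of the safe zone


def untrusted_region(r15: int) -> dict:
    """Return the [start, end) ranges implied by the invariants for R15."""
    if r15 & 0xFFFFFFFF:
        raise ValueError("low 32 bits of R15 must be zero")
    return {
        "lower_guard": (r15 - GUARD_ZONE, r15),
        "safe_zone": (r15, r15 + SAFE_ZONE),
        "upper_guard": (r15 + SAFE_ZONE, r15 + SAFE_ZONE + GUARD_ZONE),
    }


def in_safe_zone(r15: int, addr: int) -> bool:
    """True if addr lies between R15 (inclusive) and R15+4GiB (exclusive)."""
    return r15 <= addr < r15 + SAFE_ZONE


def effective_address_bounds() -> tuple:
    """Worst-case offsets from R15 of base + index*scale + disp32.

    Assumption (not stated in this form by the document): base is within
    the safe zone, index is a zero-extended 32-bit value, scale <= 8,
    and the displacement is a signed 32-bit constant.
    """
    lowest = 0 + 0 - (1 << 31)
    highest = (SAFE_ZONE - 1) + 8 * ((1 << 32) - 1) + ((1 << 31) - 1)
    return lowest, highest
```
|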
| +
|
| +.. _x86-64-text-segment-rules:
|
| +
|
| +Text Segment Rules
|
| +==================
|
| +
|
| +* The validation process must ensure that the text segment complies
|
| + with the following rules. The validation process must complete
|
| + successfully strictly before executing any instruction of the
|
| + untrusted code.
|
| +* The following instructions are illegal and must be rejected by the
|
| + validator (the list is not exhaustive as the validator uses a
|
| + whitelist, not a blacklist; this means there is a large but finite
|
| + list of instructions the validator allows, not a small list of
|
| + instructions the validator rejects):
|
| +
|
| + * any privileged instructions
|
| + * ``mov`` to/from segment registers
|
| + * ``int``
|
| + * ``pusha``/``popa`` (not dangerous but not needed for GCC)
|
| +
|
| +* There must be space for at least 32 bytes after the text segment and
|
| + before the next segment in ELF (towards higher addresses) that ends
|
| + strictly at a 64K boundary (a minimum page size for untrusted
|
| + code). This space will be padded with HLT instructions as part of
|
| + the validation process, along with the optional 64K page.
|
| +* Neither instructions nor *pseudo-instructions* are permitted to span
|
| + a 32-byte boundary.
|
| +* The ELF entry address must be 32-byte aligned.
|
| +* Direct ``CALL``/``JMP`` targets:
|
| +
|
| + * must point to a valid instruction boundary
|
| + * must not point into a *pseudo-instruction*
|
| + * must not point between a *restricted register* (see below for
|
| + definition) producer instruction and its corresponding restricted
|
| + register consumer instruction.
|
| +
|
| +* ``CALL`` instructions must end at a 32-byte boundary (a direct ``CALL``
|
| + is 5 bytes long, so it starts 5 bytes before the boundary), so that the
|
| + return address will be 32-byte aligned.
|
| +* Indirect call targets must be 32-byte aligned. Instead of indirect
|
| + ``CALL``/``JMP`` x, use ``nacljmp`` and ``naclcall`` (see below for
|
| + definitions of these *pseudo-instructions*).
|
| +* All instructions that **read** or **write** from/to memory must use
|
| + one of the four registers ``RZP``, ``RIP``, ``RBP`` or ``RSP`` as a
|
| + base, a restricted register (see below) as an index (multiplied by 0,
|
| + 1, 2, 4 or 8) and an optional constant displacement.
|
| +
|
| + * Exception to this rule: string instructions are allowed if used in
|
| + the following sequences (the sequences should not cross *bundle*
|
| + boundaries; segment overrides are disallowed):
|
| +
|
| + .. naclcode::
|
| + :prettyprint: 0
|
| +
|
| + mov %edi, %edi
|
| + lea (%rZP,%rdi),%rdi
|
| + [rep] stos ; other string instructions can be used here
|
| +
|
| + Note: this is identical to the *pseudo-instruction*: ``[rep] stos
|
| + %?ax, %nacl:(%rdi),%rZP``
|
| +
|
| +* An operand of an instruction is said to be a **restricted register** iff
|
| + it is a register that is the target of a 32-bit move in the
|
| + immediately preceding instruction in the same *bundle* (consider the
|
| + preceding instruction an additional sandboxing prefix):
|
| +
|
| + .. naclcode::
|
| + :prettyprint: 0
|
| +
|
| + ; any 32-bit register can be used here; the first operand is
|
| + ; unrestricted but often is the same register
|
| + mov ..., %eXX
|
| +
|
| +* Instructions capable of changing ``%RBP`` and ``%RSP`` are
|
| + forbidden, except the instruction sequences in the whitelist below,
|
| + which must not cross *bundle* boundaries:
|
| +
|
| + .. naclcode::
|
| + :prettyprint: 0
|
| +
|
| + mov %rbp, %rsp
|
| + mov %rsp, %rbp
|
| + mov ..., %ebp
|
| + ; restoration of %RBP from memory, register or stack - keeps the
|
| + ; invariant intact
|
| + add %rZP, %rbp
|
| + mov ..., %esp
|
| + ; restoration of %RSP from memory, register or stack - keeps the
|
| + ; invariant intact
|
| + add %rZP, %rsp
|
| + lea xxx(%rbp), %esp
|
| + add %rZP, %rsp ; restoration of %RSP from %RBP with adjust
|
| + sub ..., %esp
|
| + add %rZP, %rsp ; stack space allocation
|
| + add ..., %esp
|
| + add %rZP, %rsp ; stack space deallocation
|
| + and $XX, %rsp ; alignment; XX must be between -128 and -1
|
| + pushq ...
|
| + popq ... ; except pop %RSP, pop %RBP
|
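| +
|
| +The two alignment rules above (nothing spans a 32-byte boundary, and a
|
| +``CALL`` ends exactly at one) reduce to simple arithmetic on offsets. The
|
| +sketch below assumes a hypothetical decoder that yields ``(offset, length,
|
| +is_call)`` tuples, with offsets measured from a 32-byte aligned address
|
| +such as the start of the text segment. It is not the validator's actual
|
| +algorithm.
|
```python
BUNDLE_SIZE = 32


def spans_bundle_boundary(offset: int, length: int) -> bool:
    """True if the bytes [offset, offset+length) cross a 32-byte boundary."""
    return offset // BUNDLE_SIZE != (offset + length - 1) // BUNDLE_SIZE


def call_ends_at_boundary(offset: int, length: int) -> bool:
    """True if the return address (offset + length) is 32-byte aligned."""
    return (offset + length) % BUNDLE_SIZE == 0


def check_layout(instructions) -> list:
    """Check (offset, length, is_call) tuples from a hypothetical decoder.

    Returns a list of (offset, message) pairs, one per violation.
    """
    errors = []
    for offset, length, is_call in instructions:
        if spans_bundle_boundary(offset, length):
            errors.append((offset, "spans a bundle boundary"))
        elif is_call and not call_ends_at_boundary(offset, length):
            errors.append((offset, "call does not end at a bundle boundary"))
    return errors
```
|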
| +
|
| +List of Pseudo-instructions
|
| +===========================
|
| +
|
| +Pseudo-instructions were introduced to let the compiler maintain the
|
| +invariants without needing to know the code alignment rules. The
|
| +assembler guarantees 32-byte alignment for all *pseudo-instructions* in
|
| +the table below. In addition to the pseudo-instructions, one
|
| +pseudo-operand prefix is introduced: ``%nacl``. Presence of the
|
| +``%nacl`` operand prefix ensures that:
|
| +
|
| +* The instruction ``mov %eXX, %eXX`` is added immediately before the
|
| + actual instruction that uses the ``%nacl`` prefix (where ``%eXX`` is the
|
| + 32-bit part of the index register of that instruction; for example, in
|
| + the operand ``%nacl:(,%r11)``, the notation ``%eXX`` refers to
|
| + ``%r11d``).
|
| +* The resulting sequence of two instructions does not cross the
|
| + *bundle* boundary.
|
| +
|
| +For example, the instruction:
|
| +
|
| +.. naclcode::
|
| + :prettyprint: 0
|
| +
|
| + mov %eax,%nacl:(%r15,%rdi,2)
|
| +
|
| +is translated by the assembler to:
|
| +
|
| +.. naclcode::
|
| + :prettyprint: 0
|
| +
|
| + mov %edi,%edi
|
| + mov %eax,(%r15,%rdi,2)
|
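| +
|
| +To see why the inserted 32-bit move is sufficient, recall that on x86-64 a
|
| +32-bit register write zero-extends into the full 64-bit register. The
|
| +Python sketch below emulates the address computed by the translated
|
| +sequence; ``sandboxed_store_address`` is a hypothetical name used only for
|
| +illustration.
|
```python
MASK32 = 0xFFFFFFFF
GIB = 1 << 30


def sandboxed_store_address(r15: int, rdi: int, scale: int = 2, disp: int = 0) -> int:
    """Emulate the address used by the two-instruction sequence:

        mov %edi,%edi                   ; clears bits 63..32 of %rdi
        mov %eax,disp(%r15,%rdi,scale)  ; store relative to the sandbox base
    """
    rdi &= MASK32  # effect of the 32-bit move (zero-extension)
    return r15 + rdi * scale + disp
```
|
| +Whatever value untrusted code places in the upper half of ``%rdi``, the
|
| +resulting address is at most ``scale * (2**32 - 1)`` bytes above ``R15``.
|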
| +
|
| +The complete list of introduced *pseudo-instructions* is as follows:
|
| +
|
| +.. TODO(hamaji): Use rst's table instead of the raw HTML below.
|
| +
|
| +.. raw:: html
|
| +
|
| + <table border=1>
|
| + <tbody>
|
| + <tr>
|
| + <td>Pseudo-instruction</td>
|
| + <td>Is translated to<br/>
|
| + </td>
|
| + </tr>
|
| + <tr>
|
| + <td>[rep] cmps %nacl:(%rsi),%nacl:(%rdi),%rZP<br/>
|
| + <i>(sandboxed cmps)</i><br/>
|
| + </td>
|
| + <td>mov %esi,%esi<br/>
|
| + lea (%rZP,%rsi,1),%rsi<br/>
|
| + mov %edi,%edi<br/>
|
| + lea (%rZP,%rdi,1),%rdi<br/>
|
| + [rep] cmps (%rsi),(%rdi)<br/>
|
| + </td>
|
| + </tr>
|
| + <tr>
|
| + <td>[rep] movs %nacl:(%rsi),%nacl:(%rdi),%rZP<br/>
|
| + <i>(sandboxed movs)</i><br/>
|
| + </td>
|
| + <td>mov %esi,%esi<br/>
|
| + lea (%rZP,%rsi,1),%rsi<br/>
|
| + mov %edi,%edi<br/>
|
| + lea (%rZP,%rdi,1),%rdi<br/>
|
| + [rep] movs (%rsi),(%rdi)<br/>
|
| + </td>
|
| + </tr>
|
| + <tr>
|
| + <td>naclasp ...,%rZP<br/>
|
| + <i>(sandboxed stack increment)</i></td>
|
| + <td>add ...,%esp<br/>
|
| + add %rZP,%rsp</td>
|
| + </tr>
|
| + <tr>
|
| + <td>naclcall %eXX,%rZP<br/>
|
| + <i>(sandboxed indirect call)</i></td>
|
| + <td>and $-32, %eXX<br/>
|
| + add %rZP, %rXX<br/>
|
| + call *%rXX<br/>
|
| + <i>Note: the assembler ensures all calls (including
|
| + naclcall) will end at the bundle boundary.</i></td>
|
| + </tr>
|
| + <tr>
|
| + <td>nacljmp %eXX,%rZP<br/>
|
| + <i>(sandboxed indirect jump)</i></td>
|
| + <td>and $-32,%eXX<br/>
|
| + add %rZP,%rXX<br/>
|
| + jmp *%rXX<br/>
|
| + </td>
|
| + </tr>
|
| + <tr>
|
| + <td>naclrestbp ...,%rZP<br/>
|
| + <i>(sandboxed %ebp/rbp restore)</i></td>
|
| + <td>mov ...,%ebp<br/>
|
| + add %rZP,%rbp</td>
|
| + </tr>
|
| + <tr>
|
| + <td>naclrestsp ...,%rZP<br/>
|
| + <i>(sandboxed %esp/rsp restore)</i></td>
|
| + <td>mov ...,%esp<br/>
|
| + add %rZP,%rsp</td>
|
| + </tr>
|
| + <tr>
|
| + <td>naclrestsp_noflags ...,%rZP<br/>
|
| + <i>(sandboxed %esp/rsp restore that leaves the flags unmodified)</i></td>
|
| + <td>mov ...,%esp<br/>
|
| + lea (%rsp,%rZP,1),%rsp</td>
|
| + </tr>
|
| + <tr>
|
| + <td>naclspadj $N,%rZP<br/>
|
| + <i>(sandboxed %esp/rsp restore from %rbp; includes $N offset)</i></td>
|
| + <td>lea N(%rbp),%esp<br/>
|
| + add %rZP,%rsp</td>
|
| + </tr>
|
| + <tr>
|
| + <td>naclssp ...,%rZP<br/>
|
| + <i>(sandboxed stack decrement)</i></td>
|
| + <td>sub ...,%esp<br/>
|
| + add %rZP,%rsp</td>
|
| + </tr>
|
| + <tr>
|
| + <td>[rep] scas %nacl:(%rdi),%?ax,%rZP<br/>
|
| + <i>(sandboxed scas)</i></td>
|
| + <td>mov %edi,%edi<br/>
|
| + lea (%rZP,%rdi,1),%rdi<br/>
|
| + [rep] scas (%rdi),%?ax<br/>
|
| + </td>
|
| + </tr>
|
| + <tr>
|
| + <td>[rep] stos %?ax,%nacl:(%rdi),%rZP<br/>
|
| + <i>(sandboxed stos)</i></td>
|
| + <td>mov %edi,%edi<br/>
|
| + lea (%rZP,%rdi,1),%rdi<br/>
|
| + [rep] stos %?ax,(%rdi)<br/>
|
| + </td>
|
| + </tr>
|
| + </tbody>
|
| + </table>
|
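| +
|
| +The register arithmetic behind ``nacljmp``/``naclcall`` and ``naclrestsp``
|
| +can be emulated in a few lines of Python. This is only a sketch of the
|
| +translations listed in the table, with hypothetical function names. It
|
| +relies on the invariant that the low 32 bits of ``R15`` are zero, which is
|
| +what makes the resulting jump target 32-byte aligned.
|
```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
GIB = 1 << 30


def nacljmp_target(r15: int, rxx: int) -> int:
    """Emulate: and $-32,%eXX ; add %rZP,%rXX ; jmp *%rXX."""
    exx = (rxx & MASK32) & ~31   # 32-bit AND with -32, zero-extended result
    return (r15 + exx) & MASK64  # 64-bit add of the sandbox base


def naclrestsp(r15: int, value: int) -> int:
    """Emulate: mov ...,%esp ; add %rZP,%rsp."""
    esp = value & MASK32         # the 32-bit move discards the upper half
    return (r15 + esp) & MASK64
```
|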
|
|