| Index: src/trusted/validator/x86/decoder/README
|
| diff --git a/src/trusted/validator/x86/decoder/README b/src/trusted/validator/x86/decoder/README
|
| deleted file mode 100644
|
| index 68e70a9e71f8d453bdbe84daf2d36b48f9407bd5..0000000000000000000000000000000000000000
|
| --- a/src/trusted/validator/x86/decoder/README
|
| +++ /dev/null
|
| @@ -1,440 +0,0 @@
|
| -This directory implements an x86 decoder, from a table of modeled
|
| -instructions.
|
| -
|
| -Note: Currently, this decoder is only used in the x86-64
|
| -validator. However, the plan is to move the x86-32 validator to also
|
| -use this decoder. See
|
| -http://code.google.com/p/nativeclient/issues/detail?id=2154 for more
|
| -details.
|
| -
|
| -ncopcode_desc.{h,c}
|
| -
|
| - Defines modeled instructions.
|
| -
|
| -nc_decode_tables.h
|
| -
|
| - Defines the structure of the generated table of modeled
|
| - instructions.
|
| -
|
| -nc_inst_state.{h,c}
|
| -
|
| - Defines how to access parsed x86 instructions, and the x86
|
| - instruction parser.
|
| -
|
| -nc_inst_state_statics.c
|
| -
|
| - Static routines that should be in nc_inst_state.c, but are
|
| - included instead. This separation allows our testing code in
|
| - nc_inst_state_tests.cc to test the static routines.
|
| -
|
| -nc_inst_iter.{h,c}
|
| -
|
| - Defines an iterator that walks the memory block and parses the
|
| - instructions in the memory block.
|
| -
|
| -nc_inst_state_internal.h
|
| -
|
| - Defines the structures used to hold parsed instructions, and
|
| - an iterator to parse such instructions in a memory block.
|
| -
|
| -ncop_exps.{h,c}
|
| -
|
| - Defines an expression tree (of arguments) that is (optionally)
|
| - generated after the instruction is parsed.
|
| -
|
| -nc_inst_trans.{h,c}
|
| -
|
| - Defines a translator that takes the data in a parsed
|
| - instruction, and generates the corresponding expression trees
|
| - defined in ncop_exps.h
|
| -
|
| -ncopcode_insts.enum
|
| -
|
| - Defines the names of instructions recognized by the decoder.
|
| -
|
| -ncopcode_prefix.enum
|
| -
|
| - Defines the set of opcode/prefix bytes, besides the last
|
| - matched byte, that are allowed in x86 instructions.
|
| -
|
| -ncopcode_opcode_flags.enum
|
| -
|
| - Defines the set of bit flags used to define how to decode an
|
| - x86 instruction.
|
| -
|
| -ncopcode_operand_kind.enum
|
| -
|
| - Defines the different categories of operands, typically
|
| - corresponding to operand argument descriptors like $E and $G.
|
| -
|
| -ncopcode_operand_flag.enum
|
| -
|
| - Defines the set of bit flags used to define additional
|
| - information about operands of an instruction (such as its
|
| - set/usage).
|
| -
|
| -ncop_expr_node_kind.enum
|
| -
|
| - Defines a label used to define each kind of expression node in
|
| - a translated argument.
|
| -
|
| -ncop_expr_node_flag.enum
|
| -
|
| - Defines a set of bit flags used to define additional
|
| - information about each node in a translated argument (such as
|
| - set/usage).
|
| -
|
| -Modeled Instructions
|
| ---------------------
|
| -
|
| -Textual version(s) of the instructions understood by the validator can
|
| -be found in the following files.
|
| -
|
| - native_client/src/trusted/validator_x86/testdata/64/modeled_insts.txt
|
| -
|
| - Defines the set of instructions understood by the (full)
|
| - decoder.
|
| -
|
| - native_client/src/trusted/validator_x86/testdata/64/ncval_reg_sfi_modeled_insts.txt
|
| -
|
| - Defines the set of (partial) instructions understood by the
|
| - validator decoder.
|
| -
|
| -There are two types of modeled instructions. The first is "hard coded".
|
| -The second is based on an optional prefix, and an opcode
|
| -sequence.
|
| -
|
| -Hard coded modeled instructions represent explicit byte sequences that
|
| -will be recognized. An example is as follows:
|
| -
|
| - --- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---
|
| - 1f 386
|
| - Nop
|
| -
|
| -The first line (between the "---" markers) defines the sequence of
|
| -bytes that will be explicitly recognized as an instruction if that
|
| -(exact) sequence is found.
|
| -
|
| -The second line starts with the opcode value associated with this
|
| -instruction (1f in this case). It is followed by the instruction
|
| -set the matched instruction belongs to (386 in this case). The
|
| -full set of cases can be found in enum NaClInstType, defined in file
|
| -../x86_insts.h.
|
| -
|
| -The third line describes the instruction that is assumed to be
|
| -accepted by that sequence of bytes. In this case (and most cases) it
|
| -is the nop instruction.
|
| -
|
| -If no hard coded instructions match the bytes to be decoded, the more
|
| -general form is used. This form is based on an optional prefix, and an
|
| -opcode sequence. An example is as follows:
|
| -
|
| - --- 6f ---
|
| - 0f 6f MMX OpcodeUsesModRm
|
| - Movq $Pq, $Qq
|
| - Mmx_G_Operand OpSet OpDest
|
| - Mmx_E_Operand OpUse
|
| -
|
| -The first line defines the opcode value matched, and is surrounded by
|
| -"---" marks. Below this marker line are one (or more) instructions
|
| -that can be matched by the same opcode (and optional prefix).
|
| -
|
| -The second line defines an optional prefix and the opcode sequence (0f
|
| -6f in this case). For more details on this sequence, see "Opcode
|
| -Sequences" below.
|
| -
|
| -The optional prefix and opcode sequence is followed by the instruction
|
| -set the instruction belongs to (MMX in this case). The full set of cases
|
| -can be found in enum NaClInstType, defined in file ../x86_insts.h.
|
| -
|
| -The rest of the second line is the set of instruction flags that define
|
| -what additional bytes are necessary, and what conditions must be met
|
| -for the instruction to be decoded. If any condition is not met, the
|
| -next instruction in the list is tried. This process is continued until
|
| -a match is found, or none of the instructions apply.
|
| -
|
| -The set of instruction flags that are accepted are defined in file
|
| -ncopcode_opcode_flags.enum.
|
| -
|
| -The third line defines the instruction that is being decoded. It
|
| -follows AMD's (and Intel's) syntax for instructions. If an argument is
|
| -enclosed in curly braces, it represents an implicit argument (i.e. one
|
| -that is used by the instruction but not part of the corresponding
|
| -assembly instruction).
|
| -
|
| -The forms for valid arguments are defined in section "Instruction
|
| -Arguments" below.
|
| -
|
| -The remaining lines of the instruction define the actual rules that
|
| -will be used to extract each argument from the decoded instruction.
|
| -The first element on each line defines the kind of the operand, and
|
| -is specified in file ncopcode_operand_kind.enum. The remaining elements
|
| -on the line are flags associated with that argument, and are specified in
|
| -file ncopcode_operand_flag.enum.
|
| -
|
| -Opcode Sequences
|
| -----------------
|
| -
|
| -Each instruction is defined by an opcode sequence, that can be
|
| -prefixed by an optional prefix (i.e. 66, f2, or f3 if it is a
|
| -multi-byte opcode sequence). An example opcode sequence is:
|
| -
|
| - 0f f6
|
| -
|
| -Opcode sequences can also be buried in the modrm byte, or the opcode
|
| -byte. To clarify this, additional modifiers may be added to the end
|
| -of the sequence.
|
| -
|
| -If the sequence is followed by a "/ n", then n defines the value that
|
| -must be in the reg field of the modrm byte. If the sequence is
|
| -followed by a "/ n / m", then n defines the value that must be in the
|
| -reg field of the modrm byte, while m defines the value that must be in
|
| -the r/m field of the modrm byte.
|
| -
|
| -If the sequence is followed by a "- rN", then the instruction is one
|
| -that encodes a register selection as part of the opcode. Register N
|
| -(0..7) is to be used by the instruction.
|
| -
|
| -If the opcode sequence is of the form "... 0F 0F XX", then XX appears
|
| -as the last byte of the instruction, rather than the next byte after
|
| -the two 0F bytes. This allows us to recognize E3DNOW instructions.
|
| -
|
| -Instruction Arguments
|
| ----------------------
|
| -
|
| -The modeled instructions specify what assembly instructions are
|
| -recognized by the decoder. The form used is based on the AMD (R)
|
| -document 24594-Rev.3.14-September 2007, "AMD64 Architecture
|
| -Programmer's Manual Volume 3: General-Purpose and System
|
| -Instructions", and Intel (R) documents 253666-030US - March 2009,
|
| -"Intel 64 and IA-32 Architectures Software Developer's Manual,
|
| -Volume 2A: Instruction Set Reference, A-M" and 253667-030US - March
|
| -2009, "Intel 64 and IA-32 Architectures Software Developer's Manual,
|
| -Volume 2B: Instruction Set Reference, N-Z". In particular, it tries to
|
| -follow the print forms defined by AMD's "Appendix section A.1 -
|
| -Opcode-Syntax Notation", or Intel's "Appendix Section A.2 - Key To
|
| -Abbreviations". These forms are summarized here. For more detailed
|
| -information see
|
| -native_client/src/trusted/validator_x86/ncdecode_forms.h.
|
| -
|
| -A print form describes an argument. If the operand is implicit (i.e.
|
| -it defines a register/memory value affected by the instruction, but is
|
| -not part of the assembly form) it is enclosed in curly braces. If the
|
| -print form corresponds to a register, the register is specified by
|
| -preceding the name with the "%" prefix.
|
| -
|
| -All other print forms define a set of possible arguments. Each begins
|
| -with the character '$', and is followed by a name. The name consists
|
| -of a FORM, followed by a size specification.
|
| -
|
| -Valid FORMs are (note: as mentioned above, these forms follow the
|
| -conventions of both AMD and Intel):
|
| -
|
| - A - Far pointer is encoded in the instruction.
|
| -
|
| - C - Control register specified by the ModRM reg field.
|
| -
|
| - D - Debug register specified by the ModRM reg field.
|
| -
|
| - E - General purpose register or memory operand specified by the
|
| - ModRm byte. Memory addresses can be computed from a segment
|
| - register, SIB byte, and/or displacement.
|
| -
|
| - F - rFLAGS register.
|
| -
|
| - G - General purpose register specified by the ModRm reg field.
|
| -
|
| - I - Immediate value.
|
| -
|
| - J - The instruction includes a relative offset that is added to
|
| - the rIP register.
|
| -
|
| - M - A memory operand specified by the ModRM byte.
|
| -
|
| - O - The offset of an operand is encoded in the
|
| - instruction. There is no ModRm byte in the
|
| - instruction. Complex addressing using the SIB byte cannot be
|
| - done.
|
| -
|
| - P - 64-bit MMX register specified by the ModRM reg field.
|
| -
|
| - PR - 64-bit MMX register specified by the ModRM r/m field. The
|
| - ModRM mod field must be 11b.
|
| -
|
| - Q - 64-bit MMX register or memory operand specified by the ModRM
|
| - byte. Memory addresses can be computed from a segment
|
| - register, SIB byte, and/or displacement.
|
| -
|
| - R - General purpose register specified by the ModRM r/m
|
| - field. The ModRM mod field must be 11b.
|
| -
|
| - S - Segment register specified by the ModRM reg field.
|
| -
|
| - U - The r/m field of the ModRM byte selects a 128-bit XMM register.
|
| -
|
| - V - 128-bit XMM register specified by the ModRM reg field.
|
| -
|
| - VR - 128-bit XMM register specified by the ModRM r/m field. The
|
| - ModRM mod field must be 11b.
|
| -
|
| - W - 128-bit XMM register or memory operand specified by the ModRM
|
| - byte. Memory addresses can be computed from a segment
|
| - register, SIB byte, and/or displacement.
|
| -
|
| - X - A memory operand addressed by the DS.rSI registers. Used in
|
| - string instructions.
|
| -
|
| - Y - A memory operand addressed by the ES.rDI registers. Used in string
|
| - instructions.
|
| -
|
| - r8 - The 8 registers rAX, rCX, rDX, rBX, rSP, rBP, rSI, rDI, and
|
| - the optional registers r8-r15 if REX.b is set, based on the
|
| - register value embedded in the opcode.
|
| -
|
| - SG - segment address defined by a G expression and the segment
|
| - register in the corresponding mnemonic (lds, les, lfs, lgs,
|
| - lss).
|
| -
|
| - rAX - The register AX, EAX, or RAX, depending on SIZE.
|
| -
|
| - rBP - The register BP, EBP, or RBP, depending on SIZE.
|
| -
|
| - rBX - The register BX, EBX, or RBX, depending on SIZE.
|
| -
|
| - rCX - The register CX, ECX, or RCX, depending on SIZE.
|
| -
|
| - rDI - The register DI, EDI, or RDI, depending on SIZE.
|
| -
|
| - rDX - The register DX, EDX, or RDX, depending on SIZE.
|
| -
|
| - rSI - The register SI, ESI, or RSI, depending on SIZE.
|
| -
|
| - rSP - The register SP, ESP, or RSP, depending on SIZE.
|
| -
|
| -Note: r8 is not in the manuals cited above. It has been added to deal
|
| -with instructions with an embedded register in the opcode. In such
|
| -cases, this value allows a single defining call to be used (within a
|
| -for loop), rather than writing eight separate rules (one for each
|
| -possible register value).
|
| -
|
| -
|
| -Valid SIZEs are (note: as mentioned above, these forms follow the
|
| -conventions of both AMD and Intel):
|
| -
|
| - a - Two 16-bit or 32-bit memory operands, depending on the
|
| - effective operand size. Used in the BOUND instruction.
|
| -
|
| - b - A byte, irrespective of the effective operand size.
|
| -
|
| - d - A doubleword (32 bits), irrespective of the effective operand size.
|
| -
|
| - dq - A double-quadword (128 bits), irrespective of the effective
|
| - operand size.
|
| -
|
| - p - A 32-bit or 48-bit far pointer, depending on the effective
|
| - operand size.
|
| -
|
| - pd - A 128-bit double-precision floating point vector operand
|
| - (packed double).
|
| -
|
| - pi - A 64-bit MMX operand (packed integer).
|
| -
|
| - ps - A 128-bit single-precision floating point vector operand
|
| - (packed single).
|
| -
|
| - q - A quadword, irrespective of the effective operand size.
|
| -
|
| - s - A 6-byte or 10-byte pseudo-descriptor.
|
| -
|
| - sd - A scalar double-precision floating point operand (scalar
|
| - double).
|
| -
|
| - si - A scalar doubleword (32-bit) integer operand (scalar
|
| - integer).
|
| -
|
| - ss - A scalar single-precision floating-point operand (scalar
|
| - single).
|
| -
|
| - w - A word, irrespective of the effective operand size.
|
| -
|
| - v - A word, doubleword, or quadword, depending on the effective
|
| - operand size.
|
| -
|
| - va - A word, doubleword, or quadword, depending on the effective
|
| - address size.
|
| -
|
| - vw - A word only when the effective operand size matches.
|
| -
|
| - vd - A doubleword only when the effective operand size matches.
|
| -
|
| -
|
| - vq - A quadword only when the effective operand size matches.
|
| -
|
| - w - A word, irrespective of the effective operand size.
|
| -
|
| - z - A word if the effective operand size is 16 bits, or a
|
| - doubleword if the effective operand size is 32 or 64 bits.
|
| -
|
| - zw - A word only when the effective operand size matches.
|
| -
|
| - zd - A doubleword only when the effective operand size is 32 or
|
| - 64 bits.
|
| -
|
| -Note: vw, vd, vq, zw, and zd are not in the manuals cited
|
| -above. However, they have been added so that sub-variants of a v/z
|
| -instruction (not specified in the manual) can be specified.
|
| -
|
| -Note: The AMD manual uses some slash notations (such as d/q) that aren't
|
| -explicitly defined. In general, we allow such notation as specified in
|
| -the AMD manual. Depending on the use, it can mean any of the following:
|
| -
|
| - (1) In 32-bit mode, d is used. In 64-bit mode, q is used.
|
| -
|
| - (2) Only 32-bit or 64-bit values are allowed.
|
| -
|
| -In addition, when the mnemonic name changes based on which value is
|
| -chosen in d/q, we use d/q/d to denote the 32-bit case, and d/q/q to
|
| -denote the 64-bit case.
|
| -
|
| -In addition, this code adds the following special print forms:
|
| -
|
| - One - The literal constant 1.
|
| -
|
| -Debugging
|
| ----------
|
| -
|
| -Many of the source files contain #define DEBUGGING flags. When
|
| -DEBUGGING is set to 1, additional debugging print messages are
|
| -compiled into the code. Unfortunately, by default, these messages
|
| -frequently call routines that are not compiled into corresponding
|
| -executables (such as ncval and ncdis). To add the additional routines,
|
| -edit file
|
| -
|
| - native_client/site_scons/site_tools/library_deps.py
|
| -
|
| -For x86-32, edit lines
|
| -
|
| - # When turning on the DEBUGGING flag in the x86-32 validator
|
| - # or decoder, add the following:
|
| - #'nc_opcode_modeling_verbose_x86_32',
|
| -
|
| -to
|
| -
|
| - # When turning on the DEBUGGING flag in the x86-32 validator
|
| - # or decoder, add the following:
|
| - 'nc_opcode_modeling_verbose_x86_32',
|
| -
|
| -For x86-64, edit lines
|
| -
|
| - # When turning on the DEBUGGING flag in the x86-64 validator
|
| - # or decoder, add the following:
|
| - # 'nc_opcode_modeling_verbose_x86_64',
|
| -
|
| -to
|
| -
|
| - # When turning on the DEBUGGING flag in the x86-64 validator
|
| - # or decoder, add the following:
|
| - 'nc_opcode_modeling_verbose_x86_64',
|
| -
|
| -These changes will make sure that the corresponding print routines are
|
| -added to the executables during link time.
|
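The "Opcode Sequences" section of the removed README describes three optional modifiers on an opcode sequence: "/ n", "/ n / m", and "- rN". The following is a minimal sketch of how such a string could be split into its parts and checked against a modrm byte. It is not the decoder's own code: `OpcodeSequence`, `parse_opcode_sequence`, and `modrm_matches` are hypothetical names, and the sample inputs are illustrative rather than taken from modeled_insts.txt. The modrm bit layout (mod in bits 7..6, reg in bits 5..3, r/m in bits 2..0) is the standard x86 encoding.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OpcodeSequence:
    """Hypothetical container; not a type from the decoder sources."""
    opcode_bytes: List[int]
    modrm_reg: Optional[int] = None        # the n in "/ n"
    modrm_rm: Optional[int] = None         # the m in "/ n / m"
    opcode_register: Optional[int] = None  # the N in "- rN"


def parse_opcode_sequence(text: str) -> OpcodeSequence:
    """Parse strings such as "0f f6", "0f 01 / 1 / 0", or "50 - r3"."""
    seq = OpcodeSequence(opcode_bytes=[])
    # Split off a trailing "- rN" register selector, if present.
    match = re.search(r"-\s*r([0-7])\s*$", text)
    if match:
        seq.opcode_register = int(match.group(1))
        text = text[:match.start()]
    # Everything after the first "/" is a constraint on the modrm byte.
    parts = [p.strip() for p in text.split("/")]
    seq.opcode_bytes = [int(b, 16) for b in parts[0].split()]
    if len(parts) > 1:
        seq.modrm_reg = int(parts[1])
    if len(parts) > 2:
        seq.modrm_rm = int(parts[2])
    return seq


def modrm_matches(seq: OpcodeSequence, modrm: int) -> bool:
    """Check the reg (bits 5..3) and r/m (bits 2..0) fields of a modrm byte."""
    reg = (modrm >> 3) & 0x7
    rm = modrm & 0x7
    if seq.modrm_reg is not None and reg != seq.modrm_reg:
        return False
    if seq.modrm_rm is not None and rm != seq.modrm_rm:
        return False
    return True
```

For example, `parse_opcode_sequence("0f 01 / 1 / 0")` yields the two opcode bytes plus reg and r/m constraints, and `modrm_matches` then accepts only modrm bytes whose reg field is 1 and whose r/m field is 0. The special "0F 0F XX" layout, where XX is the last byte of the instruction, is not handled here.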
|
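The "Instruction Arguments" section of the removed README says that a print form is either a register written with a "%" prefix or a "$" followed by a FORM and a SIZE, and that implicit operands are wrapped in curly braces. A rough sketch of a parser for that notation follows. The FORM and SIZE names are copied from the README's lists. `PrintForm` and `parse_print_form` are hypothetical, the special form "One" is not covered, and treating a slash notation such as d/q as a list of SIZEs is an assumption.

```python
from typing import NamedTuple, Optional


class PrintForm(NamedTuple):
    """Hypothetical result type, introduced only for this sketch."""
    implicit: bool
    register: Optional[str]
    form: Optional[str]
    size: Optional[str]


# FORM names from the README, longest first so that "PR" wins over "P".
FORMS = sorted(
    ["A", "C", "D", "E", "F", "G", "I", "J", "M", "O", "P", "PR", "Q", "R",
     "S", "U", "V", "VR", "W", "X", "Y", "r8", "SG",
     "rAX", "rBP", "rBX", "rCX", "rDI", "rDX", "rSI", "rSP"],
    key=len, reverse=True)

# SIZE names from the README.
SIZES = {"a", "b", "d", "dq", "p", "pd", "pi", "ps", "q", "s", "sd", "si",
         "ss", "w", "v", "va", "vw", "vd", "vq", "z", "zw", "zd"}


def parse_print_form(text: str) -> PrintForm:
    """Split a print form such as "$Pq" or "{%rax}" into its parts."""
    # Curly braces mark an implicit operand.
    implicit = text.startswith("{") and text.endswith("}")
    if implicit:
        text = text[1:-1]
    # A "%" prefix names a specific register.
    if text.startswith("%"):
        return PrintForm(implicit, text[1:], None, None)
    if not text.startswith("$"):
        raise ValueError(f"not a print form: {text!r}")
    name = text[1:]
    for form in FORMS:
        size = name[len(form):]
        # Assumption: "d/q" style notation is a "/"-separated list of SIZEs.
        if name.startswith(form) and size and all(
                part in SIZES for part in size.split("/")):
            return PrintForm(implicit, None, form, size)
    raise ValueError(f"unknown FORM or SIZE in {text!r}")
```

Applied to the Movq example from the README, `parse_print_form("$Pq")` gives FORM `P` with SIZE `q`, and `parse_print_form("$Qq")` gives FORM `Q` with SIZE `q`.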
|
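The SIZE list in the removed README mixes fixed widths (b, w, d, q, dq) with widths that depend on the effective operand size (v, z) and with the added sub-variants (vw, vd, vq, zw, zd). The sketch below shows one way to resolve a SIZE to a bit width. `operand_bits` is a hypothetical helper, only the integer sizes are covered, and reading "only when the effective operand size matches" as "otherwise the rule does not apply" (returned as `None`) is an assumption.

```python
from typing import Optional

# Sizes whose width does not depend on the effective operand size.
FIXED_SIZE_BITS = {"b": 8, "w": 16, "d": 32, "q": 64, "dq": 128}

# Sub-variants: (width in bits, effective operand sizes they apply to).
VARIANT_SIZES = {
    "vw": (16, {16}),
    "vd": (32, {32}),
    "vq": (64, {64}),
    "zw": (16, {16}),
    "zd": (32, {32, 64}),
}


def operand_bits(size: str, effective_operand_size: int) -> Optional[int]:
    """Return the operand width in bits, or None if the SIZE does not apply."""
    if effective_operand_size not in (16, 32, 64):
        raise ValueError("effective operand size must be 16, 32, or 64")
    if size in FIXED_SIZE_BITS:
        return FIXED_SIZE_BITS[size]
    if size == "v":
        # Word, doubleword, or quadword, following the operand size.
        return effective_operand_size
    if size == "z":
        # Word for 16-bit operand size, doubleword for 32 or 64.
        return 16 if effective_operand_size == 16 else 32
    if size in VARIANT_SIZES:
        bits, applies_to = VARIANT_SIZES[size]
        return bits if effective_operand_size in applies_to else None
    raise ValueError(f"SIZE {size!r} is not covered by this sketch")
```

For instance, `operand_bits("z", 64)` resolves to 32, while `operand_bits("vq", 32)` returns `None` because the vq sub-variant only applies to a 64-bit effective operand size.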