src/trusted/validator/x86/decoder/README - Issue 625923004: Delete old x86 validator.

Unified Diff: src/trusted/validator/x86/decoder/README

Issue 625923004: Delete old x86 validator. (Closed) Base URL: svn://svn.chromium.org/native_client/trunk/src/native_client

Patch Set: rebase master Created 6 years, 2 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Download patch

Index: src/trusted/validator/x86/decoder/README

diff --git a/src/trusted/validator/x86/decoder/README b/src/trusted/validator/x86/decoder/README

deleted file mode 100644

index 68e70a9e71f8d453bdbe84daf2d36b48f9407bd5..0000000000000000000000000000000000000000

--- a/src/trusted/validator/x86/decoder/README

+++ /dev/null

@@ -1,440 +0,0 @@

-This directory implements an x86 decoder, from a table of modeled

-instructions.

-Note: Currently, this decoder is only used in the x86-64

-validator. However, the plan is to move the x86-32 validator to also

-use this decoder. See

-http://code.google.com/p/nativeclient/issues/detail?id=2154 for more

-details.

-ncopcode_desc.{h,c}

- Defines modeled instructions.

-nc_decode_tables.h

- Defines the structure of the generated table of modeled

- instructions.

-nc_inst_state.{h,c}

- Defines how to access parsed x86 instructions, and the x86

- instruction parser.

-nc_inst_state_statics.c

- Static routines that should be in nc_inst_state.c, but are

- included instead. This separation allows our testing code in

- nc_inst_state_tests.cc to test the static routines.

-nc_inst_iter.{h,c}

- Defines an iterator that walks the memory block and parses the

- instructions in the memory block.

-nc_inst_state_internal.h

- Defines the structures used to hold parsed instructions, and

- an iterator to parse such instructions in a memory block.

-ncop_exps.{h,c}

- Defines an expression tree (of arguments) that is (optionally)

- generated after the instruction is parsed.

-nc_inst_trans.{h,c}

- Defines a translator that takes the data in a parsed

- instruction, and generates the corresponding expression trees

- defined in ncop_exps.h

-ncopcode_insts.enum

- Defines the names of instructions recognized by the decoder.

-ncopcode_prefix.enum

- Defines the set of opcode/prefix bytes, besides the last

- matched byte, that are allowed in x86 instructions.

-ncopcode_ocpcode_flags.enum

- Defines the set of bit flags used to define how to decode an

- x86 instruction.

-ncopcode_operand_kind.enum

- Defines the different categories of operands, typically

- corresponding to operand argument descriptors like $E and $G.

-ncopcode_operand_flag.enum

- Defines the set of bit flags used to define additional

- information about operands of an instructions (such as its

- set/usage).

-ncop_expr_node_kind.enum

- Defines a label used to define each kind of expression node in

- a translated argument.

-ncop_expr_node_flag.enum

- Defines a set of bit flags used to define additional

- information about each node in a translated argument (such as

- set/usage).

-Modeled Instructions

---------------------

-Textual version(s) of the instructions understood by the validator can

-be found in the following files.

- native_client/src/trusted/validator_x86/testdata/64/modeled_insts.txt

- Defines the set of instructions understood by the (full)

- decoder.

- native_client/src/trusted/validator_x86/testdata/64/ncval_reg_sfi_modeled_insts.txt

- Defines the set of (partial) isntructions understood by the

- validator decoder.

-There are two types of modeled instructions. The first is "hard coded".

-The second is based on an optional prefix, and an opcode

-sequence.

-Hard coded modeled instructions represent explicit byte sequences that

-will be recognized. An example is as follows:

- --- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---

- 1f 386

- Nop

-The first line (between the "---" markers) defines the sequence of

-bytes that will be explicitly recognized as an instruction if that

-(exact) sequence is found.

-The second line starts with a the opcode value associated with this

-instruction (1f in this case). It is then followed by a the

-instruction set the matched instruction is in (386 in this case. The

-full set of cases can be found in enum NaClInstType defined in file

-../x86_insts.h).

-The third line describes the instruction that is assumed to be

-accepted by that sequence of bytes. In this case (and most cases) it

-is the nop instruction.

-If no hard coded instructions match the bytes to be decoded, the more

-general form is used. This form is based on an optional prefix, and an

-opcode sequence. An example is as follows:

- --- 6f ---

- 0f 6f MMX OpcodeUsesModRm

- Movq $Pq, $Qq

- Mmx_G_Operand OpSet OpDest

- Mmx_E_Operand OpUse

-The first line defines the opcode value matched, and is surrounded by

-"---" marks. Below this marker line are one (or more) instructions

-that can be matched by the same opcode (and optional prefix).

-The second line defines an optional prefix and the opcode sequence (0f

-6f in this case). For more details on this sequence, see "Opcode

-Sequences" below.

-The optional prefix and opcode sequence is followed by the instruction

-set the instruction is in (MMX in this case. The full set of cases can

-be found in enum NaClInstType defined in file ../x86_insts.h).

-The rest of second line are the set of instruction flags that define

-what additional bytes are necessary, and what conditions must be met

-for the instruction to be decoded. If any condition is not met, the

-next instruction in the list is tried. This process is continued until

-a match is found, or none of the instructions apply.

-The set of instruction flags that are accepted are defined in file

-ncopcode_opcode_flags.enum.

-The third line defines the instruction that is being decoded. It

-follows AMD's (and Intel's) syntax for instructions. If an argument is

-enclosed in curly braces, it represents an implicit argument (i.e. one

-that is used by the instruction but not part of the corresponding

-assembly instruction).

-The forms for valid arguments are defined in section "Instruction

-Arguments" below.

-The remaining lines of the instructions define the actual rules that

-will be used to extract that argument from the decoded instruction.

-The first element on the line defines the kind of the operand, and

-is specified in file ncopcode_operand_kind.enum. The remaining elements

-on the line are flags associated with that argument, are specified in

-file ncopcode_operand_flags.enum.

-Opcode Sequences

-----------------

-Each instruction is defined by an opcode sequence, that can be

-prefixed by an optional prefix (i.e. 66, f2, or F3 if it is a

-multi-byte opcode sequence). An example opcode sequence is:

- 0f f6

-Opcode sequences can also be buried in the modrm byte, or the opcode

-byte. To clarify this, additional modifiers may be added to the end

-of the sequence.

-If the sequence is followed by a "/ n", then n defines the value that

-must be in the reg field of the modrm byte. If the sequence is

-followed by a "/ n / m", then n defines the value that must be in the

-reg field of the modrm byte, while m defines the value that must by in

-the r/m feild if the modrm byte.

-If the sequence is followed by a "- rN", then the instruction is one

-that encodes a register selection as part of the opcode. Fegister N

-(0..7) is to be used by the instruction.

-If the opcode sequence is of the form "... 0F 0F XX", then XX appears

-as the last byte of the instruction, rather than the next byte after

-the two 0F bytes. This allows us to recognize E3DNOW instructions.

-Instruction Arguments

----------------------

-The modeled instructions specify what assembly instructions are

-recognized by the decoder. The form used is based on the AMD (R)

-document 24594-Rev.3.14-September 2007, "AMD64 Architecture

-Programmer's manual Volume 3: General-Purpose and System

-Instructions", and Intel (R) docuements 253666-030US - March 2009,

-"Intel 654 and IA-32 Architectures Software Developer's Manual,

-Volume2A: Instruction Set Reference, A-M" and 253667-030US - March

-2009, "Intel 654 and IA-32 Architectures Software Developer's Manual,

-Volume2B: Instruction Set Reference, N-Z". In particular, it tries to

-follow the print forms defined by AMD's "Appendex section A.1 -

-Opcode-Syntax Notation", or Intel's "Appendix Section A.2 - Key To

-Abbreviations". These forms are summarized here. For more detailed

-information see

-native_client/src/trusted/validator_x86/ncdecode_forms.h.

-A print form describes an argument. If the operand is implicit (i.e.

-it defines a register/memory value effected by the instruction, but is

-not part of the assembly form) it is enclosed in curly braces. If the

-print form corresponds to a register. the register is specified by

-proceeding the name with the "%" prefix.

-All other print forms define a set of possible arguments. It begins

-with the character '$', and is followed by a name. The name consists

-of a FORM, followed by a size specification.

-Valid FORM's are (note: as mentioned above, these forms follow the

-conventions of both AMD and Intel):

- A - Far pointer is encoded in the instruction.

- C - Control register specified by the ModRM reg field.

- D - Debug register specified by the ModRM reg field.

- E - General purpose register or memory operand specified by the

- ModRm byte. Memory addresses can be computed from a segment

- register, SIB byte, and/or displacement.

- F - rFLAGS register.

- G - General purpose register specified by the ModRm reg field.

- I - Immediate value.

- J - The instruction includes a relative offset that is added to

- the rIP register.

- M - A memory operand specified by the ModRM byte.

- O - The offset of an operand is encoded in the

- instruction. There is no ModRm byte in the

- instruction. Complex addressing using the SIB byte cannot be

- done.

- P - 64-bit MMX register specified by the ModRM reg field.

- PR - 64 bit MMX register specified by the ModRM r/m field. The

- ModRM mod field must be 11b.

- Q - 64 bit MMX register or memory operand specified by the ModRM

- byte. Memory addresses can be computed from a segment

- register, SIB byte, and/or displacement.

- R - General purpose register specified by the ModRM r/m

- field. The ModeRm mod field must be 11b.

- S - Segment register specified by the ModRM reg field.

- U - The R/Mfield of the ModR/M byte selects a 128-bit XMM register.

- V - 128-bit XMM register specified by the ModRM reg field.

- VR - 128-bit XMM register specified by the ModRM r/m field. The

- ModRM mod field must be 11b.

- W - 128 Xmm register or memory operand specified by the ModRm

- Byte. Memory addresses can be computed from a segment

- register, SIB byte, and/or displacement.

- X - A memory operand addressed by the DS.rSI registers. Used in

- string instructions.

- Y - A memory operand addressed by the ES.rDI registers. Used in string

- instructions.

- r8 - The 8 registers rAX, rCX, rDX, rBX, rSP, rBP, rSI, rDI, and

- the optional registers r8-r15 if REX.b is set, based on the

- register value embedded in the opcode.

- SG - segment address defined by a G expression and the segment

- register in the corresponding mnemonic (lds, les, lfs, lgs,

- lss).

- rAX - The register AX, EAX, or RAX, depending on SIZE.

- rBP - The register BP, EBP, or RBP, depending on SIZE.

- rBX - The register BX, EBX, or RBX, depending on SIZE.

- rCX - The register CX, ECX, or RCX, depending on SIZE.

- rDI - The register DI, EDI, or RDI, depending on SIZE.

- rDX - The register DX, EDX, or RDX, depending on SIZE.

- rSI - The register SI, ESI, or RSI, depending on SIZE.

- rSP - The register SP, ESP, or RSP, depending on SIZE.

-Note: r8 is not in the manuals cited above. It has been added to deal

-with instructions with an embedded register in the opcode. In such

-cases, this value allows a single defining call to be used (within a

-for loop), rather than writing eight separate rules (one for each

-possible register value).

-Valid SIZEs are (note: as mentioned above, these forms follow the

-conventions of both AMD and Intel):

- a - Two 16-bit or 32-bit memory operands, depending on the

- effective operand size. Used in the BOUND instruction.

- b - A byte, irrespective of the effective operand size.

- d - A doubleword (32-bits), irrespective of the effective operand size.

- dq - A douible-quadword (128 bits), irrespective of the effective

- operand size.

- p - A 32-bit or 48-bit far pointer, depending on the effective

- operand size.

- pd - A 128-bit double-precision floating point vector operand

- (packed double).

- pi - A 64-bit MMX operand (packed integer).

- ps - A 138-bit single precision floating point vector operand

- (packed single).

- q - A quadword, irrespective of the effective operand size.

- s - A 6-byte or 10-byte pseudo-descriptor.

- sd - A scalar double-precision floating point operand (scalar

- double).

- si - A scalar doubleword (32-bit) integer operand (scalar

- integer).

- ss - A scalar single-precision floating-point operand (scalar

- single).

- w - A word, irrespective of the effective operand size.

- v - A word, doubleword, or quadword, depending on the effective

- operand size.

- va - A word, doubleword, or quadword, depending on the effective

- address size.

- vw - A word only when the effective operand size matches.

- vd - A doubleword only when the effective operand size matches.

- vq - A quadword only when the effective operand size matches.

- w - A word, irrespective of the effective operand size.

- z - A word if the effective operand size is 16 bits, or a

- doubleword if the effective operand size is 32 or 64 bits.

- zw - A word only when the effective operand size matches.

- zd - A doubleword only when the effective operand size is 32 or

- 64 bits.

-Note: vw, vd, vq, zw, and zd are not in the manuals cited

-above. However, they have been added so that sub-variants of an v/z

-instruction (not specified in the manual) can be specified.

-Note: The AMD manual uses some slash notations (such as d/q) which isn't

-explicitly defined. In general, we allow such notation as specified in

-the AMD manual. Depending on the use, it can mean any of the following:

- (1) In 32-bit mode, d is used. In 64-bit mode, q is used.

- (2) only 32-bit or 64-bit values are allowed.

-In addition, when the nmemonic name changes based on which value is

-chosen in d/q, we use d/q/d to denote the 32-bit case, and d/q/q to

-denote the 64 bit case.

-In addition, this code adds the following special print forms:

- One - The literal constant 1.

-Debugging

----------

-Many of the source files contain #define DEBUGGING flags. When

-DEBUGGING is set to 1, additional debugging print messages are

-compiled into the code. Unfortunately, by default, these message

-frequently call routines that are not compiled into corresponding

-executables (such as ncval and ncdis). To add the additional routines,

-edit file

- native_client/site_scons/site_tools/library_deps.py

-For x86-32, edit lines

- # When turning on the DEBUGGING flag in the x86-32 validator

- # or decoder, add the following:

- #'nc_opcode_modeling_verbose_x86_32',

-to

- # When turning on the DEBUGGING flag in the x86-32 validator

- # or decoder, add the following:

- 'nc_opcode_modeling_verbose_x86_32',

-For x86-64, edit lines

- # When turning on the DEBUGGING flag in the x86-64 validator

- # or decoder, add the following:

- # 'nc_opcode_modeling_verbose_x86_64',

-to

- # When turning on the DEBUGGING flag in the x86-64 validator

- # or decoder, add the following:

- 'nc_opcode_modeling_verbose_x86_64',

-These changes will make sure that the corresponding print routines are

-added to the executables during link time.

« no previous file with comments | « src/trusted/validator/x86/README ('k') | src/trusted/validator/x86/decoder/gen/nc_opcode_table_32.h » ('j') | no next file with comments »