| OLD | NEW |
| (Empty) |
| 1 This directory implements an x86 decoder, from a table of modeled | |
| 2 instructions. | |
| 3 | |
| 4 Note: Currently, this decoder is only used in the x86-64 | |
| 5 validator. However, the plan is to move the x86-32 validator to also | |
| 6 use this decoder. See | |
| 7 http://code.google.com/p/nativeclient/issues/detail?id=2154 for more | |
| 8 details. | |
| 9 | |
| 10 ncopcode_desc.{h,c} | |
| 11 | |
| 12 Defines modeled instructions. | |
| 13 | |
| 14 nc_decode_tables.h | |
| 15 | |
| 16 Defines the structure of the generated table of modeled | |
| 17 instructions. | |
| 18 | |
| 19 nc_inst_state.{h,c} | |
| 20 | |
| 21 Defines how to access parsed x86 instructions, and the x86 | |
| 22 instruction parser. | |
| 23 | |
| 24 nc_inst_state_statics.c | |
| 25 | |
| 26 Static routines that should be in nc_inst_state.c, but are | |
| 27 included instead. This separation allows our testing code in | |
| 28 nc_inst_state_tests.cc to test the static routines. | |
| 29 | |
| 30 nc_inst_iter.{h,c} | |
| 31 | |
| 32 Defines an iterator that walks the memory block and parses the | |
| 33 instructions in the memory block. | |
| 34 | |
| 35 nc_inst_state_internal.h | |
| 36 | |
| 37 Defines the structures used to hold parsed instructions, and | |
| 38 an iterator to parse such instructions in a memory block. | |
| 39 | |
| 40 ncop_exps.{h,c} | |
| 41 | |
| 42 Defines an expression tree (of arguments) that is (optionally) | |
| 43 generated after the instruction is parsed. | |
| 44 | |
| 45 nc_inst_trans.{h,c} | |
| 46 | |
| 47 Defines a translator that takes the data in a parsed | |
| 48 instruction, and generates the corresponding expression trees | |
| 49 defined in ncop_exps.h | |
| 50 | |
| 51 ncopcode_insts.enum | |
| 52 | |
| 53 Defines the names of instructions recognized by the decoder. | |
| 54 | |
| 55 ncopcode_prefix.enum | |
| 56 | |
| 57 Defines the set of opcode/prefix bytes, besides the last | |
| 58 matched byte, that are allowed in x86 instructions. | |
| 59 | |
| 60 ncopcode_opcode_flags.enum | |
| 61 | |
| 62 Defines the set of bit flags used to define how to decode an | |
| 63 x86 instruction. | |
| 64 | |
| 65 ncopcode_operand_kind.enum | |
| 66 | |
| 67 Defines the different categories of operands, typically | |
| 68 corresponding to operand argument descriptors like $E and $G. | |
| 69 | |
| 70 ncopcode_operand_flag.enum | |
| 71 | |
| 72 Defines the set of bit flags used to define additional | |
| 73 information about the operands of an instruction (such as their | |
| 74 set/usage). | |
| 75 | |
| 76 ncop_expr_node_kind.enum | |
| 77 | |
| 78 Defines a label used to define each kind of expression node in | |
| 79 a translated argument. | |
| 80 | |
| 81 ncop_expr_node_flag.enum | |
| 82 | |
| 83 Defines a set of bit flags used to define additional | |
| 84 information about each node in a translated argument (such as | |
| 85 set/usage). | |
| 86 | |
| 87 Modeled Instructions | |
| 88 -------------------- | |
| 89 | |
| 90 Textual version(s) of the instructions understood by the validator can | |
| 91 be found in the following files. | |
| 92 | |
| 93 native_client/src/trusted/validator_x86/testdata/64/modeled_insts.txt | |
| 94 | |
| 95 Defines the set of instructions understood by the (full) | |
| 96 decoder. | |
| 97 | |
| 98 native_client/src/trusted/validator_x86/testdata/64/ncval_reg_sfi_modeled_insts.txt | |
| 99 | |
| 100 Defines the set of (partial) instructions understood by the | |
| 101 validator decoder. | |
| 102 | |
| 103 There are two types of modeled instructions. The first is "hard coded". | |
| 104 The second is based on an optional prefix, and an opcode | |
| 105 sequence. | |
| 106 | |
| 107 Hard coded modeled instructions represent explicit byte sequences that | |
| 108 will be recognized. An example is as follows: | |
| 109 | |
| 110 --- 66 66 66 2e 0f 1f 84 00 00 00 00 00 --- | |
| 111 1f 386 | |
| 112 Nop | |
| 113 | |
| 114 The first line (between the "---" markers) defines the sequence of | |
| 115 bytes that will be explicitly recognized as an instruction if that | |
| 116 (exact) sequence is found. | |
| 117 | |
| 118 The second line starts with the opcode value associated with this | |
| 119 instruction (1f in this case). It is followed by the instruction | |
| 120 set that the matched instruction is in (386 in this case; the | |
| 121 full set of cases can be found in enum NaClInstType, defined in file | |
| 122 ../x86_insts.h). | |
| 123 | |
| 124 The third line describes the instruction that is assumed to be | |
| 125 accepted by that sequence of bytes. In this case (and most cases) it | |
| 126 is the nop instruction. | |
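As a rough illustration of the hard coded case, the sketch below parses a marker line into bytes and tests whether a buffer starts with that exact sequence. The function names (parse_marker_line, matches_hard_coded) are hypothetical and chosen only for this example; the decoder itself is table-driven C code and may work differently.

```python
def parse_marker_line(line):
    """Return the bytes listed between the '---' markers of a marker line."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "---" or fields[-1] != "---":
        raise ValueError("not a marker line: %r" % line)
    # Each field between the markers is one byte written in hexadecimal.
    return bytes(int(field, 16) for field in fields[1:-1])


def matches_hard_coded(sequence, memory, offset=0):
    """True if memory holds exactly `sequence` starting at `offset`."""
    return memory[offset:offset + len(sequence)] == sequence


# The hard coded nop from the example above.
NOP_SEQUENCE = parse_marker_line("--- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---")
```

A caller would try each hard coded sequence at the current offset and fall back to the general form only if none matches.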
| 127 | |
| 128 If no hard coded instructions match the bytes to be decoded, the more | |
| 129 general form is used. This form is based on an optional prefix, and an | |
| 130 opcode sequence. An example is as follows: | |
| 131 | |
| 132 --- 6f --- | |
| 133 0f 6f MMX OpcodeUsesModRm | |
| 134 Movq $Pq, $Qq | |
| 135 Mmx_G_Operand OpSet OpDest | |
| 136 Mmx_E_Operand OpUse | |
| 137 | |
| 138 The first line defines the opcode value matched, and is surrounded by | |
| 139 "---" marks. Below this marker line are one (or more) instructions | |
| 140 that can be matched by the same opcode (and optional prefix). | |
| 141 | |
| 142 The second line defines an optional prefix and the opcode sequence (0f | |
| 143 6f in this case). For more details on this sequence, see "Opcode | |
| 144 Sequences" below. | |
| 145 | |
| 146 The optional prefix and opcode sequence are followed by the instruction | |
| 147 set the instruction is in (MMX in this case; the full set of cases can | |
| 148 be found in enum NaClInstType, defined in file ../x86_insts.h). | |
| 149 | |
| 150 The rest of the second line is the set of instruction flags that define | |
| 151 what additional bytes are necessary, and what conditions must be met | |
| 152 for the instruction to be decoded. If any condition is not met, the | |
| 153 next instruction in the list is tried. This process is continued until | |
| 154 a match is found, or none of the instructions apply. | |
| 155 | |
| 156 The set of instruction flags that are accepted are defined in file | |
| 157 ncopcode_opcode_flags.enum. | |
| 158 | |
| 159 The third line defines the instruction that is being decoded. It | |
| 160 follows AMD's (and Intel's) syntax for instructions. If an argument is | |
| 161 enclosed in curly braces, it represents an implicit argument (i.e. one | |
| 162 that is used by the instruction but not part of the corresponding | |
| 163 assembly instruction). | |
| 164 | |
| 165 The forms for valid arguments are defined in section "Instruction | |
| 166 Arguments" below. | |
| 167 | |
| 168 The remaining lines of the instruction define the rules that | |
| 169 will be used to extract each argument from the decoded instruction. | |
| 170 The first element on the line defines the kind of the operand, and | |
| 171 is specified in file ncopcode_operand_kind.enum. The remaining elements | |
| 172 on the line are flags associated with that argument, and are specified in | |
| 173 file ncopcode_operand_flag.enum. | |
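The layout of a general-form entry can be sketched as a small parser. This is only an illustration under stated assumptions: parse_entry and the dictionary keys are hypothetical names, tokens of exactly two hex digits at the start of the second line are taken to be the optional prefix and opcode sequence, and sequence modifiers such as "/ n" or "- rN" are not handled.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def parse_entry(lines):
    """Parse one simple modeled-instruction entry (no sequence modifiers)."""
    marker, header, instruction = lines[0], lines[1], lines[2]
    tokens = header.split()
    # Leading two-digit hex tokens form the optional prefix and opcode sequence.
    count = 0
    while count < len(tokens) and HEX_BYTE.match(tokens[count]):
        count += 1
    operands = []
    for line in lines[3:]:
        if line.strip():
            # First element is the operand kind, the rest are operand flags.
            kind, *flags = line.split()
            operands.append({"kind": kind, "flags": flags})
    return {
        "matched_opcode": marker.split()[1:-1],
        "opcode_sequence": tokens[:count],
        "inst_type": tokens[count],
        "inst_flags": tokens[count + 1:],
        "instruction": instruction.strip(),
        "operands": operands,
    }


ENTRY = parse_entry([
    "--- 6f ---",
    "0f 6f MMX OpcodeUsesModRm",
    "Movq $Pq, $Qq",
    "Mmx_G_Operand OpSet OpDest",
    "Mmx_E_Operand OpUse",
])
```

For the Movq example, ENTRY["opcode_sequence"] holds the two tokens 0f and 6f, and ENTRY["operands"] holds one record per operand line.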
| 174 | |
| 175 Opcode Sequences | |
| 176 ---------------- | |
| 177 | |
| 178 Each instruction is defined by an opcode sequence, which can be | |
| 179 preceded by an optional prefix (i.e. 66, f2, or f3 if it is a | |
| 180 multi-byte opcode sequence). An example opcode sequence is: | |
| 181 | |
| 182 0f f6 | |
| 183 | |
| 184 Opcode sequences can also be buried in the modrm byte, or the opcode | |
| 185 byte. To clarify this, additional modifiers may be added to the end | |
| 186 of the sequence. | |
| 187 | |
| 188 If the sequence is followed by a "/ n", then n defines the value that | |
| 189 must be in the reg field of the modrm byte. If the sequence is | |
| 190 followed by a "/ n / m", then n defines the value that must be in the | |
| 191 reg field of the modrm byte, while m defines the value that must be in | |
| 192 the r/m field of the modrm byte. | |
| 193 | |
| 194 If the sequence is followed by a "- rN", then the instruction is one | |
| 195 that encodes a register selection as part of the opcode. Register N | |
| 196 (0..7) is to be used by the instruction. | |
| 197 | |
| 198 If the opcode sequence is of the form "... 0F 0F XX", then XX appears | |
| 199 as the last byte of the instruction, rather than the next byte after | |
| 200 the two 0F bytes. This allows us to recognize E3DNOW instructions. | |
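The "/ n", "/ n / m", and "- rN" modifiers can be illustrated with the standard x86 field layout: the modrm byte holds mod in bits 7..6, reg in bits 5..3, and r/m in bits 2..0, and an opcode with an embedded register carries the register number in its low three bits. The helper names below are hypothetical and do not come from the decoder sources.

```python
def split_modrm(byte):
    """Split a modrm byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7


def modrm_satisfies(byte, reg=None, rm=None):
    """Check the '/ n' (reg) and optional '/ n / m' (reg, r/m) constraints."""
    _, actual_reg, actual_rm = split_modrm(byte)
    if reg is not None and actual_reg != reg:
        return False
    if rm is not None and actual_rm != rm:
        return False
    return True


def embedded_register(opcode_byte):
    """Register number (0..7) held in the low three bits of an opcode ('- rN')."""
    return opcode_byte & 0x7
```

For example, the modrm byte c8 splits into mod 3, reg 1, r/m 0, so it satisfies "/ 1" but not "/ 1 / 1".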
| 201 | |
| 202 Instruction Arguments | |
| 203 --------------------- | |
| 204 | |
| 205 The modeled instructions specify what assembly instructions are | |
| 206 recognized by the decoder. The form used is based on the AMD (R) | |
| 207 document 24594-Rev.3.14-September 2007, "AMD64 Architecture | |
| 208 Programmer's Manual Volume 3: General-Purpose and System | |
| 209 Instructions", and Intel (R) documents 253666-030US - March 2009, | |
| 210 "Intel 64 and IA-32 Architectures Software Developer's Manual, | |
| 211 Volume 2A: Instruction Set Reference, A-M" and 253667-030US - March | |
| 212 2009, "Intel 64 and IA-32 Architectures Software Developer's Manual, | |
| 213 Volume 2B: Instruction Set Reference, N-Z". In particular, it tries to | |
| 214 follow the print forms defined by AMD's "Appendix section A.1 - | |
| 215 Opcode-Syntax Notation", or Intel's "Appendix Section A.2 - Key To | |
| 216 Abbreviations". These forms are summarized here. For more detailed | |
| 217 information see | |
| 218 native_client/src/trusted/validator_x86/ncdecode_forms.h. | |
| 219 | |
| 220 A print form describes an argument. If the operand is implicit (i.e. | |
| 221 it defines a register/memory value affected by the instruction, but is | |
| 222 not part of the assembly form) it is enclosed in curly braces. If the | |
| 223 print form corresponds to a register, the register is specified by | |
| 224 preceding the name with the "%" prefix. | |
| 225 | |
| 226 All other print forms define a set of possible arguments. Each begins | |
| 227 with the character '$', and is followed by a name. The name consists | |
| 228 of a FORM, followed by a size specification. | |
| 229 | |
| 230 Valid FORMs are (note: as mentioned above, these forms follow the | |
| 231 conventions of both AMD and Intel): | |
| 232 | |
| 233 A - Far pointer is encoded in the instruction. | |
| 234 | |
| 235 C - Control register specified by the ModRM reg field. | |
| 236 | |
| 237 D - Debug register specified by the ModRM reg field. | |
| 238 | |
| 239 E - General purpose register or memory operand specified by the | |
| 240 ModRm byte. Memory addresses can be computed from a segment | |
| 241 register, SIB byte, and/or displacement. | |
| 242 | |
| 243 F - rFLAGS register. | |
| 244 | |
| 245 G - General purpose register specified by the ModRm reg field. | |
| 246 | |
| 247 I - Immediate value. | |
| 248 | |
| 249 J - The instruction includes a relative offset that is added to | |
| 250 the rIP register. | |
| 251 | |
| 252 M - A memory operand specified by the ModRM byte. | |
| 253 | |
| 254 O - The offset of an operand is encoded in the | |
| 255 instruction. There is no ModRm byte in the | |
| 256 instruction. Complex addressing using the SIB byte cannot be | |
| 257 done. | |
| 258 | |
| 259 P - 64-bit MMX register specified by the ModRM reg field. | |
| 260 | |
| 261 PR - 64-bit MMX register specified by the ModRM r/m field. The | |
| 262 ModRM mod field must be 11b. | |
| 263 | |
| 264 Q - 64-bit MMX register or memory operand specified by the ModRM | |
| 265 byte. Memory addresses can be computed from a segment | |
| 266 register, SIB byte, and/or displacement. | |
| 267 | |
| 268 R - General purpose register specified by the ModRM r/m | |
| 269 field. The ModRM mod field must be 11b. | |
| 270 | |
| 271 S - Segment register specified by the ModRM reg field. | |
| 272 | |
| 273 U - The r/m field of the ModRM byte selects a 128-bit XMM register. | |
| 274 | |
| 275 V - 128-bit XMM register specified by the ModRM reg field. | |
| 276 | |
| 277 VR - 128-bit XMM register specified by the ModRM r/m field. The | |
| 278 ModRM mod field must be 11b. | |
| 279 | |
| 280 W - 128-bit XMM register or memory operand specified by the ModRM | |
| 281 byte. Memory addresses can be computed from a segment | |
| 282 register, SIB byte, and/or displacement. | |
| 283 | |
| 284 X - A memory operand addressed by the DS.rSI registers. Used in | |
| 285 string instructions. | |
| 286 | |
| 287 Y - A memory operand addressed by the ES.rDI registers. Used in string | |
| 288 instructions. | |
| 289 | |
| 290 r8 - The 8 registers rAX, rCX, rDX, rBX, rSP, rBP, rSI, rDI, and | |
| 291 the optional registers r8-r15 if REX.b is set, based on the | |
| 292 register value embedded in the opcode. | |
| 293 | |
| 294 SG - segment address defined by a G expression and the segment | |
| 295 register in the corresponding mnemonic (lds, les, lfs, lgs, | |
| 296 lss). | |
| 297 | |
| 298 rAX - The register AX, EAX, or RAX, depending on SIZE. | |
| 299 | |
| 300 rBP - The register BP, EBP, or RBP, depending on SIZE. | |
| 301 | |
| 302 rBX - The register BX, EBX, or RBX, depending on SIZE. | |
| 303 | |
| 304 rCX - The register CX, ECX, or RCX, depending on SIZE. | |
| 305 | |
| 306 rDI - The register DI, EDI, or RDI, depending on SIZE. | |
| 307 | |
| 308 rDX - The register DX, EDX, or RDX, depending on SIZE. | |
| 309 | |
| 310 rSI - The register SI, ESI, or RSI, depending on SIZE. | |
| 311 | |
| 312 rSP - The register SP, ESP, or RSP, depending on SIZE. | |
| 313 | |
| 314 Note: r8 is not in the manuals cited above. It has been added to deal | |
| 315 with instructions with an embedded register in the opcode. In such | |
| 316 cases, this value allows a single defining call to be used (within a | |
| 317 for loop), rather than writing eight separate rules (one for each | |
| 318 possible register value). | |
| 319 | |
| 320 | |
| 321 Valid SIZEs are (note: as mentioned above, these forms follow the | |
| 322 conventions of both AMD and Intel): | |
| 323 | |
| 324 a - Two 16-bit or 32-bit memory operands, depending on the | |
| 325 effective operand size. Used in the BOUND instruction. | |
| 326 | |
| 327 b - A byte, irrespective of the effective operand size. | |
| 328 | |
| 329 d - A doubleword (32 bits), irrespective of the effective operand size. | |
| 330 | |
| 331 dq - A double-quadword (128 bits), irrespective of the effective | |
| 332 operand size. | |
| 333 | |
| 334 p - A 32-bit or 48-bit far pointer, depending on the effective | |
| 335 operand size. | |
| 336 | |
| 337 pd - A 128-bit double-precision floating point vector operand | |
| 338 (packed double). | |
| 339 | |
| 340 pi - A 64-bit MMX operand (packed integer). | |
| 341 | |
| 342 ps - A 128-bit single-precision floating point vector operand | |
| 343 (packed single). | |
| 344 | |
| 345 q - A quadword, irrespective of the effective operand size. | |
| 346 | |
| 347 s - A 6-byte or 10-byte pseudo-descriptor. | |
| 348 | |
| 349 sd - A scalar double-precision floating point operand (scalar | |
| 350 double). | |
| 351 | |
| 352 si - A scalar doubleword (32-bit) integer operand (scalar | |
| 353 integer). | |
| 354 | |
| 355 ss - A scalar single-precision floating-point operand (scalar | |
| 356 single). | |
| 357 | |
| 358 w - A word, irrespective of the effective operand size. | |
| 359 | |
| 360 v - A word, doubleword, or quadword, depending on the effective | |
| 361 operand size. | |
| 362 | |
| 363 va - A word, doubleword, or quadword, depending on the effective | |
| 364 address size. | |
| 365 | |
| 366 vw - A word only when the effective operand size matches. | |
| 367 | |
| 368 vd - A doubleword only when the effective operand size matches. | |
| 369 | |
| 370 | |
| 371 vq - A quadword only when the effective operand size matches. | |
| 372 | |
| 375 z - A word if the effective operand size is 16 bits, or a | |
| 376 doubleword if the effective operand size is 32 or 64 bits. | |
| 377 | |
| 378 zw - A word only when the effective operand size matches. | |
| 379 | |
| 380 zd - A doubleword only when the effective operand size is 32 or | |
| 381 64 bits. | |
| 382 | |
| 383 Note: vw, vd, vq, zw, and zd are not in the manuals cited | |
| 384 above. However, they have been added so that sub-variants of a v/z | |
| 385 instruction (not specified in the manual) can be specified. | |
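To make the FORM and SIZE notation concrete, the sketch below splits a print form such as $Pq into its parts and computes the width selected by the v and z sizes. It is a minimal illustration, not the decoder's code: parse_print_form and operand_bits are hypothetical names, the FORM and SIZE tables are copied from the lists above, and slash notations such as d/q and register forms using the "%" prefix are not handled.

```python
FORMS = ["A", "C", "D", "E", "F", "G", "I", "J", "M", "O", "P", "PR", "Q",
         "R", "S", "U", "V", "VR", "W", "X", "Y", "r8", "SG", "rAX", "rBP",
         "rBX", "rCX", "rDI", "rDX", "rSI", "rSP"]
SIZES = {"a", "b", "d", "dq", "p", "pd", "pi", "ps", "q", "s", "sd", "si",
         "ss", "w", "v", "va", "vw", "vd", "vq", "z", "zw", "zd"}


def parse_print_form(text):
    """Split a '$' print form into (FORM, SIZE, implicit)."""
    implicit = text.startswith("{") and text.endswith("}")
    if implicit:
        text = text[1:-1]  # Curly braces mark an implicit operand.
    if not text.startswith("$"):
        raise ValueError("not a '$' print form: %r" % text)
    name = text[1:]
    # Try longer FORMs first so that PR is preferred over P, and VR over V.
    for form in sorted(FORMS, key=len, reverse=True):
        if name.startswith(form) and name[len(form):] in SIZES:
            return form, name[len(form):], implicit
    raise ValueError("unknown print form: %r" % text)


def operand_bits(size, effective_operand_size):
    """Width in bits for the 'v' and 'z' SIZEs (operand size 16, 32, or 64)."""
    if size == "v":
        return effective_operand_size
    if size == "z":
        return 16 if effective_operand_size == 16 else 32
    raise ValueError("size %r does not depend on the operand size" % size)
```

With these definitions, $Pq splits into FORM P and SIZE q, and a z operand stays 32 bits wide when the effective operand size is 64.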
| 386 | |
| 387 Note: The AMD manual uses some slash notations (such as d/q) which aren't | |
| 388 explicitly defined. In general, we allow such notation as specified in | |
| 389 the AMD manual. Depending on the use, it can mean any of the following: | |
| 390 | |
| 391 (1) In 32-bit mode, d is used. In 64-bit mode, q is used. | |
| 392 | |
| 393 (2) Only 32-bit or 64-bit values are allowed. | |
| 394 | |
| 395 In addition, when the mnemonic name changes based on which value is | |
| 396 chosen in d/q, we use d/q/d to denote the 32-bit case, and d/q/q to | |
| 397 denote the 64-bit case. | |
| 398 | |
| 399 In addition, this code adds the following special print forms: | |
| 400 | |
| 401 One - The literal constant 1. | |
| 402 | |
| 403 Debugging | |
| 404 --------- | |
| 405 | |
| 406 Many of the source files contain #define DEBUGGING flags. When | |
| 407 DEBUGGING is set to 1, additional debugging print messages are | |
| 408 compiled into the code. Unfortunately, by default, these messages | |
| 409 frequently call routines that are not compiled into corresponding | |
| 410 executables (such as ncval and ncdis). To add the additional routines, | |
| 411 edit file | |
| 412 | |
| 413 native_client/site_scons/site_tools/library_deps.py | |
| 414 | |
| 415 For x86-32, edit lines | |
| 416 | |
| 417 # When turning on the DEBUGGING flag in the x86-32 validator | |
| 418 # or decoder, add the following: | |
| 419 #'nc_opcode_modeling_verbose_x86_32', | |
| 420 | |
| 421 to | |
| 422 | |
| 423 # When turning on the DEBUGGING flag in the x86-32 validator | |
| 424 # or decoder, add the following: | |
| 425 'nc_opcode_modeling_verbose_x86_32', | |
| 426 | |
| 427 For x86-64, edit lines | |
| 428 | |
| 429 # When turning on the DEBUGGING flag in the x86-64 validator | |
| 430 # or decoder, add the following: | |
| 431 # 'nc_opcode_modeling_verbose_x86_64', | |
| 432 | |
| 433 to | |
| 434 | |
| 435 # When turning on the DEBUGGING flag in the x86-64 validator | |
| 436 # or decoder, add the following: | |
| 437 'nc_opcode_modeling_verbose_x86_64', | |
| 438 | |
| 439 These changes will make sure that the corresponding print routines are | |
| 440 added to the executables during link time. | |