This directory implements an x86 decoder that is driven by a table of
modeled instructions.

Note: Currently, this decoder is only used in the x86-64
validator. However, the plan is to move the x86-32 validator to also
use this decoder. See
http://code.google.com/p/nativeclient/issues/detail?id=2154 for more
details.

ncopcode_desc.{h,c}

    Defines modeled instructions.

nc_decode_tables.h

    Defines the structure of the generated table of modeled
    instructions.

nc_inst_state.{h,c}

    Defines how to access parsed x86 instructions, and the x86
    instruction parser.

nc_inst_state_statics.c

    Static routines that logically belong in nc_inst_state.c, but are
    kept in this file and included into nc_inst_state.c instead. This
    separation allows our testing code in nc_inst_state_tests.cc to
    test the static routines.

nc_inst_iter.{h,c}

    Defines an iterator that walks a memory block and parses the
    instructions in it.

nc_inst_state_internal.h

    Defines the structures used to hold parsed instructions, and
    an iterator to parse such instructions in a memory block.

ncop_exps.{h,c}

    Defines an expression tree (of arguments) that is optionally
    generated after the instruction is parsed.

nc_inst_trans.{h,c}

    Defines a translator that takes the data in a parsed
    instruction and generates the corresponding expression trees
    defined in ncop_exps.h.

ncopcode_insts.enum

    Defines the names of instructions recognized by the decoder.

ncopcode_prefix.enum

    Defines the set of opcode/prefix bytes, besides the last
    matched byte, that are allowed in x86 instructions.

ncopcode_opcode_flags.enum

    Defines the set of bit flags used to define how to decode an
    x86 instruction.

ncopcode_operand_kind.enum

    Defines the different categories of operands, typically
    corresponding to operand argument descriptors like $E and $G.

ncopcode_operand_flag.enum

    Defines the set of bit flags used to define additional
    information about the operands of an instruction (such as
    set/usage).

ncop_expr_node_kind.enum

    Defines a label used to define each kind of expression node in
    a translated argument.

ncop_expr_node_flag.enum

    Defines a set of bit flags used to define additional
    information about each node in a translated argument (such as
    set/usage).

Modeled Instructions
--------------------

Textual versions of the instructions understood by the validator can
be found in the following files:

native_client/src/trusted/validator_x86/testdata/64/modeled_insts.txt

    Defines the set of instructions understood by the (full)
    decoder.

native_client/src/trusted/validator_x86/testdata/64/ncval_reg_sfi_modeled_insts.txt

    Defines the set of (partial) instructions understood by the
    validator decoder.

There are two types of modeled instructions. The first is "hard
coded". The second is based on an optional prefix and an opcode
sequence.

Hard-coded modeled instructions represent explicit byte sequences that
will be recognized. An example is as follows:

    --- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---
    1f 386
    Nop

The first line (between the "---" markers) defines the sequence of
bytes that will be explicitly recognized as an instruction if that
exact sequence is found.

The second line starts with the opcode value associated with this
instruction (1f in this case). It is followed by the instruction set
the matched instruction belongs to (386 in this case; the full set of
cases can be found in enum NaClInstType, defined in file
../x86_insts.h).

The third line describes the instruction that is assumed to be
accepted by that sequence of bytes. In this case (as in most cases) it
is the nop instruction.
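
Conceptually, matching a hard-coded entry is an exact prefix comparison
against the bytes being decoded. The following is a minimal C sketch of
that idea, not the decoder's actual code: the HardCodedInst type, the
kHardCoded table, and MatchHardCoded are hypothetical names, and only
the multi-byte nop from the example above is listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one hard-coded entry: the exact byte sequence
 * listed between the "---" markers, plus the instruction it denotes. */
typedef struct {
  const uint8_t *bytes;
  size_t length;
  const char *name;
} HardCodedInst;

/* The multi-byte nop from the example above. */
static const uint8_t kNopBytes[] = {
  0x66, 0x66, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

static const HardCodedInst kHardCoded[] = {
  { kNopBytes, sizeof(kNopBytes), "Nop" },
};

/* Returns the hard-coded entry whose bytes appear exactly at the start of
 * code[0..avail), or NULL so that the general form is tried instead. */
static const HardCodedInst *MatchHardCoded(const uint8_t *code,
                                           size_t avail) {
  assert(code != NULL);
  for (size_t i = 0; i < sizeof(kHardCoded) / sizeof(kHardCoded[0]); ++i) {
    const HardCodedInst *inst = &kHardCoded[i];
    if (inst->length <= avail &&
        memcmp(code, inst->bytes, inst->length) == 0) {
      return inst;
    }
  }
  return NULL;
}
```

A sequence that is one byte short, or that starts one byte later, does
not match, so decoding falls through to the general form.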

If no hard-coded instruction matches the bytes to be decoded, the more
general form is used. This form is based on an optional prefix and an
opcode sequence. An example is as follows:

    --- 6f ---
    0f 6f MMX OpcodeUsesModRm
    Movq $Pq, $Qq
    Mmx_G_Operand OpSet OpDest
    Mmx_E_Operand OpUse

The first line defines the opcode value matched, and is surrounded by
"---" markers. Below this marker line are one or more instructions
that can be matched by the same opcode (and optional prefix).

The second line defines an optional prefix and the opcode sequence (0f
6f in this case). For more details on this sequence, see "Opcode
Sequences" below.

The optional prefix and opcode sequence are followed by the
instruction set the instruction belongs to (MMX in this case; the full
set of cases can be found in enum NaClInstType, defined in file
../x86_insts.h).

The rest of the second line is the set of instruction flags that
define what additional bytes are necessary, and what conditions must
be met for the instruction to be decoded. If any condition is not met,
the next instruction in the list is tried. This process continues
until a match is found, or none of the instructions apply.
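
The try-each-candidate-in-order rule can be sketched as a simple loop.
In this minimal C sketch (not the decoder's real data structures), each
candidate's flag conditions are stood in for by a predicate function;
DecodeState, Candidate, SelectCandidate, and the three example
predicates are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoding state: only what the example conditions need. */
typedef struct {
  bool has_modrm;
  uint8_t modrm;
  bool rex_w;
} DecodeState;

/* A candidate instruction; the predicate stands in for its flags. */
typedef struct {
  const char *name;
  bool (*conditions_met)(const DecodeState *state);
} Candidate;

/* Tries candidates in list order and returns the first one whose
 * conditions hold, or NULL when none of the instructions apply. */
static const Candidate *SelectCandidate(const Candidate *candidates,
                                        size_t count,
                                        const DecodeState *state) {
  assert(state != NULL);
  for (size_t i = 0; i < count; ++i) {
    if (candidates[i].conditions_met(state)) {
      return &candidates[i];
    }
  }
  return NULL;
}

/* Illustrative conditions (not the real flag semantics). */
static bool RequiresRexW(const DecodeState *state) { return state->rex_w; }

static bool RequiresRegisterModRm(const DecodeState *state) {
  return state->has_modrm && (state->modrm >> 6) == 3; /* mod == 11b */
}

static bool AlwaysApplies(const DecodeState *state) {
  (void)state;
  return true;
}
```

Because the first match wins, more specific entries must be listed
before more general ones.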

The set of instruction flags that are accepted is defined in file
ncopcode_opcode_flags.enum.

The third line defines the instruction that is being decoded. It
follows AMD's (and Intel's) syntax for instructions. If an argument is
enclosed in curly braces, it represents an implicit argument (i.e. one
that is used by the instruction but is not part of the corresponding
assembly instruction).

The forms for valid arguments are defined in section "Instruction
Arguments" below.

The remaining lines of the instruction define the rules that will be
used to extract each argument from the decoded instruction, one line
per argument. The first element on each line is the kind of the
operand, as specified in file ncopcode_operand_kind.enum. The
remaining elements on the line are flags associated with that
argument, as specified in file ncopcode_operand_flag.enum.
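
To make the shape of a general entry concrete, here is one way the
Movq example above could be held in memory. This is an illustrative C
sketch only: the real representation is defined in ncopcode_desc.h and
nc_decode_tables.h, and the struct names, enum values, and bit
assignments below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real flag sets are listed in
 * ncopcode_opcode_flags.enum and ncopcode_operand_flag.enum. */
enum { kOpcodeUsesModRm = 1 << 0 };
enum { kOpSet = 1 << 0, kOpUse = 1 << 1, kOpDest = 1 << 2 };

/* Hypothetical operand kinds (cf. ncopcode_operand_kind.enum). */
typedef enum { kMmxGOperand, kMmxEOperand } OperandKind;

typedef struct {
  OperandKind kind;
  uint32_t flags;            /* OpSet, OpUse, OpDest, ... */
} OperandRule;

typedef struct {
  uint8_t opcode[3];         /* opcode sequence, e.g. 0f 6f */
  uint8_t num_opcode_bytes;
  const char *inst_type;     /* instruction set, e.g. "MMX" */
  uint32_t flags;            /* instruction flags, e.g. OpcodeUsesModRm */
  const char *mnemonic;
  uint8_t num_operands;
  OperandRule operands[2];   /* one rule per argument, in order */
} ModeledInst;

/* The "Movq $Pq, $Qq" entry from the example. */
static const ModeledInst kMovqPqQq = {
  {0x0f, 0x6f, 0x00}, 2, "MMX", kOpcodeUsesModRm, "Movq", 2,
  {{kMmxGOperand, kOpSet | kOpDest}, {kMmxEOperand, kOpUse}},
};

/* True if argument |i| is written (OpSet) by the instruction. */
static bool OperandIsSet(const ModeledInst *inst, unsigned i) {
  assert(i < inst->num_operands);
  return (inst->operands[i].flags & kOpSet) != 0;
}
```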

Opcode Sequences
----------------

Each instruction is defined by an opcode sequence, which can be
preceded by an optional prefix (i.e. 66, F2, or F3 if it is a
multi-byte opcode sequence). An example opcode sequence is:

    0f f6

Information identifying the instruction can also be encoded in the
ModRM byte, or in the low bits of the opcode byte. To describe this,
additional modifiers may be added to the end of the sequence.

If the sequence is followed by "/ n", then n defines the value that
must be in the reg field of the ModRM byte. If the sequence is
followed by "/ n / m", then n defines the value that must be in the
reg field of the ModRM byte, while m defines the value that must be in
the r/m field of the ModRM byte.
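
The ModRM byte packs mod into bits 7-6, reg into bits 5-3, and r/m into
bits 2-0, so checking a "/ n" or "/ n / m" suffix is a matter of
masking. A minimal C sketch with hypothetical function names; since
this section does not say whether "/ n / m" also constrains the mod
field, the sketch does not check it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ModRM layout: mod = bits 7-6, reg = bits 5-3, r/m = bits 2-0. */
static unsigned ModRmReg(uint8_t modrm) { return (modrm >> 3) & 0x7; }
static unsigned ModRmRm(uint8_t modrm) { return modrm & 0x7; }

/* "/ n": the reg field must equal n. */
static bool MatchesSlashN(uint8_t modrm, unsigned n) {
  assert(n < 8);
  return ModRmReg(modrm) == n;
}

/* "/ n / m": the reg field must equal n and the r/m field must equal m. */
static bool MatchesSlashNSlashM(uint8_t modrm, unsigned n, unsigned m) {
  assert(m < 8);
  return MatchesSlashN(modrm, n) && ModRmRm(modrm) == m;
}
```

For example, ModRM byte C8 (binary 11 001 000) has reg = 1 and
r/m = 0, so it satisfies "/ 1 / 0" but C9 does not.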

If the sequence is followed by "- rN", then the instruction is one
that encodes a register selection as part of the opcode: register N
(0..7) is to be used by the instruction.
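
For a "- rN" instruction, register N comes from the low three bits of
the opcode byte; in 64-bit mode, REX.B selects r8-r15 instead (see the
r8 form under "Instruction Arguments"). A minimal C sketch with
hypothetical function names, using the push opcodes 50..57 as the
example block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if |opcode| lies in the eight-opcode block that starts at |base|
 * (e.g. 50..57 for push). |base| must have its low three bits clear. */
static bool InRegisterBlock(uint8_t opcode, uint8_t base) {
  assert((base & 0x7) == 0);
  return (opcode & 0xF8) == base;
}

/* Register N encoded in the low three bits of the opcode byte; with
 * REX.B set (64-bit mode), the same bits select r8-r15 instead. */
static unsigned OpcodeRegister(uint8_t opcode, bool rex_b) {
  unsigned reg = opcode & 0x7;
  return rex_b ? reg + 8 : reg;
}
```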

If the opcode sequence is of the form "... 0F 0F XX", then XX appears
as the last byte of the instruction, rather than the next byte after
the two 0F bytes. This allows us to recognize E3DNOW instructions.
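
Because the distinguishing byte of a "0F 0F XX" instruction comes last,
it can only be read once the size of the ModRM, SIB, and displacement
bytes is known. A minimal C sketch, assuming (hypothetically) that the
total instruction length has already been computed and that no prefix
bytes precede the 0F 0F; ThreeDNowSuffix is not a function in this
directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the trailing opcode byte XX of a "0F 0F XX" instruction whose
 * total length is already known, or -1 if |inst| is not of that form.
 * The minimum form is 0F 0F ModRM XX (4 bytes). */
static int ThreeDNowSuffix(const uint8_t *inst, size_t length) {
  assert(inst != NULL);
  if (length < 4 || inst[0] != 0x0F || inst[1] != 0x0F) return -1;
  return inst[length - 1];
}
```

With ModRM 40 (which is followed by a one-byte displacement), the
suffix is the fifth byte rather than the fourth.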

Instruction Arguments
---------------------

The modeled instructions specify what assembly instructions are
recognized by the decoder. The form used is based on the AMD (R)
document 24594-Rev.3.14-September 2007, "AMD64 Architecture
Programmer's Manual Volume 3: General-Purpose and System
Instructions", and Intel (R) documents 253666-030US - March 2009,
"Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2A: Instruction Set Reference, A-M" and 253667-030US - March
2009, "Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2B: Instruction Set Reference, N-Z". In particular, it tries to
follow the print forms defined by AMD's "Appendix Section A.1 -
Opcode-Syntax Notation", or Intel's "Appendix Section A.2 - Key To
Abbreviations". These forms are summarized here. For more detailed
information, see
native_client/src/trusted/validator_x86/ncdecode_forms.h.

A print form describes an argument. If the operand is implicit (i.e.
it defines a register/memory value affected by the instruction, but is
not part of the assembly form), it is enclosed in curly braces. If the
print form corresponds to a specific register, the register is
specified by preceding its name with the "%" prefix.

All other print forms define a set of possible arguments. Each begins
with the character '$', followed by a name. The name consists of a
FORM, followed by a SIZE specification.
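
Splitting a print form into its FORM and SIZE parts needs some
disambiguation, since several FORMs (such as PR, VR, and rAX) begin
with another FORM's letter. The following C sketch copies the FORM
names from the list below into a table and picks the longest one that
prefixes the name. The longest-match rule and the SplitPrintForm
function are assumptions for illustration, not the parser used by the
table generator, and the sketch ignores the curly braces used for
implicit operands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* FORM names from the list of valid FORMs. */
static const char *const kForms[] = {
  "A",  "C",  "D",  "E",   "F",   "G",   "I",   "J",   "M",   "O",
  "P",  "PR", "Q",  "R",   "S",   "U",   "V",   "VR",  "W",   "X",
  "Y",  "r8", "SG", "rAX", "rBP", "rBX", "rCX", "rDI", "rDX", "rSI",
  "rSP",
};

/* Splits a "$" print form such as "$Qq" into FORM and SIZE, using the
 * longest FORM that prefixes the name. Returns the FORM length and sets
 * *size to the SIZE text, or returns 0 and sets *size to NULL. */
static size_t SplitPrintForm(const char *print_form, const char **size) {
  assert(print_form != NULL && size != NULL);
  *size = NULL;
  if (print_form[0] != '$') return 0;
  const char *name = print_form + 1;
  size_t best = 0;
  for (size_t i = 0; i < sizeof(kForms) / sizeof(kForms[0]); ++i) {
    size_t len = strlen(kForms[i]);
    if (len > best && strncmp(name, kForms[i], len) == 0) best = len;
  }
  if (best == 0) return 0;
  *size = name + best;
  return best;
}
```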

Valid FORMs are (note: as mentioned above, these forms follow the
conventions of both AMD and Intel):

  A - Far pointer is encoded in the instruction.

  C - Control register specified by the ModRM reg field.

  D - Debug register specified by the ModRM reg field.

  E - General purpose register or memory operand specified by the
      ModRM byte. Memory addresses can be computed from a segment
      register, SIB byte, and/or displacement.

  F - rFLAGS register.

  G - General purpose register specified by the ModRM reg field.

  I - Immediate value.

  J - The instruction includes a relative offset that is added to
      the rIP register.

  M - A memory operand specified by the ModRM byte.

  O - The offset of an operand is encoded in the instruction. There
      is no ModRM byte in the instruction. Complex addressing using
      the SIB byte cannot be done.

  P - 64-bit MMX register specified by the ModRM reg field.

  PR - 64-bit MMX register specified by the ModRM r/m field. The
       ModRM mod field must be 11b.

  Q - 64-bit MMX register or memory operand specified by the ModRM
      byte. Memory addresses can be computed from a segment
      register, SIB byte, and/or displacement.

  R - General purpose register specified by the ModRM r/m
      field. The ModRM mod field must be 11b.

  S - Segment register specified by the ModRM reg field.

  U - 128-bit XMM register specified by the ModRM r/m field.

  V - 128-bit XMM register specified by the ModRM reg field.

  VR - 128-bit XMM register specified by the ModRM r/m field. The
       ModRM mod field must be 11b.

  W - 128-bit XMM register or memory operand specified by the ModRM
      byte. Memory addresses can be computed from a segment
      register, SIB byte, and/or displacement.

  X - A memory operand addressed by the DS:rSI registers. Used in
      string instructions.

  Y - A memory operand addressed by the ES:rDI registers. Used in
      string instructions.

  r8 - The 8 registers rAX, rCX, rDX, rBX, rSP, rBP, rSI, rDI, and
       the optional registers r8-r15 if REX.b is set, based on the
       register value embedded in the opcode.

  SG - Segment address defined by a G expression and the segment
       register in the corresponding mnemonic (lds, les, lfs, lgs,
       lss).

  rAX - The register AX, EAX, or RAX, depending on SIZE.

  rBP - The register BP, EBP, or RBP, depending on SIZE.

  rBX - The register BX, EBX, or RBX, depending on SIZE.

  rCX - The register CX, ECX, or RCX, depending on SIZE.

  rDI - The register DI, EDI, or RDI, depending on SIZE.

  rDX - The register DX, EDX, or RDX, depending on SIZE.

  rSI - The register SI, ESI, or RSI, depending on SIZE.

  rSP - The register SP, ESP, or RSP, depending on SIZE.
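
The rAX ... rSP forms resolve to a concrete register name once the size
is known. A minimal C sketch; GeneralRegisterName is a hypothetical
helper, and its register numbering 0..7 follows the encoding order
listed for r8 above (rAX, rCX, rDX, rBX, rSP, rBP, rSI, rDI).

```c
#include <assert.h>
#include <stddef.h>

/* Register numbers 0..7 in encoding order: AX, CX, DX, BX, SP, BP, SI,
 * DI. Returns the name for a 16-, 32-, or 64-bit SIZE, else NULL. */
static const char *GeneralRegisterName(unsigned reg, unsigned size_bits) {
  static const char *const kNames16[] = {
    "AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"};
  static const char *const kNames32[] = {
    "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"};
  static const char *const kNames64[] = {
    "RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"};
  assert(reg < 8);
  switch (size_bits) {
    case 16: return kNames16[reg];
    case 32: return kNames32[reg];
    case 64: return kNames64[reg];
    default: return NULL;
  }
}
```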

Note: r8 is not in the manuals cited above. It has been added to deal
with instructions with an embedded register in the opcode. In such
cases, this value allows a single defining call to be used (within a
for loop), rather than writing eight separate rules (one for each
possible register value).
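
The for-loop style described in the note can be sketched as follows.
This is an illustrative C sketch only: RegisterRule and
DefineRegisterRules are hypothetical and do not correspond to the
actual table-generation code; the point is that one loop emits all
eight opcode/register pairs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rule record produced for each opcode/register pair. */
typedef struct {
  uint8_t opcode;
  unsigned reg;
  const char *mnemonic;
} RegisterRule;

/* Emits the eight "base + r" rules (e.g. push 50..57) with one loop
 * instead of eight hand-written rules. Returns the number of rules. */
static unsigned DefineRegisterRules(uint8_t base, const char *mnemonic,
                                    RegisterRule out[8]) {
  assert((base & 0x7) == 0);
  for (unsigned r = 0; r < 8; ++r) {
    out[r].opcode = (uint8_t)(base + r);
    out[r].reg = r;
    out[r].mnemonic = mnemonic;
  }
  return 8;
}
```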

Valid SIZEs are (note: as mentioned above, these forms follow the
conventions of both AMD and Intel):

  a - Two 16-bit or 32-bit memory operands, depending on the
      effective operand size. Used in the BOUND instruction.

  b - A byte, irrespective of the effective operand size.

  d - A doubleword (32 bits), irrespective of the effective operand
      size.

  dq - A double-quadword (128 bits), irrespective of the effective
       operand size.

  p - A 32-bit or 48-bit far pointer, depending on the effective
      operand size.

  pd - A 128-bit double-precision floating-point vector operand
       (packed double).

  pi - A 64-bit MMX operand (packed integer).

  ps - A 128-bit single-precision floating-point vector operand
       (packed single).

  q - A quadword, irrespective of the effective operand size.

  s - A 6-byte or 10-byte pseudo-descriptor.

  sd - A scalar double-precision floating-point operand (scalar
       double).

  si - A scalar doubleword (32-bit) integer operand (scalar
       integer).

  ss - A scalar single-precision floating-point operand (scalar
       single).

  v - A word, doubleword, or quadword, depending on the effective
      operand size.

  va - A word, doubleword, or quadword, depending on the effective
       address size.

  vw - A word, only when the effective operand size is 16 bits.

  vd - A doubleword, only when the effective operand size is 32 bits.

  vq - A quadword, only when the effective operand size is 64 bits.

  w - A word, irrespective of the effective operand size.

  z - A word if the effective operand size is 16 bits, or a
      doubleword if the effective operand size is 32 or 64 bits.

  zw - A word, only when the effective operand size is 16 bits.

  zd - A doubleword, only when the effective operand size is 32 or
       64 bits.

Note: vw, vd, vq, zw, and zd are not in the manuals cited
above. However, they have been added so that sub-variants of a v/z
instruction (not specified in the manual) can be specified.
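
The v and z sizes, and the sub-variants that select only one of their
cases, can be summarized as small functions of the effective operand
size (16, 32, or 64 bits). A minimal C sketch with hypothetical names,
under the assumption that vw, vd, and vq apply only at operand sizes of
16, 32, and 64 bits respectively, zw only at 16 bits, and zd at 32 or
64 bits.

```c
#include <assert.h>
#include <stdbool.h>

/* eff is the effective operand size in bits: 16, 32, or 64. */
static bool ValidOperandSize(unsigned eff) {
  return eff == 16 || eff == 32 || eff == 64;
}

/* v: word, doubleword, or quadword, following the operand size. */
static unsigned SizeV(unsigned eff) {
  assert(ValidOperandSize(eff));
  return eff;
}

/* z: word for a 16-bit operand size, otherwise doubleword. */
static unsigned SizeZ(unsigned eff) {
  assert(ValidOperandSize(eff));
  return eff == 16 ? 16 : 32;
}

/* Sub-variants: each applies at only some operand sizes. */
static bool AppliesVw(unsigned eff) { return eff == 16; }
static bool AppliesVd(unsigned eff) { return eff == 32; }
static bool AppliesVq(unsigned eff) { return eff == 64; }
static bool AppliesZw(unsigned eff) { return eff == 16; }
static bool AppliesZd(unsigned eff) { return eff == 32 || eff == 64; }
```

Note that z never yields a quadword: with a 64-bit operand size, a z
operand is still 32 bits.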

Note: The AMD manual uses some slash notations (such as d/q) that
aren't explicitly defined. In general, we allow such notation as
specified in the AMD manual. Depending on the use, it can mean either
of the following:

  (1) In 32-bit mode, d is used. In 64-bit mode, q is used.

  (2) Only 32-bit or 64-bit values are allowed.

In addition, when the mnemonic name changes based on which value is
chosen in d/q, we use d/q/d to denote the 32-bit case, and d/q/q to
denote the 64-bit case.
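
Interpretation (1) of d/q, together with the d/q/d and d/q/q
spellings, can be sketched as a lookup that returns the operand size
in bits. This C sketch is hypothetical and covers only those cases;
interpretation (2) is a restriction on the allowed values rather than
a choice, so it is not modeled.

```c
#include <assert.h>
#include <string.h>

/* Returns the operand size in bits selected by a d/q notation, or -1 if
 * the notation is not one of these. |mode_bits| is 32 or 64. */
static int DqSizeInBits(const char *notation, unsigned mode_bits) {
  assert(notation != NULL);
  assert(mode_bits == 32 || mode_bits == 64);
  if (strcmp(notation, "d/q/d") == 0) return 32; /* 32-bit mnemonic */
  if (strcmp(notation, "d/q/q") == 0) return 64; /* 64-bit mnemonic */
  if (strcmp(notation, "d/q") == 0) {
    /* Interpretation (1): d in 32-bit mode, q in 64-bit mode. */
    return mode_bits == 64 ? 64 : 32;
  }
  return -1;
}
```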

In addition, this code adds the following special print forms:

  One - The literal constant 1.

Debugging
---------

Many of the source files contain #define DEBUGGING flags. When
DEBUGGING is set to 1, additional debugging print messages are
compiled into the code. Unfortunately, these messages frequently call
routines that are not, by default, included in the corresponding
executables (such as ncval and ncdis). To add the additional routines,
edit file

    native_client/site_scons/site_tools/library_deps.py

For x86-32, change the lines

    # When turning on the DEBUGGING flag in the x86-32 validator
    # or decoder, add the following:
    #'nc_opcode_modeling_verbose_x86_32',

to

    # When turning on the DEBUGGING flag in the x86-32 validator
    # or decoder, add the following:
    'nc_opcode_modeling_verbose_x86_32',

For x86-64, change the lines

    # When turning on the DEBUGGING flag in the x86-64 validator
    # or decoder, add the following:
    # 'nc_opcode_modeling_verbose_x86_64',

to

    # When turning on the DEBUGGING flag in the x86-64 validator
    # or decoder, add the following:
    'nc_opcode_modeling_verbose_x86_64',

These changes make sure that the corresponding print routines are
added to the executables at link time.
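
The DEBUGGING flags follow the usual C pattern of compiling debug-only
statements in or out with the preprocessor. Below is a minimal sketch
of that pattern; the DEBUG wrapper macro, NoteDecoded, and
debug_lines_printed are hypothetical here, and the actual macros in
each source file may differ. With DEBUGGING set to 0, the wrapped
statements (including the printf call) never reach the compiler, which
is why turning the flag on can introduce references to routines that
must then be linked in.

```c
#include <assert.h>
#include <stdio.h>

/* Set to 1 to compile the debugging output in. */
#define DEBUGGING 0

/* Hypothetical wrapper: with DEBUGGING == 0 the statements passed to
 * DEBUG are discarded by the preprocessor. */
#if DEBUGGING
#define DEBUG(statements) do { statements; } while (0)
#else
#define DEBUG(statements) do { } while (0)
#endif

static int debug_lines_printed = 0;

/* Hypothetical decoder hook that logs only in debugging builds. */
static void NoteDecoded(unsigned opcode) {
  assert(opcode <= 0xFF);
  (void)opcode; /* unused when both assert and DEBUG are compiled out */
  DEBUG(printf("decoded opcode %02x\n", opcode); ++debug_lines_printed);
}
```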