Side by Side Diff: src/trusted/validator/x86/decoder/README

Issue 625923004: Delete old x86 validator. (Closed) Base URL: svn://svn.chromium.org/native_client/trunk/src/native_client
Patch Set: rebase master Created 6 years, 2 months ago
1 This directory implements an x86 decoder, from a table of modeled
2 instructions.
3
4 Note: Currently, this decoder is only used in the x86-64
5 validator. However, the plan is to move the x86-32 validator to also
6 use this decoder. See
7 http://code.google.com/p/nativeclient/issues/detail?id=2154 for more
8 details.
9
10 ncopcode_desc.{h,c}
11
12 Defines modeled instructions.
13
14 nc_decode_tables.h
15
16 Defines the structure of the generated table of modeled
17 instructions.
18
19 nc_inst_state.{h,c}
20
21 Defines how to access parsed x86 instructions, and the x86
22 instruction parser.
23
24 nc_inst_state_statics.c
25
26 Static routines that should be in nc_inst_state.c, but are
27 included instead. This separation allows our testing code in
28 nc_inst_state_tests.cc to test the static routines.
29
30 nc_inst_iter.{h,c}
31
32 Defines an iterator that walks the memory block and parses the
33 instructions in the memory block.
34
35 nc_inst_state_internal.h
36
37 Defines the structures used to hold parsed instructions, and
38 an iterator to parse such instructions in a memory block.
39
40 ncop_exps.{h,c}
41
42 Defines an expression tree (of arguments) that is (optionally)
43 generated after the instruction is parsed.
44
45 nc_inst_trans.{h,c}
46
47 Defines a translator that takes the data in a parsed
48 instruction, and generates the corresponding expression trees
49 defined in ncop_exps.h
50
51 ncopcode_insts.enum
52
53 Defines the names of instructions recognized by the decoder.
54
55 ncopcode_prefix.enum
56
57 Defines the set of opcode/prefix bytes, besides the last
58 matched byte, that are allowed in x86 instructions.
59
60 ncopcode_opcode_flags.enum
61
62 Defines the set of bit flags used to define how to decode an
63 x86 instruction.
64
65 ncopcode_operand_kind.enum
66
67 Defines the different categories of operands, typically
68 corresponding to operand argument descriptors like $E and $G.
69
70 ncopcode_operand_flag.enum
71
72 Defines the set of bit flags used to define additional
73 information about the operands of an instruction (such as their
74 set/usage).
75
76 ncop_expr_node_kind.enum
77
78 Defines a label used to define each kind of expression node in
79 a translated argument.
80
81 ncop_expr_node_flag.enum
82
83 Defines a set of bit flags used to define additional
84 information about each node in a translated argument (such as
85 set/usage).
86
87 Modeled Instructions
88 --------------------
89
90 Textual version(s) of the instructions understood by the validator can
91 be found in the following files.
92
93 native_client/src/trusted/validator_x86/testdata/64/modeled_insts.txt
94
95 Defines the set of instructions understood by the (full)
96 decoder.
97
98 native_client/src/trusted/validator_x86/testdata/64/ncval_reg_sfi_modeled_insts.txt
99
100 Defines the set of (partial) instructions understood by the
101 validator decoder.
102
103 There are two types of modeled instructions. The first is "hard coded".
104 The second is based on an optional prefix, and an opcode
105 sequence.
106
107 Hard coded modeled instructions represent explicit byte sequences that
108 will be recognized. An example is as follows:
109
110 --- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---
111 1f 386
112 Nop
113
114 The first line (between the "---" markers) defines the sequence of
115 bytes that will be explicitly recognized as an instruction if that
116 (exact) sequence is found.
117
118 The second line starts with the opcode value associated with this
119 instruction (1f in this case). It is followed by the instruction
120 set the matched instruction is in (386 in this case). The
121 full set of cases can be found in enum NaClInstType, defined in file
122 ../x86_insts.h.
123
124 The third line describes the instruction that is assumed to be
125 accepted by that sequence of bytes. In this case (and most cases) it
126 is the nop instruction.
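
The three-line layout described above can be read mechanically. The following is a minimal sketch, not the decoder's own table reader: the function name parse_hard_coded and the dictionary keys are hypothetical, and it assumes the entry contains exactly the three lines shown in the example.

```python
def parse_hard_coded(entry):
    """Parse a three-line hard coded modeled instruction (illustrative only)."""
    lines = [ln.strip() for ln in entry.strip().splitlines()]
    marker, second, third = lines
    # First line: the exact byte sequence between the "---" markers.
    tokens = marker.split()
    if tokens[0] != "---" or tokens[-1] != "---":
        raise ValueError("expected a '--- ... ---' marker line")
    sequence = bytes(int(tok, 16) for tok in tokens[1:-1])
    # Second line: the opcode value, then the instruction set name.
    opcode, inst_set = second.split()
    return {
        "bytes": sequence,
        "opcode": int(opcode, 16),
        "inst_set": inst_set,
        # Third line: the instruction assumed to be accepted by the bytes.
        "instruction": third,
    }


EXAMPLE = """
--- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---
1f 386
Nop
"""
nop = parse_hard_coded(EXAMPLE)
```

Here nop["bytes"] holds the twelve bytes that must match exactly, and nop["instruction"] is the text "Nop".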
127
128 If no hard coded instructions match the bytes to be decoded, the more
129 general form is used. This form is based on an optional prefix, and an
130 opcode sequence. An example is as follows:
131
132 --- 6f ---
133 0f 6f MMX OpcodeUsesModRm
134 Movq $Pq, $Qq
135 Mmx_G_Operand OpSet OpDest
136 Mmx_E_Operand OpUse
137
138 The first line defines the opcode value matched, and is surrounded by
139 "---" marks. Below this marker line are one (or more) instructions
140 that can be matched by the same opcode (and optional prefix).
141
142 The second line defines an optional prefix and the opcode sequence (0f
143 6f in this case). For more details on this sequence, see "Opcode
144 Sequences" below.
145
146 The optional prefix and opcode sequence are followed by the instruction
147 set the instruction is in (MMX in this case). The full set of cases can
148 be found in enum NaClInstType, defined in file ../x86_insts.h.
149
150 The rest of the second line is the set of instruction flags that define
151 what additional bytes are necessary, and what conditions must be met
152 for the instruction to be decoded. If any condition is not met, the
153 next instruction in the list is tried. This process is continued until
154 a match is found, or none of the instructions apply.
155
156 The set of instruction flags that are accepted are defined in file
157 ncopcode_opcode_flags.enum.
158
159 The third line defines the instruction that is being decoded. It
160 follows AMD's (and Intel's) syntax for instructions. If an argument is
161 enclosed in curly braces, it represents an implicit argument (i.e. one
162 that is used by the instruction but not part of the corresponding
163 assembly instruction).
164
165 The forms for valid arguments are defined in section "Instruction
166 Arguments" below.
167
168 The remaining lines of the instruction define the actual rules that
169 will be used to extract each argument from the decoded instruction.
170 The first element on the line defines the kind of the operand, and
171 is specified in file ncopcode_operand_kind.enum. The remaining elements
172 on the line are flags associated with that argument, and are specified in
173 file ncopcode_operand_flags.enum.
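
As an illustration of the general form, the sketch below splits the Movq example into the pieces named above. It is an assumption-laden reading aid, not the decoder's parser: parse_general and its dictionary keys are hypothetical, the opcode sequence is assumed to consist only of two-digit hex bytes (no "/ n" or "- rN" modifiers), and the entry is assumed to hold a single instruction.

```python
import re

# A two-digit hex token is treated as part of the prefix/opcode sequence.
BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def parse_general(entry):
    """Split one general-form modeled instruction into its parts."""
    lines = [ln.split() for ln in entry.strip().splitlines()]
    marker, header, inst, *operands = lines
    if marker[0] != "---" or marker[-1] != "---":
        raise ValueError("expected a '--- ... ---' marker line")
    # Count the leading hex bytes: optional prefix plus opcode sequence.
    n = 0
    while n < len(header) and BYTE.match(header[n]):
        n += 1
    if n == len(header):
        raise ValueError("missing instruction set name")
    return {
        "matched_opcode": marker[1:-1],
        "sequence": header[:n],
        "inst_set": header[n],
        "flags": header[n + 1:],
        "mnemonic": inst[0],
        "args": [arg.rstrip(",") for arg in inst[1:]],
        # Each remaining line: operand kind followed by its flags.
        "operands": [(op[0], op[1:]) for op in operands],
    }


EXAMPLE = """
--- 6f ---
0f 6f MMX OpcodeUsesModRm
Movq $Pq, $Qq
Mmx_G_Operand OpSet OpDest
Mmx_E_Operand OpUse
"""
movq = parse_general(EXAMPLE)
```

The result pairs each argument of the third line ($Pq, $Qq) with one operand rule line, in order.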
174
175 Opcode Sequences
176 ----------------
177
178 Each instruction is defined by an opcode sequence, which can be
179 preceded by an optional prefix (i.e. 66, f2, or f3 if it is a
180 multi-byte opcode sequence). An example opcode sequence is:
181
182 0f f6
183
184 Opcode sequences can also be buried in the modrm byte, or the opcode
185 byte. To clarify this, additional modifiers may be added to the end
186 of the sequence.
187
188 If the sequence is followed by a "/ n", then n defines the value that
189 must be in the reg field of the modrm byte. If the sequence is
190 followed by a "/ n / m", then n defines the value that must be in the
191 reg field of the modrm byte, while m defines the value that must be in
192 the r/m field of the modrm byte.
193
194 If the sequence is followed by a "- rN", then the instruction is one
195 that encodes a register selection as part of the opcode. Register N
196 (0..7) is to be used by the instruction.
197
198 If the opcode sequence is of the form "... 0F 0F XX", then XX appears
199 as the last byte of the instruction, rather than the next byte after
200 the two 0F bytes. This allows us to recognize E3DNOW instructions.
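
The modifiers described in this section can be sketched as a small parser. This is only an illustration under assumptions: parse_opcode_sequence and its result keys are hypothetical names, the modifier tokens are assumed to be separated by spaces exactly as written above, and the sample sequences in the usage lines are illustrative rather than copied from the generated tables.

```python
def parse_opcode_sequence(text):
    """Split an opcode sequence into bytes and its optional modifiers."""
    tokens = text.split()
    seq = {"bytes": [], "reg": None, "rm": None, "opcode_reg": None}
    i = 0
    # Leading hex bytes: the optional prefix and the opcode bytes.
    while i < len(tokens) and tokens[i] not in ("/", "-"):
        seq["bytes"].append(int(tokens[i], 16))
        i += 1
    slash_values = []
    while i < len(tokens):
        if i + 1 >= len(tokens):
            raise ValueError("modifier without a value: " + text)
        marker, value = tokens[i], tokens[i + 1]
        if marker == "/":
            # "/ n" (reg field), optionally followed by "/ m" (r/m field).
            slash_values.append(int(value))
        elif marker == "-" and value.startswith("r"):
            # "- rN": register N (0..7) is encoded in the opcode itself.
            seq["opcode_reg"] = int(value[1:])
        else:
            raise ValueError("unknown modifier: " + text)
        i += 2
    if len(slash_values) > 2:
        raise ValueError("at most two '/' modifiers are expected: " + text)
    if slash_values:
        seq["reg"] = slash_values[0]
    if len(slash_values) == 2:
        seq["rm"] = slash_values[1]
    # "... 0F 0F XX": XX is the last byte of the whole instruction.
    b = seq["bytes"]
    seq["opcode_is_last_byte"] = len(b) >= 3 and b[-3] == 0x0F and b[-2] == 0x0F
    return seq


plain = parse_opcode_sequence("0f f6")
modrm = parse_opcode_sequence("0f 01 / 7 / 1")
embedded = parse_opcode_sequence("50 - r3")
trailing = parse_opcode_sequence("0f 0f 90")
```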
201
202 Instruction Arguments
203 ---------------------
204
205 The modeled instructions specify what assembly instructions are
206 recognized by the decoder. The form used is based on the AMD (R)
207 document 24594-Rev.3.14-September 2007, "AMD64 Architecture
208 Programmer's manual Volume 3: General-Purpose and System
209 Instructions", and Intel (R) documents 253666-030US - March 2009,
210 "Intel 64 and IA-32 Architectures Software Developer's Manual,
211 Volume 2A: Instruction Set Reference, A-M" and 253667-030US - March
212 2009, "Intel 64 and IA-32 Architectures Software Developer's Manual,
213 Volume 2B: Instruction Set Reference, N-Z". In particular, it tries to
214 follow the print forms defined by AMD's "Appendix section A.1 -
215 Opcode-Syntax Notation", or Intel's "Appendix Section A.2 - Key To
216 Abbreviations". These forms are summarized here. For more detailed
217 information see
218 native_client/src/trusted/validator_x86/ncdecode_forms.h.
219
220 A print form describes an argument. If the operand is implicit (i.e.
221 it defines a register/memory value affected by the instruction, but is
222 not part of the assembly form) it is enclosed in curly braces. If the
223 print form corresponds to a register, the register is specified by
224 preceding the name with the "%" prefix.
225
226 All other print forms define a set of possible arguments. It begins
227 with the character '$', and is followed by a name. The name consists
228 of a FORM, followed by a size specification.
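
The three cases above (curly braces, "%", and "$") can be told apart with a few string checks. The following is a minimal sketch; classify_argument and the kind labels "register", "operand_set", and "other" are hypothetical names chosen for illustration, and the sample arguments are illustrative.

```python
def classify_argument(arg):
    """Classify one print form taken from an instruction line."""
    arg = arg.strip().rstrip(",")
    # Curly braces mark an implicit argument.
    implicit = arg.startswith("{") and arg.endswith("}")
    if implicit:
        arg = arg[1:-1]
    if arg.startswith("%"):
        kind, name = "register", arg[1:]
    elif arg.startswith("$"):
        kind, name = "operand_set", arg[1:]
    else:
        # Anything else, for example a special literal print form.
        kind, name = "other", arg
    return {"implicit": implicit, "kind": kind, "name": name}


implicit_reg = classify_argument("{%rsp},")
mmx_operand = classify_argument("$Pq")
```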
229
230 Valid FORMs are (note: as mentioned above, these forms follow the
231 conventions of both AMD and Intel):
232
233 A - Far pointer is encoded in the instruction.
234
235 C - Control register specified by the ModRM reg field.
236
237 D - Debug register specified by the ModRM reg field.
238
239 E - General purpose register or memory operand specified by the
240 ModRm byte. Memory addresses can be computed from a segment
241 register, SIB byte, and/or displacement.
242
243 F - rFLAGS register.
244
245 G - General purpose register specified by the ModRm reg field.
246
247 I - Immediate value.
248
249 J - The instruction includes a relative offset that is added to
250 the rIP register.
251
252 M - A memory operand specified by the ModRM byte.
253
254 O - The offset of an operand is encoded in the
255 instruction. There is no ModRm byte in the
256 instruction. Complex addressing using the SIB byte cannot be
257 done.
258
259 P - 64-bit MMX register specified by the ModRM reg field.
260
261 PR - 64 bit MMX register specified by the ModRM r/m field. The
262 ModRM mod field must be 11b.
263
264 Q - 64 bit MMX register or memory operand specified by the ModRM
265 byte. Memory addresses can be computed from a segment
266 register, SIB byte, and/or displacement.
267
268 R - General purpose register specified by the ModRM r/m
269 field. The ModRM mod field must be 11b.
270
271 S - Segment register specified by the ModRM reg field.
272
273 U - The R/M field of the ModR/M byte selects a 128-bit XMM register.
274
275 V - 128-bit XMM register specified by the ModRM reg field.
276
277 VR - 128-bit XMM register specified by the ModRM r/m field. The
278 ModRM mod field must be 11b.
279
280 W - 128-bit XMM register or memory operand specified by the ModRM
281 byte. Memory addresses can be computed from a segment
282 register, SIB byte, and/or displacement.
283
284 X - A memory operand addressed by the DS.rSI registers. Used in
285 string instructions.
286
287 Y - A memory operand addressed by the ES.rDI registers. Used in string
288 instructions.
289
290 r8 - The 8 registers rAX, rCX, rDX, rBX, rSP, rBP, rSI, rDI, and
291 the optional registers r8-r15 if REX.b is set, based on the
292 register value embedded in the opcode.
293
294 SG - segment address defined by a G expression and the segment
295 register in the corresponding mnemonic (lds, les, lfs, lgs,
296 lss).
297
298 rAX - The register AX, EAX, or RAX, depending on SIZE.
299
300 rBP - The register BP, EBP, or RBP, depending on SIZE.
301
302 rBX - The register BX, EBX, or RBX, depending on SIZE.
303
304 rCX - The register CX, ECX, or RCX, depending on SIZE.
305
306 rDI - The register DI, EDI, or RDI, depending on SIZE.
307
308 rDX - The register DX, EDX, or RDX, depending on SIZE.
309
310 rSI - The register SI, ESI, or RSI, depending on SIZE.
311
312 rSP - The register SP, ESP, or RSP, depending on SIZE.
313
314 Note: r8 is not in the manuals cited above. It has been added to deal
315 with instructions with an embedded register in the opcode. In such
316 cases, this value allows a single defining call to be used (within a
317 for loop), rather than writing eight separate rules (one for each
318 possible register value).
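
The loop mentioned in the note can be pictured as follows. This is a sketch under stated assumptions: define_for_each_register, the define callback, and the rule text "50 - r{n}: Push $r8v" are all hypothetical and only illustrate how one template can stand in for eight hand-written rules.

```python
def define_for_each_register(define, template):
    """Call define() once per embedded register value 0..7."""
    for n in range(8):
        define(template.format(n=n))


# Hypothetical rule text: one template replaces eight separate rules.
rules = []
define_for_each_register(rules.append, "50 - r{n}: Push $r8v")
```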
319
320
321 Valid SIZEs are (note: as mentioned above, these forms follow the
322 conventions of both AMD and Intel):
323
324 a - Two 16-bit or 32-bit memory operands, depending on the
325 effective operand size. Used in the BOUND instruction.
326
327 b - A byte, irrespective of the effective operand size.
328
329 d - A doubleword (32-bits), irrespective of the effective operand size.
330
331 dq - A double-quadword (128 bits), irrespective of the effective
332 operand size.
333
334 p - A 32-bit or 48-bit far pointer, depending on the effective
335 operand size.
336
337 pd - A 128-bit double-precision floating point vector operand
338 (packed double).
339
340 pi - A 64-bit MMX operand (packed integer).
341
342 ps - A 128-bit single-precision floating point vector operand
343 (packed single).
344
345 q - A quadword, irrespective of the effective operand size.
346
347 s - A 6-byte or 10-byte pseudo-descriptor.
348
349 sd - A scalar double-precision floating point operand (scalar
350 double).
351
352 si - A scalar doubleword (32-bit) integer operand (scalar
353 integer).
354
355 ss - A scalar single-precision floating-point operand (scalar
356 single).
357
358 w - A word, irrespective of the effective operand size.
359
360 v - A word, doubleword, or quadword, depending on the effective
361 operand size.
362
363 va - A word, doubleword, or quadword, depending on the effective
364 address size.
365
366 vw - A word only when the effective operand size matches.
367
368 vd - A doubleword only when the effective operand size matches.
369
370
371 vq - A quadword only when the effective operand size matches.
372
375 z - A word if the effective operand size is 16 bits, or a
376 doubleword if the effective operand size is 32 or 64 bits.
377
378 zw - A word only when the effective operand size matches.
379
380 zd - A doubleword only when the effective operand size is 32 or
381 64 bits.
382
383 Note: vw, vd, vq, zw, and zd are not in the manuals cited
384 above. However, they have been added so that sub-variants of a v/z
385 instruction (not specified in the manual) can be specified.
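
To make the size rules concrete, the sketch below maps a few SIZE names to a width in bits for a given effective operand size. It is an interpretation, not the decoder's logic: operand_bits is a hypothetical name, only the integer sizes are covered, and "matches" for vw, vd, vq, and zw is assumed to mean that the effective operand size equals the named width. None is returned when a sub-variant does not apply.

```python
# Sizes that do not depend on the effective operand size.
FIXED_BITS = {"b": 8, "w": 16, "d": 32, "q": 64, "dq": 128}

# Sub-variants: the width they name, and the operand sizes they apply to.
SUB_VARIANTS = {
    "vw": (16, (16,)),
    "vd": (32, (32,)),
    "vq": (64, (64,)),
    "zw": (16, (16,)),
    "zd": (32, (32, 64)),
}


def operand_bits(size, effective_operand_size):
    """Return the operand width in bits, or None if the variant does not apply."""
    if effective_operand_size not in (16, 32, 64):
        raise ValueError("effective operand size must be 16, 32, or 64")
    if size in FIXED_BITS:
        return FIXED_BITS[size]
    if size == "v":
        # Word, doubleword, or quadword, following the operand size.
        return effective_operand_size
    if size == "z":
        # Word for 16 bits, doubleword for 32 or 64 bits.
        return 16 if effective_operand_size == 16 else 32
    if size in SUB_VARIANTS:
        bits, applies_to = SUB_VARIANTS[size]
        return bits if effective_operand_size in applies_to else None
    raise ValueError("size not covered by this sketch: " + size)
```

For example, operand_bits("z", 64) yields 32, while operand_bits("vq", 32) yields None because the vq sub-variant only applies to a 64-bit effective operand size.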
386
387 Note: The AMD manual uses some slash notations (such as d/q) which aren't
388 explicitly defined. In general, we allow such notation as specified in
389 the AMD manual. Depending on the use, it can mean any of the following:
390
391 (1) In 32-bit mode, d is used. In 64-bit mode, q is used.
392
393 (2) only 32-bit or 64-bit values are allowed.
394
395 In addition, when the mnemonic name changes based on which value is
396 chosen in d/q, we use d/q/d to denote the 32-bit case, and d/q/q to
397 denote the 64 bit case.
398
399 In addition, this code adds the following special print forms:
400
401 One - The literal constant 1.
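
Putting the FORM and SIZE lists together, a "$" print form can be split into its two parts. The sketch below is illustrative only: split_print_form is a hypothetical name, the sets are transcribed from the lists in this section, a size is assumed to be required, and slash notation is accepted only when every part is a listed SIZE.

```python
FORMS = {
    "A", "C", "D", "E", "F", "G", "I", "J", "M", "O", "P", "PR", "Q", "R",
    "S", "U", "V", "VR", "W", "X", "Y", "r8", "SG",
    "rAX", "rBP", "rBX", "rCX", "rDI", "rDX", "rSI", "rSP",
}
SIZES = {
    "a", "b", "d", "dq", "p", "pd", "pi", "ps", "q", "s", "sd", "si", "ss",
    "w", "v", "va", "vw", "vd", "vq", "z", "zw", "zd",
}


def split_print_form(form):
    """Split a print form such as "$Pq" into (FORM, SIZE)."""
    if not form.startswith("$"):
        raise ValueError("not a '$' print form: " + form)
    name = form[1:]
    # Try longer FORM names first so that "PR" is preferred over "P".
    for candidate in sorted(FORMS, key=len, reverse=True):
        if name.startswith(candidate):
            size = name[len(candidate):]
            if size and all(part in SIZES for part in size.split("/")):
                return candidate, size
    raise ValueError("unrecognized print form: " + form)
```

With these sets, "$Pq" splits into FORM P with SIZE q, and "$PRq" into FORM PR with SIZE q.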
402
403 Debugging
404 ---------
405
406 Many of the source files contain #define DEBUGGING flags. When
407 DEBUGGING is set to 1, additional debugging print messages are
408 compiled into the code. Unfortunately, by default, these messages
409 frequently call routines that are not compiled into corresponding
410 executables (such as ncval and ncdis). To add the additional routines,
411 edit file
412
413 native_client/site_scons/site_tools/library_deps.py
414
415 For x86-32, edit lines
416
417 # When turning on the DEBUGGING flag in the x86-32 validator
418 # or decoder, add the following:
419 #'nc_opcode_modeling_verbose_x86_32',
420
421 to
422
423 # When turning on the DEBUGGING flag in the x86-32 validator
424 # or decoder, add the following:
425 'nc_opcode_modeling_verbose_x86_32',
426
427 For x86-64, edit lines
428
429 # When turning on the DEBUGGING flag in the x86-64 validator
430 # or decoder, add the following:
431 # 'nc_opcode_modeling_verbose_x86_64',
432
433 to
434
435 # When turning on the DEBUGGING flag in the x86-64 validator
436 # or decoder, add the following:
437 'nc_opcode_modeling_verbose_x86_64',
438
439 These changes will make sure that the corresponding print routines are
440 added to the executables during link time.
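
The edit described above amounts to uncommenting one line. As a hedged sketch of that step, the helper below operates on the text of the file rather than on the file itself; enable_verbose_library is a hypothetical name, and the sample text is shortened from the lines quoted above.

```python
import re


def enable_verbose_library(source, library):
    """Uncomment the line that names the given library, if it is commented out."""
    pattern = re.compile(
        r"^([ \t]*)#[ \t]*('" + re.escape(library) + r"',)[ \t]*$",
        re.MULTILINE,
    )
    # Keep the indentation, drop the leading '#'.
    return pattern.sub(r"\1\2", source)


before = (
    "    # or decoder, add the following:\n"
    "    #'nc_opcode_modeling_verbose_x86_32',\n"
)
after = enable_verbose_library(before, "nc_opcode_modeling_verbose_x86_32")
```

The explanatory comment line is left alone because it does not contain the quoted library name.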