src/trusted/validator/x86/decoder/README - Issue 625923004: Delete old x86 validator.

Issue 625923004: Delete old x86 validator. (Closed) Base URL: svn://svn.chromium.org/native_client/trunk/src/native_client

Patch Set: rebase master Created 6 years, 2 months ago

OLD	NEW
	(Empty)
1 This directory implements an x86 decoder, from a table of modeled

2 instructions.

3

4 Note: Currently, this decoder is only used in the x86-64

5 validator. However, the plan is to move the x86-32 validator to also

6 use this decoder. See

7 http://code.google.com/p/nativeclient/issues/detail?id=2154 for more

8 details.

9

10 ncopcode_desc.{h,c}

11

12 Defines modeled instructions.

13

14 nc_decode_tables.h

15

16 Defines the structure of the generated table of modeled

17 instructions.

18

19 nc_inst_state.{h,c}

20

21 Defines how to access parsed x86 instructions, and the x86

22 instruction parser.

23

24 nc_inst_state_statics.c

25

26 Static routines that should be in nc_inst_state.c, but are

27 included instead. This separation allows our testing code in

28 nc_inst_state_tests.cc to test the static routines.

29

30 nc_inst_iter.{h,c}

31

32 Defines an iterator that walks the memory block and parses the

33 instructions in the memory block.

34

35 nc_inst_state_internal.h

36

37 Defines the structures used to hold parsed instructions, and

38 an iterator to parse such instructions in a memory block.

39

40 ncop_exps.{h,c}

41

42 Defines an expression tree (of arguments) that is (optionally)

43 generated after the instruction is parsed.

44

45 nc_inst_trans.{h,c}

46

47 Defines a translator that takes the data in a parsed

48 instruction, and generates the corresponding expression trees

49 defined in ncop_exps.h

50

51 ncopcode_insts.enum

52

53 Defines the names of instructions recognized by the decoder.

54

55 ncopcode_prefix.enum

56

57 Defines the set of opcode/prefix bytes, besides the last

58 matched byte, that are allowed in x86 instructions.

59

60 ncopcode_opcode_flags.enum

61

62 Defines the set of bit flags used to define how to decode an

63 x86 instruction.

64

65 ncopcode_operand_kind.enum

66

67 Defines the different categories of operands, typically

68 corresponding to operand argument descriptors like $E and $G.

69

70 ncopcode_operand_flag.enum

71

72 Defines the set of bit flags used to define additional

73 information about operands of an instruction (such as its

74 set/usage).

75

76 ncop_expr_node_kind.enum

77

78 Defines a label used to define each kind of expression node in

79 a translated argument.

80

81 ncop_expr_node_flag.enum

82

83 Defines a set of bit flags used to define additional

84 information about each node in a translated argument (such as

85 set/usage).

86

87 Modeled Instructions

88 --------------------

89

90 Textual version(s) of the instructions understood by the validator can

91 be found in the following files.

92

93 native_client/src/trusted/validator_x86/testdata/64/modeled_insts.txt

94

95 Defines the set of instructions understood by the (full)

96 decoder.

97

98 native_client/src/trusted/validator_x86/testdata/64/ncval_reg_sfi_modeled_insts.txt

99

100 Defines the set of (partial) instructions understood by the

101 validator decoder.

102

103 There are two types of modeled instructions. The first is "hard coded".

104 The second is based on an optional prefix, and an opcode

105 sequence.

106

107 Hard coded modeled instructions represent explicit byte sequences that

108 will be recognized. An example is as follows:

109

110 --- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---

111 1f 386

112 Nop

113

114 The first line (between the "---" markers) defines the sequence of

115 bytes that will be explicitly recognized as an instruction if that

116 (exact) sequence is found.

117

118 The second line starts with the opcode value associated with this

119 instruction (1f in this case). It is then followed by the

120 instruction set the matched instruction is in (386 in this case. The

121 full set of cases can be found in enum NaClInstType defined in file

122 ../x86_insts.h).

123

124 The third line describes the instruction that is assumed to be

125 accepted by that sequence of bytes. In this case (and most cases) it

126 is the nop instruction.
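
As a rough illustration of the three-line layout described above, the following Python sketch splits a hard coded entry into its parts. The function name `parse_hard_coded` and the dictionary keys are hypothetical and do not come from the decoder sources; the sketch assumes the entry text looks exactly like the example.

```python
def parse_hard_coded(entry):
    """Split a three-line hard coded entry into its parts (illustrative only)."""
    lines = [ln.strip() for ln in entry.strip().splitlines() if ln.strip()]
    marker, opcode_line, inst_line = lines[0], lines[1], lines[2]
    if not (marker.startswith("---") and marker.endswith("---")):
        raise ValueError("first line must be wrapped in '---' markers")
    # The bytes between the markers are the exact sequence to match.
    byte_seq = bytes(int(b, 16) for b in marker.strip("-").split())
    # Second line: opcode value, then the instruction set name.
    opcode, inst_type = opcode_line.split()[:2]
    return {
        "bytes": byte_seq,
        "opcode": int(opcode, 16),
        "inst_type": inst_type,
        "instruction": inst_line,
    }


entry = """
--- 66 66 66 2e 0f 1f 84 00 00 00 00 00 ---
1f 386
Nop
"""
parsed = parse_hard_coded(entry)
```

Here `parsed["bytes"]` holds the twelve bytes that must match exactly, and `parsed["instruction"]` is the text of the third line.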

127

128 If no hard coded instructions match the bytes to be decoded, the more

129 general form is used. This form is based on an optional prefix, and an

130 opcode sequence. An example is as follows:

131

132 --- 6f ---

133 0f 6f MMX OpcodeUsesModRm

134 Movq $Pq, $Qq

135 Mmx_G_Operand OpSet OpDest

136 Mmx_E_Operand OpUse

137

138 The first line defines the opcode value matched, and is surrounded by

139 "---" marks. Below this marker line are one (or more) instructions

140 that can be matched by the same opcode (and optional prefix).

141

142 The second line defines an optional prefix and the opcode sequence (0f

143 6f in this case). For more details on this sequence, see "Opcode

144 Sequences" below.

145

146 The optional prefix and opcode sequence is followed by the instruction

147 set the instruction is in (MMX in this case. The full set of cases can

148 be found in enum NaClInstType defined in file ../x86_insts.h).

149

150 The rest of the second line is the set of instruction flags that define

151 what additional bytes are necessary, and what conditions must be met

152 for the instruction to be decoded. If any condition is not met, the

153 next instruction in the list is tried. This process is continued until

154 a match is found, or none of the instructions apply.
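
The "try the next instruction until one matches" behavior can be sketched as a first-match loop. This is only an illustration of the control flow, not the decoder's actual code: `first_match`, the candidate names, and the condition functions are all hypothetical.

```python
def first_match(candidates, context):
    """Return the name of the first candidate whose conditions all hold."""
    for name, conditions in candidates:
        if all(cond(context) for cond in conditions):
            return name
    return None  # none of the instructions apply


# Hypothetical candidates sharing one opcode, listed in table order.
candidates = [
    ("variant_needing_16bit", [lambda ctx: ctx["operand_size"] == 16]),
    ("variant_needing_modrm", [lambda ctx: ctx["has_modrm"]]),
]

chosen = first_match(candidates, {"operand_size": 32, "has_modrm": True})
```

Because the candidates are tried in order, their position in the list decides which one wins when more than one would match.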

155

156 The set of accepted instruction flags is defined in file

157 ncopcode_opcode_flags.enum.

158

159 The third line defines the instruction that is being decoded. It

160 follows AMD's (and Intel's) syntax for instructions. If an argument is

161 enclosed in curly braces, it represents an implicit argument (i.e. one

162 that is used by the instruction but not part of the corresponding

163 assembly instruction).

164

165 The forms for valid arguments are defined in section "Instruction

166 Arguments" below.

167

168 The remaining lines of the instruction define the actual rules that

169 will be used to extract each argument from the decoded instruction.

170 The first element on the line defines the kind of the operand, and

171 is specified in file ncopcode_operand_kind.enum. The remaining elements

172 on the line are flags associated with that argument, and are specified in

173 file ncopcode_operand_flag.enum.
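
To make the general layout concrete, here is a minimal Python sketch that reads the Movq entry shown earlier. The name `parse_general` and the dictionary keys are hypothetical. The sketch assumes the opcode sequence on the second line consists only of two-digit hex tokens with no trailing modifiers, and that the first token that is not such a byte is the instruction set name.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def parse_general(entry):
    """Split a general modeled-instruction entry into its parts (illustrative)."""
    lines = [ln.strip() for ln in entry.strip().splitlines() if ln.strip()]
    marker, header, inst_line, operand_lines = lines[0], lines[1], lines[2], lines[3:]
    tokens = header.split()
    n = 0
    # Leading two-digit hex tokens form the optional prefix and opcode sequence.
    while n < len(tokens) and HEX_BYTE.match(tokens[n]):
        n += 1
    operands = []
    for ln in operand_lines:
        kind, *flags = ln.split()  # operand kind first, then its flags
        operands.append((kind, flags))
    return {
        "matched_opcode": int(marker.strip("-").strip(), 16),
        "opcode_sequence": [int(t, 16) for t in tokens[:n]],
        "inst_type": tokens[n],
        "inst_flags": tokens[n + 1:],
        "instruction": inst_line,
        "operands": operands,
    }


entry = """
--- 6f ---
0f 6f MMX OpcodeUsesModRm
Movq $Pq, $Qq
Mmx_G_Operand OpSet OpDest
Mmx_E_Operand OpUse
"""
parsed = parse_general(entry)
```

Each operand line maps to one argument of the instruction on the third line, in order.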

174

175 Opcode Sequences

176 ----------------

177

178 Each instruction is defined by an opcode sequence, that can be

179 prefixed by an optional prefix (i.e. 66, f2, or f3 if it is a

180 multi-byte opcode sequence). An example opcode sequence is:

181

182 0f f6

183

184 Opcode sequences can also be buried in the modrm byte, or the opcode

185 byte. To clarify this, additional modifiers may be added to the end

186 of the sequence.

187

188 If the sequence is followed by a "/ n", then n defines the value that

189 must be in the reg field of the modrm byte. If the sequence is

190 followed by a "/ n / m", then n defines the value that must be in the

191 reg field of the modrm byte, while m defines the value that must be in

192 the r/m field of the modrm byte.
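
For readers unfamiliar with the modrm layout, the "/ n" and "/ n / m" checks can be sketched as follows. The bit positions are the standard x86 ones (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0); the helper names `modrm_fields` and `modrm_matches` are hypothetical and not taken from the decoder.

```python
def modrm_fields(modrm):
    """Return (mod, reg, rm) extracted from a modrm byte."""
    return (modrm >> 6) & 0x3, (modrm >> 3) & 0x7, modrm & 0x7


def modrm_matches(modrm, reg=None, rm=None):
    """Check the reg (and optionally r/m) field against required values."""
    _, reg_field, rm_field = modrm_fields(modrm)
    if reg is not None and reg_field != reg:
        return False
    if rm is not None and rm_field != rm:
        return False
    return True


# 0xC8 is binary 11 001 000: mod=3, reg=1, r/m=0.
fields = modrm_fields(0xC8)
```

With these helpers, a "/ 1" modifier corresponds to `modrm_matches(b, reg=1)` and a "/ 1 / 0" modifier to `modrm_matches(b, reg=1, rm=0)`.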

193

194 If the sequence is followed by a "- rN", then the instruction is one

195 that encodes a register selection as part of the opcode. Register N

196 (0..7) is to be used by the instruction.

197

198 If the opcode sequence is of the form "... 0F 0F XX", then XX appears

199 as the last byte of the instruction, rather than the next byte after

200 the two 0F bytes. This allows us to recognize E3DNOW instructions.
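
The modifiers described in this section can be illustrated with a small parser for the textual form. This is a sketch under assumptions: `parse_opcode_sequence` and its result keys are hypothetical, tokens are assumed to be separated by whitespace as written above, and the input strings other than "0f f6" are made up for illustration rather than copied from the modeled instruction files.

```python
def parse_opcode_sequence(spec):
    """Parse an opcode sequence with optional '/ n', '/ n / m', '- rN' modifiers."""
    tokens = spec.split()
    seq = []
    i = 0
    # Leading hex bytes: optional prefix plus the opcode bytes.
    while i < len(tokens) and tokens[i] not in ("/", "-"):
        seq.append(int(tokens[i], 16))
        i += 1
    slash_values = []
    opcode_reg = None
    while i < len(tokens):
        if tokens[i] == "/":
            slash_values.append(int(tokens[i + 1]))
        elif tokens[i] == "-" and tokens[i + 1].startswith("r"):
            opcode_reg = int(tokens[i + 1][1:])  # register N embedded in opcode
        else:
            raise ValueError("unexpected modifier: " + tokens[i])
        i += 2
    if len(slash_values) > 2:
        raise ValueError("at most two '/' modifiers are described")
    return {
        "bytes": seq,
        "modrm_reg": slash_values[0] if slash_values else None,
        "modrm_rm": slash_values[1] if len(slash_values) == 2 else None,
        "opcode_reg": opcode_reg,
        # '... 0F 0F XX': XX is the last byte of the whole instruction.
        "trailing_opcode": len(seq) >= 3 and seq[-3:-1] == [0x0F, 0x0F],
    }


plain = parse_opcode_sequence("0f f6")
with_modrm = parse_opcode_sequence("0f 01 / 1 / 0")
with_reg = parse_opcode_sequence("50 - r3")
three_d_now = parse_opcode_sequence("0f 0f 0d")
```

The `trailing_opcode` flag only records that the final byte must be looked for at the end of the instruction; locating it requires decoding the modrm byte and any displacement first.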

201

202 Instruction Arguments

203 ---------------------

204

205 The modeled instructions specify what assembly instructions are

206 recognized by the decoder. The form used is based on the AMD (R)

207 document 24594-Rev.3.14-September 2007, "AMD64 Architecture

208 Programmer's Manual Volume 3: General-Purpose and System

209 Instructions", and Intel (R) documents 253666-030US - March 2009,

210 "Intel 64 and IA-32 Architectures Software Developer's Manual,

211 Volume 2A: Instruction Set Reference, A-M" and 253667-030US - March

212 2009, "Intel 64 and IA-32 Architectures Software Developer's Manual,

213 Volume 2B: Instruction Set Reference, N-Z". In particular, it tries to

214 follow the print forms defined by AMD's "Appendix section A.1 -

215 Opcode-Syntax Notation", or Intel's "Appendix Section A.2 - Key To

216 Abbreviations". These forms are summarized here. For more detailed

217 information see

218 native_client/src/trusted/validator_x86/ncdecode_forms.h.

219

220 A print form describes an argument. If the operand is implicit (i.e.

221 it defines a register/memory value affected by the instruction, but is

222 not part of the assembly form) it is enclosed in curly braces. If the

223 print form corresponds to a register, the register is specified by

224 preceding the name with the "%" prefix.

225

226 All other print forms define a set of possible arguments. It begins

227 with the character '$', and is followed by a name. The name consists

228 of a FORM, followed by a size specification.
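
The print form rules can be sketched as a small splitter, using the FORM and SIZE names listed in this section. The function name `parse_print_form` is hypothetical, and the sketch makes several assumptions: a size is always present after the FORM, the longest matching FORM wins (so "PRq" is PR plus q), slash notation such as d/q is not handled, and the register name in the example is made up.

```python
FORMS = ["A", "C", "D", "E", "F", "G", "I", "J", "M", "O", "P", "PR", "Q",
         "R", "S", "U", "V", "VR", "W", "X", "Y", "r8", "SG",
         "rAX", "rBP", "rBX", "rCX", "rDI", "rDX", "rSI", "rSP"]
SIZES = {"a", "b", "d", "dq", "p", "pd", "pi", "ps", "q", "s", "sd", "si",
         "ss", "w", "v", "va", "vw", "vd", "vq", "z", "zw", "zd"}


def parse_print_form(text):
    """Split a print form into implicit flag, register, FORM and SIZE."""
    text = text.strip()
    implicit = text.startswith("{") and text.endswith("}")
    if implicit:
        text = text[1:-1].strip()  # drop the curly braces
    if text.startswith("%"):
        return {"implicit": implicit, "register": text[1:], "form": None, "size": None}
    if not text.startswith("$"):
        raise ValueError("unsupported print form: " + text)
    name = text[1:]
    # Try longer FORM names first so that 'PR' is preferred over 'P'.
    for form in sorted(FORMS, key=len, reverse=True):
        size = name[len(form):]
        if name.startswith(form) and size in SIZES:
            return {"implicit": implicit, "register": None, "form": form, "size": size}
    raise ValueError("unknown FORM/SIZE in: " + text)


dest = parse_print_form("$Pq")
implicit_reg = parse_print_form("{%eax}")
```

For the Movq example, `$Pq` splits into FORM P with SIZE q, and `$Qq` into FORM Q with SIZE q.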

229

230 Valid FORMs are (note: as mentioned above, these forms follow the

231 conventions of both AMD and Intel):

232

233 A - Far pointer is encoded in the instruction.

234

235 C - Control register specified by the ModRM reg field.

236

237 D - Debug register specified by the ModRM reg field.

238

239 E - General purpose register or memory operand specified by the

240 ModRm byte. Memory addresses can be computed from a segment

241 register, SIB byte, and/or displacement.

242

243 F - rFLAGS register.

244

245 G - General purpose register specified by the ModRm reg field.

246

247 I - Immediate value.

248

249 J - The instruction includes a relative offset that is added to

250 the rIP register.

251

252 M - A memory operand specified by the ModRM byte.

253

254 O - The offset of an operand is encoded in the

255 instruction. There is no ModRm byte in the

256 instruction. Complex addressing using the SIB byte cannot be

257 done.

258

259 P - 64-bit MMX register specified by the ModRM reg field.

260

261 PR - 64-bit MMX register specified by the ModRM r/m field. The

262 ModRM mod field must be 11b.

263

264 Q - 64-bit MMX register or memory operand specified by the ModRM

265 byte. Memory addresses can be computed from a segment

266 register, SIB byte, and/or displacement.

267

268 R - General purpose register specified by the ModRM r/m

269 field. The ModRM mod field must be 11b.

270

271 S - Segment register specified by the ModRM reg field.

272

273 U - The R/M field of the ModR/M byte selects a 128-bit XMM register.

274

275 V - 128-bit XMM register specified by the ModRM reg field.

276

277 VR - 128-bit XMM register specified by the ModRM r/m field. The

278 ModRM mod field must be 11b.

279

280 W - 128-bit XMM register or memory operand specified by the ModRM

281 byte. Memory addresses can be computed from a segment

282 register, SIB byte, and/or displacement.

283

284 X - A memory operand addressed by the DS.rSI registers. Used in

285 string instructions.

286

287 Y - A memory operand addressed by the ES.rDI registers. Used in string

288 instructions.

289

290 r8 - The 8 registers rAX, rCX, rDX, rBX, rSP, rBP, rSI, rDI, and

291 the optional registers r8-r15 if REX.b is set, based on the

292 register value embedded in the opcode.

293

294 SG - segment address defined by a G expression and the segment

295 register in the corresponding mnemonic (lds, les, lfs, lgs,

296 lss).

297

298 rAX - The register AX, EAX, or RAX, depending on SIZE.

299

300 rBP - The register BP, EBP, or RBP, depending on SIZE.

301

302 rBX - The register BX, EBX, or RBX, depending on SIZE.

303

304 rCX - The register CX, ECX, or RCX, depending on SIZE.

305

306 rDI - The register DI, EDI, or RDI, depending on SIZE.

307

308 rDX - The register DX, EDX, or RDX, depending on SIZE.

309

310 rSI - The register SI, ESI, or RSI, depending on SIZE.

311

312 rSP - The register SP, ESP, or RSP, depending on SIZE.

313

314 Note: r8 is not in the manuals cited above. It has been added to deal

315 with instructions with an embedded register in the opcode. In such

316 cases, this value allows a single defining call to be used (within a

317 for loop), rather than writing eight separate rules (one for each

318 possible register value).
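
The register selection behind r8 can be sketched as follows: the low three bits of the opcode byte pick one of the eight base registers, and REX.b extends the choice to r8-r15. The helper name `embedded_register` and the lowercase 64-bit register spellings are assumptions made for this illustration only.

```python
# Register order follows the list given for the r8 form above.
REGISTERS_64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    "r%d" % n for n in range(8, 16)
]


def embedded_register(opcode, rex_b=False):
    """Return the register selected by the low 3 opcode bits and REX.b."""
    index = (opcode & 0x7) | (0x8 if rex_b else 0)
    return REGISTERS_64[index]


base = embedded_register(0x53)
extended = embedded_register(0x53, rex_b=True)
```

A single rule written with r8 therefore stands in for the eight per-register opcode values that share the same upper bits.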

319

320

321 Valid SIZEs are (note: as mentioned above, these forms follow the

322 conventions of both AMD and Intel):

323

324 a - Two 16-bit or 32-bit memory operands, depending on the

325 effective operand size. Used in the BOUND instruction.

326

327 b - A byte, irrespective of the effective operand size.

328

329 d - A doubleword (32 bits), irrespective of the effective operand size.

330

331 dq - A double-quadword (128 bits), irrespective of the effective

332 operand size.

333

334 p - A 32-bit or 48-bit far pointer, depending on the effective

335 operand size.

336

337 pd - A 128-bit double-precision floating point vector operand

338 (packed double).

339

340 pi - A 64-bit MMX operand (packed integer).

341

342 ps - A 128-bit single-precision floating point vector operand

343 (packed single).

344

345 q - A quadword, irrespective of the effective operand size.

346

347 s - A 6-byte or 10-byte pseudo-descriptor.

348

349 sd - A scalar double-precision floating point operand (scalar

350 double).

351

352 si - A scalar doubleword (32-bit) integer operand (scalar

353 integer).

354

355 ss - A scalar single-precision floating-point operand (scalar

356 single).

357

358 w - A word, irrespective of the effective operand size.

359

360 v - A word, doubleword, or quadword, depending on the effective

361 operand size.

362

363 va - A word, doubleword, or quadword, depending on the effective

364 address size.

365

366 vw - A word only when the effective operand size matches.

367

368 vd - A doubleword only when the effective operand size matches.

369

370

371 vq - A quadword only when the effective operand size matches.

372

374

375 z - A word if the effective operand size is 16 bits, or a

376 doubleword if the effective operand size is 32 or 64 bits.

377

378 zw - A word only when the effective operand size matches.

379

380 zd - A doubleword only when the effective operand size is 32 or

381 64 bits.

382

383 Note: vw, vd, vq, zw, and zd are not in the manuals cited

384 above. However, they have been added so that sub-variants of a v/z

385 instruction (not specified in the manual) can be specified.
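
How the integer sizes depend on the effective operand size can be sketched as a lookup. This is one reading of the list above, not code from the decoder: the function name `operand_bits` is hypothetical, only the integer sizes are covered, and returning None to mean "this sub-variant does not apply at this operand size" is an interpretation chosen for the sketch.

```python
def operand_bits(size, effective_operand_size):
    """Return the operand width in bits, or None if the size does not apply."""
    fixed = {"b": 8, "w": 16, "d": 32, "q": 64, "dq": 128}
    if size in fixed:
        return fixed[size]  # irrespective of the effective operand size
    if size == "v":
        return effective_operand_size  # word, doubleword, or quadword
    if size == "z":
        return 16 if effective_operand_size == 16 else 32
    conditional = {"vw": 16, "vd": 32, "vq": 64, "zw": 16}
    if size in conditional:
        bits = conditional[size]
        return bits if effective_operand_size == bits else None
    if size == "zd":
        return 32 if effective_operand_size in (32, 64) else None
    raise ValueError("size not covered by this sketch: " + size)


widths = [operand_bits("v", 64), operand_bits("z", 64), operand_bits("vd", 16)]
```

The difference between v and z shows up only at a 64-bit operand size, where v widens to a quadword while z stays a doubleword.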

386

387 Note: The AMD manual uses slash notation (such as d/q) that isn't

388 explicitly defined. In general, we allow such notation as specified in

389 the AMD manual. Depending on the use, it can mean any of the following:

390

391 (1) In 32-bit mode, d is used. In 64-bit mode, q is used.

392

393 (2) Only 32-bit or 64-bit values are allowed.

394

395 In addition, when the mnemonic name changes based on which value is

396 chosen in d/q, we use d/q/d to denote the 32-bit case, and d/q/q to

397 denote the 64-bit case.

398

399 In addition, this code adds the following special print forms:

400

401 One - The literal constant 1.

402

403 Debugging

404 ---------

405

406 Many of the source files contain #define DEBUGGING flags. When

407 DEBUGGING is set to 1, additional debugging print messages are

408 compiled into the code. Unfortunately, by default, these messages

409 frequently call routines that are not compiled into corresponding

410 executables (such as ncval and ncdis). To add the additional routines,

411 edit file

412

413 native_client/site_scons/site_tools/library_deps.py

414

415 For x86-32, edit lines

416

417 # When turning on the DEBUGGING flag in the x86-32 validator

418 # or decoder, add the following:

419 #'nc_opcode_modeling_verbose_x86_32',

420

421 to

422

423 # When turning on the DEBUGGING flag in the x86-32 validator

424 # or decoder, add the following:

425 'nc_opcode_modeling_verbose_x86_32',

426

427 For x86-64, edit lines

428

429 # When turning on the DEBUGGING flag in the x86-64 validator

430 # or decoder, add the following:

431 # 'nc_opcode_modeling_verbose_x86_64',

432

433 to

434

435 # When turning on the DEBUGGING flag in the x86-64 validator

436 # or decoder, add the following:

437 'nc_opcode_modeling_verbose_x86_64',

438

439 These changes will make sure that the corresponding print routines are

440 added to the executables during link time.
