Chromium Code Reviews

Side by Side Diff: src/trusted/validator_ragel/compress_regular_instructions.py

Issue 49183002: Regular instructions golden file test. Base URL: svn://svn.chromium.org/native_client/trunk/src/native_client/
Patch Set: Created 7 years, 1 month ago
OLD: (Empty)
NEW:
1 # Copyright (c) 2013 The Native Client Authors. All rights reserved.
2 # Use of this source code is governed by a BSD-style license that can be
3 # found in the LICENSE file.
4
5 """
6 Traverse the validator's DFA, collect all "normal" instructions and then
7 compress output. Note: "anybyte fields" (immediates and displacements)
8 are always filled with zeros. Otherwise processing of sextillions (sic!)
9 of possibilities will take too long.
10
11 The following compression rules are present:
12
13 1. Compress ModR/M (+SIB & displacement).
14 Instruction: 00 00 %al,(%rax)
halyavin 2013/11/01 08:17:49 add %al,(%rax) or may be even add %al,(%eax).
khim 2013/11/05 15:04:36 Done.
15 ...
16 Instruction: 00 ff add %bh,%bh
17 becomes
18 Instruction: 00 XX add [%al..%bh],[%al..%bh or memory]
19
20 Only applies if all possibilities are accepted by validator.
21
22 1a. Compress ModR/M (+SIB & displacement) memory-only.
23 Instruction: f0 01 00 lock add %eax,(%eax)
24 ...
25 Instruction: f0 01 bf 00 00 00 00 lock add %edi,0x0(%edi)
26 becomes
27 Instruction: f0 01 XX lock add [%eax..%edi],[memory]
28
29 Only applies if all possibile memory accesses are accepted by validator.
halyavin 2013/11/01 08:17:49 possible
khim 2013/11/05 15:04:36 Done.
30
31 1b. Compress ModR/M (+SIB & displacement) register only
32 Instruction: 66 0f 50 c0 movmskpd %xmm0,%eax
33 ...
34 Instruction: 66 0f 50 ff movmskpd %xmm7,%edi
35 becomes
36 Instruction: 66 0f 50 XX movmskpd [%xmm0..%xmm7],[%eax..%edi]
37
38 Only applies if all possible register accesses are accepted by validator.
39
40 2. Compress ModR/M (+SIB & displacement) with opcode extension.
41 Instruction: 0f 90 00 seto (%eax)
42 ...
43 Instruction: 0f 90 c7 seto %bh
44 becomes
45 Instruction: 0f 90 XX/0 seto [%al..%bh or memory]
46
47 Only applies if all possibilities are accepted by validator.
48
49 2a. Compress ModR/M (+SIB & displacement) memory-only with opcode extension.
50 Instruction: f0 ff 00 lock incl (%eax)
51 ...
52 Instruction: f0 ff 84 ff 00 00 00 00 lock incl 0x0(%edi,%edi,8)
53 becomes
54 Instruction: f0 ff XX/0 lock incl [memory]
55
56 Only applies if all possible memory accesses are accepted by validator.
57
58 2b. Compress ModR/M (+SIB & displacement) register-only with opcode extension.
59 Instruction: 0f 71 d0 00 psrlw $0x0,%mm0
60 ...
61 Instruction: 0f 71 d7 00 psrlw $0x0,%mm7
62 becomes
63 Instruction: 66 0f 71 XX/2 00 psrlw $0x0,[%xmm0..%xmm7]
halyavin 2013/11/01 08:17:49 Shouldn't it be %mm0..%mm7?
khim 2013/11/05 15:04:36 Done.
64
65 Only applies if all possible register accesses are accepted by validator.
66
67 3. Compress register-in-opcode.
68 Instruction: d9 c0 fld %st(0)
69 ...
70 Instruction: d9 c7 fld %st(7)
71 becomes
72 Instruction: d9 c[0..7] fld [%st(0)..%st(7)]
73
74 Only applies if all possible register accesses are accepted by validator.
75
76 4. Special compressor for "set" instruction.
77 Instruction: 0f 90 XX/0 seto [%al..%bh or memory]
78 ...
79 Instruction: 0f 90 XX/1 seto [%al..%bh or memory]
halyavin 2013/11/01 08:17:49 I don't understand what is the difference between
khim 2013/11/05 15:04:36 There are no difference. "set" ignores "reg" field
80 becomes
81 Instruction: 0f 90 XX seto [%al..%bh or memory]
82 """
83
84 import itertools
85 import lockfile
bsy 2013/10/31 22:51:54 unused imports. style nits.
khim 2013/11/05 15:04:36 Done.
86 import multiprocessing
87 import optparse
88 import os
89 import re
90 import subprocess
91 import sys
92 import tempfile
93 import traceback
94
95 import dfa_parser
96 import dfa_traversal
97 import objdump_parser
98 import validator
99 import spec
100
101
102 # Register names in 'natural' order (as defined by IA32/x86-64 ABI)
bsy 2013/10/31 22:51:54 are these equivalence classes? used for... compre
khimg 2013/10/31 23:08:49 These are just names of registers. They are used..
103 REGISTERS = {
104 'al': [ 'al', 'cl', 'dl', 'bl', 'ah', 'ch', 'dh', 'bh' ],
105 'spl': [ 'al', 'cl', 'dl', 'bl', 'spl', 'bpl', 'sil', 'dil' ],
106 'ax': [ 'ax', 'cx', 'dx', 'bx', 'sp', 'bp', 'si', 'di' ],
107 'eax': [ 'eax', 'ecx', 'edx', 'ebx', 'esp', 'ebp', 'esi', 'edi' ],
108 'rax': [ 'rax', 'rcx', 'rdx', 'rbx', 'rsp', 'rbp', 'rsi', 'rdi' ],
109 'r8b': [ 'r{}b'.format(N) for N in range(8,16) ],
110 'r8w': [ 'r{}w'.format(N) for N in range(8,16) ],
111 'r8d': [ 'r{}d'.format(N) for N in range(8,16) ],
112 'r8': [ 'r{}'.format(N) for N in range(8,16) ],
113 'mm0': [ 'mm{}'.format(N) for N in range(8) ],
114 'st(0)': [ 'st({})'.format(N) for N in range(8) ],
115 'xmm0': [ 'xmm{}'.format(N) for N in range(8) ],
116 'xmm8': [ 'xmm{}'.format(N) for N in range(8,16) ],
117 'ymm0': [ 'ymm{}'.format(N) for N in range(8) ],
118 'ymm8': [ 'ymm{}'.format(N) for N in range(8,16) ]
119 }
120
121
122 NOP = 0x90
123
124
125 def PadToBundleSize(bytes):
126 assert len(bytes) <= validator.BUNDLE_SIZE
127 return bytes + [NOP] * (validator.BUNDLE_SIZE - len(bytes))
128
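To illustrate the padding helper above, here is a minimal Python 3 sketch (the reviewed file itself is Python 2). The bundle size of 32 is an assumption standing in for `validator.BUNDLE_SIZE`, which this page does not show, and the name `pad_to_bundle_size` is hypothetical.

```python
NOP = 0x90
BUNDLE_SIZE = 32  # assumption: stands in for validator.BUNDLE_SIZE


def pad_to_bundle_size(code):
    """Return `code` (a list of byte values) padded with NOPs to one bundle."""
    assert len(code) <= BUNDLE_SIZE
    return code + [NOP] * (BUNDLE_SIZE - len(code))


padded = pad_to_bundle_size([0x00, 0x00])  # the two bytes of "add %al,(%eax)"
# In Python 3, bytes() replaces the ''.join(map(chr, ...)) idiom used in the patch.
bundle = bytes(padded)
```

Padding with NOPs keeps the rest of the bundle valid, so a rejection can only come from the instruction under test.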
129
130 ACCEPTABLE_X86_64_INPUTS = {
131 0x00001: 'input_rr=%eax',
132 0x00002: 'input_rr=%ecx',
133 0x00004: 'input_rr=%edx',
134 0x00008: 'input_rr=%ebx',
135 0x00010: 'input_rr=%esp',
136 0x00020: 'input_rr=%ebp',
137 0x00040: 'input_rr=%esi',
138 0x00080: 'input_rr=%edi',
139 0x00100: 'input_rr=%r8d',
140 0x00200: 'input_rr=%r9d',
141 0x00400: 'input_rr=%r10d',
142 0x00800: 'input_rr=%r11d',
143 0x01000: 'input_rr=%r12d',
144 0x02000: 'input_rr=%r13d',
145 0x04000: 'input_rr=%r14d',
146 0x08000: 'input_rr=%r15d',
147 0x1ffcf: 'input_rr=any_nonspecial'
148 }
149
150
151 def ValidateInstruction(instruction, validator_inst):
152 bundle = ''.join(map(chr, PadToBundleSize(instruction)))
153 if options.bitness == 32:
154 result = validator_inst.ValidateChunk(bundle, bitness=32)
155 return result
156 else:
157 valid_inputs = 0
158 known_final_rr = False
159 bit_position = 1
160 for bit, initial_rr in enumerate(validator.ALL_REGISTERS + [None]):
bsy 2013/10/31 22:51:54 bit is not used, since bit_position = 1 << bit is
khim 2013/11/05 15:04:36 Because I forgot to remove "bit" when I've switche
161 valid, final_rr = validator_inst.ValidateAndGetFinalRestrictedRegister(
162 bundle, len(instruction), initial_rr)
163 if valid:
164 valid_inputs |= bit_position
165 # "None" here means there is no restricted register, "False" means we
166 # have not seen anything yet.
167 assert known_final_rr is False or known_final_rr == final_rr
halyavin 2013/11/01 08:17:49 We can use valid_inputs == 0 instead of known_fina
khim 2013/11/05 15:04:36 Done.
168 known_final_rr = final_rr
169 bit_position += bit_position
170 # If nothing is accepted then instruction is not valid. Easy and simple.
171 if valid_inputs == 0: return False
172 # Format output register
173 if known_final_rr is None:
halyavin 2013/11/01 08:17:49 Possible improvement: we can extract converting re
khim 2013/11/05 15:04:36 I'm not sure two-line function used just once will
174 output_rr = 'output_rr=None'
175 elif known_final_rr < validator.REG_R8:
176 output_rr = 'output_rr=%' + REGISTERS['eax'][known_final_rr]
bsy 2013/10/31 22:51:54 why not a single 16 element array?
khimg 2013/10/31 23:08:49 Because this information will eventually go into t
177 else:
178 output_rr = 'output_rr=%' + REGISTERS['r8d'][known_final_rr - 8]
bsy 2013/10/31 22:51:54 Since 2nd half of returned tuple from ValidateAndG
khimg 2013/10/31 23:08:49 Uhm... You are looking on said post-condition, rig
179 return [ACCEPTABLE_X86_64_INPUTS[valid_inputs], output_rr]
180
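The 64-bit branch of ValidateInstruction builds a bitmask with one bit per initial restricted register, plus one extra bit for "no restricted register". The following Python 3 sketch shows only that bookkeeping. The predicate passed as `is_valid` is a hypothetical stand-in for the real validator call, and the register numbering 0..15 is an assumption inferred from the `known_final_rr - 8` arithmetic in the patch. It uses a single 16-entry name list, which is the alternative bsy's comment asks about.

```python
REGISTER_NAMES = (
    ['eax', 'ecx', 'edx', 'ebx', 'esp', 'ebp', 'esi', 'edi'] +
    ['r{}d'.format(n) for n in range(8, 16)])


def accepted_inputs_mask(is_valid, initial_states):
    """Set bit N of the result when the N-th initial state is accepted."""
    mask = 0
    bit_position = 1
    for state in initial_states:
        if is_valid(state):
            mask |= bit_position
        bit_position += bit_position  # doubling, i.e. a shift left by one
    return mask


def format_output_rr(final_rr):
    """Format the final restricted register the way the golden file does."""
    if final_rr is None:
        return 'output_rr=None'
    return 'output_rr=%' + REGISTER_NAMES[final_rr]


# Sixteen registers followed by None ("no restricted register").
states = list(range(16)) + [None]
# Hypothetical validator: rejects only %esp (4) and %ebp (5) as initial state.
mask = accepted_inputs_mask(lambda rr: rr not in (4, 5), states)
```

With that hypothetical predicate the mask equals 0x1ffcf, the key that ACCEPTABLE_X86_64_INPUTS maps to 'input_rr=any_nonspecial'.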
181
182 class WorkerState(object):
183 def __init__(self, prefix, validator):
184 self.total_instructions = 0
185 self.num_valid = 0
186 self.validator = validator
187 self.output = set()
188
189 def ReceiveInstruction(self, bytes):
190 self.total_instructions += 1
191 result = ValidateInstruction(bytes, self.validator)
192 if result is not False:
193 self.num_valid += 1
194 dis = self.validator.DisassembleChunk(
195 ''.join(map(chr, bytes)),
196 bitness=options.bitness)
197 for line_nr in xrange(len(dis)):
198 dis[line_nr] = str(dis[line_nr])
199 assert dis[line_nr][0:17] == 'Instruction(0x' + str(line_nr) + ': '
200 assert dis[line_nr][-1:] == ')'
201 dis[line_nr] = dis[line_nr][17:-1]
202 if '(%rip)' in dis[0]:
203 dis[0] = re.sub(' # 0x[ 0-9]*', '', dis[0])
204 dis[0] = 'Instruction: ' + dis[0]
205 if result is not True:
206 dis += result
207 self.output.add('; '.join(dis))
208
209 # Compressor has three slots: regex (which picks apart given instruction),
210 # subst (which is used to denote compressed version) and replacements (which
211 # are used to generate set of instructions from a given code).
212 #
213 # Example compressor:
214 # regex = '.*?[0-9a-fA-F]([0-7]) \\w* (%e(?:[abcd]x|[sb]p|[sd]i)).*()'
halyavin 2013/11/01 08:34:29 ".*?" I don't understand to what expression "?" ap
khim 2013/11/05 15:04:36 *? is non-greedy *, look it up in python's manual
215 # subst = ('[0-7]', '[%eax..%esi]', ' # register in opcode')
bsy 2013/10/31 22:51:54 7 is %edi, so this example is confusing.
khimg 2013/10/31 23:08:49 It's a typo :-( Sorry for confusion.
216 # replacements = ((0, '%eax'), (1, '%ecx'), (2, '%edx'), (3, '%ebx')
217 # (4, '%esp'), (5, '%ebp'), (6, '%esi'), (7, '%edi')
halyavin 2013/11/01 08:34:29 Add square brackets: replacements = [(0, '%eax'),.
khim 2013/11/05 15:04:36 It's a tuple, why would I need square brackets her
218 #
219 # When faced with instruction '40 inc %eax' it will capture the following
220 # pieces of said instruction: '4[0] inc [%eax]'.
221 #
222 # Then it will produce the following eigth instructions:
halyavin 2013/11/01 08:17:49 eight
khim 2013/11/05 15:04:36 Done.
223 # '40 inc %eax'
224 # '41 inc %ecx'
225 # '42 inc %edx'
226 # '43 inc %ebx'
227 # '44 inc %esp'
228 # '45 inc %ebp'
229 # '46 inc %esi'
230 # '47 inc %edi'
231 #
232 # If all these instructions can be found in a set of instructions then
233 # compressor will remove them from said set and will insert one replacement
234 # "compressed instruction" '4[0-7] inc [%eax..%esi] # register in opcode'.
bsy 2013/10/31 22:51:54 why isn't this [%eax..%edi]?
khimg 2013/10/31 23:08:49 Copy-paste of a typo :-( Will fix that.
235 #
236 # Note that last group is only used in the replacement. It's used to grab marks
237 # added by previous compressors and to replace them with a new mark.
238 class Compressor(object):
239 __slots__ = [
240 'regex',
241 'subst',
242 'replacements'
243 ]
244
245 def __init__(self, regex, subst, replacements=None):
246 self.regex = regex
247 self.subst = subst
248 self.replacements = [] if replacements is None else replacements
249
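The comment block before the Compressor class can be traced by hand. This Python 3 sketch applies the example regex from that comment to '40 inc %eax', builds the format string the same way Compressed does, and expands it with the eight replacements. It uses '[%eax..%edi]' in subst, following the typo fix acknowledged in the comments above. The variable names are illustrative only.

```python
import re

regex = re.compile(
    r'.*?[0-9a-fA-F]([0-7]) \w* (%e(?:[abcd]x|[sb]p|[sd]i)).*()')
subst = ('[0-7]', '[%eax..%edi]', ' # register in opcode')
replacements = tuple(enumerate(
    ['%eax', '%ecx', '%edx', '%ebx', '%esp', '%ebp', '%esi', '%edi']))

instruction = '40 inc %eax'
match = regex.match(instruction)

# Replace every captured group except the last one with a '{}' placeholder.
pos = 0
format_str = ''
last_group = len(match.groups())
for group in range(1, last_group):
    format_str += instruction[pos:match.start(group)] + '{}'
    pos = match.end(group)
# Text between the last real group and the trailing empty "mark" group.
format_str += instruction[pos:match.start(last_group)]

# All instructions that must be present before compression is allowed.
expanded = {format_str.format(*replacement) for replacement in replacements}
# The single line that replaces them; the extra '{}' receives the mark.
compressed = (format_str + '{}').format(*subst)
```

The trailing empty group `()` is what lets a later compressor capture and replace a mark appended by an earlier one.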
250
251 def Compressed(instructions):
252 for instruction in sorted(instructions):
253 for compressor in compressors:
254 match = compressor.regex.match(instruction)
255 if match:
256 pos = 0
257 format = ''
halyavin 2013/11/01 08:34:29 Rename to format_str to avoid confusion with the f
khim 2013/11/05 15:04:36 Done, although I'm not sure how can there be any c
258 for group in range(1, len(match.groups())):
259 format += instruction[pos:match.start(group)] + '{}'
260 pos = match.end(group)
261 format += instruction[pos:match.start(len(match.groups()))]
262 subset = set()
263 for replacement in compressor.replacements:
264 subset.add(format.format(*replacement))
265 if subset <= instructions:
266 instructions -= subset
267 instructions.add((format + '{}').format(*compressor.subst))
268 return Compressed(instructions)
269 return instructions
270
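Compressed restarts from scratch after every successful replacement, so that later compressors can build on lines produced by earlier ones. Below is a simplified Python 3 sketch of that fixed-point loop, written iteratively instead of recursively. It is not the patch's algorithm verbatim: here each rule is a precomputed pair (set of expanded lines, compressed line), whereas the patch derives the expanded set from a regex match. The names `compress` and `RULES` are hypothetical.

```python
def compress(instructions, rules):
    """Apply rules until none fits; each rule is (expanded_set, compressed)."""
    instructions = set(instructions)
    changed = True
    while changed:
        changed = False
        for expanded, compressed in rules:
            # Compress only when every expanded line is actually present.
            if expanded <= instructions:
                instructions -= expanded
                instructions.add(compressed)
                changed = True
                break  # start over, as the recursive call in the patch does
    return instructions


FLD = {'d9 c{} fld %st({})'.format(n, n) for n in range(8)}
RULES = [(frozenset(FLD), 'd9 c[0..7] fld [%st(0)..%st(7)]')]

full = compress(FLD | {'d9 e8 fld1'}, RULES)
partial = compress(FLD - {'d9 c7 fld %st(7)'}, RULES)
```

In `partial` one encoding is missing, so nothing is compressed. This is the "only applies if all possibilities are accepted by validator" condition from the docstring.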
271
272 def Worker((prefix, state_index)):
273 worker_state = WorkerState(prefix, worker_validator)
274
275 try:
276 dfa_traversal.TraverseTree(
277 dfa.states[state_index],
278 final_callback=worker_state.ReceiveInstruction,
279 prefix=prefix,
280 anyfield=0)
281 if (prefix[0] != 0x0f or prefix[1] != 0x0f): # Skip 3DNow! instructions
282 worker_state.output = Compressed(set(worker_state.output))
283 except Exception as e:
284 traceback.print_exc() # because multiprocessing imap swallows traceback
285 raise
286
287 return (
288 prefix,
289 worker_state.total_instructions,
290 worker_state.num_valid,
291 worker_state.output)
292
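Worker takes a tuple parameter, `def Worker((prefix, state_index))`, which is Python 2 only syntax (removed by PEP 3113). If the script were ported to Python 3, the unpacking would move into the function body. A minimal sketch of that shape follows; `traverse` is a hypothetical placeholder for the DFA traversal and compression done in the patch.

```python
import traceback


def traverse(prefix, state_index):
    """Hypothetical placeholder for the real work; returns a dummy count."""
    return len(prefix) + state_index


def worker(task):
    # One positional argument, unpacked by hand, so the function can still
    # be passed to map() or to a multiprocessing pool's imap().
    prefix, state_index = task
    try:
        result = traverse(prefix, state_index)
    except Exception:
        # Print here because the traceback may be lost across processes.
        traceback.print_exc()
        raise
    return prefix, result


tasks = [((0x0f, 0x0f), 3), ((0x66,), 1)]
results = list(map(worker, tasks))
```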
293
294 def ParseOptions():
295 parser = optparse.OptionParser(usage='%prog [options] xmlfile')
296
297 parser.add_option('--bitness',
298 type=int,
299 help='The subarchitecture: 32 or 64')
300 parser.add_option('--validator_dll',
301 help='Path to librdfa_validator_dll')
302 parser.add_option('--decoder_dll',
303 help='Path to librdfa_decoder_dll')
304
305 options, args = parser.parse_args()
306
307 if options.bitness not in [32, 64]:
308 parser.error('specify --bitness 32 or --bitness 64')
309
310 if len(args) != 1:
311 parser.error('specify one xml file')
312
313 (xml_file,) = args
314
315 return options, xml_file
316
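For comparison, the same command line can be expressed with argparse, the newer standard-library parser. This is only a sketch of an alternative, not part of the patch. The function name `parse_options` and the file name 'dfa.xml' are hypothetical.

```python
import argparse


def parse_options(argv=None):
    parser = argparse.ArgumentParser()
    # choices= rejects anything other than 32 or 64 with a usage error.
    parser.add_argument('--bitness', type=int, choices=(32, 64),
                        required=True,
                        help='The subarchitecture: 32 or 64')
    parser.add_argument('--validator_dll',
                        help='Path to librdfa_validator_dll')
    parser.add_argument('--decoder_dll',
                        help='Path to librdfa_decoder_dll')
    # Exactly one positional argument, replacing the len(args) != 1 check.
    parser.add_argument('xml_file')
    return parser.parse_args(argv)


options = parse_options(['--bitness', '64', 'dfa.xml'])
```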
317
318 # Version suitable for use in regular expressions
319 REGISTERS_RE = REGISTERS.copy()
320 REGISTERS_RE['st(0)'] = [ 'st\\({}\\)'.format(N) for N in range(8) ]
321 REGISTERS_RE['st\\(0\\)'] = REGISTERS_RE['st(0)']
322
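REGISTERS_RE exists because names such as st(0) contain regex metacharacters. The Python 3 sketch below shows how a register family turns into the kind of non-capturing alternation that AddModRM_Compressor substitutes for `{%eax}`-style placeholders. It uses `re.escape` where the patch escapes by hand, and the helper name `register_alternation` is hypothetical.

```python
import re

EAX_FAMILY = ['eax', 'ecx', 'edx', 'ebx', 'esp', 'ebp', 'esi', 'edi']
ST_FAMILY = ['st({})'.format(n) for n in range(8)]


def register_alternation(names):
    """Build '(?:%a|%b|...)' that matches any one register of the family."""
    return '(?:%' + '|%'.join(re.escape(name) for name in names) + ')'


eax_pattern = register_alternation(EAX_FAMILY)
st_pattern = register_alternation(ST_FAMILY)
```

Without the escaping, the parentheses in st(0) would be read as a capturing group and the pattern would match '%st0' instead of '%st(0)'.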
323 # Index names in 'natural' order (as defined by IA32/x86-64 ABI)
324 INDEXES = {
325 'eax': [ 'eax', 'ecx', 'edx', 'ebx', 'eiz', 'ebp', 'esi', 'edi' ],
326 'rax': [ 'rax', 'rcx', 'rdx', 'rbx', 'riz', 'rbp', 'rsi', 'rdi' ],
327 'r8': [ 'r8', 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15' ]
328 }
329 # Register which can not be used as base in 64-bit mode in all incarnations
330 X86_64_BASE_REGISTERS = set([
331 '%spl', '%bpl', '%r15b', '%sp', '%bp', '%r15w',
332 '%esp', '%ebp', '%r15d', '%rsp', '%rbp', '%r15',
333 '%rip'
334 ])
335
336 def AddModRM_Compressor(regex, subst, subst_register, subst_memory,
337 reg=None, rm=None, rm_to_reg=False, start_byte=0,
338 index_r8=False, input_rr=True, output_rr=False):
339 """Adds three compressors to the list of compressors:
340 main_compressors (register <-> register or memory instructions)
341 register_compressors (register <-> register instructions)
342 memory_compressors (register <-> memory instructions)
343
344 Args:
345 regex: regular expressions for the compressor
346 subst: replacement for register <-> register or memory instructions
347 subst_register: replacement for register <-> register instructions
348 subst_memory: replacement for register <-> memory instructions
349 reg: reg operand kind (see REGISTERS array) or None
350 rm: rm operand kind (see REGISTERS array)
351 rm_to_reg: three-state selector
352 True - instruction uses rm as source, reg as destination
halyavin 2013/11/01 14:08:57 "rm_to_reg", "reg_to_rm", "xchg".
khim 2013/11/06 14:25:15 Done.
353 False - instruction uses reg as source, rm as destination
354 None - instruction uses both symmetrically (e.g. test or xchg)
355 start_byte: first valid byte ModR/M byte (used when reg is None)
356 input_rr: True if instruction accesses memory
357 output_rr: three-state selector
358 True - instruction can be used to produce "restricted register"
halyavin 2013/11/01 14:08:57 "sandboxing writes", "no GP register writes", "non
khim 2013/11/06 14:25:15 Done.
359 False - instruction does not affect its operands (e.g. test)
360 None - instruction can damage output but can not be used to restrict it
361 Internal:
362 index_r8: must be called in False position (used to create two compressors
363 in 64-bit mode with index == %rax..%rdi or index == %r8..%r14)
364 Returns:
365 None
366 """
367
368 if options.bitness == 32:
369 base = 'eax'
370 index = 'eax'
371 expanded_regex = re.sub('{RR_NOTES}', '', regex)
372 else:
373 base = 'r8' if rm[0:2] == 'r8' else 'rax'
374 index = 'r8' if index_r8 else 'rax'
375 input = 'r8d' if index_r8 else 'eax'
376 if output_rr:
377 output_regs = reg if rm_to_reg else rm
378 assert output_regs in ('eax', 'r8d')
379 expanded_regex = re.sub('{RR_NOTES}', '; input_rr=((?:%{'+ input +
380 '}|any_nonspecial)); output_rr=(%{' + output_regs + '}|None)', regex)
381 else:
382 expanded_regex = re.sub('{RR_NOTES}', '; input_rr=((?:%{' + input +
383 '}|any_nonspecial)); output_rr=(None)', regex)
384 if 'RM_BYTE' in regex:
385 address_regex = '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?\\))'
386 else:
387 address_regex = (
388 '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?(?:,(?:%{' + index + '}))?'
389 '(?:,(?:1|2|4|8))?\\))')
390
391 # We need to process either modrm or reg
392 assert rm is not None or reg is not None
393 # If both modrm and reg are given then ModR/M
394 assert reg is None or start_byte == 0
395 # Replace RM_BYTE placeholders.
396 # Handle only cases without displacement.
397 expanded_regex = re.sub('{RM_BYTE}', '[0-9a-fA-F][0-9a-fA-F]', expanded_regex)
398 expanded_regex = re.sub('{RM_BYTE/0}', '[048cC][0-7]', expanded_regex)
399 expanded_regex = re.sub('{RM_BYTE/1}', '[048cC][89a-fA-F]', expanded_regex)
400 expanded_regex = re.sub('{RM_BYTE/2}', '[159dD][0-7]', expanded_regex)
401 expanded_regex = re.sub('{RM_BYTE/3}', '[159dD][89a-fA-F]', expanded_regex)
402 expanded_regex = re.sub('{RM_BYTE/4}', '[26aAeE][0-7]', expanded_regex)
403 expanded_regex = re.sub('{RM_BYTE/5}', '[26aAeE][89a-fA-F]', expanded_regex)
404 expanded_regex = re.sub('{RM_BYTE/6}', '[37bBfF][0-7]', expanded_regex)
405 expanded_regex = re.sub('{RM_BYTE/7}', '[37bBfF][89a-fA-F]', expanded_regex)
406 register_regex = expanded_regex
407 # Replace RM_SIB_BYTES placeholders.
408 # Handle only cases without displacement.
409 expanded_regex = re.sub(
410 '{RM_SIB_BYTES}', '[0-b][4c] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
411 expanded_regex = re.sub(
412 '{RM_SIB_BYTES/0}', '[048]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
413 expanded_regex = re.sub(
414 '{RM_SIB_BYTES/1}', '[048][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
415 expanded_regex = re.sub(
416 '{RM_SIB_BYTES/2}', '[159]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
417 expanded_regex = re.sub(
418 '{RM_SIB_BYTES/3}', '[159][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
419 expanded_regex = re.sub(
420 '{RM_SIB_BYTES/4}', '[26aA]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
421 expanded_regex = re.sub(
422 '{RM_SIB_BYTES/5}', '[26aA][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
423 expanded_regex = re.sub(
424 '{RM_SIB_BYTES/6}', '[37bB]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
425 expanded_regex = re.sub(
426 '{RM_SIB_BYTES/7}', '[37bB][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
427 register_regex = re.sub(
428 '{RM_SIB_BYTES}', '[c-fC-F][0-9a-fA-F]', register_regex)
429 register_regex = re.sub('{RM_SIB_BYTES/0}', '[cC][0-7]', register_regex)
430 register_regex = re.sub('{RM_SIB_BYTES/1}', '[cC][8-9a-fA-F]', register_regex)
431 register_regex = re.sub('{RM_SIB_BYTES/2}', '[dD][0-7]', register_regex)
432 register_regex = re.sub('{RM_SIB_BYTES/3}', '[dD][8-9a-fA-F]', register_regex)
433 register_regex = re.sub('{RM_SIB_BYTES/4}', '[eE][0-7]', register_regex)
434 register_regex = re.sub('{RM_SIB_BYTES/5}', '[eE][8-9a-fA-F]', register_regex)
435 register_regex = re.sub('{RM_SIB_BYTES/6}', '[fF][0-7]', register_regex)
436 register_regex = re.sub('{RM_SIB_BYTES/7}', '[fF][8-9a-fA-F]', register_regex)
437 # Replace register placeholders
438 for register, value in REGISTERS_RE.iteritems():
439 expanded_regex = re.sub('{%' + register + '}',
440 '(?:%' + '|%'.join(value) + '|' + address_regex +')', expanded_regex)
441 register_regex = re.sub('{%' + register + '}',
442 '(?:%' + '|%'.join(value) +')', register_regex)
443 for register, value in REGISTERS_RE.iteritems():
444 expanded_regex = re.sub('{' + register + '}',
445 '(?:' + '|'.join(value) + ')', expanded_regex)
446 register_regex = re.sub('{' + register + '}',
447 '(?:' + '|'.join(value) + ')', register_regex)
448 expanded_regex = re.compile(expanded_regex)
449 register_regex = re.compile(register_regex)
450 # Add index_rr and output_rr fields if we are dealing with 64-bit case
451 if options.bitness == 32:
452 subst_fixed = subst
453 subst_register_fixed = subst_register
454 subst_memory_fixed = subst_memory
455 else:
456 if input_rr:
457 input_note = '[%eax..%edi]' if index == 'rax' else '[%r8d..%r15d]'
458 else:
459 input_note = 'any_nonspecial'
460 if output_rr:
461 output_note = '[%eax..%edi]' if output_regs == 'eax' else '[%r8d..%r14d]'
462 else:
463 output_note = None
464 subst_fixed = subst[0:-1] + (input_note, output_note) + subst[-1:]
465 subst_register_fixed = (
466 subst_register[0:-1] + (input_note, output_note) + subst_register[-1:])
467 subst_memory_fixed = (
468 subst_memory[0:-1] + (input_note, output_note) + subst_memory[-1:])
469 # If we already have replacements in cache then we just reuse them.
470 output_key = (reg, rm, rm_to_reg, start_byte, index_r8, input_rr, output_rr)
471 if output_key in AddModRM_Compressor.replacements:
472 replacements = AddModRM_Compressor.replacements[output_key]
473 main_compressors.append(
474 Compressor(expanded_regex, subst_fixed, replacements[0]))
475 register_compressors.append(
476 Compressor(register_regex, subst_register_fixed, replacements[1]))
477 memory_compressors.append(
478 Compressor(expanded_regex, subst_memory_fixed, replacements[2]))
479 if options.bitness == 64 and not index_r8:
480 AddModRM_Compressor(
481 regex, subst, subst_register, subst_memory,
482 reg=reg, rm=rm, rm_to_reg=rm_to_reg, start_byte=start_byte,
483 index_r8=True, input_rr=input_rr, output_rr=output_rr)
484 return
485 # It can be memory only instruction, register only one or both
486 main_compressor = Compressor(expanded_regex, subst_fixed)
487 register_compressor = Compressor(register_regex, subst_register_fixed)
488 memory_compressor = Compressor(expanded_regex, subst_memory_fixed)
489
490 # Generation time!
491 if reg is None:
492 # reg field is used as opcode extension
493 byte_range = [byte for byte in xrange(256) if byte & 0x38 == start_byte]
494 else:
495 byte_range = xrange(256)
496
497 for modrm in byte_range:
498 # Parse ModRM
499 mod_field = (modrm & 0xc0) >> 6
500 reg_field = (modrm & 0x38) >> 3
501 rm_field = (modrm & 0x07)
502 if reg is not None:
503 reg_text = '%' + REGISTERS[reg][reg_field]
504 # If mod == 3 then it's register-to-register instruction
505 if mod_field == 3:
506 bytes = '{:02x}'.format(modrm)
507 rm_text = '%' + REGISTERS[rm][rm_field]
508 replacement = [bytes]
509 if reg is None:
510 replacement.append(rm_text)
511 else:
512 replacement.append(rm_text if rm_to_reg else reg_text)
513 replacement.append(reg_text if rm_to_reg else rm_text)
514 if options.bitness == 64:
515 replacement.append('any_nonspecial')
516 output = reg_text if rm_to_reg else rm_text
517 if output_rr:
518 replacement.append(output)
519 else:
520 replacement.append(None)
521 if output_rr is None and output in X86_64_BASE_REGISTERS: continue
522 if output_rr is True and output == '%r15d': continue
523 if rm_to_reg is None and reg_text in X86_64_BASE_REGISTERS: continue
524 replacement = tuple(replacement)
525 main_compressor.replacements.append(replacement)
526 register_compressor.replacements.append(replacement)
527 # If mod != 3 then it's register-to-memory instruction
528 else:
529 # If RM field != %rsp then there is no index
530 if rm_field != validator.REG_RSP:
531 base_text = '%' + REGISTERS[base][rm_field]
532 # If RM field == %rbp and MOD field is zero then it's absolute address
533 if mod_field == 0 and rm_field == validator.REG_RBP:
534 bytes = '{:02x} 00 00 00 00'.format(modrm)
535 rm_text = '0x0' if options.bitness == 32 else '0x0(%rip)'
536 base_text = '%rip'
537 # Memory access with just a base register
538 elif mod_field == 0:
539 bytes = '{:02x}'.format(modrm)
540 rm_text = '({})'.format(base_text)
541 # Memory access with base and 8bit offset
542 elif mod_field == 1:
543 bytes = '{:02x} 00'.format(modrm)
544 rm_text = '0x0({})'.format(base_text)
545 # Memory access with base and 32bit offset
546 else: # mod_field == 2
547 bytes = '{:02x} 00 00 00 00'.format(modrm)
548 rm_text = '0x0({})'.format(base_text)
549 replacement = [bytes]
550 if reg is None:
551 replacement.append(rm_text)
552 else:
553 replacement.append(rm_text if rm_to_reg else reg_text)
554 replacement.append(reg_text if rm_to_reg else rm_text)
555 if options.bitness == 64:
556 replacement.append('any_nonspecial')
557 output = reg_text if rm_to_reg else None
558 if output_rr:
559 replacement.append(output)
560 else:
561 replacement.append(None)
562 if input_rr and base_text not in X86_64_BASE_REGISTERS: continue
563 if output_rr is None and output in X86_64_BASE_REGISTERS: continue
564 if output_rr is True and output == '%r15d': continue
565 if rm_to_reg is None and reg_text in X86_64_BASE_REGISTERS: continue
566 replacement = tuple(replacement)
567 main_compressor.replacements.append(replacement)
568 memory_compressor.replacements.append(replacement)
569 else:
570 # If RM field == %rsp then we have SIB byte
571 for sib in xrange(256):
572 scale_field = (sib & 0xc0) >> 6
573 index_field = (sib & 0x38) >> 3
574 base_field = (sib & 0x07)
575 index_text = '%' + INDEXES[index][index_field]
576 base_text = '%' + REGISTERS[base][base_field]
577 scale_text = pow(2, scale_field)
578 # If BASE is %rbp and MOD == 0 then index with 32bit offset is used
579 if mod_field == 0 and base_field == validator.REG_RBP:
580 bytes = '{:02x} {:02x} 00 00 00 00'.format(modrm, sib)
581 if (options.bitness == 32 or
582 index_field != validator.REG_RSP or
583 scale_field != 0):
584 rm_text = '0x0(,{},{})'.format(index_text, scale_text)
585 else:
586 rm_text = '0x0'
587 base_text = ''
588 # Memory access with base and index (no offset)
589 elif mod_field == 0:
590 bytes = '{:02x} {:02x}'.format(modrm, sib)
591 rm_text = '({},{},{})'.format(base_text, index_text, scale_text)
592 # Memory access with base, index and 8bit offset
593 elif mod_field == 1:
594 bytes = '{:02x} {:02x} 00'.format(modrm, sib)
595 rm_text = '0x0({},{},{})'.format(base_text, index_text, scale_text)
596 # Memory access with base, index and 32bit offset
597 elif mod_field == 2:
598 bytes = '{:02x} {:02x} 00 00 00 00'.format(modrm, sib)
599 rm_text = '0x0({},{},{})'.format(base_text, index_text, scale_text)
600 # Pretty-printing of access via %rsp
601 if (scale_field == 0 and index != 'r8' and
602 base_field == validator.REG_RSP and
603 index_field == validator.REG_RSP):
604 #index_text = 'any_nonspecial'
605 rm_text = ('0x0({})' if mod_field else '({})').format(base_text)
606 if index_text == "%riz":
607 index_text = 'any_nonspecial'
608 replacement = [bytes]
609 if reg is None:
610 replacement.append(rm_text)
611 else:
612 replacement.append(rm_text if rm_to_reg else reg_text)
613 replacement.append(reg_text if rm_to_reg else rm_text)
614 if options.bitness == 64:
615 if not input_rr or index_text == 'any_nonspecial':
616 replacement.append('any_nonspecial')
617 else:
618 replacement.append('%' + REGISTERS[input][index_field])
619 output = reg_text if rm_to_reg else None
620 replacement.append(output if output_rr else None)
621 if input_rr:
622 if base_text not in X86_64_BASE_REGISTERS: continue
623 if index_text in X86_64_BASE_REGISTERS - set(['%r15']): continue
624 if output_rr is None and output in X86_64_BASE_REGISTERS: continue
625 if output_rr is True and output == '%r15d': continue
626 if rm_to_reg is None and reg_text in X86_64_BASE_REGISTERS: continue
627 replacement = tuple(replacement)
628 main_compressor.replacements.append(replacement)
629 memory_compressor.replacements.append(replacement)
630
631 assert len(main_compressor.replacements) > 1
632 assert len(register_compressor.replacements) > 1
633 assert len(memory_compressor.replacements) > 1
634 main_compressor.replacements = tuple(main_compressor.replacements)
635 register_compressor.replacements = tuple(register_compressor.replacements)
636 memory_compressor.replacements = tuple(memory_compressor.replacements)
637 main_compressors.append(main_compressor)
638 register_compressors.append(register_compressor)
639 memory_compressors.append(memory_compressor)
640 AddModRM_Compressor.replacements[output_key] = (
641 main_compressor.replacements,
642 register_compressor.replacements,
643 memory_compressor.replacements
644 )
645 if options.bitness == 64 and not index_r8:
646 AddModRM_Compressor(
647 regex, subst, subst_register, subst_memory,
648 reg=reg, rm=rm, rm_to_reg=rm_to_reg, start_byte=start_byte,
649 index_r8=True, input_rr=input_rr, output_rr=output_rr)
650 # Replacements cache.
651 AddModRM_Compressor.replacements = {}
652
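Two pieces of AddModRM_Compressor can be cross-checked against each other: the 2-3-3 bit split of a ModR/M (or SIB) byte, and the `{RM_BYTE/N}` character classes that select bytes by opcode extension. The Python 3 sketch below copies the character classes from the patch and confirms they select the same bytes as the `byte & 0x38 == start_byte` filter. The helper names are hypothetical, and reading start_byte as `extension << 3` is an inference from that comparison.

```python
import re


def split_fields(byte):
    """Split a ModR/M or SIB byte into its 2-bit, 3-bit and 3-bit fields."""
    return (byte & 0xc0) >> 6, (byte & 0x38) >> 3, byte & 0x07


# Character classes copied from the {RM_BYTE/N} substitutions in the patch.
RM_BYTE_CLASSES = {
    0: '[048cC][0-7]',
    1: '[048cC][89a-fA-F]',
    2: '[159dD][0-7]',
    3: '[159dD][89a-fA-F]',
    4: '[26aAeE][0-7]',
    5: '[26aAeE][89a-fA-F]',
    6: '[37bBfF][0-7]',
    7: '[37bBfF][89a-fA-F]',
}


def bytes_with_extension(extension):
    """All ModR/M bytes whose reg field equals `extension` (0..7)."""
    start_byte = extension << 3  # inferred meaning of start_byte
    return [byte for byte in range(256) if byte & 0x38 == start_byte]


# Each class must match exactly the hex spellings of the filtered bytes.
for extension, char_class in RM_BYTE_CLASSES.items():
    by_regex = [byte for byte in range(256)
                if re.fullmatch(char_class, '{:02x}'.format(byte))]
    assert by_regex == bytes_with_extension(extension)
```

For example, `split_fields(0xc7)` gives mod 3, reg 0, rm 7, which corresponds to the `0f 90 c7 seto %bh` line in the docstring: register form, extension /0, eighth byte register.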
653
654 def PrepareCompressors():
655 global compressors
656 global main_compressors
657 global register_compressors
658 global memory_compressors
659
660 # "Larger" compressors should be tried first, then "smaller" ones.
661 main_compressors = []
662 register_compressors = []
663 memory_compressors = []
664 extra_compressors = []
665
666 if options.bitness == 32:
667 register_kinds = ('al', 'ax', 'eax', 'mm0', 'xmm0', 'ymm0')
668 register_kind_pairs = (
669 ( 'al', 'al'),
670 ( 'ax', 'al'),
671 ( 'ax', 'ax'),
672 ( 'eax', 'al'),
673 ( 'eax', 'ax'),
674 ( 'eax', 'eax'),
675 ( 'eax', 'mm0'),
676 ( 'mm0', 'eax'),
677 ( 'eax', 'xmm0'),
678 ('xmm0', 'eax'),
679 ( 'mm0', 'mm0'),
680 ( 'mm0', 'xmm0'),
681 ('xmm0', 'mm0'),
682 ('xmm0', 'xmm0'),
683 ('xmm0', 'ymm0'),
684 ('ymm0', 'xmm0'),
685 ('ymm0', 'ymm0')
686 )
687 else:
688 register_kinds = ('al', 'spl', 'ax', 'eax', 'mm0', 'xmm0', 'ymm0',
689 'r8b', 'r8w', 'r8d', 'r8', 'xmm8', 'ymm8')
690 register_kind_pairs = (
691 ( 'al', 'al'),
692 ( 'spl', 'spl'), ( 'spl', 'r8b'), ( 'r8b', 'spl'), ( 'r8b', 'r8b'),
693 ( 'ax', 'al'),
694 ( 'ax', 'spl'), ( 'ax', 'r8b'), ( 'r8w', 'spl'), ( 'r8w', 'r8b'),
695 ( 'ax', 'ax'), ( 'ax', 'r8w'), ( 'r8w', 'ax'), ( 'r8w', 'r8w'),
696 ( 'eax', 'al'),
697 ( 'eax', 'spl'), ( 'eax', 'r8b'), ( 'r8d', 'spl'), ( 'r8d', 'r8b'),
698 ( 'eax', 'ax'), ( 'eax', 'r8w'), ( 'r8d', 'ax'), ( 'r8d', 'r8w'),
699 ( 'eax', 'eax'), ( 'eax', 'r8d'), ( 'r8d', 'eax'), ( 'r8d', 'r8d'),
700 ( 'rax', 'al'),
701 ( 'rax', 'spl'), ( 'rax', 'r8b'), ( 'r8', 'spl'), ( 'r8', 'r8b'),
702 ( 'rax', 'rax'), ( 'rax', 'r8'), ( 'r8', 'rax'), ( 'r8', 'r8'),
703 ( 'eax', 'mm0'), ( 'r8d', 'mm0'),
704 ( 'rax', 'mm0'), ( 'r8', 'mm0'),
705 ( 'mm0', 'eax'), ( 'mm0', 'r8d'),
706 ( 'mm0', 'rax'), ( 'mm0', 'r8'),
707 ( 'eax', 'xmm0'), ( 'eax', 'xmm8'), ( 'r8d', 'xmm0'), ( 'r8d', 'xmm8'),
708 ( 'rax', 'xmm0'), ( 'rax', 'xmm8'), ( 'r8', 'xmm0'), ( 'r8', 'xmm8'),
709 ('xmm0', 'eax'), ('xmm0', 'r8d'), ('xmm8', 'eax'), ('xmm8', 'r8d'),
710 ('xmm0', 'rax'), ('xmm0', 'r8'), ('xmm8', 'rax'), ('xmm8', 'r8'),
711 ( 'mm0', 'mm0'),
712 ( 'mm0', 'xmm0'), ( 'mm0', 'xmm8'),
713 ('xmm0', 'mm0'), ('xmm8', 'mm0'),
714 ('xmm0', 'xmm0'), ('xmm0', 'xmm8'), ('xmm8', 'xmm0'), ('xmm8', 'xmm8'),
715 ('xmm0', 'ymm0'), ('xmm0', 'ymm8'), ('xmm8', 'ymm0'), ('xmm8', 'ymm8'),
716 ('ymm0', 'xmm0'), ('ymm0', 'xmm8'), ('ymm8', 'xmm0'), ('ymm8', 'xmm8'),
717 ('ymm0', 'ymm0'), ('ymm0', 'ymm8'), ('ymm8', 'ymm0'), ('ymm8', 'ymm8')
718 )
719
720 # Largest compressors: both reg and rm fields are used
721 for reg, rm in register_kind_pairs:
722 start_reg = REGISTERS[reg][0]
723 end_reg = REGISTERS[reg][-1 if reg[0:2] != 'r8' else -2]
724 start_rm = REGISTERS[rm][0]
725 end_rm = REGISTERS[rm][-1 if rm[0:2] != 'r8' else -2]
726 # First instruction uses just ModR/M byte in 32bit mode but both
727 # ModR/M in 64bit mode. Both approaches will work in both cases,
728 # this is just an optimization to avoid needless work.
729 if options.bitness == 32:
730 bytes = '({RM_BYTE})'
731 else:
732 bytes = '({RM_SIB_BYTES})'
733 for extra_bytes in ('', ' 00', ' 00 00', ' 00 00 00 00'):
734 # Normal instructions with two operands (reg to rm).
735 if options.bitness == 64 and rm in ('eax', 'r8d'):
736 # Zero-extending version first
737 AddModRM_Compressor(
738 '.*?' + bytes + extra_bytes +
739 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
740 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
741 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
742 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
743 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
744 '[%{}..%{}]'.format(start_rm, end_rm), ' # reg to rm'),
745 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]', ''),
746 reg=reg, rm=rm, rm_to_reg=False, output_rr=True)
747 # Zero-extending xchg/xadd
748 AddModRM_Compressor(
749 '.*?' + bytes + extra_bytes +
750 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
751 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
752 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
753 '[%{}..%{} or memory]'.format(start_rm, end_rm),
754 ' # write to both'),
755 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
756 '[%{}..%{}]'.format(start_rm, end_rm),
757 ' # reg to rm; write to both'),
758 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]',
759 ' # write to both'),
760 reg=reg, rm=rm, rm_to_reg=None, output_rr=True)
761 if options.bitness == 64 and rm in ('al', 'spl', 'ax', 'eax', 'rax',
762 'r8b', 'r8w', 'r8d', 'r8'):
763 # Dangerous next
764 AddModRM_Compressor(
765 '.*?' + bytes + extra_bytes +
766 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
767 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
768 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
769 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
770 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
771 '[%{}..%{}]'.format(start_rm, end_rm), ' # reg to rm'),
772 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]', ''),
773 reg=reg, rm=rm, rm_to_reg=False, output_rr=None)
774 # Dangerous xchg/xadd
775 AddModRM_Compressor(
776 '.*?' + bytes + extra_bytes +
777 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
778 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
779 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
780 '[%{}..%{} or memory]'.format(start_rm, end_rm),
781 ' # write to both'),
782 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
783 '[%{}..%{}]'.format(start_rm, end_rm),
784 ' # reg to rm; write to both'),
785 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]',
786 ' # write to both'),
787 reg=reg, rm=rm, rm_to_reg=None, output_rr=None)
788 # Now normal version
789 AddModRM_Compressor(
790 '.*?' + bytes + extra_bytes +
791 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
792 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
793 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
794 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
795 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
796 '[%{}..%{}]'.format(start_rm, end_rm), ' # reg to rm'),
797 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]', ''),
798 reg=reg, rm=rm, rm_to_reg=False)
799 # Normal instructions with two operands (rm to reg).
800 if options.bitness == 64 and reg in ('eax', 'r8d'):
801 # Zero-extending version first
802 AddModRM_Compressor(
803 '.*?' + bytes + extra_bytes +
804 ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?'
805 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
806 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
807 '[%{}..%{}]'.format(start_reg, end_reg), ''),
808 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
809 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg'),
810 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
811 reg=reg, rm=rm, rm_to_reg=True, output_rr=True)
812 if options.bitness == 64 and reg in ('al', 'spl', 'ax', 'eax', 'rax',
813 'r8b', 'r8w', 'r8d', 'r8'):
814 # Dangerous next
815 AddModRM_Compressor(
816 '.*?' + bytes + extra_bytes +
817 ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?'
818 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
819 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
820 '[%{}..%{}]'.format(start_reg, end_reg), ''),
821 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
822 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg'),
823 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
824 reg=reg, rm=rm, rm_to_reg=True, output_rr=None)
825 # Now normal version
826 AddModRM_Compressor(
827 '.*?' + bytes + extra_bytes +
828 ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?'
829 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
830 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
831 '[%{}..%{}]'.format(start_reg, end_reg), ''),
832 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
833 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg'),
834 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
835 reg=reg, rm=rm, rm_to_reg=True)
836 # 3DNow! instructions. Additional byte is opcode extension.
837 AddModRM_Compressor(
838 '.*?' + bytes + ' [0-9a-fA-F][0-9a-fA-F] \\w* '
839 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
840 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
841 '[%{}..%{}]'.format(start_reg, end_reg), ''),
842 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
843 '[%{}..%{}]'.format(start_reg, end_reg), ' # reg to rm'),
844 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
845 reg=reg, rm=rm, rm_to_reg=True)
846
847 # Smaller compressors: only rm field is used.
848 for rm in register_kinds:
849 start_rm = REGISTERS[rm][0]
850 end_rm = REGISTERS[rm][-1 if rm[0:2] != 'r8' else -2]
851 for opcode in range(8):
852 # The first instruction uses just the ModR/M byte in 32bit mode but both
853 # ModR/M and SIB bytes in 64bit mode. Both approaches work in both cases;
854 # this is just an optimization to avoid needless work.
855 if options.bitness == 32:
856 bytes = '({RM_BYTE/' + str(opcode) + '})'
857 else:
858 bytes = '({RM_SIB_BYTES/' + str(opcode) + '})'
859 if options.bitness == 64:
860 # No memory access (e.g. prefetch)
861 AddModRM_Compressor(
862 '.*?' + bytes + ' ?\\w* (?:\\$0x0,|%cl,)?({%' + rm + '}).*'
863 '{RR_NOTES}()',
864 ('XX/' + str(opcode),
865 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
866 ('XX/' + str(opcode), '[%{}..%{}]'.format(start_rm, end_rm), ''),
867 ('XX/' + str(opcode), '[memory]', ''),
868 reg=None, rm=rm, input_rr=False, start_byte=opcode*8)
869 for extra_bytes in ('', ' 00', ' 00 00', ' 00 00 00 00'):
870 # Part of opcode is encoded in ModR/M
871 AddModRM_Compressor(
872 '.*?' + bytes + extra_bytes +
873 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
874 '({%' + rm + '}).*{RR_NOTES}()',
875 ('XX/' + str(opcode),
876 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
877 ('XX/' + str(opcode), '[%{}..%{}]'.format(start_rm, end_rm), ''),
878 ('XX/' + str(opcode), '[memory]', ''),
879 reg=None, rm=rm, start_byte=opcode*8)
880
881 # Even smaller compressors: only low 3 bits of opcode are used.
882 for reg in register_kinds + ('st(0)',):
883 start_reg = REGISTERS[reg][0]
884 end_reg = REGISTERS[reg][-1 if reg[0:2] != 'r8' else -2]
885 for opcode in range(8):
886 for extra_bytes in ('', ' 00', ' 00 00', ' 00 00 00 00'):
887 # Operand is encoded in opcode
888 extra_compressors.append(Compressor(re.compile(
889 '.*?[0-9a-fA-F]([0-7])' + extra_bytes +
890 ' \\w* (?:\\$0x0,|%ax,|%st,)?'
891 '(%(?:' + '|'.join(REGISTERS_RE[reg]) + ')).*()'),
892 ('[0..7]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
893 [('0', '%' + REGISTERS[reg][0]),
894 ('1', '%' + REGISTERS[reg][1]),
895 ('2', '%' + REGISTERS[reg][2]),
896 ('3', '%' + REGISTERS[reg][3]),
897 ('4', '%' + REGISTERS[reg][4]),
898 ('5', '%' + REGISTERS[reg][5]),
899 ('6', '%' + REGISTERS[reg][6]),
900 ('7', '%' + REGISTERS[reg][7])]))
901 extra_compressors.append(Compressor(re.compile(
902 '.*?[0-9a-fA-F]([89a-fA-F])' + extra_bytes +
903 ' \\w* (?:\\$0x0,|%ax,|%st,)?'
904 '(%(?:' + '|'.join(REGISTERS_RE[reg]) + ')).*()'),
905 ('[8..f]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
906 [('8', '%' + REGISTERS[reg][0]),
907 ('9', '%' + REGISTERS[reg][1]),
908 ('a', '%' + REGISTERS[reg][2]),
909 ('b', '%' + REGISTERS[reg][3]),
910 ('c', '%' + REGISTERS[reg][4]),
911 ('d', '%' + REGISTERS[reg][5]),
912 ('e', '%' + REGISTERS[reg][6]),
913 ('f', '%' + REGISTERS[reg][7])]))
914 compressors = (main_compressors + memory_compressors + register_compressors +
915 extra_compressors)
916
917 # Special compressors: will handle some cosmetic issues.
918 #
919 # SETxx ignores the reg field and thus is described as many separate instructions
920 compressors.append(Compressor(
921 re.compile('.*0f 9[0-9a-fA-F] XX(/[0-7]) set.*()'), ('', ''),
922 [('/' + str(i),) for i in range(8)]))
923 # BSWAP is described with opcode "0f c8+r", not "0f /1" in manual
924 compressors.append(Compressor(
925 re.compile('.*0f (XX/1) bswap.*()'), ('c[9-f]', ''), [('XX/1',)]))
926 # "and $0xe0,[%eax..%edi]" is treated specially, which means that we list all
927 # versions of "and [$0x1..$0xff],[%eax..%edi]" separately here.
928 # Without this rule these ands comprise 2/3 of the whole output!
929 compressors.append(Compressor(
930 re.compile('.*(83 e0 01 and \\$0x1,%eax)()'),
931 ('83 XX/0 00 add[l]? $0x0,[%eax..%edi or memory]', ' # special and'),
932 [('83 e{} {:02x} and $0x{:x},%{}'.format(r, i, i, REGISTERS['eax'][r]),)
933 for i in range(1, 256) for r in range(8)] +
934 [('83 XX/0 00 add[l]? $0x0,[%eax..%edi or memory]',)]))
935 # Merge memory and non-memory access
936 for letter, reg in (('b', 'al'), ('w', 'ax'), ('l', 'eax')):
937 start_reg = REGISTERS[reg][0]
938 end_reg = REGISTERS[reg][-1 if reg[0:2] != 'r8' else -2]
939 for notes in ('', ' # rm to reg', ' # reg to rm'):
940 compressors.append(Compressor(re.compile(
941 '.* \\w*(' + letter + ') .*(\\[memory]).*()()'),
942 ('[{}]?'.format(letter),
943 '[%{}..%{} or memory]'.format(start_reg, end_reg), '', ''),
944 [(letter, '[memory]', ''),
945 ('', '[%{}..%{}]'.format(start_reg, end_reg), notes)]))
946
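The two eight-entry replacement lists in the "Even smaller compressors" loop pair the low hex digit of the opcode with the register it selects. A minimal sketch of how such a list could be generated rather than written out by hand is shown below. `EAX_KIND` is a hypothetical stand-in for one entry of the script's `REGISTERS` table (its order follows the standard x86 register encoding), and `opcode_register_pairs` is an illustrative helper, not a function from this patch.

```python
# Hypothetical stand-in for REGISTERS['eax']; the order is the standard
# x86 register encoding 0..7.
EAX_KIND = ['eax', 'ecx', 'edx', 'ebx', 'esp', 'ebp', 'esi', 'edi']


def opcode_register_pairs(registers, high_half=False):
  """Pairs the low hex digit of the opcode with the register it selects.

  With high_half=False the digits are 0..7, with high_half=True they are
  8..f, mirroring the '[0..7]' and '[8..f]' compressors in the listing.
  """
  base = 8 if high_half else 0
  return [('{:x}'.format(base + i), '%' + registers[i]) for i in range(8)]


low_pairs = opcode_register_pairs(EAX_KIND)
high_pairs = opcode_register_pairs(EAX_KIND, high_half=True)
```

Whether the explicit lists or a generated one reads better in the patch is a judgment call; the explicit form makes the digit-to-register mapping visible at a glance.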
947
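To see why the "special and" rule matters, it may help to count the lines it folds away. The sketch below reuses the format string from that compressor with a hypothetical copy of the `eax` register list; the single spaces follow the listing as displayed here, and the real disassembler output may use different column spacing.

```python
import re

# Hypothetical stand-in for REGISTERS['eax'] (standard x86 encoding order).
EAX_KIND = ['eax', 'ecx', 'edx', 'ebx', 'esp', 'ebp', 'esi', 'edi']


def special_and_lines():
  """Enumerates every 'and $imm8,%reg32' line the special rule replaces."""
  return ['83 e{} {:02x} and $0x{:x},%{}'.format(r, i, i, EAX_KIND[r])
          for i in range(1, 256) for r in range(8)]


lines = special_and_lines()
# 255 non-zero immediates times 8 registers.
line_count = len(lines)
# The compressor is keyed on the very first of these lines.
anchor = re.compile('.*(83 e0 01 and \\$0x1,%eax)()')
first_matches = anchor.match(lines[0]) is not None
```

Under these assumptions the rule collapses 2040 separate lines into the single already-compressed `add` line.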
948 def main():
949 # We are keeping these global to share state graph and compressors
950 # between workers spawned by multiprocessing. Passing them every time is slow.
951 global options, xml_file
952 global dfa
953 global worker_validator
954 options, xml_file = ParseOptions()
955 dfa = dfa_parser.ParseXml(xml_file)
956 worker_validator = validator.Validator(
957 validator_dll=options.validator_dll,
958 decoder_dll=options.decoder_dll)
959 PrepareCompressors()
960
961 assert dfa.initial_state.is_accepting
962 assert not dfa.initial_state.any_byte
963
964 print >> sys.stderr, len(dfa.states), 'states'
965
966 num_suffixes = dfa_traversal.GetNumSuffixes(dfa.initial_state)
967
968 # We can't just write 'num_suffixes[dfa.initial_state]' because
969 # initial state is accepting.
970 total_instructions = sum(
971 num_suffixes[t.to_state]
972 for t in dfa.initial_state.forward_transitions.values())
973 print >> sys.stderr, total_instructions, 'regular instructions total'
974
975 tasks = dfa_traversal.CreateTraversalTasks(dfa.states, dfa.initial_state)
976 print >> sys.stderr, len(tasks), 'tasks'
977
978 pool = multiprocessing.Pool(processes=1)
979
980 results = pool.imap(Worker, tasks)
981
982 total = 0
983 num_valid = 0
984 full_output = set()
985 for prefix, count, valid_count, output in results:
986 print >> sys.stderr, 'Prefix:', ', '.join(map(hex, prefix))
987 total += count
988 num_valid += valid_count
989 full_output |= output
990 for instruction in sorted(Compressed(full_output)):
991 print instruction
992
993 print >> sys.stderr, total, 'instructions were processed'
994 print >> sys.stderr, num_valid, 'valid instructions'
995
996
997 if __name__ == '__main__':
998 main()
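The comment in main() about not writing `num_suffixes[dfa.initial_state]` can be illustrated with a toy automaton. This sketch assumes that the suffix count of a state is the number of paths from it to the nearest accepting state, so an accepting state counts as 1; that is an assumption about `dfa_traversal.GetNumSuffixes`, which is not shown in this patch. `State` and `count_suffixes` are hypothetical, and transitions map directly to states here instead of going through a `to_state` attribute.

```python
class State(object):
  """Toy DFA state; forward_transitions maps a byte to the next State."""

  def __init__(self, is_accepting=False):
    self.is_accepting = is_accepting
    self.forward_transitions = {}


def count_suffixes(state, memo):
  """Counts paths from state that end at the first accepting state reached."""
  if state not in memo:
    if state.is_accepting:
      memo[state] = 1
    else:
      memo[state] = sum(count_suffixes(target, memo)
                        for target in state.forward_transitions.values())
  return memo[state]


# The initial state is accepting: every instruction ends back where it began.
initial = State(is_accepting=True)
two_byte = State()
initial.forward_transitions[0x90] = initial   # one-byte instruction
initial.forward_transitions[0x0f] = two_byte  # prefix of two-byte instructions
two_byte.forward_transitions[0x0b] = initial
two_byte.forward_transitions[0x05] = initial

memo = {}
# Looking up the initial state directly yields 1, not the instruction count,
# so the count is summed over its outgoing transitions instead.
total_instructions = sum(count_suffixes(target, memo)
                         for target in initial.forward_transitions.values())
```

In this toy automaton the sum is 3 (one one-byte and two two-byte instructions), while the memo entry for the initial state itself is 1.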