|
|
Created:
7 years, 1 month ago by khim Modified:
6 years, 10 months ago CC:
native-client-reviews_googlegroups.com Visibility:
Public. |
DescriptionRegular instructions golden file test.
compress_regular_instructions.py traverses the tree in a fashion
similar to other exhaustive tests and then "compresses" the list
of instructions found accepting by the validator.
Patch Set 1 : #
Total comments: 38
Patch Set 2 : #
Total comments: 32
Patch Set 3 : #
Total comments: 44
Patch Set 4 : #
Total comments: 8
Patch Set 5 : #
Total comments: 44
Patch Set 6 : #
Total comments: 14
Patch Set 7 : #
Total comments: 26
Patch Set 8 : #
Total comments: 10
Patch Set 9 : #Patch Set 10 : #
Total comments: 42
Patch Set 11 : #
Total comments: 6
Patch Set 12 : #
Total comments: 56
Patch Set 13 : #Patch Set 14 : #Patch Set 15 : #
Total comments: 8
Patch Set 16 : #Patch Set 17 : #
Messages
Total messages: 48 (0 generated)
PTAL 32bit version works, but 64bit version produces file too large to fit in rietvield (cmpxchg/xadd/xchg are not compressed which adds 10M to 500K file)
preliminary feedback. i am not familiar with this part of the code base, and am finding it a bit of a slog. in general, comments about pre/post conditions as well as higher level design would be appreciated. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:85: import lockfile unused imports. style nits. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:102: # Register names in 'natual' order (as defined by IA32/x86-64 ABI) are these equivalence classes? used for... compression? why are the al and spl classes separated? what is the meaning of the ordering in the value arrays? https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:160: for bit, initial_rr in enumerate(validator.ALL_REGISTERS + [None]): bit is not used, since bit_position = 1 << bit is invariant maintained (comment?), so why enumerate and have it here? that validator.ALL_REGISTERS + [None] is ordered to match valid_inputs is somewhat non-obvious. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:176: output_rr = 'output_rr=%' + REGISTERS['eax'][known_final_rr] why not a single 16 element array? https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:178: output_rr = 'output_rr=%' + REGISTERS['r8d'][known_final_rr - 8] Since 2nd half of returned tuple from ValidateAndGetFinalRestrictedRegister is an enum OperandName, and the enum goes up to 27 (with NO_REG=25 mapped to None), is there an invariant / post-condition that guarantees that 15 is the highest possible returned integer value? https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:215: # subst = ('[0-7]', '[%eax..%esi]', ' # register in opcode') 7 is %edi, so this example is confusing. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:234: # "compressed instruction" '4[0-7] inc [%eax..%esi] # register in opcode'. why isn't this [%eax..%edi]?
Answered questions in comments, will fix code tomorrow https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:102: # Register names in 'natual' order (as defined by IA32/x86-64 ABI) On 2013/10/31 22:51:54, bsy wrote: > are these equivalence classes? used for... compression? > These are just names of registers. They are used... for different purposes. There are MANY types of registers in x86 and we need mapping between register numbers and register names in a few places (mostly for compression, but also for some other things). > why are the al and spl classes separated? This probably warrants a comment. AMD introduced strange thing: it added four "new" 8bit registers (lowest byte of %rsp, %rbp, %rsi and %rdi). To access them you need to use REX byte (the mere fact that REX byte is used turns %ah into %spl, etc). > what is the meaning of the ordering in the value arrays? They are ordered in the same order they are listed in documentation and elsewhere. E.g. %ah is register number 5 and so is %rsp. As you can see modern registers have simple mappings (%r8 is 8th register and %ymm12 is 12th register), but old registers have more-or-less arbitrary names and are ordered in some more-or-less arbitrary fashion. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:176: output_rr = 'output_rr=%' + REGISTERS['eax'][known_final_rr] On 2013/10/31 22:51:54, bsy wrote: > why not a single 16 element array? Because this information will eventually go into the textual output. Not only single 16 element array is longer (and we are trying to create small file here, right?), it's also harder to read for human (and we want an output in human-readable form, right?). https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:178: output_rr = 'output_rr=%' + REGISTERS['r8d'][known_final_rr - 8] On 2013/10/31 22:51:54, bsy wrote: > Since 2nd half of returned tuple from ValidateAndGetFinalRestrictedRegister is > an enum OperandName, and the enum goes up to 27 (with NO_REG=25 mapped to None), > is there an invariant / post-condition that guarantees that 15 is the highest > possible returned integer value? Uhm... You are looking on said post-condition, right? If function will return something above %r15d or below %eax then we'll have IndexError here and test will fail (as it should). Too subtle? Needs comment? https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:215: # subst = ('[0-7]', '[%eax..%esi]', ' # register in opcode') On 2013/10/31 22:51:54, bsy wrote: > 7 is %edi, so this example is confusing. It's a typo :-( Sorry for confusion. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:234: # "compressed instruction" '4[0-7] inc [%eax..%esi] # register in opcode'. On 2013/10/31 22:51:54, bsy wrote: > why isn't this [%eax..%edi]? Copy-paste of a typo :-( Will fix that.
https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:14: Instruction: 00 00 %al,(%rax) add %al,(%rax) or may be even add %al,(%eax). https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:29: Only applies if all possibile memory accesses are accepted by validator. possible https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:63: Instruction: 66 0f 71 XX/2 00 psrlw $0x0,[%xmm0..%xmm7] Shouldn't it be %mm0..%mm7? https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:79: Instruction: 0f 90 XX/1 seto [%al..%bh or memory] I don't understand what is the difference between /0 and /1. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:167: assert known_final_rr is False or known_final_rr == final_rr We can use valid_inputs == 0 instead of known_final_rr is False (we will need to move the assert one line up). This way we don't need to change its type at runtime and we can initialize it with None. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:173: if known_final_rr is None: Possible improvement: we can extract converting restricted register to a string to a function. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:222: # Then it will produce the following eigth instructions: eight
https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:214: # regex = '.*?[0-9a-fA-F]([0-7]) \\w* (%e(?:[abcd]x|[sb]p|[sd]i)).*()' ".*?" I don't understand to what expression "?" applies to. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:217: # (4, '%esp'), (5, '%ebp'), (6, '%esi'), (7, '%edi') Add square brackets: replacements = [(0, '%eax'),.... (7, '%edi')] https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:257: format = '' Rename to format_str to avoid confusion with the format method.
https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:352: True - instruction uses rm as source, reg as destination "rm_to_reg", "reg_to_rm", "xchg". https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:358: True - instruction can be used to produce "restricted register" "sandboxing writes", "no GP register writes", "nonsandboxing writes"
https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:14: Instruction: 00 00 %al,(%rax) On 2013/11/01 08:17:49, halyavin wrote: > add %al,(%rax) or may be even add %al,(%eax). Done. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:29: Only applies if all possibile memory accesses are accepted by validator. On 2013/11/01 08:17:49, halyavin wrote: > possible Done. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:63: Instruction: 66 0f 71 XX/2 00 psrlw $0x0,[%xmm0..%xmm7] On 2013/11/01 08:17:49, halyavin wrote: > Shouldn't it be %mm0..%mm7? Done. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:79: Instruction: 0f 90 XX/1 seto [%al..%bh or memory] On 2013/11/01 08:17:49, halyavin wrote: > I don't understand what is the difference between /0 and /1. There are no difference. "set" ignores "reg" field which means that XX/0, XX/1, ... XX/7 are all the same. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:85: import lockfile On 2013/10/31 22:51:54, bsy wrote: > unused imports. style nits. Done. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:160: for bit, initial_rr in enumerate(validator.ALL_REGISTERS + [None]): On 2013/10/31 22:51:54, bsy wrote: > bit is not used, since bit_position = 1 << bit is invariant maintained > (comment?), so why enumerate and have it here? > Because I forgot to remove "bit" when I've switched to bit_position Fixed. > that validator.ALL_REGISTERS + [None] is ordered to match valid_inputs is > somewhat non-obvious. Added comment. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:167: assert known_final_rr is False or known_final_rr == final_rr On 2013/11/01 08:17:49, halyavin wrote: > We can use valid_inputs == 0 instead of known_final_rr is False (we will need to > move the assert one line up). This way we don't need to change its type at > runtime and we can initialize it with None. Done. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:173: if known_final_rr is None: On 2013/11/01 08:17:49, halyavin wrote: > Possible improvement: we can extract converting restricted register to a string > to a function. I'm not sure two-line function used just once will improve readability. I've merged two arrays (and removed %r15d from the sum) to catch a case where final_rr is %r15d. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:214: # regex = '.*?[0-9a-fA-F]([0-7]) \\w* (%e(?:[abcd]x|[sb]p|[sd]i)).*()' On 2013/11/01 08:34:29, halyavin wrote: > ".*?" I don't understand to what expression "?" applies to. *? is non-greedy *, look it up in python's manual https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:217: # (4, '%esp'), (5, '%ebp'), (6, '%esi'), (7, '%edi') On 2013/11/01 08:34:29, halyavin wrote: > Add square brackets: replacements = [(0, '%eax'),.... (7, '%edi')] It's a tuple, why would I need square brackets here? https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:222: # Then it will produce the following eigth instructions: On 2013/11/01 08:17:49, halyavin wrote: > eight Done. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:257: format = '' On 2013/11/01 08:34:29, halyavin wrote: > Rename to format_str to avoid confusion with the format method. Done, although I'm not sure how can there be any confusion: format is a method, not global function!
https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:11: The following compression rules are present: Rules are applied only when all variants are accepted by validator. The following ... https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:31: 1b. Compress ModR/M (+SIB & displacement) register only There is no SIB & displacement for registers. https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:58: 2b. Compress ModR/M (+SIB & displacement) register-only with opcode extension. There is no SIB & displacement for registers. https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:123: 'mm8': [ 'mm{}'.format(N) for N in range(8) ], 'mm_alt' https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:125: 'st(8)': [ 'st({})'.format(N) for N in range(8) ], 'st_alt(0)' https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:142: # tie two groups together. Some instruction require particular value to Some instructions https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:149: # We try to feed all possible 'restricted register' into validator and then restricted registers
https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:188: # Note that iteration order is aligned with CCEPTABLE_X86_64_INPUTS array ACCEPTABLE_X86_64_INPUTS https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:194: # final_rr should not depent on input_rr depend https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:234: dis[0] = re.sub(' # 0x[ 0-9a-fA-F]*', '', dis[0]) 0x[ ]*[0-9a-fA-F]* https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:235: # Absolute addressing is allowed in 32-bit mode and completely forbidden Zero displacements are represented as 0x0 for all instructions except jumps where they disassembled as non-zero due to %eip/%rip-relative addressing. We replace this displacement with %eip/%rip to increase compression. https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:295: def Compressed(instructions, show_progress=None): We better to create method that collects trace. https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:296: split = None Use empty string for initial value. https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:313: format_str += instruction[pos:match.start(len(match.groups()))] Extract these 5 lines to a function. https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:389: REGISTERS_RE['st(8)'] = REGISTERS_RE['st(0)'] st(8)->st_alt https://chromiumcodereview.appspot.com/49183002/diff/330001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:1337: replacements.append(format_str.format(*replacement)) Replace with for expression.
https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:352: True - instruction uses rm as source, reg as destination On 2013/11/01 14:08:57, halyavin wrote: > "rm_to_reg", "reg_to_rm", "xchg". Done. https://codereview.chromium.org/49183002/diff/80001/src/trusted/validator_rag... src/trusted/validator_ragel/compress_regular_instructions.py:358: True - instruction can be used to produce "restricted register" On 2013/11/01 14:08:57, halyavin wrote: > "sandboxing writes", "no GP register writes", "nonsandboxing writes" Done.
https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:11: The following compression rules are present: On 2013/11/06 12:11:51, halyavin wrote: > Rules are applied only when all variants are accepted by validator. The > following ... Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:31: 1b. Compress ModR/M (+SIB & displacement) register only On 2013/11/06 12:11:51, halyavin wrote: > There is no SIB & displacement for registers. Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:58: 2b. Compress ModR/M (+SIB & displacement) register-only with opcode extension. On 2013/11/06 12:11:51, halyavin wrote: > There is no SIB & displacement for registers. Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:123: 'mm8': [ 'mm{}'.format(N) for N in range(8) ], On 2013/11/06 12:11:51, halyavin wrote: > 'mm_alt' Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:125: 'st(8)': [ 'st({})'.format(N) for N in range(8) ], On 2013/11/06 12:11:51, halyavin wrote: > 'st_alt(0)' Removed instead. If we no longer have predictable names for all the second groups then st(8) is just not needed. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:142: # tie two groups together. Some instruction require particular value to On 2013/11/06 12:11:51, halyavin wrote: > Some instructions Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:149: # We try to feed all possible 'restricted register' into validator and then On 2013/11/06 12:11:51, halyavin wrote: > restricted registers Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:188: # Note that iteration order is aligned with CCEPTABLE_X86_64_INPUTS array On 2013/11/06 14:23:24, halyavin wrote: > ACCEPTABLE_X86_64_INPUTS Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:194: # final_rr should not depent on input_rr On 2013/11/06 14:23:24, halyavin wrote: > depend Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:234: dis[0] = re.sub(' # 0x[ 0-9a-fA-F]*', '', dis[0]) On 2013/11/06 14:23:24, halyavin wrote: > 0x[ ]*[0-9a-fA-F]* Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:235: # Absolute addressing is allowed in 32-bit mode and completely forbidden On 2013/11/06 14:23:24, halyavin wrote: > Zero displacements are represented as 0x0 for all instructions except jumps > where they disassembled as non-zero due to %eip/%rip-relative addressing. We > replace this displacement with %eip/%rip to increase compression. Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:295: def Compressed(instructions, show_progress=None): On 2013/11/06 14:23:24, halyavin wrote: > We better to create method that collects trace. Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:296: split = None On 2013/11/06 14:23:24, halyavin wrote: > Use empty string for initial value. Not sure what's wrong with None (we don't yet HAVE a split, right?), but whatever. Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:313: format_str += instruction[pos:match.start(len(match.groups()))] On 2013/11/06 14:23:24, halyavin wrote: > Extract these 5 lines to a function. Done. https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:389: REGISTERS_RE['st(8)'] = REGISTERS_RE['st(0)'] On 2013/11/06 14:23:24, halyavin wrote: > st(8)->st_alt Removed https://codereview.chromium.org/49183002/diff/330001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:1337: replacements.append(format_str.format(*replacement)) On 2013/11/06 14:23:24, halyavin wrote: > Replace with for expression. Done.
https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:179: for initial_rr in validator.ALL_REGISTERS + [None]: I think that using index and "valid_inputs |= 1 << bit" is better than bit_position. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:196: return [ACCEPTABLE_X86_64_INPUTS[valid_inputs], we wanted to use true/false + array. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:234: '0x' + str(bytes) + '(.*)', it should be hex(bytes) + '(.*)', (note that hex even prints '0x' for us) https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:289: def CompressionTemplate(instruction, match, mark): """ Replace all match groups with the mark. """ https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:298: def CompressOneMatch(instructions, compressors, compressors parameter is not used. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:304: if not replacement_str in instructions: break Use return instead of break. Then you can remove else from for cycle. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:324: except KeyError: Use simple conditional logic instead of exceptions. The reader will not know that CompressOneMatch never throws KeyError. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:334: else: You don't need else, because you return from the function in the for cycle.
https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:412: '%rip' I don't understand the space-alignment algorithm in these two variables. INDEXES has 7 symbols between commas, X86_64_BASE_REGISTERS has variable number of symbols between commas. Additionally, commas after '%r15w' and '%r15' are not aligned. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:416: reg=None, rm=None, writes_to='rm', start_byte=0, Bad indentation. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:425: regex: regular expressions for the compressor regular expression https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:435: start_byte: first valid byte ModR/M byte (used when reg is None) "first valid ModR/M byte". Is there multiple ModR/M bytes or invalid ModR/M bytes? May be "index of the ModR/M byte"? https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:443: index_r8: must be called in False position (used to create two compressors must not be set by external users https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:443: index_r8: must be called in False position (used to create two compressors If index_r8 is internal, we should make it the last parameter.
https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:448: # Expand RR_NOTES section in regex. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:454: base = 'r8' if '8' in rm or rm == 'mmalt' else 'rax' Strange construction. Should base be set to 'r8' if rm equals to 'xmm8'? I would rather see "rm in [list with all variants]" instead. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:460: expanded_regex = re.sub('{RR_NOTES}', '; input_rr=((?:%{'+ input + It is hard to read the replacement string with this indentation. We should put input_rr and output_rr on separate lines instead. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:466: address_regex = '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?\\))' Are "()" and "0x0()" valid addresses? I think we don't need "?" for the base register. We do need it in the else branch though. https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:469: '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?(?:,(?:%{' + index + '}))?' I would split the line into 5 to make it easier to see different groups: '(?:0x0|(?:0x0)?\\(' + '(?:%{' + base + '})?' + '(?:,(?:%{' + index + '}))?' + '(?:,(?:1|2|4|8))?' + '\\))'
https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:429: reg: reg operand kind (see REGISTERS array) or None What None means? Does it mean that reg operand is absent? https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:430: rm: rm operand kind (see REGISTERS array) Can rm equal to None too? https://chromiumcodereview.appspot.com/49183002/diff/450001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:474: # If both modrm and reg are given then ModR/M then ModR/M what? Is this comment related to assert below?
https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:179: for initial_rr in validator.ALL_REGISTERS + [None]: On 2013/11/07 10:54:38, halyavin wrote: > I think that using index and "valid_inputs |= 1 << bit" is better than > bit_position. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:196: return [ACCEPTABLE_X86_64_INPUTS[valid_inputs], On 2013/11/07 10:54:38, halyavin wrote: > we wanted to use true/false + array. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:234: '0x' + str(bytes) + '(.*)', On 2013/11/07 10:54:38, halyavin wrote: > it should be hex(bytes) + '(.*)', (note that hex even prints '0x' for us) Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:289: def CompressionTemplate(instruction, match, mark): On 2013/11/07 10:54:38, halyavin wrote: > """ Replace all match groups with the mark. """ Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:298: def CompressOneMatch(instructions, compressors, On 2013/11/07 10:54:38, halyavin wrote: > compressors parameter is not used. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:304: if not replacement_str in instructions: break On 2013/11/07 10:54:38, halyavin wrote: > Use return instead of break. Then you can remove else from for cycle. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:324: except KeyError: On 2013/11/07 10:54:38, halyavin wrote: > Use simple conditional logic instead of exceptions. The reader will not know > that CompressOneMatch never throws KeyError. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:334: else: On 2013/11/07 10:54:38, halyavin wrote: > You don't need else, because you return from the function in the for cycle. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:412: '%rip' On 2013/11/07 11:08:49, halyavin wrote: > I don't understand the space-alignment algorithm in these two variables. INDEXES > has 7 symbols between commas, X86_64_BASE_REGISTERS has variable number of > symbols between commas. Additionally, commas after '%r15w' and '%r15' are not > aligned. The idea was to make sure different kinds of the same register are placed under each other. Fixed. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:416: reg=None, rm=None, writes_to='rm', start_byte=0, On 2013/11/07 11:08:49, halyavin wrote: > Bad indentation. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:425: regex: regular expressions for the compressor On 2013/11/07 11:08:49, halyavin wrote: > regular expression Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:429: reg: reg operand kind (see REGISTERS array) or None On 2013/11/07 12:07:28, halyavin wrote: > What None means? Does it mean that reg operand is absent? Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:430: rm: rm operand kind (see REGISTERS array) On 2013/11/07 12:07:28, halyavin wrote: > Can rm equal to None too? No, it can not be None. Fixed. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:435: start_byte: first valid byte ModR/M byte (used when reg is None) On 2013/11/07 11:08:49, halyavin wrote: > "first valid ModR/M byte". Is there multiple ModR/M bytes or invalid ModR/M > bytes? May be "index of the ModR/M byte"? Replaced with opcode_bits https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:443: index_r8: must be called in False position (used to create two compressors On 2013/11/07 11:08:49, halyavin wrote: > must not be set by external users Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:443: index_r8: must be called in False position (used to create two compressors On 2013/11/07 11:08:49, halyavin wrote: > must not be set by external users Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:448: On 2013/11/07 11:59:38, halyavin wrote: > # Expand RR_NOTES section in regex. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:454: base = 'r8' if '8' in rm or rm == 'mmalt' else 'rax' On 2013/11/07 11:59:38, halyavin wrote: > Strange construction. Should base be set to 'r8' if rm equals to 'xmm8'? I would > rather see "rm in [list with all variants]" instead. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:460: expanded_regex = re.sub('{RR_NOTES}', '; input_rr=((?:%{'+ input + On 2013/11/07 11:59:38, halyavin wrote: > It is hard to read the replacement string with this indentation. We should put > input_rr and output_rr on separate lines instead. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:466: address_regex = '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?\\))' On 2013/11/07 11:59:38, halyavin wrote: > Are "()" and "0x0()" valid addresses? I think we don't need "?" for the base > register. We do need it in the else branch though. Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:469: '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?(?:,(?:%{' + index + '}))?' On 2013/11/07 11:59:38, halyavin wrote: > I would split the line into 5 to make it easier to see different groups: > '(?:0x0|(?:0x0)?\\(' + > '(?:%{' + base + '})?' + > '(?:,(?:%{' + index + '}))?' + > '(?:,(?:1|2|4|8))?' + > '\\))' Done. https://codereview.chromium.org/49183002/diff/450001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:474: # If both modrm and reg are given then ModR/M On 2013/11/07 12:07:28, halyavin wrote: > then ModR/M what? Is this comment related to assert below? Done.
https://chromiumcodereview.appspot.com/49183002/diff/520001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/520001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:419: memory_compressors (regsiter <-> memory instructions) regsiter->register https://chromiumcodereview.appspot.com/49183002/diff/520001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:524: '(?:%' + '|%'.join(value) + '|' + address_regex +')', expanded_regex) It is better to have context-independent language. https://chromiumcodereview.appspot.com/49183002/diff/520001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:524: '(?:%' + '|%'.join(value) + '|' + address_regex +')', expanded_regex) Create a single array and join it via '|'.join(...). https://chromiumcodereview.appspot.com/49183002/diff/520001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:531: '(?:' + '|'.join(value) + ')', register_regex) It makes sense to extract this template language into separate function.
https://codereview.chromium.org/49183002/diff/520001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/520001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:419: memory_compressors (regsiter <-> memory instructions) On 2013/11/08 15:35:19, halyavin wrote: > regsiter->register This comment is removed https://codereview.chromium.org/49183002/diff/520001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:524: '(?:%' + '|%'.join(value) + '|' + address_regex +')', expanded_regex) On 2013/11/08 15:35:19, halyavin wrote: > It is better to have context-independent language. Not sure why, but this language is removed, we are now processing only one instruction variant which simplifies logic and makes test 30% faster. https://codereview.chromium.org/49183002/diff/520001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:524: '(?:%' + '|%'.join(value) + '|' + address_regex +')', expanded_regex) On 2013/11/08 15:35:19, halyavin wrote: > Create a single array and join it via '|'.join(...). This logic is also removed https://codereview.chromium.org/49183002/diff/520001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:531: '(?:' + '|'.join(value) + ')', register_regex) On 2013/11/08 15:35:19, halyavin wrote: > It makes sense to extract this template language into separate function. Language is removed: we accept all variants here now but since a single comparison later will reject them difference in speed is negligible.
https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:425: memory_accessed: True if instruction accesses memory Not used. Default to True makes it even more misleading. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:441: # Two upper bits of ModR/M byte (mod field) must be equal to 11. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:446: if byte & 0x38 == opcode_bits << 3] if (byte >> 3) & 0x7 == opcode_bits https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:452: mod_field = (modrm & 0xc0) >> 6 (modrm >> 6) & 0x3 https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:453: reg_field = (modrm & 0x38) >> 3 (modrm >> 3) & 0x7 https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:457: bytes = '{:02x}'.format(modrm) bytes->byte_text https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:459: replacement = [bytes] It is not clear to me which variables can be equal to None in code below. I think some asserts will help to clear this up. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:463: replacement.append(rm_text if writes_to == 'reg' else reg_text) Can any on these 2 fields equal to None? https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:467: output = reg_text if writes_to == 'reg' else rm_text Can output equal to None? https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:468: replacement.append(output if register_write == 'sandbox' else None) Why we are pushing None here? I thought that replacement is an array of strings without None values since they will lead to an exception during replacement. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:469: if register_write == 'protect' and output in X86_64_BASE_REGISTERS: # Do not generate replacements that will violate sandboxing rules. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:473: if writes_to == 'both' and reg_text in X86_64_BASE_REGISTERS: Can reg_text equal to None here? https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:476: replacements.append(replacement) We can join these 2 lines to avoid changing type of replacement variable. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:485: """Creates replacement tuples list for register-to-register instructions Copy paste error: "register-to-register" https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:515: base = 'r8' if '8' in rm or rm == 'mmalt' else 'rax' We wanted to replace it with the access to dictionary. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:520: # Two upper bits in ModR/M byte (mod field) must be equal to 00, 01 or 10. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:539: # If RM field == %rbp and MOD fiels is zero then it's absolute address fiels->field https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:584: scale_text = pow(2, scale_field) Use shift instead and apply str(). https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:588: if (options.bitness == 32 or Reverting this if will reduce amount of negative logic here. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:590: scale_field != 0 or index[0:2] == 'r8'): Can index be equal to r8d/r8w/r8b here? https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:613: if index_text == "%riz": Use singular quotes. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:614: index_text = 'any_nonspecial' Why we need this replacement?
https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:425: memory_accessed: True if instruction accesses memory On 2013/11/12 11:44:06, halyavin wrote: > Not used. Default to True makes it even more misleading. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:441: On 2013/11/12 11:44:06, halyavin wrote: > # Two upper bits of ModR/M byte (mod field) must be equal to 11. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:446: if byte & 0x38 == opcode_bits << 3] On 2013/11/12 11:44:06, halyavin wrote: > if (byte >> 3) & 0x7 == opcode_bits Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:452: mod_field = (modrm & 0xc0) >> 6 On 2013/11/12 11:44:06, halyavin wrote: > (modrm >> 6) & 0x3 I don't really like it: reight now it's easy to see that we've grabbed all the bits from ModR/M because 0xc0 | 0x38 | 0x7 == 0xff. But whatever, I'll not fight for that. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:453: reg_field = (modrm & 0x38) >> 3 On 2013/11/12 11:44:06, halyavin wrote: > (modrm >> 3) & 0x7 Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:457: bytes = '{:02x}'.format(modrm) On 2013/11/12 11:44:06, halyavin wrote: > bytes->byte_text Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:459: replacement = [bytes] On 2013/11/12 11:44:06, halyavin wrote: > It is not clear to me which variables can be equal to None in code below. I > think some asserts will help to clear this up. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:463: replacement.append(rm_text if writes_to == 'reg' else reg_text) On 2013/11/12 11:44:06, halyavin wrote: > Can any on these 2 fields equal to None? Hmm... Reg can be None - that's why we are checking for that case two lines above! Rm can not be none (there are actually couple of such instructions but this will just complicate logic immensely for little benefit thus we don't handle this case here). https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:467: output = reg_text if writes_to == 'reg' else rm_text On 2013/11/12 11:44:06, halyavin wrote: > Can output equal to None? Sure. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:468: replacement.append(output if register_write == 'sandbox' else None) On 2013/11/12 11:44:06, halyavin wrote: > Why we are pushing None here? I thought that replacement is an array of strings > without None values since they will lead to an exception during replacement. Replacement uses .format(...) which means None becomes 'None' https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:469: if register_write == 'protect' and output in X86_64_BASE_REGISTERS: On 2013/11/12 11:44:06, halyavin wrote: > # Do not generate replacements that will violate sandboxing rules. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:473: if writes_to == 'both' and reg_text in X86_64_BASE_REGISTERS: On 2013/11/12 11:44:06, halyavin wrote: > Can reg_text equal to None here? Makes no sense, really but should be handled correctly. Added assert to clarify the situation. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:476: replacements.append(replacement) On 2013/11/12 11:44:06, halyavin wrote: > We can join these 2 lines to avoid changing type of replacement variable. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:485: """Creates replacement tuples list for register-to-register instructions On 2013/11/12 11:44:06, halyavin wrote: > Copy paste error: "register-to-register" Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:515: base = 'r8' if '8' in rm or rm == 'mmalt' else 'rax' On 2013/11/12 11:44:06, halyavin wrote: > We wanted to replace it with the access to dictionary. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:520: On 2013/11/12 11:44:06, halyavin wrote: > # Two upper bits in ModR/M byte (mod field) must be equal to 00, 01 or 10. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:539: # If RM field == %rbp and MOD fiels is zero then it's absolute address On 2013/11/12 11:44:06, halyavin wrote: > fiels->field Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:584: scale_text = pow(2, scale_field) On 2013/11/12 11:44:06, halyavin wrote: > Use shift instead and apply str(). str() is not really needed, but to make sure "..._text" actually text... It's not time-critical part of the code thus probably Ok https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:588: if (options.bitness == 32 or On 2013/11/12 11:44:06, halyavin wrote: > Reverting this if will reduce amount of negative logic here. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:590: scale_field != 0 or index[0:2] == 'r8'): On 2013/11/12 11:44:06, halyavin wrote: > Can index be equal to r8d/r8w/r8b here? Apparently not. We don't support this form of lea and all other instructions are forbidden because of security rules. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:613: if index_text == "%riz": On 2013/11/12 11:44:06, halyavin wrote: > Use singular quotes. Done. https://codereview.chromium.org/49183002/diff/720001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:614: index_text = 'any_nonspecial' On 2013/11/12 11:44:06, halyavin wrote: > Why we need this replacement? Because i've copy-pasted check from another place and it expected to see 'any_nonspecial' there. Fixed.
https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:448: # end make rejection faster (%r15 is equal to 0x7 and %rbp is 0x5). make -> to make https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:467: if reg is None: We can make code longer but easier to understand with respect to None/not None cases. if reg is None: assert writes_to == 'rm' input, output = None, rm_text replacement.append(output) else: reg_text = '%' + REGISTERS[reg][reg_field] if writes_to == 'reg': input, output = rm_text, reg_text else: # rm, both input, output = reg_text, rm_text replacement.extend([input, output]) Then we don't need reg_text and rm_text anymore. if writes_to == 'both' and reg_text in X86_64_BASE_REGISTERS can be replaced with if writes_to == 'both' and input in X86_64_BASE_REGISTERS We can then extract the if to input, output = AppendOperandsReplacement( replacement, rm_text, reg, reg_field, writes_to) input -> index_32 in ModRMMemoryReplacements https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:540: if byte & 0x38 == opcode_bits << 3] Change as in previous function. https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:556: if mod_field == 0 and rm_field == validator.REG_RBP: Extract into function: bytes_text, rm_text, base_text = BaseOnlyMemoryOperand( mod_field, base, rm_field) https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:603: if mod_field == 0 and base_field == validator.REG_RBP: Extract into function: bytes_text, rm_text, base_text = SIBMemoryOperand( mod_field, base, base_field, index_text, scale_text) https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:609: index_field == validator.REG_RSP and can we use index_text in ['%esp', '%rsp'] here? https://chromiumcodereview.appspot.com/49183002/diff/890001/src/trusted/valid... src/trusted/validator_ragel/compress_regular_instructions.py:629: base_field == validator.REG_RSP and can we use index_text in ['%esp', '%rsp'] here?
https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:448: # end make rejection faster (%r15 is equal to 0x7 and %rbp is 0x5). On 2013/11/14 07:07:50, halyavin wrote: > make -> to make Done. https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:467: if reg is None: On 2013/11/14 07:07:50, halyavin wrote: > We can make code longer but easier to understand with respect to None/not None > cases. > > if reg is None: > assert writes_to == 'rm' > input, output = None, rm_text > replacement.append(output) > else: > reg_text = '%' + REGISTERS[reg][reg_field] > if writes_to == 'reg': > input, output = rm_text, reg_text > else: # rm, both > input, output = reg_text, rm_text > replacement.extend([input, output]) > Then we don't need reg_text and rm_text anymore. > if writes_to == 'both' and reg_text in X86_64_BASE_REGISTERS > can be replaced with > if writes_to == 'both' and input in X86_64_BASE_REGISTERS > > We can then extract the if to > input, output = AppendOperandsReplacement( > replacement, rm_text, reg, reg_field, writes_to) > > input -> index_32 in ModRMMemoryReplacements Done. https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:540: if byte & 0x38 == opcode_bits << 3] On 2013/11/14 07:07:50, halyavin wrote: > Change as in previous function. Done. https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:556: if mod_field == 0 and rm_field == validator.REG_RBP: On 2013/11/14 07:07:50, halyavin wrote: > Extract into function: > bytes_text, rm_text, base_text = BaseOnlyMemoryOperand( > mod_field, base, rm_field) Done. https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:603: if mod_field == 0 and base_field == validator.REG_RBP: On 2013/11/14 07:07:50, halyavin wrote: > Extract into function: > bytes_text, rm_text, base_text = SIBMemoryOperand( > mod_field, base, base_field, index_text, scale_text) Done. https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:609: index_field == validator.REG_RSP and On 2013/11/14 07:07:50, halyavin wrote: > can we use index_text in ['%esp', '%rsp'] here? Yes, we can. Not %rsp, of course, but %riz. https://codereview.chromium.org/49183002/diff/890001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:629: base_field == validator.REG_RSP and On 2013/11/14 07:07:50, halyavin wrote: > can we use index_text in ['%esp', '%rsp'] here? Yes, we can do that, too.
https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:415: base_text='%r15', index_text='%r15'): Can we use %riz as default value instead? https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:436: if index_text in X86_64_BASE_REGISTERS - set(['%r15']): Why index_text can be %r15? Is (%r15,%r15,) valid? If you need neutral value, use %riz. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:442: if (writes_to == 'both' and input in X86_64_BASE_REGISTERS): We don't need brackets here. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:454: reg_field: value of reg field in ModR/M byte (is only used if reg != None) You changed the argument from reg_field to modrm. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:468: reg_field = (modrm >> 3) & 0x07 Move this line to else clause. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:562: base_text = '%rip' Use %riz or %eiz in 32-bit case. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:589: index_text: textual representation of "base" register base->index https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:606: scale_field == 0): scale_text == '1'. It will make easier to understand that this form is equivalent to one in else branch. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:611: base_text = '' Use %riz/%eiz. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:627: scale_field == 0): scale_text == '1' https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:628: rm_text = ('0x0({})' if mod_field else '({})').format(base_text) if mod_field in (1, 2): # 8-bit or 32-bit offset rm_text = '0x0({})'.format(base_text) else: # mod_field == 0, no offset. rm_text = '({})'.format(base_text) https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:717: # Convert %r8 to %r8, %r9 to %r9d, etc Convert %r8 to %r8d https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:723: # instriuction which explicitly write to reg operand can do that instriuction->instruction
https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:415: base_text='%r15', index_text='%r15'): On 2013/11/18 09:50:47, halyavin wrote: > Can we use %riz as default value instead? Hmm... Yes, I just wanted to use "always valid" register as default, but it's not important for base. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:436: if index_text in X86_64_BASE_REGISTERS - set(['%r15']): On 2013/11/18 09:50:47, halyavin wrote: > Why index_text can be %r15? Is (%r15,%r15,) valid? If you need neutral value, > use %riz. That's two separate questions :-) Yes, I need a neutral value for index and I wanted to use something valid for base since I initially had memory_accessed=True by default, but the fact that (%r15,%r15,1) is valid is just a quirk of our automata: we don't allow modification of %r15 (it must be disallowed even if it's not used as restricted register for memory access!) thus it's safe to accept %r15 as index. Note: ValidateInstruction actually verifies that %r15 is never produced as restricted register. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:442: if (writes_to == 'both' and input in X86_64_BASE_REGISTERS): On 2013/11/18 09:50:47, halyavin wrote: > We don't need brackets here. Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:454: reg_field: value of reg field in ModR/M byte (is only used if reg != None) On 2013/11/18 09:50:47, halyavin wrote: > You changed the argument from reg_field to modrm. Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:468: reg_field = (modrm >> 3) & 0x07 On 2013/11/18 09:50:47, halyavin wrote: > Move this line to else clause. Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:562: base_text = '%rip' On 2013/11/18 09:50:47, halyavin wrote: > Use %riz or %eiz in 32-bit case. Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:589: index_text: textual representation of "base" register On 2013/11/18 09:50:47, halyavin wrote: > base->index Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:606: scale_field == 0): On 2013/11/18 09:50:47, halyavin wrote: > scale_text == '1'. It will make easier to understand that this form is > equivalent to one in else branch. Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:611: base_text = '' On 2013/11/18 09:50:47, halyavin wrote: > Use %riz/%eiz. Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:627: scale_field == 0): On 2013/11/18 09:50:47, halyavin wrote: > scale_text == '1' Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:628: rm_text = ('0x0({})' if mod_field else '({})').format(base_text) On 2013/11/18 09:50:47, halyavin wrote: > if mod_field in (1, 2): # 8-bit or 32-bit offset > rm_text = '0x0({})'.format(base_text) > else: # mod_field == 0, no offset. > rm_text = '({})'.format(base_text) Grr. Four lines instead of one. Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:717: # Convert %r8 to %r8, %r9 to %r9d, etc On 2013/11/18 09:50:47, halyavin wrote: > Convert %r8 to %r8d Done. https://codereview.chromium.org/49183002/diff/960001/src/trusted/validator_ra... src/trusted/validator_ragel/compress_regular_instructions.py:723: # instriuction which explicitly write to reg operand can do that On 2013/11/18 09:50:47, halyavin wrote: > instriuction->instruction Done.
https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:602: # In 32-bit mode there are another, shorter, form, but it's used Remove this comment since it confuses the reader. https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:725: # xchg with memory can not be used to sandbox it's operand, only Move comment to the else clause. https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:727: if writes_to == 'reg' and register_write == 'sandbox': Add comment that writes_to == 'reg' means that output is a register.
https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:698: if writes_to == 'reg' and register_write == 'sandbox': The same as below https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:706: replacements.append(replacement) We wanted to join this lines.
https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:602: # In 32-bit mode there are another, shorter, form, but it's used On 2013/11/18 14:13:10, halyavin wrote: > Remove this comment since it confuses the reader. Done. https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:698: if writes_to == 'reg' and register_write == 'sandbox': On 2013/11/19 08:57:50, halyavin wrote: > The same as below Done. https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:706: replacements.append(replacement) On 2013/11/19 08:57:50, halyavin wrote: > We wanted to join this lines. Done. https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:725: # xchg with memory can not be used to sandbox it's operand, only On 2013/11/18 14:13:10, halyavin wrote: > Move comment to the else clause. Done. https://codereview.chromium.org/49183002/diff/1010001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:727: if writes_to == 'reg' and register_write == 'sandbox': On 2013/11/18 14:13:10, halyavin wrote: > Add comment that writes_to == 'reg' means that output is a register. Done.
PTAL
https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:857: if reg[0:2] == 'r8' and register_write != 'ignore' and writes_to != 'rm': end_reg = REGISTERS[reg][-1] # Exclude r15 from the interval, if instruction can write to reg. if (end_reg.startswith('r15') and register_write != 'ignore' and writes_to in ['reg', 'both']): end_reg = REGISTERS[reg][-2] https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:862: if rm[0:2] == 'r8' and register_write != 'ignore' and writes_to != 'reg': The same as above. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:867: regex = '(?: 00)*' This line is confusing. Since this is the start of the regular expression, I assumed that it matches the start of the instruction. Do we need these zeros at all? There could be zero of them anyway. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:868: # Additional byte is opcode extension with 3DNow! instructions. # The last byte of the 3DNow! instruction is opcode extension. The same question as above: do we need this at all? https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:871: regex += ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?' # 2 spaces separate byte code from instruction name. regex += ' ' regex += '(?:lock )?' # Instruction name regex += '\\w* ' # Immediate or implicit operand that can precede reg/rm pair. regex += '(?:...)?' Why do we need %xmm0 and 2 immediates here? I thought that enter/leave do not have any register or memory parameters. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:887: else: if writes_to == 'reg': notes_register = ' # rm to reg' else: notes_register = ' # reg to rm' if notes != '': notes_register += '; ' + notes notes = ' # ' + notes https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:892: notes = ' # ' + notes Why we have space before # here but not in notes_register? Can we omit '# ' and write function def ToComment(comment): if comment != '': return '# ' + comment return '' https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:912: if register_write == 'sandbox': Can we have situation that second operand is memory but register_write == 'sandbox'?
https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:925: regex += '.*' Can we use '$' instead? https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:935: subst_r8 = subst[0:-1] + (input_note_r8, output_note) + subst[-1:] Add notes at the end. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:949: '.*?(04 0[47])' + regex, subst_memory, memory_replacement)) mod_rm_sib_regex = '(04 0[47])'
https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1026: {'writes_to':'reg', 'x3Dnow':'yes'}) 'yes'->True https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1026: {'writes_to':'reg', 'x3Dnow':'yes'}) 'x3Dnow'-> 'is_3Dnow'
https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:991: instruction_kinds = [ Use append and extract register_write to local variable.
https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1051: if rm[0:2] == 'r8' and register_write != 'ignore': The same as in previous function. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1060: # We only need to process ModR/M+SIB '04 04' or '04 07' here Extract to function. Note, that you will need opcode_bits as a parameter. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1065: regex = ('(?: 00)* (?:lock )?\\w* (?:\\$0x0,|%cl,)?' Split into multiple additions as in previous function.
https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1172: end_reg = REGISTERS[reg][-2 if text2[-2] == 'e' else -1] use nibble instead. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1177: '(%(?:' + REGISTERS_RE[reg][1] + ')).*()', I have no idea why sometimes it is [1] and sometimes it is [0] https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1177: '(%(?:' + REGISTERS_RE[reg][1] + ')).*()', Why not '(%' + REGISTERS_RE[reg][1] + ').*()'? https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1195: tuple([text1] + ['[%{}..%{}]'.format(start_reg, end_reg)] * 2 + Extract '[%{}..%{}]'.format(start_reg, end_reg) to variable.
https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:857: if reg[0:2] == 'r8' and register_write != 'ignore' and writes_to != 'rm': On 2013/12/06 10:58:46, halyavin wrote: > end_reg = REGISTERS[reg][-1] > # Exclude r15 from the interval, if instruction can write to reg. > if (end_reg.startswith('r15') and register_write != 'ignore' and > writes_to in ['reg', 'both']): > end_reg = REGISTERS[reg][-2] Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:862: if rm[0:2] == 'r8' and register_write != 'ignore' and writes_to != 'reg': On 2013/12/06 10:58:46, halyavin wrote: > The same as above. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:867: regex = '(?: 00)*' On 2013/12/06 10:58:46, halyavin wrote: > This line is confusing. Since this is the start of the regular expression, I assumed that it matches the start of the instruction. Well, it's does not work that way. Tried to add the clarifying comment. > Do we need these zeros at all? There could be zero of them anyway. Sure, most instructions have zero zeros here, but some (like movabs) may have up to eight zeros here. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:868: # Additional byte is opcode extension with 3DNow! instructions. On 2013/12/06 10:58:46, halyavin wrote: > # The last byte of the 3DNow! instruction is opcode extension. > The same question as above: do we need this at all? Yes, we do. Do you really think I would add useless pieces of code to the copressor which works just fine as it is. This time here is even comment which explains just WHY do we need these! https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:871: regex += ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?' On 2013/12/06 10:58:46, halyavin wrote: > # 2 spaces separate byte code from instruction name. > regex += ' ' > regex += '(?:lock )?' > # Instruction name > regex += '\\w* ' > # Immediate or implicit operand that can precede reg/rm pair. > regex += '(?:...)?' > Done. > Why do we need %xmm0 and 2 immediates here? %xmm0 is used e.g. in pblendvb, two immediates are used e.g. in insertq > I thought that enter/leave do not have any register or memory parameters. They don't, you are correct. These are not for them. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:887: else: On 2013/12/06 10:58:46, halyavin wrote: > if writes_to == 'reg': > notes_register = ' # rm to reg' > else: > notes_register = ' # reg to rm' > if notes != '': > notes_register += '; ' + notes > notes = ' # ' + notes Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:892: notes = ' # ' + notes On 2013/12/06 10:58:46, halyavin wrote: > Why we have space before # here but not in notes_register? > Can we omit '# ' and write function > def ToComment(comment): > if comment != '': > return '# ' + comment > return '' What will it save? https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:912: if register_write == 'sandbox': On 2013/12/06 10:58:46, halyavin wrote: > Can we have situation that second operand is memory but register_write == > 'sandbox'? Yes, why not? https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:925: regex += '.*' On 2013/12/06 11:25:43, halyavin wrote: > Can we use '$' instead? We can, but of course not here. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:935: subst_r8 = subst[0:-1] + (input_note_r8, output_note) + subst[-1:] On 2013/12/06 11:25:43, halyavin wrote: > Add notes at the end. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:949: '.*?(04 0[47])' + regex, subst_memory, memory_replacement)) On 2013/12/06 11:25:43, halyavin wrote: > mod_rm_sib_regex = '(04 0[47])' Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:991: instruction_kinds = [ On 2013/12/06 11:41:33, halyavin wrote: > Use append and extract register_write to local variable. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1026: {'writes_to':'reg', 'x3Dnow':'yes'}) On 2013/12/06 11:31:17, halyavin wrote: > 'yes'->True Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1026: {'writes_to':'reg', 'x3Dnow':'yes'}) On 2013/12/06 11:31:17, halyavin wrote: > 'x3Dnow'-> 'is_3Dnow' Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1051: if rm[0:2] == 'r8' and register_write != 'ignore': On 2013/12/06 12:38:27, halyavin wrote: > The same as in previous function. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1060: # We only need to process ModR/M+SIB '04 04' or '04 07' here On 2013/12/06 12:38:27, halyavin wrote: > Extract to function. Note, that you will need opcode_bits as a parameter. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1065: regex = ('(?: 00)* (?:lock )?\\w* (?:\\$0x0,|%cl,)?' On 2013/12/06 12:38:27, halyavin wrote: > Split into multiple additions as in previous function. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1172: end_reg = REGISTERS[reg][-2 if text2[-2] == 'e' else -1] On 2013/12/06 13:02:51, halyavin wrote: > use nibble instead. Done. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1177: '(%(?:' + REGISTERS_RE[reg][1] + ')).*()', On 2013/12/06 13:02:51, halyavin wrote: > I have no idea why sometimes it is [1] and sometimes it is [0] There are comment here which explains it! Tryed to change it. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1177: '(%(?:' + REGISTERS_RE[reg][1] + ')).*()', On 2013/12/06 13:02:51, halyavin wrote: > Why not '(%' + REGISTERS_RE[reg][1] + ').*()'? For historical reasons. Fixed. https://codereview.chromium.org/49183002/diff/1110001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1195: tuple([text1] + ['[%{}..%{}]'.format(start_reg, end_reg)] * 2 + On 2013/12/06 13:02:51, halyavin wrote: > Extract '[%{}..%{}]'.format(start_reg, end_reg) to variable. Done.
https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:885: if is_3Dnow: Write it as if else and replace "Additional byte" with "Immediate byte". May be save all byte code part of regex to separate variable.
https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:831: """ Returns two memory regexes: for bytes and textual representation. """ It is 3 regex now.
https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:831: """ Returns two memory regexes: for bytes and textual representation. """ On 2013/12/06 16:17:58, halyavin wrote: > It is 3 regex now. Done. https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:885: if is_3Dnow: On 2013/12/06 16:05:47, halyavin wrote: > Write it as if else and replace "Additional byte" with "Immediate byte". > May be save all byte code part of regex to separate variable. Done.
https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1209: ' \\w* (?:\\$0x0,|%ax,|%st,)?' Use raw literals when you are escaping '\\'
https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:6: Traverse the validator's DFA, collect all "normal" instruction and then instruction -> instructions https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:99: # use first register, but 'spl' group uses fifth register because it's first are named by the first register but 'spl' group is named by the fifth register https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:245: # subst (which is used to denote compressed version) and replacements (which which is used to create the compressed version https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:254: # When faced with instriuction '40 inc %eax' it will capture the following with instruction https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:255: # pieces of said instruction: '4[0] inc [%eax]'. Add the empty group which will be replaced with the note. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:271: # Note that last group is only used in the replacement. It's used to grab marks the last group https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:271: # Note that last group is only used in the replacement. It's used to grab marks Replacement is a poor choice of word in this sentence because it is exactly replacements field that doesn't contain the last group. I suggest to reformulate the last 2 paragraphs to avoid using 'replacement' word in this meaning (but it is ok to use 'replacement' word in the other meaning). For example: ... Then it will replace groups to produce the following 8 instructions: ... ...and will substitute one "compressed instruction"... ... Note that the last group is only used in the substitution ... ... It matches marks added by the previous compressors and allows us to insert the new mark. ... https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:280: def __init__(self, regex, subst, replacements=None): replacements are never None. Such rules are useless. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:283: self.replacements = [] if replacements is None else replacements This line can be simplified to self.replacements=replacements https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:293: return format_str + instruction[pos:match.start(len(match.groups()))] You are not replacing the last group with the mark. As far as I understand this is intended behavior but you need to update the comment. We need to rethink this moment. Does regex always ends with '.*()'? Why ignoring the comment completely is ok? If the last group matches '# reg to rm' for example, then it means that there is no replacement that will match the original instruction itself. If the last group is always '()' and at the end of the regex, we can use simpler expression instruction[pos:] here. This will mean that we always replace last group with itself. This behavior is easy to understand. Also, in this case we must fix the comment above about last group grabbing comments leaved by previous compressors. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:395: REGISTERS_RE['st\\(0\\)'] = REGISTERS_RE['st(0)'] Do we need this line? https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:403: # Register which can not be used as base in 64-bit mode in all incarnations The comment is wrong. We do can use %r15, %rbp, %rsp and %rip as base. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:436: if index_text in X86_64_BASE_REGISTERS - set(['%r15']): We wanted to add comment here. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:836: if options.bitness == 32: # We only match memory operands without offset, so we don't need to add zeros to the end of modrm_regex and modrm_sib_regex regular expressions. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:837: regex_mem = '\\(%esp,%eax,1\\)' regex_mem->mem_regex for consistency. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:898: output = None rename to output_rr_regex https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:899: output_note = None rename to output_rr_subst https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:905: output = '%' + reg + '|None' We need to either remove None from here or explain how it could happen. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:921: output = '%' + rm + '|None' We need to add comment of why we are doing such slightly incorrect compression. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:937: if memory_accessed: Can we move regex += '; input_rr=(%eax|%r8d)' here and regex += '; input_rr = (any_nonspecial)' to the else clause? https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:938: input_note = '[%eax..%edi]' input_note->input_rr_subst https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:939: input_note_r8 = '[%r8d..%r15d]' input_note_r8->input_rr_subst_r8 https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:952: notes_register = ' # rm to reg' notes_register->notes_register_subst https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:957: notes = ' # ' + notes notes_subst = '# ' + notes
https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:934: if options.bitness == 64: # We generate 2 variants of compressors with memory operands to enumerate 2 possible values of index_r8 property. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:942: input_note_r8 = 'any_nonspecial' r8->index8
https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:304: instructions -= subset Add "assert instruction in subset". We may see a few surprises here. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:843: def RegToRmCompressors(rm, reg, writes_to, memory_accessed=True, Big refactoring (and so it is better to make it in a separate patchset so that we can roll it back). I don't like us maintaining several variables in parallel. I think it is better to complicate enumeration but simplify Compressor creation. We will need to introduce 2 new parameters: rm_compression which is 3-state ('register', 'memory', 'register_or_memory') and index_r8 which is 3 state (None, False, True). The enumeration will be changed as follows: for instruction_kind in instruction_kinds: for rm_compression in ('register', 'memory', 'register_or_memory'): if rm_compression == 'register' and instruction_kind.get('memory_accessed'): continue if options.bitness == 64 and rm_compression != 'register': for index_r8 in (False, True): compressors.extends(RegToRmCompressors(reg=reg, rm=rm, rm_compression=rm_compression, index_r8=index_r8, **instruction_kind) else: compressors.extends(RegToRmCompressors(reg=reg, rm=rm, rm_compression=rm_compression, **instruction_kind) This transformation will allow us to ensure that regex is exactly what is needed by compressor. Currently we make regex more general so that it matches all 5 variants.
https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:304: instructions -= subset On 2013/12/07 19:51:15, halyavin wrote: > Add "assert instruction in subset". We may see a few surprises here. No surprises. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:843: def RegToRmCompressors(rm, reg, writes_to, memory_accessed=True, On 2013/12/07 19:51:15, halyavin wrote: > Big refactoring (and so it is better to make it in a separate patchset so that > we can roll it back). > > I don't like us maintaining several variables in parallel. I think it is better > to complicate enumeration but simplify Compressor creation. We will need to > introduce 2 new parameters: rm_compression which is 3-state ('register', > 'memory', 'register_or_memory') and index_r8 which is 3 state (None, False, > True). > > The enumeration will be changed as follows: > > for instruction_kind in instruction_kinds: > for rm_compression in ('register', 'memory', 'register_or_memory'): > if rm_compression == 'register' and instruction_kind.get('memory_accessed'): > continue > if options.bitness == 64 and rm_compression != 'register': > for index_r8 in (False, True): > compressors.extends(RegToRmCompressors(reg=reg, rm=rm, > rm_compression=rm_compression, index_r8=index_r8, **instruction_kind) > else: > compressors.extends(RegToRmCompressors(reg=reg, rm=rm, > rm_compression=rm_compression, **instruction_kind) > > This transformation will allow us to ensure that regex is exactly what is needed > by compressor. Currently we make regex more general so that it matches all 5 > variants. Done.
https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:6: Traverse the validator's DFA, collect all "normal" instruction and then On 2013/12/07 17:27:15, halyavin wrote: > instruction -> instructions Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:99: # use first register, but 'spl' group uses fifth register because it's first On 2013/12/07 17:27:15, halyavin wrote: > are named by the first register but 'spl' group is named by the fifth register Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:245: # subst (which is used to denote compressed version) and replacements (which On 2013/12/07 17:27:15, halyavin wrote: > which is used to create the compressed version Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:254: # When faced with instriuction '40 inc %eax' it will capture the following On 2013/12/07 17:27:15, halyavin wrote: > with instruction Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:255: # pieces of said instruction: '4[0] inc [%eax]'. On 2013/12/07 17:27:15, halyavin wrote: > Add the empty group which will be replaced with the note. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:271: # Note that last group is only used in the replacement. It's used to grab marks On 2013/12/07 17:27:15, halyavin wrote: > the last group Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:271: # Note that last group is only used in the replacement. It's used to grab marks On 2013/12/07 17:27:15, halyavin wrote: > Replacement is a poor choice of word in this sentence because it is exactly > replacements field that doesn't contain the last group. I suggest to reformulate > the last 2 paragraphs to avoid using 'replacement' word in this meaning (but it > is ok to use 'replacement' word in the other meaning). > For example: ... Then it will replace groups to produce the following 8 > instructions: ... > ...and will substitute one "compressed instruction"... > ... Note that the last group is only used in the substitution ... > ... It matches marks added by the previous compressors and allows us to insert > the new mark. ... Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:280: def __init__(self, regex, subst, replacements=None): On 2013/12/07 17:27:15, halyavin wrote: > replacements are never None. Such rules are useless. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:283: self.replacements = [] if replacements is None else replacements On 2013/12/07 17:27:15, halyavin wrote: > This line can be simplified to self.replacements=replacements Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:293: return format_str + instruction[pos:match.start(len(match.groups()))] On 2013/12/07 17:27:15, halyavin wrote: > You are not replacing the last group with the mark. As far as I understand this > is intended behavior but you need to update the comment. > Actually this behavior is bad. Take a look on AllMemoryNonMemoryCompressors, e.g.: there we need to use '()()' in regex to handle this case. This is remnant of very early [bad] design decision, but sadly it was not all that easy to fix since it must be changed in all callsites. > We need to rethink this moment. Does regex always ends with '.*()'? Why ignoring > the comment completely is ok? If the last group matches '# reg to rm' for > example, then it means that there is no replacement that will match the original > instruction itself. > > If the last group is always '()' and at the end of the regex, we can use simpler > expression instruction[pos:] here. This will mean that we always replace last > group with itself. This behavior is easy to understand. > > Also, in this case we must fix the comment above about last group grabbing > comments leaved by previous compressors. I've bitten the bullet and fixed all the callsites instead. I've tried to add some logic to CompressOneMatch instead and quickly found out that it's basically impossible to change anything there: add couple of pythong instructions (ifs, etc) and execution time goes up two, three times! https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:395: REGISTERS_RE['st\\(0\\)'] = REGISTERS_RE['st(0)'] On 2013/12/07 17:27:15, halyavin wrote: > Do we need this line? Probably not. Removed. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:403: # Register which can not be used as base in 64-bit mode in all incarnations On 2013/12/07 17:27:15, halyavin wrote: > The comment is wrong. We do can use %r15, %rbp, %rsp and %rip as base. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:436: if index_text in X86_64_BASE_REGISTERS - set(['%r15']): On 2013/12/07 17:27:15, halyavin wrote: > We wanted to add comment here. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:836: if options.bitness == 32: On 2013/12/07 17:27:15, halyavin wrote: > # We only match memory operands without offset, so we don't need to add zeros to > the end of modrm_regex and modrm_sib_regex regular expressions. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:837: regex_mem = '\\(%esp,%eax,1\\)' On 2013/12/07 17:27:15, halyavin wrote: > regex_mem->mem_regex for consistency. Actually it makes more sense to put _mem as suffix. Changes other *_regex variables instead. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:898: output = None On 2013/12/07 17:27:15, halyavin wrote: > rename to output_rr_regex regex_output_rr https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:899: output_note = None On 2013/12/07 17:27:15, halyavin wrote: > rename to output_rr_subst subst_output_rr https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:905: output = '%' + reg + '|None' On 2013/12/07 17:27:15, halyavin wrote: > We need to either remove None from here or explain how it could happen. Removed. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:921: output = '%' + rm + '|None' On 2013/12/07 17:27:15, halyavin wrote: > We need to add comment of why we are doing such slightly incorrect compression. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:934: if options.bitness == 64: On 2013/12/07 17:49:32, halyavin wrote: > # We generate 2 variants of compressors with memory operands to enumerate 2 > possible values of index_r8 property. Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:937: if memory_accessed: On 2013/12/07 17:27:15, halyavin wrote: > Can we move regex += '; input_rr=(%eax|%r8d)' here and > regex += '; input_rr = (any_nonspecial)' to the else clause? Done. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:938: input_note = '[%eax..%edi]' On 2013/12/07 17:27:15, halyavin wrote: > input_note->input_rr_subst subst_input_rr https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:939: input_note_r8 = '[%r8d..%r15d]' On 2013/12/07 17:27:15, halyavin wrote: > input_note_r8->input_rr_subst_r8 Removed. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:942: input_note_r8 = 'any_nonspecial' On 2013/12/07 17:49:32, halyavin wrote: > r8->index8 Removed https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:952: notes_register = ' # rm to reg' On 2013/12/07 17:27:15, halyavin wrote: > notes_register->notes_register_subst N/A: Changed (and hopefully simplified) logic differently instead. https://codereview.chromium.org/49183002/diff/1150001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:957: notes = ' # ' + notes On 2013/12/07 17:27:15, halyavin wrote: > notes_subst = '# ' + notes N/A: Changed (and hopefully simplified) logic differently instead.
https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1130001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1209: ' \\w* (?:\\$0x0,|%ax,|%st,)?' On 2013/12/06 16:30:38, halyavin wrote: > Use raw literals when you are escaping '\\' Done.
https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:949: subst = ('XX', subst_rm, '[%{}..%{}]'.format(start_reg, end_reg)) You created subst_reg which you can use here. https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:961: subst = ('XX', '[%{}..%{}]'.format(start_reg, end_reg), subst_rm) The same. https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1283: if reg[0:2] == 'r8': REGISTERS[reg][-1].startswith('r15') https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1292: end_out = REGISTERS[out_reg][-1 if out_reg[0:2] != 'r8' else -2] Use end_out.startswith('r15') instead.
Significant robustness change: we no longer generate version where information is hidden (interval %rax..%r14 where %rbp and %rsp is not included) and, even more importantly, produce %eax|%ecx|%edx|%ebx|%esi|%edi instead of %eax..%edi is %rbp and %rsp are forbidden. Unfortunately it significantly changes both output and logs... https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... File src/trusted/validator_ragel/compress_regular_instructions.py (right): https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:949: subst = ('XX', subst_rm, '[%{}..%{}]'.format(start_reg, end_reg)) On 2013/12/10 08:03:34, halyavin wrote: > You created subst_reg which you can use here. Done. https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:961: subst = ('XX', '[%{}..%{}]'.format(start_reg, end_reg), subst_rm) On 2013/12/10 08:03:34, halyavin wrote: > The same. Done. https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1283: if reg[0:2] == 'r8': On 2013/12/10 08:03:34, halyavin wrote: > REGISTERS[reg][-1].startswith('r15') This code is removed compretely. https://codereview.chromium.org/49183002/diff/1440001/src/trusted/validator_r... src/trusted/validator_ragel/compress_regular_instructions.py:1292: end_out = REGISTERS[out_reg][-1 if out_reg[0:2] != 'r8' else -2] On 2013/12/10 08:03:34, halyavin wrote: > Use end_out.startswith('r15') instead. This code is removed completely. |