Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(217)

Issue 521028: Direct call to native RegExp code from JavaScript (Closed)

Created:
10 years, 11 months ago by Søren Thygesen Gjesse
Modified:
9 years, 7 months ago
CC:
v8-dev
Visibility:
Public.

Description

Direct call to native RegExp code from JavaScript. Calls to RegExp no longer have to be via a call to the runtime system. A new stub have been added which can handle this call in generated code. The stub checks all the parameters and creates RegExp entry frame in the same way as it is created by the runtime system. Bailout to the runtime system is done whenever an uncommon situation is encountered or when the static data used is not initialized. After running the native RegExp code the last match info is updated like in the runtime system. Currently only ASCII strings are handled. Added another argument to the RegExp entry frame. It indicated whether the call is direct from JavaScript code or through the runtime system. This information is used when RegExp execution is interrupted. If an interruption happens when RegExp code is called directly a retry is issued causing the interruption to be handled via the runtime system. The reason for this is that the direct call to RegExp code does not support garbage collection. Committed: http://code.google.com/p/v8/source/detail?r=3542

Patch Set 1 #

Total comments: 83

Patch Set 2 : '' #

Patch Set 3 : '' #

Unified diffs Side-by-side diffs Delta from patch set Stats (+530 lines, -79 lines) Patch
M src/arm/codegen-arm.h View 1 chunk +3 lines, -0 lines 0 comments Download
M src/arm/codegen-arm.cc View 1 chunk +13 lines, -0 lines 0 comments Download
M src/arm/regexp-macro-assembler-arm.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/arm/regexp-macro-assembler-arm.cc View 1 2 chunks +9 lines, -3 lines 0 comments Download
M src/arm/simulator-arm.h View 2 chunks +5 lines, -5 lines 0 comments Download
M src/assembler.h View 1 1 chunk +5 lines, -0 lines 0 comments Download
M src/assembler.cc View 1 1 chunk +13 lines, -0 lines 0 comments Download
M src/code-stubs.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/codegen.h View 1 chunk +20 lines, -0 lines 0 comments Download
M src/codegen.cc View 1 chunk +1 line, -0 lines 0 comments Download
M src/flag-definitions.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/ia32/codegen-ia32.h View 1 chunk +3 lines, -0 lines 0 comments Download
M src/ia32/codegen-ia32.cc View 1 3 chunks +288 lines, -0 lines 0 comments Download
M src/ia32/macro-assembler-ia32.h View 1 chunk +9 lines, -0 lines 0 comments Download
M src/ia32/macro-assembler-ia32.cc View 1 chunk +11 lines, -0 lines 0 comments Download
M src/ia32/regexp-macro-assembler-ia32.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/ia32/regexp-macro-assembler-ia32.cc View 1 3 chunks +20 lines, -8 lines 0 comments Download
M src/ia32/simulator-ia32.h View 1 chunk +3 lines, -3 lines 0 comments Download
M src/jsregexp.h View 1 2 2 chunks +45 lines, -1 line 0 comments Download
M src/jsregexp.cc View 3 chunks +12 lines, -31 lines 0 comments Download
M src/objects.h View 1 2 1 chunk +8 lines, -0 lines 0 comments Download
M src/regexp-delay.js View 3 chunks +3 lines, -3 lines 0 comments Download
M src/regexp-macro-assembler.cc View 4 chunks +4 lines, -13 lines 0 comments Download
M src/regexp-stack.h View 1 chunk +12 lines, -0 lines 0 comments Download
M src/v8-counters.h View 1 chunk +3 lines, -1 line 0 comments Download
M src/x64/codegen-x64.h View 1 chunk +3 lines, -0 lines 0 comments Download
M src/x64/codegen-x64.cc View 1 chunk +13 lines, -0 lines 0 comments Download
M src/x64/regexp-macro-assembler-x64.h View 2 chunks +3 lines, -0 lines 0 comments Download
M src/x64/regexp-macro-assembler-x64.cc View 1 2 chunks +14 lines, -8 lines 0 comments Download
M src/x64/simulator-x64.h View 1 chunk +3 lines, -3 lines 0 comments Download

Messages

Total messages: 6 (0 generated)
Søren Thygesen Gjesse
10 years, 11 months ago (2010-01-05 08:47:51 UTC) #1
Erik Corry
LGTM Please remember to file a bug to do the optimization on x64 and ARM ...
10 years, 11 months ago (2010-01-05 12:41:16 UTC) #2
Lasse Reichstein
Only read until regexp-stack.h, but LGTM so far. http://codereview.chromium.org/521028/diff/1/28 File src/arm/regexp-macro-assembler-arm.cc (right): http://codereview.chromium.org/521028/diff/1/28#newcode62 src/arm/regexp-macro-assembler-arm.cc:62: * ...
10 years, 11 months ago (2010-01-05 12:46:39 UTC) #3
Lasse Reichstein
LGTM all http://codereview.chromium.org/521028/diff/1/3 File src/ia32/codegen-ia32.cc (right): http://codereview.chromium.org/521028/diff/1/3#newcode8004 src/ia32/codegen-ia32.cc:8004: __ j(greater, &runtime); Nevermind, I see that ...
10 years, 11 months ago (2010-01-06 08:32:41 UTC) #4
Lasse Reichstein
> We just return a failure if there is a stack overflow exception. Apparently > ...
10 years, 11 months ago (2010-01-06 08:36:25 UTC) #5
Søren Thygesen Gjesse
10 years, 11 months ago (2010-01-06 10:51:05 UTC) #6
Thank you both for the thorough review. I have addressed all comments.

http://codereview.chromium.org/521028/diff/1/28
File src/arm/regexp-macro-assembler-arm.cc (right):

http://codereview.chromium.org/521028/diff/1/28#newcode62
src/arm/regexp-macro-assembler-arm.cc:62: *       - direct_call        (if 1
direct call from JavaScript code, if 0 call
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> "1" -> "1,".

Done.

http://codereview.chromium.org/521028/diff/1/28#newcode71
src/arm/regexp-macro-assembler-arm.cc:71: *       - int* capture_array
(int[num_saved_registers_], for output).
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> This line should be removed, and a line for start_index added, to match the
new
> parameter passing.

Done.

http://codereview.chromium.org/521028/diff/1/28#newcode91
src/arm/regexp-macro-assembler-arm.cc:91: *              int start_index,
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> As discussed, if you pass start_index, then "at_start" isn't necessary (just
> create a bug entry about removing it).

Bug 564 added.

http://codereview.chromium.org/521028/diff/1/17
File src/assembler.h (right):

http://codereview.chromium.org/521028/diff/1/17#newcode425
src/assembler.h:425: static ExternalReference address_of_memory_address();
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Perhaps rename to address_of_regexp_stack_memory_address, to make it obvious
> which memory we are talking about.

Done. Also changed address_of_memory_size to address_of_regexp_stack_memory_size

http://codereview.chromium.org/521028/diff/1/3
File src/ia32/codegen-ia32.cc (right):

http://codereview.chromium.org/521028/diff/1/3#newcode5948
src/ia32/codegen-ia32.cc:5948: destination()->Split(less_equal);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Empty line here too, for consistency.

Removed the one I added above instead.

http://codereview.chromium.org/521028/diff/1/3#newcode7935
src/ia32/codegen-ia32.cc:7935: __ jmp(&runtime);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> That looks inefficient. Rather put the "if (FLAG_regexp_entry_native)" around
> the conditional part instead of generating it unconditionally and then jump
past
> it.

Changed

  __ jmp(&runtime);

to
  __ TailCallRuntime(ExternalReference(Runtime::kRegExpExec), 4, 1);
  return;

This avoids a large if around the  main code. I expect this flag to be removed
when this code is thoroughly tested.

http://codereview.chromium.org/521028/diff/1/3#newcode7951
src/ia32/codegen-ia32.cc:7951: __ j(not_equal, &runtime, not_taken);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Default for forward jumps is not_taken, so omit the hint.
> (Or change the assembler to not emit not_taken hints for forward jumps and
taken
> hints for backwards jumps).
> This goes for all the not_taken hints below.

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode7955
src/ia32/codegen-ia32.cc:7955: __ j(equal, &runtime, not_taken);
On 2010/01/05 12:41:17, Erik Corry wrote:
> This looks superfluous.  If it's the undefined object then it isn't a fixed
> array, so it will go to the runtime anyway.  Alternatively it is the fixed
array
> check that is unnecessary.

Removed the check altogether and changed it to an native code ASSERT. In the
runtime system data is cast to an FixedArray if type is IRREGEXP.

http://codereview.chromium.org/521028/diff/1/3#newcode7956
src/ia32/codegen-ia32.cc:7956: __ CmpObjectType(ecx, FIXED_ARRAY_TYPE, ebx);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Checking against undefined is not necessary. It's included in the
> not-fixed-array test.
> Make a comment that the value can never (and MUST never) be a smi.

emoved the check altogether and changed it to an native code ASSERT. In the
runtime system data is cast to an FixedArray if type is IRREGEXP. This native
code ASSERT includes a smi check to document that it will never be a smi.

http://codereview.chromium.org/521028/diff/1/3#newcode7963
src/ia32/codegen-ia32.cc:7963: __ j(not_zero, &runtime, not_taken);
On 2010/01/05 12:41:17, Erik Corry wrote:
> Superfluous test.

Removed smi test.

http://codereview.chromium.org/521028/diff/1/3#newcode7964
src/ia32/codegen-ia32.cc:7964: __ cmp(ebx, JSRegExp::IRREGEXP * 2);  // Type is
a smi.
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Use Smi::valueOf(JSRegExp::IRREGEXP) instead of hardcoding the conversion.
> No need to test for the smi tag first. If it's equal to
Smi::valueof(something)
> then it's smi-tagged.

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode7968
src/ia32/codegen-ia32.cc:7968: // Check that the numbers of captures fit in the
static offsets vector buffer.
On 2010/01/05 12:41:17, Erik Corry wrote:
> numbers -> number

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode7970
src/ia32/codegen-ia32.cc:7970: __ test(edx, Immediate(kSmiTagMask));
On 2010/01/05 12:41:17, Erik Corry wrote:
> Can this ever happen?

No (heap verify insists on it being a smi), check removed.

http://codereview.chromium.org/521028/diff/1/3#newcode7972
src/ia32/codegen-ia32.cc:7972: // Calculate number of capture registers
(number_of_captures + 1) * 2.
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Add:
>   ASSERT_EQ(0, kSmiTag);
>   ASSERT_EQ(1, kSmiTagSize + kSmiShiftSize);
> to ensure that Smi::valueOf(n) == n * 2

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode7975
src/ia32/codegen-ia32.cc:7975: __ cmp(edx,
OffsetsVector::kStaticOffsetsVectorSize);
On 2010/01/05 12:41:17, Erik Corry wrote:
> Some comment and an assert is needed here as you are using the fact that Smis
> are 2x untagged values.

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode7987
src/ia32/codegen-ia32.cc:7987: __ and_(ebx, kStringRepresentationMask);
On 2010/01/05 12:41:17, Erik Corry wrote:
> Where are you checking that the string is ASCII?

Combined the representation and encoding check below.

http://codereview.chromium.org/521028/diff/1/3#newcode7988
src/ia32/codegen-ia32.cc:7988: ASSERT_EQ(0, kSeqStringTag);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Why not test for SeqAsciiString directly here, instead of testing ASCII later?
> (Unless you plan on supporting UC16 strings as well :)

Combined the representation and encoding check below.

http://codereview.chromium.org/521028/diff/1/3#newcode7989
src/ia32/codegen-ia32.cc:7989: __ test(ebx, Operand(ebx));
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> how about just: 
>   test(ebx, Immediate(kStringRepresentationMask))
> (keep the ASSERT).

Combined the representation and encoding check below.

http://codereview.chromium.org/521028/diff/1/3#newcode8002
src/ia32/codegen-ia32.cc:8002: __ sar(eax, 1);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> ASSERT_EQ(1, kSmiTagSize + kSmiShiftSize);

Used SmiUntag instead.

http://codereview.chromium.org/521028/diff/1/3#newcode8004
src/ia32/codegen-ia32.cc:8004: __ j(greater, &runtime);
On 2010/01/05 12:41:17, Erik Corry wrote:
> Greater_equal?

Check in the runtime system is

  RUNTIME_ASSERT(index >= 0);
  RUNTIME_ASSERT(index <= subject->length());

with equal accepted.

http://codereview.chromium.org/521028/diff/1/3#newcode8006
src/ia32/codegen-ia32.cc:8006: // ebx: Length of subject string
On 2010/01/05 12:41:17, Erik Corry wrote:
> We don't use this.

Comment removed.

http://codereview.chromium.org/521028/diff/1/3#newcode8015
src/ia32/codegen-ia32.cc:8015: // Check that the JSArray is in fast case.
On 2010/01/05 12:41:17, Erik Corry wrote:
> Can we ensure that this is always the case?

I don't think so, as it is just a JavaScript argument passed to the %_RegExpExec
call. The "real" lastMatchInfo in regexp-delay.js is an array literal.

http://codereview.chromium.org/521028/diff/1/3#newcode8020
src/ia32/codegen-ia32.cc:8020: __ j(not_equal, &runtime, not_taken);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Is it necessary to check the type if we also check the map? (I.e., can any
> non-fixed-array every have fixed_array_map as map?)

No, map check is sufficient.

http://codereview.chromium.org/521028/diff/1/3#newcode8029
src/ia32/codegen-ia32.cc:8029: // Check the encoding of the subject string (only
support ascii).
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Why only ASCII? UC16 should be just as simple, except for a multiplier here
and
> there :)

Two byte strings will be added in a separate change.

http://codereview.chromium.org/521028/diff/1/3#newcode8033
src/ia32/codegen-ia32.cc:8033: __ and_(ebx, kStringEncodingMask);
On 2010/01/05 12:41:17, Erik Corry wrote:
> You can do this together with the representation above by using the full
> representation.  Of course if you intend to support both encodings you can
move
> the representation test down here instead.

Combined the representation and encoding check here.

http://codereview.chromium.org/521028/diff/1/3#newcode8038
src/ia32/codegen-ia32.cc:8038: // Check that a RegExp stack is allocated.
On 2010/01/05 12:41:17, Erik Corry wrote:
> Can we ensure that there is always a small regexp stack available so we can
skip
> this test?

It might be possible, however I am not 100% sure about the case when using
Locker. Changed to only check size and not both size and address.
Good idea. Added

  RegExpStack stack;

to esure a minimul RegExp stach for this thread. Changed the tests below to
native code ASSERT's. Actually the stack is always allocated because that we
will never do a direct call without having done at least one runtime call first.

http://codereview.chromium.org/521028/diff/1/3#newcode8048
src/ia32/codegen-ia32.cc:8048: __ j(zero, &runtime, not_taken);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Is it necessary to test both?  
> It should be sufficient to check the size being non-zero.

Changed to only check size.

http://codereview.chromium.org/521028/diff/1/3#newcode8053
src/ia32/codegen-ia32.cc:8053: __ test(edx, Immediate(kSmiTagMask));
On 2010/01/05 12:41:17, Erik Corry wrote:
> It must be possible to ensure that this is not a Smi.

It never is - smi check removed.

http://codereview.chromium.org/521028/diff/1/3#newcode8055
src/ia32/codegen-ia32.cc:8055: __ CmpObjectType(edx, CODE_TYPE, ebx);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> No need to test for smi mask. The value can only ever be a Code object or
> the_hole. Or are we assuming that the values can be maliciously tampered with?

Smi check removed.

http://codereview.chromium.org/521028/diff/1/3#newcode8120
src/ia32/codegen-ia32.cc:8120: __ j(equal, &failure_or_exception, taken);
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Could we just do
>  k(not_equal, &runtime)
> ?

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode8125
src/ia32/codegen-ia32.cc:8125: __ mov(Operand(eax), Factory::null_value());
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Where are we handling the pending exception?
> The runtime call would handle it during JS-reentry, but I think we need to
throw
> manually here.

Added code to check for pending exception when result is EXCEPTION. If result is
EXCEPTION and there is no pending exception jump to the runtime system.

http://codereview.chromium.org/521028/diff/1/3#newcode8137
src/ia32/codegen-ia32.cc:8137: // Load last_match_info which is still known to
be a fast case JSArray.
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Well, it's still assumed to be one. I'm sure someone malicious with the
debugger
> could break that assumption by breaking inside the regexp and fiddling with
the
> array (if it's visible somehow).
> 

That might be true, which means that we ought to check all the arguments used
once again. Future debugger support might include mutating locals and function
arguments... I will do that in a separate change.

http://codereview.chromium.org/521028/diff/1/3#newcode8144
src/ia32/codegen-ia32.cc:8144: __ shl(edx, 1);  // Number of capture registers
to smi.
On 2010/01/05 12:41:17, Erik Corry wrote:
> I think the macro assembler has a function call for this.

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode8146
src/ia32/codegen-ia32.cc:8146: __ shr(edx, 1);  // Number of capture registers
back from smi.
On 2010/01/05 12:41:17, Erik Corry wrote:
> And this?

Done.

http://codereview.chromium.org/521028/diff/1/3#newcode8146
src/ia32/codegen-ia32.cc:8146: __ shr(edx, 1);  // Number of capture registers
back from smi.
On 2010/01/06 08:32:41, Lasse Reichstein wrote:
> Add a comment that the value is always non-negative (to explain why it's safe
to
> use shr isntead of sar).

Changed to use SmiUntag - which uses sar :-). It is assumed to be a sensible
value.

http://codereview.chromium.org/521028/diff/1/3#newcode8175
src/ia32/codegen-ia32.cc:8175: __ test(edi, Immediate(0x80000000));
On 2010/01/05 12:41:17, Erik Corry wrote:
> Didn't the shl already set the negative flag?

It did. Removed test instruction and changed condition to negative.

http://codereview.chromium.org/521028/diff/1/3#newcode8175
src/ia32/codegen-ia32.cc:8175: __ test(edi, Immediate(0x80000000));
On 2010/01/06 08:32:41, Lasse Reichstein wrote:
> Yes, and even if it didn't, test(edi,edi) would be shorter.

Removed the test instruction altogether.

http://codereview.chromium.org/521028/diff/1/3#newcode8177
src/ia32/codegen-ia32.cc:8177: __ add(edi, Operand(esp, 2 * kPointerSize));  //
Adding smi to smi.
On 2010/01/05 12:46:39, Lasse Reichstein wrote:
> Is there really no register free to hold this value? What if we were counting
> down instead of up, so we didin't need both the counter and the limit?
> We are reading it repeatedly from memory inside a loop (I know it'll be in the
> level-1 cache, but still).

Done.

http://codereview.chromium.org/521028/diff/1/2
File src/ia32/regexp-macro-assembler-ia32.cc (right):

http://codereview.chromium.org/521028/diff/1/2#newcode62
src/ia32/regexp-macro-assembler-ia32.cc:62: *       - at_start             (if
1, start at start of string, if 0, don't)
On 2010/01/05 12:41:17, Erik Corry wrote:
> start at start of string, if 0, don't-> we are starting at the start of the
> string, otherwise 0

Done.

http://codereview.chromium.org/521028/diff/1/2#newcode66
src/ia32/regexp-macro-assembler-ia32.cc:66: *       - start index         
(character index if start)
On 2010/01/05 12:41:17, Erik Corry wrote:
> if -> of

Done.

Powered by Google App Engine
This is Rietveld 408576698