Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(760)

Issue 3850005: Use finite-length end-anchored regexps to reduce part of regexp that is searched. (Closed)

Created:
10 years, 2 months ago by Lasse Reichstein
Modified:
9 years, 6 months ago
Reviewers:
Erik Corry
CC:
v8-dev
Visibility:
Public.

Description

Use finite-length end-anchored regexps to reduce part of regexp that is searched.

Patch Set 1 #

Total comments: 22

Patch Set 2 : Addressed review comments. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+186 lines, -20 lines) Patch
M src/arm/regexp-macro-assembler-arm.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/arm/regexp-macro-assembler-arm.cc View 1 2 chunks +13 lines, -1 line 0 comments Download
M src/ast.h View 6 chunks +11 lines, -6 lines 0 comments Download
M src/ast.cc View 1 chunk +40 lines, -9 lines 0 comments Download
M src/bytecodes-irregexp.h View 1 1 chunk +2 lines, -1 line 0 comments Download
M src/ia32/regexp-macro-assembler-ia32.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/ia32/regexp-macro-assembler-ia32.cc View 1 2 chunks +11 lines, -1 line 0 comments Download
M src/interpreter-irregexp.cc View 1 chunk +9 lines, -0 lines 0 comments Download
M src/jsregexp.cc View 2 chunks +13 lines, -1 line 0 comments Download
M src/regexp-macro-assembler.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/regexp-macro-assembler-irregexp.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/regexp-macro-assembler-irregexp.cc View 1 chunk +6 lines, -0 lines 0 comments Download
M src/regexp-macro-assembler-tracer.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/regexp-macro-assembler-tracer.cc View 1 chunk +6 lines, -0 lines 0 comments Download
M src/x64/regexp-macro-assembler-x64.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/x64/regexp-macro-assembler-x64.cc View 1 2 chunks +13 lines, -1 line 0 comments Download
M test/mjsunit/regexp.js View 1 1 chunk +56 lines, -0 lines 0 comments Download

Messages

Total messages: 3 (0 generated)
Lasse Reichstein
10 years, 2 months ago (2010-10-18 12:28:41 UTC) #1
Erik Corry
LGTM with better test. http://codereview.chromium.org/3850005/diff/1/2 File src/arm/regexp-macro-assembler-arm.cc (right): http://codereview.chromium.org/3850005/diff/1/2#newcode145 src/arm/regexp-macro-assembler-arm.cc:145: Label inside_string; Looks superfluous! http://codereview.chromium.org/3850005/diff/1/2#newcode931 ...
10 years, 2 months ago (2010-10-19 08:56:19 UTC) #2
Lasse Reichstein
10 years, 2 months ago (2010-10-19 09:41:30 UTC) #3
http://codereview.chromium.org/3850005/diff/1/2
File src/arm/regexp-macro-assembler-arm.cc (right):

http://codereview.chromium.org/3850005/diff/1/2#newcode145
src/arm/regexp-macro-assembler-arm.cc:145: Label inside_string;
Removed.

http://codereview.chromium.org/3850005/diff/1/2#newcode931
src/arm/regexp-macro-assembler-arm.cc:931: Label after_position;
Changed here.

http://codereview.chromium.org/3850005/diff/1/2#newcode935
src/arm/regexp-macro-assembler-arm.cc:935: LoadCurrentCharacterUnchecked(-1, 1);
Comment added.

http://codereview.chromium.org/3850005/diff/1/6
File src/bytecodes-irregexp.h (right):

http://codereview.chromium.org/3850005/diff/1/6#newcode92
src/bytecodes-irregexp.h:92: V(SET_CURRENT_POSITION_FROM_END, 48, 4) /* bc8
offset24 uint32              */
Changed to /* bc8 idx24 */
I changed the representation at some point but forgot the comment.

http://codereview.chromium.org/3850005/diff/1/7
File src/ia32/regexp-macro-assembler-ia32.cc (right):

http://codereview.chromium.org/3850005/diff/1/7#newcode968
src/ia32/regexp-macro-assembler-ia32.cc:968: Label after_position;
On 2010/10/19 08:56:19, Erik Corry wrote:
> NearLabel

Done.

http://codereview.chromium.org/3850005/diff/1/7#newcode972
src/ia32/regexp-macro-assembler-ia32.cc:972: LoadCurrentCharacterUnchecked(-1,
1);
On 2010/10/19 08:56:19, Erik Corry wrote:
> Comment.

Done.

http://codereview.chromium.org/3850005/diff/1/10
File src/jsregexp.cc (right):

http://codereview.chromium.org/3850005/diff/1/10#newcode5247
src/jsregexp.cc:5247: max_length < kMaxBacksearchLimit) {
The extra check at the start introduces a slight overhead, independently of the
input string.
For all input strings shorter than the max_length, the overhead is wasted (we
won't do anything). 
I.e., if the max_length is large, it's likely that the extra code is pure
overhead.
Also, making the max less than 24 bits allows me to use only one word for the
bytecode.

I expect regexps with a very long, but finite, max_length to be very rare (it
doesn't contain any * or + quantifiers, but yet it allows very large matches).

http://codereview.chromium.org/3850005/diff/1/16
File src/x64/regexp-macro-assembler-x64.cc (right):

http://codereview.chromium.org/3850005/diff/1/16#newcode1057
src/x64/regexp-macro-assembler-x64.cc:1057: Label after_position;
On 2010/10/19 08:56:19, Erik Corry wrote:
> NearLabel

Done.

http://codereview.chromium.org/3850005/diff/1/16#newcode1061
src/x64/regexp-macro-assembler-x64.cc:1061: LoadCurrentCharacterUnchecked(-1,
1);
On 2010/10/19 08:56:19, Erik Corry wrote:
> Comment.

Done.

http://codereview.chromium.org/3850005/diff/1/18
File test/mjsunit/regexp.js (right):

http://codereview.chromium.org/3850005/diff/1/18#newcode594
test/mjsunit/regexp.js:594: // Check that end-anchored regexps are optimized
correctly.
Adding long-regexp/short-string and $ inside disjunction.
Other cases are already accounted for.

http://codereview.chromium.org/3850005/diff/1/18#newcode607
test/mjsunit/regexp.js:607: re.lastIndex = 5;
Start-index close to end is here.

http://codereview.chromium.org/3850005/diff/1/18#newcode612
test/mjsunit/regexp.js:612: var re = /^(?:a|bc)g$/g;
Anchored at both ends is here.

Powered by Google App Engine
This is Rietveld 408576698