Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(385)

Issue 8188: Some new regexp infrastructure. (Closed)

Created:
12 years, 1 month ago by Christian Plesner Hansen
Modified:
9 years, 7 months ago
CC:
v8-dev
Visibility:
Public.

Description

Some new regexp infrastructure. Pretty rudimentary.

Patch Set 1 #

Unified diffs Side-by-side diffs Delta from patch set Stats (+1659 lines, -17 lines) Patch
M src/ast.h View 1 chunk +194 lines, -0 lines 11 comments Download
M src/ast.cc View 2 chunks +153 lines, -0 lines 6 comments Download
M src/flag-definitions.h View 1 chunk +2 lines, -0 lines 0 comments Download
M src/globals.h View 1 chunk +1 line, -0 lines 0 comments Download
M src/jsregexp.h View 1 chunk +13 lines, -0 lines 2 comments Download
M src/jsregexp.cc View 3 chunks +477 lines, -0 lines 9 comments Download
M src/list.h View 2 chunks +3 lines, -0 lines 0 comments Download
M src/parser.h View 1 chunk +2 lines, -0 lines 0 comments Download
M src/parser.cc View 4 chunks +567 lines, -0 lines 7 comments Download
M src/string-stream.h View 2 chunks +5 lines, -2 lines 0 comments Download
M src/string-stream.cc View 8 chunks +23 lines, -14 lines 0 comments Download
M test/cctest/SConscript View 1 chunk +1 line, -1 line 0 comments Download
A test/cctest/test-regexp.cc View 1 chunk +218 lines, -0 lines 0 comments Download

Messages

Total messages: 5 (0 generated)
Christian Plesner Hansen
12 years, 1 month ago (2008-10-27 10:25:28 UTC) #1
Lasse Reichstein
LGTM. Mainly style comments. http://codereview.chromium.org/8188/diff/1/2 File src/ast.cc (right): http://codereview.chromium.org/8188/diff/1/2#newcode197 Line 197: class RegExpUnparser: public RegExpVisitor ...
12 years, 1 month ago (2008-10-27 13:12:57 UTC) #2
Erik Corry
I only skimmed it. Looks good to me. http://codereview.chromium.org/8188/diff/1/3 File src/ast.h (right): http://codereview.chromium.org/8188/diff/1/3#newcode1184 Line 1184: ...
12 years, 1 month ago (2008-10-27 14:58:43 UTC) #3
Erik Corry
More comments not in the last group for some reason.
12 years, 1 month ago (2008-10-27 14:59:05 UTC) #4
Christian Plesner Hansen
12 years, 1 month ago (2008-10-27 18:57:02 UTC) #5
Thanks for all the comments.  I've fixed (almost) all of them and submitted the
code.

http://codereview.chromium.org/8188/diff/1/2
File src/ast.cc (right):

http://codereview.chromium.org/8188/diff/1/2#newcode197
Line 197: class RegExpUnparser: public RegExpVisitor {
I can't be bothered to write the full format but I've added a comment that says
that it prints the trees as sexps.  This code should be moved to the test file
later but I think it will be useful for us at this point to have it as part of
the library as well.

http://codereview.chromium.org/8188/diff/1/2#newcode253
Line 253: stream()->Add("^");
Nope, the style guide allows both forms.  "In general, curly braces are not
required for single-line statements, but they are allowed if you like them".  I
don't like them.

http://codereview.chromium.org/8188/diff/1/2#newcode257
Line 257: if (first) first = false;
Good point.

http://codereview.chromium.org/8188/diff/1/3
File src/ast.h (right):

http://codereview.chromium.org/8188/diff/1/3#newcode1184
Line 1184: // Regular expressions
I copied the banner style from parser.cc which doesn't use periods at the end so
I prefer to leave it as it is.

http://codereview.chromium.org/8188/diff/1/3#newcode1236
Line 1236: enum Type { START, END, BOUNDARY, NON_BOUNDARY };
Good point.

http://codereview.chromium.org/8188/diff/1/3#newcode1249
Line 1249: CharacterRange(uc32 from, uc32 to, bool is_special)
I've renamed it to is_character_class_.

http://codereview.chromium.org/8188/diff/1/3#newcode1271
Line 1271: unsigned from_ : 21;
Yes, and then when we unfold the character classes so we don't need the
is_character_class_ bit we can fit a CharacterRange in a single word.

http://codereview.chromium.org/8188/diff/1/3#newcode1332
Line 1332: class RegExpCapture: public RegExpTree {
The ast nodes that I don't yet convert into nodes are incomplete so once we
start actually using them we'll add the info.

http://codereview.chromium.org/8188/diff/1/6
File src/jsregexp.cc (right):

http://codereview.chromium.org/8188/diff/1/6#newcode954
Line 954: case '.':
This is the general '.'.  However, I don't expect this code to survive long so
it's probably not worth the bother to fix it.

http://codereview.chromium.org/8188/diff/1/6#newcode986
Line 986: break;
No, I've fixed it.

http://codereview.chromium.org/8188/diff/1/6#newcode1015
Line 1015: Vector<const uc32> input);
No.  It was getting late...

http://codereview.chromium.org/8188/diff/1/7
File src/jsregexp.h (right):

http://codereview.chromium.org/8188/diff/1/7#newcode131
Line 131: class RegExpEngine {
Yes.

http://codereview.chromium.org/8188/diff/1/9
File src/parser.cc (right):

http://codereview.chromium.org/8188/diff/1/9#newcode3271
Line 3271: return c == '|' || c == ')' || c == '\0';
Hmm.  I've replaced it with a special kEndMarker which is just a different name
for kBadChar.

http://codereview.chromium.org/8188/diff/1/9#newcode3595
Line 3595: // other implementations...
I think we're free to do whatever we want with \8 and \9 and I suggest we treat
them as '8' and '9'.  Otherwise we're probably stuck with the
octal/decimal-back-reference confusion.

By the way, you forgot to mention leading zeros.  The spec explicitly disallows
them but lo and behold:

  js> /\000011/.test("\x09")
  true

http://codereview.chromium.org/8188/diff/1/9#newcode3662
Line 3662: uc32 c = ParseCharacterEscape(CHECK_OK);
Right, I expect what we'll do is check for backreferences first in the atom
parser and then if we can verify that it isn't one fall through to here.

Powered by Google App Engine
This is Rietveld 408576698