Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(486)

Issue 8871: Experimental RegExp: changed handling of non-standard escape sequences. (Closed)

Created:
12 years, 1 month ago by Lasse Reichstein
Modified:
9 years, 6 months ago
CC:
v8-dev
Visibility:
Public.

Description

* Modified RegExp parser to handle bad \c, \x, \u and decimal escapes gracefully. if the escape sequence is not valid, the \c, \x or \u are treated as identity escapes (i.e., "c", "x", or "u"). Decimal escapes that are larger than the *current* number of left capture parentheses are treated as 1..3 digit octal numbers, and \8 and \9 are treated as identity escapes. * Added multiline_flag to regexp parser. Please ignore first two patch-sets. Third time is the charm.

Patch Set 1 #

Patch Set 2 : RegExp handling of escapes. #

Patch Set 3 : RegExp escape handling #

Patch Set 4 : RegExp escape handling, with review comments. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+216 lines, -146 lines) Patch
M regexp2000/src/parser.cc View 1 2 3 14 chunks +127 lines, -64 lines 0 comments Download
M regexp2000/test/cctest/test-regexp.cc View 1 2 3 chunks +89 lines, -82 lines 0 comments Download

Messages

Total messages: 1 (0 generated)
Christian Plesner Hansen
12 years, 1 month ago (2008-10-29 12:45:46 UTC) #1
Lgtm, with comments.

http://codereview.chromium.org/8871/diff/207/7
File regexp2000/src/parser.cc (right):

http://codereview.chromium.org/8871/diff/207/7#newcode3639
Line 3639: ASSERT('a'^'A' = 0x20);  // Upper and lower case letters differ by
one bit.
I think you mean double equals.

In any case I think this could be a STATIC_CHECK instead.

http://codereview.chromium.org/8871/diff/207/7#newcode3657
Line 3657: if (has_more() && '0' <= current() && current() <= '7') {
You don't have to explicitly check for has_more() since current() returns
kBadChar when you reach the end, which is not a digit.  I use this to save
has_more() calls all over the place.

http://codereview.chromium.org/8871/diff/207/7#newcode3669
Line 3669: if (!has_more()) {
This check is redundant, the same will happen if you just let it fall through to
line 3680.  There is very rarely a need to explicitly check for the end of the
input explicitly.

http://codereview.chromium.org/8871/diff/207/7#newcode3677
Line 3677: for (int i = 0;;) {
Please separate this into a declaration and a while (true).  Alternatively,
couldn't this be an ordinary for loop if you remove the i++ and condition the
last three lines on (i < length - 1).

http://codereview.chromium.org/8871/diff/207/7#newcode3681
Line 3681: for (int i = chars_seen_count - 1; i >= 0; i--) {
You're reusing the name 'i', that's a Bad Thing(TM).

http://codereview.chromium.org/8871/diff/207/7#newcode3750
Line 3750: // If \u is not followed by a two-digit hexadecimal, treat it
two -> four

Powered by Google App Engine
This is Rietveld 408576698