Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(12)

Issue 21126: Fix some glitches around unicode regexps matching ascii subject strings. (Closed)

Created:
11 years, 10 months ago by Erik Corry
Modified:
9 years, 7 months ago
Reviewers:
Lasse Reichstein
CC:
v8-dev
Visibility:
Public.

Description

Fix some glitches around unicode regexps matching ascii subject strings. This has been fixed in other changes now.

Patch Set 1 #

Total comments: 6
Unified diffs Side-by-side diffs Delta from patch set Stats (+86 lines, -12 lines) Patch
M src/jsregexp.cc View 7 chunks +81 lines, -12 lines 4 comments Download
M src/objects.h View 1 chunk +1 line, -0 lines 1 comment Download
M test/mjsunit/regexp-UC16.js View 1 chunk +4 lines, -0 lines 1 comment Download

Messages

Total messages: 2 (0 generated)
Erik Corry
11 years, 10 months ago (2009-02-06 11:42:34 UTC) #1
Lasse Reichstein
11 years, 10 months ago (2009-02-06 15:50:56 UTC) #2
LGTM with comments

http://codereview.chromium.org/21126/diff/1/4
File src/jsregexp.cc (right):

http://codereview.chromium.org/21126/diff/1/4#newcode1763
Line 1763: }
If uncanonicalize implements Canonicalize^-1 o Canonicalize (where Canonicalize
is according to the specification), then the last loop is not necessary. If
"character" is ASCII, so is every letter in "letters", and if "character" is not
ASCII, neither are any letter in "letters". I.e., the loop can be replaced by:
if (character <= String::kMaxAsciiCharCode) { 
  return length;
} 
return 0;

http://codereview.chromium.org/21126/diff/1/4#newcode1785
Line 1785: if (length == 1) {
What about the length == 0 case? 
Could we emit a fail or backtrack immediately in that case?
Is is that handled by the quick check details so we never get here?

http://codereview.chromium.org/21126/diff/1/4#newcode1786
Line 1786: if (ascii && c > String::kMaxAsciiCharCodeU) {
Wasn't this checked in line 1747, so length will always be 0 if this test would
be successful?

http://codereview.chromium.org/21126/diff/1/4#newcode2333
Line 2333: // succeed.
The comment is a little too pessimistic. A negated character class is completely
equivalent to a non-negated one with at most one more range in its
representation. If nothing else, we could canonicalize character classes to only
sorted non-overlapping positive ranges.

http://codereview.chromium.org/21126/diff/1/3
File src/objects.h (right):

http://codereview.chromium.org/21126/diff/1/3#newcode3239
Line 3239: static const unsigned kMaxAsciiCharCodeU =
unibrow::Utf8::kMaxOneByteChar;
This constant is just an alias for a Unibrow constant. It might be better to use
a unibrow constant directly, since all it's used for is to compare to a
unibrow::uchar.

http://codereview.chromium.org/21126/diff/1/2
File test/mjsunit/regexp-UC16.js (right):

http://codereview.chromium.org/21126/diff/1/2#newcode45
Line 45: assertFalse(/\xc1/i.test('fooA'));
Please give the tests a name (if nothing else, a string representation of the
code that is executed).

Powered by Google App Engine
This is Rietveld 408576698