Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(238)

Issue 507051: Attempt to make \b\w+ faster. Slight performance increase on, e.g., string unpacking. (Closed)

Created:
11 years ago by Lasse Reichstein
Modified:
9 years, 6 months ago
Reviewers:
Erik Corry
CC:
v8-dev
Visibility:
Public.

Description

Attempt to make \b\w+ faster. Slight performance increase on, e.g., string unpacking.

Patch Set 1 #

Total comments: 2

Patch Set 2 : Removed preloading change. Didn't do nothing. #

Patch Set 3 : Implement single character lookahead for \b #

Total comments: 30

Patch Set 4 : Addressed review comments. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+1102 lines, -182 lines) Patch
M src/arm/regexp-macro-assembler-arm.h View 1 2 3 1 chunk +0 lines, -2 lines 0 comments Download
M src/arm/regexp-macro-assembler-arm.cc View 1 2 3 5 chunks +62 lines, -31 lines 0 comments Download
M src/ast.h View 1 2 3 3 chunks +12 lines, -8 lines 0 comments Download
M src/flag-definitions.h View 1 chunk +3 lines, -0 lines 0 comments Download
M src/heap.cc View 1 2 3 1 chunk +1 line, -1 line 0 comments Download
M src/ia32/regexp-macro-assembler-ia32.h View 1 2 3 1 chunk +1 line, -4 lines 0 comments Download
M src/ia32/regexp-macro-assembler-ia32.cc View 1 2 3 3 chunks +82 lines, -43 lines 0 comments Download
M src/jsregexp.h View 1 2 3 14 chunks +122 lines, -8 lines 0 comments Download
M src/jsregexp.cc View 1 2 3 13 chunks +659 lines, -18 lines 0 comments Download
M src/parser.cc View 1 2 3 4 chunks +12 lines, -6 lines 0 comments Download
M src/regexp-macro-assembler.h View 1 2 3 1 chunk +0 lines, -2 lines 0 comments Download
M src/regexp-macro-assembler-tracer.h View 1 2 3 1 chunk +0 lines, -2 lines 0 comments Download
M src/regexp-macro-assembler-tracer.cc View 1 2 3 1 chunk +1 line, -8 lines 0 comments Download
M src/x64/regexp-macro-assembler-x64.h View 1 2 3 1 chunk +0 lines, -2 lines 0 comments Download
M src/x64/regexp-macro-assembler-x64.cc View 1 2 3 3 chunks +85 lines, -46 lines 0 comments Download
M test/cctest/test-regexp.cc View 1 chunk +62 lines, -1 line 0 comments Download

Messages

Total messages: 4 (0 generated)
Lasse Reichstein
Early warning review. I'll be removing some of the changes again. http://codereview.chromium.org/507051/diff/1/4 File src/ast.h (right): ...
11 years ago (2009-12-18 14:26:28 UTC) #1
Lasse Reichstein
Please do review now.
11 years ago (2009-12-18 15:51:48 UTC) #2
Erik Corry
I don't see the test that uses the posessive syntax. LGTM http://codereview.chromium.org/507051/diff/5001/6001 File src/arm/regexp-macro-assembler-arm.cc (right): ...
10 years, 11 months ago (2010-01-04 12:56:40 UTC) #3
Lasse Reichstein
10 years, 11 months ago (2010-01-06 17:00:49 UTC) #4
Addressed comments.
Added support for word/non-word matching on ARM. Please check these.
Still needs to add test for possessive syntax.

http://codereview.chromium.org/507051/diff/5001/6001
File src/arm/regexp-macro-assembler-arm.cc (right):

http://codereview.chromium.org/507051/diff/5001/6001#newcode473
src/arm/regexp-macro-assembler-arm.cc:473: if (!preloaded) {
Sadly not. We should only preload if we support the character class. That would
mean first switching on the type to see if we should preload, and then switching
again to do the actual matching - which means duplicating the selection logic
and keeping the two duplicates in sync.

Ok, I rewrote it completely, moving the loading out of the check function and
relying on the caller to load the character. Much better that way, and more
consistent with other check functions.

http://codereview.chromium.org/507051/diff/5001/6006
File src/ia32/regexp-macro-assembler-ia32.cc (right):

http://codereview.chromium.org/507051/diff/5001/6006#newcode647
src/ia32/regexp-macro-assembler-ia32.cc:647: __
sub(Operand(current_character()), Immediate(0x0b));
It might be a coincidence, but we never use the loaded character after checking
for a character class.
But it's unnecessary to destroy it. I'll change the sub to a lea.

http://codereview.chromium.org/507051/diff/5001/6008
File src/jsregexp.cc (right):

http://codereview.chromium.org/507051/diff/5001/6008#newcode1630
src/jsregexp.cc:1630: int NegativeLookaheadChoiceNode::EatsAtLeast(int
still_to_find,
Fixed.

http://codereview.chromium.org/507051/diff/5001/6008#newcode2056
src/jsregexp.cc:2056: true,
Added.

http://codereview.chromium.org/507051/diff/5001/6008#newcode2134
src/jsregexp.cc:2134: assembler->GoTo(on_non_word);
Hence the "if(expect_word_character) {" above :)

http://codereview.chromium.org/507051/diff/5001/6008#newcode4101
src/jsregexp.cc:4101: SetRelation CharacterRange::WordCharacterRelation(

I could, but Merge needs to actually build the three sets (first only,
intersection, second only), while this function only determines whether there
would be something in the sets.

If something should be reused, it would probably be by abstracting the actions
of Merge (using either templates or an accumulator object with virtual methods)
so that it didn't need to build the result, but could take any action for each
range it outputs. I don't think it's much simpler, though.

http://codereview.chromium.org/507051/diff/5001/6008#newcode4416
src/jsregexp.cc:4416: kInsideFirst = 1,
The meanings are different. This enum represents the location of a single
element (in the four areas of a Venn diagram). SetRelation represents how two
sets relate, i.e., which areas of the Venn diagram for the two sets contain
elements.

If there is any relation, it would be that SetRelation::kInFirst = 1 <<
kInsideFirst (etc).
I'll move this enum to the header file and implement the SetRelation enum in
terms of this enum.

http://codereview.chromium.org/507051/diff/5001/6008#newcode4420
src/jsregexp.cc:4420: 
added

http://codereview.chromium.org/507051/diff/5001/6008#newcode4421
src/jsregexp.cc:4421: // Utility function from CharacterRange::Merge. Adds a
range at the end of
Fixed

http://codereview.chromium.org/507051/diff/5001/6008#newcode4485
src/jsregexp.cc:4485: kInsideFirst = 1,
One of the ones in this file is superflous. I'll retain the on at top level only
and move it to the .h file.

http://codereview.chromium.org/507051/diff/5001/6008#newcode5001
src/jsregexp.cc:5001: case AFTER_NONWORD_CHARACTER:
Not precisely enough. If the ComputeFirstCharacterSet result is used for
anything else than detecting boundaries (and I want to use it for detecting
possessive loops too), we need to know the exact first character set, not just
that it's a subset of word characters or subset of non-word characters.

Also, the meaning of AFTER_NONWORD_CHARACTER does not assume anything of the
next character - even though we can currently only introduce it by detecting a
following word character after a boundaray assertion.

http://codereview.chromium.org/507051/diff/5001/6008#newcode5040
src/jsregexp.cc:5040: if (text.type == TextElement::ATOM) {
If I understand it correctly, we convert ranges and atoms to include their
case-equivalent characters when we create the Node from the AST. The case
independence flag shouldn't matter after this.

http://codereview.chromium.org/507051/diff/5001/6009
File src/jsregexp.h (right):

http://codereview.chromium.org/507051/diff/5001/6009#newcode839
src/jsregexp.h:839: // Non-default types. Used for modifying a boundary node if
its surrounding
Possibly. I didn't really like either.
It's a type that doesn't correspond to one directly expressible in the syntax.
Perhaps I shouldn't try to find a short way to say that. I'll elaborate.

http://codereview.chromium.org/507051/diff/5001/6009#newcode841
src/jsregexp.h:841: AFTER_NONWORD_CHARACTER,
The name is correct for what it's used for. We actually reduce the AT_BOUNDARY
to AFTER_NONWORD_CHARACTER and only check that the previous character is a
non-word character. We have statically determined that the next character must
be a word character, otherwise we will fail anyway.

If it was named NONWORD_WORD_BOUNDARY we would expect to need to check both the
previous and next characters.

Powered by Google App Engine
This is Rietveld 408576698