Chromium Code Reviews

Issue 9965107: MIPS: Regexp: Improve the speed that we scan for an initial point (Closed)

Created: 8 years, 8 months ago by kalmard
Modified: 8 years, 8 months ago
CC: v8-dev
Visibility: Public

Description

MIPS: Regexp: Improve the speed that we scan for an initial point

Port r11204 (f20b1723).

Original commit message:

Regexp: Improve the speed that we scan for an initial point where a non-anchored regexp can match by using a Boyer-Moore-like table. This is done by identifying non-greedy non-capturing loops in the nodes that eat any character one at a time. For example in the middle of the regexp /foo[\s\S]*?bar/ we find such a loop. There is also such a loop implicitly inserted at the start of any non-anchored regexp.

When we have found such a loop we look ahead in the nodes to find the set of characters that can come at given distances. For example for the regexp /.?foo/ we know that there are at least 3 characters ahead of us, and the sets of characters that can occur are [any, [f, o], [o]]. We find a range in the lookahead info where the set of characters is reasonably constrained. In our example this is from index 1 to 2 (0 is not constrained). We can now look 3 characters ahead and if we don't find one of [f, o] (the union of [f, o] and [o]) then we can skip forwards by the range size (in this case 2).

For Unicode input strings we do the same, but modulo 128.

We also look at the first string fed to the regexp and use that to get a hint of the character frequencies in the inputs. This affects the assessment of whether the set of characters is 'reasonably constrained'.

We still have the old lookahead mechanism, which uses a wide load of multiple characters followed by a mask and compare to determine whether a match is possible at this point.

BUG=
TEST=
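
The skip logic described in the commit message can be illustrated with a small Python sketch. This is not the V8 implementation: the names `build_table`, `first_match`, `FOO`, and `LOOKAHEAD` are hypothetical, the lookahead sets are written by hand instead of being derived from regexp nodes, and Python's `re` module stands in for the real matcher. It only shows the idea for /.?foo/: probe the character at the end of the constrained range, skip by the range size when it cannot belong to any match, and index the table modulo 128 so non-ASCII input still works (at the cost of some false positives, which the full matcher rejects).

```python
import re


def build_table(lookahead, lo, hi):
    """Build a 128-entry table from the union of lookahead[lo..hi].

    Characters are folded modulo 128, so the table may report false
    positives for non-ASCII input, but never false negatives.
    """
    table = [False] * 128
    for chars in lookahead[lo:hi + 1]:
        for c in chars:
            table[ord(c) % 128] = True
    return table


def first_match(pattern, text, lookahead, lo, hi):
    """Return the leftmost start index of a match, or None.

    lookahead[d] is the set of characters allowed at distance d from a
    match start (None means unconstrained). lo..hi is the constrained
    range chosen by the caller and must not contain None entries.
    """
    table = build_table(lookahead, lo, hi)
    min_len = len(lookahead)  # every match is at least this long
    skip = hi - lo + 1        # the range size
    i = 0
    while i + min_len <= len(text):
        # Probe the character at the far end of the constrained range.
        if not table[ord(text[i + hi]) % 128]:
            # No match can start at i .. i + skip - 1, because each of
            # those starts would need this character to be in the union.
            i += skip
            continue
        # Candidate position: fall back to the full matcher.
        if pattern.match(text, i):
            return i
        i += 1
    return None


# Hand-written lookahead info for /.?foo/, as given in the description.
FOO = re.compile(r".?foo")
LOOKAHEAD = [None, {"f", "o"}, {"o"}]

print(first_match(FOO, "xxxxxxxxfoo", LOOKAHEAD, 1, 2))  # prints 7
```

In the example call the scan probes indices 2, 4, and 6, skipping two positions each time, before it reaches a character in [f, o] and hands over to the full matcher.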

Patch Set 1

Patch Set 2: Rebased on r11256.

M src/mips/regexp-macro-assembler-mips.cc (2 chunks, +4 lines, -2 lines, 0 comments)

Messages

Total messages: 4 (0 generated)
kalmard
8 years, 8 months ago (2012-04-03 14:09:07 UTC) #1
kalmard
Rebased on r11256.
8 years, 8 months ago (2012-04-10 15:57:35 UTC) #2
Erik Corry
LGTM landed as r11281 with some whitespace fixes.
8 years, 8 months ago (2012-04-12 07:46:03 UTC) #3
kalmard
Thanks for landing. Closing.
8 years, 8 months ago (2012-04-12 08:17:04 UTC) #4
