Description
[regexp] implement character classes for unicode regexps.
We divide character ranges into:
- BMP, matched normally.
- non-BMP, matched as alternatives of surrogate pair ranges.
- lone surrogates, matched with a lookaround assertion that it is indeed lone.
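
The three-way split above can be sketched as follows. This is a minimal illustration, not the code in this CL: the names Range, SplitRanges, Split, LeadSurrogate and TrailSurrogate are hypothetical, and the sketch only classifies one inclusive code point range and shows the standard UTF-16 surrogate arithmetic. How the patch builds the alternatives and the lookaround nodes is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not V8's identifiers.
struct Range {
  uint32_t from;  // inclusive
  uint32_t to;    // inclusive
};

struct SplitRanges {
  std::vector<Range> bmp;              // U+0000..U+FFFF without surrogates
  std::vector<Range> lone_surrogates;  // U+D800..U+DFFF
  std::vector<Range> non_bmp;          // U+10000..U+10FFFF
};

constexpr uint32_t kLeadSurrogateStart = 0xD800;
constexpr uint32_t kTrailSurrogateStart = 0xDC00;
constexpr uint32_t kTrailSurrogateEnd = 0xDFFF;
constexpr uint32_t kNonBmpStart = 0x10000;
constexpr uint32_t kMaxCodePoint = 0x10FFFF;

// Appends the intersection of r with [lo, hi] to out, if it is non-empty.
static void AddClipped(Range r, uint32_t lo, uint32_t hi,
                       std::vector<Range>* out) {
  uint32_t from = r.from > lo ? r.from : lo;
  uint32_t to = r.to < hi ? r.to : hi;
  if (from <= to) out->push_back({from, to});
}

// Splits one code point range into the three categories.
SplitRanges Split(Range r) {
  SplitRanges s;
  AddClipped(r, 0x0000, kLeadSurrogateStart - 1, &s.bmp);
  AddClipped(r, kTrailSurrogateEnd + 1, kNonBmpStart - 1, &s.bmp);
  AddClipped(r, kLeadSurrogateStart, kTrailSurrogateEnd, &s.lone_surrogates);
  AddClipped(r, kNonBmpStart, kMaxCodePoint, &s.non_bmp);
  return s;
}

// Standard UTF-16 encoding of a non-BMP code point as a surrogate pair.
uint16_t LeadSurrogate(uint32_t cp) {
  assert(cp >= kNonBmpStart && cp <= kMaxCodePoint);
  return static_cast<uint16_t>(kLeadSurrogateStart +
                               ((cp - kNonBmpStart) >> 10));
}

uint16_t TrailSurrogate(uint32_t cp) {
  assert(cp >= kNonBmpStart && cp <= kMaxCodePoint);
  return static_cast<uint16_t>(kTrailSurrogateStart +
                               ((cp - kNonBmpStart) & 0x3FF));
}
```

For example, Split({0x41, 0x1F600}) yields two BMP ranges (U+0041..U+D7FF and U+E000..U+FFFF), the full surrogate range, and the non-BMP range U+10000..U+1F600, whose upper bound encodes as the pair 0xD83D 0xDE00.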
R=erik.corry@gmail.com
BUG=v8:2952
LOG=N
Committed: https://crrev.com/ea820ad5fa282a323a86fe20e64f83ee67ba5f04
Cr-Commit-Position: refs/heads/master@{#33432}
Committed: https://crrev.com/e709aa24c0c17abf684972fbb9e887731b20fd41
Cr-Commit-Position: refs/heads/master@{#33437}
Patch Set 1
Patch Set 2 : simplify /./u a bit.
Patch Set 3 : use constants
Patch Set 4 : refactorings
Patch Set 5 : windows warning fix
Patch Set 6 : lookaround builder
Total comments: 18
Patch Set 7 : rebase
Patch Set 8 : addressed comments
Total comments: 4
Patch Set 9 : addressed comments on the test case.
Patch Set 10 : test cases
Patch Set 11 : rebase
Total comments: 2
Patch Set 12 : fix
Patch Set 13 : more tests
Total messages: 54 (26 generated)