|
|
Description[regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}.
R=littledan@chromium.org, mathias@qiwi.be
BUG=v8:4743
Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4
Committed: https://crrev.com/a8e88eaab8cdde31bf7972ad94c03e31eb4a3d7d
Cr-Original-Commit-Position: refs/heads/master@{#36969}
Cr-Commit-Position: refs/heads/master@{#36974}
Patch Set 1 #
Total comments: 4
Patch Set 2 : tests #Patch Set 3 : fix wrong expectations #Patch Set 4 : Fix no-i18n build. #
Messages
Total messages: 27 (11 generated)
lgtm https://codereview.chromium.org/2059113002/diff/1/src/regexp/regexp-parser.cc File src/regexp/regexp-parser.cc (right): https://codereview.chromium.org/2059113002/diff/1/src/regexp/regexp-parser.cc... src/regexp/regexp-parser.cc:365: if (!ParsePropertyClass(ranges, p == 'P')) { I'm a fan of this refactoring to match the spec better. Do we have any tests that get at what something like /\P{Lu}/ui matches (presumably, all code points)? I can't find any.
On 2016/06/13 at 20:23:15, Dan Ehrenberg wrote: > lgtm > > https://codereview.chromium.org/2059113002/diff/1/src/regexp/regexp-parser.cc > File src/regexp/regexp-parser.cc (right): > > https://codereview.chromium.org/2059113002/diff/1/src/regexp/regexp-parser.cc... > src/regexp/regexp-parser.cc:365: if (!ParsePropertyClass(ranges, p == 'P')) { > I'm a fan of this refactoring to match the spec better. Do we have any tests that get at what something like /\P{Lu}/ui matches (presumably, all code points)? I can't find any. Or, sorry, probably all cased code points.
LGTM https://codereview.chromium.org/2059113002/diff/1/src/regexp/regexp-parser.cc File src/regexp/regexp-parser.cc (right): https://codereview.chromium.org/2059113002/diff/1/src/regexp/regexp-parser.cc... src/regexp/regexp-parser.cc:365: if (!ParsePropertyClass(ranges, p == 'P')) { On 2016/06/13 20:23:15, Dan Ehrenberg wrote: > Do we have any tests that get at what something like /\P{Lu}/ui matches > (presumably, all code points)? It wouldn’t match all code points, since non-Letter symbols exist. Here’s a string in which any symbol would be matched: "\0@\xFF\u1E85\u2CB7\uA767\u{1D3FF}" Here’s a string in which no symbol would be matched: "\u0130\u2102\u212C\u2145\u{1D400}\u{1D7CA}" The following program prints the code points that shouldn’t be matched: https://mothereff.in/regexpu#input=const+regex+%3D+/%5CP%7BLu%7D/ui%3B%0Afor+... https://codereview.chromium.org/2059113002/diff/1/test/mjsunit/harmony/regexp... File test/mjsunit/harmony/regexp-property-special.js (right): https://codereview.chromium.org/2059113002/diff/1/test/mjsunit/harmony/regexp... test/mjsunit/harmony/regexp-property-special.js:42: f(/\P{Assigned}+/u, "🄰🄱🄲"); Maybe test the empty string as well? f(/\P{Assigned}/u, "");
On 2016/06/13 20:43:58, mathias wrote: > https://codereview.chromium.org/2059113002/diff/1/src/regexp/regexp-parser.cc... > src/regexp/regexp-parser.cc:365: if (!ParsePropertyClass(ranges, p == 'P')) { > On 2016/06/13 20:23:15, Dan Ehrenberg wrote: > > Do we have any tests that get at what something like /\P{Lu}/ui matches > > (presumably, all code points)? > > It wouldn’t match all code points, since non-Letter symbols exist. Sorry, that last part is incorrect. It should’ve said “since Lu symbols exist to which no non-Lu symbol case-folds using the *simple* mapping”.
I addressed comments and added a test case for /\P{Lu}/ui, comparing the result to code produced by regexpu. I hope including regexpu-generated code is alright. I found this one curious thing: If regexpu transpiles with the "use ES6 u flag in output" flag, the result matches \u{3f4}. If transpiled without the ES6 u flag, the result does not match \u{3f4}. Now that could be a bug in V8, but this can also be reproduced on Firefox. U+03F4 in UnicodeData.txt 03F4;GREEK CAPITAL THETA SYMBOL;Lu;0;L;<compat> 0398;;;;N;;;;03B8; and in CaseFolding.txt 03F4; C; 03B8; # GREEK CAPITAL THETA SYMBOL U+03B8 in UnicodeData.txt 03B8;GREEK SMALL LETTER THETA;Ll;0;L;;;;;N;;;0398;;0398 So let me see if I understand this correctly: /\P{Lu}/ includes U+03B8, since its general category is 'Ll'. As per canonicalization in unicode, we apply unicode case folding to the input and the regexp, so we should get a match. Is there maybe a bug in regexpu if the "use ES6 u flag in output" flag is disabled? https://codereview.chromium.org/2059113002/diff/1/test/mjsunit/harmony/regexp... File test/mjsunit/harmony/regexp-property-special.js (right): https://codereview.chromium.org/2059113002/diff/1/test/mjsunit/harmony/regexp... test/mjsunit/harmony/regexp-property-special.js:42: f(/\P{Assigned}+/u, "🄰🄱🄲"); On 2016/06/13 20:43:58, mathias wrote: > Maybe test the empty string as well? > > f(/\P{Assigned}/u, ""); Done.
The CQ bit was checked by yangguo@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from littledan@chromium.org, mathias@qiwi.be Link to the patchset: https://codereview.chromium.org/2059113002/#ps20001 (title: "tests")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2059113002/20001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: v8_linux_nodcheck_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_nodcheck_rel_ng/bu...) v8_linux_nodcheck_rel_ng_triggered on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_nodcheck_rel_ng_tr...)
On 2016/06/14 09:44:12, Yang wrote: > I addressed comments and added a test case for /\P{Lu}/ui, comparing the result > to code produced by regexpu. I hope including regexpu-generated code is alright. Of course — it’s just a regular expression :) > I found this one curious thing: […] Good catch! I’ve filed it and will follow up on it here: https://github.com/mathiasbynens/regexpu-core/issues/7 Thanks for the report.
The CQ bit was checked by yangguo@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from littledan@chromium.org, mathias@qiwi.be Link to the patchset: https://codereview.chromium.org/2059113002/#ps40001 (title: "fix wrong expectations")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2059113002/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
CQ bit was unchecked
Message was sent while issue was closed.
Description was changed from ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 ========== to ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Cr-Commit-Position: refs/heads/master@{#36969} ==========
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Cr-Commit-Position: refs/heads/master@{#36969}
Message was sent while issue was closed.
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.chromium.org/2065083002/ by yangguo@chromium.org. The reason for reverting is: compile failure.
Message was sent while issue was closed.
Description was changed from ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Cr-Commit-Position: refs/heads/master@{#36969} ========== to ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Cr-Commit-Position: refs/heads/master@{#36969} ==========
The CQ bit was checked by yangguo@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from littledan@chromium.org, mathias@qiwi.be Link to the patchset: https://codereview.chromium.org/2059113002/#ps60001 (title: "Fix no-i18n build.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2059113002/60001
Message was sent while issue was closed.
Description was changed from ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Cr-Commit-Position: refs/heads/master@{#36969} ========== to ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Cr-Commit-Position: refs/heads/master@{#36969} ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Description was changed from ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Cr-Commit-Position: refs/heads/master@{#36969} ========== to ========== [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. R=littledan@chromium.org, mathias@qiwi.be BUG=v8:4743 Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4 Committed: https://crrev.com/a8e88eaab8cdde31bf7972ad94c03e31eb4a3d7d Cr-Original-Commit-Position: refs/heads/master@{#36969} Cr-Commit-Position: refs/heads/master@{#36974} ==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/a8e88eaab8cdde31bf7972ad94c03e31eb4a3d7d Cr-Commit-Position: refs/heads/master@{#36974} |