|
|
Description[regexp] step back if starting unicode regexp within surrogate pair.
See https://github.com/tc39/ecma262/issues/128
R=erik.corry@gmail.com, littledan@chromium.org
BUG=v8:2952
LOG=N
Committed: https://crrev.com/3246d26b71d0c4d86705e3994b0c328c846a3fd4
Cr-Commit-Position: refs/heads/master@{#33488}
Patch Set 1 #Patch Set 2 : fix small issue #Patch Set 3 : check for step back only at start #Patch Set 4 : rebase #
Total comments: 2
Patch Set 5 : rebase to ToT #
Total comments: 16
Patch Set 6 : addressed comments #Patch Set 7 : land #Patch Set 8 : fix #
Messages
Total messages: 26 (12 generated)
On 2016/01/19 11:13:24, Yang wrote: This depends on https://codereview.chromium.org/1578253005/
Erik, could you review this next? Thanks.
The CQ bit was checked by yangguo@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1608693003/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1608693003/80001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: CLs for remote refs other than refs/pending/heads/master must contain NOTRY=true and NOPRESUBMIT=true in order for the CQ to process them
erikcorry@google.com changed reviewers: + erikcorry@google.com
lgtm https://codereview.chromium.org/1608693003/diff/60001/src/regexp/jsregexp.cc File src/regexp/jsregexp.cc (left): https://codereview.chromium.org/1608693003/diff/60001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5035: Inadvertent edit? https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc File src/regexp/jsregexp.cc (right): https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5051: // Reading forward. Forwrad match the lead surrogate and assert that Forwrad -> Forward Also, this ternary expression is too big, and would be better as an 'if' IMHO https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5052: // no Unfortunate line break. (Autoformatter doesn't really understand block comments. If you are using vim then gq is smarter.) https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5132: AddUnanchoredAdvance(compiler, result, on_success); Is this better than what the other half of the 'if' would add? https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5140: return result; I think it's worth checking if the choice node has only one alternative, and returning that instead in that case. Eg. \d or [,.] will only have one alternative, and we will get better code I think (no flushing). https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:6545: RegExpNode* OptionallyStepBackToLeadSurrogate(RegExpCompiler* compiler, Going backwards is some standards-mandated strangeness for sure. Wil this always make progress on global matches? Eg if I search for /()/ with /g and I start in the middle of a pair? https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:6559: int position_register = compiler->UnicodeLookaroundPositionRegister(); How does this actually do the backwards step? Some cleverness with the position register? https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:6638: node = OptionallyStepBackToLeadSurrogate(&compiler, node); Do we need this for non-sticky non-global regexps? Don't they always start at the beginning? https://codereview.chromium.org/1608693003/diff/80001/test/mjsunit/harmony/un... File test/mjsunit/harmony/unicode-regexp-last-index.js (right): https://codereview.chromium.org/1608693003/diff/80001/test/mjsunit/harmony/un... test/mjsunit/harmony/unicode-regexp-last-index.js:7: var r = /./ug; All these tests should be duplicated without the /u flag, to ensure that non-u behaviour did not change.
https://codereview.chromium.org/1608693003/diff/60001/src/regexp/jsregexp.cc File src/regexp/jsregexp.cc (left): https://codereview.chromium.org/1608693003/diff/60001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5035: On 2016/01/22 10:10:10, erikcorry wrote: > Inadvertent edit? Undone. https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc File src/regexp/jsregexp.cc (right): https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5051: // Reading forward. Forwrad match the lead surrogate and assert that On 2016/01/22 10:10:10, erikcorry wrote: > Forwrad -> Forward > > Also, this ternary expression is too big, and would be better as an 'if' IMHO Done. https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5052: // no On 2016/01/22 10:10:10, erikcorry wrote: > Unfortunate line break. (Autoformatter doesn't really understand block > comments. If you are using vim then gq is smarter.) Done. And thanks for the vim tip :) https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5132: AddUnanchoredAdvance(compiler, result, on_success); On 2016/01/22 10:10:10, erikcorry wrote: > Is this better than what the other half of the 'if' would add? This adds [\0-\ud7ff\uc000\uffff]|[\ud800-\udbff][\udc00-\udfff]? instead of [\0-\ud7ff\ue000\uffff]|[\ud800-\udbff][\udc00-\udfff]|(?!<[\ud800-\udbff])[\udc00-\udfff]|[\ud800-\udbff](?![\udc00-\udfff]) Or [BMP+T]|TL? instead of [BMP]|TL|(?!<T)L|T(?!L) So I think this is a win. https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:5140: return result; On 2016/01/22 10:10:10, erikcorry wrote: > I think it's worth checking if the choice node has only one alternative, and > returning that instead in that case. Eg. \d or [,.] will only have one > alternative, and we will get better code I think (no flushing). Done in ChoiceNode::Emit. https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:6545: RegExpNode* OptionallyStepBackToLeadSurrogate(RegExpCompiler* compiler, On 2016/01/22 10:10:10, erikcorry wrote: > Going backwards is some standards-mandated strangeness for sure. Wil this > always make progress on global matches? Eg if I search for /()/ with /g and I > start in the middle of a pair? You are right. I have to figure out this one. In case of /()/g we generally have to step to the next code point, so the current implementation of stepping just a single character is incorrect to begin with. I'll work on this in a separate CL. https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:6559: int position_register = compiler->UnicodeLookaroundPositionRegister(); On 2016/01/22 10:10:10, erikcorry wrote: > How does this actually do the backwards step? Some cleverness with the position > register? The TextNode for lead surrogates below consumes one character backward. https://codereview.chromium.org/1608693003/diff/80001/src/regexp/jsregexp.cc#... src/regexp/jsregexp.cc:6638: node = OptionallyStepBackToLeadSurrogate(&compiler, node); On 2016/01/22 10:10:10, erikcorry wrote: > Do we need this for non-sticky non-global regexps? Don't they always start at > the beginning? Done. https://codereview.chromium.org/1608693003/diff/80001/test/mjsunit/harmony/un... File test/mjsunit/harmony/unicode-regexp-last-index.js (right): https://codereview.chromium.org/1608693003/diff/80001/test/mjsunit/harmony/un... test/mjsunit/harmony/unicode-regexp-last-index.js:7: var r = /./ug; On 2016/01/22 10:10:10, erikcorry wrote: > All these tests should be duplicated without the /u flag, to ensure that non-u > behaviour did not change. Done.
The CQ bit was checked by yangguo@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from erikcorry@google.com Link to the patchset: https://codereview.chromium.org/1608693003/#ps100001 (title: "addressed comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1608693003/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1608693003/100001
The CQ bit was unchecked by commit-bot@chromium.org
CLs for remote refs other than refs/pending/heads/master must contain NOTRY=true and NOPRESUBMIT=true in order for the CQ to process them
The CQ bit was checked by yangguo@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from erikcorry@google.com Link to the patchset: https://codereview.chromium.org/1608693003/#ps120001 (title: "land")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1608693003/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1608693003/120001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: v8_linux_arm64_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_arm64_rel/builds/1...) v8_linux_nodcheck_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_nodcheck_rel/build...)
The CQ bit was checked by yangguo@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from erikcorry@google.com Link to the patchset: https://codereview.chromium.org/1608693003/#ps140001 (title: "git cl set_commit")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1608693003/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1608693003/140001
Message was sent while issue was closed.
Committed patchset #8 (id:140001)
Message was sent while issue was closed.
Description was changed from ========== [regexp] step back if starting unicode regexp within surrogate pair. See https://github.com/tc39/ecma262/issues/128 R=erik.corry@gmail.com, littledan@chromium.org BUG=v8:2952 LOG=N ========== to ========== [regexp] step back if starting unicode regexp within surrogate pair. See https://github.com/tc39/ecma262/issues/128 R=erik.corry@gmail.com, littledan@chromium.org BUG=v8:2952 LOG=N Committed: https://crrev.com/3246d26b71d0c4d86705e3994b0c328c846a3fd4 Cr-Commit-Position: refs/heads/master@{#33488} ==========
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/3246d26b71d0c4d86705e3994b0c328c846a3fd4 Cr-Commit-Position: refs/heads/master@{#33488} |