|
|
Description[regexp] Support unicode capture names in non-unicode patterns
This ensures that capture names containing surrogate pairs are parsed
correctly even in non-unicode RegExp patterns by introducing a new
scanning mode which unconditionally combines surrogate pairs.
BUG=v8:5437, v8:6192
Review-Url: https://codereview.chromium.org/2791163003
Cr-Commit-Position: refs/heads/master@{#44466}
Committed: https://chromium.googlesource.com/v8/v8/+/a8651c5671dd6e41595ffb438e7ea013082f2d38
Patch Set 1 #Patch Set 2 : Update test #
Total comments: 5
Patch Set 3 : Remove template parameter from ReadNext #
Messages
Total messages: 26 (15 generated)
The CQ bit was checked by jgruber@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by jgruber@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
jgruber@chromium.org changed reviewers: + yangguo@chromium.org
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux_arm_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_arm_rel_ng/builds/...) v8_linux_arm_rel_ng_triggered on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_arm_rel_ng_trigger...)
https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... File src/regexp/regexp-parser.cc (left): https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... src/regexp/regexp-parser.cc:332: Advance(); How come we can remove this Advance here and below? https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... File src/regexp/regexp-parser.cc (right): https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... src/regexp/regexp-parser.cc:49: template <bool update_position> I wonder whether it makes sense to make the scan mode a template param. Or alternatively make this bool a normal parameter. I sort of doubt that this has any performance implications.
https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... File src/regexp/regexp-parser.cc (left): https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... src/regexp/regexp-parser.cc:332: Advance(); On 2017/04/05 12:52:38, Yang wrote: > How come we can remove this Advance here and below? The advance is now done in ParseCaptureGroupName. That was necessary since advancing from the '<' already needs to use the FORCE_COMBINE_SURROGATE_PAIRS mode. https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... File src/regexp/regexp-parser.cc (right): https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... src/regexp/regexp-parser.cc:49: template <bool update_position> On 2017/04/05 12:52:38, Yang wrote: > I wonder whether it makes sense to make the scan mode a template param. Or > alternatively make this bool a normal parameter. I sort of doubt that this has > any performance implications. Agreed, will do that.
On 2017/04/05 12:57:31, jgruber wrote: > https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... > File src/regexp/regexp-parser.cc (left): > > https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... > src/regexp/regexp-parser.cc:332: Advance(); > On 2017/04/05 12:52:38, Yang wrote: > > How come we can remove this Advance here and below? > > The advance is now done in ParseCaptureGroupName. That was necessary since > advancing from the '<' already needs to use the FORCE_COMBINE_SURROGATE_PAIRS > mode. > > https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... > File src/regexp/regexp-parser.cc (right): > > https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... > src/regexp/regexp-parser.cc:49: template <bool update_position> > On 2017/04/05 12:52:38, Yang wrote: > > I wonder whether it makes sense to make the scan mode a template param. Or > > alternatively make this bool a normal parameter. I sort of doubt that this has > > any performance implications. > > Agreed, will do that. lgtm.
The CQ bit was checked by jgruber@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... File src/regexp/regexp-parser.cc (right): https://codereview.chromium.org/2791163003/diff/20001/src/regexp/regexp-parse... src/regexp/regexp-parser.cc:49: template <bool update_position> On 2017/04/05 12:57:30, jgruber wrote: > On 2017/04/05 12:52:38, Yang wrote: > > I wonder whether it makes sense to make the scan mode a template param. Or > > alternatively make this bool a normal parameter. I sort of doubt that this has > > any performance implications. > > Agreed, will do that. Done.
The CQ bit was unchecked by jgruber@chromium.org
The CQ bit was checked by jgruber@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from yangguo@chromium.org Link to the patchset: https://codereview.chromium.org/2791163003/#ps40001 (title: "Remove template parameter from ReadNext")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1491550079721040, "parent_rev": "57bef9a1e2621555f70b9258593ae4a4235307ef", "commit_rev": "a8651c5671dd6e41595ffb438e7ea013082f2d38"}
Message was sent while issue was closed.
Description was changed from ========== [regexp] Support unicode capture names in non-unicode patterns This ensures that capture names containing surrogate pairs are parsed correctly even in non-unicode RegExp patterns by introducing a new scanning mode which unconditionally combines surrogate pairs. BUG=v8:5437,v8:6192 ========== to ========== [regexp] Support unicode capture names in non-unicode patterns This ensures that capture names containing surrogate pairs are parsed correctly even in non-unicode RegExp patterns by introducing a new scanning mode which unconditionally combines surrogate pairs. BUG=v8:5437,v8:6192 Review-Url: https://codereview.chromium.org/2791163003 Cr-Commit-Position: refs/heads/master@{#44466} Committed: https://chromium.googlesource.com/v8/v8/+/a8651c5671dd6e41595ffb438e7ea013082... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/v8/v8/+/a8651c5671dd6e41595ffb438e7ea013082...
Message was sent while issue was closed.
littledan@chromium.org changed reviewers: + littledan@chromium.org
Message was sent while issue was closed.
In https://github.com/tc39/test262/pull/998#discussion_r113419975 , anba noticed that the named capture specification text actually doesn't support the feature that was implemented in this patch. It's awkward to specify for the same reason that it was funny to implement here--the spec doesn't go through different modes about whether things are Unicode or non-Unicode. I'm sorry for creating this extra work by suggesting the other resolution on this question before, but maybe we should leave out surrogate pairs. In the end, the test262 tests checked in against the current spec look just like the test preceding this patch. What would you think of simply reverting this patch?
Message was sent while issue was closed.
On 2017/05/03 19:14:20, Dan Ehrenberg wrote: > In https://github.com/tc39/test262/pull/998#discussion_r113419975 , anba noticed > that the named capture specification text actually doesn't support the feature > that was implemented in this patch. It's awkward to specify for the same reason > that it was funny to implement here--the spec doesn't go through different modes > about whether things are Unicode or non-Unicode. Yes, I saw that come up on the test262 CL. Makes sense, and it's nice to be able to simplify behavior here again. > I'm sorry for creating this extra work by suggesting the other resolution on > this question before, but maybe we should leave out surrogate pairs. In the end, > the test262 tests checked in against the current spec look just like the test > preceding this patch. > > What would you think of simply reverting this patch? Sounds good. Do you know of any other V8 failures on the new test262 tests?
Message was sent while issue was closed.
On 2017/05/04 06:17:04, jgruber wrote: > On 2017/05/03 19:14:20, Dan Ehrenberg wrote: > > In https://github.com/tc39/test262/pull/998#discussion_r113419975 , anba > noticed > > that the named capture specification text actually doesn't support the feature > > that was implemented in this patch. It's awkward to specify for the same > reason > > that it was funny to implement here--the spec doesn't go through different > modes > > about whether things are Unicode or non-Unicode. > > Yes, I saw that come up on the test262 CL. Makes sense, and it's nice to be able > to > simplify behavior here again. > > > I'm sorry for creating this extra work by suggesting the other resolution on > > this question before, but maybe we should leave out surrogate pairs. In the > end, > > the test262 tests checked in against the current spec look just like the test > > preceding this patch. > > > > What would you think of simply reverting this patch? > > Sounds good. Do you know of any other V8 failures on the new test262 tests? No, this was the only one so far, but there may be more tests to come.
Message was sent while issue was closed.
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.chromium.org/2859933003/ by littledan@chromium.org. The reason for reverting is: The decision for the specification was to not have this syntax, and instead the syntax before this patch.. |