|
|
Created:
6 years, 2 months ago by Diego Pino Modified:
6 years, 2 months ago CC:
v8-dev, aperez, vjaquez, gemont_igalia.com, Michael Starzinger, Erik Corry Project:
v8 Visibility:
Public. |
DescriptionUpdate identifier syntax to disallow U+2E2F
BUG=v8:2892
LOG=
Patch Set 1 #
Total comments: 1
Patch Set 2 : Fixed test. #
Messages
Total messages: 14 (3 generated)
dpino@igalia.com changed reviewers: + wingo@igalia.com
The solution seems a bit naive but there are already characters, that may not be valid as identifier categories, explicitly included (it's the case of '$' and '_'). The patch follows the same approach, explicitly excludes U2E2F as valid identifier character.
mathias@qiwi.be changed reviewers: + mathias@qiwi.be
https://codereview.chromium.org/626073002/diff/1/test/mjsunit/var.js File test/mjsunit/var.js (right): https://codereview.chromium.org/626073002/diff/1/test/mjsunit/var.js#newcode40 test/mjsunit/var.js:40: assertThrows("var \u2E2F;", SyntaxError); These two tests are identical. Did you mean to use `assertThrows("var \\u2E2F;", SyntaxError);` instead?
On 2014/10/04 12:20:47, mathias wrote: > https://codereview.chromium.org/626073002/diff/1/test/mjsunit/var.js#newcode40 > test/mjsunit/var.js:40: assertThrows("var \u2E2F;", SyntaxError); > These two tests are identical. > > Did you mean to use `assertThrows("var \\u2E2F;", SyntaxError);` instead? Right :) Fixed.
I don't much like this approach TBH. I think probably the right thing is to regenerate unicode.cc. The exceptions that are in IdentifierStart::Is are there because they are there in the spec.
danno@chromium.org changed reviewers: + danno@chromium.org, jochen@chromium.org, yangguo@chromium.org
On 2014/10/07 10:45:47, danno wrote: I'd like to regenerate unicode.cc, preferably even to unicode 7.0.0, though that would also mean that we would fail test262 due to the mongolian vowel separator. Last I checked, it still expects it as white space (category Zs). Unrelated to this however, can somebody point me to where this change is specified?
On 2014/10/07 10:52:59, Yang wrote: > can somebody point me to where this change is specified? The bug: http://code.google.com/p/v8/issues/detail?id=2892 The spec: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-names-and-keywords Particularly: IdentifierStart :: UnicodeIDStart $ _ \ UnicodeEscapeSequence I can't speak to U+2E2F particularly; mathias probably knows more.
On 2014/10/07 11:06:50, wingo wrote: > On 2014/10/07 10:52:59, Yang wrote: > > can somebody point me to where this change is specified? > > The bug: http://code.google.com/p/v8/issues/detail?id=2892 > The spec: > http://people.mozilla.org/~jorendorff/es6-draft.html#sec-names-and-keywords > > Particularly: > > IdentifierStart :: > UnicodeIDStart > $ > _ > \ UnicodeEscapeSequence > > I can't speak to U+2E2F particularly; mathias probably knows more. ES5 used to specify `IdentifierStart` and `IdentifierPart` based on a list of Unicode categories, one of which (`Lm`) happened to include U+2E2F. But ES6 (mostly) just refers to `ID_Start` and `ID_Continue` instead. U+2E2F has the `Pattern_Syntax` property and is thus not a valid `ID_Start` code point.
On 2014/10/07 12:53:14, mathias wrote: > On 2014/10/07 11:06:50, wingo wrote: > > On 2014/10/07 10:52:59, Yang wrote: > > > can somebody point me to where this change is specified? > > > > The bug: http://code.google.com/p/v8/issues/detail?id=2892 > > The spec: > > http://people.mozilla.org/~jorendorff/es6-draft.html#sec-names-and-keywords > > > > Particularly: > > > > IdentifierStart :: > > UnicodeIDStart > > $ > > _ > > \ UnicodeEscapeSequence > > > > I can't speak to U+2E2F particularly; mathias probably knows more. > > ES5 used to specify `IdentifierStart` and `IdentifierPart` based on a list of > Unicode categories, one of which (`Lm`) happened to include U+2E2F. But ES6 > (mostly) just refers to `ID_Start` and `ID_Continue` instead. > > U+2E2F has the `Pattern_Syntax` property and is thus not a valid `ID_Start` code > point. Okay, so regarding the mongolian vowel separator, ES5 says "ECMAScript implementations must recognise all of the white space characters defined in Unicode 3.0. Later editions of the Unicode Standard may define other white space characters. ECMAScript implementations may recognise white space characters from later editions of the Unicode Standard." and ES6 says "ECMAScript implementations must recognize as Whitespace code points listed in the “Separator, space” (Zs) category by Unicode 5.1. ECMAScript implementations may also recognize as Whitespace additional category Zs code points from subsequent editions of the Unicode Standard." The mongolian vovewl separator \u180E stopped being a white space since Unicode 6.3.0, so we still have to recognize it as white space for Javascript.
On 2014/10/07 15:12:39, Yang wrote: > On 2014/10/07 12:53:14, mathias wrote: > > On 2014/10/07 11:06:50, wingo wrote: > > > On 2014/10/07 10:52:59, Yang wrote: > > > > can somebody point me to where this change is specified? > > > > > > The bug: http://code.google.com/p/v8/issues/detail?id=2892 > > > The spec: > > > http://people.mozilla.org/~jorendorff/es6-draft.html#sec-names-and-keywords > > > > > > Particularly: > > > > > > IdentifierStart :: > > > UnicodeIDStart > > > $ > > > _ > > > \ UnicodeEscapeSequence > > > > > > I can't speak to U+2E2F particularly; mathias probably knows more. > > > > ES5 used to specify `IdentifierStart` and `IdentifierPart` based on a list of > > Unicode categories, one of which (`Lm`) happened to include U+2E2F. But ES6 > > (mostly) just refers to `ID_Start` and `ID_Continue` instead. > > > > U+2E2F has the `Pattern_Syntax` property and is thus not a valid `ID_Start` > code > > point. > > Okay, so regarding the mongolian vowel separator, ES5 says > > "ECMAScript implementations must recognise all of the white space characters > defined in Unicode 3.0. Later editions of the Unicode Standard may define other > white space characters. ECMAScript implementations may recognise white space > characters from later editions of the Unicode Standard." > > and ES6 says > > "ECMAScript implementations must recognize as Whitespace code points listed in > the “Separator, space” (Zs) category by Unicode 5.1. ECMAScript implementations > may also recognize as Whitespace additional category Zs code points from > subsequent editions of the Unicode Standard." > > The mongolian vovewl separator \u180E stopped being a white space since Unicode > 6.3.0, so we still have to recognize it as white space for Javascript. So I got this as alternative: https://codereview.chromium.org/638643002/
Message was sent while issue was closed.
Mark as closed. Solved by https://codereview.chromium.org/638643002/. |