|
|
Created:
9 years, 11 months ago by Martin Maly Modified:
9 years, 7 months ago CC:
v8-dev Visibility:
Public. |
DescriptionFirst part of strict mode.
- var eval | arguments
- catch (eval | arguments)
- 'with' is disabled
- function can't be named eval or arguments
- function parameter name cannot be eval or arguments
- no duplicate parameter names allowed
Add FLAG_strict_mode
Patch Set 1 #
Total comments: 24
Patch Set 2 : Lasse's code review feedback + flag #
Total comments: 4
Patch Set 3 : More CR feedback. #
Total comments: 2
Patch Set 4 : CR Feedback + Tests. #
Total comments: 12
Patch Set 5 : Fix Mozilla test. #Patch Set 6 : Removing parameter validation; #
Messages
Total messages: 16 (0 generated)
the first early draft.
I like this (i.e., it's very close to how I would have done it). The problems with detecting "use strict" can be fixed without resorting to extra passes. The rest looks well structured and follows the existing conventions well. http://codereview.chromium.org/6144005/diff/1/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode1084 src/parser.cc:1084: if (peek() == Token::STRING) { As you pointed out, this is too simple (and ECMAScript is rarely simple :). The missing parts is checking that the "use strict" is not part of a larger expression, and that it doesn't contain escapes. There are ways to do both without rescanning or prescanning. If an ExpressionStatement *starts* with a string token, and is itself a single string, then there are no wrapping parentheses. And if the string literal's start and end position are exactly strlen("use strict") + 2 apart (content + quotes), then there are no escapes in the string. The solution I have come up with is something like (from memory): Statement* stat; if (directive_prologue && peek() == Token::STRING) { var location = scanner_pos(); stat = ParseStatement(NULL, CHECK); ExpressionStatement* expr_stat = stat->AsExpressionStatement(); if (expr_stat != NULL) { Handle<Object> literal = expr_stat->handle(); if (literal->IsString()) { // It's a directive prologue. Handle<String> prologue = Handle<String>::cast(literal); if (prologue.Equals(Heap::use_strict_symbol()) && position.end_pos - position.beg_pos == 12) { // Literal location, including quotes, leaves // no room for escaped characters. temp_scope_->SetStrict(); } continue; // Don't add it to the AST. } } directive_prologue = false; } else { directive_prologue = false; stat = ParseStatement(NULL, CHECK); } I'm not entirely satisfied with the flow yet :). http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3252 src/parser.cc:3252: StrictMode strict(&strict_mode_); As stated earlier, we define the TemporaryScope just above, which has the same liveness and general meaning (collecting stuff to put into FunctionLiteral) as the StrictMode here. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3344 src/parser.cc:3344: ReportMessageAt(Scanner::Location(position, start_pos), Could you store the position of the argument that is "eval" or "arguments"? I.e., so that if thestored position is RelocInfo::kNoPosition, then there is none, and otherwise it's the position of the offending argument. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3351 src/parser.cc:3351: "strict_param_dupe", Vector<const char*>::empty()); Also here, could you store the actual position of the (second instance of) the offending variable name? http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3354 src/parser.cc:3354: } For later, I would also put the check for octal numbers here. I imagine letting the scanner hold the position of the most recent octal escape or octal number literal, and then testing at this point whether it lies between start_pos and end_pos. That will report the last octal number, not the first, but I think it's better than checking all the time. http://codereview.chromium.org/6144005/diff/1/src/parser.h File src/parser.h (right): http://codereview.chromium.org/6144005/diff/1/src/parser.h#newcode685 src/parser.h:685: bool strict_mode_; // parsing strict mode code You could (and IMO should) put this bit in the TemporaryScope in temp_scope_. TemporaryScope collects other things that are put into the FunctionLiteral as well. You can set TemporaryScope to automatically inherit the is_strict bit from its parent the same way the StrictMode class does. http://codereview.chromium.org/6144005/diff/1/src/scopes.cc File src/scopes.cc (right): http://codereview.chromium.org/6144005/diff/1/src/scopes.cc#newcode417 src/scopes.cc:417: HashMap map(Match, &HashMap::DefaultAllocator, length * 2); If the length is short (e.g., five or less), just do the quadratic comparison. We are comparing symbols, so it's pretty quick, and I'm guessing that the average number of arguments to a function is around two (could be a fun thing to check :).
http://codereview.chromium.org/6144005/diff/1/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode1084 src/parser.cc:1084: if (peek() == Token::STRING) { Argh, location should be: Scanner::Location location = scanner().peek_location(); and the literal test should be: Literal* literal = expr_stat->AsLiteral(); if (literal != NULL && literal->handle()->IsString()) { Handle<String> prologue = Handle<String>::cast(literal->handle()); ...
After some thought, we think it will be a good idea to have a flag that can be used to disable strict mode detection. It should be enabled by default, but the flag will let us test whether it's strict-mode that makes something fail. It should be sufficient to check it only at the point where we set strict mode to true, and just do nothing - that would automatically make everything non-strict.
FYI, here's an instance of needing to disable strict mode: https://bugs.webkit.org/show_bug.cgi?id=48377
Thanks for the feedback, Lasse! Tried to incorporate all of it, in few places I don't know how nervous to be about additional work now happening when processing non-strict code. Also adding --strict_mode flag. Main question - how does one allocate transient memory in the parser? A memory that only sometimes is needed and when it is I'd love to free it ideally as soon as possible. The code is marked below with a comment so hopefully it'll be obvious what I am trying to do. For some reason new/delete didn't work even though the underlying operators did seem to do the right thing. Thank you! Martin http://codereview.chromium.org/6144005/diff/1/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode1084 src/parser.cc:1084: if (peek() == Token::STRING) { On 2011/01/13 07:18:40, Lasse Reichstein wrote: > As you pointed out, this is too simple (and ECMAScript is rarely simple :). > The missing parts is checking that the "use strict" is not part of a larger > expression, and that it doesn't contain escapes. There are ways to do both > without rescanning or prescanning. > > If an ExpressionStatement *starts* with a string token, and is itself a single > string, then there are no wrapping parentheses. > And if the string literal's start and end position are exactly strlen("use > strict") + 2 apart (content + quotes), then there are no escapes in the string. > > The solution I have come up with is something like (from memory): > > Statement* stat; > if (directive_prologue && peek() == Token::STRING) { > var location = scanner_pos(); > stat = ParseStatement(NULL, CHECK); > ExpressionStatement* expr_stat = stat->AsExpressionStatement(); > if (expr_stat != NULL) { > Handle<Object> literal = expr_stat->handle(); > if (literal->IsString()) { > // It's a directive prologue. > Handle<String> prologue = Handle<String>::cast(literal); > if (prologue.Equals(Heap::use_strict_symbol()) && > position.end_pos - position.beg_pos == 12) { > // Literal location, including quotes, leaves > // no room for escaped characters. > temp_scope_->SetStrict(); > } > continue; // Don't add it to the AST. > } > } > directive_prologue = false; > } else { > directive_prologue = false; > stat = ParseStatement(NULL, CHECK); > } > > I'm not entirely satisfied with the flow yet :). Done. However, we still need to add the directives to the AST. Consider: eval("Hello").length http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3252 src/parser.cc:3252: StrictMode strict(&strict_mode_); On 2011/01/13 07:18:40, Lasse Reichstein wrote: > As stated earlier, we define the TemporaryScope just above, which has the same > liveness and general meaning (collecting stuff to put into FunctionLiteral) as > the StrictMode here. Done. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3344 src/parser.cc:3344: ReportMessageAt(Scanner::Location(position, start_pos), It is definitely possible. I was trying to avoid any unnecessary work that would affect non-strict mode code. In this case, checking parameter names for eval/arguments will have to happen regardless of strict mode because of: function foo(eval, arguments) { // Seen not in strict mode but must ERR "use strict"; } This check is relatively cheap so I will add it. On 2011/01/13 07:18:40, Lasse Reichstein wrote: > Could you store the position of the argument that is "eval" or "arguments"? > I.e., so that if thestored position is RelocInfo::kNoPosition, then there is > none, and otherwise it's the position of the offending argument. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3351 src/parser.cc:3351: "strict_param_dupe", Vector<const char*>::empty()); I believe it can be done, also at a relatively small cost to the non-strict mode parsing. The only glitch is being able to allocate HashMap temporarily so that only functions with >N parameters use hashmap for duplicate checks. How can I allocate such temporary hash map inside V8 parser? I am hoping to put it onto TemporaryScope as well and allocate it only when needed, and ideally free it as soon as the arguments have been processed. On 2011/01/13 07:18:40, Lasse Reichstein wrote: > Also here, could you store the actual position of the (second instance of) the > offending variable name? http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3354 src/parser.cc:3354: } Great idea. that will also handle the case where "use strict" is preceded by another directive that does contain octal literal. Will do when I implement the octal detection. On 2011/01/13 07:18:40, Lasse Reichstein wrote: > For later, I would also put the check for octal numbers here. > I imagine letting the scanner hold the position of the most recent octal escape > or octal number literal, and then testing at this point whether it lies between > start_pos and end_pos. > That will report the last octal number, not the first, but I think it's better > than checking all the time. http://codereview.chromium.org/6144005/diff/1/src/parser.h File src/parser.h (right): http://codereview.chromium.org/6144005/diff/1/src/parser.h#newcode685 src/parser.h:685: bool strict_mode_; // parsing strict mode code On 2011/01/13 07:18:40, Lasse Reichstein wrote: > You could (and IMO should) put this bit in the TemporaryScope in temp_scope_. > TemporaryScope collects other things that are put into the FunctionLiteral as > well. > You can set TemporaryScope to automatically inherit the is_strict bit from its > parent the same way the StrictMode class does. Done. http://codereview.chromium.org/6144005/diff/1/src/scopes.cc File src/scopes.cc (right): http://codereview.chromium.org/6144005/diff/1/src/scopes.cc#newcode417 src/scopes.cc:417: HashMap map(Match, &HashMap::DefaultAllocator, length * 2); On 2011/01/13 07:18:40, Lasse Reichstein wrote: > If the length is short (e.g., five or less), just do the quadratic comparison. > We are comparing symbols, so it's pretty quick, and I'm guessing that the > average number of arguments to a function is around two (could be a fun thing to > check :). Done. http://codereview.chromium.org/6144005/diff/15001/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/15001/src/parser.cc#newcode340 src/parser.cc:340: // TODO(mmaly): How to allocate temporary map and free it when not needed? I am trying to allocate the hash map only when function has >N parameters and ideally free it as soon as parameter processing is done (i.e. before we even hit the destructor). Is there a heap to use for such transient memory? the new+delete didn't work and C++ complained that object not allocated is being freed. http://codereview.chromium.org/6144005/diff/15001/src/parser.cc#newcode351 src/parser.cc:351: Moved the code from Scope to TemporaryScope since here is where I am hoping to allocate and store the transient hash map. http://codereview.chromium.org/6144005/diff/15001/src/parser.cc#newcode3324 src/parser.cc:3324: This code runs even when not processing strict mode code, similarly for the eval_loc and dupe_loc which take up stack space either way. With slightly more complicated logic I could bring it down to just 1 Scanner::Location and a bool (because errors are reported in sequence and I therefore only need to remember location of the error that will be reported first) but am not sure how aggressively I should optimize. http://codereview.chromium.org/6144005/diff/15001/src/parser.cc#newcode3409 src/parser.cc:3409: int position = function_token_position != RelocInfo::kNoPosition This is still necessary because property get/set goes through here with RelocInfo::kNoPosition. For now leaving as is but would like to find better solution, once I get the feel for the right balance between perf and good errors.
http://codereview.chromium.org/6144005/diff/1/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode1084 src/parser.cc:1084: if (peek() == Token::STRING) { I don't see how that example applies. The "Hello" String is a string literal. When parsed by eval, it will be an identifier. In neither case will it be seen as a Directive Prologue. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3344 src/parser.cc:3344: ReportMessageAt(Scanner::Location(position, start_pos), Yes, we need to either detect the problems when they are introduced (to know the location) before we know that it's strict code, or be able to detect them properly later. I prefer just detecting them as soon as possible, and just remember the first (or last, I don't care) problem that would cause strict mode to fail. And I think it's much simpler to simply detect strict mode violations when they occur, and then check at the end whether there was any, if the function is strict. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3351 src/parser.cc:3351: "strict_param_dupe", Vector<const char*>::empty()); I think I would just drop the HashMap and do the simple, quadratic algorithm in all cases. We are comparing symbols, so it's a pretty damn fast check, and I doubt we'll see any function with more than 10 formal parameters anyway. I.e., let's worry about that only if it becomes a problem. http://codereview.chromium.org/6144005/diff/1/src/scopes.cc File src/scopes.cc (right): http://codereview.chromium.org/6144005/diff/1/src/scopes.cc#newcode417 src/scopes.cc:417: HashMap map(Match, &HashMap::DefaultAllocator, length * 2); And thinking more about it, just do the simple thing in all cases.
Thank you for reviewing the code for me! Here it is updated per your feedback. Martin http://codereview.chromium.org/6144005/diff/1/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode1084 src/parser.cc:1084: if (peek() == Token::STRING) { Sorry, the example was incorrect eval("'hello'").length is what is affected by dropping expression statements that in the directive prologue. On 2011/01/14 11:42:05, Lasse Reichstein wrote: > I don't see how that example applies. The "Hello" String is a string literal. > When parsed by eval, it will be an identifier. In neither case will it be seen > as a Directive Prologue. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3344 src/parser.cc:3344: ReportMessageAt(Scanner::Location(position, start_pos), Sounds good. On 2011/01/14 11:42:05, Lasse Reichstein wrote: > Yes, we need to either detect the problems when they are introduced (to know the > location) before we know that it's strict code, or be able to detect them > properly later. > > I prefer just detecting them as soon as possible, and just remember the first > (or last, I don't care) problem that would cause strict mode to fail. > And I think it's much simpler to simply detect strict mode violations when they > occur, and then check at the end whether there was any, if the function is > strict. http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode3351 src/parser.cc:3351: "strict_param_dupe", Vector<const char*>::empty()); On 2011/01/14 11:42:05, Lasse Reichstein wrote: > I think I would just drop the HashMap and do the simple, quadratic algorithm in > all cases. > We are comparing symbols, so it's a pretty damn fast check, and I doubt we'll > see any function with more than 10 formal parameters anyway. I.e., let's worry > about that only if it becomes a problem. Done. http://codereview.chromium.org/6144005/diff/1/src/scopes.cc File src/scopes.cc (right): http://codereview.chromium.org/6144005/diff/1/src/scopes.cc#newcode417 src/scopes.cc:417: HashMap map(Match, &HashMap::DefaultAllocator, length * 2); On 2011/01/14 11:42:05, Lasse Reichstein wrote: > And thinking more about it, just do the simple thing in all cases. Done.
LGTM. http://codereview.chromium.org/6144005/diff/1/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/1/src/parser.cc#newcode1084 src/parser.cc:1084: if (peek() == Token::STRING) { Ack, yes, now I see.
http://codereview.chromium.org/6144005/diff/20001/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/20001/src/parser.cc#newcode1076 src/parser.cc:1076: if (directive_prologue && peek() == Token::STRING) { So I guess this could just be if (directive_prologue && peek() != Token::STRING) { directive_prologue = false; } Statement* stat = parseStatement(NULL, CHECK_OK); if (directive_prologue) { // if not String only ExpressionStatement, directive_prologue = false; // else if is "use strict" ExpressionStatement, set strict mode. } ...
Made the change and added tests to cover the currently implemented strict mode checks. http://codereview.chromium.org/6144005/diff/20001/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/20001/src/parser.cc#newcode1076 src/parser.cc:1076: if (directive_prologue && peek() == Token::STRING) { We still need to keep track of the token location but other than that, this does work. Made the change. On 2011/01/17 10:23:41, Lasse Reichstein wrote: > So I guess this could just be > > if (directive_prologue && peek() != Token::STRING) { > directive_prologue = false; > } > Statement* stat = parseStatement(NULL, CHECK_OK); > if (directive_prologue) { > // if not String only ExpressionStatement, directive_prologue = false; > // else if is "use strict" ExpressionStatement, set strict mode. > } > ...
LGTM. Let's try to commit it and see the impact! http://codereview.chromium.org/6144005/diff/32001/src/messages.js File src/messages.js (right): http://codereview.chromium.org/6144005/diff/32001/src/messages.js#newcode205 src/messages.js:205: strict_function_name: "Function name may not be eval or arguments in strict mode", Eventually we must check (if you haven't already) what error messages JSC gives in strict mode. So far they haven't released a browser with strict mode, so there is no rush. http://codereview.chromium.org/6144005/diff/32001/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/32001/src/parser.cc#newcode326 src/parser.cc:326: strict_mode_ = (parent_ != NULL) ? parent_->strict_mode_ : false; Can be written (parent_ != NULL) && parent_->strict_mode_ Probably doesn't make any difference, though. http://codereview.chromium.org/6144005/diff/32001/src/parser.cc#newcode1101 src/parser.cc:1101: temp_scope_->EnableStrictMode(); You can set directive_prologue to false here, since there is no more reason to look at the directive prologue. On the other hand, any realistic code won't have any directive prologue except "use strict", if any. http://codereview.chromium.org/6144005/diff/32001/src/scopes.cc File src/scopes.cc (right): http://codereview.chromium.org/6144005/diff/32001/src/scopes.cc#newcode417 src/scopes.cc:417: for (int i = 0, length = num_parameters(); i < length; i ++) { Perhaps ASSERT that both name and parameter(i)->name() are symbols. They must be, but better safe than sorry. http://codereview.chromium.org/6144005/diff/32001/test/mjsunit/strict-mode.js File test/mjsunit/strict-mode.js (right): http://codereview.chromium.org/6144005/diff/32001/test/mjsunit/strict-mode.js... test/mjsunit/strict-mode.js:58: } How about adding a case where there are more directive prologues than "use strict", and it's neither first nor last? http://codereview.chromium.org/6144005/diff/32001/test/mjsunit/strict-mode.js... test/mjsunit/strict-mode.js:88: CheckStrictMode("var arguments;", SyntaxError) Test that we catches: var o = {set foo(eval) {}}; and likewise for "arguments". (It uses ParseFunctionLiteral, so it should just work).
Thanks, fixed these issues, added couple more tests and committed. Thank you! Martin http://codereview.chromium.org/6144005/diff/32001/src/messages.js File src/messages.js (right): http://codereview.chromium.org/6144005/diff/32001/src/messages.js#newcode205 src/messages.js:205: strict_function_name: "Function name may not be eval or arguments in strict mode", Agreed. Opened a bug on this: http://code.google.com/p/v8/issues/detail?id=1052 On 2011/01/18 13:40:59, Lasse Reichstein wrote: > Eventually we must check (if you haven't already) what error messages JSC gives > in strict mode. So far they haven't released a browser with strict mode, so > there is no rush. http://codereview.chromium.org/6144005/diff/32001/src/parser.cc File src/parser.cc (right): http://codereview.chromium.org/6144005/diff/32001/src/parser.cc#newcode326 src/parser.cc:326: strict_mode_ = (parent_ != NULL) ? parent_->strict_mode_ : false; On 2011/01/18 13:40:59, Lasse Reichstein wrote: > Can be written > (parent_ != NULL) && parent_->strict_mode_ > Probably doesn't make any difference, though. Done. http://codereview.chromium.org/6144005/diff/32001/src/parser.cc#newcode1101 src/parser.cc:1101: temp_scope_->EnableStrictMode(); On 2011/01/18 13:40:59, Lasse Reichstein wrote: > You can set directive_prologue to false here, since there is no more reason to > look at the directive prologue. > > On the other hand, any realistic code won't have any directive prologue except > "use strict", if any. Done. http://codereview.chromium.org/6144005/diff/32001/src/scopes.cc File src/scopes.cc (right): http://codereview.chromium.org/6144005/diff/32001/src/scopes.cc#newcode417 src/scopes.cc:417: for (int i = 0, length = num_parameters(); i < length; i ++) { On 2011/01/18 13:40:59, Lasse Reichstein wrote: > Perhaps ASSERT that both name and parameter(i)->name() are symbols. They must > be, but better safe than sorry. Done. http://codereview.chromium.org/6144005/diff/32001/test/mjsunit/strict-mode.js File test/mjsunit/strict-mode.js (right): http://codereview.chromium.org/6144005/diff/32001/test/mjsunit/strict-mode.js... test/mjsunit/strict-mode.js:58: } On 2011/01/18 13:40:59, Lasse Reichstein wrote: > How about adding a case where there are more directive prologues than "use > strict", and it's neither first nor last? Done. http://codereview.chromium.org/6144005/diff/32001/test/mjsunit/strict-mode.js... test/mjsunit/strict-mode.js:88: CheckStrictMode("var arguments;", SyntaxError) On 2011/01/18 13:40:59, Lasse Reichstein wrote: > Test that we catches: > var o = {set foo(eval) {}}; > and likewise for "arguments". > (It uses ParseFunctionLiteral, so it should just work). Done.
The commit with n^2 algorithm for parameter validation caused timeout in Mozilla tests (function with 65536 parameters, amounting to ~4G pairs to check). I am adding back the hashmap based checking. During parsing of parameters, all locations are stored in the list and later parameters are checked for dupes via hash map. This favors non-strict functions. The next step in optimizing may have to be parsing in non-strict mode and if we detect "use strict", roll back parser and reparse the function in strict mode. I haven't explored the V8 facility to do something like this so it may not even be possible.
Methinks parameter validation doesn't belong in the parser. After parsing we resolve variables in scopes.cc, which is already hashing and canonicalizing variable references. Wouldn't that be the right place to check all things scope-related? We should avoid adding extra passes and auxiliary data structures to the parser.
Removing the parameter validation for now, will find the right approach in another patch. Thank you! Martin |