|
|
Created:
3 years, 9 months ago by bradnelson Modified:
3 years, 9 months ago CC:
v8-reviews_googlegroups.com, Karl Target Ref:
refs/pending/heads/master Project:
v8 Visibility:
Public. |
Description[wasm][asm.js] Adding custom asm.js lexer.
Adding a custom lexer for asm.js parsing.
It takes advantage of a number of asm.js properties to simply things:
* Assumes 'use asm' is the only string.
* Does not handle unicode for now (tools don't emit it).
* Combines global + local string table with lexer.
R=marja@chromium.org,vogelheim@chromium.org,kschimpf@chromium.org
BUG=v8:4203
BUG=v8:6090
Review-Url: https://codereview.chromium.org/2751693002
Cr-Commit-Position: refs/heads/master@{#43874}
Committed: https://chromium.googlesource.com/v8/v8/+/4c3217e13292a3aab056b5800678754c8dac8bfe
Patch Set 1 #Patch Set 2 : fix #Patch Set 3 : fix warning #
Total comments: 88
Patch Set 4 : revised #Patch Set 5 : drop unused #Patch Set 6 : fix warning #Patch Set 7 : check #
Total comments: 39
Patch Set 8 : fix #Patch Set 9 : i #Patch Set 10 : fix #
Total comments: 14
Patch Set 11 : fix #Patch Set 12 : fix #Patch Set 13 : fix #
Total comments: 1
Messages
Total messages: 77 (55 generated)
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Carved off a part of this: https://codereview.chromium.org/2733743002/ I've added some amount of tests, but will likely come back with more soon (had been leaning on the various asm.js tests, but that's subpar).
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...) v8_win_nosnap_shared_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_nosnap_shared_rel_ng...) v8_win_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_rel_ng/builds/24127)
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...)
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Some initial comments. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:80: } else if (ch == kEndOfInput) { (general comment) The structure is getting a bit confusing... it's hard to keep track of which ifs / elses we're in. Adding some comments explaining which construct we're currently parsing could help; now the reader needs to read a couple of if's deep to even figure that out :) https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:82: break; Why break, why not return? (Now it's not trivial to see, when we're somewhere deeper, that we actually don't do anything more in this func.) https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:100: if (och != ch) { Lost here... what's this? Ahh, it's checking the ending quote. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:122: if (ch == '*') { if ch == '/' on the line above, it cannot be '*' here. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:125: break; I'm lost here anyway, what's this block, what are we parsing here? Guessing that /* comment */ but I don't understand how this is supposed to work. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:132: } else { No idea here anymore which if this else associates with. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:181: break; Why not while(ch >= ...) { name_ += ch; ch = stream_->Advance(); } it's less nested. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:189: break; E.g,. here it would be less confusing to use return instead of break.... This is quite nested and far away from the loop at the top. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:199: if (!local_) { What's local_? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:226: if ((ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || Would it be feasible to have a helper function for scanning a number, which both Scanner and this asm scanner could use? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:263: unsigned_value_ = static_cast<uint32_t>(double_value_); Why strtod if it's guaranteed to be an integer (no dot)? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:268: // Handle mistaken parse of '.' as number. How does this relate to the "Pick out dot" above? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h File src/asmjs/asm-lexer.h (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:12: #include "src/objects.h" Nit: You shouldn't need objects.h and objects-inl.h, forward declarations should be enough. You need handles.h though. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:13: #include "src/parsing/scanner.h" Can you split out the stream from scanner.h so that you don't need to include scanner.h? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h File src/asmjs/asm-names.h (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h#n... src/asmjs/asm-names.h:74: V(return ) \ Nit: extra space?
I'll assume kschimpf@ is the main reviewer for this CL, as I have no insight into the subsystem that is the intended client for this logic. Generally speaking, I find the logic hard to follow. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:57: #if 0 Please don't do this. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:73: token_t ch = stream_->Advance(); (Here & below.) Using token_t for individual characters seems kinda weird, particularly given that token_t value ranges seem to have very specific meaning which is mostly not related to the unicode code point at that number. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:92: token_t och = stream_->Advance(); och ? [here & below] https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:121: if (ch == '/') { +1 to Marja's comments. Also, would this work on /* .... **/ ? I take it the #118 would consume the first '*', #120 the second '*', and then the next loop iteration would proceed with the '/' and not recognize it as terminating the comment. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:159: token_t ooch = stream_->Advance(); ooch ? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:173: ch == '_' || ch == '$') { Could you introduce helper functions for the character classes you're checking here (and below)? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:254: double_value_ = strtod(name_.c_str(), &end); strtod may depend on the current locale. Are you really sure that's what you want? (I take it newer standards specify strtod as being "C"-locale only, but I'm not sure we can rely on that.) https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:267: if (end != name_.c_str() + name_.size()) { I'm confused. When does this happen? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:292: rewind_ = true; This doesn't update preceeded_by_newline_. Is this intentional? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:292: rewind_ = true; This doesn't update name_. Is this intentional? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:301: chname[0] = static_cast<char>(token); chname[1] = '\0' ?? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:304: for (auto i = local_names_.begin(); i != local_names_.end(); ++i) { style nitpick: I'd use the for(auto& i : local_names_) for a vanilla iteration that can be expressed that way. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h File src/asmjs/asm-lexer.h (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:13: #include "src/parsing/scanner.h" Forward declare v8::internal::Utf16CharacterStream ? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:20: typedef intptr_t token_t; Why intptr_t? Is it intentional that the token range is platform dependent? (32/64b) https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:22: explicit AsmJsLexer(Isolate* isolate, Handle<Script> script, int start, nitpick: Drop explicit with multi-arg constructor. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:25: const std::string& name() const { return name_; } name() / name_ appears to be the current token buffer. If so, it should have a different name. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:25: const std::string& name() const { return name_; } What is the intended use of name()? I take it the point of this lexer implementation was that all identifiers are assigned to unique token value so that the parser won't have to consider them. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:26: void SetLocalScope(bool local) { local_ = local; } What does this do? (I mean, what's a local scope, and how does lexing for a local scope differ?) https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:54: enum { This enum has a lot of implied structure which is relied on in other methods, but isn't document anywhere. You may want to add a comment explaining the value ranges and their meaning/properties. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:63: LONG_SYMBOL_NAME_LIST(V) formatting nitpick: The indent is funky. Slightly rewrite (I think empty lines will yield a different result) or consider guiding clang-format w/ '// clang-format on|off' comments. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:78: token_t last_token_; last_token_ -> preceding_token_? (I take it it's the token we processed before the current_token_, rather than the last token in the stream.) https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-... File test/cctest/asmjs/test-asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-... test/cctest/asmjs/test-asm-lexer.cc:1: // Copyright 2016 the V8 project authors. All rights reserved. nitpick: 2017 https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-... test/cctest/asmjs/test-asm-lexer.cc:1: // Copyright 2016 the V8 project authors. All rights reserved. Please consider using unittests/.., rather than cctest/... In this case, this would allow you to use test fixtures (TEST_F), which would allow to replace the rather gratuitous use of macros with a test fixture + regular mthods.
kschimpf@google.com changed reviewers: + kschimpf@google.com
https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:74: if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') { Would a switch statement be cleaner here? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:83: } else if (ch < 32 || ch >= 127) { If you use a switch statement, either explicitly enumerate, or else put in default clause. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:89: const char* use_asm = "use asm"; Should this be a constexpr? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h File src/asmjs/asm-lexer.h (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:31: int position() const; Why is this method lower case when other methods start with a capital? https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h File src/asmjs/asm-names.h (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h#n... src/asmjs/asm-names.h:18: #define STDLIB_MATH_FUNCTION_MONOMORPHIC_LIST(V) \ Consider adding a comment describing what each parameter holds? This also applies to the V macros below.
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...) v8_linux_arm_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_arm_rel_ng/builds/...)
Patchset #4 (id:60001) has been deleted
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_win64_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win64_rel_ng/builds/24134) v8_win_nosnap_shared_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_nosnap_shared_rel_ng...) v8_win_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_rel_ng/builds/24223)
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...)
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
bradnelson@google.com changed reviewers: + bradnelson@google.com
https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:57: #if 0 On 2017/03/14 13:36:37, vogelheim wrote: > Please don't do this. Changed to a trace flag. This ends up being useful to figure out context when parsing falls over. Some of the asm.js programs out there like the unity benchmark don't have the asm.js source sitting in a file, but instead decompress it at runtime and eval it (so line numbers etc don't help much). https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:73: token_t ch = stream_->Advance(); On 2017/03/14 13:36:37, vogelheim wrote: > (Here & below.) Using token_t for individual characters seems kinda weird, > particularly given that token_t value ranges seem to have very specific meaning > which is mostly not related to the unicode code point at that number. Switched all of these inside to uc32. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:74: if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') { On 2017/03/14 18:00:47, Karl wrote: > Would a switch statement be cleaner here? Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:80: } else if (ch == kEndOfInput) { On 2017/03/14 11:11:47, marja wrote: > (general comment) The structure is getting a bit confusing... it's hard to keep > track of which ifs / elses we're in. > > Adding some comments explaining which construct we're currently parsing could > help; now the reader needs to read a couple of if's deep to even figure that out > :) Decomposed into more functions, hope that helps. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:82: break; On 2017/03/14 11:11:47, marja wrote: > Why break, why not return? (Now it's not trivial to see, when we're somewhere > deeper, that we actually don't do anything more in this func.) Redone in more functions, avoids the break. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:83: } else if (ch < 32 || ch >= 127) { On 2017/03/14 18:00:47, Karl wrote: > If you use a switch statement, either explicitly enumerate, or else put in > default clause. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:89: const char* use_asm = "use asm"; On 2017/03/14 18:00:47, Karl wrote: > Should this be a constexpr? Changed round. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:92: token_t och = stream_->Advance(); On 2017/03/14 13:36:37, vogelheim wrote: > och ? [here & below] Renamed and refactored. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:100: if (och != ch) { On 2017/03/14 11:11:47, marja wrote: > Lost here... what's this? > > Ahh, it's checking the ending quote. renamed variable to highlight that. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:121: if (ch == '/') { On 2017/03/14 13:36:37, vogelheim wrote: > +1 to Marja's comments. > > Also, would this work on /* .... **/ ? I take it the #118 would consume the > first '*', #120 the second '*', and then the next loop iteration would proceed > with the '/' and not recognize it as terminating the comment. Yeah, this was wrong. Factor to function and fixed, and a test added. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:122: if (ch == '*') { On 2017/03/14 11:11:47, marja wrote: > if ch == '/' on the line above, it cannot be '*' here. Oops, fixed. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:125: break; On 2017/03/14 11:11:47, marja wrote: > I'm lost here anyway, what's this block, what are we parsing here? Guessing that > /* comment */ but I don't understand how this is supposed to work. This was meant to back up if you saw a * inside a /* */, but it was wrong, fixed now (and added a test). https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:132: } else { On 2017/03/14 11:11:47, marja wrote: > No idea here anymore which if this else associates with. Restructured, should be more clear now. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:159: token_t ooch = stream_->Advance(); On 2017/03/14 13:36:37, vogelheim wrote: > ooch ? Hah, terrible name, sorry, dropped variable completely as it's more clear in context now. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:173: ch == '_' || ch == '$') { On 2017/03/14 13:36:37, vogelheim wrote: > Could you introduce helper functions for the character classes you're checking > here (and below)? Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:181: break; On 2017/03/14 11:11:47, marja wrote: > Why not > > while(ch >= ...) { > name_ += ch; > ch = stream_->Advance(); > } > > it's less nested. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:189: break; On 2017/03/14 11:11:47, marja wrote: > E.g,. here it would be less confusing to use return instead of break.... This is > quite nested and far away from the loop at the top. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:199: if (!local_) { On 2017/03/14 11:11:47, marja wrote: > What's local_? Renamed. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:226: if ((ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || On 2017/03/14 11:11:47, marja wrote: > Would it be feasible to have a helper function for scanning a number, which both > Scanner and this asm scanner could use? I've added a TODO to do this. Might require some care. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:254: double_value_ = strtod(name_.c_str(), &end); On 2017/03/14 13:36:37, vogelheim wrote: > strtod may depend on the current locale. Are you really sure that's what you > want? > > (I take it newer standards specify strtod as being "C"-locale only, but I'm not > sure we can rely on that.) Yeah, it's a fair point these probably aren't ideal (and might not even be that fast). I've added a TODO to share code with scanner.cc The more I look at scanner.cc, the more I've wondered if it might have been better to reuse it (it seems to have pretty reasonable performance choices). Are there any aspects of it likely to be much more expensive than what this scanner is doing? I had wanted to keep concerns separated as much as possible, but there's a good bit of duplication (especially if I start using UnicodeCache). I'm tempted to make the substitution now, but also worry that will churn the parser a good bit. If you guys are ok with it, I'm thinking of this for the moment, then refactoring it to reuse scanner to check there performance is ok? By the way, thanks for the reviews, sorry there's so much at once. Normally I'd have proceeded incrementally, but the point was to prove out that there's meaningful performance win, so I needed to have something complete enough to run a sizable program. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:263: unsigned_value_ = static_cast<uint32_t>(double_value_); On 2017/03/14 11:11:47, marja wrote: > Why strtod if it's guaranteed to be an integer (no dot)? Asm.js uses 1e2 for 100 (as an integer :-) Added a comment. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:267: if (end != name_.c_str() + name_.size()) { On 2017/03/14 13:36:37, vogelheim wrote: > I'm confused. When does this happen? When a number failed to parse, added a comment + example in the code. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:268: // Handle mistaken parse of '.' as number. On 2017/03/14 11:11:47, marja wrote: > How does this relate to the "Pick out dot" above? Reworded. The idea here is that if the number parse doesn't consume all the characters, something went wrong, except if the first character is a '.', in which case we back up and emit just the '.' token. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:292: rewind_ = true; On 2017/03/14 13:36:37, vogelheim wrote: > This doesn't update name_. Is this intentional? Clearing it for good measure here (didn't want to pay for it each turn of Next()). Added comments about Rewind + IdentifierString not working together. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:292: rewind_ = true; On 2017/03/14 13:36:37, vogelheim wrote: > This doesn't update preceeded_by_newline_. Is this intentional? Clobbering for good measure here, also commented about this limitation. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:301: chname[0] = static_cast<char>(token); On 2017/03/14 13:36:37, vogelheim wrote: > chname[1] = '\0' ?? Done. Whoops. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:304: for (auto i = local_names_.begin(); i != local_names_.end(); ++i) { On 2017/03/14 13:36:37, vogelheim wrote: > style nitpick: I'd use the for(auto& i : local_names_) for a vanilla iteration > that can be expressed that way. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h File src/asmjs/asm-lexer.h (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:12: #include "src/objects.h" On 2017/03/14 11:11:47, marja wrote: > Nit: You shouldn't need objects.h and objects-inl.h, forward declarations should > be enough. You need handles.h though. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:13: #include "src/parsing/scanner.h" On 2017/03/14 11:11:47, marja wrote: > Can you split out the stream from scanner.h so that you don't need to include > scanner.h? Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:13: #include "src/parsing/scanner.h" On 2017/03/14 13:36:37, vogelheim wrote: > Forward declare v8::internal::Utf16CharacterStream ? Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:20: typedef intptr_t token_t; On 2017/03/14 13:36:38, vogelheim wrote: > Why intptr_t? Is it intentional that the token range is platform dependent? > (32/64b) Not particularly, other than that the string table size could in principle push things up to container size. Should never be that large in practice. Changed to int32_t and added overflow checks. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:22: explicit AsmJsLexer(Isolate* isolate, Handle<Script> script, int start, On 2017/03/14 13:36:38, vogelheim wrote: > nitpick: Drop explicit with multi-arg constructor. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:25: const std::string& name() const { return name_; } On 2017/03/14 13:36:37, vogelheim wrote: > name() / name_ appears to be the current token buffer. If so, it should have a > different name. Divided up its uses, renamed the public part to IdentifierString() https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:25: const std::string& name() const { return name_; } On 2017/03/14 13:36:37, vogelheim wrote: > What is the intended use of name()? > > I take it the point of this lexer implementation was that all identifiers are > assigned to unique token value so that the parser won't have to consider them. So the vast majority of places just the token id can be used. However in a few places (exports, imports), the string of the name is used to decide names in the generated code. But you know there from context you'll need it as you parse, so no need to carry it around with each token in general. I support I could make lookup more efficient, but figured just keeping the last one works too. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:26: void SetLocalScope(bool local) { local_ = local; } On 2017/03/14 13:36:38, vogelheim wrote: > What does this do? (I mean, what's a local scope, and how does lexing for a > local scope differ?) Renamed, commented, and clarified. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:31: int position() const; On 2017/03/14 18:00:47, Karl wrote: > Why is this method lower case when other methods start with a capital? Changed. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:54: enum { On 2017/03/14 13:36:37, vogelheim wrote: > This enum has a lot of implied structure which is relied on in other methods, > but isn't document anywhere. You may want to add a comment explaining the value > ranges and their meaning/properties. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:63: LONG_SYMBOL_NAME_LIST(V) On 2017/03/14 13:36:38, vogelheim wrote: > formatting nitpick: The indent is funky. Slightly rewrite (I think empty lines > will yield a different result) or consider guiding clang-format w/ '// > clang-format on|off' comments. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n... src/asmjs/asm-lexer.h:78: token_t last_token_; On 2017/03/14 13:36:37, vogelheim wrote: > last_token_ -> preceding_token_? > > (I take it it's the token we processed before the current_token_, rather than > the last token in the stream.) Yes. Renamed. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h File src/asmjs/asm-names.h (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h#n... src/asmjs/asm-names.h:18: #define STDLIB_MATH_FUNCTION_MONOMORPHIC_LIST(V) \ On 2017/03/14 18:00:47, Karl wrote: > Consider adding a comment describing what each parameter holds? > > This also applies to the V macros below. Done. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h#n... src/asmjs/asm-names.h:74: V(return ) \ On 2017/03/14 11:11:48, marja wrote: > Nit: extra space? Done. https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-... File test/cctest/asmjs/test-asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-... test/cctest/asmjs/test-asm-lexer.cc:1: // Copyright 2016 the V8 project authors. All rights reserved. On 2017/03/14 13:36:38, vogelheim wrote: > nitpick: 2017 Done. https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-... test/cctest/asmjs/test-asm-lexer.cc:1: // Copyright 2016 the V8 project authors. All rights reserved. On 2017/03/14 13:36:38, vogelheim wrote: > Please consider using unittests/.., rather than cctest/... > > In this case, this would allow you to use test fixtures (TEST_F), which would > allow to replace the rather gratuitous use of macros with a test fixture + > regular mthods. Done. Thanks for the suggestion!
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_win_compile_dbg on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_compile_dbg/builds/3...)
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_win_compile_dbg on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_compile_dbg/builds/3...)
lgtm Generally looks good. I think the strtod issue is a real bug, though. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:254: double_value_ = strtod(name_.c_str(), &end); On 2017/03/15 07:53:03, bradn wrote: > On 2017/03/14 13:36:37, vogelheim wrote: > > strtod may depend on the current locale. Are you really sure that's what you > > want? > > > > (I take it newer standards specify strtod as being "C"-locale only, but I'm not > > sure we can rely on that.) > > Yeah, it's a fair point these probably aren't ideal (and might not even be that > fast). > I've added a TODO to share code with scanner.cc Note that this is a correctness, not a performance issue. If the clib's strtod implementation is indeed LOCALE dependent, then this will fail in byzantine ways that are difficult to reproduce: The same program & same input will work fine on e.g. a US installation (where strtod would parse 1.23), while it would raise a syntax error on a German one (where strtod would reject 1.23, as it expects 1,23). (And yes, I've seen a real compiler be broken that way, where you had to change the locale away from de_DE in order to compile a program using floating point constants. Amusingly, the company making that compiler had "International" in their name.) https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:10: #include "src/parsing/scanner-character-streams.h" I don't see scanner-character-streams.h being used. Am I missing something? https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:50: void AsmJsLexer::Next() { I find this method much nicer to read now. Thanks. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:65: if (Token() != 0) { nitpick: No real problem here, but this logic is a little weird. I think I'd either just handle a 0 token in Name, or just have the last else being an "else if (Token() != 0)". (Maybe Name(0) should be "{uninitialized}". I don't think 0 is a valid unicode code point.) https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:89: if (ch == '\n') { nitpick: This is weird. If you have a switch-case anyhow, and if you want to treat one branch separately, why not just have a separate case for it? case '\n': preceded_... break; case ' ': case '\t': .... https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:130: } else if (ch >= 32 && ch < 127) { [Not sure this is an issue, but... ] How many of these are legitimate tokens anyhow? Alphabetic chars + numeric ones are handles above, as are some operators. So these would be some legitimate tokens (+, -, *, square + curly + round braces), but some others would just be syntax errors (#)? https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:185: if (token == kUnsigned) { Why not handle all of these inside the switch right above? https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:199: int AsmJsLexer::GetPosition() const { return static_cast<int>(stream_->pos()); } Does this work if rewind_ is set? If not, maybe add DCHECK(!rewind_). https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:212: identifier_string_ = ""; identifier_string_.clear(); (STL is bizarre, but.. The way you've written it is very clear to read, but unfortunately will actually call a copy operation, and will iterate over the empty input string, only to figure out that it's empty. Not sure if compilers are being clever about it, but .clear() always does the right thing.) https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:342: if (ch == '*') { Your choice, but I think this if-branch would be a bit more compact if replaced with this: while (ch == '*') { ch = stream_->Advance(); if (ch == '/' || ch == kEndOfInput) return; } https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:350: } else if (ch == kEndOfInput) { I think this potentially swallows a syntax error w/ an unclosed comment at the end of a file. (E.g. xxx /* yyy <EOF>) https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h File src/asmjs/asm-lexer.h (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h#... src/asmjs/asm-lexer.h:37: // NOTE: Doesn't work with Rewind(). I strongly prefer DCHECKs to a comment, as those are harder to ignore and are arguably better at documenting the intent, too. Maybe add DCHECK(!rewind_) to both GetIdentifierString and IsPrecededByNewline. https://codereview.chromium.org/2751693002/diff/140001/test/unittests/unittes... File test/unittests/unittests.gyp (right): https://codereview.chromium.org/2751693002/diff/140001/test/unittests/unittes... test/unittests/unittests.gyp:11: 'asmjs/asm-lexer-unittest.cc', order nitpick: "api" < "asmjs"
https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#... src/asmjs/asm-lexer.cc:254: double_value_ = strtod(name_.c_str(), &end); On 2017/03/15 07:53:03, bradn wrote: > The more I look at scanner.cc, the more I've wondered if it might have been > better to reuse it (it seems to have pretty reasonable performance choices). Are > there any aspects of it likely to be much more expensive than what this scanner > is doing? > > I had wanted to keep concerns separated as much as possible, but there's a good > bit of duplication (especially if I start using UnicodeCache). > > I'm tempted to make the substitution now, but also worry that will churn the > parser a good bit. > If you guys are ok with it, I'm thinking of this for the moment, then > refactoring it to reuse scanner to check there performance is ok? My gut feeling is that parser & scanner are tightly coupled, and we should keep this work + existing parser either entirely separate or entirely integrated. Performance considerations: I suspect that for the asm.js case, both scanner & parser would pretty much always stay in the 'fast paths', and a re-write of those parts won't actually buy us that much overall. Note that what we call the 'parser' is really the parser + AST builder, and what we call the 'preparser' is the full parser + an 'empty backend'. The AST building is what I think you're trying to get rid of, and I indeed can't see any value in it for the asm.js-to-wasm conversion. I don't think the parsing part of the parser is actually that slow: A rough performance estimate of preparser-vs-parser is 4:1, meaning that ~75% of "parse" time is actually spent in the AST. If you were to implement a class AsmJsParser : public ParserBase<AsmJSParser> {...} similar to the PreParser, I'd expect the resulting parsing speed to be roughly comparable to the pre-parser, for an approximate 4x performance win. In fairness: ParserBase wasn't really meant to be used that way, so 1, it's a huge amount of boilerplate to re-use it and 2, there'd almost certainly be some corner cases we didn't think of that and that you'd stumble over and that would require additional work to resolve. Also, 3, I think you're already quite heavily invested into the idea of having a separate parser, so I'm not sure this is really up for consideration. One more thing: Notice that the current parser/lexer already performs similar tricks to yours, but somewhat differently: There are all-ASCII fast paths. Identifiers are being de-duped & internalized in the AstValueFactory, rather than in the Scanner itself, but in both cases you end up with one insertion into a table and then just comparing the int/pointer to do equality comparison.
More comments; I didn't have a detailed look but I think I don't need to, as vogelheim@ did. (No need to wait for my l-g-t-m.) https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:9: #include "src/objects.h" Why is objects.h needed? https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:11: #include "src/parsing/scanner.h" Hmm, you're still including scanner.h even though my previous comment about it was marked done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h File src/asmjs/asm-lexer.h (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h#... src/asmjs/asm-lexer.h:21: class AsmJsLexer { This class could use a comment explaining the language it lexes in more detail. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h#... src/asmjs/asm-lexer.h:25: AsmJsLexer(); Also, it's a bit confusing that we have Scanner (not Lexer) and AsmJsLexer. Is calling this AsmJsScanner feasible?
https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:163: return i.first.c_str(); Why not just: return i.first; https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:168: return i.first.c_str(); Same here. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:173: return i.first.c_str(); Same here.
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:9: #include "src/objects.h" On 2017/03/15 12:34:49, marja wrote: > Why is objects.h needed? There was a Handle<String> used in scanner.h inline. Moved into the .cc file. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:10: #include "src/parsing/scanner-character-streams.h" On 2017/03/15 12:07:41, vogelheim wrote: > I don't see scanner-character-streams.h being used. Am I missing something? Dropped. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:11: #include "src/parsing/scanner.h" On 2017/03/15 12:34:49, marja wrote: > Hmm, you're still including scanner.h even though my previous comment about it > was marked done. That was in the header. This is needed here because the Utf16CharacterStream's declaration is in scanner.h https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:50: void AsmJsLexer::Next() { On 2017/03/15 12:07:40, vogelheim wrote: > I find this method much nicer to read now. Thanks. :-) https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:65: if (Token() != 0) { On 2017/03/15 12:07:41, vogelheim wrote: > nitpick: No real problem here, but this logic is a little weird. I think I'd > either just handle a 0 token in Name, or just have the last else being an "else > if (Token() != 0)". > > (Maybe Name(0) should be "{uninitialized}". I don't think 0 is a valid unicode > code point.) Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:89: if (ch == '\n') { On 2017/03/15 12:07:41, vogelheim wrote: > nitpick: This is weird. If you have a switch-case anyhow, and if you want to > treat one branch separately, why not just have a separate case for it? > > case '\n': > preceded_... > break; > case ' ': > case '\t': > .... Hah, yeah good point (missed that in the refactor). Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:130: } else if (ch >= 32 && ch < 127) { On 2017/03/15 12:07:41, vogelheim wrote: > [Not sure this is an issue, but... ] > > How many of these are legitimate tokens anyhow? Alphabetic chars + numeric ones > are handles above, as are some operators. So these would be some legitimate > tokens (+, -, *, square + curly + round braces), but some others would just be > syntax errors (#)? Listed out the single char ones. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:163: return i.first.c_str(); On 2017/03/15 15:04:13, Karl wrote: > Why not just: > > return i.first; Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:168: return i.first.c_str(); On 2017/03/15 15:04:13, Karl wrote: > Same here. Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:173: return i.first.c_str(); On 2017/03/15 15:04:13, Karl wrote: > Same here. Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:185: if (token == kUnsigned) { On 2017/03/15 12:07:40, vogelheim wrote: > Why not handle all of these inside the switch right above? Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:199: int AsmJsLexer::GetPosition() const { return static_cast<int>(stream_->pos()); } On 2017/03/15 12:07:40, vogelheim wrote: > Does this work if rewind_ is set? If not, maybe add DCHECK(!rewind_). Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:212: identifier_string_ = ""; On 2017/03/15 12:07:41, vogelheim wrote: > identifier_string_.clear(); > > (STL is bizarre, but.. The way you've written it is very clear to read, but > unfortunately will actually call a copy operation, and will iterate over the > empty input string, only to figure out that it's empty. Not sure if compilers > are being clever about it, but .clear() always does the right thing.) Yep. Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:342: if (ch == '*') { On 2017/03/15 12:07:41, vogelheim wrote: > Your choice, but I think this if-branch would be a bit more compact if replaced > with this: > > while (ch == '*') { > ch = stream_->Advance(); > if (ch == '/' || ch == kEndOfInput) > return; > } Ah, yeah, that's better. Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:350: } else if (ch == kEndOfInput) { On 2017/03/15 12:07:40, vogelheim wrote: > I think this potentially swallows a syntax error w/ an unclosed comment at the > end of a file. (E.g. xxx /* yyy <EOF>) Ah, yes. Fixed and added a test. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h File src/asmjs/asm-lexer.h (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h#... src/asmjs/asm-lexer.h:21: class AsmJsLexer { On 2017/03/15 12:34:50, marja wrote: > This class could use a comment explaining the language it lexes in more detail. Done. https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h#... src/asmjs/asm-lexer.h:25: AsmJsLexer(); On 2017/03/15 12:34:50, marja wrote: > Also, it's a bit confusing that we have Scanner (not Lexer) and AsmJsLexer. Is > calling this AsmJsScanner feasible? Renamed https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.h#... src/asmjs/asm-lexer.h:37: // NOTE: Doesn't work with Rewind(). On 2017/03/15 12:07:41, vogelheim wrote: > I strongly prefer DCHECKs to a comment, as those are harder to ignore and are > arguably better at documenting the intent, too. Maybe add DCHECK(!rewind_) to > both GetIdentifierString and IsPrecededByNewline. Done. https://codereview.chromium.org/2751693002/diff/140001/test/unittests/unittes... File test/unittests/unittests.gyp (right): https://codereview.chromium.org/2751693002/diff/140001/test/unittests/unittes... test/unittests/unittests.gyp:11: 'asmjs/asm-lexer-unittest.cc', On 2017/03/15 12:07:41, vogelheim wrote: > order nitpick: "api" < "asmjs" Done.
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...)
lgtm - except for the thing in parsing/scanner.cc; I don't get that. The bots appear to be unhappy, but those look like minor issues. (A static_cast in one case, and maybe some V8_EXPORT_PRIVATE in the other.) Thanks for patiently working through the many comments! If Karl is happy, too, we should be good to go. https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.cc File src/asmjs/asm-scanner.cc (right): https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.... src/asmjs/asm-scanner.cc:333: if (ch == kEndOfInput) { I think you can just drop this if. If ch is kEndOfInput here, it would exit the while loop and run into the if right after it, which does the same check. https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.h File src/asmjs/asm-scanner.h (right): https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.... src/asmjs/asm-scanner.h:44: // Get raw string for current indentifier. indentifier -> identifier Unless it's a string that consists of whitespace only. :-) https://codereview.chromium.org/2751693002/diff/200001/src/flag-definitions.h File src/flag-definitions.h (right): https://codereview.chromium.org/2751693002/diff/200001/src/flag-definitions.h... src/flag-definitions.h:551: DEFINE_IMPLICATION(fast_validate_asm, validate_asm) I don't see these being used in this CL. Is this intentional? (Should be fine, since you'll almost certainly need those flags in a subsequent CL anyhow.) https://codereview.chromium.org/2751693002/diff/200001/src/parsing/scanner.cc File src/parsing/scanner.cc (right): https://codereview.chromium.org/2751693002/diff/200001/src/parsing/scanner.cc... src/parsing/scanner.cc:1164: next_.location.beg_pos = source_pos() - 2; // We already consumed } wut? I don't understand this change. https://codereview.chromium.org/2751693002/diff/200001/src/parsing/scanner.cc... src/parsing/scanner.cc:1168: Handle<String> Scanner::SourceUrl(Isolate* isolate) const { Is this because we previously relied on v8::internal::String being declared outside of scanner.h? (Either way, thanks. We have way too much stuff in the headers anyhow.) https://codereview.chromium.org/2751693002/diff/200001/src/v8.gyp File src/v8.gyp (right): https://codereview.chromium.org/2751693002/diff/200001/src/v8.gyp#newcode422 src/v8.gyp:422: 'asmjs/asm-names.h', order nitpick: "names" < "scanner" https://codereview.chromium.org/2751693002/diff/200001/test/unittests/unittes... File test/unittests/unittests.gyp (right): https://codereview.chromium.org/2751693002/diff/200001/test/unittests/unittes... test/unittests/unittests.gyp:11: 'asmjs/asm-scanner-unittest.cc', api < asm
lgtm. The code looks good, except for the small issues noted by vogelheim.
The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.cc File src/asmjs/asm-scanner.cc (right): https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.... src/asmjs/asm-scanner.cc:333: if (ch == kEndOfInput) { On 2017/03/16 12:46:47, vogelheim wrote: > I think you can just drop this if. > > If ch is kEndOfInput here, it would exit the while loop and run into the if > right after it, which does the same check. Done. https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.h File src/asmjs/asm-scanner.h (right): https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.... src/asmjs/asm-scanner.h:44: // Get raw string for current indentifier. On 2017/03/16 12:46:48, vogelheim wrote: > indentifier -> identifier > > Unless it's a string that consists of whitespace only. :-) Done. https://codereview.chromium.org/2751693002/diff/200001/src/flag-definitions.h File src/flag-definitions.h (right): https://codereview.chromium.org/2751693002/diff/200001/src/flag-definitions.h... src/flag-definitions.h:551: DEFINE_IMPLICATION(fast_validate_asm, validate_asm) On 2017/03/16 12:46:48, vogelheim wrote: > I don't see these being used in this CL. Is this intentional? > > (Should be fine, since you'll almost certainly need those flags in a subsequent > CL anyhow.) Bad merge, will remove for this CL. https://codereview.chromium.org/2751693002/diff/200001/src/parsing/scanner.cc File src/parsing/scanner.cc (right): https://codereview.chromium.org/2751693002/diff/200001/src/parsing/scanner.cc... src/parsing/scanner.cc:1164: next_.location.beg_pos = source_pos() - 2; // We already consumed } On 2017/03/16 12:46:48, vogelheim wrote: > wut? I don't understand this change. Mismatch on my two branches, removed. https://codereview.chromium.org/2751693002/diff/200001/src/parsing/scanner.cc... src/parsing/scanner.cc:1168: Handle<String> Scanner::SourceUrl(Isolate* isolate) const { On 2017/03/16 12:46:48, vogelheim wrote: > Is this because we previously relied on v8::internal::String being declared > outside of scanner.h? > > (Either way, thanks. We have way too much stuff in the headers anyhow.) Yep. https://codereview.chromium.org/2751693002/diff/200001/src/v8.gyp File src/v8.gyp (right): https://codereview.chromium.org/2751693002/diff/200001/src/v8.gyp#newcode422 src/v8.gyp:422: 'asmjs/asm-names.h', On 2017/03/16 12:46:48, vogelheim wrote: > order nitpick: "names" < "scanner" Done. https://codereview.chromium.org/2751693002/diff/200001/test/unittests/unittes... File test/unittests/unittests.gyp (right): https://codereview.chromium.org/2751693002/diff/200001/test/unittests/unittes... test/unittests/unittests.gyp:11: 'asmjs/asm-scanner-unittest.cc', On 2017/03/16 12:46:48, vogelheim wrote: > api < asm Done.
https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc... src/asmjs/asm-lexer.cc:11: #include "src/parsing/scanner.h" On 2017/03/16 00:21:47, bradn wrote: > On 2017/03/15 12:34:49, marja wrote: > > Hmm, you're still including scanner.h even though my previous comment about it > > was marked done. > > That was in the header. > This is needed here because the Utf16CharacterStream's declaration is in > scanner.h My orig. comment suggested moving the streams out of scanner.h. It's weird that a complete separate scanner needs to include scanner.h. But as I can see this CL is close to landing, would doing a follow-up be OK?
The CQ bit was unchecked by bradnelson@google.com
The CQ bit was checked by bradnelson@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from kschimpf@google.com, vogelheim@chromium.org Link to the patchset: https://codereview.chromium.org/2751693002/#ps220001 (title: "fix")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bradnelson@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from kschimpf@google.com, vogelheim@chromium.org Link to the patchset: https://codereview.chromium.org/2751693002/#ps240001 (title: "fix")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...) v8_linux64_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_rel_ng/builds/22707) v8_linux64_verify_csa_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_verify_csa_rel_n...) v8_mac_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_mac_rel_ng/builds/19195)
The CQ bit was checked by bradnelson@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from kschimpf@google.com, vogelheim@chromium.org Link to the patchset: https://codereview.chromium.org/2751693002/#ps260001 (title: "fix")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 260001, "attempt_start_ts": 1489684914090050, "parent_rev": "18c77ce51bcfd20f7821aad581749f7c15fb550a", "commit_rev": "4c3217e13292a3aab056b5800678754c8dac8bfe"}
Message was sent while issue was closed.
Description was changed from ========== [wasm][asm.js] Adding custom asm.js lexer. Adding a custom lexer for asm.js parsing. It takes advantage of a number of asm.js properties to simply things: * Assumes 'use asm' is the only string. * Does not handle unicode for now (tools don't emit it). * Combines global + local string table with lexer. R=marja@chromium.org,vogelheim@chromium.org,kschimpf@chromium.org BUG=v8:4203 BUG=v8:6090 ========== to ========== [wasm][asm.js] Adding custom asm.js lexer. Adding a custom lexer for asm.js parsing. It takes advantage of a number of asm.js properties to simply things: * Assumes 'use asm' is the only string. * Does not handle unicode for now (tools don't emit it). * Combines global + local string table with lexer. R=marja@chromium.org,vogelheim@chromium.org,kschimpf@chromium.org BUG=v8:4203 BUG=v8:6090 Review-Url: https://codereview.chromium.org/2751693002 Cr-Commit-Position: refs/heads/master@{#43874} Committed: https://chromium.googlesource.com/v8/v8/+/4c3217e13292a3aab056b5800678754c8da... ==========
Message was sent while issue was closed.
Committed patchset #13 (id:260001) as https://chromium.googlesource.com/v8/v8/+/4c3217e13292a3aab056b5800678754c8da...
Message was sent while issue was closed.
Some post-commit stuff I noticed when reviewing the parser part.. https://codereview.chromium.org/2751693002/diff/260001/src/asmjs/asm-scanner.h File src/asmjs/asm-scanner.h (right): https://codereview.chromium.org/2751693002/diff/260001/src/asmjs/asm-scanner.... src/asmjs/asm-scanner.h:100: // [-10000 .. -10000-kMaxIdentifierCount) :: Local identifiers Nit: you probably meant: [-10000-kMaxIdentifierCount, -10000)
Message was sent while issue was closed.
https://codereview.chromium.org/2770803002/
Message was sent while issue was closed.
I meant: https://codereview.chromium.org/2769013002 |