Issue 2751693002: [wasm][asm.js] Adding custom asm.js lexer.

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-14 05:19:25 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/1

3 years, 9 months ago (2017-03-14 05:19:36 UTC) #2

bradnelson

Carved off a part of this: https://codereview.chromium.org/2733743002/ I've added some amount of tests, but will ...

3 years, 9 months ago (2017-03-14 05:21:41 UTC) #3

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-14 05:21:44 UTC) #4

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/builds/14619) v8_win_nosnap_shared_rel_ng on ...

3 years, 9 months ago (2017-03-14 05:21:45 UTC) #5

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-14 05:23:34 UTC) #6

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/20001

3 years, 9 months ago (2017-03-14 05:23:44 UTC) #7

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-14 05:27:34 UTC) #8

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/builds/14621)

3 years, 9 months ago (2017-03-14 05:27:35 UTC) #9

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-14 05:28:20 UTC) #10

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/40001

3 years, 9 months ago (2017-03-14 05:28:25 UTC) #11

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-14 05:57:34 UTC) #12

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 9 months ago (2017-03-14 05:57:35 UTC) #13

marja

Some initial comments. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#newcode80 src/asmjs/asm-lexer.cc:80: } else if (ch == kEndOfInput) ...

3 years, 9 months ago (2017-03-14 11:11:48 UTC) #14

vogelheim

I'll assume kschimpf@ is the main reviewer for this CL, as I have no insight ...

3 years, 9 months ago (2017-03-14 13:36:38 UTC) #15

Karl

kschimpf@google.com changed reviewers: + kschimpf@google.com

3 years, 9 months ago (2017-03-14 18:00:46 UTC) #16

Karl

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#newcode74 src/asmjs/asm-lexer.cc:74: if (ch == ' ' || ch == '\t' ...

3 years, 9 months ago (2017-03-14 18:00:47 UTC) #17

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-15 07:24:12 UTC) #18

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/60001

3 years, 9 months ago (2017-03-15 07:24:16 UTC) #19

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-15 07:26:26 UTC) #20

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/builds/14726) v8_linux_arm_rel_ng on ...

3 years, 9 months ago (2017-03-15 07:26:27 UTC) #21

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-15 07:36:24 UTC) #23

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/80001

3 years, 9 months ago (2017-03-15 07:36:31 UTC) #24

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-15 07:39:41 UTC) #25

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_win64_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win64_rel_ng/builds/24134) v8_win_nosnap_shared_rel_ng on ...

3 years, 9 months ago (2017-03-15 07:39:42 UTC) #26

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-15 07:46:25 UTC) #27

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/100001

3 years, 9 months ago (2017-03-15 07:46:33 UTC) #28

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-15 07:51:02 UTC) #29

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/builds/14730)

3 years, 9 months ago (2017-03-15 07:51:03 UTC) #30

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-15 07:52:16 UTC) #31

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/120001

3 years, 9 months ago (2017-03-15 07:52:21 UTC) #32

bradn

bradnelson@google.com changed reviewers: + bradnelson@google.com

3 years, 9 months ago (2017-03-15 07:53:00 UTC) #33

bradn

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#newcode57 src/asmjs/asm-lexer.cc:57: #if 0 On 2017/03/14 13:36:37, vogelheim wrote: > Please ...

3 years, 9 months ago (2017-03-15 07:53:04 UTC) #34

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc
File src/asmjs/asm-lexer.cc (right):

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:57: #if 0
On 2017/03/14 13:36:37, vogelheim wrote:
> Please don't do this.

Changed to a trace flag.
This ends up being useful to figure out context when parsing falls over.
Some of the asm.js programs out there like the unity benchmark don't have the
asm.js source sitting in a file, but instead decompress it at runtime and eval
it (so line numbers etc don't help much).

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:73: token_t ch = stream_->Advance();
On 2017/03/14 13:36:37, vogelheim wrote:
> (Here & below.) Using token_t for individual characters seems kinda weird,
> particularly given that token_t value ranges seem to have very specific meaning
> which is mostly not related to the unicode code point at that number.

Switched all of these inside to uc32.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:74: if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') {
On 2017/03/14 18:00:47, Karl wrote:
> Would a switch statement be cleaner here?

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:80: } else if (ch == kEndOfInput) {
On 2017/03/14 11:11:47, marja wrote:
> (general comment) The structure is getting a bit confusing... it's hard to keep
> track of which ifs / elses we're in.
> 
> Adding some comments explaining which construct we're currently parsing could
> help; now the reader needs to read a couple of if's deep to even figure that out
> :)

Decomposed into more functions, hope that helps.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:82: break;
On 2017/03/14 11:11:47, marja wrote:
> Why break, why not return? (Now it's not trivial to see, when we're somewhere
> deeper, that we actually don't do anything more in this func.)

Redone in more functions, avoids the break.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:83: } else if (ch < 32 || ch >= 127) {
On 2017/03/14 18:00:47, Karl wrote:
> If you use a switch statement, either explicitly enumerate, or else put in
> default clause.

Done.
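
The switch-with-default shape suggested above might look roughly like the following. This is only a sketch: `CharClass`, `Classify`, and `kEndOfInputSketch` are hypothetical names chosen for illustration, not the ones used in the CL, and the end-of-input sentinel value is an assumption.

```cpp
#include <cstdint>

// Hypothetical character classes; not the names used in the CL.
enum class CharClass { kWhitespace, kEndOfInput, kInvalid, kOther };

// Assumed sentinel for end of input, standing in for kEndOfInput.
constexpr int32_t kEndOfInputSketch = -1;

CharClass Classify(int32_t ch) {
  switch (ch) {
    case ' ':
    case '\t':
    case '\n':
    case '\r':
      return CharClass::kWhitespace;
    case kEndOfInputSketch:
      return CharClass::kEndOfInput;
    default:
      // Mirrors the original `ch < 32 || ch >= 127` rejection of anything
      // outside printable ASCII; everything else is handled by the caller.
      if (ch < 32 || ch >= 127) return CharClass::kInvalid;
      return CharClass::kOther;
  }
}
```

The explicit default keeps the range test in one place, so every character falls into exactly one class.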

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:89: const char* use_asm = "use asm";
On 2017/03/14 18:00:47, Karl wrote:
> Should this be a constexpr?

Changed round.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:92: token_t och = stream_->Advance();
On 2017/03/14 13:36:37, vogelheim wrote:
> och ? [here & below]

Renamed and refactored.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:100: if (och != ch) {
On 2017/03/14 11:11:47, marja wrote:
> Lost here... what's this?
> 
> Ahh, it's checking the ending quote.

renamed variable to highlight that.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:121: if (ch == '/') {
On 2017/03/14 13:36:37, vogelheim wrote:
> +1 to Marja's comments. 
> 
> Also, would this work on /* .... **/ ? I take it the #118 would consume the
> first '*', #120 the second '*', and then the next loop iteration would proceed
> with the '/' and not recognize it as terminating the comment.

Yeah, this was wrong. Factored into a function and fixed, and a test added.
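
For reference, one way to handle the `/* .... **/` case described above is to remember whether the previous character was a '*' instead of consuming two characters per step. The sketch below works on a std::string rather than the real character stream, and `SkipBlockComment` is a hypothetical name, so it only illustrates the idea and is not the code in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index just past the closing "*/" of the block comment whose
// opening "/*" starts at `pos`, or std::string::npos if it never terminates.
std::size_t SkipBlockComment(const std::string& src, std::size_t pos) {
  assert(src.compare(pos, 2, "/*") == 0);  // Caller must point at "/*".
  bool prev_star = false;
  for (std::size_t i = pos + 2; i < src.size(); ++i) {
    const char ch = src[i];
    if (ch == '/' && prev_star) return i + 1;
    // A run of stars keeps prev_star set, so "**/" still terminates.
    prev_star = (ch == '*');
  }
  return std::string::npos;
}
```

Because the scan starts after the opening "/*", an input such as "/*/" is correctly treated as unterminated.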

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:122: if (ch == '*') {
On 2017/03/14 11:11:47, marja wrote:
> if ch == '/' on the line above, it cannot be '*' here.

Oops, fixed.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:125: break;
On 2017/03/14 11:11:47, marja wrote:
> I'm lost here anyway, what's this block, what are we parsing here? Guessing that
> /* comment */ but I don't understand how this is supposed to work.

This was meant to back up if you saw a * inside a /* */, but it was wrong, fixed
now (and added a test).

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:132: } else {
On 2017/03/14 11:11:47, marja wrote:
> No idea here anymore which if this else associates with.

Restructured, should be more clear now.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:159: token_t ooch = stream_->Advance();
On 2017/03/14 13:36:37, vogelheim wrote:
> ooch ?

Hah, terrible name, sorry, dropped variable completely as it's more clear in
context now.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:173: ch == '_' || ch == '$') {
On 2017/03/14 13:36:37, vogelheim wrote:
> Could you introduce helper functions for the character classes you're checking
> here (and below)?

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:181: break;
On 2017/03/14 11:11:47, marja wrote:
> Why not
> 
> while(ch >= ...) {
>   name_ += ch;
>   ch = stream_->Advance();
> }
> 
> it's less nested.

Done.
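
Taken together, the character-class helpers and the flatter while loop requested in the two comments above could be sketched like this. The helper names and the string-plus-index interface are assumptions for illustration; the real lexer reads from a character stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for the character classes tested inline in the CL.
bool IsIdentifierStart(int ch) {
  return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == '_' ||
         ch == '$';
}

bool IsIdentifierPart(int ch) {
  return IsIdentifierStart(ch) || (ch >= '0' && ch <= '9');
}

// Consumes an identifier starting at *pos and returns it. Returns an empty
// string and leaves *pos untouched if no identifier starts there.
std::string ScanIdentifier(const std::string& src, std::size_t* pos) {
  assert(pos != nullptr);
  std::string name;
  if (*pos >= src.size() || !IsIdentifierStart(src[*pos])) return name;
  while (*pos < src.size() && IsIdentifierPart(src[*pos])) {
    name += src[*pos];
    ++*pos;
  }
  return name;
}
```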

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:189: break;
On 2017/03/14 11:11:47, marja wrote:
> E.g., here it would be less confusing to use return instead of break.... This is
> quite nested and far away from the loop at the top.

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:199: if (!local_) {
On 2017/03/14 11:11:47, marja wrote:
> What's local_?

Renamed.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:226: if ((ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') ||
On 2017/03/14 11:11:47, marja wrote:
> Would it be feasible to have a helper function for scanning a number, which both
> Scanner and this asm scanner could use?

I've added a TODO to do this.
Might require some care.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:254: double_value_ = strtod(name_.c_str(), &end);
On 2017/03/14 13:36:37, vogelheim wrote:
> strtod may depend on the current locale. Are you really sure that's what you
> want?
> 
> (I take it newer standards specify strtod as being "C"-locale only, but I'm not
> sure we can rely on that.)

Yeah, it's a fair point these probably aren't ideal (and might not even be that
fast).
I've added a TODO to share code with scanner.cc
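
To make the locale concern concrete, here is a standard-library-only sketch of a parse that ignores the process-wide locale by imbuing the stream with the classic "C" locale. It is not what the CL does (the plan stated above is to share code with scanner.cc), and `ParseDoubleClassicLocale` is a hypothetical name.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Parses `text` as a double using the classic "C" locale, regardless of any
// setlocale() call made elsewhere. Fails unless the whole string is consumed.
bool ParseDoubleClassicLocale(const std::string& text, double* out) {
  assert(out != nullptr);
  std::istringstream in(text);
  in.imbue(std::locale::classic());
  double value = 0;
  if (!(in >> value)) return false;
  // Any leftover character means the literal was not a complete number.
  if (in.peek() != std::char_traits<char>::eof()) return false;
  *out = value;
  return true;
}
```

Stream extraction is unlikely to be fast enough for a lexer hot path, so this is meant only to show the behaviour one would want, not a recommended implementation.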

The more I look at scanner.cc, the more I've wondered if it might have been
better to reuse it (it seems to have pretty reasonable performance choices). Are
there any aspects of it likely to be much more expensive than what this scanner
is doing?

I had wanted to keep concerns separated as much as possible, but there's a good
bit of duplication (especially if I start using UnicodeCache).

I'm tempted to make the substitution now, but also worry that will churn the
parser a good bit.
If you guys are ok with it, I'm thinking of this for the moment, then
refactoring it to reuse scanner to check that performance is ok?

By the way, thanks for the reviews, sorry there's so much at once.
Normally I'd have proceeded incrementally, but the point was to prove out that
there's meaningful performance win, so I needed to have something complete
enough to run a sizable program.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:263: unsigned_value_ = static_cast<uint32_t>(double_value_);
On 2017/03/14 11:11:47, marja wrote:
> Why strtod if it's guaranteed to be an integer (no dot)?

Asm.js uses 1e2 for 100 (as an integer :-)
Added a comment.
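
A small sketch of why the dot-less case still goes through a floating-point parse: a literal such as 1e2 has no '.', yet denotes the integer 100. The function below is hypothetical, and the integrality and range checks are an assumption about what a careful conversion would do; they are not taken from the patch. It keeps strtod to mirror the line under review, so the locale caveat raised earlier in this review still applies.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Converts a literal without a '.' (for example "1e2") into a uint32_t.
// Returns false if the text is not fully consumed, is not a whole number,
// or does not fit in 32 unsigned bits.
bool ParseUnsignedLiteral(const char* text, uint32_t* out) {
  assert(text != nullptr && out != nullptr);
  char* end = nullptr;
  const double value = std::strtod(text, &end);
  if (end == text || *end != '\0') return false;
  // Written so that NaN also fails the range test.
  if (!(value >= 0 && value <= 4294967295.0)) return false;
  if (std::floor(value) != value) return false;
  *out = static_cast<uint32_t>(value);
  return true;
}
```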

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:267: if (end != name_.c_str() + name_.size()) {
On 2017/03/14 13:36:37, vogelheim wrote:
> I'm confused. When does this happen?

When a number failed to parse, added a comment + example in the code.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:268: // Handle mistaken parse of '.' as number.
On 2017/03/14 11:11:47, marja wrote:
> How does this relate to the "Pick out dot" above?

Reworded.
The idea here is that if the number parse doesn't consume all the characters,
something went wrong, except if the first character is a '.', in which case we
back up and emit just the '.' token.
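
The rule described above (full consumption means a number, a leading '.' means back up and emit the dot, anything else is an error) can be sketched as follows. `NumberScan` and `ScanNumberOrDot` are hypothetical names, and the sketch takes the already-collected characters as a string instead of rewinding a stream.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class NumberScan { kNumber, kDot, kError };

// `text` holds the number-like characters collected so far (never empty).
NumberScan ScanNumberOrDot(const std::string& text, double* value) {
  assert(!text.empty() && value != nullptr);
  char* end = nullptr;
  *value = std::strtod(text.c_str(), &end);
  // The parse is only valid if it consumed every collected character.
  if (end == text.c_str() + text.size()) return NumberScan::kNumber;
  // Otherwise a leading '.' was a member access, not a number: the caller
  // would back up and emit just the '.' token.
  if (text[0] == '.') return NumberScan::kDot;
  return NumberScan::kError;
}
```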

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:292: rewind_ = true;
On 2017/03/14 13:36:37, vogelheim wrote:
> This doesn't update name_. Is this intentional?

Clearing it for good measure here (didn't want to pay for it each turn of
Next()).

Added comments about Rewind + IdentifierString not working together.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:292: rewind_ = true;
On 2017/03/14 13:36:37, vogelheim wrote:
> This doesn't update preceeded_by_newline_. Is this intentional?

Clobbering for good measure here, also commented about this limitation.
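
The limitation discussed in the two comments above can be illustrated with a toy one-step rewind. Everything here is hypothetical (the class, its members, and the use of strings as tokens); it only shows why per-token side data such as the identifier string cannot be trusted after a rewind and is therefore cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class OneTokenRewind {
 public:
  explicit OneTokenRewind(std::vector<std::string> words)
      : words_(std::move(words)) {}

  // Advances to the next word; an empty string marks end of input.
  void Next() {
    preceding_ = current_;
    if (rewind_) {
      // Replay the token stashed by Rewind() instead of scanning again.
      // The identifier string is NOT restored, which is the limitation.
      current_ = stashed_;
      rewind_ = false;
      return;
    }
    current_ = index_ < words_.size() ? words_[index_++] : std::string();
    identifier_ = current_;
  }

  // Steps back exactly one token.
  void Rewind() {
    assert(!rewind_);  // Only a single step back is supported.
    stashed_ = current_;
    current_ = preceding_;
    preceding_.clear();
    identifier_.clear();  // Clobbered rather than left stale.
    rewind_ = true;
  }

  const std::string& Current() const { return current_; }
  const std::string& Identifier() const { return identifier_; }

 private:
  std::vector<std::string> words_;
  std::size_t index_ = 0;
  std::string current_, preceding_, stashed_, identifier_;
  bool rewind_ = false;
};
```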

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:301: chname[0] = static_cast<char>(token);
On 2017/03/14 13:36:37, vogelheim wrote:
> chname[1] = '\0' ??

Done.
Whoops.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:304: for (auto i = local_names_.begin(); i != local_names_.end(); ++i) {
On 2017/03/14 13:36:37, vogelheim wrote:
> style nitpick: I'd use the for(auto& i : local_names_) for a vanilla iteration
> that can be expressed that way.

Done.
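
The two fixes above (terminating the one-character buffer and using a range-based for) might combine into something like the sketch below. `NameOf`, the map type, the printable-ASCII range, and the "{unknown}" fallback are assumptions for illustration only.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using token_t = int32_t;

// Returns a printable name for `token`: single-character tokens print as
// themselves, named tokens are looked up by value in `names`.
std::string NameOf(token_t token,
                   const std::unordered_map<std::string, token_t>& names) {
  if (token >= 32 && token < 127) {
    char chname[2];
    chname[0] = static_cast<char>(token);
    chname[1] = '\0';  // Without the terminator, reading chname is undefined.
    return chname;
  }
  for (const auto& entry : names) {
    if (entry.second == token) return entry.first;
  }
  return "{unknown}";
}
```

The linear reverse lookup is acceptable here on the assumption that this path is only used for tracing and error messages.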

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h
File src/asmjs/asm-lexer.h (right):

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:12: #include "src/objects.h"
On 2017/03/14 11:11:47, marja wrote:
> Nit: You shouldn't need objects.h and objects-inl.h, forward declarations should
> be enough. You need handles.h though.

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:13: #include "src/parsing/scanner.h"
On 2017/03/14 11:11:47, marja wrote:
> Can you split out the stream from scanner.h so that you don't need to include
> scanner.h?

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:13: #include "src/parsing/scanner.h"
On 2017/03/14 13:36:37, vogelheim wrote:
> Forward declare v8::internal::Utf16CharacterStream ?

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:20: typedef intptr_t token_t;
On 2017/03/14 13:36:38, vogelheim wrote:
> Why intptr_t? Is it intentional that the token range is platform dependent?
> (32/64b)

Not particularly, other than that the string table size could in principle push
things up to container size. Should never be that large in practice. Changed to
int32_t and added overflow checks.
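
An overflow check of the kind mentioned above could look like this when a table index is mapped onto a 32-bit token. The function name and the idea of offsetting from a first token value are assumptions; the patch may check overflow differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

using token_t = int32_t;

// Maps the `index`-th entry of a name table to a token value starting at
// `first_token`. Returns false instead of overflowing token_t.
bool TokenForIndex(std::size_t index, token_t first_token, token_t* out) {
  assert(first_token >= 0 && out != nullptr);
  const std::size_t max =
      static_cast<std::size_t>(std::numeric_limits<token_t>::max());
  // Compare in the unsigned domain so the check itself cannot overflow.
  if (index > max - static_cast<std::size_t>(first_token)) return false;
  *out = static_cast<token_t>(first_token + static_cast<token_t>(index));
  return true;
}
```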

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:22: explicit AsmJsLexer(Isolate* isolate, Handle<Script> script, int start,
On 2017/03/14 13:36:38, vogelheim wrote:
> nitpick: Drop explicit with multi-arg constructor.

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:25: const std::string& name() const { return name_; }
On 2017/03/14 13:36:37, vogelheim wrote:
> name() / name_ appears to be the current token buffer. If so, it should have a
> different name.

Divided up its uses, renamed the public part to IdentifierString()

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:25: const std::string& name() const { return name_; }
On 2017/03/14 13:36:37, vogelheim wrote:
> What is the intended use of name()?
> 
> I take it the point of this lexer implementation was that all identifiers are
> assigned to unique token value so that the parser won't have to consider them.

So the vast majority of places just the token id can be used.
However in a few places (exports, imports), the string of the name is used to
decide names in the generated code. But you know there from context you'll need
it as you parse, so no need to carry it around with each token in general.
I suppose I could make lookup more efficient, but figured just keeping the last
one works too.
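
As a rough illustration of the scheme described above, identifiers can be interned to integer tokens while only the most recent spelling is kept around for the few callers (imports, exports) that need it. `IdentifierTable` and the starting id of 256 are hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>

using token_t = int32_t;

class IdentifierTable {
 public:
  // Returns the token for `name`, assigning a fresh one on first sight.
  token_t Intern(const std::string& name) {
    last_identifier_ = name;
    auto it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    assert(next_id_ < std::numeric_limits<token_t>::max());
    const token_t id = next_id_++;
    ids_.emplace(name, id);
    return id;
  }

  // Spelling of the most recently interned identifier only.
  const std::string& LastIdentifier() const { return last_identifier_; }

 private:
  std::unordered_map<std::string, token_t> ids_;
  token_t next_id_ = 256;  // Assumed to sit above single-character tokens.
  std::string last_identifier_;
};
```

After interning, equality of two identifiers is a plain integer comparison, which is the property the parser relies on.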

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:26: void SetLocalScope(bool local) { local_ = local; }
On 2017/03/14 13:36:38, vogelheim wrote:
> What does this do? (I mean, what's a local scope, and how does lexing for a
> local scope differ?)

Renamed, commented, and clarified.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:31: int position() const;
On 2017/03/14 18:00:47, Karl wrote:
> Why is this method lower case when other methods start with a capital?

Changed.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:54: enum {
On 2017/03/14 13:36:37, vogelheim wrote:
> This enum has a lot of implied structure which is relied on in other methods,
> but isn't documented anywhere. You may want to add a comment explaining the value
> ranges and their meaning/properties.

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:63: LONG_SYMBOL_NAME_LIST(V)
On 2017/03/14 13:36:38, vogelheim wrote:
> formatting nitpick: The indent is funky. Slightly rewrite (I think empty lines
> will yield a different result) or consider guiding clang-format w/ '//
> clang-format on|off' comments.

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.h#n...
src/asmjs/asm-lexer.h:78: token_t last_token_;
On 2017/03/14 13:36:37, vogelheim wrote:
> last_token_ -> preceding_token_?
> 
> (I take it it's the token we processed before the current_token_, rather than
> the last token in the stream.)

Yes.
Renamed.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h
File src/asmjs/asm-names.h (right):

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h#n...
src/asmjs/asm-names.h:18: #define STDLIB_MATH_FUNCTION_MONOMORPHIC_LIST(V) \
On 2017/03/14 18:00:47, Karl wrote:
> Consider adding a comment describing what each parameter holds?
> 
> This also applies to the V macros below.

Done.

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-names.h#n...
src/asmjs/asm-names.h:74: V(return )                 \
On 2017/03/14 11:11:48, marja wrote:
> Nit: extra space?

Done.

https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-...
File test/cctest/asmjs/test-asm-lexer.cc (right):

https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-...
test/cctest/asmjs/test-asm-lexer.cc:1: // Copyright 2016 the V8 project authors. All rights reserved.
On 2017/03/14 13:36:38, vogelheim wrote:
> nitpick: 2017

Done.

https://codereview.chromium.org/2751693002/diff/40001/test/cctest/asmjs/test-...
test/cctest/asmjs/test-asm-lexer.cc:1: // Copyright 2016 the V8 project authors. All rights reserved.
On 2017/03/14 13:36:38, vogelheim wrote:
> Please consider using unittests/.., rather than cctest/...
> 
> In this case, this would allow you to use test fixtures (TEST_F), which would
> allow you to replace the rather gratuitous use of macros with a test fixture +
> regular methods.

Done.
Thanks for the suggestion!

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-15 07:55:51 UTC) #35

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_win_compile_dbg on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_compile_dbg/builds/34188)

3 years, 9 months ago (2017-03-15 07:55:53 UTC) #36

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-15 08:03:43 UTC) #37

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/140001

3 years, 9 months ago (2017-03-15 08:03:53 UTC) #38

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-15 08:10:43 UTC) #39

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_win_compile_dbg on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_compile_dbg/builds/34191)

3 years, 9 months ago (2017-03-15 08:10:44 UTC) #40

vogelheim

lgtm Generally looks good. I think the strtod issue is a real bug, though. https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc ...

3 years, 9 months ago (2017-03-15 12:07:41 UTC) #41

vogelheim

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#newcode254 src/asmjs/asm-lexer.cc:254: double_value_ = strtod(name_.c_str(), &end); On 2017/03/15 07:53:03, bradn wrote: ...

3 years, 9 months ago (2017-03-15 12:10:11 UTC) #42

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc
File src/asmjs/asm-lexer.cc (right):

https://codereview.chromium.org/2751693002/diff/40001/src/asmjs/asm-lexer.cc#...
src/asmjs/asm-lexer.cc:254: double_value_ = strtod(name_.c_str(), &end);
On 2017/03/15 07:53:03, bradn wrote:
> The more I look at scanner.cc, the more I've wondered if it might have been
> better to reuse it (it seems to have pretty reasonable performance choices). Are
> there any aspects of it likely to be much more expensive than what this scanner
> is doing?
> 
> I had wanted to keep concerns separated as much as possible, but there's a good
> bit of duplication (especially if I start using UnicodeCache).
> 
> I'm tempted to make the substitution now, but also worry that will churn the
> parser a good bit.
> If you guys are ok with it, I'm thinking of this for the moment, then
> refactoring it to reuse scanner to check that performance is ok?

My gut feeling is that parser & scanner are tightly coupled, and we should keep
this work + existing parser either entirely separate or entirely integrated.

Performance considerations: I suspect that for the asm.js case, both scanner &
parser would pretty much always stay in the 'fast paths', and a re-write of
those parts won't actually buy us that much overall.

Note that what we call the 'parser' is really the parser + AST builder, and what
we call the 'preparser' is the full parser + an 'empty backend'. The AST
building is what I think you're trying to get rid of, and I indeed can't see any
value in it for the asm.js-to-wasm conversion.

I don't think the parsing part of the parser is actually that slow: A rough
performance estimate of preparser-vs-parser is 4:1, meaning that ~75% of "parse"
time is actually spent in the AST. If you were to implement a
  class AsmJsParser : public ParserBase<AsmJsParser> {...}
similar to the PreParser, I'd expect the resulting parsing speed to be roughly
comparable to the pre-parser, for an approximate 4x performance win.
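
The ParserBase-style arrangement suggested above is the curiously recurring template pattern: the shared parsing loop lives in the base, and the derived class decides what, if anything, to build. The toy below uses hypothetical names and a trivial ';'-separated "grammar"; it has nothing to do with the real ParserBase interface and only shows the shape of an empty backend next to a building one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Shared parsing logic, parameterised on the backend via CRTP.
template <typename Impl>
class ParserBaseSketch {
 public:
  // Splits `src` into ';'-terminated statements, reporting each to the
  // backend, and returns how many were seen.
  int ParseStatements(const std::string& src) {
    int count = 0;
    std::size_t start = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
      if (src[i] != ';') continue;
      impl()->OnStatement(src.substr(start, i - start));
      start = i + 1;
      ++count;
    }
    return count;
  }

 private:
  Impl* impl() { return static_cast<Impl*>(this); }
};

// Preparser-like backend: parses but builds nothing.
class EmptyBackend : public ParserBaseSketch<EmptyBackend> {
 public:
  void OnStatement(const std::string&) {}
};

// Parser-like backend: records every statement, standing in for an AST.
class RecordingBackend : public ParserBaseSketch<RecordingBackend> {
 public:
  void OnStatement(const std::string& text) { statements.push_back(text); }
  std::vector<std::string> statements;
};
```

On the numbers quoted above: if the full parser takes four units of time where the preparser takes one, the extra three units are three quarters of the total, which is where the ~75% figure for AST building comes from.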

In fairness: ParserBase wasn't really meant to be used that way, so 1, it's a
huge amount of boilerplate to re-use it and 2, there'd almost certainly be some
corner cases we didn't think of, which you'd stumble over and which would
require additional work to resolve. Also, 3, I think you're already quite
heavily invested into the idea of having a separate parser, so I'm not sure this
is really up for consideration.

One more thing: Notice that the current parser/lexer already performs similar
tricks to yours, but somewhat differently: There are all-ASCII fast paths.
Identifiers are being de-duped & internalized in the AstValueFactory, rather
than in the Scanner itself, but in both cases you end up with one insertion into
a table and then just comparing the int/pointer to do equality comparison.

marja

More comments; I didn't have a detailed look but I think I don't need to, ...

3 years, 9 months ago (2017-03-15 12:34:50 UTC) #43

Karl

https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc#newcode163 src/asmjs/asm-lexer.cc:163: return i.first.c_str(); Why not just: return i.first; https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc#newcode168 src/asmjs/asm-lexer.cc:168: ...

3 years, 9 months ago (2017-03-15 15:04:13 UTC) #44

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-16 00:21:12 UTC) #45

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/160001

3 years, 9 months ago (2017-03-16 00:21:20 UTC) #46

bradn

PTAL https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc#newcode9 src/asmjs/asm-lexer.cc:9: #include "src/objects.h" On 2017/03/15 12:34:49, marja wrote: > ...

3 years, 9 months ago (2017-03-16 00:21:47 UTC) #47

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-16 00:22:44 UTC) #48

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/180001

3 years, 9 months ago (2017-03-16 00:22:50 UTC) #49

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-16 00:24:31 UTC) #50

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/200001

3 years, 9 months ago (2017-03-16 00:24:40 UTC) #51

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-16 00:28:50 UTC) #52

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/builds/14831)

3 years, 9 months ago (2017-03-16 00:28:51 UTC) #53

vogelheim

lgtm - except for the thing in parsing/scanner.cc; I don't get that. The bots appear ...

3 years, 9 months ago (2017-03-16 12:46:48 UTC) #54

Karl

lgtm. The code looks good, except for the small issues noted by vogelheim.

3 years, 9 months ago (2017-03-16 15:33:16 UTC) #55

bradnelson

The CQ bit was checked by bradnelson@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-16 17:02:25 UTC) #56

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/220001

3 years, 9 months ago (2017-03-16 17:02:27 UTC) #57

bradn

https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.cc File src/asmjs/asm-scanner.cc (right): https://codereview.chromium.org/2751693002/diff/200001/src/asmjs/asm-scanner.cc#newcode333 src/asmjs/asm-scanner.cc:333: if (ch == kEndOfInput) { On 2017/03/16 12:46:47, vogelheim ...

3 years, 9 months ago (2017-03-16 17:03:16 UTC) #58

marja

https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc File src/asmjs/asm-lexer.cc (right): https://codereview.chromium.org/2751693002/diff/140001/src/asmjs/asm-lexer.cc#newcode11 src/asmjs/asm-lexer.cc:11: #include "src/parsing/scanner.h" On 2017/03/16 00:21:47, bradn wrote: > On ...

3 years, 9 months ago (2017-03-16 17:05:33 UTC) #59

bradn

The patchset sent to the CQ was uploaded after l-g-t-m from kschimpf@google.com, vogelheim@chromium.org Link to ...

3 years, 9 months ago (2017-03-16 17:09:31 UTC) #62

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/220001

3 years, 9 months ago (2017-03-16 17:09:39 UTC) #63

bradn

The patchset sent to the CQ was uploaded after l-g-t-m from kschimpf@google.com, vogelheim@chromium.org Link to ...

3 years, 9 months ago (2017-03-16 17:13:04 UTC) #65

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/240001

3 years, 9 months ago (2017-03-16 17:13:11 UTC) #66

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-16 17:15:40 UTC) #67

commit-bot: I haz the power

Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/builds/14918) v8_linux64_rel_ng on master.tryserver.v8 (JOB_FAILED, ...

3 years, 9 months ago (2017-03-16 17:15:41 UTC) #68

bradn

The patchset sent to the CQ was uploaded after l-g-t-m from kschimpf@google.com, vogelheim@chromium.org Link to ...

3 years, 9 months ago (2017-03-16 17:21:54 UTC) #70

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2751693002/260001

3 years, 9 months ago (2017-03-16 17:22:02 UTC) #71

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 260001, "attempt_start_ts": 1489684914090050, "parent_rev": "18c77ce51bcfd20f7821aad581749f7c15fb550a", "commit_rev": "4c3217e13292a3aab056b5800678754c8dac8bfe"}

3 years, 9 months ago (2017-03-16 18:10:46 UTC) #72

commit-bot: I haz the power

Description was changed from ========== [wasm][asm.js] Adding custom asm.js lexer. Adding a custom lexer for ...

3 years, 9 months ago (2017-03-16 18:10:53 UTC) #73

commit-bot: I haz the power

Committed patchset #13 (id:260001) as https://chromium.googlesource.com/v8/v8/+/4c3217e13292a3aab056b5800678754c8dac8bfe

3 years, 9 months ago (2017-03-16 18:10:54 UTC) #74

marja

Some post-commit stuff I noticed when reviewing the parser part.. https://codereview.chromium.org/2751693002/diff/260001/src/asmjs/asm-scanner.h File src/asmjs/asm-scanner.h (right): https://codereview.chromium.org/2751693002/diff/260001/src/asmjs/asm-scanner.h#newcode100 ...

3 years, 9 months ago (2017-03-20 14:29:24 UTC) #75

Message was sent while issue was closed.

I meant: https://codereview.chromium.org/2769013002

Issue 2751693002: [wasm][asm.js] Adding custom asm.js lexer. (Closed)


Patch Set 1 #

Patch Set 2 : fix #

Patch Set 3 : fix warning #

Patch Set 4 : revised #

Patch Set 5 : drop unused #

Patch Set 6 : fix warning #

Patch Set 7 : check #

Patch Set 8 : fix #

Patch Set 9 : i #

Patch Set 10 : fix #

Patch Set 11 : fix #

Patch Set 12 : fix #

Patch Set 13 : fix #
