|
|
Created:
6 years ago by arv (Not doing code reviews) Modified:
6 years ago CC:
v8-dev Base URL:
https://chromium.googlesource.com/v8/v8.git@master Project:
v8 Visibility:
Public. |
DescriptionSimplify template literal raw string creation
BUG=v8:3710
LOG=Y
R=dslomov@chromium.org, marja@chromium.org
Patch Set 1 #
Total comments: 8
Patch Set 2 : Capture the raw characters in the scanner #Patch Set 3 : Add DCHECK in PushBack and remove some unused functions #
Total comments: 1
Patch Set 4 : git rebase #Patch Set 5 : Reduce length in PushBack instead #
Total comments: 1
Patch Set 6 : Add harmony unicode test #
Messages
Total messages: 38 (5 generated)
PTAL Instead of using an UTF8 encoder use a StringCharacterStream.
On 2014/12/01 19:48:53, arv wrote: > PTAL > > Instead of using an UTF8 encoder use a StringCharacterStream. few comments, but it looks good, much simpler.
caitpotter88@gmail.com changed reviewers: + caitpotter88@gmail.com
https://codereview.chromium.org/768203002/diff/1/src/parser.cc File src/parser.cc (right): https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5314 src/parser.cc:5314: for (int from_index = 0; from_index < length && stream.HasMore(); are there cases where the character stream would not have enough characters (eg stream.HasMore() returns false)? It seems like it should always be available, but maybe not. https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5326 src/parser.cc:5326: normalized_raw_chars[to_index++] = ch; move `last_was_cr = false;` here I think? (then can remove the `continue` also)
dslomov@chromium.org changed reviewers: - caitpotter88@gmail.com
I have one q: why length computation is correct? https://codereview.chromium.org/768203002/diff/1/src/parser.cc File src/parser.cc (right): https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5314 src/parser.cc:5314: for (int from_index = 0; from_index < length && stream.HasMore(); Can you assert that at the end of this loop, always !stream.HasMore() (I believe that should be true if this algo is correct?)
On 2014/12/01 19:58:05, Dmitry Lomov (chromium) wrote: > I have one q: why length computation is correct? > > https://codereview.chromium.org/768203002/diff/1/src/parser.cc > File src/parser.cc (right): > > https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5314 > src/parser.cc:5314: for (int from_index = 0; from_index < length && > stream.HasMore(); > Can you assert that at the end of this loop, always !stream.HasMore() (I believe > that should be true if this algo is correct?) Amazing midair comment collision :)
https://codereview.chromium.org/768203002/diff/1/src/parser.cc File src/parser.cc (right): https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5311 src/parser.cc:5311: StringCharacterStream stream(String::cast(script()->source()), span_start); Oops; this problem was already present before this CL: We might not have the source string yet. In the streaming parsing case, we don't have it. :/ Can we get the string from somewhere else? Can we delay constructing the raw strings until they're really needed? (I probably thought that they're constructed on demand, not parse time, when I originally thought this would work...)
caitpotter88@gmail.com changed reviewers: + caitpotter88@gmail.com
https://codereview.chromium.org/768203002/diff/1/src/parser.cc File src/parser.cc (right): https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5311 src/parser.cc:5311: StringCharacterStream stream(String::cast(script()->source()), span_start); On 2014/12/01 20:01:34, marja wrote: > Oops; this problem was already present before this CL: We might not have the > source string yet. In the streaming parsing case, we don't have it. :/ > > Can we get the string from somewhere else? Can we delay constructing the raw > strings until they're really needed? (I probably thought that they're > constructed on demand, not parse time, when I originally thought this would > work...) I see. What happens is, we are setting up some array literals to pass to GetTemplateCallSite(), which will use them to produce the objects it needs --- perhaps we could pass the offsets to that function instead, and use a runtime function to fetch the strings? But the trouble with this is that it would always be a runtime operation, and we'd have to do it every single time a tagged template is invoked.... An alternative solution would be to ensure that source is available, somehow...
Parsing can be done on a background thread, so we cannot access anything on the heap (Handle<String>s) anyway :( So, like you say, we could store only offsets here and then do the raw string construction run-time... what do you mean by "every single time a tagged template is invoked"; would we need to do it in cases where the user doesn't access t.raw too?
On 2014/12/01 20:11:59, marja wrote: > Parsing can be done on a background thread, so we cannot access anything on the > heap (Handle<String>s) anyway :( > > So, like you say, we could store only offsets here and then do the raw string > construction run-time... what do you mean by "every single time a tagged > template is invoked"; would we need to do it in cases where the user doesn't > access t.raw too? But the user will. I think the right solution is for scanner to generate raw strings in addition to cooked strings. (My understanding is that you cannot recover a raw string from a cooked string, right?)
https://codereview.chromium.org/768203002/diff/1/src/parser.cc File src/parser.cc (right): https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5311 src/parser.cc:5311: StringCharacterStream stream(String::cast(script()->source()), span_start); On 2014/12/01 20:05:15, caitp wrote: > On 2014/12/01 20:01:34, marja wrote: > > Oops; this problem was already present before this CL: We might not have the > > source string yet. In the streaming parsing case, we don't have it. :/ > > > > Can we get the string from somewhere else? Can we delay constructing the raw > > strings until they're really needed? (I probably thought that they're > > constructed on demand, not parse time, when I originally thought this would > > work...) > > I see. > > What happens is, we are setting up some array literals to pass to > GetTemplateCallSite(), which will use them to produce the objects it needs --- > perhaps we could pass the offsets to that function instead, and use a runtime > function to fetch the strings? > > But the trouble with this is that it would always be a runtime operation, and > we'd have to do it every single time a tagged template is invoked.... > > An alternative solution would be to ensure that source is available, somehow... We'll have to think a bit more about this. The source code up to the end of the template literal has been parsed before we get here. Maybe we should just get the source from the Scanner here. https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5314 src/parser.cc:5314: for (int from_index = 0; from_index < length && stream.HasMore(); On 2014/12/01 19:58:05, Dmitry Lomov (chromium) wrote: > Can you assert that at the end of this loop, always !stream.HasMore() (I believe > that should be true if this algo is correct?) The stream is on the entire script so it will most likely have more characters. https://codereview.chromium.org/768203002/diff/1/src/parser.cc#newcode5326 src/parser.cc:5326: normalized_raw_chars[to_index++] = ch; On 2014/12/01 19:57:36, caitp wrote: > move `last_was_cr = false;` here I think? (then can remove the `continue` also) We need to reset it even if last_was_cr was true. I tried a different forms for this logic and none of them turned out nice.
On 2014/12/01 20:15:33, Dmitry Lomov (chromium) wrote: > On 2014/12/01 20:11:59, marja wrote: > > Parsing can be done on a background thread, so we cannot access anything on > the > > heap (Handle<String>s) anyway :( > > > > So, like you say, we could store only offsets here and then do the raw string > > construction run-time... what do you mean by "every single time a tagged > > template is invoked"; would we need to do it in cases where the user doesn't > > access t.raw too? > > But the user will. > I think the right solution is for scanner to generate raw strings in addition to > cooked strings. > (My understanding is that you cannot recover a raw string from a cooked string, > right?) Right. But the scanner has has access to the underlying string. I'll look into getting the data straight from the scanner instead.
On 2014/12/01 20:20:12, arv wrote: > On 2014/12/01 20:15:33, Dmitry Lomov (chromium) wrote: > > On 2014/12/01 20:11:59, marja wrote: > > > Parsing can be done on a background thread, so we cannot access anything on > > the > > > heap (Handle<String>s) anyway :( > > > > > > So, like you say, we could store only offsets here and then do the raw > string > > > construction run-time... what do you mean by "every single time a tagged > > > template is invoked"; would we need to do it in cases where the user doesn't > > > access t.raw too? > > > > But the user will. > > I think the right solution is for scanner to generate raw strings in addition > to > > cooked strings. > > (My understanding is that you cannot recover a raw string from a cooked > string, > > right?) > > Right. But the scanner has has access to the underlying string. I'll look into > getting the data straight from the scanner instead. I guess the scanner does not have access to the underlying source after all. It only has a stream and caches the data as needed.
Yeah, basically the LiteralBuffers are the only thing the Scanner has. :/ So I guess we'd need the "gather raw literals into a separate LiteralBuffer" solution after all, or something similar.
PTAL I added a LiteralBuffer and now capture all raw characters during ScanTemplateSpan. This is partially based on an early patchset by Caitlin: https://codereview.chromium.org/663683006/diff/80001/src/scanner.h?context=10...
On 2014/12/02 20:48:06, arv wrote: > PTAL > > I added a LiteralBuffer and now capture all raw characters during > ScanTemplateSpan. This is partially based on an early patchset by Caitlin: > > https://codereview.chromium.org/663683006/diff/80001/src/scanner.h?context=10... I was just reading it --- I think updating the raw literal during Advance() is pretty novel and a much simpler way to do it, it looks like it should work well. Even though the raw literal recording is now unrelated to other escape sequence parsing, I think it would be a good idea to add some extra TV/TRV tests which depend on --harmony-unicode, just to make sure it is working as expected (just in case we're using PushBack() in there for some reason, or something like that).
Add DCHECK in PushBack and remove some unused functions
On 2014/12/02 at 20:54:50, caitpotter88 wrote: > On 2014/12/02 20:48:06, arv wrote: > > PTAL > > > > I added a LiteralBuffer and now capture all raw characters during > > ScanTemplateSpan. This is partially based on an early patchset by Caitlin: > > > > https://codereview.chromium.org/663683006/diff/80001/src/scanner.h?context=10... > > I was just reading it --- I think updating the raw literal during Advance() is pretty novel and a much simpler way to do it, it looks like it should work well. > > Even though the raw literal recording is now unrelated to other escape sequence parsing, I think it would be a good idea to add some extra TV/TRV tests which depend on --harmony-unicode, just to make sure it is working as expected (just in case we're using PushBack() in there for some reason, or something like that). Good idea. I was a bit worried about PushBack too. I added a DCHECK to make sure that we do not do a PushBack when we are capturing raw literals. I'll have to rebase before adding the harmony-unicode tests.
git rebase
https://codereview.chromium.org/768203002/diff/40001/src/scanner.h File src/scanner.h (right): https://codereview.chromium.org/768203002/diff/40001/src/scanner.h#newcode582 src/scanner.h:582: DCHECK(ch < 0 || !capturing_raw_literal_); I feel like PushBack() is okay --- it's more just that you want to make sure PushBack() is closely followed by a ReduceRawLiteralLength() if raw literals are being captured, am I wrong about that?
On 2014/12/02 at 21:37:36, caitpotter88 wrote: > https://codereview.chromium.org/768203002/diff/40001/src/scanner.h > File src/scanner.h (right): > > https://codereview.chromium.org/768203002/diff/40001/src/scanner.h#newcode582 > src/scanner.h:582: DCHECK(ch < 0 || !capturing_raw_literal_); > I feel like PushBack() is okay --- it's more just that you want to make sure PushBack() is closely followed by a ReduceRawLiteralLength() if raw literals are being captured, am I wrong about that? You are right that PushBack is OK if ReduceRawLiteralLength is called afterwards. Two options: 1. Keep as is with the DCHECK, basically disabling PushBack during ScanTemplateSpan. 2. Add code in PushBack that calls ReduceRawLiteralLength if needed.
On 2014/12/02 at 21:47:28, arv wrote: > On 2014/12/02 at 21:37:36, caitpotter88 wrote: > > https://codereview.chromium.org/768203002/diff/40001/src/scanner.h > > File src/scanner.h (right): > > > > https://codereview.chromium.org/768203002/diff/40001/src/scanner.h#newcode582 > > src/scanner.h:582: DCHECK(ch < 0 || !capturing_raw_literal_); > > I feel like PushBack() is okay --- it's more just that you want to make sure PushBack() is closely followed by a ReduceRawLiteralLength() if raw literals are being captured, am I wrong about that? > > You are right that PushBack is OK if ReduceRawLiteralLength is called afterwards. Two options: > > 1. Keep as is with the DCHECK, basically disabling PushBack during ScanTemplateSpan. > 2. Add code in PushBack that calls ReduceRawLiteralLength if needed. I settled for 2 since that is cleaner. Regarding --harmony-unicode. Lets do that work in another CL.
Reduce length in PushBack instead
On 2014/12/02 21:47:28, arv wrote: > On 2014/12/02 at 21:37:36, caitpotter88 wrote: > > https://codereview.chromium.org/768203002/diff/40001/src/scanner.h > > File src/scanner.h (right): > > > > https://codereview.chromium.org/768203002/diff/40001/src/scanner.h#newcode582 > > src/scanner.h:582: DCHECK(ch < 0 || !capturing_raw_literal_); > > I feel like PushBack() is okay --- it's more just that you want to make sure > PushBack() is closely followed by a ReduceRawLiteralLength() if raw literals are > being captured, am I wrong about that? > > You are right that PushBack is OK if ReduceRawLiteralLength is called > afterwards. Two options: > > 1. Keep as is with the DCHECK, basically disabling PushBack during > ScanTemplateSpan. > 2. Add code in PushBack that calls ReduceRawLiteralLength if needed. hmm... the DCHECK is going to be a constraint that might get in the way of future code --- but I guess it's probably okay for now, worry about it when it's a problem?
https://codereview.chromium.org/768203002/diff/80001/src/scanner.h File src/scanner.h (right): https://codereview.chromium.org/768203002/diff/80001/src/scanner.h#newcode587 src/scanner.h:587: if (capturing_raw_literal_) ReduceRawLiteralLength(2); Oh, I didn't realize you already changed this. I guess it's up to you
Add harmony unicode test
Add harmony unicode test
Patchset #6 (id:100001) has been deleted
On 2014/12/02 22:06:01, caitp wrote: > https://codereview.chromium.org/768203002/diff/80001/src/scanner.h > File src/scanner.h (right): > > https://codereview.chromium.org/768203002/diff/80001/src/scanner.h#newcode587 > src/scanner.h:587: if (capturing_raw_literal_) ReduceRawLiteralLength(2); > Oh, I didn't realize you already changed this. I guess it's up to you I found this one to be safer.
I added a test for --harmony-unicode since it required no changes to the actual code.
lgtm
The CQ bit was checked by arv@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/768203002/120001
Message was sent while issue was closed.
Committed patchset #6 (id:120001)
Message was sent while issue was closed.
On 2014/12/03 14:17:20, I haz the power (commit-bot) wrote: > Committed patchset #6 (id:120001) Woo! |