|
|
DescriptionRework scanner-character-streams.
- Smaller, more consistent streams API (Advance, Back, pos, Seek)
- Remove implementations from the header, in favor of creation functions.
Observe:
- Performance:
- All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a
body of only a few instructions. I expect most calls to end up there.
- There used to be performance problems w/ bookmarking, particularly
with copying too much data on SetBookmark w/ UTF-8 streaming streams.
All those copies are gone.
- The old streaming streams implementation used to copy data even for
2-byte input. It no longer does.
- The only remaining 'slow' method is the Seek(.) slow case for utf-8
streaming streams. I don't expect this to be called a lot; and even if,
I expect it to be offset by the gains in the (vastly more frequent)
calls to the other methods or the 'fast path'.
- If it still bothers us, there are several ways to speed it up.
- API & code cleanliness:
- I want to remove the 'old' API in a follow-up CL, which should mostly
delete code, or replace it 1:1.
- In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink
for streaming streams.
- The "bookmark" is now always implemented (and mostly very fast), so we
should be able to use it for more things.
- Testing & correctness:
- The unit tests now cover all stream implementations,
and are pretty good and triggering all the edge cases.
- Vastly more DCHECKs of the invariants.
BUG=v8:4947
Committed: https://crrev.com/642d6d314c7934a05cd2386e4b10fca3267769fc
Cr-Commit-Position: refs/heads/master@{#39464}
Patch Set 1 #Patch Set 2 : Some fixes, and marching down the very long road to make all compilers happy. #
Total comments: 36
Patch Set 3 : Add comments to document Utf8ExternalStreamingStream's internal state. #
Total comments: 36
Patch Set 4 : Marja's feedback, round 1. #
Total comments: 31
Patch Set 5 : Feedback, round 2. #
Total comments: 22
Patch Set 6 : Behold! A re-write of utf-8 handling (+ improved tests + fixes) #Patch Set 7 : Fix compiler issue. #Patch Set 8 : Niko's feedback and fix compile even harder #
Total comments: 24
Patch Set 9 : Marja's feedback. #
Messages
Total messages: 85 (66 generated)
Description was changed from ========== Rework scanner-character-streams. - Smaller, more consistent streams API (Advance, Back, pos, Seek) - Remove implementations from the header, in favor of creation functions. Observe: - Performance: - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a body of only a few instructions. I expect most calls to end up there. - There used to be performance problems w/ bookmarking, particularly with copying too much data on SetBookmark w/ UTF-8 streaming streams. All those copies are gone. - The old streaming streams implementation used to copy data even for 2-byte input. It no longer does. - The only remaining 'slow' method is the Seek(.) slow case for utf-8 streaming streams. I don't expect this to be called a lot; and even if, I expect it to be offset by the gains in the (vastly more frequent) calls to the other methods or the 'fast path'. - If it still bothers us, there are several ways to speed it up. - API & code cleanliness: - I want to remove the 'old' API in a follow-up CL, which should mostly delete code, or replace it 1:1. - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink for streaming streams. - The "bookmark" is now always implemented (and mostly very fast), so we should be able to use it for more things. BUG=v8:4947 ========== to ========== Rework scanner-character-streams. - Smaller, more consistent streams API (Advance, Back, pos, Seek) - Remove implementations from the header, in favor of creation functions. Observe: - Performance: - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a body of only a few instructions. I expect most calls to end up there. - There used to be performance problems w/ bookmarking, particularly with copying too much data on SetBookmark w/ UTF-8 streaming streams. All those copies are gone. - The old streaming streams implementation used to copy data even for 2-byte input. It no longer does. - The only remaining 'slow' method is the Seek(.) slow case for utf-8 streaming streams. I don't expect this to be called a lot; and even if, I expect it to be offset by the gains in the (vastly more frequent) calls to the other methods or the 'fast path'. - If it still bothers us, there are several ways to speed it up. - API & code cleanliness: - I want to remove the 'old' API in a follow-up CL, which should mostly delete code, or replace it 1:1. - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink for streaming streams. - The "bookmark" is now always implemented (and mostly very fast), so we should be able to use it for more things. - Testing & correctness: - The unit tests now cover all stream implementations, and are pretty good and triggering all the edge cases. - Vastly more DCHECKs of the invariants. BUG=v8:4947 ==========
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux_mips64el_compile_rel on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_mips64el_compile_r...) v8_linux_mipsel_compile_rel on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_mipsel_compile_rel...)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_presubmit on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/23383)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_win_nosnap_shared_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_nosnap_shared_rel_ng...) v8_win_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_rel_ng/builds/13753)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux_mipsel_compile_rel on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_mipsel_compile_rel...) v8_win64_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win64_rel_ng/builds/13728) v8_win_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_rel_ng/builds/13756)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_win64_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win64_rel_ng/builds/13729) v8_win_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_rel_ng/builds/13757)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...) v8_linux_gcc_compile_rel on master.tryserver.v8 (JOB_FAILED, no build URL) v8_presubmit on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/23399)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #2 (id:20001) has been deleted
Patchset #2 (id:40001) has been deleted
Patchset #2 (id:60001) has been deleted
Patchset #2 (id:80001) has been deleted
Patchset #2 (id:100001) has been deleted
Patchset #3 (id:140001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Patchset #2 (id:120001) has been deleted
vogelheim@chromium.org changed reviewers: + marja@chromium.org, nikolaos@chromium.org
marja, nikolaos: Please have a look, and please let me know what you think.
"stuff I don't understand" overflowed, maybe continue offline? https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:202: size_t byte_pos; This could use a comment: is byte_pos the position inside the chunk or from the beginning? It's not clear at that point for somebody reading the code. Without further information, I'd assume it's the position inside the chunk (so, we'd use it for indexing into a chunk), but then there is char_pos too, and that's probably a global char pos, so now I'm not sure about byte_pos either. Also maybe rename to global_byte_pos / byte_pos_inside_chunk depending on which it is :) ... and maybe my first interpretation of ChunkPos was incorrect too; I assumed it's a pair "chunk + position inside it" but maybe it's not? https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:212: unibrow::Utf8::Utf8IncrementalBuffer buffer; This could use a comment, what is buffer? Why does every chunk need one? https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:218: size_t Utf8ExternalStreamingStream::FillBuffer(size_t position) { I'd also specify in the API which positions the params are; this is char_position, right? Would it make sense to call it char_position? https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:228: size_t iter = 0; Nit: "it" would be a more commonly used name. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:265: unibrow::Utf8::Utf8IncrementalBuffer buffer = Umm, what? I don't understand what this if does. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:290: current_.chunk_no++; Nit: ++current_.chunk_no; https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:293: if (current_.char_pos - position == 0) { Why not if (current_.char_pos == position) ? https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:60: buffer_cursor_--; Nit: --buffer_cursor_; https://codereview.chromium.org/2314663002/diff/160001/src/unicode.cc File src/unicode.cc (right): https://codereview.chromium.org/2314663002/diff/160001/src/unicode.cc#newcode313 src/unicode.cc:313: // All other cases: Why does it make sense to do NonAsciiSequenceLength(next) if buffer is not 0, that is, next is a continuation character? I'd reorder this code like this: fast path (which you have now) if (buffer == 0) { this is the start of a multi-byte character NonAsciiSequenceLength(next) etc } else { expect a continuation character, update the buffer based on that (otherwise kBadChar) }
fyi, added comments to document Utf8ExternalStreamingStream's internal state, as a quick fix to Marja's comments. (More work forthcoming.. please keep the comments coming. Thanks.)
https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:234: unibrow::Utf8::Utf8IncrementalBuffer buffer; Clarify that buffer contains the characters from the *previous* chunk Also rename to partial_character_buffer or sth https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner.h#... src/parsing/scanner.h:35: class Utf16CharacterStream { General comment from offline discussion: going back one character fills the buffer again! Back(); Back(); is inefficient https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner.h#... src/parsing/scanner.h:110: // This shouldn't ever be called if new_pos is inside the current buffer. Comment from nikolaos@: this is unclear.
https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:202: size_t byte_pos; On 2016/09/07 09:17:57, marja wrote: > This could use a comment: is byte_pos the position inside the chunk or from the > beginning? It's not clear at that point for somebody reading the code. Without > further information, I'd assume it's the position inside the chunk (so, we'd use > it for indexing into a chunk), but then there is char_pos too, and that's > probably a global char pos, so now I'm not sure about byte_pos either. Also > maybe rename to global_byte_pos / byte_pos_inside_chunk depending on which it is > :) > > ... and maybe my first interpretation of ChunkPos was incorrect too; I assumed > it's a pair "chunk + position inside it" but maybe it's not? Comments provided about the struct. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:212: unibrow::Utf8::Utf8IncrementalBuffer buffer; On 2016/09/07 09:17:57, marja wrote: > This could use a comment, what is buffer? Why does every chunk need one? Done. Renamed w/ clearer name (there are too many things called "buffer"..) + comment about the struct & intended use. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:218: size_t Utf8ExternalStreamingStream::FillBuffer(size_t position) { On 2016/09/07 09:17:57, marja wrote: > I'd also specify in the API which positions the params are; this is > char_position, right? Would it make sense to call it char_position? APIs are always about characters; the API has no notion of bytes or encodings. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:228: size_t iter = 0; On 2016/09/07 09:17:57, marja wrote: > Nit: "it" would be a more commonly used name. Done. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:265: unibrow::Utf8::Utf8IncrementalBuffer buffer = On 2016/09/07 09:17:57, marja wrote: > Umm, what? I don't understand what this if does. Renamed, to hopefully make it more understandable. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:293: if (current_.char_pos - position == 0) { On 2016/09/07 09:17:57, marja wrote: > Why not if (current_.char_pos == position) ? Done. https://codereview.chromium.org/2314663002/diff/160001/src/unicode.cc File src/unicode.cc (right): https://codereview.chromium.org/2314663002/diff/160001/src/unicode.cc#newcode313 src/unicode.cc:313: // All other cases: On 2016/09/07 09:17:57, marja wrote: > Why does it make sense to do NonAsciiSequenceLength(next) if buffer is not 0, > that is, next is a continuation character? > > I'd reorder this code like this: > > fast path (which you have now) > > if (buffer == 0) { > this is the start of a multi-byte character > NonAsciiSequenceLength(next) > etc > } else { > expect a continuation character, update the buffer based on that (otherwise > kBadChar) > } Done. Your version makes much more sense. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner.h#... src/parsing/scanner.h:35: class Utf16CharacterStream { On 2016/09/07 11:55:11, marja wrote: > General comment from offline discussion: going back one character fills the > buffer again! > > Back(); > Back(); > > is inefficient Added Back2 / PushBack2 to fix this. Actual use of these would be in a future CL. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner.h#... src/parsing/scanner.h:110: // This shouldn't ever be called if new_pos is inside the current buffer. On 2016/09/07 11:55:11, marja wrote: > Comment from nikolaos@: this is unclear. Done.
Patchset #4 (id:200001) has been deleted
I'm publishing my comments so far, because it turns out that they are in two different patch sets (as this CL evolves) and none of them is the current one... :-) https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:46: } else if (ReadBlock()) { I suppose you did not like how it was before: if (V8_LIKELY(...) || ReadBlock()) https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:58: inline void Back() { A comment explaining what this should do would be welcome. It's straightforward in the most likely case, not in the other one. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:62: ReadBlockAt(pos() - 1); We discussed this offline. This can make a series of consecutive Back() pretty expensive. It could be "void Back(int num = 1)" and use "pos() - num" here. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:67: return buffer_pos_ + (buffer_cursor_ - buffer_start_); This calculation is quite expensive and the pos() method is called very often from the parser. I suppose, the scanner delegates it here. Maybe it would make sense to measure this against having a separate current_pos_ field (like the pos_ in old line 59) and making sure that this is consistent. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:86: Back(); Why not simply this? DCHECK_EQ(code_unit, static_cast<uc32>(*(buffer_cursor_++))); After Back() it is guaranteed that buffer_cursor_ < buffer_end_, isn't it? In a quite unfrequent case, this Advance-Back will result in an extra buffer loaded, right? https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:89: bool SetBookmark() { I don't understand why the bookmark should be stored in the stream. A simpler interface would be to use "bookmark = pos()" directly to obtain a bookmark, and something like "Seek(bookmark)" to reset to that. If we still want to have streams that do not support bookmarking (?) then we could have a SetBookmark method returning either pos() or NO_BOOKMARK (see below). https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:94: DCHECK(bookmark_ != (size_t)-1); How about something like this? (with a better name, I guess) static const size_t NO_BOOKMARK = numeric_limits<size_t>::max(); https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:106: bookmark_((size_t)-1) {} Again, NO_BOOKMARK here. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:114: buffer_cursor_ = buffer_start_; I don't understand this. Maybe related to my next comment. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:119: // position pos() of the input. Returns true if data is available; false if How can this not be true? pos() is calculated in such a way to enforce that. I suppose this comment should change and describe more clearly what ReadBlock is expected to. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:22: // source (ReadBlock can be called with pos_ pointing to any position, What is pos_? Is it buffer_pos_? https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:33: virtual size_t FillBuffer(size_t position) = 0; Add a comment explaining what FillBuffer is expected to do. As I understand it, it reads up to KBufferSize elements (2-byte characters) from the source, starting from 'position', and stores them in the buffer_, starting from the beginning and overwriting its current contents. It returns the number of elements read. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:44: DCHECK_EQ(buffer_start_, buffer_); // Please nobody mess w/ buffer_start_. Could this comment go to the top of the class, as an invariant that should be respected? Or if it is just meant to be respected when ReadBlock is called, add the comment just before ReadBlock. For some reason, I suspect that this is an invariant of Utf16CharacterStream (that buffer_start_ is not expected to change), not this class. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:78: buffer_pos_ = start_position; Although we keep end_position (in length_), we don't keep start_position. It is later possible that we "SeekAt" positions in the string that are before start_position. Do we care about this? https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:89: } Essentially: size_t length = std::min(length_ - from_pos, kBufferSize); Or Min instead of std::min, I'm not sure why we have that. PS: That's how you have it in ExternalOneByteStringUtf16CharacterStream::FillBuffer. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:96: // ExternalTwoByteStringUtf16CharacterStream. Add a comment? How is this different from the previous one? (Obviously, no buffer is needed.) https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:133: // UTF16 buffer to read characters from an external latin1 string. This looks very much like GenericStringUtf16CharacterStream. Can they be merged, or is there a reason (efficiency?) why they must be separate? The only difference I saw was in FillBuffer (the for loop vs. the call to String::WriteToFlat<uc16>. I'd expect a similar call to work for uint8 too. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:139: int end_position); These parameters are of type size_t in other classes, e.g., GenericStringUtf16CharacterStream. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:179: // Utf8ExternalStreamingStream - chunked streaming of Utf-8 data. nit: Make the format of the comment consistent. Possibly for the future, as we discussed this offline. In this class, and the first of the two that follow, we could use the chunks themselves as buffers. Instead, we have an extra buffer in BufferedUtf16CharacterStream. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:195: chunks_.clear(); I think this line is redundant, as this method is private and only called from the destructor. I guess the whole method is redundant. The for loop could go directly in the destructor. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:241: size_t Utf8ExternalStreamingStream::FillBuffer(size_t position) { For the future: I think that this translation between the logical position (i.e., the parameter, #of UTF8 characters) and the physical position (cpos) should not be necessary. The parameter should be something more abstract that, in the case of the UTF8 stream, it would provide directly the physical position. The loop in lines 253-260 is fairly expensive. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:249: Chunk& current = chunks_[chunk]; Consider renaming this "current" to something else, not to confuse this with "current_" (e.g., look at line 265). https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:337: }; Possibly for the future, as we discussed this offline. This (and the following) is very similar to (some of) the contents of Utf8ExternalStreamingStream. Can it be merged? The common denominator is the "chunking" behavior. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:395: // OneByteExternalStreamingStream Add a comment? https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:423: // TwoByteExternalStreamingStream Add a comment?
My last comments for this round. In general, I think it looks good enough for now. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:180: return length; This for loop should be equivalent to an instance of CopyCharsUnsigned, which I presume is what some instance of String::WriteToFlat does. Compare with the method in line 84. https://codereview.chromium.org/2314663002/diff/220001/src/unicode.h File src/unicode.h (right): https://codereview.chromium.org/2314663002/diff/220001/src/unicode.h#newcode163 src/unicode.h:163: Utf8IncrementalBuffer& buffer); I think this reference should be turned into a pointer, according to the code style. I'm not sure if it's enforced by the CQ.
Getting better! I was focusing on the utf8 streaming case, where the complexity lies. I got the impression that nikolaos@ read through the remaining parts (I only skimmed for now), so maybe that's a good division of labor? https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:44: DCHECK_EQ(buffer_start_, buffer_); // Please nobody mess w/ buffer_start_. Hmm, nikolaos@ commented on this offline I think... so hmm, who updates buffer_start_? https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:246: size_t byte_pos; It's confusing that you above use byte_pos and char_pos to index into a chunk (if I understood this correctly) and here you use byte_pos and char_pos to indicate where the chunk begins in the byte / char stream. Can you rename one set of them? Maybe keep the above byte_pos / char_pos and rename this one to chunk_beginning_byte_pos or something along those lines but less horrible. (Also, when reading the code, if there's someobject.byte_pos, and byte_pos means something different depending on which one someobject is, it's confusing to the reader.) https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:256: // If the desired position is *not* at the current_ position, then we need Could this function be structured differently so that it would be easier to understand? Idea 1: fast path first, what if the desired position is at the current position? Idea 2: separate the chunk finding to a helper function? (I can't tell in advance whether these would work...) https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:263: Chunk& current = chunks_[chunk]; Could you rename current to current_chunk, just to be extra clear (related to the above whine that I'm losing track what current.byte_pos etc are)? https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:264: unibrow::Utf8::Utf8IncrementalBuffer b = current.incomplete_first_char; I had to think for a while why we need to do this here; maybe add a comment? E.g.,: Read through the chunk character by character to find the right position. The chunk might begin with an incomplete character, whose beginning is already stored in incomplete_first_char. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:276: // of the last block. That should only happen if we seek forward (which ... of the last chunk? https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:285: } Can finding the position be moved to a helper function, or will that mess up with the recursion thing? Clearly this function does two things: first it finds the right position, then it reads data from that position. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:296: Chunk& current_chunk = chunks_[current_.chunk_no]; Oops, now we have current_chunk.. It's confusing that we have: current_ current current_chunk in this function! https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:305: : unibrow::Utf8::Utf8IncrementalBuffer(0); Umm, what? Why? https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:428: Chunk& chunk = chunks_[FindChunk(chunks_, source_, position)]; Here you already have a FindChunk helper, so I think you should have one for the hard case too. :) https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:476: What happens if we have data where the last character is incomplete? https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:486: bool lonley_byte = (chunks_[chunk_no].byte_pos == (2 * position + 1)); typo: lonley https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner.h#... src/parsing/scanner.h:53: buffer_cursor_++; Hmm, I commented on this before, right? foo++ / foo-- -> ++foo / --foo in several places. https://codereview.chromium.org/2314663002/diff/220001/src/unicode.cc File src/unicode.cc (right): https://codereview.chromium.org/2314663002/diff/220001/src/unicode.cc#newcode307 src/unicode.cc:307: uchar Utf8::ValueOfIncremental(byte next, Utf8IncrementalBuffer& buffer) { Much better now!
https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:46: } else if (ReadBlock()) { On 2016/09/07 13:28:28, nickie wrote: > I suppose you did not like how it was before: > if (V8_LIKELY(...) || ReadBlock()) I liked it, but: V8_LIKELY supplies information about the if-else to the compiler. I'm not sure the compiler can transfer this explicit likeli-hood information on a sub-expression to the branches it might generate from the surrounding expression. Admittedly, I intuited that, and I can't find documentation that is explicit about this. I've never seen a V8_(UN)LIKELY or __builtin_expect that did not cover the entire condition, though. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:58: inline void Back() { On 2016/09/07 13:28:28, nickie wrote: > A comment explaining what this should do would be welcome. > It's straightforward in the most likely case, not in the other one. Done. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:60: buffer_cursor_--; On 2016/09/07 09:17:57, marja wrote: > Nit: --buffer_cursor_; Why? [Here and elsewhere.] ------ I find the prefix-version odd to read. The style guide has no opinion: V8 style references Chromium style which references Google style, which says: "For simple scalar (non-object) values there is no reason to prefer one form and we allow either." [Ref: https://google.github.io/styleguide/cppguide.html#Preincrement_and_Predecrement ] https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:62: ReadBlockAt(pos() - 1); On 2016/09/07 13:28:28, nickie wrote: > We discussed this offline. This can make a series of consecutive Back() pretty > expensive. It could be "void Back(int num = 1)" and use "pos() - num" here. That's almost the same as Seek. :) The Scanner only uses num = 1..2, so I added a Back2(). https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:67: return buffer_pos_ + (buffer_cursor_ - buffer_start_); On 2016/09/07 13:28:28, nickie wrote: > This calculation is quite expensive and the pos() method is called very often > from the parser. I suppose, the scanner delegates it here. > > Maybe it would make sense to measure this against having a separate current_pos_ > field (like the pos_ in old line 59) and making sure that this is consistent. Hmm. I think I should try it both ways. (My thinking was: pos() is fully inline-able, and should be only 2 instructions on x64, on data that is already in cache (and oftentimes also in a register) at that point. The alternative, an additional variable + 1 instruction to maintain might not be faster.) https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:89: bool SetBookmark() { On 2016/09/07 13:28:28, nickie wrote: > I don't understand why the bookmark should be stored in the stream. A simpler > interface would be to use "bookmark = pos()" directly to obtain a bookmark, and > something like "Seek(bookmark)" to reset to that. If we still want to have > streams that do not support bookmarking (?) then we could have a SetBookmark > method returning either pos() or NO_BOOKMARK (see below). You're exactly right: The bookmark doesn't belong in the stream (any more). But I was desperately trying to keep the CL small-ish, and hence wanted to avoid touching the clients of the stream just yet, and to remove the "// Legacy API" in a separate, upcoming CL. That would exactly what you suggest. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:94: DCHECK(bookmark_ != (size_t)-1); On 2016/09/07 13:28:28, nickie wrote: > How about something like this? (with a better name, I guess) > static const size_t NO_BOOKMARK = numeric_limits<size_t>::max(); Done. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:106: bookmark_((size_t)-1) {} On 2016/09/07 13:28:28, nickie wrote: > Again, NO_BOOKMARK here. Done. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:114: buffer_cursor_ = buffer_start_; On 2016/09/07 13:28:28, nickie wrote: > I don't understand this. Maybe related to my next comment. See below. This wants to set the position for ReadBlock() (which reads it from pos()), so it must change all the vars that make up pos() to accomplish this. https://codereview.chromium.org/2314663002/diff/160001/src/parsing/scanner.h#... src/parsing/scanner.h:119: // position pos() of the input. Returns true if data is available; false if On 2016/09/07 13:28:29, nickie wrote: > How can this not be true? > pos() is calculated in such a way to enforce that. > I suppose this comment should change and describe more clearly what ReadBlock is > expected to. It's meant to explain the post-condition for ReadBlock, and in particular remind the implementer that they can/must update both buffer_pos and buffer_start_. I think the source of confusion is that pos() is really a derivative of three values, so any method that updates any of those needs to take care to consider them jointly. I updated the comments to reflect this. ------ Maybe the best way of fixing this is to drop the ReadBlock() signature, and always use something like FillBuffer(size_t). If so, I'll do it in a separate step. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:22: // source (ReadBlock can be called with pos_ pointing to any position, On 2016/09/07 13:28:29, nickie wrote: > What is pos_? Is it buffer_pos_? No, pos(). Updated comment. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:33: virtual size_t FillBuffer(size_t position) = 0; On 2016/09/07 13:28:29, nickie wrote: > Add a comment explaining what FillBuffer is expected to do. > As I understand it, it reads up to KBufferSize elements (2-byte characters) from > the source, starting from 'position', and stores them in the buffer_, starting > from the beginning and overwriting its current contents. It returns the number > of elements read. Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:44: DCHECK_EQ(buffer_start_, buffer_); // Please nobody mess w/ buffer_start_. On 2016/09/07 13:28:29, nickie wrote: > Could this comment go to the top of the class, as an invariant that should be > respected? Or if it is just meant to be respected when ReadBlock is called, add > the comment just before ReadBlock. > > For some reason, I suspect that this is an invariant of Utf16CharacterStream > (that buffer_start_ is not expected to change), not this class. Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:78: buffer_pos_ = start_position; On 2016/09/07 13:28:29, nickie wrote: > Although we keep end_position (in length_), we don't keep start_position. It is > later possible that we "SeekAt" positions in the string that are before > start_position. Do we care about this? You're right, that's an error condition we don't check for. I adopted this from the previous implementation. Not sure whether we should add a check. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:89: } On 2016/09/07 13:28:29, nickie wrote: > Essentially: > size_t length = std::min(length_ - from_pos, kBufferSize); > Or Min instead of std::min, I'm not sure why we have that. > PS: That's how you have it in > ExternalOneByteStringUtf16CharacterStream::FillBuffer. Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:96: // ExternalTwoByteStringUtf16CharacterStream. On 2016/09/07 13:28:29, nickie wrote: > Add a comment? How is this different from the previous one? (Obviously, no > buffer is needed.) Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:133: // UTF16 buffer to read characters from an external latin1 string. On 2016/09/07 13:28:29, nickie wrote: > This looks very much like GenericStringUtf16CharacterStream. > Can they be merged, or is there a reason (efficiency?) why they must be > separate? The only difference I saw was in FillBuffer (the for loop vs. the > call to String::WriteToFlat<uc16>. I'd expect a similar call to work for uint8 > too. Yes, this would work. I need to check the performance impact. (These were separated out long ago; I haven't double-checked the impact.) https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:139: int end_position); On 2016/09/07 13:28:29, nickie wrote: > These parameters are of type size_t in other classes, e.g., > GenericStringUtf16CharacterStream. Good point. Made them all size_t. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:179: // Utf8ExternalStreamingStream - chunked streaming of Utf-8 data. On 2016/09/07 13:28:29, nickie wrote: > nit: Make the format of the comment consistent. > > Possibly for the future, as we discussed this offline. In this class, and the > first of the two that follow, we could use the chunks themselves as buffers. > Instead, we have an extra buffer in BufferedUtf16CharacterStream. Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:195: chunks_.clear(); On 2016/09/07 13:28:29, nickie wrote: > I think this line is redundant, as this method is private and only called from > the destructor. I guess the whole method is redundant. The for loop could go > directly in the destructor. Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:234: unibrow::Utf8::Utf8IncrementalBuffer buffer; On 2016/09/07 11:55:11, marja wrote: > Clarify that buffer contains the characters from the *previous* chunk > > Also rename to partial_character_buffer or sth > Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:241: size_t Utf8ExternalStreamingStream::FillBuffer(size_t position) { On 2016/09/07 13:28:29, nickie wrote: > For the future: I think that this translation between the logical position > (i.e., the parameter, #of UTF8 characters) and the physical position (cpos) > should not be necessary. The parameter should be something more abstract that, > in the case of the UTF8 stream, it would provide directly the physical position. > The loop in lines 253-260 is fairly expensive. Hmm. I agree on the "expensive" part. I think parameterizing the 'position' would be a very logical API, but since that would force the client to dynamically deal with different types of positions, I doubt it's easily (or efficiently) translatable into C++. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:249: Chunk& current = chunks_[chunk]; On 2016/09/07 13:28:29, nickie wrote: > Consider renaming this "current" to something else, not to confuse this with > "current_" (e.g., look at line 265). Done. (current_chunk) https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:337: }; On 2016/09/07 13:28:29, nickie wrote: > Possibly for the future, as we discussed this offline. This (and the following) > is very similar to (some of) the contents of Utf8ExternalStreamingStream. Can > it be merged? The common denominator is the "chunking" behavior. I don't see an easy way to merge this... they require different info per chunk, and search the chunks according to difference criteria. We can make this a templatized structure, but I doubt it's worth the effort. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:395: // OneByteExternalStreamingStream On 2016/09/07 13:28:29, nickie wrote: > Add a comment? Done. https://codereview.chromium.org/2314663002/diff/180001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:423: // TwoByteExternalStreamingStream On 2016/09/07 13:28:29, nickie wrote: > Add a comment? Done.
Patchset #5 (id:240001) has been deleted
https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:44: DCHECK_EQ(buffer_start_, buffer_); // Please nobody mess w/ buffer_start_. On 2016/09/08 09:46:23, marja wrote: > Hmm, nikolaos@ commented on this offline I think... so hmm, who updates > buffer_start_? None of BufferedUtf16CharacterStream's sub-classes. Updated comments. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:180: return length; On 2016/09/07 13:48:44, nickie wrote: > This for loop should be equivalent to an instance of CopyCharsUnsigned, which I > presume is what some instance of String::WriteToFlat does. Compare with the > method in line 84. Yes, nice catch. Done. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:246: size_t byte_pos; On 2016/09/08 09:46:23, marja wrote: > It's confusing that you above use byte_pos and char_pos to index into a chunk > (if I understood this correctly) and here you use byte_pos and char_pos to > indicate where the chunk begins in the byte / char stream. Can you rename one > set of them? Maybe keep the above byte_pos / char_pos and rename this one to > chunk_beginning_byte_pos or something along those lines but less horrible. > > (Also, when reading the code, if there's someobject.byte_pos, and byte_pos means > something different depending on which one someobject is, it's confusing to the > reader.) I don't get the confusing part. Essentially, a position in the UTF-8 world comprises both a byte_pos and a char_pos (and a Utf8IncrementalBuffer, in case we allow our position to point into the middle of an UTF-8 char). And so if I store a chunk, I need to store it's start position (position == byte pos + char pos + buffer); if I want to store the current position (also position == byte_pos + char_pos + buffer). To clarify this, I could add seperate struct Position for this purpose. I'm just not convinced that would really make things more readable. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:256: // If the desired position is *not* at the current_ position, then we need On 2016/09/08 09:46:23, marja wrote: > Could this function be structured differently so that it would be easier to > understand? > > Idea 1: fast path first, what if the desired position is at the current > position? > > Idea 2: separate the chunk finding to a helper function? > > (I can't tell in advance whether these would work...) I'm skeptical. The fast path is what happens after the non-fast path has run. Putting it first would be awkward. A separate chunk finding function would also be awkward, since the callee has to do some of the utf-8 processing (to find the current position); or maybe the caller has to, but then the helper function would return a partial result, e.g. maybe the last block, but it can't be sure (since it didn't skip ahead to the desired position.) Also, as mentioned somewhere below to a similar comment: All the helper would do is to move straight-line code out of the way, without either adding an abstraction; or increasing re-use; or simplifying things. It would merely add an indirection which - at least in my view - doesn't really clarify things all that much. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:263: Chunk& current = chunks_[chunk]; On 2016/09/08 09:46:23, marja wrote: > Could you rename current to current_chunk, just to be extra clear (related to > the above whine that I'm losing track what current.byte_pos etc are)? Done. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:264: unibrow::Utf8::Utf8IncrementalBuffer b = current.incomplete_first_char; On 2016/09/08 09:46:23, marja wrote: > I had to think for a while why we need to do this here; maybe add a comment? > > E.g.,: > > Read through the chunk character by character to find the right position. The > chunk might begin with an incomplete character, whose beginning is already > stored in incomplete_first_char. Done. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:276: // of the last block. That should only happen if we seek forward (which On 2016/09/08 09:46:23, marja wrote: > ... of the last chunk? Done. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:285: } On 2016/09/08 09:46:23, marja wrote: > Can finding the position be moved to a helper function, or will that mess up > with the recursion thing? Clearly this function does two things: first it finds > the right position, then it reads data from that position. See above. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:296: Chunk& current_chunk = chunks_[current_.chunk_no]; On 2016/09/08 09:46:23, marja wrote: > Oops, now we have current_chunk.. > > It's confusing that we have: > current_ > current > current_chunk in this function! Done... I guess? It's now current_chunk and current_ https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:305: : unibrow::Utf8::Utf8IncrementalBuffer(0); On 2016/09/08 09:46:23, marja wrote: > Umm, what? Why? As the method comment says: "and decoding from the beginning of the using incomplete_first_char) will yield the stream's char_pos-th unicode character." That's what we're dooing here. If we decode from the beginning of a chunk, we'll use the incomplete_first_char from that chunk. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:428: Chunk& chunk = chunks_[FindChunk(chunks_, source_, position)]; On 2016/09/08 09:46:23, marja wrote: > Here you already have a FindChunk helper, so I think you should have one for the > hard case too. :) Nack. Here, I had two identical uses of it, so this case removes unnecessary code duplication. In the other case, the code is substantially different and is only used once, to this would merely add an indirection. I guess to some extent it's a matter of taste, but indirection (without any other benefit) oftentimes makes things harder to read, at least in my opinion. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:476: On 2016/09/08 09:46:23, marja wrote: > What happens if we have data where the last character is incomplete? For the last chunk, the rounding (below: number_of_chars = ... / 2;) would exclude the last half-char. A subsequent call to ReadBlock would try to read the 2nd char of that half-char, and would determine in the line below that we're out of data. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:486: bool lonley_byte = (chunks_[chunk_no].byte_pos == (2 * position + 1)); On 2016/09/08 09:46:23, marja wrote: > typo: lonley Done. https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/220001/src/parsing/scanner.h#... src/parsing/scanner.h:53: buffer_cursor_++; On 2016/09/08 09:46:23, marja wrote: > Hmm, I commented on this before, right? foo++ / foo-- -> ++foo / --foo in > several places. Why? [Brief discussion in previous round of answers; admittedly only after you sent this one. :) ] https://codereview.chromium.org/2314663002/diff/220001/src/unicode.h File src/unicode.h (right): https://codereview.chromium.org/2314663002/diff/220001/src/unicode.h#newcode163 src/unicode.h:163: Utf8IncrementalBuffer& buffer); On 2016/09/07 13:48:44, nickie wrote: > I think this reference should be turned into a pointer, according to the code > style. I'm not sure if it's enforced by the CQ. Done.
LGTM, modulo some old comments (left for subsequent CLs or as food for thought) and the ongoing discussion with Marja. Following Marja's comment, I focused on everything but the UTF8 part. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:13: #include "src/utils.h" // for Mem[Copy|Move].* I can't see any Mem(Copy|Move) in the code that changed. Am I missing something? https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:27: ~BufferedUtf16CharacterStream() override; Is the destructor necessary in this class? The destructor (line) 46 is empty, as in the case of the base class. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:39: // buffer_start_ should always point to buffer_ Maybe introduce "The base class's " in the beginning of the line, and "." at the end. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:69: ~GenericStringUtf16CharacterStream() override; Same. Is the empty destructor necessary? https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:83: (end_position - start_position)); nit: why use parentheses here? https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:109: ~ExternalTwoByteStringUtf16CharacterStream() override; Same. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:147: ~ExternalOneByteStringUtf16CharacterStream() override; Same. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:487: bool loneley_byte = (chunks_[chunk_no].byte_pos == (2 * position + 1)); nit: lonely https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:58: // Return the scanner by one character. This effectively undoes the most Consider changing the first sentence to "Go back one UTF-16 code unit in the input stream." https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:63: // block is requested. nit: Move "Otherwise..." to the third line of the comment. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:71: // Return the scanner by two characters, the same as calling Back() twice. Same. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:73: // this will be done only twice. I think you mean "guarantees that this will not be done twice".
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL. @marja: The utf-8 part is largely re-written. Now with many, smaller functions + commentary. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:13: #include "src/utils.h" // for Mem[Copy|Move].* On 2016/09/09 11:21:48, nickie wrote: > I can't see any Mem(Copy|Move) in the code that changed. Am I missing > something? Done. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:27: ~BufferedUtf16CharacterStream() override; On 2016/09/09 11:21:48, nickie wrote: > Is the destructor necessary in this class? The destructor (line) 46 is empty, > as in the case of the base class. I think so. This thing will be deleted via its super-class, so there needs to be a virtual destructor. Honestly, I'm not entirely sure what the compiler will generate. The style guide isn't very clear on this... https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:39: // buffer_start_ should always point to buffer_ On 2016/09/09 11:21:48, nickie wrote: > Maybe introduce "The base class's " in the beginning of the line, and "." at the > end. Done. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:83: (end_position - start_position)); On 2016/09/09 11:21:48, nickie wrote: > nit: why use parentheses here? Done. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:487: bool loneley_byte = (chunks_[chunk_no].byte_pos == (2 * position + 1)); On 2016/09/09 11:21:48, nickie wrote: > nit: lonely Done. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h File src/parsing/scanner.h (right): https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:58: // Return the scanner by one character. This effectively undoes the most On 2016/09/09 11:21:48, nickie wrote: > Consider changing the first sentence to "Go back one UTF-16 code unit in the > input stream." Done. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:63: // block is requested. On 2016/09/09 11:21:48, nickie wrote: > nit: Move "Otherwise..." to the third line of the comment. Done. (Not sure if "cl format" will put it back...) https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:71: // Return the scanner by two characters, the same as calling Back() twice. On 2016/09/09 11:21:48, nickie wrote: > Same. Done. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner.h#... src/parsing/scanner.h:73: // this will be done only twice. On 2016/09/09 11:21:48, nickie wrote: > I think you mean "guarantees that this will not be done twice". Done.
Still LGTM, but I did not review carefully the UTF8 part. https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/260001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:27: ~BufferedUtf16CharacterStream() override; On 2016/09/14 11:28:19, vogelheim wrote: > On 2016/09/09 11:21:48, nickie wrote: > > Is the destructor necessary in this class? The destructor (line) 46 is empty, > > as in the case of the base class. > > I think so. This thing will be deleted via its super-class, so there needs to be > a virtual destructor. Honestly, I'm not entirely sure what the compiler will > generate. The style guide isn't very clear on this... There has to be a virtual destructor but it does not have to be overriden if the desired behavior is identical to that of the parent class (here both destructors do nothing). I'm suggesting to remove line 27 here (and in the similar cases below), not the empty virtual destructor of Utf16CharacterStream.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux64_gyp_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_gyp_rel_ng/build...)
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/09/14 11:48:18, nickie wrote: > There has to be a virtual destructor but it does not have to be overriden if the > desired behavior is identical to that of the parent class (here both destructors > do nothing). I'm suggesting to remove line 27 here (and in the similar cases > below), not the empty virtual destructor of Utf16CharacterStream. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
scanner-character-streams.cc + test-scanner-streams.cc LGTM % comments This new version is much clearer, it's much easier to understand what the helper functions do. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:208: struct ChunkPosition { Now this doesn't contain the chunk number any more, so "position" would be a better name. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:217: struct Position { And this could be called PositionAndChunk or sth. Because the above struct *is* Position and this is just adding information to it. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:243: ScriptCompiler::ExternalSourceStream* source_stream_; I like this new version, now the structs and their members are clear to me after first reading through it! https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:282: // The buffer_ is writeable, but buffer_*_ members are const. So we get a Typo: writable https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:283: // non-const pointer into buffer that points to the same char as buffer_end_. Hmm, what, why? Do we need to write into it? No? https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:333: return; Hmm? Why don't we need to get more data to find "position"? https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:341: } Style nit: I'd write this like: size_t chunk_no = chunks_.size() - 1; while (chunk_no > 0 && chunks_[chunk_no].start.chars > position) { --chunk_no; } https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:366: unibrow::Utf8::Utf8IncrementalBuffer(0)}}; This is great! https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:387: SkipToPosition(position); One thing I still find surprising in this CL is that SkipToPosition is not guaranteed to actually find the position, and it also doesn't return a bool, but the caller is required to check whether it really skipped to the right position. But I can somehow see why you did it like that... so idk. An alternative would be: bool position_found = SkipToPosition(position); while (!position_found) { if (FetchChunk()) { position_found = SkipToPosition(position); } else { // out of data break; } } In this version, also the out-of-data condition (which confused me) is clearer. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:425: const uint8_t* chunk; I'd call it "data" instead of "chunk" (because the struct is already called "Chunk"). https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:506: Chunk& chunk = chunks_[FindChunk(chunks_, source_, position)]; Why not const Chunk& ? https://codereview.chromium.org/2314663002/diff/320001/test/cctest/parsing/te... File test/cctest/parsing/test-scanner-streams.cc (right): https://codereview.chromium.org/2314663002/diff/320001/test/cctest/parsing/te... test/cctest/parsing/test-scanner-streams.cc:280: // th cases. typo
Patchset #9 (id:340001) has been deleted
https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... File src/parsing/scanner-character-streams.cc (right): https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:208: struct ChunkPosition { On 2016/09/15 08:25:16, marja wrote: > Now this doesn't contain the chunk number any more, so "position" would be a > better name. Hmm.. I called it StreamPosition. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:217: struct Position { On 2016/09/15 08:25:16, marja wrote: > And this could be called PositionAndChunk or sth. Because the above struct *is* > Position and this is just adding information to it. I'd prefer to keep this one as Position, since - 1, it's the shorter name and it gets used a lot more, and - 2, I do think of this as the 'true' position. If this were an STL-like design, the ..::iterator would be this Position. The other "position" is really more a helper so I can conveniently define Chunk + Position, and to highlight where they are similar. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:243: ScriptCompiler::ExternalSourceStream* source_stream_; On 2016/09/15 08:25:16, marja wrote: > I like this new version, now the structs and their members are clear to me after > first reading through it! Acknowledged. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:282: // The buffer_ is writeable, but buffer_*_ members are const. So we get a On 2016/09/15 08:25:16, marja wrote: > Typo: writable Done. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:283: // non-const pointer into buffer that points to the same char as buffer_end_. On 2016/09/15 08:25:16, marja wrote: > Hmm, what, why? Do we need to write into it? No? Yes, we need to write into the buffer. See lines #296 and #298-299. Also, it's called FillBufferFromCurrentChunk, because we really do want to write into the buffer. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:333: return; On 2016/09/15 08:25:16, marja wrote: > Hmm? Why don't we need to get more data to find "position"? Done. Yeah... that's a left over from an older version, where I tried to write it so there was only one location that would fetch new chunks. Changed. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:341: } On 2016/09/15 08:25:16, marja wrote: > Style nit: I'd write this like: > size_t chunk_no = chunks_.size() - 1; > while (chunk_no > 0 && chunks_[chunk_no].start.chars > position) { > --chunk_no; > } Done. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:366: unibrow::Utf8::Utf8IncrementalBuffer(0)}}; On 2016/09/15 08:25:17, marja wrote: > This is great! Acknowledged. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:387: SkipToPosition(position); On 2016/09/15 08:25:16, marja wrote: > One thing I still find surprising in this CL is that SkipToPosition is not > guaranteed to actually find the position, and it also doesn't return a bool, but > the caller is required to check whether it really skipped to the right position. > But I can somehow see why you did it like that... so idk. > > An alternative would be: > bool position_found = SkipToPosition(position); > while (!position_found) { > if (FetchChunk()) { > position_found = SkipToPosition(position); > } else { > // out of data > break; > } > } > > In this version, also the out-of-data condition (which confused me) is clearer. Done, sort of. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:425: const uint8_t* chunk; On 2016/09/15 08:25:16, marja wrote: > I'd call it "data" instead of "chunk" (because the struct is already called > "Chunk"). Done. https://codereview.chromium.org/2314663002/diff/320001/src/parsing/scanner-ch... src/parsing/scanner-character-streams.cc:506: Chunk& chunk = chunks_[FindChunk(chunks_, source_, position)]; On 2016/09/15 08:25:16, marja wrote: > Why not const Chunk& ? Done. https://codereview.chromium.org/2314663002/diff/320001/test/cctest/parsing/te... File test/cctest/parsing/test-scanner-streams.cc (right): https://codereview.chromium.org/2314663002/diff/320001/test/cctest/parsing/te... test/cctest/parsing/test-scanner-streams.cc:280: // th cases. On 2016/09/15 08:25:17, marja wrote: > typo Done.
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by vogelheim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by vogelheim@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from nikolaos@chromium.org, marja@chromium.org Link to the patchset: https://codereview.chromium.org/2314663002/#ps360001 (title: "Marja's feedback.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Rework scanner-character-streams. - Smaller, more consistent streams API (Advance, Back, pos, Seek) - Remove implementations from the header, in favor of creation functions. Observe: - Performance: - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a body of only a few instructions. I expect most calls to end up there. - There used to be performance problems w/ bookmarking, particularly with copying too much data on SetBookmark w/ UTF-8 streaming streams. All those copies are gone. - The old streaming streams implementation used to copy data even for 2-byte input. It no longer does. - The only remaining 'slow' method is the Seek(.) slow case for utf-8 streaming streams. I don't expect this to be called a lot; and even if, I expect it to be offset by the gains in the (vastly more frequent) calls to the other methods or the 'fast path'. - If it still bothers us, there are several ways to speed it up. - API & code cleanliness: - I want to remove the 'old' API in a follow-up CL, which should mostly delete code, or replace it 1:1. - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink for streaming streams. - The "bookmark" is now always implemented (and mostly very fast), so we should be able to use it for more things. - Testing & correctness: - The unit tests now cover all stream implementations, and are pretty good and triggering all the edge cases. - Vastly more DCHECKs of the invariants. BUG=v8:4947 ========== to ========== Rework scanner-character-streams. - Smaller, more consistent streams API (Advance, Back, pos, Seek) - Remove implementations from the header, in favor of creation functions. Observe: - Performance: - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a body of only a few instructions. I expect most calls to end up there. - There used to be performance problems w/ bookmarking, particularly with copying too much data on SetBookmark w/ UTF-8 streaming streams. All those copies are gone. - The old streaming streams implementation used to copy data even for 2-byte input. It no longer does. - The only remaining 'slow' method is the Seek(.) slow case for utf-8 streaming streams. I don't expect this to be called a lot; and even if, I expect it to be offset by the gains in the (vastly more frequent) calls to the other methods or the 'fast path'. - If it still bothers us, there are several ways to speed it up. - API & code cleanliness: - I want to remove the 'old' API in a follow-up CL, which should mostly delete code, or replace it 1:1. - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink for streaming streams. - The "bookmark" is now always implemented (and mostly very fast), so we should be able to use it for more things. - Testing & correctness: - The unit tests now cover all stream implementations, and are pretty good and triggering all the edge cases. - Vastly more DCHECKs of the invariants. BUG=v8:4947 ==========
Message was sent while issue was closed.
Committed patchset #9 (id:360001)
Message was sent while issue was closed.
Description was changed from ========== Rework scanner-character-streams. - Smaller, more consistent streams API (Advance, Back, pos, Seek) - Remove implementations from the header, in favor of creation functions. Observe: - Performance: - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a body of only a few instructions. I expect most calls to end up there. - There used to be performance problems w/ bookmarking, particularly with copying too much data on SetBookmark w/ UTF-8 streaming streams. All those copies are gone. - The old streaming streams implementation used to copy data even for 2-byte input. It no longer does. - The only remaining 'slow' method is the Seek(.) slow case for utf-8 streaming streams. I don't expect this to be called a lot; and even if, I expect it to be offset by the gains in the (vastly more frequent) calls to the other methods or the 'fast path'. - If it still bothers us, there are several ways to speed it up. - API & code cleanliness: - I want to remove the 'old' API in a follow-up CL, which should mostly delete code, or replace it 1:1. - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink for streaming streams. - The "bookmark" is now always implemented (and mostly very fast), so we should be able to use it for more things. - Testing & correctness: - The unit tests now cover all stream implementations, and are pretty good and triggering all the edge cases. - Vastly more DCHECKs of the invariants. BUG=v8:4947 ========== to ========== Rework scanner-character-streams. - Smaller, more consistent streams API (Advance, Back, pos, Seek) - Remove implementations from the header, in favor of creation functions. Observe: - Performance: - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a body of only a few instructions. I expect most calls to end up there. - There used to be performance problems w/ bookmarking, particularly with copying too much data on SetBookmark w/ UTF-8 streaming streams. All those copies are gone. - The old streaming streams implementation used to copy data even for 2-byte input. It no longer does. - The only remaining 'slow' method is the Seek(.) slow case for utf-8 streaming streams. I don't expect this to be called a lot; and even if, I expect it to be offset by the gains in the (vastly more frequent) calls to the other methods or the 'fast path'. - If it still bothers us, there are several ways to speed it up. - API & code cleanliness: - I want to remove the 'old' API in a follow-up CL, which should mostly delete code, or replace it 1:1. - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink for streaming streams. - The "bookmark" is now always implemented (and mostly very fast), so we should be able to use it for more things. - Testing & correctness: - The unit tests now cover all stream implementations, and are pretty good and triggering all the edge cases. - Vastly more DCHECKs of the invariants. BUG=v8:4947 Committed: https://crrev.com/642d6d314c7934a05cd2386e4b10fca3267769fc Cr-Commit-Position: refs/heads/master@{#39464} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/642d6d314c7934a05cd2386e4b10fca3267769fc Cr-Commit-Position: refs/heads/master@{#39464} |