|
|
Created:
9 years, 6 months ago by sandholm Modified:
9 years, 5 months ago CC:
v8-dev Visibility:
Public. |
DescriptionImprove JSON.parse to use less memory when using escaped and non-ascii
characters.
Patch Set 1 #Patch Set 2 : '' #
Total comments: 8
Patch Set 3 : '' #
Total comments: 17
Patch Set 4 : '' #Messages
Total messages: 6 (0 generated)
No capes^H^H^H^H^Hgotos! The logic is too complex, with the same test being used to both exit the loop and later assume that it wasn't a clean exit. http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h File src/json-parser.h (right): http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode418 src/json-parser.h:418: int length = kInitialSpecialStringSize; Change ...StringSize to ...StringLength. Size sounds like it's the number of bytes, not the number of characters. http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode420 src/json-parser.h:420: isolate()->factory()->NewRawTwoByteString(length, NOT_TENURED); If NewRawTwoByteString doesn't return a Handle<SeqTwoByteString>, you should change it to do so. http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode504 src/json-parser.h:504: Handle<String> JsonParser<seq_ascii>::SlowScanJsonAsciiString( Sounds like it scans an ASCII string, not that it creates one. How about SlowScanJsonIntoAsciiString? (Otherwise I start questioning why we don't know that c0_ is ASCII inside the loop). http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode571 src/json-parser.h:571: goto outer_loop; // break out of while loop. I'm really not too keen on using gotos. Really not. But the style guide doesn't say it's not allowed, so I guess I can't prevent it. http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode587 src/json-parser.h:587: int string_size = SeqAsciiString::SizeFor(count); If the truncation would turn a lot of space into filler, why not reuse it for the two-byte string? http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode593 src/json-parser.h:593: } If you move the shrinking to a separate function, you can call it from where you do the goto, and call directly to the two-byte scan, instead of this torturous logic. http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode618 src/json-parser.h:618: break; Again, you can abstract the part below into a function, so you can call it both here and after the loop, and then call and return the result of SlowScanJsonAsciiString from here only.
I rewrote things and addressed all but one of your comments. Please have another look. On 2011/06/24 13:56:00, Lasse Reichstein wrote: > No capes^H^H^H^H^Hgotos! no gotos anymore. > > The logic is too complex, with the same test being used to both exit the loop > and later assume that it wasn't a clean exit. complex logic is gone. > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h > File src/json-parser.h (right): > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode418 > src/json-parser.h:418: int length = kInitialSpecialStringSize; > Change ...StringSize to ...StringLength. Size sounds like it's the number of > bytes, not the number of characters. done > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode420 > src/json-parser.h:420: isolate()->factory()->NewRawTwoByteString(length, > NOT_TENURED); > If NewRawTwoByteString doesn't return a Handle<SeqTwoByteString>, you should > change it to do so. done (by someone else?) > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode504 > src/json-parser.h:504: Handle<String> > JsonParser<seq_ascii>::SlowScanJsonAsciiString( > Sounds like it scans an ASCII string, not that it creates one. > How about SlowScanJsonIntoAsciiString? > (Otherwise I start questioning why we don't know that c0_ is ASCII inside the > loop). Both the Ascii and TwoByte case are now handled in one template. > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode571 > src/json-parser.h:571: goto outer_loop; // break out of while loop. > I'm really not too keen on using gotos. Really not. > But the style guide doesn't say it's not allowed, so I guess I can't prevent it. goto is gone. > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode587 > src/json-parser.h:587: int string_size = SeqAsciiString::SizeFor(count); > If the truncation would turn a lot of space into filler, why not reuse it for > the two-byte string? I think this branch is mostly handling cases where we're allocated in large object space in which case there is not much to do anyway. > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode593 > src/json-parser.h:593: } > If you move the shrinking to a separate function, you can call it from where you > do the goto, and call directly to the two-byte scan, instead of this torturous > logic. Torturous logic gone. > > http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode618 > src/json-parser.h:618: break; > Again, you can abstract the part below into a function, so you can call it both > here and after the loop, and then call and return the result of > SlowScanJsonAsciiString from here only. (sort of) done.
LGTM http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h File src/json-parser.h (right): http://codereview.chromium.org/7241023/diff/3001/src/json-parser.h#newcode111 src/json-parser.h:111: // as first part of a ConsString Comment out of date (what is ascii_count)? http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode113 src/json-parser.h:113: Handle<String> SlowScanJsonTwoByteString(Handle<String> prefix); Slow version of what? Just say what the function does, and only after that maybe say which function is supposed to use it. So it creates a new string, copies prefix[start..end] into the beginning of it (and I assume start can be non-zero when prefix is source_) and starts scanning the string and adding it after the prefix. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode171 src/json-parser.h:171: // Optimized fast case where we only have ascii characters. ASCII is an acronym when used in prose. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode459 src/json-parser.h:459: seq_two_byte->SeqTwoByteStringSet(count++, '\x0d'); Move comment to after "if" line, so it only applies if the condition is true (or make it apply to the check itself, like below: "Check whether we need to create a new string."). Comments should be full sentences (so remember "." at the end). No, we don't always follow that in practice :) http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode466 src/json-parser.h:466: for (int i = 0; i < 4; i++) { Does it lint? (Generally, us kCharSize instead of sizeof(char)). For readability, I'd prefer the operands in the opposite order: if (sizeof(SinkChar) != kCharSize || and you should probably do: if (sizeof(SinkChar) == kUC16Size || since you know that there are only those two possibilities. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode467 src/json-parser.h:467: Advance(); This could use a comment: If the sink can contain UC16 characters, or source_ contains only ASCII characters, there's no need to test whether we can store the character. Otherwise check whether the UC16 source character can fit in the ASCII sink. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode472 src/json-parser.h:472: value = value * 16 + digit; non-ASCII. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode510 src/json-parser.h:510: Handle<SeqAsciiString>::cast(new_ascii); sizeof(SinkChar) == kUC16Size http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode531 src/json-parser.h:531: } Do ASSERT that the string is at the allocation boundary. (It should be, there have been no allocations or possible interruptions since it was allocated).
Lasse, thanks for doing the review. I like all of your comments and improvements. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h File src/json-parser.h (right): http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode113 src/json-parser.h:113: Handle<String> SlowScanJsonString(Handle<String> prefix, int start, int end); On 2011/06/29 09:27:30, Lasse Reichstein wrote: > Slow version of what? Just say what the function does, and only after that maybe > say which function is supposed to use it. > > So it creates a new string, copies prefix[start..end] into the beginning of it > (and I assume start can be non-zero when prefix is source_) and starts scanning > the string and adding it after the prefix. > Done. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode171 src/json-parser.h:171: // Optimized fast case where we only have ascii characters. On 2011/06/29 09:27:30, Lasse Reichstein wrote: > ASCII is an acronym when used in prose. Done. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode459 src/json-parser.h:459: // Create new seq string On 2011/06/29 09:27:30, Lasse Reichstein wrote: > Move comment to after "if" line, so it only applies if the condition is true (or > make it apply to the check itself, like below: "Check whether we need to create > a new string."). Comments should be full sentences (so remember "." at the end). > No, we don't always follow that in practice :) Done. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode466 src/json-parser.h:466: if (sizeof(char) != sizeof(SinkChar) || On 2011/06/29 09:27:30, Lasse Reichstein wrote: > Does it lint? (Generally, us kCharSize instead of sizeof(char)). > For readability, I'd prefer the operands in the opposite order: > if (sizeof(SinkChar) != kCharSize || > and you should probably do: > if (sizeof(SinkChar) == kUC16Size || > since you know that there are only those two possibilities. Done. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode467 src/json-parser.h:467: seq_ascii || On 2011/06/29 09:27:30, Lasse Reichstein wrote: > This could use a comment: > If the sink can contain UC16 characters, or source_ contains only ASCII > characters, there's no need to test whether we can store the character. > Otherwise check whether the UC16 source character can fit in the ASCII sink. Done. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode472 src/json-parser.h:472: // StringType is SeqAsciiString and we just read a non-ascii char. On 2011/06/29 09:27:30, Lasse Reichstein wrote: > non-ASCII. Done. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode510 src/json-parser.h:510: if (sizeof(char) != sizeof(SinkChar) || value <= kMaxAsciiCharCode) { On 2011/06/29 09:27:30, Lasse Reichstein wrote: > sizeof(SinkChar) == kUC16Size Done. http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode531 src/json-parser.h:531: template ShrinkStringAtAllocationBoundary<StringType>( On 2011/06/29 09:27:30, Lasse Reichstein wrote: > Do ASSERT that the string is at the allocation boundary. > (It should be, there have been no allocations or possible interruptions since it > was allocated). That ASSERT is in ShrinkStringAtAllocationBoundary. That should suffice?
http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h File src/json-parser.h (right): http://codereview.chromium.org/7241023/diff/5001/src/json-parser.h#newcode531 src/json-parser.h:531: template ShrinkStringAtAllocationBoundary<StringType>( It'll do :) It's probably better to keep it there, than to start introducing the top of newspace here, when we don't otherwise use it. |