| 
 | 
 | 
 Chromium Code Reviews
 Chromium Code Reviews Issue 
            1875263006:
    Experimental CL on top of https://codereview.chromium.org/1812673005/  (Closed)
    
  
    Issue 
            1875263006:
    Experimental CL on top of https://codereview.chromium.org/1812673005/  (Closed) 
  | Created: 4 years, 8 months ago by jungshik at Google Modified: 4 years, 7 months ago CC: v8-reviews_googlegroups.com Base URL: https://chromium.googlesource.com/v8/v8.git@caseconv Target Ref: refs/pending/heads/master Project: v8 Visibility: Public. | DescriptionRevert to C API for perf
el-Upper is not yet supported.
BUG=
   Patch Set 1 #Patch Set 2 : Streamline the way ICU API is called. Use do-while with set_length #Patch Set 3 : copy buffer with ToWideCString: 50~60% slower #Patch Set 4 : another tweak: greek support added #
      Total comments: 2
      
     Patch Set 5 : extend DisallowAloc block #
      Total comments: 1
      
     Patch Set 6 : Use toWideCString over flattended string; do not use GetFlatContent #Patch Set 7 : do not use set_length; call conversion apis twice; add Greek test cases #Patch Set 8 : back to do-while loop with DisallowHeapAlloc inside the loop per adamk@ #
 Depends on Patchset: Messages
    Total messages: 17 (1 generated)
     
 jshin@chromium.org changed reviewers: + adamk@chromium.org, littledan@chromium.org 
 This is not a CL for review, but I'm expeirmenting with this CL on top of https://codereview.chromium.org/1812673005/ I have a couple of questions to adamk@ and littledan@ as mentioned in the other CL. 
 https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... File src/runtime/runtime-i18n.cc (right): https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... src/runtime/runtime-i18n.cc:799: src = reinterpret_cast<const UChar*>(flat.ToUC16Vector().start()); adamk@ and littledan@: Can you answer my question in the comment above? SeqTwoByteString was created outside the block (before the block with DisallowHeapAllocation). However, the length of |result| (SetTwoByteString) is adjusted by calling |set_length|. Would this affect whether or not the memory pointed to by |src|. FlatContent just points to the buffer of a flattended string, |s| in this case which is passed from a caller. However, "realloc"(? : length adjustment) would trigger gc and make |s| go away along with the buffer pointed to bey |src|? With a debug build when I run test262/intl402/String/*, I got a crash (not in this file but at the end of a process clean-up). Program received signal SIGSEGV, Segmentation fault. 0x000000000041e30c in v8::base::NoBarrier_Load (ptr=0xffffffff) at ../../src/base/atomicops_internals_x86_gcc.h:236 236 return *ptr; (gdb) bt #0 0x000000000041e30c in v8::base::NoBarrier_Load (ptr=0xffffffff) at ../../src/base/atomicops_internals_x86_gcc.h:236 #1 0x000000000041e279 in v8::internal::HeapObject::map_word (this=0x100000000) at ../../src/objects-inl.h:1351 #2 0x000000000041dc25 in v8::internal::HeapObject::map (this=0x100000000) at ../../src/objects-inl.h:1305 #3 0x000000000041e1f5 in v8::internal::HeapObject::IsMap (this=0x100000000) at ../../src/objects-inl.h:677 #4 0x00000000008f8574 in v8::internal::NewSpace::Verify (this=0x20f0dc0) at ../../src/heap/spaces.cc:1618 #5 0x0000000000872154 in v8::internal::Heap::Verify (this=0x20f0220) at ../../src/heap/heap.cc:4625 #6 0x0000000000894e5e in v8::internal::Heap::TearDown (this=0x20f0220) at ../../src/heap/heap.cc:5384 #7 0x00000000009550aa in v8::internal::Isolate::Deinit (this=0x20f0200) at ../../src/isolate.cc:1980 #8 0x0000000000954d19 in v8::internal::Isolate::TearDown (this=0x20f0200) at ../../src/isolate.cc:1905 #9 0x000000000043b62d in v8::Isolate::Dispose (this=0x20f0200) at ../../src/api.cc:7315 #10 0x000000000041684a in v8::Shell::Main (argc=17, argv=0x7fffffffd648) at ../../src/d8.cc:2503 #11 0x000000000041cf72 in main (argc=17, argv=0x7fffffffd648) at ../../src/d8.cc:2516 
 https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... File src/runtime/runtime-i18n.cc (right): https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... src/runtime/runtime-i18n.cc:799: src = reinterpret_cast<const UChar*>(flat.ToUC16Vector().start()); What you're doing here is basically defeating the point of putting DisallowHeapAllocation in this block: you can't rely on flat (or things vended from it) being valid if a heap allocation has happened since GetFlatContent() was called. But I don't see any V8 heap allocation below, so it's possible you can fix this just by widening the contents of the DisallowHeapAllocation block? 
 Thanks a lot for the answer. On 2016/04/14 01:56:35, adamk wrote: > https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... > File src/runtime/runtime-i18n.cc (right): > > https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... > src/runtime/runtime-i18n.cc:799: src = reinterpret_cast<const > UChar*>(flat.ToUC16Vector().start()); > What you're doing here is basically defeating the point of putting > DisallowHeapAllocation in this block: you can't rely on flat (or things vended > from it) being valid if a heap allocation has happened since GetFlatContent() > was called. Yeah, I understood that much. My questions are: 1) Will |flat.ToUC16Vector().start()| be valid even after |FlatContent flat| (obtained with GetFlatContent()) goes out of scope as long as the underlying flattened string (in this case, it's |s|) is around (i.e. not gc'd out) and NO v8-heap allocation is made ? 2) Does |String::set_length()| trigger a heap allocation (especially if the new length is larger than the old length)? In PS#4 (and PS #5), SeqTwoByteString is created BEFORE 'DisallowHeapAllocation' block. Then, its size is reset with set_length (or synchronized_set_length) AFTER 'DisallowHeapAllocation' block. My debug build does not hit DCHECK for 'disallowed heap allocation' at |set_length|. However, when tearing down isolate, it crashes with the stack trace shown in my previous comment. > But I don't see any V8 heap allocation below, so it's possible you can fix this > just by widening the contents of the DisallowHeapAllocation block? PS #5 extended the DisallowHeapAllocation block to include |set_length| (or |synchronized_set_length| ), but it still crashes when tearing down isolate with the same stack trace. BTW, where is the definition of |set_length| and |synchronized_set_length|? I found callers and declarations, but couldn't find any definition. 
 On 2016/04/14 06:21:21, jshin (jungshik at google) wrote: > Thanks a lot for the answer. > > On 2016/04/14 01:56:35, adamk wrote: > > > https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... > > File src/runtime/runtime-i18n.cc (right): > > > > > https://codereview.chromium.org/1875263006/diff/60001/src/runtime/runtime-i18... > > src/runtime/runtime-i18n.cc:799: src = reinterpret_cast<const > > UChar*>(flat.ToUC16Vector().start()); > > What you're doing here is basically defeating the point of putting > > DisallowHeapAllocation in this block: you can't rely on flat (or things vended > > from it) being valid if a heap allocation has happened since GetFlatContent() > > was called. > > Yeah, I understood that much. My questions are: > > 1) Will |flat.ToUC16Vector().start()| be valid even after |FlatContent flat| > (obtained with GetFlatContent()) goes out of scope as long as > the underlying flattened string (in this case, it's |s|) is around (i.e. not > gc'd out) and NO v8-heap allocation is made ? > > 2) Does |String::set_length()| trigger a heap allocation (especially if the new > length is larger than the old length)? In PS#4 (and PS #5), > SeqTwoByteString is created BEFORE 'DisallowHeapAllocation' block. Then, its > size is reset with set_length (or synchronized_set_length) AFTER > 'DisallowHeapAllocation' block. > > My debug build does not hit DCHECK for 'disallowed heap allocation' at > |set_length|. However, when tearing down isolate, it crashes with the stack > trace shown in my previous comment. > > > > But I don't see any V8 heap allocation below, so it's possible you can fix > this > > just by widening the contents of the DisallowHeapAllocation block? > > PS #5 extended the DisallowHeapAllocation block to include |set_length| (or > |synchronized_set_length| ), but it still crashes when tearing down isolate with > the same stack trace. BTW, this crash (whose stack is given earlier) happens ONLY when '--verify-heap' is passed to a debug build of d8. (not to say that it's not a worry but to clarify). $ out/Debug/d8 --verify-heap -e "...." 
 |set_length| appears to be the cause of crash with '--verify-heap' command line
flag.
1. out/Debug/d8  -e  ' print ("\u00df\ufb01\ufb02".toUpperCase());'
Length change: The uppercased version is longer than the input. 
2. out/Debug/d8 --verify-heap -e  'print("\u03c1\u03c0".toUpperCase());'
No length change. set_length() would be no-op.
Both #1 and #2 gave the correct result when run without '--verify-heap'. 
With '--verify-heap', #1 crashes, while #2 does not.
https://codereview.chromium.org/1875263006/diff/80001/src/runtime/runtime-i18...
File src/runtime/runtime-i18n.cc (right):
https://codereview.chromium.org/1875263006/diff/80001/src/runtime/runtime-i18...
src/runtime/runtime-i18n.cc:812: result->synchronized_set_length(target_length);
|set_length| is called over |result| ( which is SeqTwoByteString created before
entering the DisallowHeapAllocation block). Would this be problematic? 
In PS#5,  it's inside the DisallowHeapAllocation block. In PS#4, set_length()
was outside the block.
 On 2016/04/14 07:46:32, jshin (jungshik at google) wrote:
> |set_length| appears to be the cause of crash with '--verify-heap' command
line
> flag.
> 
> 1. out/Debug/d8  -e  ' print ("\u00df\ufb01\ufb02".toUpperCase());'
> 
> Length change: The uppercased version is longer than the input. 
> 
> 2. out/Debug/d8 --verify-heap -e  'print("\u03c1\u03c0".toUpperCase());'
> 
> No length change. set_length() would be no-op.
> 
> 
> Both #1 and #2 gave the correct result when run without '--verify-heap'. 
> 
> With '--verify-heap', #1 crashes, while #2 does not.
The above was done with PS #5.  
In PS #6, I gave up using GetFlatContent (No DisallowHeapAllocation any more)
and switched to using a slower ToWideCString. However, I still get a crash when
isolate is torn down at the end.   So, am I not supposed to use
|String::set_length()| ? 
Program received signal SIGILL, Illegal instruction.
v8::base::OS::Abort () at ../../src/base/platform/platform-posix.cc:240
240         V8_IMMEDIATE_CRASH();
(gdb) bt
#0  v8::base::OS::Abort () at ../../src/base/platform/platform-posix.cc:240
#1  0x000000000101aaf8 in V8_Fatal (file=0x12caf13 "../../src/heap/spaces.cc",
line=1611, format=0x128825b "Check failed: %s.")
    at ../../src/base/logging.cc:116
#2  0x00000000008f8549 in v8::internal::NewSpace::Verify (this=0x20f0e90) at
../../src/heap/spaces.cc:1610
#3  0x0000000000872154 in v8::internal::Heap::Verify (this=0x20f02f0) at
../../src/heap/heap.cc:4625
#4  0x0000000000894e5e in v8::internal::Heap::TearDown (this=0x20f02f0) at
../../src/heap/heap.cc:5384
#5  0x00000000009550aa in v8::internal::Isolate::Deinit (this=0x20f02d0) at
../../src/isolate.cc:1980
#6  0x0000000000954d19 in v8::internal::Isolate::TearDown (this=0x20f02d0) at
../../src/isolate.cc:1905
#7  0x000000000043b62d in v8::Isolate::Dispose (this=0x20f02d0) at
../../src/api.cc:7315
#8  0x000000000041684a in v8::Shell::Main (argc=4, argv=0x7fffffffdf38) at
../../src/d8.cc:2503
#9  0x000000000041cf72 in main (argc=4, argv=0x7fffffffdf38) at
../../src/d8.cc:2516
 Because I don't know how to avoid a crash with '--verify-heap' when |set_length| is used, I'll go back to PS #1 (that does not use set_length) and enclose "as much as possible" within the DisallowAllocation block although it's a bit messy. 
 On 2016/04/14 09:10:28, jshin (jungshik at google) wrote: > Because I don't know how to avoid a crash with '--verify-heap' when |set_length| > is used, I'll go back to PS #1 (that does not use set_length) and enclose "as > much as possible" within the DisallowAllocation block although it's a bit messy. Another alternative is PS #24/25 at https://codereview.chromium.org/1812673005/. It's simple and neat, but it's 50% slower for a non-Latin-1 input. 
 Can't respond to each of the last messages in detail, but I missed what sort of object you were calling set_length() on. set_length() on a SeqString doesn't actually change the length of the allocated buffer, and so can't be used to grow the string. To shrink the string, you'd want to use SeqString::Truncate. In general high-level runtime code like this shouldn't be trying to grow or shrink existing heap objects in-place. 
 On 2016/04/14 16:09:56, adamk wrote: > Can't respond to each of the last messages in detail, but I missed what sort of > object you were calling set_length() on. set_length() on a SeqString doesn't > actually change the length of the allocated buffer, and so can't be used to grow > the string. To shrink the string, you'd want to use SeqString::Truncate. In > general high-level runtime code like this shouldn't be trying to grow or shrink > existing heap objects in-place. Thanks a lot for the answer !! It's now clear that I cannot use |set_length| in this CL because it cannot grow the SeqString (in a proper way). A bit more detailed comment in class::String in objects.h would be helpful to novices like me especially considering that everything appears to work properly unless I run with '--verify-heap'. :-) Could you answer this question as well? Sorry to bother you so much. 1) Will the memory pointed to by |flat.ToUC16Vector().start()| be valid even after |FlatContent flat| (obtained with GetFlatContent()) goes out of scope as long as the underlying flattened string (for which GetFlatContent() is run to obtain |flat|) is around (i.e. not> gc'd out) and v8-heap allocation NOT made ? It looks like it is (no crash/check/error even with '--verify-heap'), but I want to make sure. Thanks again. 
 On 2016/04/14 17:39:08, jshin (jungshik at google) wrote: > On 2016/04/14 16:09:56, adamk wrote: > > Can't respond to each of the last messages in detail, but I missed what sort > of > > object you were calling set_length() on. set_length() on a SeqString doesn't > > actually change the length of the allocated buffer, and so can't be used to > grow > > the string. To shrink the string, you'd want to use SeqString::Truncate. In > > general high-level runtime code like this shouldn't be trying to grow or > shrink > > existing heap objects in-place. > > Thanks a lot for the answer !! > > It's now clear that I cannot use |set_length| in this CL because > it cannot grow the SeqString (in a proper way). A bit more detailed comment in > class::String in > objects.h would be helpful to novices like me especially considering that > everything appears > to work properly unless I run with '--verify-heap'. :-) There's a general lack of good separation between different parts of the interfaces in objects.h: set_length needs to be exposed so that during allocation it can be set appropriately, but that doesn't mean it's safe to set in general. While I admit a comment there might be helpful, there are many such places in objects.h that would need similar warnings. A note on code style: the fact that set_length is in lower_case_style is to suggest that it's a trivial operation, just setting the length field. And for V8 heap objects in general, the "length" field has a special meaning, as it's used by the GC to determine the extent of an object's body. > > Could you answer this question as well? Sorry to bother you so much. > > 1) Will the memory pointed to by |flat.ToUC16Vector().start()| be valid even > after |FlatContent flat| > (obtained with GetFlatContent()) goes out of scope as long as > the underlying flattened string (for which GetFlatContent() is run to obtain > |flat|) is around > (i.e. not> gc'd out) and v8-heap allocation NOT made ? > > It looks like it is (no crash/check/error even with '--verify-heap'), but I want > to make sure. I don't know the answer for sure, but it doesn't sound like a good pattern to adopt, just from usual C++ principles and the warnings around GetFlatContent. The return value of GetFlatContent really seems to be meant to be consumed ASAP. > Thanks again. 
 On 2016/04/14 21:22:29, adamk wrote: > On 2016/04/14 17:39:08, jshin (jungshik at google) wrote: > > On 2016/04/14 16:09:56, adamk wrote: > > > Can't respond to each of the last messages in detail, but I missed what sort > > of > > > object you were calling set_length() on. set_length() on a SeqString doesn't > > > actually change the length of the allocated buffer, and so can't be used to > > grow > > > the string. To shrink the string, you'd want to use SeqString::Truncate. In > > > general high-level runtime code like this shouldn't be trying to grow or > > shrink > > > existing heap objects in-place. > > > > Thanks a lot for the answer !! > > > > It's now clear that I cannot use |set_length| in this CL because > > it cannot grow the SeqString (in a proper way). A bit more detailed comment in > > class::String in > > objects.h would be helpful to novices like me especially considering that > > everything appears > > to work properly unless I run with '--verify-heap'. :-) > > There's a general lack of good separation between different parts of the > interfaces in objects.h: set_length needs to be exposed so that during > allocation it can be set appropriately, but that doesn't mean it's safe to set > in general. While I admit a comment there might be helpful, there are many such > places in objects.h that would need similar warnings. > > A note on code style: the fact that set_length is in lower_case_style is to > suggest that it's a trivial operation, just setting the length field. And for V8 > heap objects in general, the "length" field has a special meaning, as it's used > by the GC to determine the extent of an object's body. Thank you for the explanation. > > > > > Could you answer this question as well? Sorry to bother you so much. > > > > 1) Will the memory pointed to by |flat.ToUC16Vector().start()| be valid even > > after |FlatContent flat| > > (obtained with GetFlatContent()) goes out of scope as long as > > the underlying flattened string (for which GetFlatContent() is run to obtain > > |flat|) is around > > (i.e. not> gc'd out) and v8-heap allocation NOT made ? > > > > It looks like it is (no crash/check/error even with '--verify-heap'), but I > want > > to make sure. > > I don't know the answer for sure, but it doesn't sound like a good pattern to > adopt, just from usual C++ principles and the warnings around GetFlatContent. > The return value of GetFlatContent really seems to be meant to be consumed ASAP. I would avoid it because it does not sound like a good pattern (I agree with you), but I don't see a good performant way to avoid that. (and PS #7 seems to be ok, although that does not guarantee that it'll be always ok). A few considerations are enumerated below. 1. I cannot use set_length to change the string length. That means, I have to create a new destination/result string (second time) with an adjusted length while still having to access the memory location pointed to by |flat.ToUC16Vector().start()| for the source/input. And this has to happen outside the DisallowHeapAllocation block (inside which GetFlatContent() is called). If String::FlatContent had a public default ctor, this could be worked around by having one created before entering the DisallowHeapAllocation block. 2. If I use ICU's C++ API/UnicodeString, I can defer the creation of SeqString until the very end (PS #27 at https://codereview.chromium.org/1812673005/ does that) and the rest of the code can be enclosed within DisallowHeapAllocation block to avoid this issue entirely. However, the 'constant' set-up time for this approach (icu::UnicodeString) is rather high and a perf degradation for short strings is significant percentage-point-wise ( 1200 ms vs 1850 ms for 10^7 operations. Hmm.. it's 0.06 us per call. Do we need to worry? ) 3. Yet another alternative is to use a public API to extract |Uchar*| buffer from |String| (i.e. v8::String::Value ). It's slower than the current approach - PS #7 (I forgot how much), but I can avoid an issue with GetFlatContent and DisallowHeapAllocation. I'll see how much this slows down things. If the perf difference (in #2 above) is nothing to worry about, I'll just go ahead with PS #27 at https://codereview.chromium.org/1812673005. 
 On 2016/04/14 23:50:04, jshin (jungshik at google) wrote: > 1. I cannot use set_length to change the string length. That means, I have to > create a new destination/result string (second time) > with an adjusted length while still having to access the memory location pointed > to by |flat.ToUC16Vector().start()| for the source/input. > And this has to happen outside the DisallowHeapAllocation block (inside which > GetFlatContent() is called). > If String::FlatContent had a public default ctor, this could be worked around by > having one created before entering the DisallowHeapAllocation block. > > 2. If I use ICU's C++ API/UnicodeString, I can defer the creation of SeqString > until the very end (PS #27 at > https://codereview.chromium.org/1812673005/ does that) and the rest of the code > can be enclosed within DisallowHeapAllocation block > to avoid this issue entirely. However, the 'constant' set-up time for this > approach (icu::UnicodeString) is rather high and a perf > degradation for short strings is significant percentage-point-wise ( 1200 ms vs > 1850 ms for 10^7 operations. Hmm.. it's 0.06 us per > call. Do we need to worry? ) > > 3. Yet another alternative is to use a public API to extract |Uchar*| buffer > from |String| (i.e. v8::String::Value ). It's slower than the current > approach - PS #7 (I forgot how much), but I can avoid an issue with > GetFlatContent and DisallowHeapAllocation. I'll see how much this slows down > things. I just measured the perf of #3 (toLowerCase with a non-Latin1 input) and it's the slowest of all three (1200ms vs 1900ms vs 2300 ms for a short 2 char input and the slope with longer strings seem steeper). So, it's out of consideration. > If the perf difference (in #2 above) is nothing to worry about, I'll just go > ahead with PS #27 at https://codereview.chromium.org/1812673005. 
 On 2016/04/14 23:50:04, jshin (jungshik at google) wrote:
> On 2016/04/14 21:22:29, adamk wrote:
> > On 2016/04/14 17:39:08, jshin (jungshik at google) wrote:
> > > On 2016/04/14 16:09:56, adamk wrote:
> > > > Can't respond to each of the last messages in detail, but I missed what
> sort
> > > of
> > > > object you were calling set_length() on. set_length() on a SeqString
> doesn't
> > > > actually change the length of the allocated buffer, and so can't be used
> to
> > > grow
> > > > the string. To shrink the string, you'd want to use SeqString::Truncate.
> In
> > > > general high-level runtime code like this shouldn't be trying to grow or
> > > shrink
> > > > existing heap objects in-place.
> > > 
> > > Thanks a lot for the answer !!  
> > > 
> > > It's now clear that I cannot use |set_length| in this CL because
> > > it cannot grow the SeqString (in a proper way). A bit more detailed
comment
> in
> > > class::String in 
> > > objects.h would be helpful to novices like me especially considering that
> > > everything appears
> > > to work properly unless I run with '--verify-heap'. :-) 
> > 
> > There's a general lack of good separation between different parts of the
> > interfaces in objects.h: set_length needs to be exposed so that during
> > allocation it can be set appropriately, but that doesn't mean it's safe to
set
> > in general. While I admit a comment there might be helpful, there are many
> such
> > places in objects.h that would need similar warnings.
> > 
> > A note on code style: the fact that set_length is in lower_case_style is to
> > suggest that it's a trivial operation, just setting the length field. And
for
> V8
> > heap objects in general, the "length" field has a special meaning, as it's
> used
> > by the GC to determine the extent of an object's body.
> 
> Thank you for the explanation. 
> > 
> > > 
> > > Could you answer this question as well? Sorry to bother you so much. 
> > > 
> > > 1) Will the memory pointed to by |flat.ToUC16Vector().start()| be valid
even
> > > after |FlatContent flat|
> > > (obtained with GetFlatContent()) goes out of scope as long as 
> > > the underlying flattened string (for which GetFlatContent() is run to
obtain
> > > |flat|) is around 
> > > (i.e. not> gc'd out) and v8-heap allocation NOT made ?  
> > > 
> > > It looks like it is (no crash/check/error even with '--verify-heap'), but
I
> > want
> > > to make sure. 
> > 
> > I don't know the answer for sure, but it doesn't sound like a good pattern
to
> > adopt, just from usual C++ principles and the warnings around
GetFlatContent.
> > The return value of GetFlatContent really seems to be meant to be consumed
> ASAP.
> 
> I would avoid it because it does not sound like a good pattern (I agree with
> you), but 
> I don't see a good performant way to avoid that. (and PS #7 seems to be ok,
> although that
> does not guarantee that it'll be always ok). A few considerations are
enumerated
> below. 
> 
> 1. I cannot use set_length to change the string length. That means, I have to
> create a new destination/result string (second time) 
> with an adjusted length while still having to access the memory location
pointed
> to by |flat.ToUC16Vector().start()| for the source/input.
> And this has to happen outside the DisallowHeapAllocation block (inside which
> GetFlatContent() is called). 
> If String::FlatContent had a public default ctor, this could be worked around
by
> having one created before entering the DisallowHeapAllocation block. 
There's no way to do this. If you create a second string, that invalidates the
FlatContent. I'm still trying to swap this whole thing into my head (was
previously just answering your specific questions), but one way to structure
this code would be, in pseudo-code:
'''
do {
  length = expected_length
  result = allocate_string(length)
  {
    DisallowHeapAllocation no_gc;
    FlatContent flat = src_string->GetFlatContent();
    src = flat.ToWhatever()
    attempt_conversion(src, result, length, &required_length) 
  }
  expected_length = required_length
} while (required_length > length)
if (required_length < length) {
  result = SeqString::Truncate(result, required_length)
}
'''
In the case where the length was guessed correctly up-front, this is
straight-line code that only does a single allocation. If the required length is
shorter, we use the already-provided truncation method to attempt to truncate
the result in-place. Finally, if more space is required, we go back to the top.
This doesn't violate GetFlatContent's contract, and only allocates an extra
string if it's required. It also doesn't require any repetition nor does it
require any weird scoping of locals, since the flat content is only accessed
inside the DisallowHeapAllocation block.
> 2. If I use ICU's C++ API/UnicodeString, I can defer the creation of SeqString
> until the very end (PS #27 at
> https://codereview.chromium.org/1812673005/ does that) and the rest of the
code
> can be enclosed within DisallowHeapAllocation block 
> to avoid this issue entirely. However, the 'constant' set-up time for this
> approach (icu::UnicodeString) is rather high and a perf
> degradation for short strings is significant percentage-point-wise ( 1200 ms
vs
> 1850 ms for 10^7 operations. Hmm.. it's  0.06 us per
> call. Do we need to worry? )
I don't know how to make a decision about what the right thing to do here is,
and if my above suggestion still works after conversion to real C++ code and
it's faster, then it seems fine to avoid this other C++ API. But if my
suggestion doesn't work for some reason, it seems reasonable to me to start with
easy-to-understand code that passes tests, and then do followup work to try to
improve its performance.
> 3. Yet another alternative is to use a public API to extract |Uchar*| buffer
> from |String|  (i.e. v8::String::Value ). It's slower than the current 
> approach - PS #7 (I forgot how much), but I can avoid an issue with
> GetFlatContent and DisallowHeapAllocation. I'll see how much this slows down
> things.
> 
> 
> 
> If the perf difference (in #2 above) is nothing to worry about, I'll just go
> ahead with PS #27 at https://codereview.chromium.org/1812673005.
 On 2016/04/15 00:20:29, adamk wrote:
> There's no way to do this. If you create a second string, that invalidates the
> FlatContent. I'm still trying to swap this whole thing into my head (was
> previously just answering your specific questions), but one way to structure
> this code would be, in pseudo-code:
> 
> '''
> do {
>   length = expected_length
>   result = allocate_string(length)
>   {
>     DisallowHeapAllocation no_gc;
>     FlatContent flat = src_string->GetFlatContent();
>     src = flat.ToWhatever()
>     attempt_conversion(src, result, length, &required_length) 
>   }
>   expected_length = required_length
> } while (required_length > length)
> 
> if (required_length < length) {
>   result = SeqString::Truncate(result, required_length)
> }
Thanks a lot for the suggestion. It works well. 
I was too fixated on getting the pointer to the input string (|src|) before
starting any real work to see this 
approach. It works well except that I have to live with a little bit of the code
duplication ( GetFlatContent, UC16Vector, etc) for Greek uppercasing
and all other cases el-Upper has to be handled separately for now for the reason
given in the comment in the CL. I also got |src| to point to
a buffer obtained with |ToWideCString| in rare case of OneByteString coming to
this function at the beginning so that it can be shared by both 
el-Upper and the rest and it does not have to be reinitialized in the second
time 'do-while' loop is run (buffer overflow case). 
I'll move this back to the original CL
(https://codereview.chromium.org/1812673005/) and ask for review there. (I'll be
ooo Friday so that it may be early next week). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
