Issue 1812673005: Use ICU case conversion/transliterator for case conversion behind a flag

Dan Ehrenberg

littledan@chromium.org changed reviewers: + littledan@chromium.org

4 years, 9 months ago (2016-03-17 20:38:35 UTC) #1

Dan Ehrenberg

https://codereview.chromium.org/1812673005/diff/60001/test/intl/testcfg.py File test/intl/testcfg.py (right): https://codereview.chromium.org/1812673005/diff/60001/test/intl/testcfg.py#newcode57 test/intl/testcfg.py:57: flags = ["--allow-natives-syntax", "--icu-case-mapping"] + \ BTW other testcfg.py's ...

4 years, 9 months ago (2016-03-17 20:38:36 UTC) #2

jungshik at Google

In the latest patch set, I addressed Yang's comments (from https://codereview.chromium.org/1544023002 ; compare ps#1 and ...

4 years, 8 months ago (2016-04-07 18:57:11 UTC) #3

jungshik at Google

On 2016/04/07 18:57:11, jshin (jungshik at google) wrote: > In the latest patch set, I ...

4 years, 8 months ago (2016-04-07 19:41:43 UTC) #4

adamk

Some answers about GetFlatContent... https://codereview.chromium.org/1812673005/diff/140001/src/runtime/runtime-strings.cc File src/runtime/runtime-strings.cc (right): https://codereview.chromium.org/1812673005/diff/140001/src/runtime/runtime-strings.cc#newcode1094 src/runtime/runtime-strings.cc:1094: String::FlatContent flat = s->GetFlatContent(); You'll ...

4 years, 8 months ago (2016-04-07 20:50:55 UTC) #6

jungshik at Google

4 years, 8 months ago (2016-04-07 22:56:31 UTC) #7

On 2016/04/07 20:50:55, adamk wrote:
> Some answers about GetFlatContent...
> 
> https://codereview.chromium.org/1812673005/diff/140001/src/runtime/runtime-st...
> File src/runtime/runtime-strings.cc (right):
> 
> https://codereview.chromium.org/1812673005/diff/140001/src/runtime/runtime-st...
> src/runtime/runtime-strings.cc:1094: String::FlatContent flat =
> s->GetFlatContent();
> You'll want to add a DisallowHeapAllocation object on the stack here, like:
> 
> DisallowHeapAllocation no_gc;
> 
> What the DCHECK in GetFlatContent is doing is keeping you from holding onto a
> pointer to the flat content across a GC, which might invalidate it.

Thanks a lot !!

I also discovered |DisallowHeapAllocation no_gc| right above other callers of
GetFlatContent and experimented with it.

As you wrote, now I'm getting DCHECK failure in other places where heap
allocation *is* required. I'll follow your suggestion below to isolate
DisallowHeapAllocation.

> 
> Once you've added the above DisallowHeapAllocation, you'll need to make sure
> that you do avoid allocations while it's on the stack (otherwise you'll get
> other DCHECK failures). Specifically this would be the NewStringFromTwoByte()
> call down at the bottom; the usual thing to do is put the rest of the function
> inside an inner block, and allocate the DisallowHeapAllocation in there.
> 
> However...
> 
> https://codereview.chromium.org/1812673005/diff/140001/src/runtime/runtime-st...
> src/runtime/runtime-strings.cc:1106: // This UnicodeString ctor has
> copy-on-write semantics. It starts as a
> ...this comment worries me. Does this mean that converted.getBuffer() below
> could be pointing at |src|? That doesn't sound valid to me, since
> NewStringFromTwoByte needs to read from getBuffer _after_ doing an allocation,
> which might cause GC. This function needs to make sure the FlatContent isn't
> referenced when an allocation occurs.

An allocation will happen in converted.toUpper() and converted.toLower() while
referring to |src|. By the time converted.getBuffer() is called, |converted|
will have a new buffer (different from |src|). That new buffer can be either on
the stack or on the heap (it's icu::UnicodeString's internal representation,
chosen based on the length of the string).

Anyway, it looks like I have to give up on 'copy-on-write' optimization.

jungshik at Google

Forgot to ask for another look. All the tests (that have been disabled) now pass ...

4 years, 8 months ago (2016-04-08 18:09:33 UTC) #8

jungshik at Google

I have a new plan. I'd rather fix both to{Upper,Lower}Case (locale-independent / root locale) and ...

4 years, 8 months ago (2016-04-08 19:44:42 UTC) #9

Dan Ehrenberg

Alright, overwriting SGTM. https://codereview.chromium.org/1812673005/diff/60001/test/intl/testcfg.py File test/intl/testcfg.py (right): https://codereview.chromium.org/1812673005/diff/60001/test/intl/testcfg.py#newcode57 test/intl/testcfg.py:57: flags = ["--allow-natives-syntax", "--icu-case-mapping"] + \ ...

4 years, 8 months ago (2016-04-08 21:13:46 UTC) #10

jungshik at Google

https://codereview.chromium.org/1812673005/diff/300001/src/runtime/runtime-i18n.cc File src/runtime/runtime-i18n.cc (right): https://codereview.chromium.org/1812673005/diff/300001/src/runtime/runtime-i18n.cc#newcode819 src/runtime/runtime-i18n.cc:819: false, reinterpret_cast<const UChar*>(*string_value), length); a question: Using this public ...

4 years, 8 months ago (2016-04-11 17:47:30 UTC) #11

jungshik at Google

4 years, 8 months ago (2016-04-11 20:20:31 UTC) #12

Description was changed from

==========
Call out to ICU for case conversion under a new flag


patch from issue 1544023002 at patchset 200001
(http://crrev.com/1544023002#ps200001) by littledan@


V8 has its own implementation of case conversion, in unibrow, which
has a couple bugs. This patch  calls out to Unicode case conversion
from String.prototype.toLowerCase() and String.prototype.toUpperCase()
if the --icu-case-mapping flag is passed. On no-intl builds, the flag
is disabled. A fast-path for some latin1 strings is taken from Blink
in an attempt to avoid performance regressions, and microbenchmarks
show that it is competitive with the old implementation. This new behavior
fixes a number of test262 tests, as well as a Kangax test.

R=yangguo
BUG=v8:4476
LOG=Y
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, intl builds

In non-intl builds, they're still handled by unibrow with a few issues.


V8 has its own implementation of case conversion, in unibrow, which
has a couple bugs. This patch  calls out to Unicode case conversion
from String.prototype.toLowerCase() and String.prototype.toUpperCase()
if the --icu-case-mapping flag is passed. On no-intl builds, the flag
is disabled. A fast-path for some latin1 strings is taken from Blink
in an attempt to avoid performance regressions, and microbenchmarks
show that it is competitive with the old implementation. This new behavior
fixes a number of test262 tests, as well as a Kangax test.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582
    http://unicode.org/cldr/trac/ticket/6921
    http://unicode.org/cldr/trac/ticket/7039

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
==========

jungshik at Google

4 years, 8 months ago (2016-04-11 20:26:45 UTC) #13

Description was changed from

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, intl builds

In non-intl builds, they're still handled by unibrow with a few issues.


V8 has its own implementation of case conversion, in unibrow, which
has a couple bugs. This patch  calls out to Unicode case conversion
from String.prototype.toLowerCase() and String.prototype.toUpperCase()
if the --icu-case-mapping flag is passed. On no-intl builds, the flag
is disabled. A fast-path for some latin1 strings is taken from Blink
in an attempt to avoid performance regressions, and microbenchmarks
show that it is competitive with the old implementation. This new behavior
fixes a number of test262 tests, as well as a Kangax test.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582
    http://unicode.org/cldr/trac/ticket/6921
    http://unicode.org/cldr/trac/ticket/7039

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with I18N enabled passes a bunch of tests
in test262/intl402/ and intl/ that failed before.

In non-intl builds, they're still handled by unibrow with a few test failures.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582
    http://unicode.org/cldr/trac/ticket/6921
    http://unicode.org/cldr/trac/ticket/7039

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
==========

jungshik at Google

4 years, 8 months ago (2016-04-11 21:05:02 UTC) #14

Description was changed from

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with I18N enabled passes a bunch of tests
in test262/intl402/ and intl/ that failed before.

In non-intl builds, they're still handled by unibrow with a few test failures.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582
    http://unicode.org/cldr/trac/ticket/6921
    http://unicode.org/cldr/trac/ticket/7039

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with I18N enabled passes a bunch of tests
in test262/intl402/ and intl/ that failed before.

In non-intl builds, they're still handled by unibrow with a few test failures.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
==========

jungshik at Google

On 2016/04/11 17:47:30, jshin (jungshik at google) wrote: > https://codereview.chromium.org/1812673005/diff/300001/src/runtime/runtime-i18n.cc > File src/runtime/runtime-i18n.cc (right): > ...

4 years, 8 months ago (2016-04-12 22:52:34 UTC) #15

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-13 04:56:36 UTC) #16

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/440001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/440001

4 years, 8 months ago (2016-04-13 04:56:41 UTC) #17

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-13 05:11:02 UTC) #18

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_asan_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_asan_rel_ng/builds/34) v8_linux64_asan_rel_ng_triggered on ...

4 years, 8 months ago (2016-04-13 05:11:02 UTC) #19

jungshik at Google

PS #23 failed in a debug build because of heap allocation with DisallowHeapAllocation in effect. ...

4 years, 8 months ago (2016-04-13 17:34:35 UTC) #20

Dan Ehrenberg

https://codereview.chromium.org/1812673005/diff/440001/src/runtime/runtime-i18n.cc File src/runtime/runtime-i18n.cc (right): https://codereview.chromium.org/1812673005/diff/440001/src/runtime/runtime-i18n.cc#newcode784 src/runtime/runtime-i18n.cc:784: DisallowHeapAllocation no_gc; On 2016/04/13 at 17:34:34, jshin (jungshik at ...

4 years, 8 months ago (2016-04-13 17:43:43 UTC) #21

jungshik at Google

4 years, 8 months ago (2016-04-14 00:23:18 UTC) #22

On 2016/04/13 17:43:43, Dan Ehrenberg wrote:
> https://codereview.chromium.org/1812673005/diff/440001/src/runtime/runtime-i1...
> File src/runtime/runtime-i18n.cc (right):
> 
> https://codereview.chromium.org/1812673005/diff/440001/src/runtime/runtime-i1...
> src/runtime/runtime-i18n.cc:784: DisallowHeapAllocation no_gc;
> On 2016/04/13 at 17:34:34, jshin (jungshik at google) wrote:
> > Getting this out of an isolated block to avoid an additional buffer copy [1]
> > does not work (I forgot to try a debug build and thought that I can get away
> > with it) because there's a heap allocation down the road.
> > 
> > I'll see if there's a way around it after going back to the previous patch
> > set.
> > 
> > A question to v8 folks: what's the impact of |DisallowHeapAllocation| on heap
> > allocation in a third_party library (e.g. ICU)? Can bad things happen? BTW,
> > does SmartArrayPointer<uc16> behave differently when it's inside
> > |DisallowHeapAllocation| block?
> > 
> > [1] I also have to make sure that additional buffer copy is indeed the reason
> > for slow down.
> 
> I believe DisallowHeapAllocation's function is to add a DCHECK that fires when
> allocation occurs/may occur from C++ (grep around for
> DCHECK(AllowHeapAllocation::IsAllowed()); ). It won't actually change the
> nature of allocations.

Sorry that I was not clear. I did see DCHECK(!AllowHeapAllocation::IsAllowed())
at the top of GetFlatContent().
My question was whether allocation outside v8, such as in ICU and 'NewArray'
(in v8), is something to worry about in terms of triggering GC, even though ICU
obviously does not know about DisallowHeapAllocation and will not DCHECK for it.
How about NewArray used in SmartArrayPointer for uc16? It seems that it's OK.

Related to that and adamk's earlier comment:  I have a question up in my
daughter/experimental cl at
https://codereview.chromium.org/1875263006

It'd be great if my question there can be answered. Thanks

Dan Ehrenberg

4 years, 8 months ago (2016-04-14 02:30:12 UTC) #23

On 2016/04/14 at 00:23:18, jshin wrote:
> On 2016/04/13 17:43:43, Dan Ehrenberg wrote:
> > https://codereview.chromium.org/1812673005/diff/440001/src/runtime/runtime-i1...
> > File src/runtime/runtime-i18n.cc (right):
> > 
> > https://codereview.chromium.org/1812673005/diff/440001/src/runtime/runtime-i1...
> > src/runtime/runtime-i18n.cc:784: DisallowHeapAllocation no_gc;
> > On 2016/04/13 at 17:34:34, jshin (jungshik at google) wrote:
> > > Getting this out of an isolated block to avoid an additional buffer copy [1]
> > > does not work (I forgot to try a debug build and thought that I can get away
> > > with it) because there's a heap allocation down the road.
> > > 
> > > I'll see if there's a way around it after going back to the previous patch
> > > set.
> > > 
> > > A question to v8 folks: what's the impact of |DisallowHeapAllocation| on heap
> > > allocation in a third_party library (e.g. ICU)? Can bad things happen? BTW,
> > > does SmartArrayPointer<uc16> behave differently when it's inside
> > > |DisallowHeapAllocation| block?
> > > 
> > > [1] I also have to make sure that additional buffer copy is indeed the reason
> > > for slow down.
> > 
> > I believe DisallowHeapAllocation's function is to add a DCHECK that fires when
> > allocation occurs/may occur from C++ (grep around for
> > DCHECK(AllowHeapAllocation::IsAllowed()); ). It won't actually change the
> > nature of allocations.
> 
> Sorry that I was not clear. I did see DCHECK(!AllowHeapAllocation::IsAllowed())
> at the top of GetFlatContent().
> My question was whether allocation outside v8, such as in ICU and 'NewArray'
> (in v8), is something to worry about in terms of triggering GC, even though ICU
> obviously does not know about DisallowHeapAllocation and will not DCHECK for it.
> How about NewArray used in SmartArrayPointer for uc16? It seems that it's OK.

Yes, I don't think ICU would call into V8 to tell whether this flag is set.
> 
> Related to that and adamk's earlier comment:  I have a question up in my
> daughter/experimental cl at
> https://codereview.chromium.org/1875263006
> 
> It'd be great if my question there can be answered. Thanks

jungshik at Google

PS #24 and PS #25 do not have a crash issue that I have with ...

4 years, 8 months ago (2016-04-14 09:05:09 UTC) #24

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-15 09:45:46 UTC) #25

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/540001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/540001

4 years, 8 months ago (2016-04-15 09:45:58 UTC) #26

jungshik at Google

https://codereview.chromium.org/1812673005/diff/540001/src/js/i18n.js File src/js/i18n.js (right): https://codereview.chromium.org/1812673005/diff/540001/src/js/i18n.js#newcode2082 src/js/i18n.js:2082: OverrideFunction(GlobalString.prototype, 'toLowerCase', function() { Dan, can you give me ...

4 years, 8 months ago (2016-04-15 09:51:30 UTC) #27

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-15 10:03:19 UTC) #28

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux_nodcheck_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_nodcheck_rel_ng/builds/207) v8_linux_nodcheck_rel_ng_triggered on ...

4 years, 8 months ago (2016-04-15 10:03:20 UTC) #29

jungshik at Google

On 2016/04/15 10:03:20, commit-bot: I haz the power wrote: > Dry run: Try jobs failed ...

4 years, 8 months ago (2016-04-15 17:50:05 UTC) #30

Dan Ehrenberg

On 2016/04/15 at 17:50:05, jshin wrote: > On 2016/04/15 10:03:20, commit-bot: I haz the power ...

4 years, 8 months ago (2016-04-15 18:02:30 UTC) #31

jungshik at Google

On 2016/04/15 18:02:30, Dan Ehrenberg wrote: > On 2016/04/15 at 17:50:05, jshin wrote: Thanks, Dan, ...

4 years, 8 months ago (2016-04-15 19:13:26 UTC) #32

Dan Ehrenberg

On 2016/04/15 at 19:13:26, jshin wrote: > On 2016/04/15 18:02:30, Dan Ehrenberg wrote: > > ...

4 years, 8 months ago (2016-04-15 20:25:04 UTC) #33

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-18 22:47:48 UTC) #34

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/560001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/560001

4 years, 8 months ago (2016-04-18 22:47:57 UTC) #35

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-18 23:04:09 UTC) #36

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_avx2_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_avx2_rel_ng/builds/333) v8_linux64_avx2_rel_ng_triggered on ...

4 years, 8 months ago (2016-04-18 23:04:10 UTC) #37

jungshik at Google

On 2016/04/15 20:25:04, Dan Ehrenberg wrote: > On 2016/04/15 at 19:13:26, jshin wrote: > https://build.chromium.org/p/tryserver.v8/builders/v8_win_nosnap_shared_rel_ng_triggered/builds/272/steps/Check/logs/string-capitalization ...

4 years, 8 months ago (2016-04-19 00:13:32 UTC) #38

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-19 00:13:47 UTC) #39

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/600001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/600001

4 years, 8 months ago (2016-04-19 00:13:59 UTC) #40

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-19 00:44:31 UTC) #41

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 8 months ago (2016-04-19 00:44:32 UTC) #42

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-19 08:51:00 UTC) #43

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/620001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/620001

4 years, 8 months ago (2016-04-19 08:51:11 UTC) #44

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-19 09:16:03 UTC) #45

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 8 months ago (2016-04-19 09:16:04 UTC) #46

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-20 09:22:54 UTC) #47

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/640001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/640001

4 years, 8 months ago (2016-04-20 09:22:56 UTC) #48

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-20 09:47:17 UTC) #49

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 8 months ago (2016-04-20 09:47:19 UTC) #50

jungshik at Google

4 years, 8 months ago (2016-04-20 17:33:49 UTC) #51

Description was changed from

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with I18N enabled passes a bunch of tests
in test262/intl402/ and intl/ that failed before.

In non-intl builds, they're still handled by unibrow with a few test failures.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with I18N enabled passes a bunch of tests
in test262/intl402/Strings/* and intl/* that failed before.

In non-intl builds, they're still handled by unibrow with a few test failures.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=  test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case
==========
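
The fast-path rule stated in the description can be sketched as below. This is a minimal illustration in plain JavaScript, not the code in the CL: the names isLatin1, mayTryLatin1FastPath and TAILORED_LANGUAGES are hypothetical, and the sketch models only the gating (always tried for to{U,L}Case; tried for toLocale{U,L}Case only when the language is not one of az, el, lt, tr). A real fast path would presumably also have to bail out for Latin-1 characters whose case mapping leaves Latin-1 or changes length, for example U+00B5, U+00FF and U+00DF when uppercasing.

```javascript
// Hypothetical sketch of the fast-path gating described in the CL description.
// None of these names come from the CL itself.
const TAILORED_LANGUAGES = new Set(['az', 'el', 'lt', 'tr']);

// True when every UTF-16 code unit fits in Latin-1 (U+0000..U+00FF).
function isLatin1(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// isLocaleSensitive: false for to{Upper,Lower}Case,
//                    true for toLocale{Upper,Lower}Case.
// language: the language subtag in effect (explicit or default), e.g. 'tr'.
function mayTryLatin1FastPath(s, isLocaleSensitive, language) {
  if (!isLatin1(s)) return false;
  if (!isLocaleSensitive) return true;       // always tried for to{U,L}Case
  return !TAILORED_LANGUAGES.has(language);  // skipped for az, el, lt, tr
}

console.log(mayTryLatin1FastPath('hello', false, 'tr')); // true
console.log(mayTryLatin1FastPath('hello', true, 'tr'));  // false
console.log(mayTryLatin1FastPath('hello', true, 'en'));  // true
console.log(mayTryLatin1FastPath('\u03b1', true, 'en')); // false (not Latin-1)
```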

jungshik at Google

Hi Dan, Adam, and Yang Can you take a look? Putting this behind the flag ...

4 years, 8 months ago (2016-04-20 21:29:22 UTC) #52

Dan Ehrenberg

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js File src/js/i18n.js (right): https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode728 src/js/i18n.js:728: if ((typeof localeID !== 'string' && typeof localeID !== ...

4 years, 8 months ago (2016-04-20 22:01:30 UTC) #53

jungshik at Google

4 years, 8 months ago (2016-04-21 20:39:17 UTC) #54

Thanks a lot for the review. I addressed your comments. 

One remaining question is how to pass a conversion locale between JS and
runtime. Passing a string (out of 4 or 5 if 'und' is included) is more readable
/ maintainable, but I have a concern about wrapping and unwrapping a string. Do
you have

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js
File src/js/i18n.js (right):

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode728
src/js/i18n.js:728: if ((typeof localeID !== 'string' && typeof localeID !==
'object') ||
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> Nit: As long as you're changing this code, could you use IS_STRING(localeID),
> and IS_RECEIVER(localeID) to be more idiomatic (and slightly more correct)?

Done.

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode735
src/js/i18n.js:735: if (typeof localeID === 'string' &&
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> IS_STRING

Done.

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode...
src/js/i18n.js:2000: function getCaseConversionLanguageId(locales) {
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> Generally, the v8 convention is CamelCase with an initial capital letter. Let's
> use that for new code.

Done.

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode...
src/js/i18n.js:2007: } else if (typeof locales === 'string') {
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> IS_STRING

Done.

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode...
src/js/i18n.js:2014: // StringSplit is slwoer than this.
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> slower

Typo fixed. 

> Consider factoring this out into a utility function.

You mean  "var pos ..  ; if (pos != -1) .... " ? Like this? 

function GetLanguagePart(locale) {
  // StringSplit is slower than this.
  var pos = %_Call(StringIndexOf, locale, '-');
  return pos == -1 ? locale :
      %_Call(StringSubstring, locale, 0, pos);
}
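
For readers without a natives-syntax build, the same idea can be tried in plain JavaScript. The sketch below is only an approximation of the helper proposed above: getLanguagePart is a hypothetical stand-in that uses the public String.prototype.indexOf and substring methods instead of the internal %_Call(StringIndexOf, ...) and %_Call(StringSubstring, ...) forms, and it assumes its argument is already a string.

```javascript
// Hypothetical plain-JS stand-in for the GetLanguagePart helper proposed above.
// It uses public String methods rather than V8's internal %_Call natives.
function getLanguagePart(locale) {
  // indexOf + substring avoids building the array that split('-') would create.
  const pos = locale.indexOf('-');
  return pos === -1 ? locale : locale.substring(0, pos);
}

console.log(getLanguagePart('tr-TR'));      // tr
console.log(getLanguagePart('el'));         // el
console.log(getLanguagePart('az-Latn-AZ')); // az
```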

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode...
src/js/i18n.js:2019: var CUSTOM_CASE_LANGUAGES = ['az', 'el', 'lt', 'tr'];
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> Could we somehow query this from ICU? I thought there were more languages that
> had custom case conversion, like French for removing accents over vowels when
> going uppercase.
> 

Some typesetting systems in the past couldn't deal with uppercase with
diacritics, but that's not the case any more. The French Academy is very clear
about what to do about them. (do not drop diacritics). Well, some native French
speakers get confused, too. :-) 

Anyway, what languages are subject to special casings comes from the Unicode
data files, but 
CLDR/ICU does not have that list exposed via API.  

Unicode's SpecialCasing.txt has 'tr', 'az' and 'lt'. 'el' is not there yet.
ECMAScript 402 only lists 'tr', 'az' and 'lt' at the moment, citing
SpecialCasing.txt at unicode.org (I'll file a bug against the spec), but 'el'
certainly needs a tailored case mapping. Blink and Gecko already do that for
CSS text-transform, per strong demand from Greek speakers. (Mozilla's
implementation is more sophisticated than CLDR/ICU's; CLDR/ICU is looking into
revising it.)

> Is there a way to avoid having this parallel array in JS and C++? For example,
> what if we just pass the string into the runtime, and let that interpret it.

I hate parallel arrays, too !! OTOH, I want to avoid passing a string from JS to
C++ unless there's an efficient way to wrap and unwrap it (almost as efficient
as passing an integer).   

StringNormalize does the same (passing an integer instead of 'NFC', 'NFKC', 'NFD',
'NFKD'). If there's an efficient way to pass one of a small list of strings
between JS and runtime, I'd for sure use that.
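
To make the integer-passing idea concrete, here is a minimal sketch in plain JavaScript. It assumes the array index doubles as the ID handed to the runtime, with -1 standing for the root locale, which matches the conversion_locales lookup on the C++ side. languageToId and idToLocale are hypothetical names, and idToLocale only mimics what the C++ side would do.

```javascript
// The list from i18n.js; the index of each entry is the integer ID.
const CUSTOM_CASE_LANGUAGES = ['az', 'el', 'lt', 'tr'];

// JS side: map a language subtag to its ID, or -1 (root locale) when the
// language has no tailored case mapping.
function languageToId(language) {
  return CUSTOM_CASE_LANGUAGES.indexOf(language);
}

// Stand-in for the runtime side: map the ID back to the locale string that
// would be handed to ICU ('' meaning the root locale).
function idToLocale(id) {
  return id === -1 ? '' : CUSTOM_CASE_LANGUAGES[id];
}
```

The downside is the one already mentioned: the order of this list has to stay in sync with its C++ counterpart.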

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode...
src/js/i18n.js:2023: function localeConvertCase(s, locales, isToUpper) {
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> LocaleConvertCase

Done.

https://codereview.chromium.org/1812673005/diff/640001/src/js/i18n.js#newcode...
src/js/i18n.js:2088: return %StringToLowerCaseI18N(s);
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> ECMA262 seems to specify using the root locale in
> https://tc39.github.io/ecma262/#sec-string.prototype.tolowercase ;

Right.

>  however, the
> implementation seems to pass "" to ICU, which would use the current default
> locale, right?

"" for root locale and NULL for the default locale :-)

@param locale    The locale to consider, or "" for the root locale or NULL for
the default locale.

https://codereview.chromium.org/1812673005/diff/640001/src/runtime/runtime-i1...
File src/runtime/runtime-i18n.cc (right):

https://codereview.chromium.org/1812673005/diff/640001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:788: if (V8_UNLIKELY(locale_id == 1 && is_to_upper))
{
On 2016/04/20 22:01:30, Dan Ehrenberg wrote:
> Maybe make an enum for these ids, like
> 
> enum LocaleID {
>   ROOT = -1,
>   AZERI = 0,
>   GREEK = 1,
>   ...
> };
> 
> static const char* conversion_locales[] = {
>   ...,
>   [GREEK] = "el",
>   ...
> };

Is it ok to use a C99 feature?  I have to add '-Wno-c99-extensions' to gyp file.

This is related to your comment in i18n.js. If a string is used for locale, I
don't have to worry about this. As I replied there, I have a perf concern (maybe
I have to measure).

https://codereview.chromium.org/1812673005/diff/640001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:814: case_conversion_fn fn = is_to_upper ?
u_strToUpper : u_strToLower;
On 2016/04/20 22:01:30, Dan Ehrenberg wrote:
> Nit: Since this is just used locally, maybe a good case for auto?

Thanks. Done

https://codereview.chromium.org/1812673005/diff/640001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:815: const char* locale = locale_id == -1 ? "" :
conversion_locales[locale_id];
On 2016/04/20 22:01:30, Dan Ehrenberg wrote:
> The use of the root locale seems appropriate here for
> String.prototype.toUpperCase. However, if the default locale is different from
> the root locale with respect to case mapping, do we know that it will be
> included in the set of four languages which is included in this code?

Again, "" means root locale and NULL means the default locale. :-)

As for toLocale{U,L}Case without an argument, it uses the default locale. And if
the default locale happens to be one of the four, that's already taken care of
in JS code.

https://codereview.chromium.org/1812673005/diff/640001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:822: // twice (overflow).
On 2016/04/20 22:01:30, Dan Ehrenberg wrote:
> Any way we could include a DCHECK to ensure it doesn't run more than twice?

This is the second time I got that question in a month :-). (the first one was
for a  sqlite CL). This is a pretty common pattern used multiple times in both
Chromium and ICU itself. Anyway, I'm turning the do-while into a for loop
(i = 0; i < 2; ++i).
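
For reference, the two-pass pattern looks roughly like this. The sketch is plain JavaScript rather than the actual C++, and upperInto is a made-up stand-in for an ICU-style converter that reports the length it needs whether or not the output fit.

```javascript
// Hypothetical stand-in for an ICU-style converter: writes into `dest` only
// if the result fits, and always returns the length the full result needs.
function upperInto(src, dest) {
  const out = src.toUpperCase();  // e.g. sharp-s expands to 'SS'
  if (out.length <= dest.length) {
    for (let i = 0; i < out.length; i++) dest[i] = out.charCodeAt(i);
  }
  return out.length;
}

function convertWithRetry(src) {
  let dest = new Uint16Array(src.length);  // first guess: same length as input
  let needed = 0;
  for (let i = 0; i < 2; ++i) {
    needed = upperInto(src, dest);
    if (needed <= dest.length) break;      // it fit, we are done
    dest = new Uint16Array(needed);        // overflow: retry with the exact size
  }
  // Plays the role of a DCHECK: the second pass must always succeed.
  if (needed > dest.length) throw new Error('still overflowing after retry');
  // Drop unused trailing slots when the result is shorter than the buffer.
  return String.fromCharCode(...dest.subarray(0, needed));
}
```

The first pass sizes the buffer optimistically; the second pass runs only on overflow, with the exact length reported by the first.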

https://codereview.chromium.org/1812673005/diff/640001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:847: SeqString::Truncate(result, dest_length));
On 2016/04/20 22:01:29, Dan Ehrenberg wrote:
> Do you have a test which hits this case?

intl/general/case-mapping.js has several (Greek and Turkic drop diacritic
marks). test262/intl402/Strings/* has some, too.

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-22 22:57:10 UTC) #55

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/680001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/680001

4 years, 8 months ago (2016-04-22 22:57:16 UTC) #56

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-22 22:58:45 UTC) #57

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux_arm_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_arm_rel_ng/builds/632) v8_linux_dbg_ng on ...

4 years, 8 months ago (2016-04-22 22:58:47 UTC) #58

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 8 months ago (2016-04-23 00:04:57 UTC) #59

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/700001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/700001

4 years, 8 months ago (2016-04-23 00:05:01 UTC) #60

jungshik at Google

On 2016/04/21 20:39:17, jshin (jungshik at google) wrote: > Thanks a lot for the review. ...

4 years, 8 months ago (2016-04-23 00:12:11 UTC) #61

jungshik at Google

Another question: If this has to be put behind a flag (icu_case_mapping), how can I ...

4 years, 8 months ago (2016-04-23 00:14:32 UTC) #62

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 8 months ago (2016-04-23 00:46:09 UTC) #63

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 8 months ago (2016-04-23 00:46:11 UTC) #64

Dan Ehrenberg

To make loading some code conditional on a runtime flag, follow the pattern of --promise-extras. ...

4 years, 8 months ago (2016-04-25 21:10:11 UTC) #65

jungshik at Google

Thanks, Dan, for the comment and pointers. Micro-benchmark results are up at https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_HGSbrg/edit?usp=sharing ASCII-only lower/uppercasing ...

4 years, 8 months ago (2016-04-26 23:07:34 UTC) #66

Thanks, Dan, for the comment and pointers. 

Micro-benchmark results are up at 
https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...

ASCII-only lower/uppercasing is about the same speed as before except that
lowercasing 'lowercase ascii' is 
60% because it just returns the input. 

Latin-1 only lower/uppercasing (including sharp-s) is 1.6 ~ 2.6 times as fast as
before. 

Full-Unicode lower/uppercasing (non-Latin-1): This CL is better in terms of
correctness at the expense of reduced speed (50~60% of Unibrow). What's curious
is that my CL (using FlatContent) is a bit slower (80~95% of Dan's) for a short
string. As input strings get longer, the difference gets smaller for both
(1) Unibrow vs. mine and (2) Dan's vs. mine.
Apparently, this CL has a higher 'fixed' cost.
Using FlatContent will also help when an input string is a nested cons string.

On 2016/04/25 21:10:11, Dan Ehrenberg wrote:
> To make loading some code conditional on a runtime flag, follow the pattern of
> --promise-extras. See src/bootstrapper.cc for a simple example of how flags
can
> be used to load additional JS files. These files are not compiled into the
> snapshot, so for some performance sensitive code (not sure whether or not this
> counts) the functions can be created in JS code from the main snapshot and
just
> installed on the builtins conditionally at runtime. Be sure to update both
build
> files to make the new JS file available for use.

Thank you for the pointer. It's unfortunate that I need a separate JS file and
somewhat involved steps. I'm afraid that
having a separate JS file (from i18n.js) will lead to some code duplication (for
some utilities I'm using from i18n.js). 

I wonder if we want to treat this as a 'feature'. Isn't it more like a 'bug fix'
(as you 
mentioned before :-))?  If this is a bug fix, do we really want to take all
those steps? 

>
https://codereview.chromium.org/1812673005/diff/700001/test/test262/test262.s...
> File test/test262/test262.status (left):
> 
>
https://codereview.chromium.org/1812673005/diff/700001/test/test262/test262.s...
> test/test262/test262.status:159:
> 'intl402/String/prototype/toLocaleUpperCase/special_casing_Turkish': [FAIL],
> Good to see that all of these tests are actually fixed by your patch! But
let's
> revert this change when the testcfg.py change is reverted. We can do this when
> --icu-case-mapping is staged.
> 
> https://codereview.chromium.org/1812673005/diff/700001/test/test262/testcfg.py
> File test/test262/testcfg.py (right):
> 
>
https://codereview.chromium.org/1812673005/diff/700001/test/test262/testcfg.p...
> test/test262/testcfg.py:134: ["--icu_case_mapping"] +
> We usually don't add flags this way. Instead, we wait for the feature to be
> staged, at which point it'll be turned on that way. The staging patch is the
one
> which should update test262.status, rather than this one.

Thanks. I left it alone for now because I had a reservation about putting this
behind the flag. 

It'd be an easy call if perf was better across the board (instead of 'full
unicode' being slower while more spec-compliant). 

Anyway, what would you say about going forward without a flag? Thanks

Dan Ehrenberg

On 2016/04/26 at 23:07:34, jshin wrote: > Thanks, Dan, for the comment and pointers. > ...

4 years, 8 months ago (2016-04-26 23:39:04 UTC) #67

On 2016/04/26 at 23:07:34, jshin wrote:
> Thanks, Dan, for the comment and pointers. 
> 
> 
> Micro-benchmark results are up at 
>
https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...
> 
> ASCII-only lower/uppercasing is about the same speed as before except that
lowercasing 'lowercase ascii' is 
> 60% because it just returns the input. 
> 
> Latin-1 only lower/uppercasing (including sharp-s) is 1.6 ~ 2.6 times as fast
as before. 
> 
> Full-unicode lower/uppercasing (non-latin 1):  This CL is better in terms of
correctness at the expense of 
> reduced speed (50~60% of Unibrow). What's curious is that my CL (using
FlatContent) is a bit slower (80~95% of Dan's)
> for a short string. As input strings longer, the difference gets smaller
between 1. UniBrow : Mine 2, Dan's : Mine.
> Apparently, this CL has a higher 'fixed' cost 
> Using FlatContent will also help when an input string is nested cons. 

If this is slower for that case, it could be a good idea to look into making use
of Unibrow's caching layer. See the Mapping class in src/unicode.h .
> 
> On 2016/04/25 21:10:11, Dan Ehrenberg wrote:
> > To make loading some code conditional on a runtime flag, follow the pattern
of
> > --promise-extras. See src/bootstrapper.cc for a simple example of how flags
can
> > be used to load additional JS files. These files are not compiled into the
> > snapshot, so for some performance sensitive code (not sure whether or not
this
> > counts) the functions can be created in JS code from the main snapshot and
just
> > installed on the builtins conditionally at runtime. Be sure to update both
build
> > files to make the new JS file available for use.
> 
> 
> Thank you for the pointer. It's unfortunate that I need a separate JS file and
a bit involved steps. I'm afraid that
> having a separate JS file (from i18n.js) will lead to some code duplication
(for some utilities I'm using from i18n.js). 
> 
> I wonder if we want to treat this as a 'feature'. Isn't it more like a 'bug
fix' (as you 
> mentioned before :-))?  If this is a bug fix, do we really want to take all
those steps? 

The question in my mind is, is this a bigger change with significant risk, and
as such, do we want to flag it so that we can back it out if needed? I think the
answer is yes here--this changes user-visible behavior in more than one way, and
also changes performance properties.

I don't think it's that much work, and it's something that you'd have to learn
anyway if you ever implement new features in V8 (for example, ECMA 402 v3
includes a new Intl.getCanonicalLocales() method). Here's an example patch which
uses this technique: https://codereview.chromium.org/1469543003 .
> 
> 
> >
https://codereview.chromium.org/1812673005/diff/700001/test/test262/test262.s...
> > File test/test262/test262.status (left):
> > 
> >
https://codereview.chromium.org/1812673005/diff/700001/test/test262/test262.s...
> > test/test262/test262.status:159:
> > 'intl402/String/prototype/toLocaleUpperCase/special_casing_Turkish': [FAIL],
> > Good to see that all of these tests are actually fixed by your patch! But
let's
> > revert this change when the testcfg.py change is reverted. We can do this
when
> > --icu-case-mapping is staged.
> > 
> >
https://codereview.chromium.org/1812673005/diff/700001/test/test262/testcfg.py
> > File test/test262/testcfg.py (right):
> > 
> >
https://codereview.chromium.org/1812673005/diff/700001/test/test262/testcfg.p...
> > test/test262/testcfg.py:134: ["--icu_case_mapping"] +
> > We usually don't add flags this way. Instead, we wait for the feature to be
> > staged, at which point it'll be turned on that way. The staging patch is the
one
> > which should update test262.status, rather than this one.
> 
> Thanks. I left it alone for now because I had a reservation about putting this
behind the flag. 
> 
> It'd be an easy call if perf was better across the board (instead of 'full
unicode' being slower while more spec-compliant). 
> 
> Anyway, what would you saying about going forward without a flag? Thanks

I'd personally prefer a flag, but maybe others reading this thread have another
opinion.

Yang

yangguo@chromium.org changed reviewers: + yangguo@chromium.org

4 years, 7 months ago (2016-04-27 08:06:43 UTC) #68

Yang

Very nice patch. I have some comments though. https://codereview.chromium.org/1812673005/diff/720001/src/js/i18n.js File src/js/i18n.js (right): https://codereview.chromium.org/1812673005/diff/720001/src/js/i18n.js#newcode2015 src/js/i18n.js:2015: if ...

4 years, 7 months ago (2016-04-27 08:06:45 UTC) #69

jungshik at Google

On 2016/04/26 23:39:04, Dan Ehrenberg wrote: > On 2016/04/26 at 23:07:34, jshin wrote: > > ...

4 years, 7 months ago (2016-04-28 07:43:06 UTC) #70

On 2016/04/26 23:39:04, Dan Ehrenberg wrote:
> On 2016/04/26 at 23:07:34, jshin wrote:
> > Thanks, Dan, for the comment and pointers. 
> > 
> > 
> > Micro-benchmark results are up at 
> >
>
https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...
> > 
> > ASCII-only lower/uppercasing is about the same speed as before except that
> lowercasing 'lowercase ascii' is 
> > 60% because it just returns the input. 
> > 
> > Latin-1 only lower/uppercasing (including sharp-s) is 1.6 ~ 2.6 times as
fast
> as before. 
> > 
> > Full-unicode lower/uppercasing (non-latin 1):  This CL is better in terms of
> correctness at the expense of 
> > reduced speed (50~60% of Unibrow). What's curious is that my CL (using
> FlatContent) is a bit slower (80~95% of Dan's)
> > for a short string. As input strings longer, the difference gets smaller
> between 1. UniBrow : Mine 2, Dan's : Mine.
> > Apparently, this CL has a higher 'fixed' cost 
> > Using FlatContent will also help when an input string is nested cons. 

I re-ran the benchmark with PS #39 and updated the spreadsheet.
There's some improvement. Not using strcmp helped quite a lot (I noticed this
by running the v8 profiler). Using the hardcoded result for Latin-1 lowercasing
helps, too.
And the 'greek sigma-sigma' lowercasing result (where the second sigma is turned
into a final sigma) was completely off.
In this case (where capital sigma => lowercase final sigma is involved), ICU is
faster than Unibrow.

 
> If this is slower for that case, it could be a good idea to look into making
use
> of Unibrow's caching layer. See the Mapping class in src/unicode.h .

I'm not sure I can. The input string is passed to ICU and what happens inside is
up to ICU. 

By fixed cost, I mean that the cost seems to be |a * n + b| where 'b' is a fixed
cost and 'n' is # of input
characters. 
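
To illustrate why a higher fixed cost matters mostly for short strings, here is a small sketch of that model in plain JavaScript. The helper names are hypothetical and any numbers fed into it are illustrative only; nothing here is taken from the spreadsheet.

```javascript
// Fit t = a * n + b from two measurements (n1, t1) and (n2, t2),
// where n is the input length and t the measured time.
function fitLinearCost(n1, t1, n2, t2) {
  const a = (t2 - t1) / (n2 - n1);  // per-character cost
  const b = t1 - a * n1;            // fixed cost
  return { a, b };
}

// Relative speed of implementation x versus y at input length n
// (values below 1 mean x is slower than y).
function speedRatio(x, y, n) {
  return (y.a * n + y.b) / (x.a * n + x.b);
}
```

If two implementations share the same per-character cost 'a' and differ only in 'b', speedRatio approaches 1 as n grows, which would be consistent with the gap shrinking for longer inputs.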

> > 
> > On 2016/04/25 21:10:11, Dan Ehrenberg wrote:
> > > To make loading some code conditional on a runtime flag, follow the
pattern
> of
> > > --promise-extras. See src/bootstrapper.cc for a simple example of how
flags
> can
> > > be used to load additional JS files. These files are not compiled into the
> > > snapshot, so for some performance sensitive code (not sure whether or not
> this
> > > counts) the functions can be created in JS code from the main snapshot and
> just
> > > installed on the builtins conditionally at runtime. Be sure to update both
> build
> > > files to make the new JS file available for use.
> > 
> > 
> > Thank you for the pointer. It's unfortunate that I need a separate JS file
and
> a bit involved steps. I'm afraid that
> > having a separate JS file (from i18n.js) will lead to some code duplication
> (for some utilities I'm using from i18n.js). 
> > 
> > I wonder if we want to treat this as a 'feature'. Isn't it more like a 'bug
> fix' (as you 
> > mentioned before :-))?  If this is a bug fix, do we really want to take all
> those steps? 
> 
> The question in my mind is, is this a bigger change with significant risk, and
> as such, do we want to flag it so that we can back it out if needed? I think
the
> answer is yes here--this changes user-visible behavior in more than one way,
and
> also changes performance properties.
> 
> I don't think it's that much work, and it's something that you'd have to learn
> anyway if you ever implement new features in V8 (for example, ECMA 402 v3
> includes a new Intl.getCanonicalLocales() method). Here's an example patch
which
> uses this technique: https://codereview.chromium.org/1469543003 .

Thank you for the pointer. I'll take a look. 

I have an idea. Let's put to{Lower,Upper}Case implemented with ICU behind the
flag but
make toLocale{Lower,Upper}Case available regardless of the flag. In that case, I
don't have to 
worry about the code duplication between i18n.js and a new JS file. 

After that, I can port FastASCII path from Unibrow (word-by-word
upper/lowercasing for ASCII). 

What do you think of this split approach?

jungshik at Google

Addressed and replied to Yang's comment. Thanks ! https://codereview.chromium.org/1812673005/diff/720001/src/js/i18n.js File src/js/i18n.js (right): https://codereview.chromium.org/1812673005/diff/720001/src/js/i18n.js#newcode2015 src/js/i18n.js:2015: if ...

4 years, 7 months ago (2016-04-28 10:50:10 UTC) #71

Addressed and replied to Yang's comment. Thanks !

https://codereview.chromium.org/1812673005/diff/720001/src/js/i18n.js
File src/js/i18n.js (right):

https://codereview.chromium.org/1812673005/diff/720001/src/js/i18n.js#newcode...
src/js/i18n.js:2015: if (pos != -1)
On 2016/04/27 08:06:45, Yang wrote:
> Can we use V8's coding style where line breaks in an if-statement requires
using
> {}-brackets?

Done.

https://codereview.chromium.org/1812673005/diff/720001/src/js/i18n.js#newcode...
src/js/i18n.js:2021: return isToUpper ? %StringToUpperCaseI18N(s) :
%StringToLowerCaseI18N(s);
On 2016/04/27 08:06:45, Yang wrote:
> ditto.

Done.

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
File src/runtime/runtime-i18n.cc (right):

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:787: String::FlatContent flat = s->GetFlatContent();
On 2016/04/27 08:06:45, Yang wrote:
> I don't see the string being flattened anywhere leading up to this. So the
> string could be a cons string that has not been flattened yet. In which case
the
> DCHECK below would fail.

Thank you for pointing this out. Actually, two of three callers of this function
flatten |s| before calling it, but 
I forgot to flatten it in the 3rd caller. The reason for flattening before
calling this function is that two callers need to use flattened string for
faster path before falling back to this full-unicode path. 

I fixed the 3rd caller (Runtime_StringLocaleConvertCase). 
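
For readers unfamiliar with cons strings, here is a toy model of the problem in plain JavaScript. It is only an analogy for V8's internal representation: cons, flatten and getFlatContent below are hypothetical helpers, with getFlatContent throwing where the real code would hit a DCHECK.

```javascript
// Toy rope: a "cons string" is a pair of pieces that has not been joined yet.
const cons = (left, right) => ({ left, right });
const isFlat = (s) => typeof s === 'string';

// Join all pieces left to right. Iterative, so deeply nested cons strings do
// not overflow the call stack.
function flatten(s) {
  if (isFlat(s)) return s;
  const parts = [];
  const stack = [s];
  while (stack.length > 0) {
    const node = stack.pop();
    if (isFlat(node)) {
      parts.push(node);
    } else {
      stack.push(node.right);  // pushed first so that the left piece pops first
      stack.push(node.left);
    }
  }
  return parts.join('');
}

// Only valid on flat strings; every caller has to flatten first.
function getFlatContent(s) {
  if (!isFlat(s)) throw new Error('string must be flattened first');
  return s;
}
```

Calling getFlatContent(flatten(s)) works for any input, whereas calling getFlatContent on an unflattened cons string fails, which is what the third caller was exposed to.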

> 
> Can you add a test case for this? 

I will. (I tested some cons strings locally, but I guess I didn't try it with
toLocale{L,U}Case, which goes through
the 3rd caller where I forgot to flatten the input.)

> Let's just always
> - Flatten the string
> - Extract the flat content
> - Copy the flat content into a uc16 buffer (managed by smart pointer) via
> CopyChars, regardless of whether it's one-byte or two-byte. This way, we can
> avoid having to call fastCopyFrom, which copies the buffer yet again.

Thanks for the suggestion. 

BTW, fastCopyFrom is cheap if icu::UnicodeString aliases 'buffer'. The first
argument in the UnicodeString ctor is for that (see
http://icu-project.org/apiref/icu4c/classicu_1_1UnicodeString.html#a2bd1d1822...
). Nonetheless,
I found an even better way (the setTo() method). Either way, transliterator-based
conversion is so slow that this
change will make little difference.

> - Construct the UnicodeString from that buffer.

I did a slight variation of your suggestion. 

> 
> Let's not use ToWideCString. There is no other uses of it, and I think we
should
> remove it. It's not particularly performant, using StringCharacterStream.

Thanks again. Now it's gone !

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:800:
isolate->factory()->NewStringFromTwoByte(Vector<const uint16_t>(
On 2016/04/27 08:06:45, Yang wrote:
> This means that any case conversion with "problematic" locales is going to
> produce a two-byte string even if the old string was not. Do we expect this
> memory bloat to cause issues?

That's a good point. If toLocale{L,U}Case(<no argument>) is used when the
default locale is one of the four (or if
they're called with one of the four locales), there'll be an unexpected increase
in memory use even for ASCII input.

I added TODO comments here (and in Runtime_StringLocaleConvertCase) for now.
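
One way the TODO could eventually be addressed is to check whether the converted result still fits in one byte per character before picking the representation. A minimal sketch in plain JavaScript, with typed arrays standing in for V8's one-byte and two-byte strings; both function names are hypothetical.

```javascript
// True if every UTF-16 code unit is in the Latin-1 range (<= 0xFF).
function fitsInOneByte(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// Pick the narrowest backing store that can hold the string.
function toBackingStore(str) {
  const store = fitsInOneByte(str)
      ? new Uint8Array(str.length)
      : new Uint16Array(str.length);
  for (let i = 0; i < str.length; i++) store[i] = str.charCodeAt(i);
  return store;
}
```

The extra scan costs one pass over the result, so whether it pays off would need measuring.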

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:818: String::FlatContent flat = s->GetFlatContent();
On 2016/04/27 08:06:45, Yang wrote:
> We need to make sure s is flattened at this point before trying to get the
flat
> content.

Yup. See the comment above to the same question.

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:820: if (flat.IsTwoByte())
On 2016/04/27 08:06:45, Yang wrote:
> add brackets here.

Done.

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:833: // dest_length == result->length()
On 2016/04/27 08:06:45, Yang wrote:
> Can we make this a DCHECK?

Done.

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:836: // dest_length < result->length()
On 2016/04/27 08:06:45, Yang wrote:
> Also make this a DCHECK.

Done.

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:1023: if (V8_UNLIKELY(!is_result_single_byte))
On 2016/04/27 08:06:45, Yang wrote:
> add brackets please.

Done.

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:1035: if (flat.IsOneByte())
On 2016/04/27 08:06:45, Yang wrote:
> brackets

Done.

Yang

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i18n.cc File src/runtime/runtime-i18n.cc (right): https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i18n.cc#newcode787 src/runtime/runtime-i18n.cc:787: String::FlatContent flat = s->GetFlatContent(); On 2016/04/28 10:50:09, jshin (jungshik ...

4 years, 7 months ago (2016-04-28 12:52:26 UTC) #72

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
File src/runtime/runtime-i18n.cc (right):

https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
src/runtime/runtime-i18n.cc:787: String::FlatContent flat = s->GetFlatContent();
On 2016/04/28 10:50:09, jshin (jungshik at google) wrote:
> On 2016/04/27 08:06:45, Yang wrote:
> > I don't see the string being flattened anywhere leading up to this. So the
> > string could be a cons string that has not been flattened yet. In which case
> the
> > DCHECK below would fail.
> 
> Thank you for pointing this out. Actually, two of three callers of this
function
> flatten |s| before calling it, but 
> I forgot to flatten it in the 3rd caller. The reason for flattening before
> calling this function is that two callers need to use flattened string for
> faster path before falling back to this full-unicode path. 
> 
> I fixed the 3rd caller (Runtime_StringLocaleConvertCase). 
> 
> 
> > 
> > Can you add a test case for this? 
> 
> I'll. (I tested some cons strings locally but I guess I didn't try it with
> toLocale{L,U}Case that goes through
> the 3rd caller where I forgot to flatten the input. 
> 
> 
> > Let's just always
> > - Flatten the string
> > - Extract the flat content
> > - Copy the flat content into a uc16 buffer (managed by smart pointer) via
> > CopyChars, regardless of whether it's one-byte or two-byte. This way, we can
> > avoid having to call fastCopyFrom, which copies the buffer yet again.
> 
> Thanks for the suggestion. 
> 
> BTW, fastCopyFrom is cheap if icu::UnicodeString aliases 'buffer'. The first
> argument  in UnicodeString ctor is for that: See
>
http://icu-project.org/apiref/icu4c/classicu_1_1UnicodeString.html#a2bd1d1822...
> ). Nonetheless, 
> I found an even better way (setTo() method). Either way, transliterator-based
> conversion is so slow that this
> change will make little difference. 
> 
> 
> > - Construct the UnicodeString from that buffer.
> 
> I did a slight variation of your suggestion. 
> 
> > 
> > Let's not use ToWideCString. There is no other uses of it, and I think we
> should
> > remove it. It's not particularly performant, using StringCharacterStream.
> 
> Thanks again. Now it's gone !

I did check the description of fastCopyFrom. In either case, the unicode string
must not alias the original string buffer (which is what would make it fast),
since that would change the original string (src points to the original backing
of the two-byte string). So we might just as well make a copy explicitly and not
rely on ICU to do it correctly implicitly.
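
The aliasing hazard can be shown with a small analogy in plain JavaScript. Typed arrays stand in for the string's backing store here; this is not ICU's API, and the helper names are made up for the sketch.

```javascript
// Helpers to move between strings and UTF-16 code unit buffers.
function toUnits(str) {
  return Uint16Array.from(str, (c) => c.charCodeAt(0));
}
function fromUnits(units) {
  return String.fromCharCode(...units);
}

// In-place ASCII uppercasing over a buffer of code units.
function upperAsciiInPlace(units) {
  for (let i = 0; i < units.length; i++) {
    if (units[i] >= 0x61 && units[i] <= 0x7a) units[i] -= 0x20;
  }
}

// subarray() returns a view on the same buffer (an alias);
// slice() returns an independent copy.
function aliasOf(units) { return units.subarray(0); }
function copyOf(units) { return units.slice(); }
```

Converting through copyOf(src) leaves src untouched, while converting through aliasOf(src) rewrites src itself, which is the behavior an explicit copy rules out.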

jungshik at Google

On 2016/04/28 12:52:26, Yang wrote: > https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i18n.cc > File src/runtime/runtime-i18n.cc (right): > > https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i18n.cc#newcode787 > ...

4 years, 7 months ago (2016-04-28 20:19:33 UTC) #73

On 2016/04/28 12:52:26, Yang wrote:
>
https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
> File src/runtime/runtime-i18n.cc (right):
> 
>
https://codereview.chromium.org/1812673005/diff/720001/src/runtime/runtime-i1...
> src/runtime/runtime-i18n.cc:787: String::FlatContent flat =
s->GetFlatContent();
> On 2016/04/28 10:50:09, jshin (jungshik at google) wrote:
> > On 2016/04/27 08:06:45, Yang wrote:
> > > I don't see the string being flattened anywhere leading up to this. So the
> > > string could be a cons string that has not been flattened yet. In which
case
> > the
> > > DCHECK below would fail.
> > 
> > Thank you for pointing this out. Actually, two of three callers of this
> function
> > flatten |s| before calling it, but 
> > I forgot to flatten it in the 3rd caller. The reason for flattening before
> > calling this function is that two callers need to use flattened string for
> > faster path before falling back to this full-unicode path. 
> > 
> > I fixed the 3rd caller (Runtime_StringLocaleConvertCase). 
> > 
> > 
> > > 
> > > Can you add a test case for this? 
> > 
> > I'll. (I tested some cons strings locally but I guess I didn't try it with
> > toLocale{L,U}Case that goes through
> > the 3rd caller where I forgot to flatten the input. 
> > 
> > 
> > > Let's just always
> > > - Flatten the string
> > > - Extract the flat content
> > > - Copy the flat content into a uc16 buffer (managed by smart pointer) via
> > > CopyChars, regardless of whether it's one-byte or two-byte. This way, we
can
> > > avoid having to call fastCopyFrom, which copies the buffer yet again.
> > 
> > Thanks for the suggestion. 
> > 
> > BTW, fastCopyFrom is cheap if icu::UnicodeString aliases 'buffer'. The first
> > argument  in UnicodeString ctor is for that: See
> >
>
http://icu-project.org/apiref/icu4c/classicu_1_1UnicodeString.html#a2bd1d1822...
> > ). Nonetheless, 
> > I found an even better way (setTo() method). Either way,
transliterator-based
> > conversion is so slow that this
> > change will make little difference. 
> > 
> > 
> > > - Construct the UnicodeString from that buffer.
> > 
> > I did a slight variation of your suggestion. 
> > 
> > > 
> > > Let's not use ToWideCString. There is no other uses of it, and I think we
> > should
> > > remove it. It's not particularly performant, using StringCharacterStream.
> > 
> > Thanks again. Now it's gone !
> 
> I did check the description of fastCopyFrom. In either case, the unicode
string
> must not alias the original string buffer (which is what would make it fast),
> since that would change the original string (src points to the orignal backing
> of the two-byte string). So we just might as well make a copy explicitly and
not
> rely on ICU to do it correctly implicitly.

Thanks. I'll go with it. Doing it explicitly in v8 would  make it easier to read
the code (no more wondering as to
what's gonna happen to the aliased buffer) while not making any practical
difference (copying is
gonna happen anyway when the transliteration is about to be done over the input.
It's just a matter of who does it - v8 or ICU).

jungshik at Google

jshin@chromium.org changed reviewers: + jochen@chromium.org

4 years, 7 months ago (2016-04-29 06:23:15 UTC) #74

jungshik at Google

+jochen for v8.gyp changes. Jochen, could you enlighten me as to why icu-case-mapping.js (that is ...

4 years, 7 months ago (2016-04-29 06:23:17 UTC) #75

+jochen for v8.gyp changes. 
Jochen, could you enlighten me as to why icu-case-mapping.js (that is added to
experimental_library_files ) is not listed when js2c is run ? 
i18n.js (added to library_files the same way) is listed when js2c is run. 

Dan, Yang and Adam: can you take another look? 

Dan: are you fine with just putting to{Lower,Upper}Case behind
--icu_case_mapping while making toLocale{L,U}Case available without the flag?

https://codereview.chromium.org/1812673005/diff/860001/test/intl/general/case...
File test/intl/general/case-mapping.js (right):

https://codereview.chromium.org/1812673005/diff/860001/test/intl/general/case...
test/intl/general/case-mapping.js:5: // Flags: --icu_case_mapping
This does not work. --icu_case_mapping is not used by tools/run-tests.py. 
I have to pass --icu_case_mapping via '--extra-flags' option. 

./tools/run-tests.py --buildbot --extra-flags=--icu_case_mapping \
    --arch-and-mode=x64.Release test262/intl402/String/*

https://codereview.chromium.org/1812673005/diff/860001/test/test262/test262.s...
File test/test262/test262.status (left):

https://codereview.chromium.org/1812673005/diff/860001/test/test262/test262.s...
test/test262/test262.status:159:
'intl402/String/prototype/toLocaleUpperCase/special_casing_Turkish': [FAIL],
toLocale{U,L}Case should still pass without --icu_case_mapping because I didn't
pull them out of i18n.js (that is, they're not hidden behind the flag). 

OTOH, the ICU-based implementation of to{Upper,Lower}Case is hidden behind the
flag. Is there a way to 
mark them as passing when --icu_case_mapping is used?
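
For readers unfamiliar with what special_casing_Turkish exercises, the following is a minimal sketch of the behavior difference, assuming an engine whose toLocale{U,L}Case is backed by ICU (for example, a stock Node.js build). The variable names are illustrative only.

```javascript
// Locale-insensitive mapping: Turkish dotted/dotless i is not considered.
const plainUpper = 'i'.toUpperCase();              // 'I' (U+0049)
const plainLower = 'I'.toLowerCase();              // 'i' (U+0069)

// Turkish mapping, assuming ICU-backed locale-sensitive case conversion.
const turkishUpper = 'i'.toLocaleUpperCase('tr');  // 'İ' (U+0130)
const turkishLower = 'I'.toLocaleLowerCase('tr');  // 'ı' (U+0131)

// Print the code points so that look-alike glyphs are easy to tell apart.
const hex = (s) => 'U+' + s.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
console.log([plainUpper, turkishUpper, plainLower, turkishLower].map(hex));
```

An implementation where toLocale{U,L}Case simply forwards to to{U,L}Case returns U+0049 and U+0069 for the two Turkish calls.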

https://codereview.chromium.org/1812673005/diff/860001/tools/gyp/v8.gyp
File tools/gyp/v8.gyp (right):

https://codereview.chromium.org/1812673005/diff/860001/tools/gyp/v8.gyp#newco...
tools/gyp/v8.gyp:2026: 'experimental_library_files':
['../../src/js/icu-case-mapping.js'],
For some reason, icu-case-mapping.js is never added to the list of js files to
be compiled by 'js2c' for
experimental-libraries.cc and libraries-experimental.bin. 

When I manually built those two files ( cc and bin) using the following commands
and d8 after that, d8 worked as expected. Without --icu_case_mapping, I got the
current
behavior and with --icu_case_mapping, I got a new correct behavior for
to{Lower,Upper}Case. 

python ../tools/js2c.py ../out/Release/gen/experimental-libraries.cc
EXPERIMENTAL js/macros.py messages.h js/harmony-atomics.js
js/harmony-regexp-exec.js js/harmony-sharedarraybuffer.js js/harmony-simd.js
js/harmony-species.js js/harmony-unicode-regexps.js js/harmony-string-padding.js
js/promise-extra.js js/icu-case-mapping.js 

python ../tools/js2c.py ../out/Release/gen/experimental-libraries.cc
EXPERIMENTAL js/macros.py messages.h js/harmony-atomics.js
js/harmony-regexp-exec.js js/harmony-sharedarraybuffer.js js/harmony-simd.js
js/harmony-species.js js/harmony-unicode-regexps.js js/harmony-string-padding.js
js/promise-extra.js js/icu-case-mapping.js --startup_blob
../out/Release/gen/libraries-experimental.bin --nojs

I also wanted to try GN build, but it turned out that GN does not yet support a
standalone v8 build.

jungshik at Google

https://codereview.chromium.org/1812673005/diff/860001/src/flag-definitions.h File src/flag-definitions.h (right): https://codereview.chromium.org/1812673005/diff/860001/src/flag-definitions.h#newcode184 src/flag-definitions.h:184: DEFINE_NEG_IMPLICATION(es_staging, icu_case_mapping) I noticed that https://codereview.chromium.org/1513873002/ has a comment ...

4 years, 7 months ago (2016-04-29 06:55:32 UTC) #76

jochen (gone - plz use gerrit)

https://codereview.chromium.org/1812673005/diff/860001/tools/gyp/v8.gyp File tools/gyp/v8.gyp (right): https://codereview.chromium.org/1812673005/diff/860001/tools/gyp/v8.gyp#newcode1971 tools/gyp/v8.gyp:1971: }, { this file doesn't exist anymore, it's now ...

4 years, 7 months ago (2016-04-29 06:58:09 UTC) #77

jungshik at Google

4 years, 7 months ago (2016-04-29 07:23:30 UTC) #78

Thank you for the reply. 

On 2016/04/29 06:58:09, jochen wrote:
> https://codereview.chromium.org/1812673005/diff/860001/tools/gyp/v8.gyp
> File tools/gyp/v8.gyp (right):
> 
>
https://codereview.chromium.org/1812673005/diff/860001/tools/gyp/v8.gyp#newco...
> tools/gyp/v8.gyp:1971: }, {
> this file doesn't exist anymore, it's now in src/

It must be fairly recent. (I rebased earlier this week, IIRC). I'll rebase it
again. 

> 
>
https://codereview.chromium.org/1812673005/diff/860001/tools/gyp/v8.gyp#newco...
> tools/gyp/v8.gyp:2026: 'experimental_library_files':
> ['../../src/js/icu-case-mapping.js'],
> On 2016/04/29 at 06:23:17, jshin (jungshik at google) wrote:
> > For some reason, icu-case-mapping.js is never added to the list of js files
to
> be compiled by 'js2c' for
> > experimental-libraries.cc and libraries-experimental.bin. 
> > 
> > When I manually built those two files ( cc and bin) using the following
> commands and d8 after that, d8 worked as expected. Without --icu_case_mapping,
I
> got the current
> > behavior and with --icu_case_mapping, I got a new correct behavior for
> to{Lower,Upper}Case. 
> > 
> > python ../tools/js2c.py ../out/Release/gen/experimental-libraries.cc
> EXPERIMENTAL js/macros.py messages.h js/harmony-atomics.js
> js/harmony-regexp-exec.js js/harmony-sharedarraybuffer.js js/harmony-simd.js
> js/harmony-species.js js/harmony-unicode-regexps.js
js/harmony-string-padding.js
> js/promise-extra.js js/icu-case-mapping.js 
> > 
> > python ../tools/js2c.py ../out/Release/gen/experimental-libraries.cc
> EXPERIMENTAL js/macros.py messages.h js/harmony-atomics.js
> js/harmony-regexp-exec.js js/harmony-sharedarraybuffer.js js/harmony-simd.js
> js/harmony-species.js js/harmony-unicode-regexps.js
js/harmony-string-padding.js
> js/promise-extra.js js/icu-case-mapping.js --startup_blob
> ../out/Release/gen/libraries-experimental.bin --nojs
> > 
> > I also wanted to try GN build, but it turned out that GN does not yet
support
> a standalone v8 build.
> 
> gyp is a bit peculiar here. I'd recommend to keep  i18n_library_files and add
> i18n_experimental_library_files

That's exactly what I actually did first (didn't upload that version here), but
it didn't work either.  After scratching head for a while, 
 I realized that v8.gyp is taking a rather winding road with i18n_library_files
when just appending to library_files can work (and much simpler). So, I tried it
for both 
library and experimental_library hoping that it would work for both cases, but
it only works with library_files. 

Anyway, I'll try it once more.

Yang

https://codereview.chromium.org/1812673005/diff/860001/src/bootstrapper.cc File src/bootstrapper.cc (right): https://codereview.chromium.org/1812673005/diff/860001/src/bootstrapper.cc#newcode203 src/bootstrapper.cc:203: DECLARE_FEATURE_INITIALIZATION(icu_case_mapping, "") Is this necessary? We declare it here, ...

4 years, 7 months ago (2016-04-29 07:38:47 UTC) #79

jungshik at Google

On 2016/04/29 07:23:30, jshin (jungshik at google) wrote: > On 2016/04/29 06:58:09, jochen wrote: > ...

4 years, 7 months ago (2016-04-29 08:42:40 UTC) #80

jungshik at Google

Yang, thanks for taking a look. I addressed your comments. https://codereview.chromium.org/1812673005/diff/860001/src/bootstrapper.cc File src/bootstrapper.cc (right): https://codereview.chromium.org/1812673005/diff/860001/src/bootstrapper.cc#newcode203 ...

4 years, 7 months ago (2016-04-29 18:03:23 UTC) #81

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 7 months ago (2016-04-29 19:54:02 UTC) #82

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/940001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/940001

4 years, 7 months ago (2016-04-29 19:54:11 UTC) #83

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 7 months ago (2016-04-29 19:59:01 UTC) #84

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_win64_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win64_rel_ng/builds/6634) v8_win_rel_ng on ...

4 years, 7 months ago (2016-04-29 19:59:03 UTC) #85

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 7 months ago (2016-04-29 21:25:16 UTC) #86

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/960001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/960001

4 years, 7 months ago (2016-04-29 21:25:27 UTC) #87

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 7 months ago (2016-04-29 21:30:16 UTC) #88

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_win_compile_dbg on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_compile_dbg/builds/16811) v8_win_rel_ng on ...

4 years, 7 months ago (2016-04-29 21:30:18 UTC) #89

jungshik at Google

4 years, 7 months ago (2016-04-29 23:41:43 UTC) #90

https://codereview.chromium.org/1812673005/diff/1000001/src/runtime/runtime-i...
File src/runtime/runtime-i18n.cc (right):

https://codereview.chromium.org/1812673005/diff/1000001/src/runtime/runtime-i...
src/runtime/runtime-i18n.cc:802: converted.setTo(false, src, src_length);
Yang, 

https://docs.google.com/spreadsheets/d/19FCu-uk48FT7NCvMxRWgsnrCc06iON79Uni4a...
compares various combinations of aliasing and copying when obtaining |src|
(GetUCharBufferFromFlat) and setting up the input UnicodeString with setTo(). 

Using read-alias for two-byte string when obtaining |src| and read-alias for
setTo minimizes # of copy operations. 

One outstanding question is whether the buffer pointed to by
|flat.ToUC16Vector()| will continue to be reliably available after |flat| goes
out of scope even though a flattened string (from which |flat| was obtained)
remains in the scope (and not gc'd away).

Adam thinks that it can still cause a problem (I asked him that question a week
or so ago) and I want to play safe. Therefore, I changed the CL to avoid that. 
I have since discovered that the implementation and usage pattern of
GetCharVector() ( https://goo.gl/EsgZXq ) kinda indicates otherwise. The way the
return value of GetCharVector() is used is almost identical to this one except
that in my case, I need to access the buffer even after DisallowHeapAllocation
is NOT in force any more. 

Anyway, if the way I use the buffer obtained via GetFlatContent() is safe [1],
the current way (using read-alias for both steps) is the best (although the
difference would be rather small relatively because Transliterator takes so
long). 

If not, either '1. alias 2. upfront copy' or '1. copy 2. alias' should be used.
'1. copy 2. alias' is what you suggested.  I prefer to use '1. alias 2. upfront
copy' (PS #47 ~ #49) because I'm using GetUCharBufferFromFlat() in non-Greek
code-path below (which is much more common) where '1. copy' is wasteful.

Well, I shouldn't tinker with this too much because 1) ICU 58 in 5 months  will
support el-Upper with a regular case conversion API (no more need for
special-casing) 2) the cost of transliteration is so expensive that saving one
copy does not help much. 

[1] I have yet to find a test that will break for TwoByteString path. When |sap|
is inside DisallowHeapAllocation block, accessing |converted| with an unmodified
buffer, I definitely get a crash for a OneByte input in a debug build.

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 7 months ago (2016-04-30 00:04:20 UTC) #91

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/1000001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/1000001

4 years, 7 months ago (2016-04-30 00:04:28 UTC) #92

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 7 months ago (2016-04-30 00:26:59 UTC) #93

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 7 months ago (2016-04-30 00:27:02 UTC) #94

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 7 months ago (2016-05-02 22:10:15 UTC) #95

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/1020001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/1020001

4 years, 7 months ago (2016-05-02 22:10:22 UTC) #96

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 7 months ago (2016-05-02 22:27:50 UTC) #97

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_avx2_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_avx2_rel_ng/builds/1042) v8_linux64_avx2_rel_ng_triggered on ...

4 years, 7 months ago (2016-05-02 22:27:52 UTC) #98

Dan Ehrenberg

https://codereview.chromium.org/1812673005/diff/860001/src/bootstrapper.cc File src/bootstrapper.cc (right): https://codereview.chromium.org/1812673005/diff/860001/src/bootstrapper.cc#newcode203 src/bootstrapper.cc:203: DECLARE_FEATURE_INITIALIZATION(icu_case_mapping, "") On 2016/04/29 at 18:03:23, jshin (jungshik at ...

4 years, 7 months ago (2016-05-03 17:58:17 UTC) #99

Dan Ehrenberg

As we've discussed off-line, the main TODO item is to move all behavior changes from ...

4 years, 7 months ago (2016-05-03 21:24:10 UTC) #100

jungshik at Google

Thank you, Dan for the review and help. Now both to{L,U}Case and toLocale{L,U}Case are behind ...

4 years, 7 months ago (2016-05-03 22:16:00 UTC) #101

Dan Ehrenberg

https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mapping.js File src/js/icu-case-mapping.js (right): https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mapping.js#newcode13 src/js/icu-case-mapping.js:13: var LocaleConvertCase = utils.ImportNow("LocaleConvertCase"); Rather than using ImportNow (generally ...

4 years, 7 months ago (2016-05-04 00:13:08 UTC) #102

jungshik at Google

https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mapping.js File src/js/icu-case-mapping.js (right): https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mapping.js#newcode13 src/js/icu-case-mapping.js:13: var LocaleConvertCase = utils.ImportNow("LocaleConvertCase"); On 2016/05/04 00:13:08, Dan Ehrenberg ...

4 years, 7 months ago (2016-05-04 19:02:58 UTC) #103

Dan Ehrenberg

https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mapping.js File src/js/icu-case-mapping.js (right): https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mapping.js#newcode19 src/js/icu-case-mapping.js:19: OverrideFunction(GlobalString.prototype, 'toLowerCase', function() { On 2016/05/04 at 19:02:58, jshin ...

4 years, 7 months ago (2016-05-04 19:06:37 UTC) #105

jungshik at Google

4 years, 7 months ago (2016-05-04 22:03:22 UTC) #106

On 2016/05/04 19:06:37, Dan Ehrenberg wrote:
>
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> File src/js/icu-case-mapping.js (right):
> 
>
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> src/js/icu-case-mapping.js:19: OverrideFunction(GlobalString.prototype,
> 'toLowerCase', function() {
> On 2016/05/04 at 19:02:58, jshin (jungshik at google) wrote:
> > offline, you're worried about actually defining functions in this file
instead
> of just installing them. 
> > 
> > Would this be better, then?
> > 
> > OverrideFunction(GlobalString.prototype, 'toLowerCase', toLowerCaseICU);
> > 
> > where |toLowerCaseICU| is defined in i18n.js.
> > 
> > I found an example in harmony-regexp-exec.js exactly doing that. I'll make a
> change.
> 
> That's what I was suggesting, though if you found that performance is already
> good in this configuration, then that might not be necessary. Note that in
this
> case, you'll need to use ImportNow. Eventually, the flag will be switched to
be
> flipped on all the time and this file will be merged into i18n.js, so this is
> just about performance during the transition.

Thanks. I was able to change the way to{L,U}Case are overridden (And, I found out
that I need to use ImportNow instead of Import by trial/error because I hadn't
read your reply while trying :-) ).

However, my attempt to move toLocale{L,U}Case was not successful. I don't know
how to deal with 'arguments[0]' in the function body. Various attempts led to
all sorts of funny crashes. 

PS 54 does not have any of this, yet.  Because to{L,U}Case are more critical
than their Locale counterparts (toLocale{L,U}Case was broken completely anyway
in the ToT), I'll just change the way to{L,U}Case are overridden (while leaving
alone toLocale{L,U}Case) hoping that it may squeeze out a bit more perf
improvement (I don't know if there's any significant difference. I'm gonna
measure it).
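
The "define in one file, install from another" arrangement discussed here can be sketched in plain JavaScript. This is only an illustration: `overrideFunction`, `fakeStringPrototype`, `installCaseMapping` and the `icuCaseMapping` option are hypothetical stand-ins, not V8's internal OverrideFunction, natives, or flag plumbing.

```javascript
// Hypothetical stand-in for an internal "override" helper: replace a method
// with a non-enumerable, writable, configurable data property.
function overrideFunction(target, name, fn) {
  Object.defineProperty(target, name, {
    value: fn,
    writable: true,
    enumerable: false,
    configurable: true,
  });
}

// Stand-in for String.prototype, so that no real built-in is modified.
const fakeStringPrototype = {
  toLowerCase() { return 'default implementation'; },
};

// Stand-in for the function that would be defined in i18n.js.
function toLowerCaseICU() { return 'ICU implementation'; }

// Stand-in for icu-case-mapping.js: it only installs, and only behind a flag.
function installCaseMapping(options) {
  if (options.icuCaseMapping) {
    overrideFunction(fakeStringPrototype, 'toLowerCase', toLowerCaseICU);
  }
}

const before = fakeStringPrototype.toLowerCase();
installCaseMapping({ icuCaseMapping: true });
const after = fakeStringPrototype.toLowerCase();

console.log(before, '->', after);
```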

Dan Ehrenberg

4 years, 7 months ago (2016-05-04 22:27:17 UTC) #107

On 2016/05/04 at 22:03:22, jshin wrote:
> On 2016/05/04 19:06:37, Dan Ehrenberg wrote:
> >
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> > File src/js/icu-case-mapping.js (right):
> > 
> >
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> > src/js/icu-case-mapping.js:19: OverrideFunction(GlobalString.prototype,
> > 'toLowerCase', function() {
> > On 2016/05/04 at 19:02:58, jshin (jungshik at google) wrote:
> > > offline, you're worried about actually defining functions in this file
instead
> > of just installing them. 
> > > 
> > > Would this be better, then?
> > > 
> > > OverrideFunction(GlobalString.prototype, 'toLowerCase', toLowerCaseICU);
> > > 
> > > where |toLowerCaseICU| is defined in i18n.js.
> > > 
> > > I found an example in harmony-regexp-exec.js exactly doing that. I'll make
a
> > change.
> > 
> > That's what I was suggesting, though if you found that performance is
already
> > good in this configuration, then that might not be necessary. Note that in
this
> > case, you'll need to use ImportNow. Eventually, the flag will be switched to
be
> > flipped on all the time and this file will be merged into i18n.js, so this
is
> > just about performance during the transition.
> 
> Thanks. I was able to change the way to{L,U}Case are overridden (And, I found
out that I need to use ImportNow instead of Import by trial/error because I
hadn't read your reply while trying :-) ).  
> 
> However, my attempt to move toLocale{L,U}Case was not successful. I don't know
how to deal with 'arguments[0]' in the function body. Various attempts led to
all sort of funny crashes. 
> 
> PS 54 does not have any of this, yet.  Because to{L,U}Case are more critical
than their Locale counterparts (toLocale{L,U}Case was broken completely anyway
in the ToT), I'll just change the way to{L,U}Case are overridden (while leaving
alone toLocale{L,U}Case) hoping that it may squeeze out a bit more perf
improvement (I don't know if there's any significant difference. I'm gonna
measure it).

The arguments[0] trick was just a funny way to avoid having to set the function
length explicitly. You can just take the arguments as normal function parameters
and later call %FunctionSetLength to set it to 0.
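
A short illustration of why the arguments[0] trick affects the function's length. `%FunctionSetLength` is a V8-internal native and is not available to ordinary scripts, so the last step below uses `Object.defineProperty` as a user-land analogue; the function names are made up for the example.

```javascript
// No formal parameters are declared, so .length is 0, yet the first argument
// is still readable through the arguments object.
function viaArguments() {
  const locales = arguments[0];
  return locales;
}

// One formal parameter is declared, so .length is 1 by default.
function viaParameter(locales) {
  return locales;
}

const lengthViaArguments = viaArguments.length;   // 0
const lengthViaParameter = viaParameter.length;   // 1

// User-land analogue of resetting the length afterwards: a function's
// "length" property is configurable, so it can be redefined.
Object.defineProperty(viaParameter, 'length', { value: 0 });
const lengthAfterReset = viaParameter.length;     // 0

console.log(lengthViaArguments, lengthViaParameter, lengthAfterReset);
```

Both styles still receive the caller's argument; only the reported arity differs.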

jungshik at Google

4 years, 7 months ago (2016-05-05 00:33:30 UTC) #108

On 2016/05/04 22:27:17, Dan Ehrenberg wrote:
> On 2016/05/04 at 22:03:22, jshin wrote:
> > On 2016/05/04 19:06:37, Dan Ehrenberg wrote:
> > >
>
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> > > File src/js/icu-case-mapping.js (right):
> > > 
> > >
>
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> > > src/js/icu-case-mapping.js:19: OverrideFunction(GlobalString.prototype,
> > > 'toLowerCase', function() {
> > > On 2016/05/04 at 19:02:58, jshin (jungshik at google) wrote:
> > > > offline, you're worried about actually defining functions in this file
> instead
> > > of just installing them. 
> > > > 
> > > > Would this be better, then?
> > > > 
> > > > OverrideFunction(GlobalString.prototype, 'toLowerCase', toLowerCaseICU);
> > > > 
> > > > where |toLowerCaseICU| is defined in i18n.js.
> > > > 
> > > > I found an example in harmony-regexp-exec.js exactly doing that. I'll
make
> a
> > > change.
> > > 
> > > That's what I was suggesting, though if you found that performance is
> already
> > > good in this configuration, then that might not be necessary. Note that in
> this
> > > case, you'll need to use ImportNow. Eventually, the flag will be switched
to
> be
> > > flipped on all the time and this file will be merged into i18n.js, so this
> is
> > > just about performance during the transition.
> > 
> > Thanks. I was able to change the way to{L,U}Case are overridden (And, I found
> out that I need to use ImportNow instead of Import by trial/error because I
> hadn't read your reply while trying :-) ).  
> > 
> > However, my attempt to move toLocale{L,U}Case was not successful. I don't
know
> how to deal with 'arguments[0]' in the function body. Various attempts led to
> all sort of funny crashes. 
> > 
> > PS 54 does not have any of this, yet.  Because to{L,U}Case are more critical
> than their Locale counterparts (toLocale{L,U}Case was broken completely anyway
> in the ToT), I'll just change the way to{L,U}Case are overridden (while leaving
> alone toLocale{L,U}Case) hoping that it may squeeze out a bit more perf
> improvement (I don't know if there's any significant difference. I'm gonna
> measure it).
> 
> The arguments[0] trick was just a funny way to avoid having to set the
function
> length explicitly. You can just take the arguments as normal function
parameters
> and later call %FunctionSetLength to set it to 0.

Thanks. I've tried it and it still does not work. 

More importantly, even PS 55 (that only changed the way to{L,U}Case are
overridden) has an issue. It fails the following test and other similar tests:

=== test262/built-ins/String/prototype/toLowerCase/S15.5.4.16_A6 ===           
/usr/local/google/home/jungshik/v8/v8/test/test262/data/harness/sta.js:18:
Test262Error: #1: 
String.prototype.toLowerCase.prototype === undefined. Actual: [object Object]
    throw new Test262Error(message);
    ^
|String.prototype.toLowerCase.prototype| is supposed to be undefined, but it's
not the case. 

BTW, PS 53 (function body in icu-case-mapping.js) vs PS 55 (function body in
i18n.js): in most cases, PS 55 is 'faster' than PS 53 (by a constant
time no matter how long a test is), but it's all within a standard deviation.
So, it's not statistically significant. 

https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...
has more details (PS 53 v PS55 tab).
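
The S15.5.4.16_A6 failure quoted above comes down to which kinds of function objects carry an own `prototype` property. A minimal sketch in plain JavaScript (the helper `hasOwnPrototype` and the sample functions are illustrative; how V8's natives strip the property internally is not shown here):

```javascript
const hasOwnPrototype = (fn) =>
  Object.prototype.hasOwnProperty.call(fn, 'prototype');

// An ordinary function declaration is a constructor and gets a .prototype object.
function declared() {}

// Method definitions and arrow functions are not constructors and have none.
const shorthand = { method() {} }.method;
const arrow = () => {};

const results = {
  builtin: hasOwnPrototype(String.prototype.toLowerCase),  // false
  declared: hasOwnPrototype(declared),                      // true
  shorthand: hasOwnPrototype(shorthand),                    // false
  arrow: hasOwnPrototype(arrow),                            // false
};

console.log(results);
```

So a replacement written as an ordinary function differs observably from the built-in it replaces unless its `prototype` property is removed.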

jungshik at Google

4 years, 7 months ago (2016-05-05 17:54:34 UTC) #109

On 2016/05/05 00:33:30, jshin (jungshik at google) wrote:
> On 2016/05/04 22:27:17, Dan Ehrenberg wrote:
> > On 2016/05/04 at 22:03:22, jshin wrote:
> > > On 2016/05/04 19:06:37, Dan Ehrenberg wrote:
> > > >
> >
>
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> > > > File src/js/icu-case-mapping.js (right):
> > > > 
> > > >
> >
>
https://codereview.chromium.org/1812673005/diff/1040001/src/js/icu-case-mappi...
> > > > src/js/icu-case-mapping.js:19: OverrideFunction(GlobalString.prototype,
> > > > 'toLowerCase', function() {
> > > > On 2016/05/04 at 19:02:58, jshin (jungshik at google) wrote:
> > > > > offline, you're worried about actually defining functions in this file
> > instead
> > > > of just installing them. 
> > > > > 
> > > > > Would this be better, then?
> > > > > 
> > > > > OverrideFunction(GlobalString.prototype, 'toLowerCase',
toLowerCaseICU);
> > > > > 
> > > > > where |toLowerCaseICU| is defined in i18n.js.
> > > > > 
> > > > > I found an example in harmony-regexp-exec.js exactly doing that. I'll
> make
> > a
> > > > change.
> > > > 
> > > > That's what I was suggesting, though if you found that performance is
> > already
> > > > good in this configuration, then that might not be necessary. Note that
in
> > this
> > > > case, you'll need to use ImportNow. Eventually, the flag will be
switched
> to
> > be
> > > > flipped on all the time and this file will be merged into i18n.js, so
this
> > is
> > > > just about performance during the transition.

> > > However, my attempt to move toLocale{L,U}Case was not successful. I don't
> know
> > how to deal with 'arguments[0]' in the function body. Various attempts led
to
> > all sort of funny crashes. 
...
> > The arguments[0] trick was just a funny way to avoid having to set the
> function
> > length explicitly. You can just take the arguments as normal function
> parameters
> > and later call %FunctionSetLength to set it to 0.
> 
> Thanks. I've tried it and it still does not work. 
> 
> More importantly, even PS 55 (that only changed the way to{L,U}Case are
> overridden) has an issue. It fails the following test and other similar tests:
> 
> === test262/built-ins/String/prototype/toLowerCase/S15.5.4.16_A6 ===          

> /usr/local/google/home/jungshik/v8/v8/test/test262/data/harness/sta.js:18:
> Test262Error: #1: 
> String.prototype.toLowerCase.prototype === undefined. Actual: [object Object]
>     throw new Test262Error(message);
>     ^
> |String.prototype.toLowerCase.prototype| is supposed to be undefined, but it's
> not the case. 

There's a typo in my previous attempt to override toLocale*Case in a more
efficient way. 
With that fixed (PS 56), all the case-conversion-related tests pass with
--icu_case_mapping passed.

However, the above test (S15.5.4.16_A6) still fails for all 4 functions
(to{U,L}Case and toLocale{U,L}Case). 

Dan, can you take a look at PS 56 and see if there's anything I can do  to make
the test happy ( 
|String.prototype.toLowerCase.prototype|  should be undefined to pass the test)
?  

Otherwise, I'll just go back to PS 54 with a less efficient override of those 4
functions. 
As shown in the column M and N of 'PS 53 vs PS 55' tab of the following
spreadsheet, the difference between
PS 53 and PS 55 is within one sigma (even though PS 55 is better in most cases;
perhaps pooling all
the test results together also shows that PS 55 is faster than PS 53 by ~20 ms
for 10^7 operations) and is 
insignificant. 


https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...

Thanks

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 7 months ago (2016-05-05 19:44:57 UTC) #110

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/1120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/1120001

4 years, 7 months ago (2016-05-05 19:45:08 UTC) #111

jungshik at Google

On 2016/05/05 17:54:34, jshin (jungshik at google) wrote: > > /usr/local/google/home/jungshik/v8/v8/test/test262/data/harness/sta.js:18: > > Test262Error: #1: ...

4 years, 7 months ago (2016-05-05 19:47:21 UTC) #112

jungshik at Google

4 years, 7 months ago (2016-05-05 19:53:10 UTC) #113

Description was changed from

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  I18N is enabled.

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adapted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with I18N enabled passes a bunch of tests
in test262/intl402/Strings/* and intl/* that failed before.

In non-intl builds, they're still handled by unibrow with a few test failures.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=  test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
--icu_case_mapping flag is turned on. To control the override by the flag,
they're overridden in icu-case-mapping.js 

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adapted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with --icu_case_mapping=true passes a bunch of tests
in test262/intl402/Strings/* and intl/* that failed before.

Handling of pure ASCII strings (aligned at word boundary) is not as fast as
Unibrow's implementation that uses word-by-word case conversion. OTOH, Latin-1
input handling is faster than Unibrow. General Unicode input handling is slower
but more accurate.


This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=  test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case
==========
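
The Latin-1 fast path mentioned in the description can be illustrated with a rough JavaScript sketch. This is not the Blink or V8 code; it covers only lower-casing (upper-casing has cases such as U+00DF and U+00FF whose results do not stay single Latin-1 code units, so it needs extra care), and all function names are hypothetical.

```javascript
// True when every UTF-16 code unit fits in Latin-1 (U+0000..U+00FF).
function isLatin1(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// Lower-cases one Latin-1 code unit: A-Z and U+00C0..U+00DE (except the
// multiplication sign U+00D7) map to the code unit 0x20 higher.
function latin1UnitToLower(c) {
  const isUpperAscii = c >= 0x41 && c <= 0x5a;
  const isUpperLatin1 = c >= 0xc0 && c <= 0xde && c !== 0xd7;
  return isUpperAscii || isUpperLatin1 ? c + 0x20 : c;
}

// Fast path for Latin-1 input, general path (here: the built-in) otherwise.
function toLowerCaseSketch(s) {
  if (!isLatin1(s)) return s.toLowerCase();
  let out = '';
  for (let i = 0; i < s.length; i++) {
    out += String.fromCharCode(latin1UnitToLower(s.charCodeAt(i)));
  }
  return out;
}

console.log(toLowerCaseSketch('CAFÉ × ÞORN'));
```

The fast path never leaves the Latin-1 range, which is why it can avoid the general-purpose conversion machinery for such inputs.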

jungshik at Google

4 years, 7 months ago (2016-05-05 19:55:11 UTC) #114

Description was changed from

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
--icu_case_mapping flag is turned on. To control the override by the flag,
they're overriden in icu-case-mapping.js 

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr), Greek
(el)
and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with --icu_case_mapping=true passes a bunch of tests
in test262/intl402/Strings/* and intl/* that failed before.

Handling of pure ASCII strings (aligned at word boundary) are not as fast as
Unibrow's implementation that uses word-by-word case conversion. OTOH, Latin-1
input handling is
faster than Unibrow. General Unicode input handling is slower but more accurate.


This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but
has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=  test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  --icu_case_mapping flag is turned on. To control the override by the flag,
  they're overriden in icu-case-mapping.js

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr),
Greek (el) and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with --icu_case_mapping=true passes a bunch
of tests in test262/intl402/Strings/* and intl/* that failed before.

Handling of pure ASCII strings (aligned at word boundary) are not as fast
as Unibrow's implementation that uses word-by-word case conversion. OTOH,
Latin-1 input handling is faster than Unibrow. General Unicode input
handling is slower but more accurate.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@,
but has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case,
     intl/general/case*
==========

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 7 months ago (2016-05-05 20:16:00 UTC) #115

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 7 months ago (2016-05-05 20:16:03 UTC) #116

Dan Ehrenberg

littledan@chromium.org changed reviewers: + machenbach@chromium.org

4 years, 7 months ago (2016-05-05 23:36:54 UTC) #117

Dan Ehrenberg

lgtm +machenbach for build file changes. This patch looks good to me, but I'd hold ...

4 years, 7 months ago (2016-05-05 23:36:57 UTC) #118

jungshik at Google

Thanks, Dan and Michael. Yang, can you take another look? Thanks ! https://codereview.chromium.org/1812673005/diff/1120001/test/intl/testcfg.py File test/intl/testcfg.py ...

4 years, 7 months ago (2016-05-06 21:51:18 UTC) #120

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 7 months ago (2016-05-09 18:45:53 UTC) #121

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/1140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/1140001

4 years, 7 months ago (2016-05-09 18:46:11 UTC) #122

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 7 months ago (2016-05-09 19:16:24 UTC) #123

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 7 months ago (2016-05-09 19:16:26 UTC) #124

jungshik at Google

On 2016/05/05 23:36:57, Dan Ehrenberg wrote: > lgtm > > +machenbach for build file changes. ...

4 years, 7 months ago (2016-05-10 04:42:12 UTC) #125

Yang

https://codereview.chromium.org/1812673005/diff/1140001/src/flag-definitions.h File src/flag-definitions.h (right): https://codereview.chromium.org/1812673005/diff/1140001/src/flag-definitions.h#newcode184 src/flag-definitions.h:184: DEFINE_IMPLICATION(es_staging, icu_case_mapping) Can we instead put this into the ...

4 years, 7 months ago (2016-05-10 10:00:46 UTC) #126

Dan Ehrenberg

https://codereview.chromium.org/1812673005/diff/1160001/src/flag-definitions.h File src/flag-definitions.h (right): https://codereview.chromium.org/1812673005/diff/1160001/src/flag-definitions.h#newcode205 src/flag-definitions.h:205: #else Could you define this flag unconditionally, and ignore ...

4 years, 7 months ago (2016-05-10 19:29:37 UTC) #127

jungshik at Google

4 years, 7 months ago (2016-05-10 20:15:18 UTC) #128

Thank you, Yang, for taking a look. 

The latest CL addresses Yang's comments.

In addition, Dan asked me to put '--icu_case_mapping' in the HARMONY_INPROGRESS
group of flags (instead of HARMONY_STAGED). As a result, the empty global
initialization function definition for --icu_case_mapping (removed per Yang the
other day) had to be added back, because the HARMONY_INPROGRESS macro adds a
call to an empty global init and a declaration for icu_case_mapping.

https://codereview.chromium.org/1812673005/diff/1140001/src/js/i18n.js
File src/js/i18n.js (right):

https://codereview.chromium.org/1812673005/diff/1140001/src/js/i18n.js#newcod...
src/js/i18n.js:2119: [ToLowerCaseI18N, ToUpperCaseI18N, ToLocaleLowerCaseI18N,
ToLocaleUpperCaseI18N].
On 2016/05/10 10:00:46, Yang wrote:
> Can we avoid using this short hand? I'd rather have %FunctionRemovePrototype
> called verbatim. We are assuming, at this point, that forEach is installed, and
> we are using the correct one. This may all hold at this point, but seems
> unnecessary to me. And we are unnecessarily creating an array literal.

Done.

https://codereview.chromium.org/1812673005/diff/1140001/src/runtime/runtime-i...
File src/runtime/runtime-i18n.cc (right):

https://codereview.chromium.org/1812673005/diff/1140001/src/runtime/runtime-i...
src/runtime/runtime-i18n.cc:806: converted.setTo(false, src, src_length);
On 2016/05/10 10:00:46, Yang wrote:
> Can you explain when the source string is copied? The comment for
> icu::Transliterator::transliterate says "Transliterates an entire string in
> place.", which suggests that the existing string may be overwritten.

Whenever the read-only alias buffer needs to be overwritten/modified,
icu::UnicodeString makes a copy before doing that. To make 100% sure, last week
I tracked down where that happens when a UnicodeString is passed to
transliterate(). Note that transliterate() takes an 'icu::Replaceable' (which
UnicodeString inherits from). transliterate() calls
Replaceable::handleReplaceBetween, whose UnicodeString implementation calls
cloneArrayIfNeeded() to honor 'copy-on-write'.
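
The copy-on-write behavior described above can be illustrated with a toy model
in JavaScript. This is only a sketch: CowBuffer, readOnlyAlias, cloneIfNeeded
and replaceBetween are hypothetical names made up for the illustration, not
ICU's API, and the real UnicodeString does considerably more bookkeeping.

```javascript
// Toy model of a copy-on-write buffer. All names here are hypothetical;
// this is not ICU's API.
class CowBuffer {
  // `units` is an array of UTF-16 code units owned by the caller.
  static readOnlyAlias(units) {
    const b = new CowBuffer();
    b.units = units;   // shared with the caller, nothing is copied yet
    b.aliased = true;
    return b;
  }

  // Copy the shared storage the first time a write is about to happen.
  cloneIfNeeded() {
    if (this.aliased) {
      this.units = this.units.slice();
      this.aliased = false;
    }
  }

  // Every mutation goes through here, so the aliased source is never written.
  replaceBetween(start, limit, replacement) {
    this.cloneIfNeeded();
    this.units.splice(start, limit - start, ...replacement);
  }

  toString() {
    return String.fromCharCode(...this.units);
  }
}

// Helper: split a string into an array of UTF-16 code units.
const toUnits = (s) =>
  Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));

const src = toUnits("αβγ");
const untouched = CowBuffer.readOnlyAlias(src);
const converted = CowBuffer.readOnlyAlias(src);
converted.replaceBetween(0, 3, toUnits("ΑΒΓ"));
```

After the replacement, `converted` owns its own copy while `src` (and any alias
that was never written to) still holds the original code units.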

> I haven't found a test case that checks that the input string has not been
> overwritten.

A few CLs ago, I added the following test cases, where the input is the same as
the output (in test/intl/general/case-mapping.js).
I used to get a crash there when the memory holding the (unmodified) input
became inaccessible. That was fixed.

assertEquals("ΑΒΓΔΕ", "ΑΒΓΔΕ".toLocaleUpperCase("el"));
assertEquals("ΑΒΓΔΕАБ𝐀𝐁", "ΑΒΓΔΕАБ𝐀𝐁".toLocaleUpperCase("el"));
assertEquals("ABCDEÂÓḴ123", "ABCDEÂÓḴ123".toLocaleUpperCase("el"));

Ah, what you want this time is the opposite case. When the output is
different from the input, you want me to verify that the input is NOT changed,
don't you? I added test cases for that, and they pass in both Debug and Release
builds.
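
A check of that kind might look roughly like the sketch below. It is not the
test that was added to case-mapping.js; the assertEquals stand-in and the
variable names are assumptions so that the snippet runs on its own. Whether
such a check can detect an in-place overwrite inside the engine depends on how
the strings are represented internally, so treat it as an illustration of the
intent only.

```javascript
// Minimal stand-in for the mjsunit helper so the sketch is self-contained.
function assertEquals(expected, actual) {
  if (expected !== actual) {
    throw new Error("expected " + expected + " but got " + actual);
  }
}

// Build the reference value from code points so it is constructed
// independently of the literal below.
var expectedInput = String.fromCharCode(0x3B1, 0x3B2, 0x3B3, 0x3B4, 0x3B5);

var s = "αβγδε";
var uppered = s.toLocaleUpperCase("el");

// The output differs from the input ...
assertEquals("ΑΒΓΔΕ", uppered);
// ... and the input still holds its original contents.
assertEquals(expectedInput, s);
```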

Yang

https://codereview.chromium.org/1812673005/diff/1160001/test/intl/general/case-mapping.js File test/intl/general/case-mapping.js (right): https://codereview.chromium.org/1812673005/diff/1160001/test/intl/general/case-mapping.js#newcode115 test/intl/general/case-mapping.js:115: var uppered = s.toLocaleUpperCase("el"); unfortunately this is not the ...

4 years, 7 months ago (2016-05-10 20:37:37 UTC) #129

jungshik at Google

https://codereview.chromium.org/1812673005/diff/1160001/src/flag-definitions.h File src/flag-definitions.h (right): https://codereview.chromium.org/1812673005/diff/1160001/src/flag-definitions.h#newcode205 src/flag-definitions.h:205: #else On 2016/05/10 19:29:37, Dan Ehrenberg wrote: > Could ...

4 years, 7 months ago (2016-05-10 23:22:33 UTC) #130

jungshik at Google

Can you take another look? Thanks ! https://codereview.chromium.org/1812673005/diff/1160001/test/intl/general/case-mapping.js File test/intl/general/case-mapping.js (right): https://codereview.chromium.org/1812673005/diff/1160001/test/intl/general/case-mapping.js#newcode115 test/intl/general/case-mapping.js:115: var uppered ...

4 years, 7 months ago (2016-05-10 23:33:24 UTC) #131

jungshik at Google

The CQ bit was checked by jshin@chromium.org to run a CQ dry run

4 years, 7 months ago (2016-05-10 23:34:37 UTC) #132

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/1200001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/1200001

4 years, 7 months ago (2016-05-10 23:34:53 UTC) #133

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 7 months ago (2016-05-10 23:37:22 UTC) #134

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: v8_linux64_asan_rel_ng on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux64_asan_rel_ng/builds/1439) v8_linux64_rel_ng on ...

4 years, 7 months ago (2016-05-10 23:37:24 UTC) #135

jungshik at Google

4 years, 7 months ago (2016-05-10 23:40:35 UTC) #136

Description was changed from

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  --icu_case_mapping flag is turned on. To control the override by the flag,
  they're overriden in icu-case-mapping.js

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr),
Greek (el) and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with --icu_case_mapping=true passes a bunch
of tests in test262/intl402/Strings/* and intl/* that failed before.

Handling of pure ASCII strings (aligned at word boundary) are not as fast
as Unibrow's implementation that uses word-by-word case conversion. OTOH,
Latin-1 input handling is faster than Unibrow. General Unicode input
handling is slower but more accurate.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@,
but has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case,
     intl/general/case*
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  --icu_case_mapping flag is turned on. To control the override by the flag,
  they're overriden in icu-case-mapping.js

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr),
Greek (el) and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with --icu_case_mapping=true passes a bunch
of tests in test262/intl402/Strings/* and intl/* that failed before.

Handling of pure ASCII strings (aligned at word boundary) are not as fast
as Unibrow's implementation that uses word-by-word case conversion. OTOH,
Latin-1 input handling is faster than Unibrow. General Unicode input
handling is slower but more accurate.

See
https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...
for the benchmark.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@,
but has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case,
     intl/general/case*
==========

Yang

LGTM https://codereview.chromium.org/1812673005/diff/1200001/src/runtime/runtime-i18n.cc File src/runtime/runtime-i18n.cc (right): https://codereview.chromium.org/1812673005/diff/1200001/src/runtime/runtime-i18n.cc#newcode807 src/runtime/runtime-i18n.cc:807: ConvertCaseWithTransliterator(&converted, "el-Upper"); So... if ConvertCaseWithTransliterator does not change ...

4 years, 7 months ago (2016-05-11 08:42:30 UTC) #137

jungshik at Google

Thank you, Yang ! Your comment was addressed. I'm adding the CL to CQ. https://codereview.chromium.org/1812673005/diff/1200001/src/runtime/runtime-i18n.cc ...

4 years, 7 months ago (2016-05-11 18:10:08 UTC) #138

jungshik at Google

Description was changed from ========== Use ICU case conversion/transliterator for case conversion When I18N is ...

4 years, 7 months ago (2016-05-11 18:26:58 UTC) #140

jungshik at Google

The CQ bit was checked by jshin@chromium.org

4 years, 7 months ago (2016-05-11 18:28:01 UTC) #141

jungshik at Google

The patchset sent to the CQ was uploaded after l-g-t-m from machenbach@chromium.org, yangguo@chromium.org Link to ...

4 years, 7 months ago (2016-05-11 18:28:02 UTC) #142

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1812673005/1240001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1812673005/1240001

4 years, 7 months ago (2016-05-11 18:28:06 UTC) #143

commit-bot: I haz the power

Description was changed from ========== Use ICU case conversion/transliterator for case conversion When I18N is ...

4 years, 7 months ago (2016-05-11 19:01:51 UTC) #144

Message was sent while issue was closed.

commit-bot: I haz the power

Committed patchset #63 (id:1240001)

4 years, 7 months ago (2016-05-11 19:01:54 UTC) #145

commit-bot: I haz the power

4 years, 7 months ago (2016-05-11 19:03:15 UTC) #146

Message was sent while issue was closed.

Description was changed from

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  --icu_case_mapping flag is turned on. To control the override by the flag,
  they're overriden in icu-case-mapping.js

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr),
Greek (el) and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with --icu_case_mapping=true passes a bunch
of tests in test262/intl402/Strings/* and intl/* that failed before.

Handling of pure ASCII strings (aligned at word boundary) are not as fast
as Unibrow's implementation that uses word-by-word case conversion. OTOH,
Latin-1 input handling is faster than Unibrow. General Unicode input
handling is slower but more accurate.

See
https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...
for the benchmark.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@,
but has changed significantly since.


[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case,
     intl/general/case*
==========

to

==========
Use ICU case conversion/transliterator for case conversion

When I18N is enabled, use ICU's case conversion API and transliteration
API [1] to implement String.prototype.to{Upper,Lower}Case and
String.prototype.toLocale{Upper,Lower}Case.

* ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
* The above 4 functions are overridden with those in i18n.js when
  --icu_case_mapping flag is turned on. To control the override by the flag,
  they're overriden in icu-case-mapping.js

Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
support locale-sensitive case conversion for Turkic languages (az, tr),
Greek (el) and Lithuanian (lt).

Before ICU APIs for the most general case are called, a fast-path for Latin-1
is tried. It's taken from Blink and adopted as necessary. This fast path
is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
when a locale (explicitly specified or default) is not in {az, el, lt, tr}.

With these changes, a build with --icu_case_mapping=true passes a bunch
of tests in test262/intl402/Strings/* and intl/* that failed before.

Handling of pure ASCII strings (aligned at word boundary) are not as fast
as Unibrow's implementation that uses word-by-word case conversion. OTOH,
Latin-1 input handling is faster than Unibrow. General Unicode input
handling is slower but more accurate.

See
https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_...
for the benchmark.

This CL started with http://crrev.com/1544023002#ps200001 by littledan@,
but has changed significantly since.

[1] See why transliteration API is needed for uppercasing in Greek.
    http://bugs.icu-project.org/trac/ticket/10582

R=yangguo
BUG=v8:4476,v8:4477
LOG=Y
TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
mjsunit/string-case,
     intl/general/case*

Committed: https://crrev.com/b348d47bb94399045394bf4743c0c8c35328923b
Cr-Commit-Position: refs/heads/master@{#36187}
==========

commit-bot: I haz the power

Patchset 63 (id:??) landed as https://crrev.com/b348d47bb94399045394bf4743c0c8c35328923b Cr-Commit-Position: refs/heads/master@{#36187}

4 years, 7 months ago (2016-05-11 19:03:18 UTC) #147

srl295

4 years, 4 months ago (2016-07-27 18:53:39 UTC) #149

Message was sent while issue was closed.

Not sure if I'm late or early to the party.

https://codereview.chromium.org/1812673005/diff/1240001/src/js/i18n.js
File src/js/i18n.js (right):

https://codereview.chromium.org/1812673005/diff/1240001/src/js/i18n.js#newcod...
src/js/i18n.js:2003: // toLocale{U,L}Case() and about 40% of
toLocale{U,L}Case("<locale>").
If it's slow, please file a bug in ICU to address it. Filed:
http://bugs.icu-project.org/trac/ticket/12647

https://codereview.chromium.org/1812673005/diff/1240001/src/js/i18n.js#newcod...
src/js/i18n.js:2019: var CUSTOM_CASE_LANGUAGES = ['az', 'el', 'lt', 'tr'];
Why are these hard coded? This decision should be made in ICU. Fragile.

https://codereview.chromium.org/1812673005/diff/1240001/src/js/i18n.js#newcod...
src/js/i18n.js:2019: var CUSTOM_CASE_LANGUAGES = ['az', 'el', 'lt', 'tr'];
Filed : http://bugs.icu-project.org/trac/ticket/12647
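
For readers following along, the kind of check being discussed can be sketched
as below. Only the CUSTOM_CASE_LANGUAGES list is taken from the patch; the two
helper functions are hypothetical and simply assume that the decision is made
on the primary language subtag of the locale.

```javascript
// The list comes from the patch; everything else is a hypothetical sketch.
var CUSTOM_CASE_LANGUAGES = ['az', 'el', 'lt', 'tr'];

function needsLocaleSpecificCasing(locale) {
  // Take the primary language subtag: "tr-TR" -> "tr", "el" -> "el".
  var language = String(locale).split('-')[0].toLowerCase();
  return CUSTOM_CASE_LANGUAGES.indexOf(language) !== -1;
}

function toLocaleUpperCaseSketch(s, locale) {
  if (!needsLocaleSpecificCasing(locale)) {
    // Locale-independent path.
    return s.toUpperCase();
  }
  // Locale-sensitive path, delegated to the engine in this sketch.
  return s.toLocaleUpperCase(locale);
}
```

The fragility raised above is visible here: if another language ever gains
special casing rules, a hard-coded list like this one has to be updated by
hand.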

https://codereview.chromium.org/1812673005/diff/1240001/src/runtime/runtime-i...
File src/runtime/runtime-i18n.cc (right):

https://codereview.chromium.org/1812673005/diff/1240001/src/runtime/runtime-i...
src/runtime/runtime-i18n.cc:754: 
Filed ICU bug http://bugs.icu-project.org/trac/ticket/12647 to pull this
fast path into ICU.
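
As a rough illustration of what a Latin-1 fast path does, here is a minimal
sketch for lowercasing only. It is not the code in runtime-i18n.cc (which is
C++ and adapted from Blink); tryLowerLatin1 and toLowerCaseSketch are
hypothetical names. Lowercasing is the easy direction: uppercasing is less
simple because U+00B5, U+00DF and U+00FF have uppercase forms outside Latin-1.

```javascript
// Returns the lowercased string if every code unit is Latin-1 (<= 0xFF),
// or null to tell the caller to fall back to the general path.
function tryLowerLatin1(s) {
  var out = new Array(s.length);
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c > 0xFF) return null;
    var isAsciiUpper = c >= 0x41 && c <= 0x5A;
    // U+00C0..U+00DE are uppercase letters, except U+00D7 (multiplication sign).
    var isLatin1Upper = c >= 0xC0 && c <= 0xDE && c !== 0xD7;
    out[i] = String.fromCharCode(isAsciiUpper || isLatin1Upper ? c + 0x20 : c);
  }
  return out.join('');
}

function toLowerCaseSketch(s) {
  var fast = tryLowerLatin1(s);
  // Fall back to the engine's general implementation for non-Latin-1 input.
  return fast !== null ? fast : s.toLowerCase();
}
```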

Issue 1812673005: Use ICU case conversion/transliterator for case conversion behind a flag (Closed)