|
|
Created:
5 years ago by Dan Ehrenberg Modified:
4 years, 11 months ago CC:
v8-reviews_googlegroups.com Base URL:
https://chromium.googlesource.com/v8/v8.git@master Target Ref:
refs/pending/heads/master Project:
v8 Visibility:
Public. |
DescriptionCall out to ICU for case conversion under a new flag
V8 has its own implementation of case conversion, in unibrow, which
has a couple bugs. This patch calls out to Unicode case conversion
from String.prototype.toLowerCase() and String.prototype.toUpperCase()
if the --icu-case-mapping flag is passed. On no-intl builds, the flag
is disabled. A fast-path for some latin1 strings is taken from Blink
in an attempt to avoid performance regressions, and microbenchmarks
show that it is competitive with the old implementation. This new behavior
fixes a number of test262 tests, as well as a Kangax test.
R=yangguo
BUG=v8:4476
LOG=Y
Patch Set 1 #Patch Set 2 : Fix memory allocation, type checking, webkit test passes now #Patch Set 3 : Fix Windows typecheck issue #Patch Set 4 : Starting to bring in Blink optimization (doesn't work yet) #Patch Set 5 : Add ICU casing conditionally behind a flag, with all optimizations #Patch Set 6 : Fix cast that Windows doesn't like #Patch Set 7 : Fix lint issue #Patch Set 8 : Add a test, fix cast (again) #Patch Set 9 : Hopefully fix lint #Patch Set 10 : Alphabetize includes #Patch Set 11 : Additional test case #
Total comments: 12
Messages
Total messages: 44 (22 generated)
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/1
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_presubmit on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/9226)
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/20001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_win_nosnap_shared_compile_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_nosnap_shared_compil...)
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/40001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/80001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_presubmit on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/9275)
Description was changed from ========== Call out to ICU for case conversion The current code is just a straw man; it needs a bunch of optimization to be reasonable. There is a risk of regression on benchmarks. This will fix a number of test262 tests, as well as a Kangax test. R=yangguo BUG=v8:4476 LOG=Y ========== to ========== Call out to ICU for case conversion under a new flag V8 has its own implementation of case conversion, in unibrow, which has a couple bugs. This patch calls out to Unicode case conversion from String.prototype.toLowerCase() and String.prototype.toUpperCase() if the --icu-case-mapping flag is passed. On no-intl builds, the flag is disabled. A fast-path for some latin1 strings is taken from Blink in an attempt to avoid performance regressions, but benchmarking work is still needed to understand the performance implications. This new behavior fixes a number of test262 tests, as well as a Kangax test. R=yangguo BUG=v8:4476 LOG=Y ==========
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/100001
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/120001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux_arm64_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_arm64_rel/builds/1...) v8_mac_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_mac_rel/builds/13575)
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/140001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_presubmit on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/9279)
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/160001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_presubmit on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/9282)
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/180001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/180001
This patch introduces the option of using ICU for case conversion, as suggested by https://docs.google.com/document/d/1ueSYBXXAf0cI3H1gyjwLdyTNhOE7WLeW2Yhu1cBRg... Benchmarking work still needs to be done, but it's hoped that this simple flag interface will facilitate that. This patch is written so it can be committed before we are sure that we can switch over to it, as it's guarded by a flag, and should allow builds with internationalization disabled which fall back to always using unibrow (I tested it manually that way). What do you think?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2015/12/25 at 08:05:50, Dan Ehrenberg wrote: > This patch introduces the option of using ICU for case conversion, as suggested by > https://docs.google.com/document/d/1ueSYBXXAf0cI3H1gyjwLdyTNhOE7WLeW2Yhu1cBRg... > > Benchmarking work still needs to be done, but it's hoped that this simple flag interface will facilitate that. This patch is written so it can be committed before we are sure that we can switch over to it, as it's guarded by a flag, and should allow builds with internationalization disabled which fall back to always using unibrow (I tested it manually that way). > > What do you think? I ran some quick microbenchmarks: https://docs.google.com/spreadsheets/d/1xDpYTaFVE97rtqQ5KyZCk4T_QJsKp-S_lRhQZ... Looks like this path is a little better on some workloads (including ASCII and Latin1) and a bit worse on some others, but nothing more than 2x in either direction on any workload I tried.
The CQ bit was checked by littledan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1544023002/200001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1544023002/200001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== Call out to ICU for case conversion under a new flag V8 has its own implementation of case conversion, in unibrow, which has a couple bugs. This patch calls out to Unicode case conversion from String.prototype.toLowerCase() and String.prototype.toUpperCase() if the --icu-case-mapping flag is passed. On no-intl builds, the flag is disabled. A fast-path for some latin1 strings is taken from Blink in an attempt to avoid performance regressions, but benchmarking work is still needed to understand the performance implications. This new behavior fixes a number of test262 tests, as well as a Kangax test. R=yangguo BUG=v8:4476 LOG=Y ========== to ========== Call out to ICU for case conversion under a new flag V8 has its own implementation of case conversion, in unibrow, which has a couple bugs. This patch calls out to Unicode case conversion from String.prototype.toLowerCase() and String.prototype.toUpperCase() if the --icu-case-mapping flag is passed. On no-intl builds, the flag is disabled. A fast-path for some latin1 strings is taken from Blink in an attempt to avoid performance regressions, and microbenchmarks show that it is competitive with the old implementation. This new behavior fixes a number of test262 tests, as well as a Kangax test. R=yangguo BUG=v8:4476 LOG=Y ==========
yangguo@chromium.org changed reviewers: + yangguo@chromium.org
Sorry it took this long. Vacation... :) https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... File src/runtime/runtime-strings.cc (right): https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1078: namespace { Can we move all of that into its own file? https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1097: reinterpret_cast<const UChar*>(SeqTwoByteString::cast(*s)->GetChars()); Please use String::FlatContent to access the flat content. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1119: int32_t real_length_again = fn(reinterpret_cast<UChar*>(result->GetChars()), Instead of running the whole thing again, you could simply create a new string with the excess length, convert the remaining characters, and create a cons string for both parts. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1120: real_length, src, length, "", &error); src can have moved already after the allocation of the two byte string. Please use a new instance of String::FlatContent to refetch the source. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1155: uint16_t ch = s->Get(index); This can get pretty expensive if the string is a deeply nested ConsString. Please flatten it and use String::FlatContent instead. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1176: if (StringShape(*s).IsSequentialOneByte()) { Let's not use StringShape. Just check for s->IsSeqOneByteString(). https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1180: memcpy(result->GetChars(), source->GetChars(), first_index_to_lower); We should just use CopyChars for uniformity here. See below. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1193: // as Blink does, but there's also the ConsString case. As explained above, we should flatten upfront, so no ConsStrings here. We could then do CopyChars up to first_index_to_lower, which handles the two-byte to one-byte copying. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1215: const uint16_t sharp_s = L'\u00DF'; 0xDF should do the trick to, right? https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1232: uint16_t ch = s->Get(index); Again, please flatten upfront and use String::FlatContent. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1253: goto upconvert; Can't we simply "return ConvertCaseICU(s, isolate, u_strToUpper);"? I'd prefer this small code duplication way more than goto. https://codereview.chromium.org/1544023002/diff/200001/src/runtime/runtime-st... src/runtime/runtime-strings.cc:1293: if (FLAG_icu_case_mapping) the convention with multiline conditionals is to use brackets. |