Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(19)

Issue 2081653007: Replace ICU with CED for auto encoding detection (Closed)

Created:
4 years, 6 months ago by Jinsuk Kim
Modified:
4 years, 5 months ago
CC:
blink-reviews, chromium-reviews
Base URL:
https://chromium.googlesource.com/chromium/src.git@master
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Replace ICU with CED for auto encoding detection This is a drop-in replacement of ICU library performing automatic text encoding detection with CED (Compact Encdoing Detection). CED is used extensively in Google for every crawled web page, email message, query string, etc., and recently open-sourced for public use. (https://github.com/google/compact_enc_det) Also it is a much better alternative to ICU in terms of speed. ICU introduces significant regression in page loading (up to 30%): = ICU auto-detection vs. TOT = page_cycler.typical_25:cold_times.page_load_time 1085.13±9.31% 754.28±12.03% (30.49%) http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-05-08_21-20-58 while CED adds virtually no additional loading time (delta < sigma): = CED auto-detection vs. TOT = page_cycler.typical_25:cold_times.page_load_time ms 705.70±9.49% vs. 760.31±11.90% (-7.74%) http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-05-08_20-37-54 With CED, it is feasible to turn on auto encoding detection by default so that web pages without encoding label can be taken care of. It will be done in a follow-up CL. BUG=597488 Committed: https://crrev.com/13510755ac11e6e1f58c34ef8c9bd4cf925c8d70 Cr-Commit-Position: refs/heads/master@{#402622}

Patch Set 1 #

Patch Set 2 : fix trybot builds #

Total comments: 5

Patch Set 3 : deps #

Total comments: 2

Patch Set 4 : hint encoding #

Unified diffs Side-by-side diffs Delta from patch set Stats (+16 lines, -69 lines) Patch
M third_party/WebKit/Source/platform/BUILD.gn View 1 2 1 chunk +1 line, -0 lines 0 comments Download
M third_party/WebKit/Source/platform/DEPS View 1 2 1 chunk +1 line, -0 lines 0 comments Download
M third_party/WebKit/Source/platform/blink_platform.gyp View 1 1 chunk +1 line, -0 lines 0 comments Download
M third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp View 1 2 3 2 chunks +13 lines, -69 lines 0 comments Download

Messages

Total messages: 24 (6 generated)
Jinsuk Kim
A few Windows broken trybot builds can be fixed by crrev.com/2092533003.
4 years, 6 months ago (2016-06-22 23:23:27 UTC) #4
tkent
I understand CED is faster than ICU according to the CL description. What about correctness? ...
4 years, 6 months ago (2016-06-22 23:36:41 UTC) #5
aelias_OOO_until_Jul13
On 2016/06/22 at 23:36:41, tkent wrote: > I understand CED is faster than ICU according ...
4 years, 6 months ago (2016-06-22 23:53:57 UTC) #6
Jinsuk Kim
For accuracy, I have only anecdotal results like tests against the URLs reported here https://bugs.chromium.org/p/chromium/issues/detail?id=512032#c19 ...
4 years, 6 months ago (2016-06-23 00:45:57 UTC) #8
tkent
I don't believe this code produces results as accurate as CED in Google or ICU ...
4 years, 6 months ago (2016-06-23 03:30:39 UTC) #9
tkent
On 2016/06/23 at 03:30:39, tkent wrote: > If you'd like to produce this change without ...
4 years, 6 months ago (2016-06-23 03:32:17 UTC) #10
Jinsuk Kim
On 2016/06/23 03:30:39, tkent wrote: > I don't believe this code produces results as accurate ...
4 years, 6 months ago (2016-06-23 04:33:19 UTC) #11
tkent
On 2016/06/23 at 04:33:19, jinsukkim wrote: > On 2016/06/23 03:30:39, tkent wrote: > > I ...
4 years, 6 months ago (2016-06-23 05:17:31 UTC) #12
Jinsuk Kim
On 2016/06/23 05:17:31, tkent wrote: > On 2016/06/23 at 04:33:19, jinsukkim wrote: > > On ...
4 years, 6 months ago (2016-06-23 06:00:13 UTC) #13
tkent
I think we may proceed this CL and enable encoding detection by default if hintEncodingName ...
4 years, 6 months ago (2016-06-23 07:30:58 UTC) #14
Jinsuk Kim
PTAL. This needs update on CED to support encoding string to enum conversion google3/i18n/encodings/public/encodings.h. I'm ...
4 years, 6 months ago (2016-06-23 07:35:58 UTC) #15
tkent
On 2016/06/23 at 07:35:58, jinsukkim wrote: > PTAL. This needs update on CED to support ...
4 years, 6 months ago (2016-06-23 07:38:47 UTC) #16
Jinsuk Kim
On 2016/06/23 07:38:47, tkent wrote: > On 2016/06/23 at 07:35:58, jinsukkim wrote: > > PTAL. ...
4 years, 5 months ago (2016-06-27 17:28:44 UTC) #17
tkent
lgtm
4 years, 5 months ago (2016-06-27 22:46:23 UTC) #18
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2081653007/120001
4 years, 5 months ago (2016-06-28 21:15:15 UTC) #20
commit-bot: I haz the power
Committed patchset #4 (id:120001)
4 years, 5 months ago (2016-06-29 00:45:38 UTC) #21
commit-bot: I haz the power
Patchset 4 (id:??) landed as https://crrev.com/13510755ac11e6e1f58c34ef8c9bd4cf925c8d70 Cr-Commit-Position: refs/heads/master@{#402622}
4 years, 5 months ago (2016-06-29 00:47:09 UTC) #23
shans
4 years, 5 months ago (2016-06-29 01:39:01 UTC) #24
Message was sent while issue was closed.
A revert of this CL (patchset #4 id:120001) has been created in
https://codereview.chromium.org/2109633003/ by shans@chromium.org.

The reason for reverting is: This seems to have broken compile on Windows 8:

[46/8396] CXX
obj\third_party\ced\src\compact_enc_det\ced.compact_enc_det_hint_code.obj
FAILED:
obj/third_party/ced/src/compact_enc_det/ced.compact_enc_det_hint_code.obj 
ninja -t msvc -e environment.x86 -- C:\b\build\slave\cache\cipd\goma/gomacc
"C:\b\depot_tools\win_toolchain\vs_files\95ddda401ec5678f15eeed01d2bee08fcbc5ee97\VC\bin\amd64_x86\cl.exe"
/nologo /showIncludes /FC
@obj\third_party\ced\src\compact_enc_det\ced.compact_enc_det_hint_code.obj.rsp
/c ..\..\third_party\ced\src\compact_enc_det\compact_enc_det_hint_code.cc
/Foobj\third_party\ced\src\compact_enc_det\ced.compact_enc_det_hint_code.obj
/Fdobj\third_party\ced\ced.cc.pdb 
c:\b\build\slave\win8_gyp\build\src\third_party\ced\src\compact_enc_det\compact_enc_det_hint_code.cc(122):
error C2220: warning treated as error - no 'object' file generated
c:\b\build\slave\win8_gyp\build\src\third_party\ced\src\compact_enc_det\compact_enc_det_hint_code.cc(122):
warning C4018: '<': signed/unsigned mismatch
c:\b\build\slave\win8_gyp\build\src\third_party\ced\src\compact_enc_det\compact_enc_det_hint_code.cc(151):
warning C4018: '<': signed/unsigned mismatch
c:\b\build\slave\win8_gyp\build\src\third_party\ced\src\compact_enc_det\compact_enc_det_hint_code.cc(169):
warning C4018: '<': signed/unsigned mismatch

https://build.chromium.org/p/chromium.win/builders/Win8%20GYP/builds/89.

Powered by Google App Engine
This is Rietveld 408576698