Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(608)

Issue 2168003003: Replace ICU encoding detection with CED (Closed)

Created:
4 years, 5 months ago by Jinsuk Kim
Modified:
4 years, 4 months ago
CC:
Paweł Hajdan Jr., cbentzel+watch_chromium.org, chromium-reviews, darin-cc_chromium.org, jam, jshin+watch_chromium.org, oshima+watch_chromium.org, stevenjb+watch_chromium.org
Base URL:
https://chromium.googlesource.com/chromium/src.git@master
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Replace ICU encoding detection with CED CED (Compact Encoding Detection) is a better alternative to ICU in terms of speed and accuracy. This CL switches the encoding detection library to CED and fixes some of the failing cases. BUG=630113 R=armansito@chromium.org, cbentzel@chromium.org, ellyjones@chromium.org, jshin@chromium.org, phajdan.jr@chromium.org, sievers@chromium.org, thestig@chromium.org Committed: https://chromium.googlesource.com/chromium/src/+/70de1704feea35840fbed4bc1e864a24b620569e

Patch Set 1 : icu -> ced #

Patch Set 2 : temorarily reverted test files #

Total comments: 6

Patch Set 3 : addressed comments #

Patch Set 4 : fix build #

Patch Set 5 : HTML5 mode #

Total comments: 6

Patch Set 6 : addressed comments #

Total comments: 6

Patch Set 7 : comments/add datafiles back #

Total comments: 2
Unified diffs Side-by-side diffs Delta from patch set Stats (+281 lines, -288 lines) Patch
M base/BUILD.gn View 1 2 3 4 5 6 2 chunks +3 lines, -2 lines 0 comments Download
M base/DEPS View 1 chunk +1 line, -0 lines 0 comments Download
M base/base.gyp View 1 2 3 4 5 6 2 chunks +2 lines, -0 lines 0 comments Download
M base/base.gypi View 1 2 3 4 5 6 1 chunk +2 lines, -2 lines 0 comments Download
M base/base_nacl.gyp View 1 2 3 1 chunk +1 line, -0 lines 0 comments Download
A base/i18n/encoding_detection.h View 1 2 3 4 5 6 1 chunk +21 lines, -0 lines 0 comments Download
A base/i18n/encoding_detection.cc View 1 2 3 4 5 6 1 chunk +41 lines, -0 lines 0 comments Download
D base/i18n/icu_encoding_detection.h View 1 chunk +0 lines, -30 lines 0 comments Download
D base/i18n/icu_encoding_detection.cc View 1 chunk +0 lines, -100 lines 0 comments Download
M chromeos/network/network_state_unittest.cc View 1 2 3 1 chunk +2 lines, -2 lines 0 comments Download
M chromeos/network/shill_property_util.cc View 1 chunk +1 line, -1 line 0 comments Download
M chromeos/test/data/network/shill_wifi_non_utf8_ssid.json View 1 2 3 1 chunk +2 lines, -3 lines 0 comments Download
M chromeos/test/data/network/translation_of_shill_wifi_non_utf8_ssid.onc View 1 2 3 1 chunk +2 lines, -2 lines 0 comments Download
M content/child/BUILD.gn View 1 2 3 4 5 6 1 chunk +1 line, -1 line 0 comments Download
M content/child/ftp_directory_listing_response_delegate.cc View 1 2 chunks +2 lines, -2 lines 0 comments Download
M ios/net/protocol_handler_util.mm View 1 chunk +1 line, -1 line 0 comments Download
M net/data/ftp/dir-listing-ls-20.expected View 2 3 4 5 6 14 chunks +14 lines, -14 lines 0 comments Download
M net/data/ftp/dir-listing-ls-21.expected View 2 3 4 5 6 24 chunks +24 lines, -24 lines 0 comments Download
M net/data/ftp/dir-listing-ls-22.expected View 2 3 4 5 6 29 chunks +55 lines, -55 lines 0 comments Download
M net/ftp/ftp_directory_listing_parser.cc View 2 chunks +17 lines, -23 lines 2 comments Download
M pdf/pdfium/pdfium_engine.cc View 1 2 3 4 5 6 1 chunk +1 line, -1 line 0 comments Download
M third_party/ced/BUILD.gn View 1 2 3 4 5 3 chunks +7 lines, -3 lines 0 comments Download
M third_party/ced/ced.gyp View 1 2 3 5 4 chunks +9 lines, -22 lines 0 comments Download
A third_party/ced/ced.gypi View 1 2 3 1 chunk +31 lines, -0 lines 0 comments Download
A third_party/ced/ced_nacl.gyp View 1 2 3 1 chunk +41 lines, -0 lines 0 comments Download

Messages

Total messages: 105 (69 generated)
Jinsuk Kim
jshin@: Let me have your initial review on overall change. I'll take care of some ...
4 years, 5 months ago (2016-07-21 07:50:10 UTC) #4
Jinsuk Kim
On 2016/07/21 07:50:10, Jinsuk wrote: > jshin@: Let me have your initial review on overall ...
4 years, 5 months ago (2016-07-22 05:08:21 UTC) #9
Jinsuk Kim
phajdan.jr@chromium.org: Please review changes in net/ftp
4 years, 5 months ago (2016-07-22 06:10:49 UTC) #15
Paweł Hajdan Jr.
https://codereview.chromium.org/2168003003/diff/60001/base/i18n/encoding_detection.h File base/i18n/encoding_detection.h (right): https://codereview.chromium.org/2168003003/diff/60001/base/i18n/encoding_detection.h#newcode16 base/i18n/encoding_detection.h:16: BASE_I18N_EXPORT bool DetectEncoding(const std::string& text, nit: Add WARN_UNUSED_RESULT https://codereview.chromium.org/2168003003/diff/60001/net/ftp/ftp_directory_listing_parser_unittest.cc ...
4 years, 5 months ago (2016-07-22 13:22:19 UTC) #21
Jinsuk Kim
https://codereview.chromium.org/2168003003/diff/60001/base/i18n/encoding_detection.h File base/i18n/encoding_detection.h (right): https://codereview.chromium.org/2168003003/diff/60001/base/i18n/encoding_detection.h#newcode16 base/i18n/encoding_detection.h:16: BASE_I18N_EXPORT bool DetectEncoding(const std::string& text, On 2016/07/22 13:22:19, Paweł ...
4 years, 5 months ago (2016-07-24 22:24:46 UTC) #25
Jinsuk Kim
jshin@chromium.org : Please review changes in base/i18n thestig@chromium.org: Please review changes in pdf, base/ ellyjones@chromium.org: ...
4 years, 5 months ago (2016-07-25 07:52:28 UTC) #39
Elly Fong-Jones
/ios lgtm
4 years, 5 months ago (2016-07-25 17:28:12 UTC) #40
Lei Zhang
looks ok, but there are red bots.
4 years, 5 months ago (2016-07-25 17:44:20 UTC) #41
jungshik at Google
I'm also worried about encoding names and non-html5 encodings coming out of CED (especially the ...
4 years, 5 months ago (2016-07-26 00:24:38 UTC) #50
armansito
chromeos/ lgtm
4 years, 4 months ago (2016-07-26 19:51:55 UTC) #67
Jinsuk Kim
thestig@: All builds are green now. jshin@: Introduced HTML5_MODE in https://crrev.com/2188663002 to handle non-HTML5 encodings ...
4 years, 4 months ago (2016-07-27 02:22:21 UTC) #68
jungshik at Google
Terribly sorry for the delay. My previous comment about 7-bit encoding can be addressed by ...
4 years, 4 months ago (2016-07-29 06:35:46 UTC) #69
jungshik at Google
On 2016/07/27 02:22:21, Jinsuk wrote: > thestig@: All builds are green now. > > jshin@: ...
4 years, 4 months ago (2016-07-29 06:44:24 UTC) #70
Jinsuk Kim
On 2016/07/29 06:44:24, jungshik at google wrote: > On 2016/07/27 02:22:21, Jinsuk wrote: > > ...
4 years, 4 months ago (2016-07-29 07:07:27 UTC) #71
Jinsuk Kim
On 2016/07/29 07:07:27, Jinsuk wrote: > On 2016/07/29 06:44:24, jungshik at google wrote: > > ...
4 years, 4 months ago (2016-07-29 07:12:03 UTC) #72
Jinsuk Kim
On 2016/07/29 07:12:03, Jinsuk wrote: > On 2016/07/29 07:07:27, Jinsuk wrote: > > On 2016/07/29 ...
4 years, 4 months ago (2016-07-29 07:20:54 UTC) #73
Jinsuk Kim
https://codereview.chromium.org/2168003003/diff/240001/base/i18n/encoding_detection.cc File base/i18n/encoding_detection.cc (right): https://codereview.chromium.org/2168003003/diff/240001/base/i18n/encoding_detection.cc#newcode18 base/i18n/encoding_detection.cc:18: CompactEncDet::WEB_CORPUS, On 2016/07/29 06:35:46, jungshik at google wrote: > ...
4 years, 4 months ago (2016-08-01 07:48:15 UTC) #78
Jinsuk Kim
On 2016/08/01 07:48:15, Jinsuk wrote: > https://codereview.chromium.org/2168003003/diff/240001/base/i18n/encoding_detection.cc > File base/i18n/encoding_detection.cc (right): > > https://codereview.chromium.org/2168003003/diff/240001/base/i18n/encoding_detection.cc#newcode18 > ...
4 years, 4 months ago (2016-08-04 06:41:30 UTC) #79
Paweł Hajdan Jr.
https://codereview.chromium.org/2168003003/diff/60001/net/ftp/ftp_directory_listing_parser_unittest.cc File net/ftp/ftp_directory_listing_parser_unittest.cc (right): https://codereview.chromium.org/2168003003/diff/60001/net/ftp/ftp_directory_listing_parser_unittest.cc#newcode145 net/ftp/ftp_directory_listing_parser_unittest.cc:145: // {"dir-listing-ls-20", OK}, On 2016/07/24 at 22:24:46, Jinsuk wrote: ...
4 years, 4 months ago (2016-08-08 09:00:48 UTC) #81
Jinsuk Kim
https://codereview.chromium.org/2168003003/diff/60001/net/ftp/ftp_directory_listing_parser_unittest.cc File net/ftp/ftp_directory_listing_parser_unittest.cc (right): https://codereview.chromium.org/2168003003/diff/60001/net/ftp/ftp_directory_listing_parser_unittest.cc#newcode145 net/ftp/ftp_directory_listing_parser_unittest.cc:145: // {"dir-listing-ls-20", OK}, On 2016/08/08 09:00:48, Paweł Hajdan Jr. ...
4 years, 4 months ago (2016-08-08 11:28:26 UTC) #82
Paweł Hajdan Jr.
Okay. LGTM.
4 years, 4 months ago (2016-08-08 20:49:19 UTC) #83
Lei Zhang
On 2016/07/25 07:52:28, Jinsuk wrote: > mailto:jshin@chromium.org : Please review changes in base/i18n > > ...
4 years, 4 months ago (2016-08-08 22:02:35 UTC) #84
jungshik at Google
LGTM Thanks ! (and sorry).
4 years, 4 months ago (2016-08-08 22:07:07 UTC) #85
Lei Zhang
lgtm with some nits: https://codereview.chromium.org/2168003003/diff/260001/base/i18n/encoding_detection.cc File base/i18n/encoding_detection.cc (right): https://codereview.chromium.org/2168003003/diff/260001/base/i18n/encoding_detection.cc#newcode12 base/i18n/encoding_detection.cc:12: int consumedBytes; nit: foo_bar, not ...
4 years, 4 months ago (2016-08-08 22:09:55 UTC) #86
Jinsuk Kim
Thanks! sievers@ Could you review content/? I'd like a owner's lgtm. Thanks.
4 years, 4 months ago (2016-08-08 22:14:00 UTC) #88
no sievers
On 2016/08/08 22:14:00, Jinsuk wrote: > Thanks! > > sievers@ Could you review content/? I'd ...
4 years, 4 months ago (2016-08-08 23:26:32 UTC) #89
Jinsuk Kim
cbentzel@chromium.org: Please review changes in net/data - the old test data files are wrong (lots ...
4 years, 4 months ago (2016-08-09 01:16:19 UTC) #91
cbentzel
On 2016/08/09 01:16:19, Jinsuk wrote: > mailto:cbentzel@chromium.org: Please review changes in net/data - the old ...
4 years, 4 months ago (2016-08-10 01:50:23 UTC) #92
cbentzel
On 2016/08/09 01:16:19, Jinsuk wrote: > mailto:cbentzel@chromium.org: Please review changes in net/data - the old ...
4 years, 4 months ago (2016-08-10 01:50:29 UTC) #93
cbentzel
https://codereview.chromium.org/2168003003/diff/280001/net/ftp/ftp_directory_listing_parser.cc File net/ftp/ftp_directory_listing_parser.cc (right): https://codereview.chromium.org/2168003003/diff/280001/net/ftp/ftp_directory_listing_parser.cc#newcode93 net/ftp/ftp_directory_listing_parser.cc:93: if (base::CodepageToUTF16(text, encoding_name, I assume that encoding names are ...
4 years, 4 months ago (2016-08-10 01:52:00 UTC) #94
Jinsuk Kim
https://codereview.chromium.org/2168003003/diff/280001/net/ftp/ftp_directory_listing_parser.cc File net/ftp/ftp_directory_listing_parser.cc (right): https://codereview.chromium.org/2168003003/diff/280001/net/ftp/ftp_directory_listing_parser.cc#newcode93 net/ftp/ftp_directory_listing_parser.cc:93: if (base::CodepageToUTF16(text, encoding_name, On 2016/08/10 01:52:00, cbentzel wrote: > ...
4 years, 4 months ago (2016-08-10 02:15:14 UTC) #95
Jinsuk Kim
On 2016/08/10 01:50:29, cbentzel wrote: > On 2016/08/09 01:16:19, Jinsuk wrote: > > mailto:cbentzel@chromium.org: Please ...
4 years, 4 months ago (2016-08-10 07:41:25 UTC) #96
cbentzel
On 2016/08/10 07:41:25, Jinsuk wrote: > On 2016/08/10 01:50:29, cbentzel wrote: > > On 2016/08/09 ...
4 years, 4 months ago (2016-08-10 11:28:28 UTC) #97
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2168003003/280001
4 years, 4 months ago (2016-08-10 21:44:09 UTC) #100
commit-bot: I haz the power
Patchset 7 (id:??) landed as https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e Cr-Commit-Position: refs/heads/master@{#411163}
4 years, 4 months ago (2016-08-10 22:29:25 UTC) #103
Jinsuk Kim
4 years, 4 months ago (2016-08-10 22:30:36 UTC) #105
Message was sent while issue was closed.
Committed patchset #7 (id:280001) manually as
70de1704feea35840fbed4bc1e864a24b620569e.

Powered by Google App Engine
This is Rietveld 408576698