|
|
Chromium Code Reviews|
Created:
3 years, 9 months ago by Jinsuk Kim Modified:
3 years, 9 months ago Reviewers:
tkent CC:
blink-reviews, chromium-reviews, kinuko+watch Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionConvert non-WHATWG text encoding to ASCII
CED is returning text encodings not supported by WHATWG
standard, which Blink refused to accept. It can cause
an unexpected bug. This CL converts those encoding to ASCII
so that raw bytes of the text remain intact.
BUG=698605
Review-Url: https://codereview.chromium.org/2737033003
Cr-Commit-Position: refs/heads/master@{#455560}
Committed: https://chromium.googlesource.com/chromium/src/+/a86006db0adbe5bda6789d2a14d0805ce6273596
Patch Set 1 #
Total comments: 2
Patch Set 2 : shift-jis variants #
Total comments: 2
Patch Set 3 : comment #
Messages
Total messages: 23 (12 generated)
jinsukkim@chromium.org changed reviewers: + tkent@chromium.org
Got the list of encodings to convert to ASCII by filtering what CED can return https://cs.chromium.org/chromium/src/third_party/ced/src/util/encodings/encod... with WHATWG standard i.e. |textEncodingNameMap| used in Blink https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/text/TextE... cannot recognize.
https://codereview.chromium.org/2737033003/diff/1/third_party/WebKit/Source/p... File third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp (right): https://codereview.chromium.org/2737033003/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp:99: case SOFTBANK_ISO_2022_JP: Should detectTextEncoding() return |false| for these encodings? I'm afraid pages detected as JAPANESE_CP932, which is a superset of JAPNANESE_SHIFT_JIS, can't be readable even if Shift_JIS is the user's default encoding.
https://codereview.chromium.org/2737033003/diff/1/third_party/WebKit/Source/p... File third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp (right): https://codereview.chromium.org/2737033003/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp:99: case SOFTBANK_ISO_2022_JP: On 2017/03/08 02:17:05, tkent wrote: > Should detectTextEncoding() return |false| for these encodings? > > I'm afraid pages detected as JAPANESE_CP932, which is a superset of > JAPNANESE_SHIFT_JIS, can't be readable even if Shift_JIS is the user's default > encoding. Hm, returning false is not the solution for the linked bug. Blink will keep trying detection till the end, and the problem will persist. The choice left users viewing CP932 will be to get the document as ascii (by this CL), and then use manual encoding to shift-jis to salvage whatever is viewable in it. I don't think CP932-encoded document is not meant for Chrome anyway.
On 2017/03/08 at 02:27:06, jinsukkim wrote: > https://codereview.chromium.org/2737033003/diff/1/third_party/WebKit/Source/p... > File third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp (right): > > https://codereview.chromium.org/2737033003/diff/1/third_party/WebKit/Source/p... > third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp:99: case SOFTBANK_ISO_2022_JP: > On 2017/03/08 02:17:05, tkent wrote: > > Should detectTextEncoding() return |false| for these encodings? > > > > I'm afraid pages detected as JAPANESE_CP932, which is a superset of > > JAPNANESE_SHIFT_JIS, can't be readable even if Shift_JIS is the user's default > > encoding. > > Hm, returning false is not the solution for the linked bug. Blink will keep trying detection till the end, and the problem will persist. > > The choice left users viewing CP932 will be to get the document as ascii (by this CL), and then use manual encoding to shift-jis to salvage whatever is viewable in it. I don't think CP932-encoded document is not meant for Chrome anyway. oh, CP932 is actually the problem trigger of the bug. CP932 is very popular variant of Shift_JIS, and showing it as ASCII isn't acceptable for Japanese users. How about adding |if hintUserLanguage starts with "ja" and IsShiftJisOrVariant(encoding)) { *detectedEncoding = WTF::TextEncoding("Shift_JIS"); return true; }| before |switch (encoding)|?
PTAL. I'd still like to to keep the conversion for other non-WHATWG encodings in case any of them causes similar issues.
On 2017/03/08 03:15:10, Jinsuk Kim wrote: > PTAL. > > I'd still like to to keep the conversion for other non-WHATWG encodings in case > any of them causes similar issues. I'm thinking of moving the post-detection conversion to ced some time. Do you think this Shift-JIS conversion is safe to move to there too? It will be triggered when WHATWG mode is set to true.
> I'm thinking of moving the post-detection conversion to ced some time. Do you > think this Shift-JIS conversion is safe to move to there too? It will be > triggered when WHATWG mode is set to true. I think it's reasonable. https://codereview.chromium.org/2737033003/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp (right): https://codereview.chromium.org/2737033003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp:85: case MSFT_CP874: We still need to list Shift_JIS variants for non-Japanese locale.
https://codereview.chromium.org/2737033003/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp (right): https://codereview.chromium.org/2737033003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp:85: case MSFT_CP874: On 2017/03/08 03:43:37, tkent wrote: > We still need to list Shift_JIS variants for non-Japanese locale. Right. done.
lgtm
The CQ bit was checked by jinsukkim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_arm64_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_arm6...) android_clang_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_clan...) android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...) android_cronet on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_cron...) android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...) cast_shell_android on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_a...) linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by jinsukkim@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by jinsukkim@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 40001, "attempt_start_ts": 1489011270376610,
"parent_rev": "2b5d08b7569cbefecc4bb0eba8f0da15b282672d", "commit_rev":
"a86006db0adbe5bda6789d2a14d0805ce6273596"}
Message was sent while issue was closed.
Description was changed from ========== Convert non-WHATWG text encoding to ASCII CED is returning text encodings not supported by WHATWG standard, which Blink refused to accept. It can cause an unexpected bug. This CL converts those encoding to ASCII so that raw bytes of the text remain intact. BUG=698605 ========== to ========== Convert non-WHATWG text encoding to ASCII CED is returning text encodings not supported by WHATWG standard, which Blink refused to accept. It can cause an unexpected bug. This CL converts those encoding to ASCII so that raw bytes of the text remain intact. BUG=698605 Review-Url: https://codereview.chromium.org/2737033003 Cr-Commit-Position: refs/heads/master@{#455560} Committed: https://chromium.googlesource.com/chromium/src/+/a86006db0adbe5bda6789d2a14d0... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/chromium/src/+/a86006db0adbe5bda6789d2a14d0... |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
