Chromium Code Reviews

Diff: third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp

Issue 2737033003: Convert non-WHATWG text encoding to ASCII (Closed)
Patch Set: shift-jis variants (created 3 years, 9 months ago)
@@ -1,10 +1,10 @@
 /*
  * Copyright (C) 2008, 2009 Google Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
  *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
(...skipping 44 matching lines...)
@@ -55,27 +55,57 @@
 
   // Should return false if the detected encoding is UTF8. This helps prevent
   // modern web sites from neglecting proper encoding labelling and simply
   // relying on browser-side encoding detection. Encoding detection is supposed
   // to work for web sites with legacy encoding only. Detection failure leads
   // |TextResourceDecoder| to use its default encoding determined from system
   // locale or TLD.
   if (encoding == UNKNOWN_ENCODING || encoding == UTF8)
     return false;
 
-  // 7-bit encodings (except ISO-2022-JP) are not supported in WHATWG encoding
-  // standard. Mark them as ASCII to keep the raw bytes intact.
+  // Map all the Shift-JIS variants to Shift-JIS.
+  if (hintUserLanguage && !strncmp(hintUserLanguage, "ja", 2) &&
+      IsShiftJisOrVariant(encoding)) {
+    encoding = JAPANESE_SHIFT_JIS;
+  }
+
+  // 7-bit encodings (except ISO-2022-JP), and some obscure encodings not
+  // supported in WHATWG encoding standard are marked as ASCII to keep the raw
+  // bytes intact.
+  // TODO(jinsukkim): Put this conversion into CED library, and enable "WHATWG"
+  // mode.
   switch (encoding) {
     case HZ_GB_2312:
     case ISO_2022_KR:
     case ISO_2022_CN:
     case UTF7:
+
+    case CHINESE_EUC_DEC:
+    case CHINESE_CNS:
+    case CHINESE_BIG5_CP950:
+    case MSFT_CP874:
tkent 2017/03/08 03:43:37 We still need to list Shift_JIS variants for non-J
Jinsuk Kim 2017/03/08 04:05:54 Right. done.
+    case TSCII:
+    case TAMIL_MONO:
+    case TAMIL_BI:
+    case JAGRAN:
+    case BHASKAR:
+    case HTCHANAKYA:
+    case BINARYENC:
+    case UTF8UTF8:
+    case TAM_ELANGO:
+    case TAM_LTTMBARANI:
+    case TAM_SHREE:
+    case TAM_TBOOMIS:
+    case TAM_TMNEWS:
+    case TAM_WEBTAMIL:
+    case KDDI_ISO_2022_JP:
+    case SOFTBANK_ISO_2022_JP:
       encoding = ASCII_7BIT;
       break;
     default:
       break;
   }
   *detectedEncoding = WTF::TextEncoding(MimeEncodingName(encoding));
   return true;
 }
 
 }  // namespace blink
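The new Shift-JIS step in this patch only collapses variants when the user-language hint starts with "ja". The following is a minimal, self-contained sketch of just that step, not the real implementation. The `Encoding` enum is a hypothetical, heavily trimmed stand-in for the detector's real enum (the carrier variant names are assumed, not taken from this patch), and `IsShiftJisVariantSketch` is a hypothetical replacement for `IsShiftJisOrVariant`, whose actual membership list is not shown in this diff.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, trimmed stand-in for the detector's Encoding enum.
// The carrier-specific variant names are assumptions for illustration.
enum Encoding {
  UTF8,
  JAPANESE_SHIFT_JIS,
  KDDI_SHIFT_JIS,
  DOCOMO_SHIFT_JIS,
  SOFTBANK_SHIFT_JIS,
};

// Hypothetical stand-in for IsShiftJisOrVariant(): true for plain
// Shift-JIS and for the (assumed) carrier-specific variants.
bool IsShiftJisVariantSketch(Encoding e) {
  switch (e) {
    case JAPANESE_SHIFT_JIS:
    case KDDI_SHIFT_JIS:
    case DOCOMO_SHIFT_JIS:
    case SOFTBANK_SHIFT_JIS:
      return true;
    default:
      return false;
  }
}

// Mirrors the patch's condition: collapse a variant to plain Shift-JIS only
// when the hint is non-null and its first two bytes are "ja".
Encoding MapShiftJisVariant(Encoding encoding, const char* hintUserLanguage) {
  if (hintUserLanguage && !std::strncmp(hintUserLanguage, "ja", 2) &&
      IsShiftJisVariantSketch(encoding)) {
    return JAPANESE_SHIFT_JIS;
  }
  return encoding;  // Any other hint (or none) leaves the value unchanged.
}

// Usage: the hint is matched by prefix, so "ja-JP" qualifies.
void ShiftJisMappingExamples() {
  assert(MapShiftJisVariant(KDDI_SHIFT_JIS, "ja-JP") == JAPANESE_SHIFT_JIS);
  assert(MapShiftJisVariant(KDDI_SHIFT_JIS, "en-US") == KDDI_SHIFT_JIS);
  assert(MapShiftJisVariant(DOCOMO_SHIFT_JIS, nullptr) == DOCOMO_SHIFT_JIS);
}
```

Because `strncmp` compares only the first two bytes, any hint beginning with "ja" takes this path, while a null hint or any other language passes the variant through unchanged. That pass-through case is presumably what the truncated review comment about listing Shift_JIS variants is concerned with.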

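Apart from the Shift-JIS step, the function rejects UTF-8 and unknown results (so `TextResourceDecoder` falls back to its default encoding) and rewrites encodings outside the WHATWG standard to ASCII. The sketch below mirrors that control flow under stated assumptions: `Encoding` is a hypothetical trimmed enum, only a few of the patch's cases are listed, and `NormalizeDetectedEncoding` is a hypothetical name. The real function additionally builds a `WTF::TextEncoding` from `MimeEncodingName(encoding)`, which is omitted here.

```cpp
#include <cassert>

// Hypothetical, trimmed stand-in for the detector's Encoding enum.
enum Encoding {
  UNKNOWN_ENCODING,
  UTF8,
  ASCII_7BIT,
  JAPANESE_SHIFT_JIS,
  HZ_GB_2312,
  ISO_2022_KR,
  UTF7,
  TSCII,
  KDDI_ISO_2022_JP,
};

// Returns false when detection should count as a failure, so the caller
// falls back to its own default encoding. Otherwise stores the encoding to
// use in *detected, with non-WHATWG encodings replaced by ASCII.
bool NormalizeDetectedEncoding(Encoding encoding, Encoding* detected) {
  assert(detected != nullptr);
  // Never report UTF-8 (or no result) as a detected legacy encoding.
  if (encoding == UNKNOWN_ENCODING || encoding == UTF8)
    return false;
  switch (encoding) {
    // A subset of the encodings the patch marks as ASCII; per the patch's
    // comment, this keeps the raw bytes intact.
    case HZ_GB_2312:
    case ISO_2022_KR:
    case UTF7:
    case TSCII:
    case KDDI_ISO_2022_JP:
      encoding = ASCII_7BIT;
      break;
    default:
      break;  // Supported encodings pass through unchanged.
  }
  *detected = encoding;
  return true;
}
```

Any encoding not listed in the switch, such as `JAPANESE_SHIFT_JIS` here, is reported as detected without modification.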