Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(621)

Issue 1258813002: Implement a new IDN display policy (Closed)

Created:
5 years, 5 months ago by jungshik at Google
Modified:
4 years, 9 months ago
CC:
chromium-reviews, cbentzel+watch_chromium.org, brentlondon_google.com, jshin+watch_chromium.org
Base URL:
https://chromium.googlesource.com/chromium/src.git@master
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Implement a new IDN display policy The new policy is language-indepedent, implemented with ICU's uspoof API and is as following: 1. Use moderately restrictive rules for script mixing [1] with additional restrictions on mixing with Latin. - Script mixing is only allowed with ASCII-Latin (instead of any Latin) + another script allowed at the moderatate restriction level 2. Only allow the recommended sets from UTS 39 [2] and inclusion sets from UAX 31 [3]. This is equivalent to [:IdentifierStatus=Allowed:] [4]. 3. Allow 5 aspirational scripts from UAX 31 [5] 4. Do not allow labels with two or more numbering systems mixed. 5. Do not allow invisible characters or a sequence of the same combining mark. 6. Turn off whole script confusable check. It'd block some common domain labels like рф (IDN ccTLD for '.ru'), 'bücher' (German) and 'färgbolaget' (Swedish). 7. Keep ON 'mixed script confusable' check. This is different/separate from 'script mixing restriction' and will catch cases like 'gօօgle' with 'օ' (U+0585; Armenian Small Letter OH) [6] that would be otherwise allowed by rules #1 ~ #5. 8. Block 4 Katakanas surrounded by non-Japanese scripts because they could be mistaken as a slash. (this has been in place for a few years and is kept.) 9. Labels with any of four deviation characters (IDNA 2003 vs IDNA 2008) encoded in punycode/ACE are always shown in Punycode. This is to make the display policy consistent with our prior decision to use UTS 46 'transitional' processing (map or drop the 4 deviation characters.). [9] 10. Character black list (Mozilla's : [8]) is trimmed down to two characters. Note that this is almost identical to Mozilla's IDN display algorithm [7] except for #7, #8, and an additional restrictions in #1. #9 is another difference because of Mozilla's use of UTS 46 'non-transitional' processing and our use of UTS 46 'transitional' processing. Most of domains filtered out in ".com" TLD is filtered due to the character set restrictiction (#2 and #3) that accounts for 94% (2,050) of IDNs filtered out (0.2% of ~ 1 million IDNs in com TLD). All the IDN TLDs are shown in Unicode. So are all the IDNs in the effective TLD list, ".рф" (~ 860k), and ".みんな" (~25k). 48 out of 200k in ".xyz" and 3 out of 25k in ".jp" are filtered and shown in punycode. P.S. This CL keeps 'languages' parameter for the public APIs. I'll follow up this CL with another to get rid of that parameter and adjust callers. P.S.2: http://dev.chromium.org/developers/design-documents/idn-in-google-chrome will be updated after this CL is landed. [1] http://www.unicode.org/reports/tr39/#Restriction_Level_Detection [2] http://www.unicode.org/reports/tr39 http://www.unicode.org/Public/security/latest/xidmodifications.txt [3] http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts [4] http://goo.gl/L3WD1s [5] http://www.unicode.org/reports/tr31/#Aspirational_Use_Scripts [6] http://unicode.org/cldr/utility/confusables.jsp?a=o&r=None [7] https://wiki.mozilla.org/IDN_Display_Algorithm [8] http://kb.mozillazine.org/Network.IDN.blacklist_chars : Most of them are blocked or mapped any way by other restrictions/mechanism in place. See https://bugzilla.mozilla.org/show_bug.cgi?id=1257108 [9] This is to "fix" bug 595263 BUG=336973, 595263 TEST=components_unittests --gtest_filter=*IDN*, --gtest_filter=UrlForm*, --gtest_filter=*Puny* Committed: https://crrev.com/62a928390ba06db29576bbb32606696b3e16a66c Cr-Commit-Position: refs/heads/master@{#382029}

Patch Set 1 #

Patch Set 2 : use lazy instance, remove the old code #

Patch Set 3 : a little bit reorg #

Patch Set 4 : restore languages as dummy arg for now #

Patch Set 5 : restore languages in one more function #

Patch Set 6 : add back languages to one more, update comments #

Total comments: 15

Patch Set 7 : manual rebase after move to components #

Patch Set 8 : format url test fix #

Patch Set 9 : review comments addressed + alpha #

Patch Set 10 : typo fix #

Total comments: 36

Patch Set 11 : rebased #

Patch Set 12 : address review comments + fix some test (still WiP) #

Patch Set 13 : whole_script check experiment #

Patch Set 14 : back to disable wholescript conf. check + comment update #

Patch Set 15 : switch to moderately restrictive + turn off wholescirpt confusable check #

Patch Set 16 : fix typos, clean up the allowed set. #

Patch Set 17 : comment clean up #

Patch Set 18 : more tweaks + temporary log for testing #

Patch Set 19 : U+30FB/FC tweak #

Patch Set 20 : more tweaking for U+30FC/B handling #

Patch Set 21 : dangerous check - thread safety #

Patch Set 22 : deviation character check added #

Patch Set 23 : add new IDN tests and drop special casing of U+30FC/B #

Patch Set 24 : adjust two IDN test entries #

Total comments: 42

Patch Set 25 : url_canon test: wchar* needs a surrogate pair on *nix #

Total comments: 10

Patch Set 26 : review commetns addressed #

Patch Set 27 : drop U+2027 and add tests for U+2027 and U+05F4 #

Total comments: 27

Patch Set 28 : use TLS and update comments #

Patch Set 29 : more comment update per Peter #

Unified diffs Side-by-side diffs Delta from patch set Stats (+647 lines, -735 lines) Patch
M components/omnibox/browser/history_url_provider_unittest.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 2 chunks +48 lines, -46 lines 0 comments Download
M components/url_formatter/url_formatter.h View 1 2 3 4 5 6 7 8 9 10 11 3 chunks +6 lines, -12 lines 0 comments Download
M components/url_formatter/url_formatter.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 17 chunks +250 lines, -261 lines 0 comments Download
M components/url_formatter/url_formatter_unittest.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 18 chunks +336 lines, -416 lines 0 comments Download
M url/url_canon_unittest.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 1 chunk +7 lines, -0 lines 0 comments Download

Messages

Total messages: 82 (40 generated)
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/60001
5 years, 4 months ago (2015-07-28 23:41:18 UTC) #2
commit-bot: I haz the power
Dry run: Try jobs failed on following builders: cast_shell_linux on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linux/builds/35739) ios_dbg_simulator_ninja on ...
5 years, 4 months ago (2015-07-28 23:51:03 UTC) #4
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/80001
5 years, 4 months ago (2015-07-29 00:05:45 UTC) #6
commit-bot: I haz the power
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_compile_dbg_ng/builds/78983)
5 years, 4 months ago (2015-07-29 00:16:13 UTC) #8
Ryan Sleevi
I don't know why I had these drafts, but uh, have some drafts? :) https://codereview.chromium.org/1258813002/diff/100001/net/base/net_util_icu.cc ...
5 years, 3 months ago (2015-08-28 00:35:15 UTC) #10
jungshik at Google
Thank you for the comments. I just happened to come back to this CL and ...
5 years, 3 months ago (2015-08-28 21:05:25 UTC) #11
jungshik at Google
Ryan, can you take another look? I'll also add a bunch of new IDNs to ...
5 years, 3 months ago (2015-09-01 19:47:07 UTC) #12
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/180001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/180001
5 years, 3 months ago (2015-09-02 18:38:43 UTC) #14
commit-bot: I haz the power
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/107339) win8_chromium_ng on ...
5 years, 3 months ago (2015-09-02 19:19:45 UTC) #16
Ryan Sleevi
Sorry for the delay. I've taken another pass at reviewing - while I don't know ...
5 years, 3 months ago (2015-09-08 21:45:59 UTC) #17
Ryan Sleevi
Ping?
5 years ago (2015-12-08 01:39:27 UTC) #18
jungshik at Google
On 2015/12/08 01:39:27, Ryan Sleevi wrote: > Ping? yeah... it's been in my todo list. ...
5 years ago (2015-12-10 01:05:58 UTC) #19
jungshik at Google
On 2015/12/10 01:05:58, jshin (jungshik at g) wrote: > On 2015/12/08 01:39:27, Ryan Sleevi wrote: ...
5 years ago (2015-12-10 01:07:07 UTC) #20
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/320001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/320001
4 years, 10 months ago (2016-02-14 03:17:50 UTC) #25
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
4 years, 10 months ago (2016-02-14 04:23:23 UTC) #27
Ryan Sleevi
I saw you were tweaking this CL - did you mean to publish for review?
4 years, 10 months ago (2016-02-25 22:52:02 UTC) #32
jungshik at Google
Ryan and Peter, Can you take a look at url_formatter changes? Brett, I added one ...
4 years, 9 months ago (2016-03-11 20:05:38 UTC) #39
jungshik at Google
On 2016/02/25 22:52:02, Ryan Sleevi wrote: > I saw you were tweaking this CL - ...
4 years, 9 months ago (2016-03-11 20:08:34 UTC) #40
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/460001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/460001
4 years, 9 months ago (2016-03-11 21:42:57 UTC) #43
commit-bot: I haz the power
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_ozone_rel_ng/builds/138939)
4 years, 9 months ago (2016-03-11 22:04:09 UTC) #45
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/480001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/480001
4 years, 9 months ago (2016-03-11 23:54:58 UTC) #47
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
4 years, 9 months ago (2016-03-12 01:00:39 UTC) #49
Peter Kasting
I'm not competent to do a functional review, so this is basically all tiny comment ...
4 years, 9 months ago (2016-03-12 01:25:08 UTC) #51
Ryan Sleevi
I took a quick run at this before having to dash, but overall looks much ...
4 years, 9 months ago (2016-03-12 01:28:13 UTC) #52
jungshik at Google
Thanks a lot for the review, Peter and Ryan. I'll address them tomorrow. (I'm ooo ...
4 years, 9 months ago (2016-03-14 17:36:54 UTC) #53
Peter Kasting
On 2016/03/14 17:36:54, jshin (jungshik at google) wrote: > As for VLOG, actually I can ...
4 years, 9 months ago (2016-03-14 21:41:31 UTC) #54
jungshik at Google
Thanks a lot for the review. I addressed review comments. In addition, I reexamined 'blacklist ...
4 years, 9 months ago (2016-03-16 08:34:55 UTC) #55
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/520001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/520001
4 years, 9 months ago (2016-03-16 08:40:24 UTC) #58
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
4 years, 9 months ago (2016-03-16 10:26:24 UTC) #62
jungshik at Google
Filed a mozilla bug on making two browsers aligned as closely as possible: https://bugzilla.mozilla.org/show_bug.cgi?id=1257275
4 years, 9 months ago (2016-03-16 18:34:03 UTC) #63
jungshik at Google
On 2016/03/16 18:34:03, jshin (jungshik at google) wrote: > Filed a mozilla bug on making ...
4 years, 9 months ago (2016-03-16 18:35:28 UTC) #64
Ryan Sleevi
Like Peter, I struggle with the functional review, but I'm trying to do my best ...
4 years, 9 months ago (2016-03-16 20:30:49 UTC) #65
Peter Kasting
LGTM https://codereview.chromium.org/1258813002/diff/520001/components/omnibox/browser/history_url_provider_unittest.cc File components/omnibox/browser/history_url_provider_unittest.cc (right): https://codereview.chromium.org/1258813002/diff/520001/components/omnibox/browser/history_url_provider_unittest.cc#newcode856 components/omnibox/browser/history_url_provider_unittest.cc:856: // A URL that matches due to a ...
4 years, 9 months ago (2016-03-17 06:01:24 UTC) #68
jungshik at Google
Thank you for the review again, Peter and Ryan. I updated CL to address your ...
4 years, 9 months ago (2016-03-17 07:43:26 UTC) #69
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/560001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/560001
4 years, 9 months ago (2016-03-17 07:47:53 UTC) #71
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
4 years, 9 months ago (2016-03-17 08:45:44 UTC) #73
Ryan Sleevi
LGTM Thanks for taking this on! Exciting times!
4 years, 9 months ago (2016-03-18 02:20:31 UTC) #74
jungshik at Google
Thanks a lot, Ryan, for your patience and careful review ! Brett, can you review/approve ...
4 years, 9 months ago (2016-03-18 05:41:34 UTC) #75
brettw
url_canon LGTM
4 years, 9 months ago (2016-03-18 17:23:26 UTC) #76
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1258813002/560001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1258813002/560001
4 years, 9 months ago (2016-03-18 17:24:40 UTC) #79
commit-bot: I haz the power
Committed patchset #29 (id:560001)
4 years, 9 months ago (2016-03-18 18:42:52 UTC) #80
commit-bot: I haz the power
4 years, 9 months ago (2016-03-18 18:45:39 UTC) #82
Message was sent while issue was closed.
Patchset 29 (id:??) landed as
https://crrev.com/62a928390ba06db29576bbb32606696b3e16a66c
Cr-Commit-Position: refs/heads/master@{#382029}

Powered by Google App Engine
This is Rietveld 408576698