Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(602)

Issue 2784933002: Mitigate spoofing attempt using Latin letters. (Closed)

Created:
3 years, 8 months ago by jungshik at Google
Modified:
3 years, 7 months ago
CC:
chromium-reviews, emilyschechter
Target Ref:
refs/heads/master
Project:
chromium
Visibility:
Public.

Description

Add checks against spoofing attempt at top domains Remove diacritic marks from a hostname and calculate the confusability skeleton of the accent-free name. Look it up in the pre-calculated list of the skeletons of top 10k domains. Removing diacritic marks from a hostname is equivalent to comparing names with the primary collation strength in the root locale. To make them equivalent, three mappings are added (ł > l; ø > o; đ > d) on top of the diacritic-removal. Also add two more mappings ([кĸκ] > k, п > n) to supplement the Unicode's confusables list. Binary file size increase: ~ 59kB for the DAFSA representation of top domain name skeletons. The IDN display policy check takes ~ 2µs longer on the average (3.3 µs => 5.5µs) on my machine per the test run over ~1 million IDNs in com TLD). It adds about 1500 domains to the list of domains to display in Punycode out of ~ 1 million IDNs in com TLD. (3018 => 4571) In addition, disallow combining diarctic marks unless they're preceded by Latin-Greek-Cyrillic. BUG=703750, 714628, 719199, 722639 TEST=components_unittests --gtest_filter=*IDNToUni* CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_chromium_x64_rel_ng,win10_chromium_x64_rel_ng Review-Url: https://codereview.chromium.org/2784933002 Cr-Commit-Position: refs/heads/master@{#473109} Committed: https://chromium.googlesource.com/chromium/src/+/a8add0308ba6067eb3de5a8fe82f9c2f2460ad91

Patch Set 1 #

Patch Set 2 : Use skeletons and sortkeys of top domains for check #

Patch Set 3 : Add top500 domain skeletons/sortkeys #

Patch Set 4 : Add make_topdomain_list generator #

Patch Set 5 : fix format #

Patch Set 6 : Add license headers to new files #

Patch Set 7 : Remove the unused code and add a comment to SimilarToTopDomains() #

Patch Set 8 : fix Windows build error and add test entries #

Patch Set 9 : a bit more cleanup #

Patch Set 10 : use top 10k list instead of 500 #

Patch Set 11 : switch to DAFSA wip; dafsa unittests disabled to work around dep. check tempoarily #

Patch Set 12 : Move dafsa tools/test data to base from net #

Patch Set 13 : generate top domain skeleton/name list with make_dafsa.py #

Patch Set 14 : add gperf files #

Patch Set 15 : fix typos in BUIILD.gn #

Patch Set 16 : fix another build issue - make_dafsa.py path fix #

Patch Set 17 : Update make_top_domains_list to automatically generate gperf files #

Patch Set 18 : Combine two DAFSAs into one (40kB size reduction and speed up) #

Patch Set 19 : Make top_domains subdir and clean up unused files #

Patch Set 20 : Add comments about the values in the gperf file #

Patch Set 21 : Drop domains with more than 3 labels from the top domain list. #

Patch Set 22 : Use UnicodeString::findAndReplace for 3 manual mapping #

Patch Set 23 : another variation in UnicodeString ctor for findAndReplace #

Patch Set 24 : use string16 ReplaceChars #

Patch Set 25 : use string16 one-char replace #

Patch Set 26 : Use RuleBasedTransliterator to handle "ł > l; ø > o; đ > d" #

Patch Set 27 : Call transliterators only for domain names made of Latin. #

Patch Set 28 : Add [\u0300-\u0331] to the set of characters for accent-free comparison #

Patch Set 29 : add similarity check unittests #

Total comments: 12

Patch Set 30 : Move DAFSA-related files back to net/ from base/ #

Patch Set 31 : more fixes: moving dafsa bak to net/ #

Patch Set 32 : StringPiece16 for RemoveAcc, GetSkel.. #

Patch Set 33 : Use 1 for domain names, 2 for skeleton, 3 for both #

Patch Set 34 : use python to generate top domain list from json; drop 10k from gperf and *inc.cc names #

Patch Set 35 : check in alex_names_and_skeletons.gperf #

Total comments: 6

Patch Set 36 : address ncarter's comments #

Patch Set 37 : Remove diacritics and calculate skeleton before look-up #

Patch Set 38 : Drop 'U+04CF (ӏ) -> l', add '[кĸκ] > k; п > n'. Fix accent check #

Patch Set 39 : add a few popular domains #

Patch Set 40 : Truncate '.' at the end of a hostname before similarity check #

Patch Set 41 : add back U+04CF (ӏ) -> 'l' map #

Total comments: 77

Patch Set 42 : pull IDNSpoofChecker to separae h/cc files #

Total comments: 40

Patch Set 43 : rebased against refactoring CL #

Patch Set 44 : rebased wrt master #

Patch Set 45 : Delete top.list (accidentally added) #

Patch Set 46 : Delete two more accidentally added files #

Total comments: 4

Patch Set 47 : Peter's latest comments addressed #

Patch Set 48 : use checked_cast and make win64 happy #

Unified diffs Side-by-side diffs Delta from patch set Stats (+18752 lines, -13 lines) Patch
M components/url_formatter/BUILD.gn View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 2 chunks +2 lines, -0 lines 0 comments Download
M components/url_formatter/idn_spoof_checker.h View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 3 chunks +15 lines, -0 lines 0 comments Download
M components/url_formatter/idn_spoof_checker.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 9 chunks +104 lines, -8 lines 0 comments Download
A components/url_formatter/top_domains/BUILD.gn View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 1 chunk +32 lines, -0 lines 0 comments Download
A components/url_formatter/top_domains/README View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 1 chunk +16 lines, -0 lines 0 comments Download
A components/url_formatter/top_domains/alexa_domains.list View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 chunk +9176 lines, -0 lines 0 comments Download
A components/url_formatter/top_domains/alexa_skeletons.gperf View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 1 chunk +9186 lines, -0 lines 0 comments Download
A components/url_formatter/top_domains/make_alexa_top_list.py View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 1 chunk +55 lines, -0 lines 0 comments Download
A components/url_formatter/top_domains/make_top_domain_gperf.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 1 chunk +122 lines, -0 lines 0 comments Download
M components/url_formatter/url_formatter.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 3 chunks +11 lines, -0 lines 0 comments Download
M components/url_formatter/url_formatter_unittest.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 4 chunks +33 lines, -5 lines 0 comments Download

Messages

Total messages: 145 (96 generated)
jungshik at Google
Ryan, can you take a look? A lot of files outside components/url_format are due to ...
3 years, 8 months ago (2017-04-20 19:57:23 UTC) #47
jungshik at Google
Adding Peter to the reviewers list now. Peter, please take a look once you're back. ...
3 years, 8 months ago (2017-04-20 19:58:21 UTC) #49
Ryan Sleevi
Going to add two other reviewers, since I think their feedback is important enough to ...
3 years, 8 months ago (2017-04-20 20:04:50 UTC) #51
Ryan Sleevi
And for context: If this goal is solely for .com, we could always download the ...
3 years, 8 months ago (2017-04-20 20:07:12 UTC) #52
Nico
On 2017/04/20 20:04:50, Ryan Sleevi wrote: > Going to add two other reviewers, since I ...
3 years, 8 months ago (2017-04-20 20:12:00 UTC) #53
jungshik at Google
Thanks, Ryan ! I didn't realize that components/ can depend on net/ (in retrospect, it's ...
3 years, 8 months ago (2017-04-20 20:12:34 UTC) #54
jungshik at Google
On 2017/04/20 20:12:00, Nico wrote: > On 2017/04/20 20:04:50, Ryan Sleevi wrote: > > Going ...
3 years, 8 months ago (2017-04-20 20:21:21 UTC) #55
jungshik at Google
On 2017/04/20 20:07:12, Ryan Sleevi wrote: > And for context: If this goal is solely ...
3 years, 8 months ago (2017-04-20 20:37:29 UTC) #56
ncarter (slow)
On 2017/04/20 20:37:29, jungshik at Google wrote: > On 2017/04/20 20:07:12, Ryan Sleevi wrote: > ...
3 years, 8 months ago (2017-04-20 22:12:13 UTC) #59
ncarter (slow)
https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/README File components/url_formatter/top_domains/README (right): https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/README#newcode5 components/url_formatter/top_domains/README:5: src/tools/perf/page_sets/alexa1-10000-urls.json by running the following: IIRC the alexa10000 from ...
3 years, 8 months ago (2017-04-20 22:26:59 UTC) #60
ncarter (slow)
https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/url_formatter.cc File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/url_formatter.cc#newcode557 components/url_formatter/url_formatter.cc:557: if (GetSkeleton(hostname, &skeleton) && LookupMatchInTopDomains(skeleton, 0)) If it's possible ...
3 years, 8 months ago (2017-04-20 23:37:22 UTC) #61
jungshik at Google
https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf File components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf (right): https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf#newcode83 components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf:83: microsoft.com, 1 On 2017/04/20 22:26:59, ncarter wrote: > Here ...
3 years, 8 months ago (2017-04-21 19:31:47 UTC) #62
jungshik at Google
https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/url_formatter.cc File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/url_formatter.cc#newcode557 components/url_formatter/url_formatter.cc:557: if (GetSkeleton(hostname, &skeleton) && LookupMatchInTopDomains(skeleton, 0)) On 2017/04/20 23:37:22, ...
3 years, 8 months ago (2017-04-21 20:16:33 UTC) #63
Ryan Sleevi
On 2017/04/21 20:16:33, jungshik at Google wrote: > > Also, as above, it's worth asking ...
3 years, 8 months ago (2017-04-21 20:24:16 UTC) #64
ncarter (slow)
https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf File components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf (right): https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf#newcode10 components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf:10: // 1: original domain name. Should this file be ...
3 years, 8 months ago (2017-04-21 21:35:34 UTC) #65
jungshik at Google
On 2017/04/21 21:35:34, ncarter wrote: > https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf > File components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf > (right): > > https://codereview.chromium.org/2784933002/diff/490001/components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf#newcode10 ...
3 years, 8 months ago (2017-04-21 23:16:13 UTC) #68
Peter Kasting
There are a lot of reviewers on this, most of whom seem more capable than ...
3 years, 8 months ago (2017-04-24 22:00:57 UTC) #71
jungshik at Google
Peter, I need your input on performance. On average (as tested with ~ 1 million ...
3 years, 8 months ago (2017-04-25 05:28:11 UTC) #78
Peter Kasting
On 2017/04/25 05:28:11, jungshik at Google wrote: > Peter, I need your input on performance. ...
3 years, 7 months ago (2017-04-25 22:04:02 UTC) #83
jungshik at Google
On 2017/04/25 22:04:02, Peter Kasting wrote: > On 2017/04/25 05:28:11, jungshik at Google wrote: > ...
3 years, 7 months ago (2017-04-26 00:30:50 UTC) #84
ncarter (slow)
On 2017/04/26 00:30:50, jungshik at Google wrote: > On 2017/04/25 22:04:02, Peter Kasting wrote: > ...
3 years, 7 months ago (2017-04-26 17:28:23 UTC) #85
ncarter (slow)
the dafsa usage seems correct, so lgtm from that perspective https://codereview.chromium.org/2784933002/diff/610001/components/url_formatter/url_formatter.cc File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2784933002/diff/610001/components/url_formatter/url_formatter.cc#newcode214 ...
3 years, 7 months ago (2017-04-26 18:46:29 UTC) #86
jungshik at Google
Thank you for the explanation and suggestion. > On 2017/04/26 00:30:50, jungshik at Google wrote: ...
3 years, 7 months ago (2017-04-26 19:12:38 UTC) #87
jungshik at Google
Thank you for LGTM. Addressed the comments in the latest PS. https://codereview.chromium.org/2784933002/diff/610001/components/url_formatter/url_formatter.cc File components/url_formatter/url_formatter.cc (right): ...
3 years, 7 months ago (2017-04-26 19:36:21 UTC) #88
ncarter (slow)
On 2017/04/26 19:12:38, jungshik at Google wrote: > Thank you for the explanation and suggestion. ...
3 years, 7 months ago (2017-04-26 20:20:40 UTC) #89
jungshik at Google
On 2017/04/26 20:20:40, ncarter wrote: > On 2017/04/26 19:12:38, jungshik at Google wrote: > > ...
3 years, 7 months ago (2017-05-03 00:25:18 UTC) #92
jungshik at Google
Peter, I also need your owner review/approval as well. Thank you, This is not perfect ...
3 years, 7 months ago (2017-05-08 23:36:01 UTC) #107
jungshik at Google
Oops. The beginning of my reply is lost in copy'n'paste. Ryan, can you take a ...
3 years, 7 months ago (2017-05-08 23:37:30 UTC) #108
Ryan Sleevi
I'm not sure I feel qualified enough to review this from a policy side, and ...
3 years, 7 months ago (2017-05-09 00:00:29 UTC) #110
Peter Kasting
https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/README File components/url_formatter/top_domains/README (right): https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/README#newcode2 components/url_formatter/top_domains/README:2: It is an input to make_top_domain_list and is made ...
3 years, 7 months ago (2017-05-09 01:37:04 UTC) #111
jungshik at Google
https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/BUILD.gn File components/url_formatter/top_domains/BUILD.gn (right): https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/BUILD.gn#newcode21 components/url_formatter/top_domains/BUILD.gn:21: executable("make_top_domain_gperf") { On 2017/05/09 00:00:29, Ryan Sleevi wrote: > ...
3 years, 7 months ago (2017-05-09 19:57:39 UTC) #112
jungshik at Google
https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/make_alexa_top_list.py File components/url_formatter/top_domains/make_alexa_top_list.py (right): https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/make_alexa_top_list.py#newcode21 components/url_formatter/top_domains/make_alexa_top_list.py:21: "page_sets", "alexa1-10000-urls.json") On 2017/05/09 00:00:29, Ryan Sleevi wrote: > ...
3 years, 7 months ago (2017-05-09 20:19:42 UTC) #113
Peter Kasting
https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/make_alexa_top_list.py File components/url_formatter/top_domains/make_alexa_top_list.py (right): https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/make_alexa_top_list.py#newcode21 components/url_formatter/top_domains/make_alexa_top_list.py:21: "page_sets", "alexa1-10000-urls.json") On 2017/05/09 19:57:39, jungshik at Google wrote: ...
3 years, 7 months ago (2017-05-09 20:50:14 UTC) #114
jungshik at Google
Addressed most of Peter's comments in the latest PS https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/README File components/url_formatter/top_domains/README (right): https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/top_domains/README#newcode2 components/url_formatter/top_domains/README:2: ...
3 years, 7 months ago (2017-05-10 18:05:16 UTC) #115
jungshik at Google
Addressed most of Peter's comments in the latest PS
3 years, 7 months ago (2017-05-10 18:05:21 UTC) #116
jungshik at Google
On 2017/05/10 18:05:21, jungshik at Google wrote: > Addressed most of Peter's comments in the ...
3 years, 7 months ago (2017-05-10 18:06:30 UTC) #117
ncarter (slow)
+nparker for the following question: This proposed defense against IDN homograph spoofing requires a list ...
3 years, 7 months ago (2017-05-10 18:41:34 UTC) #118
Peter Kasting
This is looking pretty good. https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/url_formatter.cc File components/url_formatter/url_formatter.cc (right): https://codereview.chromium.org/2784933002/diff/730001/components/url_formatter/url_formatter.cc#newcode515 components/url_formatter/url_formatter.cc:515: const size_t kNumberOfLabelsToCheck = ...
3 years, 7 months ago (2017-05-10 22:38:47 UTC) #119
jungshik at Google
Thanks for a thorough review. (as you found out, pulling out IDNSpoofChecker was done separately ...
3 years, 7 months ago (2017-05-14 09:36:23 UTC) #122
ncarter (slow)
https://codereview.chromium.org/2784933002/diff/750001/components/url_formatter/url_formatter_unittest.cc File components/url_formatter/url_formatter_unittest.cc (right): https://codereview.chromium.org/2784933002/diff/750001/components/url_formatter/url_formatter_unittest.cc#newcode286 components/url_formatter/url_formatter_unittest.cc:286: // 'digklmo68.co.uk" are listed for unittest in the top ...
3 years, 7 months ago (2017-05-15 18:27:57 UTC) #125
Peter Kasting
LGTM, I trust you to follow up appropriately on outstanding issues https://codereview.chromium.org/2784933002/diff/750001/components/url_formatter/url_formatter_unittest.cc File components/url_formatter/url_formatter_unittest.cc (right): ...
3 years, 7 months ago (2017-05-15 18:55:31 UTC) #126
jungshik at Google
On 2017/05/15 18:27:57, ncarter (slow) wrote: > https://codereview.chromium.org/2784933002/diff/750001/components/url_formatter/url_formatter_unittest.cc > File components/url_formatter/url_formatter_unittest.cc (right): > > https://codereview.chromium.org/2784933002/diff/750001/components/url_formatter/url_formatter_unittest.cc#newcode286 ...
3 years, 7 months ago (2017-05-17 22:26:09 UTC) #127
jungshik at Google
Thank you, Peter and Nick. > LGTM, I trust you to follow up appropriately on ...
3 years, 7 months ago (2017-05-17 23:11:04 UTC) #128
Peter Kasting
Following up in another CL is fine with me.
3 years, 7 months ago (2017-05-17 23:12:59 UTC) #129
commit-bot: I haz the power
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2784933002/850001
3 years, 7 months ago (2017-05-19 05:30:23 UTC) #133
commit-bot: I haz the power
Committed patchset #47 (id:850001) as https://chromium.googlesource.com/chromium/src/+/a8add0308ba6067eb3de5a8fe82f9c2f2460ad91
3 years, 7 months ago (2017-05-19 06:49:36 UTC) #136
tsergeant
A revert of this CL (patchset #47 id:850001) has been created in https://codereview.chromium.org/2889303003/ by tsergeant@chromium.org. ...
3 years, 7 months ago (2017-05-19 07:23:40 UTC) #137
findit-for-me
Findit (https://goo.gl/kROfz5) confirmed this CL at revision 473109 as the culprit for failures in the ...
3 years, 7 months ago (2017-05-19 07:29:55 UTC) #138
jungshik at Google
3 years, 7 months ago (2017-05-22 05:22:43 UTC) #145
Message was sent while issue was closed.
Instead of reopening the committed CL, decided to make a new CL with 'size_t <->
int" conversion issue addressed with checked_cast<> for win_x64. 

https://codereview.chromium.org/2897873002

Powered by Google App Engine
This is Rietveld 408576698