Chromium Code Reviews

Index: components/url_formatter/top_domains/README
diff --git a/components/url_formatter/top_domains/README b/components/url_formatter/top_domains/README
new file mode 100644
index 0000000000000000000000000000000000000000..804f4e722899ff2b9db9674c9fc51c82f3902485
--- /dev/null
+++ b/components/url_formatter/top_domains/README
@@ -0,0 +1,23 @@
+* alexa_10k_domains.list
+  This is an input to make_top_domain_list, made up of a list of the
+  Alexa top 10k domains (one per line).
+  It is derived from src/tools/perf/page_sets/alexa1-10000-urls.json by
+  running the following:
ncarter (slow)  2017/04/20 22:26:59
IIRC the alexa10000 from page_sets was almost five
+
+    grep http ../../../tools/perf/page_sets/alexa1-10000-urls.json | \
+    sed -r -e 's;^.*"https?://(.*)/".*$;\1;' -e 's/www\.//' | \
+    awk 'BEGIN {FS="."} { printf("%s%s\n", NF > 3 ? "#" : "", $0); } \
+      END {printf ("# for testing\ndigklmo68.com\ndigklmo68.co.uk\n");}' > \
ncarter (slow)  2017/04/20 22:26:58
This would probably be better as a python script,
+        alexa_10k_domains.list
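
As the review comment above suggests, this transformation could also be
written as a small python script instead of a shell one-liner. The sketch
below is untested and purely illustrative; it processes the JSON file as
plain text, line by line, just like the grep/sed/awk pipeline, and reuses
the same input path, regex, "NF > 3" commenting rule, and trailing test
entries:

    #!/usr/bin/env python
    # Rough python equivalent of the shell pipeline above (untested sketch).
    import re

    URL_RE = re.compile(r'"https?://(.*)/"')

    with open('../../../tools/perf/page_sets/alexa1-10000-urls.json') as infile, \
         open('alexa_10k_domains.list', 'w') as outfile:
      for line in infile:
        match = URL_RE.search(line)
        if not match:
          # Lines that do not look like URL entries are skipped here; the
          # shell version passes such lines through unchanged instead.
          continue
        # Drop the first "www." occurrence, as the sed expression does.
        domain = match.group(1).replace('www.', '', 1)
        # Comment out entries with more than three dot-separated labels,
        # mirroring the awk "NF > 3" check.
        prefix = '#' if domain.count('.') > 2 else ''
        outfile.write('%s%s\n' % (prefix, domain))
      # Same trailing test entries as the awk END block.
      outfile.write('# for testing\ndigklmo68.com\ndigklmo68.co.uk\n')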
+
+* alexa_10k_names_and_skeletons.gperf
+
+  It is generated by running make_top_domain_list and is checked in.
+  No command line arguments need to be passed.
+
+    $ ninja -C $build_outdir make_top_domain_list
+    $ $build_outdir/make_top_domain_list
+
+  During a build, it is processed by base/dafsa/make_dafsa.py to generate
+  alexa_10k_names_and_skeletons-inc.cc, which is included by
+  components/url_formatter/url_formatter.cc.
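
For reference, the same step could be run by hand to inspect the generated
file. A minimal sketch, assuming make_dafsa.py takes an input path and an
output path as its two positional arguments and is run from the source
root; the actual build rule may invoke it differently:

    # Illustrative only: regenerate the -inc.cc outside the normal build.
    # The argument convention of make_dafsa.py is an assumption here.
    import subprocess

    subprocess.check_call([
        'python', 'base/dafsa/make_dafsa.py',
        'components/url_formatter/top_domains/alexa_10k_names_and_skeletons.gperf',
        'alexa_10k_names_and_skeletons-inc.cc',
    ])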