Index: components/url_formatter/top_domains/README
diff --git a/components/url_formatter/top_domains/README b/components/url_formatter/top_domains/README
new file mode 100644
index 0000000000000000000000000000000000000000..804f4e722899ff2b9db9674c9fc51c82f3902485
--- /dev/null
+++ b/components/url_formatter/top_domains/README
@@ -0,0 +1,23 @@
+* alexa_10k_domains.list
+  It is an input to make_top_domain_list and is a list of the Alexa
+  top 10k domains (one per line).
+  It's derived from
+  src/tools/perf/page_sets/alexa1-10000-urls.json by running the following:
[Review comment, ncarter (slow), 2017/04/20 22:26:59]
IIRC the alexa10000 from page_sets was almost five
+
+  grep http ../../../tools/perf/page_sets/alexa1-10000-urls.json | \
+    sed -r -e 's;^.*"https?://(.*)/".*$;\1;' -e 's/www\.//' | \
+    awk 'BEGIN {FS="."} { printf("%s%s\n", NF > 3 ? "#" : "", $0); } \
+      END {printf ("# for testing\ndigklmo68.com\ndigklmo68.co.uk\n");}' > \
[Review comment, ncarter (slow), 2017/04/20 22:26:58]
This would probably be better as a python script,
+    alexa_10k_domains.list
+
+* alexa_10k_names_and_skeletons.gperf
+
+  It is generated by running make_top_domain_list and is checked in.
+  No command-line arguments need to be passed.
+
+  $ ninja -C $build_outdir make_top_domain_list
+  $ $build_outdir/make_top_domain_list
+
+  During a build, it is processed by base/dafsa/make_dafsa.py to generate
+  alexa_10k_names_and_skeletons-inc.cc, which is included by
+  components/url_formatter/url_formatter.cc.
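The reviewer suggests replacing the grep/sed/awk pipeline with a Python script. A minimal sketch of such a conversion is below; the function names, the driver, and the file paths are illustrative, not part of the actual build. The `to_domain_line` helper mirrors the pipeline's per-line behavior: extract the host from a quoted URL, strip the first `www.`, and comment out hosts with more than three dot-separated labels (the awk `NF > 3` check).

```python
import re


def to_domain_line(raw_line):
    """Mirror the sed/awk pipeline for one line of alexa1-10000-urls.json.

    Returns the domain with the first 'www.' stripped, prefixed with '#'
    when it has more than three labels, or None for non-URL lines.
    """
    m = re.search(r'"https?://(.*)/"', raw_line)
    if m is None:
        return None
    domain = m.group(1).replace("www.", "", 1)  # sed 's/www\.//'
    labels = domain.count(".") + 1              # awk NF with FS="."
    return ("#" if labels > 3 else "") + domain


def convert(json_path, out_path):
    # Hypothetical driver; paths depend on the checkout layout.
    with open(json_path) as src:
        lines = [d for d in (to_domain_line(l) for l in src if "http" in l) if d]
    # Test entries appended by the awk END block.
    lines += ["# for testing", "digklmo68.com", "digklmo68.co.uk"]
    with open(out_path, "w") as dst:
        dst.write("\n".join(lines) + "\n")
```

One behavioral caveat of the original sed expression that the sketch reproduces: `s/www\.//` removes the first `www.` anywhere in the host, not only a leading one.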