// Copyright (c) 2012 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

// NB: Modelled after Mozilla's code (originally written by Pamela Greene,
// later modified by others), but almost entirely rewritten for Chrome.
//   (netwerk/dns/src/nsEffectiveTLDService.h)
/* ***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is Mozilla TLD Service
 *
 * The Initial Developer of the Original Code is
 * Google Inc.
 * Portions created by the Initial Developer are Copyright (C) 2006
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 *   Pamela Greene <pamg.bugs@gmail.com> (original author)
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** */

/*
  (Documentation based on the Mozilla documentation currently at
  http://wiki.mozilla.org/Gecko:Effective_TLD_Service, written by the same
  author.)

  The RegistryControlledDomainService examines the hostname of a GURL passed to
  it and determines the longest portion that is controlled by a registrar.
  Although technically the top-level domain (TLD) for a hostname is the last
  dot-portion of the name (such as .com or .org), many domains (such as co.uk)
  function as though they were TLDs, allocating any number of more specific,
  essentially unrelated names beneath them. For example, .uk is a TLD, but
  nobody is allowed to register a domain directly under .uk; the "effective"
  TLDs are ac.uk, co.uk, and so on. We wouldn't want to allow any site in
  *.co.uk to set a cookie for the entire co.uk domain, so it's important to be
  able to identify which higher-level domains function as effective TLDs and
  which can be registered.

  The service obtains its information about effective TLDs from a text resource
  that must be in the following format:

  * It should use plain ASCII.
  * It should contain one domain rule per line, terminated with \n, with nothing
    else on the line. (The last rule in the file may omit the ending \n.)
  * Rules should have been normalized using the same canonicalization that GURL
    applies. For ASCII, that means they're not case-sensitive, among other
    things; other normalizations are applied for other characters.
  * Each rule should list the entire TLD-like domain name, with any subdomain
    portions separated by dots (.) as usual.
  * Rules should neither begin nor end with a dot.
  * If a hostname matches more than one rule, the most specific rule (that is,
    the one with more dot-levels) will be used.
  * Other than in the case of wildcards (see below), rules do not implicitly
    include their subcomponents. For example, "bar.baz.uk" does not imply
    "baz.uk", and if "bar.baz.uk" is the only rule in the list, "foo.bar.baz.uk"
    will match, but "baz.uk" and "qux.baz.uk" won't.
  * The wildcard character '*' will match any valid sequence of characters.
  * Wildcards may only appear as the entire most specific level of a rule. That
    is, a wildcard must come at the beginning of a line and must be followed by
    a dot. (You may not use a wildcard as the entire rule.)
  * A wildcard rule implies a rule for the entire non-wildcard portion. For
    example, the rule "*.foo.bar" implies the rule "foo.bar" (but not the rule
    "bar"). This is typically important in the case of exceptions (see below).
  * The exception character '!' before a rule marks an exception to a wildcard
    rule. If your rules are "*.tokyo.jp" and "!pref.tokyo.jp", then
    "a.b.tokyo.jp" has an effective TLD of "b.tokyo.jp", but "a.pref.tokyo.jp"
    has an effective TLD of "tokyo.jp" (the exception prevents the wildcard
    match, and we thus fall through to matching on the implied "tokyo.jp" rule
    from the wildcard).
  * If you use an exception rule without a corresponding wildcard rule, the
    behavior is undefined.

  Firefox has a very similar service, and it's their data file we use to
  construct our resource. However, the data expected by this implementation
  differs from the Mozilla file in several important ways:
   (1) We require that all single-level TLDs (com, edu, etc.) be explicitly
       listed. As of this writing, Mozilla's file includes the single-level
       TLDs too, but that might change.
   (2) Our data is expected to be in pure ASCII: all UTF-8 or otherwise encoded
       items must already have been normalized.
   (3) We do not allow comments, rule notes, blank lines, or line endings other
       than LF.
  Rules are also expected to be syntactically valid.

  The utility application tld_cleanup.exe converts a Mozilla-style file into a
  Chrome one, making sure that single-level TLDs are explicitly listed, using
  GURL to normalize rules, and validating the rules.
*/

#ifndef NET_BASE_REGISTRY_CONTROLLED_DOMAINS_REGISTRY_CONTROLLED_DOMAIN_H_
#define NET_BASE_REGISTRY_CONTROLLED_DOMAINS_REGISTRY_CONTROLLED_DOMAIN_H_

#include <string>

#include "base/basictypes.h"
#include "net/base/net_export.h"

class GURL;

struct DomainRule;

namespace net {
namespace registry_controlled_domains {

// This enum is a required parameter to all public methods declared for this
// service. The Public Suffix List (http://publicsuffix.org/) this service
// uses as a data source splits all effective-TLDs into two groups. The main
// group describes registries that are acknowledged by ICANN. The second group
// contains a list of private additions for domains that enable external users
// to create subdomains, such as appspot.com.
// The PrivateRegistryFilter enum lets you choose whether you want to include
// the private additions in your lookup.
// See this for example use cases:
// https://wiki.mozilla.org/Public_Suffix_List/Use_Cases
enum NET_EXPORT PrivateRegistryFilter {
  EXCLUDE_PRIVATE_REGISTRIES = 0,
  INCLUDE_PRIVATE_REGISTRIES
};

// This enum is a required parameter to the GetRegistryLength functions
// declared for this service. Whenever there is no matching rule in the
// effective-TLD data (or in the default data, if the resource failed to
// load), the result will be dependent on which enum value was passed in.
// If EXCLUDE_UNKNOWN_REGISTRIES was passed in, the resulting registry length
// will be 0. If INCLUDE_UNKNOWN_REGISTRIES was passed in, the resulting
// registry length will be the length of the last subcomponent (e.g. 3 for
// foobar.baz).
enum NET_EXPORT UnknownRegistryFilter {
  EXCLUDE_UNKNOWN_REGISTRIES = 0,
  INCLUDE_UNKNOWN_REGISTRIES
};

// Returns the registered, organization-identifying host and all its registry
// information, but no subdomains, from the given GURL. Returns an empty
// string if the GURL is invalid, has no host (e.g. a file: URL), has multiple
// trailing dots, is an IP address, has only one subcomponent (i.e. no dots
// other than leading/trailing ones), or is itself a recognized registry
// identifier. If no matching rule is found in the effective-TLD data (or in
// the default data, if the resource failed to load), the last subcomponent of
// the host is assumed to be the registry.
//
// Examples:
//    http://www.google.com/file.html -> "google.com"  (com)
//    http://..google.com/file.html   -> "google.com"  (com)
//    http://google.com./file.html    -> "google.com." (com)
//    http://a.b.co.uk/file.html      -> "b.co.uk"     (co.uk)
//    file:///C:/bar.html             -> ""            (no host)
//    http://foo.com../file.html      -> ""            (multiple trailing dots)
//    http://192.168.0.1/file.html    -> ""            (IP address)
//    http://bar/file.html            -> ""            (no subcomponents)
//    http://co.uk/file.html          -> ""            (host is a registry)
//    http://foo.bar/file.html        -> "foo.bar"     (no rule; assume bar)
NET_EXPORT std::string GetDomainAndRegistry(const GURL& gurl,
                                            PrivateRegistryFilter filter);

// Like the GURL version, but takes a host (which is canonicalized internally)
// instead of a full GURL.
NET_EXPORT std::string GetDomainAndRegistry(const std::string& host,
                                            PrivateRegistryFilter filter);

// This convenience function returns true if the two GURLs both have hosts
// and one of the following is true:
// * They each have a known domain and registry, and it is the same for both
//   URLs. Note that this means the trailing dot, if any, must match too.
// * They don't have known domains/registries, but the hosts are identical.
// Effectively, callers can use this function to check whether the input URLs
// represent hosts "on the same site".
NET_EXPORT bool SameDomainOrHost(const GURL& gurl1, const GURL& gurl2,
                                 PrivateRegistryFilter filter);

// Finds the length in bytes of the registrar portion of the host in the
// given GURL. Returns std::string::npos if the GURL is invalid or has no
// host (e.g. a file: URL). Returns 0 if the GURL has multiple trailing dots,
// is an IP address, has no subcomponents, or is itself a recognized registry
// identifier. The result is also dependent on the UnknownRegistryFilter.
// If no matching rule is found in the effective-TLD data (or in
// the default data, if the resource failed to load), returns 0 if
// |unknown_filter| is EXCLUDE_UNKNOWN_REGISTRIES, or the length of the last
// subcomponent if |unknown_filter| is INCLUDE_UNKNOWN_REGISTRIES.
//
// Examples:
//    http://www.google.com/file.html -> 3                 (com)
//    http://..google.com/file.html   -> 3                 (com)
//    http://google.com./file.html    -> 4                 (com.)
//    http://a.b.co.uk/file.html      -> 5                 (co.uk)
//    file:///C:/bar.html             -> std::string::npos (no host)
//    http://foo.com../file.html      -> 0                 (multiple trailing
//                                                          dots)
//    http://192.168.0.1/file.html    -> 0                 (IP address)
//    http://bar/file.html            -> 0                 (no subcomponents)
//    http://co.uk/file.html          -> 0                 (host is a registry)
//    http://foo.bar/file.html        -> 0 or 3, depending (no rule; assume
//                                                          bar)
NET_EXPORT size_t GetRegistryLength(const GURL& gurl,
                                    UnknownRegistryFilter unknown_filter,
                                    PrivateRegistryFilter private_filter);

// Like the GURL version, but takes a host (which is canonicalized internally)
// instead of a full GURL.
NET_EXPORT size_t GetRegistryLength(const std::string& host,
                                    UnknownRegistryFilter unknown_filter,
                                    PrivateRegistryFilter private_filter);

typedef const struct DomainRule* (*FindDomainPtr)(const char*, unsigned int);

// Used for unit tests. Switches back to the default domain data.
NET_EXPORT_PRIVATE void SetFindDomainGraph();

// Used for unit tests, so that a frozen list of domains is used.
NET_EXPORT_PRIVATE void SetFindDomainGraph(const unsigned char* domains,
                                           size_t length);

}  // namespace registry_controlled_domains
}  // namespace net

#endif  // NET_BASE_REGISTRY_CONTROLLED_DOMAINS_REGISTRY_CONTROLLED_DOMAIN_H_
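
The rule format described in the header comment (plain rules, `*` wildcards, `!` exceptions, most specific rule wins) can be illustrated with a small standalone sketch. This is not Chromium's implementation and does not use the real data format or lookup structure: `SketchRegistryLength`, the `std::set` of rule strings, and the `include_unknown` flag are hypothetical names chosen for illustration. It assumes the host is already canonicalized, has no leading or trailing dots, and is not an IP address.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not part of the header above). Returns the length of
// the registry portion of |host| under |rules|, or 0 if the host is itself a
// registry. If no rule matches, returns the length of the last label when
// |include_unknown| is true and the host has at least one dot, else 0.
std::size_t SketchRegistryLength(const std::set<std::string>& rules,
                                 const std::string& host,
                                 bool include_unknown) {
  std::size_t start = 0;
  // Walk the suffixes of |host| from longest to shortest, so the first
  // suffix that matches corresponds to the most specific rule.
  while (start < host.size()) {
    const std::string suffix = host.substr(start);
    const std::size_t dot = host.find('.', start);

    // Exception rule "!suffix": the wildcard does not apply here, so the
    // registry is everything after the first label of |suffix|.
    if (rules.count("!" + suffix)) {
      return dot == std::string::npos ? 0 : host.size() - (dot + 1);
    }

    // |suffix| is a registry if it is listed explicitly, if it is the
    // non-wildcard part of a wildcard rule (implied rule), or if a wildcard
    // rule covers its first label.
    const bool matches =
        rules.count(suffix) || rules.count("*." + suffix) ||
        (dot != std::string::npos &&
         rules.count("*." + host.substr(dot + 1)));
    if (matches) {
      // A host that is itself a registry has no registrable domain.
      return start == 0 ? 0 : suffix.size();
    }

    if (dot == std::string::npos) break;
    start = dot + 1;
  }

  // No rule matched: optionally treat the last label as the registry.
  const std::size_t last_dot = host.rfind('.');
  if (include_unknown && last_dot != std::string::npos) {
    return host.size() - (last_dot + 1);
  }
  return 0;
}
```

With the hypothetical rule set `{"com", "uk", "co.uk", "jp", "*.tokyo.jp", "!pref.tokyo.jp"}`, this sketch gives 3 for `www.google.com`, 5 for `a.b.co.uk`, 0 for `co.uk`, 10 for `a.b.tokyo.jp` (registry `b.tokyo.jp`), and 8 for `a.pref.tokyo.jp` (registry `tokyo.jp`). A linear scan over a string set is used only to keep the example short.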