| OLD | NEW |
| (Empty) |
| 1 // Copyright (c) 2011 The Chromium Authors. All rights reserved. | |
| 2 // Use of this source code is governed by a BSD-style license that can be | |
| 3 // found in the LICENSE file. | |
| 4 | |
| 5 // NB: Modelled after Mozilla's code (originally written by Pamela Greene, | |
| 6 // later modified by others), but almost entirely rewritten for Chrome. | |
| 7 // (netwerk/dns/src/nsEffectiveTLDService.h) | |
| 8 /* ***** BEGIN LICENSE BLOCK ***** | |
| 9 * Version: MPL 1.1/GPL 2.0/LGPL 2.1 | |
| 10 * | |
| 11 * The contents of this file are subject to the Mozilla Public License Version | |
| 12 * 1.1 (the "License"); you may not use this file except in compliance with | |
| 13 * the License. You may obtain a copy of the License at | |
| 14 * http://www.mozilla.org/MPL/ | |
| 15 * | |
| 16 * Software distributed under the License is distributed on an "AS IS" basis, | |
| 17 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License | |
| 18 * for the specific language governing rights and limitations under the | |
| 19 * License. | |
| 20 * | |
| 21 * The Original Code is Mozilla TLD Service | |
| 22 * | |
| 23 * The Initial Developer of the Original Code is | |
| 24 * Google Inc. | |
| 25 * Portions created by the Initial Developer are Copyright (C) 2006 | |
| 26 * the Initial Developer. All Rights Reserved. | |
| 27 * | |
| 28 * Contributor(s): | |
| 29 * Pamela Greene <pamg.bugs@gmail.com> (original author) | |
| 30 * | |
| 31 * Alternatively, the contents of this file may be used under the terms of | |
| 32 * either the GNU General Public License Version 2 or later (the "GPL"), or | |
| 33 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"), | |
| 34 * in which case the provisions of the GPL or the LGPL are applicable instead | |
| 35 * of those above. If you wish to allow use of your version of this file only | |
| 36 * under the terms of either the GPL or the LGPL, and not to allow others to | |
| 37 * use your version of this file under the terms of the MPL, indicate your | |
| 38 * decision by deleting the provisions above and replace them with the notice | |
| 39 * and other provisions required by the GPL or the LGPL. If you do not delete | |
| 40 * the provisions above, a recipient may use your version of this file under | |
| 41 * the terms of any one of the MPL, the GPL or the LGPL. | |
| 42 * | |
| 43 * ***** END LICENSE BLOCK ***** */ | |
| 44 | |
| 45 /* | |
| 46 (Documentation based on the Mozilla documentation currently at | |
| 47 http://wiki.mozilla.org/Gecko:Effective_TLD_Service, written by the same | |
| 48 author.) | |
| 49 | |
| 50 The RegistryControlledDomainService examines the hostname of a GURL passed to | |
| 51 it and determines the longest portion that is controlled by a registry. | |
| 52 Although technically the top-level domain (TLD) for a hostname is the last | |
| 53 dot-portion of the name (such as .com or .org), many domains (such as co.uk) | |
| 54 function as though they were TLDs, allocating any number of more specific, | |
| 55 essentially unrelated names beneath them. For example, .uk is a TLD, but | |
| 56 nobody is allowed to register a domain directly under .uk; the "effective" | |
| 57 TLDs are ac.uk, co.uk, and so on. We wouldn't want to allow any site in | |
| 58 *.co.uk to set a cookie for the entire co.uk domain, so it's important to be | |
| 59 able to identify which higher-level domains function as effective TLDs and | |
| 60 which can be registered. | |
| 61 | |
| 62 The service obtains its information about effective TLDs from a text resource | |
| 63 that must be in the following format: | |
| 64 | |
| 65 * It should use plain ASCII. | |
| 66 * It should contain one domain rule per line, terminated with \n, with nothing | |
| 67 else on the line. (The last rule in the file may omit the ending \n.) | |
| 68 * Rules should have been normalized using the same canonicalization that GURL | |
| 69 applies. For ASCII, that means they're not case-sensitive, among other | |
| 70 things; other normalizations are applied for other characters. | |
| 71 * Each rule should list the entire TLD-like domain name, with any subdomain | |
| 72 portions separated by dots (.) as usual. | |
| 73 * Rules should neither begin nor end with a dot. | |
| 74 * If a hostname matches more than one rule, the most specific rule (that is, | |
| 75 the one with the most dot-levels) will be used. | |
| 76 * Other than in the case of wildcards (see below), rules do not implicitly | |
| 77 include their subcomponents. For example, "bar.baz.uk" does not imply | |
| 78 "baz.uk", and if "bar.baz.uk" is the only rule in the list, "foo.bar.baz.uk" | |
| 79 will match, but "baz.uk" and "qux.baz.uk" won't. | |
| 80 * The wildcard character '*' will match any valid sequence of characters. | |
| 81 * Wildcards may only appear as the entire most specific level of a rule. That | |
| 82 is, a wildcard must come at the beginning of a line and must be followed by | |
| 83 a dot. (You may not use a wildcard as the entire rule.) | |
| 84 * A wildcard rule implies a rule for the entire non-wildcard portion. For | |
| 85 example, the rule "*.foo.bar" implies the rule "foo.bar" (but not the rule | |
| 86 "bar"). This is typically important in the case of exceptions (see below). | |
| 87 * The exception character '!' before a rule marks an exception to a wildcard | |
| 88 rule. If your rules are "*.tokyo.jp" and "!pref.tokyo.jp", then | |
| 89 "a.b.tokyo.jp" has an effective TLD of "b.tokyo.jp", but "a.pref.tokyo.jp" | |
| 90 has an effective TLD of "tokyo.jp" (the exception prevents the wildcard | |
| 91 match, and we thus fall through to matching on the implied "tokyo.jp" rule | |
| 92 from the wildcard). | |
| 93 * If you use an exception rule without a corresponding wildcard rule, the | |
| 94 behavior is undefined. | |
| 95 | |
| 96 Firefox has a very similar service, and it's their data file we use to | |
| 97 construct our resource. However, the data expected by this implementation | |
| 98 differs from the Mozilla file in several important ways: | |
| 99 (1) We require that all single-level TLDs (com, edu, etc.) be explicitly | |
| 100 listed. As of this writing, Mozilla's file includes the single-level | |
| 101 TLDs too, but that might change. | |
| 102 (2) Our data is expected to be in pure ASCII: all UTF-8 or otherwise encoded | |
| 103 items must already have been normalized. | |
| 104 (3) We do not allow comments, rule notes, blank lines, or line endings other | |
| 105 than LF. | |
| 106 Rules are also expected to be syntactically valid. | |
| 107 | |
| 108 The utility application tld_cleanup.exe converts a Mozilla-style file into a | |
| 109 Chrome one, making sure that single-level TLDs are explicitly listed, using | |
| 110 GURL to normalize rules, and validating the rules. | |
| 111 */ | |
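The rule format documented above can be illustrated with a minimal sketch. It assumes the rules are held in a plain `std::set<std::string>` and that the host has already been canonicalized and has no leading or trailing dots. `EffectiveTldSketch` and `RunDocumentedExamples` are hypothetical names used only for illustration; they are not part of this header, which looks rules up through `find_domain_function_` instead.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper, not part of the header under review. Returns the
// effective TLD of |host| according to |rules|, or "" if no rule matches.
// Assumes |host| and all rules are canonicalized, with no leading or
// trailing dots.
std::string EffectiveTldSketch(const std::string& host,
                               const std::set<std::string>& rules) {
  // Walk the suffixes of |host| from longest to shortest, so that the most
  // specific rule (the one with the most dot-levels) is found first.
  std::string::size_type start = 0;
  while (true) {
    const std::string suffix = host.substr(start);
    const std::string::size_type dot = suffix.find('.');
    const std::string parent =
        (dot == std::string::npos) ? std::string() : suffix.substr(dot + 1);

    // An exception rule cancels the wildcard match, so matching falls
    // through to the rule implied by the wildcard (the parent). The
    // documentation leaves an exception without a wildcard undefined; this
    // sketch simply returns the parent in that case too.
    if (rules.count("!" + suffix))
      return parent;
    // A plain rule, or the rule implied by a wildcard such as "*.foo.bar".
    if (rules.count(suffix) || rules.count("*." + suffix))
      return suffix;
    // A wildcard rule covering the most specific label of this suffix.
    if (!parent.empty() && rules.count("*." + parent))
      return suffix;

    if (dot == std::string::npos)
      return std::string();  // Ran out of labels without matching a rule.
    start += dot + 1;
  }
}

// Replays the examples given in the documentation comment.
void RunDocumentedExamples() {
  const std::set<std::string> tokyo = {"*.tokyo.jp", "!pref.tokyo.jp"};
  assert(EffectiveTldSketch("a.b.tokyo.jp", tokyo) == "b.tokyo.jp");
  assert(EffectiveTldSketch("a.pref.tokyo.jp", tokyo) == "tokyo.jp");

  const std::set<std::string> single = {"bar.baz.uk"};
  assert(EffectiveTldSketch("foo.bar.baz.uk", single) == "bar.baz.uk");
  assert(EffectiveTldSketch("baz.uk", single).empty());
  assert(EffectiveTldSketch("qux.baz.uk", single).empty());
}
```

Checking the exception rule before the plain and wildcard rules is a choice made in this sketch so that "!pref.tokyo.jp" takes precedence over "*.tokyo.jp" at the same depth; the real implementation may order these checks differently.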
| 112 | |
| 113 #ifndef NET_BASE_REGISTRY_CONTROLLED_DOMAIN_H_ | |
| 114 #define NET_BASE_REGISTRY_CONTROLLED_DOMAIN_H_ | |
| 115 | |
| 116 #include <string> | |
| 117 | |
| 118 #include "base/basictypes.h" | |
| 119 #include "net/base/net_export.h" | |
| 120 | |
| 121 class GURL; | |
| 122 | |
| 123 struct DomainRule; | |
| 124 | |
| 125 namespace net { | |
| 126 | |
| 127 class NET_EXPORT RegistryControlledDomainService { | |
| 128 public: | |
| 129 // Returns the registered, organization-identifying host and all its registry | |
| 130 // information, but no subdomains, from the given GURL. Returns an empty | |
| 131 // string if the GURL is invalid, has no host (e.g. a file: URL), has multiple | |
| 132 // trailing dots, is an IP address, has only one subcomponent (i.e. no dots | |
| 133 // other than leading/trailing ones), or is itself a recognized registry | |
| 134 // identifier. If no matching rule is found in the effective-TLD data (or in | |
| 135 // the default data, if the resource failed to load), the last subcomponent of | |
| 136 // the host is assumed to be the registry. | |
| 137 // | |
| 138 // Examples: | |
| 139 // http://www.google.com/file.html -> "google.com" (com) | |
| 140 // http://..google.com/file.html -> "google.com" (com) | |
| 141 // http://google.com./file.html -> "google.com." (com) | |
| 142 // http://a.b.co.uk/file.html -> "b.co.uk" (co.uk) | |
| 143 // file:///C:/bar.html -> "" (no host) | |
| 144 // http://foo.com../file.html -> "" (multiple trailing dots) | |
| 145 // http://192.168.0.1/file.html -> "" (IP address) | |
| 146 // http://bar/file.html -> "" (no subcomponents) | |
| 147 // http://co.uk/file.html -> "" (host is a registry) | |
| 148 // http://foo.bar/file.html -> "foo.bar" (no rule; assume bar) | |
| 149 static std::string GetDomainAndRegistry(const GURL& gurl); | |
| 150 | |
| 151 // Like the GURL version, but takes a host (which is canonicalized internally) | |
| 152 // instead of a full GURL. | |
| 153 static std::string GetDomainAndRegistry(const std::string& host); | |
| 154 | |
| 155 // This convenience function returns true if the two GURLs both have hosts | |
| 156 // and one of the following is true: | |
| 157 // * They each have a known domain and registry, and it is the same for both | |
| 158 // URLs. Note that this means the trailing dot, if any, must match too. | |
| 159 // * They don't have known domains/registries, but the hosts are identical. | |
| 160 // Effectively, callers can use this function to check whether the input URLs | |
| 161 // represent hosts "on the same site". | |
| 162 static bool SameDomainOrHost(const GURL& gurl1, const GURL& gurl2); | |
| 163 | |
| 164 // Finds the length in bytes of the registry portion of the host in the | |
| 165 // given GURL. Returns std::string::npos if the GURL is invalid or has no | |
| 166 // host (e.g. a file: URL). Returns 0 if the GURL has multiple trailing dots, | |
| 167 // is an IP address, has no subcomponents, or is itself a recognized registry | |
| 168 // identifier. If no matching rule is found in the effective-TLD data (or in | |
| 169 // the default data, if the resource failed to load), returns 0 if | |
| 170 // |allow_unknown_registries| is false, or the length of the last subcomponent | |
| 171 // if |allow_unknown_registries| is true. | |
| 172 // | |
| 173 // Examples: | |
| 174 // http://www.google.com/file.html -> 3 (com) | |
| 175 // http://..google.com/file.html -> 3 (com) | |
| 176 // http://google.com./file.html -> 4 (com.) | |
| 177 // http://a.b.co.uk/file.html -> 5 (co.uk) | |
| 178 // file:///C:/bar.html -> std::string::npos (no host) | |
| 179 // http://foo.com../file.html -> 0 (multiple trailing | |
| 180 // dots) | |
| 181 // http://192.168.0.1/file.html -> 0 (IP address) | |
| 182 // http://bar/file.html -> 0 (no subcomponents) | |
| 183 // http://co.uk/file.html -> 0 (host is a registry) | |
| 184 // http://foo.bar/file.html -> 0 or 3, depending (no rule; assume | |
| 185 // bar) | |
| 186 static size_t GetRegistryLength(const GURL& gurl, | |
| 187 bool allow_unknown_registries); | |
| 188 | |
| 189 // Like the GURL version, but takes a host (which is canonicalized internally) | |
| 190 // instead of a full GURL. | |
| 191 static size_t GetRegistryLength(const std::string& host, | |
| 192 bool allow_unknown_registries); | |
| 193 | |
| 194 private: | |
| 195 friend class RegistryControlledDomainTest; | |
| 196 | |
| 197 // Internal workings of the static public methods. See above. | |
| 198 static std::string GetDomainAndRegistryImpl(const std::string& host); | |
| 199 static size_t GetRegistryLengthImpl(const std::string& host, | |
| 200 bool allow_unknown_registries); | |
| 201 | |
| 202 typedef const struct DomainRule* (*FindDomainPtr)(const char*, unsigned int); | |
| 203 | |
| 204 // Used by unit tests to substitute a different perfect hash map for the | |
| 205 // full list. Pass NULL to restore the default function. | |
| 206 static void UseFindDomainFunction(FindDomainPtr function); | |
| 207 | |
| 208 // Function that returns a DomainRule given a domain. | |
| 209 static FindDomainPtr find_domain_function_; | |
| 210 | |
| 211 | |
| 212 DISALLOW_IMPLICIT_CONSTRUCTORS(RegistryControlledDomainService); | |
| 213 }; | |
| 214 | |
| 215 } // namespace net | |
| 216 | |
| 217 #endif // NET_BASE_REGISTRY_CONTROLLED_DOMAIN_H_ | |
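The relationship between the documented return values of GetRegistryLength(), GetDomainAndRegistry(), and SameDomainOrHost() can be sketched as follows. This is an illustration under stated assumptions, not the implementation in the corresponding .cc file: `DomainAndRegistrySketch` and `SameSiteSketch` are hypothetical names, they operate on already-canonicalized host strings rather than GURLs, and the registry length is passed in by the caller instead of being looked up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: trims |host| to its domain-and-registry portion,
// given a registry length with the meaning documented for
// GetRegistryLength() (npos for "no host", 0 for "no usable registry").
std::string DomainAndRegistrySketch(const std::string& host,
                                    std::size_t registry_length) {
  if (registry_length == std::string::npos || registry_length == 0)
    return std::string();
  // There must be at least one character and a dot in front of the registry.
  if (host.length() < 2 || registry_length > host.length() - 2)
    return std::string();
  // Skip the dot that precedes the registry, then search backwards for the
  // dot that precedes the registered domain. With no such dot, the whole
  // host is the domain and registry.
  const std::string::size_type dot =
      host.rfind('.', host.length() - registry_length - 2);
  return (dot == std::string::npos) ? host : host.substr(dot + 1);
}

// Hypothetical helper following the documented SameDomainOrHost() contract:
// both hosts must be non-empty, and either their domain-and-registry
// strings match, or neither has one and the hosts are identical.
bool SameSiteSketch(const std::string& host1, std::size_t registry_length1,
                    const std::string& host2, std::size_t registry_length2) {
  if (host1.empty() || host2.empty())
    return false;
  const std::string domain1 = DomainAndRegistrySketch(host1, registry_length1);
  const std::string domain2 = DomainAndRegistrySketch(host2, registry_length2);
  if (!domain1.empty() || !domain2.empty())
    return domain1 == domain2;
  return host1 == host2;
}
```

With the lengths from the examples in the header, "a.b.co.uk" with a registry length of 5 trims to "b.co.uk", and "google.com." with a registry length of 4 keeps its trailing dot, which is why "google.com" and "google.com." do not compare as the same site.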