net/base/registry_controlled_domain.h - Issue 10796033: Move files related to registry-controlled domains into a new net/base/registry_controlled_domains/ …

Side by Side Diff: net/base/registry_controlled_domain.h

Issue 10796033 (Closed). Base URL: svn://svn.chromium.org/chrome/trunk/src/

Patch Set: Update checkout Created 8 years, 5 months ago

OLD (shown below) | NEW: (Empty)
// Copyright (c) 2011 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

// NB: Modelled after Mozilla's code (originally written by Pamela Greene,
// later modified by others), but almost entirely rewritten for Chrome.
// (netwerk/dns/src/nsEffectiveTLDService.h)
/* *** BEGIN LICENSE BLOCK ***
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is Mozilla TLD Service
 *
 * The Initial Developer of the Original Code is
 * Google Inc.
 * Portions created by the Initial Developer are Copyright (C) 2006
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 *   Pamela Greene <pamg.bugs@gmail.com> (original author)
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * *** END LICENSE BLOCK *** */

/*
(Documentation based on the Mozilla documentation currently at
http://wiki.mozilla.org/Gecko:Effective_TLD_Service, written by the same
author.)

The RegistryControlledDomainService examines the hostname of a GURL passed to
it and determines the longest portion that is controlled by a registrar.
Although technically the top-level domain (TLD) for a hostname is the last
dot-portion of the name (such as .com or .org), many domains (such as co.uk)
function as though they were TLDs, allocating any number of more specific,
essentially unrelated names beneath them. For example, .uk is a TLD, but
nobody is allowed to register a domain directly under .uk; the "effective"
TLDs are ac.uk, co.uk, and so on. We wouldn't want to allow any site in
*.co.uk to set a cookie for the entire co.uk domain, so it's important to be
able to identify which higher-level domains function as effective TLDs and
which can be registered.

The service obtains its information about effective TLDs from a text resource
that must be in the following format:

* It should use plain ASCII.
* It should contain one domain rule per line, terminated with \n, with nothing
  else on the line. (The last rule in the file may omit the ending \n.)
* Rules should have been normalized using the same canonicalization that GURL
  applies. For ASCII, that means they're not case-sensitive, among other
  things; other normalizations are applied for other characters.
* Each rule should list the entire TLD-like domain name, with any subdomain
  portions separated by dots (.) as usual.
* Rules should neither begin nor end with a dot.
* If a hostname matches more than one rule, the most specific rule (that is,
  the one with more dot-levels) will be used.
* Other than in the case of wildcards (see below), rules do not implicitly
  include their subcomponents. For example, "bar.baz.uk" does not imply
  "baz.uk", and if "bar.baz.uk" is the only rule in the list, "foo.bar.baz.uk"
  will match, but "baz.uk" and "qux.baz.uk" won't.
* The wildcard character '*' will match any valid sequence of characters.
* Wildcards may only appear as the entire most specific level of a rule. That
  is, a wildcard must come at the beginning of a line and must be followed by
  a dot. (You may not use a wildcard as the entire rule.)
* A wildcard rule implies a rule for the entire non-wildcard portion. For
  example, the rule "*.foo.bar" implies the rule "foo.bar" (but not the rule
  "bar"). This is typically important in the case of exceptions (see below).
* The exception character '!' before a rule marks an exception to a wildcard
  rule. If your rules are "*.tokyo.jp" and "!pref.tokyo.jp", then
  "a.b.tokyo.jp" has an effective TLD of "b.tokyo.jp", but "a.pref.tokyo.jp"
  has an effective TLD of "tokyo.jp" (the exception prevents the wildcard
  match, and we thus fall through to matching on the implied "tokyo.jp" rule
  from the wildcard).
* If you use an exception rule without a corresponding wildcard rule, the
  behavior is undefined.

Firefox has a very similar service, and it's their data file we use to
construct our resource. However, the data expected by this implementation
differs from the Mozilla file in several important ways:
 (1) We require that all single-level TLDs (com, edu, etc.) be explicitly
     listed. As of this writing, Mozilla's file includes the single-level
     TLDs too, but that might change.
 (2) Our data is expected to be in pure ASCII: all UTF-8 or otherwise encoded
     items must already have been normalized.
 (3) We do not allow comments, rule notes, blank lines, or line endings other
     than LF.
Rules are also expected to be syntactically valid.

The utility application tld_cleanup.exe converts a Mozilla-style file into a
Chrome one, making sure that single-level TLDs are explicitly listed, using
GURL to normalize rules, and validating the rules.
*/

#ifndef NET_BASE_REGISTRY_CONTROLLED_DOMAIN_H_
#define NET_BASE_REGISTRY_CONTROLLED_DOMAIN_H_

#include <string>

#include "base/basictypes.h"
#include "net/base/net_export.h"

class GURL;

struct DomainRule;

namespace net {

class NET_EXPORT RegistryControlledDomainService {
 public:
  // Returns the registered, organization-identifying host and all its registry
  // information, but no subdomains, from the given GURL. Returns an empty
  // string if the GURL is invalid, has no host (e.g. a file: URL), has multiple
  // trailing dots, is an IP address, has only one subcomponent (i.e. no dots
  // other than leading/trailing ones), or is itself a recognized registry
  // identifier. If no matching rule is found in the effective-TLD data (or in
  // the default data, if the resource failed to load), the last subcomponent of
  // the host is assumed to be the registry.
  //
  // Examples:
  //   http://www.google.com/file.html -> "google.com"  (com)
  //   http://..google.com/file.html   -> "google.com"  (com)
  //   http://google.com./file.html    -> "google.com." (com)
  //   http://a.b.co.uk/file.html      -> "b.co.uk"     (co.uk)
  //   file:///C:/bar.html             -> ""            (no host)
  //   http://foo.com../file.html      -> ""            (multiple trailing dots)
  //   http://192.168.0.1/file.html    -> ""            (IP address)
  //   http://bar/file.html            -> ""            (no subcomponents)
  //   http://co.uk/file.html          -> ""            (host is a registry)
  //   http://foo.bar/file.html        -> "foo.bar"     (no rule; assume bar)
  static std::string GetDomainAndRegistry(const GURL& gurl);

  // Like the GURL version, but takes a host (which is canonicalized internally)
  // instead of a full GURL.
  static std::string GetDomainAndRegistry(const std::string& host);

  // This convenience function returns true if the two GURLs both have hosts
  // and one of the following is true:
  // * They each have a known domain and registry, and it is the same for both
  //   URLs. Note that this means the trailing dot, if any, must match too.
  // * They don't have known domains/registries, but the hosts are identical.
  // Effectively, callers can use this function to check whether the input URLs
  // represent hosts "on the same site".
  static bool SameDomainOrHost(const GURL& gurl1, const GURL& gurl2);

  // Finds the length in bytes of the registry portion of the host in the
  // given GURL. Returns std::string::npos if the GURL is invalid or has no
  // host (e.g. a file: URL). Returns 0 if the GURL has multiple trailing dots,
  // is an IP address, has no subcomponents, or is itself a recognized registry
  // identifier. If no matching rule is found in the effective-TLD data (or in
  // the default data, if the resource failed to load), returns 0 if
  // |allow_unknown_registries| is false, or the length of the last subcomponent
  // if |allow_unknown_registries| is true.
  //
  // Examples:
  //   http://www.google.com/file.html -> 3                 (com)
  //   http://..google.com/file.html   -> 3                 (com)
  //   http://google.com./file.html    -> 4                 (com)
  //   http://a.b.co.uk/file.html      -> 5                 (co.uk)
  //   file:///C:/bar.html             -> std::string::npos (no host)
  //   http://foo.com../file.html      -> 0                 (multiple trailing
  //                                                         dots)
  //   http://192.168.0.1/file.html    -> 0                 (IP address)
  //   http://bar/file.html            -> 0                 (no subcomponents)
  //   http://co.uk/file.html          -> 0                 (host is a registry)
  //   http://foo.bar/file.html        -> 0 or 3, depending (no rule; assume
  //                                                         bar)
  static size_t GetRegistryLength(const GURL& gurl,
                                  bool allow_unknown_registries);

  // Like the GURL version, but takes a host (which is canonicalized internally)
  // instead of a full GURL.
  static size_t GetRegistryLength(const std::string& host,
                                  bool allow_unknown_registries);

 private:
  friend class RegistryControlledDomainTest;

  // Internal workings of the static public methods. See above.
  static std::string GetDomainAndRegistryImpl(const std::string& host);
  static size_t GetRegistryLengthImpl(const std::string& host,
                                      bool allow_unknown_registries);

  typedef const struct DomainRule* (*FindDomainPtr)(const char*, unsigned int);

  // Used for unit tests, so that a different perfect hash map from the full
  // list is used. Set to NULL to use the default function.
  static void UseFindDomainFunction(FindDomainPtr function);

  // Function that returns a DomainRule given a domain.
  static FindDomainPtr find_domain_function_;

  DISALLOW_IMPLICIT_CONSTRUCTORS(RegistryControlledDomainService);
};

}  // namespace net

#endif  // NET_BASE_REGISTRY_CONTROLLED_DOMAIN_H_
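The resource-format rules in the header comment above are concrete enough to check line by line. The following is a minimal C++ sketch of such a syntax check, assuming each line has already been split off at its \n. `IsValidRuleLine` is a hypothetical helper, not part of this header or of tld_cleanup, and the accepted character set (lowercase ASCII letters, digits, '-', '.', '*') is an assumption standing in for "already normalized by GURL"; the real tool may accept or check more.

```cpp
#include <cstddef>
#include <string>

// Returns true if |line| (already stripped of its trailing '\n') meets the
// syntax rules for the effective-TLD resource. Hypothetical helper; the
// character set below is an assumed proxy for "normalized by GURL".
bool IsValidRuleLine(const std::string& line) {
  // An optional leading '!' marks an exception rule.
  const std::string rule =
      (!line.empty() && line[0] == '!') ? line.substr(1) : line;
  if (rule.empty()) return false;  // No blank lines.

  for (char c : rule) {
    // Lowercase ASCII letters, digits, '-', '.', '*' only. This rejects
    // non-ASCII bytes, uppercase, '\r' (CRLF endings), and spaces or '/'
    // (so comments and trailing rule notes fail).
    const bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
                    c == '-' || c == '.' || c == '*';
    if (!ok) return false;
  }

  // Rules neither begin nor end with a dot, and have no empty labels.
  if (rule.front() == '.' || rule.back() == '.' ||
      rule.find("..") != std::string::npos)
    return false;

  // A wildcard must be the entire first label and be followed by a dot,
  // so "*", "a*.b" and "a.*.b" are all rejected.
  const std::size_t star = rule.find('*');
  if (star != std::string::npos &&
      (star != 0 || rule.size() < 3 || rule[1] != '.' ||
       rule.find('*', 1) != std::string::npos))
    return false;

  return true;
}
```

This only covers per-line syntax. Whether every single-level TLD is listed, or whether each exception rule has a matching wildcard rule (the case the comment leaves undefined), can only be decided by looking at the whole file.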

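The matching rules in the header comment can be made concrete: the most specific rule wins, a wildcard implies its parent, and an exception blocks the wildcard and falls back to that implied parent. The following is a minimal C++ sketch that works on a bare, already-canonicalized host. It assumes the rules sit in a `std::set<std::string>` in the text format described above. The header only says the real service consults a perfect hash map through `FindDomainPtr`, so the set is a stand-in. `MatchRegistry` and `RegistryLengthSketch` are hypothetical names. The sketch does not handle leading or trailing dots, IP addresses, or the default data the comments mention.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical rule store using the text format from the header comment,
// e.g. {"com", "co.uk", "*.tokyo.jp", "!pref.tokyo.jp"}.
using RuleSet = std::set<std::string>;

// Returns the longest suffix of |host| that the rules mark as a registry,
// or "" if no rule applies.
std::string MatchRegistry(const std::string& host, const RuleSet& rules) {
  // Assumed precondition: canonicalized host with no leading/trailing dots.
  assert(host.empty() || (host.front() != '.' && host.back() != '.'));

  // Try suffixes from most to least specific, so the first hit is the rule
  // with the most dot-levels.
  std::size_t start = 0;
  while (start < host.size()) {
    const std::string suffix = host.substr(start);
    const std::size_t dot = suffix.find('.');

    // An explicit rule, or a wildcard "*.suffix", which implies "suffix".
    if (rules.count(suffix) || rules.count("*." + suffix)) return suffix;

    if (dot == std::string::npos) break;  // Last label checked.
    const std::string parent = suffix.substr(dot + 1);

    // A wildcard on the parent covers this suffix unless "!suffix" is an
    // exception; a blocked match falls through to the shorter suffixes.
    if (rules.count("*." + parent) && !rules.count("!" + suffix)) return suffix;

    start += dot + 1;
  }
  return "";
}

// Mirrors the documented GetRegistryLength() contract for a bare host:
// 0 if the host is itself a registry; for hosts with no matching rule,
// 0 or the length of the last label depending on the flag.
std::size_t RegistryLengthSketch(const std::string& host, const RuleSet& rules,
                                 bool allow_unknown_registries) {
  std::string registry = MatchRegistry(host, rules);
  if (registry.empty()) {
    if (!allow_unknown_registries) return 0;
    const std::size_t dot = host.rfind('.');
    registry = (dot == std::string::npos) ? host : host.substr(dot + 1);
  }
  // "co.uk" or a dotless "bar" is all registry, with no domain in front.
  return registry == host ? 0 : registry.size();
}
```

Walking suffixes from longest to shortest means the first match is already the rule with the most dot-levels, so no separate ranking step is needed. With the rules `*.tokyo.jp` and `!pref.tokyo.jp`, this sketch resolves `a.b.tokyo.jp` to `b.tokyo.jp` and `a.pref.tokyo.jp` to `tokyo.jp`, as in the example in the header comment.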
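The documented examples for GetDomainAndRegistry() and GetRegistryLength() line up: the domain-and-registry string is the registry plus the one label in front of it, and a trailing dot counts as part of the registry ("google.com." has registry length 4). Below is a minimal C++ sketch of that relationship and of the SameDomainOrHost() rule built on top of it. It works on bare host strings rather than GURLs. The registry length is taken as an input, as GetRegistryLength() would report it, so the code does not depend on any particular rule lookup. `DomainAndRegistryFromLength`, `DomainFn` and `SameDomainOrHostSketch` are hypothetical names; the actual .cc file may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Returns the registry plus the label in front of it, given |host| and the
// registry length GetRegistryLength() would report for it. 0 or npos yields
// "" (no registered domain, or no valid host).
std::string DomainAndRegistryFromLength(const std::string& host,
                                        std::size_t registry_length) {
  if (registry_length == 0 || registry_length == std::string::npos ||
      registry_length + 2 > host.size())
    return "";
  // The dot separating the registry from the registered label.
  const std::size_t registry_dot = host.size() - registry_length - 1;
  assert(host[registry_dot] == '.');
  // The label starts just after the previous dot (or at the beginning), which
  // also drops leading dots such as those in "..google.com".
  const std::size_t prev_dot = host.rfind('.', registry_dot - 1);
  const std::size_t start = (prev_dot == std::string::npos) ? 0 : prev_dot + 1;
  return host.substr(start);
}

// Hypothetical stand-in for the host-based GetDomainAndRegistry().
using DomainFn = std::function<std::string(const std::string&)>;

// The two cases documented for SameDomainOrHost(), applied to bare hosts.
bool SameDomainOrHostSketch(const std::string& host1, const std::string& host2,
                            const DomainFn& domain_and_registry) {
  if (host1.empty() || host2.empty()) return false;  // Both need hosts.
  if (host1 == host2) return true;  // Identical hosts, known registry or not.
  // Otherwise both must have the same known domain and registry; the
  // comparison is exact, so a trailing dot on only one side breaks the match.
  const std::string domain1 = domain_and_registry(host1);
  return !domain1.empty() && domain1 == domain_and_registry(host2);
}
```

Checking for identical hosts first covers the "no known registry" case of the contract and, in this sketch, also skips the lookup whenever the hosts already match.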

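The private `FindDomainPtr` hook is a plain function-pointer seam: production code calls through a static pointer, and `UseFindDomainFunction()` lets a unit test swap in a lookup over a smaller table, with NULL restoring the default. A minimal, self-contained C++ sketch of that pattern follows. Everything in it apart from the pointer's shape is hypothetical: the two-field `DomainRule`, the linear `DefaultFindDomain` table (standing in for the perfect hash map the comment mentions) and the `RegistryServiceSketch` class are illustrations, not the real Chromium definitions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical rule record; the real DomainRule is defined elsewhere and
// may carry different fields.
struct DomainRule {
  const char* name;
  int type;
};

// Same shape as the header's typedef: look up |len| bytes starting at |str|.
typedef const DomainRule* (*FindDomainPtr)(const char*, unsigned int);

// Stand-in for the production lookup: a linear scan of a tiny fixed table.
const DomainRule* DefaultFindDomain(const char* str, unsigned int len) {
  assert(str != nullptr);
  static const DomainRule kRules[] = {{"com", 0}, {"co.uk", 0}};
  for (const DomainRule& rule : kRules) {
    if (std::strlen(rule.name) == len && std::strncmp(rule.name, str, len) == 0)
      return &rule;
  }
  return nullptr;
}

class RegistryServiceSketch {
 public:
  // Tests pass their own lookup; nullptr restores the default, as the header
  // comment describes for NULL.
  static void UseFindDomainFunction(FindDomainPtr function) {
    find_domain_function_ = function ? function : &DefaultFindDomain;
  }

  static bool IsKnownRule(const char* str, unsigned int len) {
    return find_domain_function_(str, len) != nullptr;
  }

 private:
  static FindDomainPtr find_domain_function_;
};

FindDomainPtr RegistryServiceSketch::find_domain_function_ = &DefaultFindDomain;
```

Because the real class is never instantiated (it declares DISALLOW_IMPLICIT_CONSTRUCTORS and exposes only static methods), a static function pointer is a natural injection point here; an instance-level interface would have nothing to attach to.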