net/cert/internal/verify_name_match.cc - Issue 1125333005: RFC 2459 name comparison.

Side by Side Diff: net/cert/internal/verify_name_match.cc

Issue 1125333005: RFC 2459 name comparison. (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: review changes, implement unicode transcoding Created 5 years, 6 months ago


OLD	NEW
1 // Copyright 2015 The Chromium Authors. All rights reserved.	1 // Copyright 2015 The Chromium Authors. All rights reserved.

2 // Use of this source code is governed by a BSD-style license that can be	2 // Use of this source code is governed by a BSD-style license that can be

3 // found in the LICENSE file.	3 // found in the LICENSE file.

4	4

	5 #include "base/strings/string16.h"

	6 #include "base/strings/string_util.h"

	7 #include "base/strings/utf_string_conversion_utils.h"

	8 #include "base/strings/utf_string_conversions.h"

	9 #include "base/sys_byteorder.h"

	10 #include "base/third_party/icu/icu_utf.h"

5 #include "net/cert/internal/verify_name_match.h"	11 #include "net/cert/internal/verify_name_match.h"

6 #include "net/der/input.h"	12 #include "net/der/input.h"

	13 #include "net/der/parser.h"

	14 #include "net/der/tag.h"

7	15

8 namespace net {	16 namespace net {

9	17

	18 namespace {

	19

	20 // Normalize a PrintableString value according to RFC 2459 section 4.1.2.4.

	21 bool NormalizePrintableStringValue(const der::Input& in, std::string* output) {

	22 // Normalized version will always be equal or shorter than input.

	23 // Copy to output and then normalize and truncate the output if necessary.

	24 output->assign(reinterpret_cast<const char*>(in.UnsafeData()), in.Length());

	25

	26 std::string::const_iterator read_iter = output->begin();

	27 std::string::iterator write_iter = output->begin();

	28

	29 for (; read_iter != output->end() && *read_iter == ' '; ++read_iter) {
	Ryan Sleevi 2015/06/18 01:30:12
	COMMENT: I think it's important to explain why isspace() [or related] don't need to be used, since it's not (entirely) obvious. Of the ASCII whitespace characters (' ', '\t', '\n', '\v', '\f', '\r'), PrintableString is only allowed to contain SPACE (Section 41.4, Table 10, of X.680 (2008))
	// It's not necessary to cover all ASCII whitespace. Per X.680, Section 41.4,
	// the only whitespace character allowed is space.
	30 // Ignore leading whitespace.

	31 }

	32

	33 for (; read_iter != output->end(); ++read_iter) {

	34 const char c = *read_iter;

	35 if (c == ' ') {

	36 // If there are non-whitespace characters remaining in input, compress

	37 // multiple whitespace chars to a single space, otherwise ignore trailing

	38 // whitespace.

	39 std::string::const_iterator next_iter = read_iter + 1;

	40 if (next_iter != output->end() && *next_iter != ' ')

	41 *(write_iter++) = ' ';

	42 } else if (c >= 'A' && c <= 'Z') {

	43 // Fold case.

	44 *(write_iter++) = c + ('a' - 'A');

	45 } else if ((c >= 'a' && c <= 'z') || (c >= '\'' && c <= ':') || c == '=' ||

	46 c == '?') {

	47 // Accept remaining allowed characters (Note that * is not allowed by the

	48 // spec, but openssl allows it, and so there are a number of certs that

	49 // use it):

	50 // a-z

	51 // ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 :

	52 // = ?
	Ryan Sleevi 2015/06/18 01:30:12
	I feel that this description (and the comment I suggested) probably make more sense as an overall function level comment
	// Normalizes the DER-encoded PrintableString value |in| according to
	// RFC 2459, Section 4.1.2.4
	//
	// Briefly, normalization involves removing leading and trailing
	// whitespace, folding multiple whitespace characters into a single
	// whitespace character, and normalizing on case (this function
	// normalizes to lowercase).
	//
	// During normalization, this function also validates that |in|
	// is properly encoded - that is, that it restricts to the character
	// set defined in X.680 (2008), Section 41.4, Table 10. X.680 defines
	// the valid characters as
	// a-z A-Z 0-9 (space) ' ( ) + , - . / : = ?
	//
	// However, due to an old OpenSSL encoding bug, a number of
	// certificates have also included '*', which has historically been
	// allowed by implementations, and so is also allowed here.
	//
	// If |in| can be normalized, returns true and sets |output| to the
	// case folded, normalized value. If |in| is invalid, returns false.
	// NOTE: |output| will be modified regardless of the return, so
	// callers are responsible to check the result.
	mattm 2015/06/19 22:04:24
	Done.
	53 *(write_iter++) = c;

	54 } else {

	55 // Fail on any characters that are not valid for PrintableString.

	56 return false;

	57 }

	58 }

	59 if (write_iter != output->end())

	60 output->erase(write_iter, output->end());

	61 return true;

	62 }

	63

	64 // Normalize a UTF-8 encoded string in a manner compatible with RFC 2459. This

	65 // could also be thought of as a small subset of RFC 5280 rules. Only ASCII

	66 // case folding and whitespace folding is performed.
	Ryan Sleevi 2015/06/18 01:30:12
	Reword:
	// Normalizes |output|, a UTF-8 encoded string, as if it contained
	// only ASCII characters.
	//
	// This could be considered a partial subset of RFC 5280 rules, and
	// is compatible with RFC 2459/3280.
	//
	// In particular, RFC 5280, Section 7.1 describes how UTF8String
	// and PrintableString should be compared - using the LDAP StringPrep
	// profile of RFC 4518, with case folding and whitespace compression.
	// However, because it is optional for implementations and because
	// it's desirable to avoid the size cost of the StringPrep tables,
	// this function treats |output| as if it was composed of ASCII.
	//
	// That is, rather than folding all whitespace characters, it only
	// folds ' '. Rather than case folding using locale-aware handling,
	// it only folds A-Z to a-z.
	//
	// This gives better results than outright rejecting (due to mismatched
	// encodings), or from doing a strict binary comparison (the minimum
	// required by RFC 5280), and is sufficient for those certificates
	// publicly deployed.
	mattm 2015/06/19 22:04:24
	done.
	67 bool NormalizeUtf8String(std::string* output) {

	68 std::string::const_iterator read_iter = output->begin();

	69 std::string::iterator write_iter = output->begin();

	70

	71 for (; read_iter != output->end() && *read_iter == ' '; ++read_iter) {

	72 // Ignore leading whitespace.

	73 }

	74

	75 for (; read_iter != output->end(); ++read_iter) {

	76 const char c = *read_iter;

	77 if (c == ' ') {

	78 // If there are non-whitespace characters remaining in input, compress

	79 // multiple whitespace chars to a single space, otherwise ignore trailing

	80 // whitespace.

	81 std::string::const_iterator next_iter = read_iter + 1;

	82 if (next_iter != output->end() && *next_iter != ' ')

	83 *(write_iter++) = ' ';

	84 } else if (c >= 'A' && c <= 'Z') {

	85 // Fold case.

	86 *(write_iter++) = c + ('a' - 'A');

	87 } else {

	88 *(write_iter++) = c;

	89 }

	90 }

	91 if (write_iter != output->end())

	92 output->erase(write_iter, output->end());

	93 return true;

	94 }
	Ryan Sleevi 2015/06/18 01:30:12
	Is there any reason not to combine NormalizeUtf8String and NormalizePrintableStringValue?
	That is,
	  NormalizeConvertedDirectoryString(bool enforce_printable_string,
	                                    std::string* directory_string) {
	    for (...) {
	      if (c == ' ') {
	      } else if (c >= 'A' && c <= 'Z') {
	      } else {
	        if (enforce_printable_string && !IsLegalPrintableString(c)) {
	          // Contains a character illegal for PrintableString
	          return false;
	        }
	        *(write_iter++) = c;
	      }
	    }
	  }

	  bool NormalizePrintableStringValue(const der::Input& in, std::string* output) {
	    output->assign(...);
	    return NormalizeConvertedDirectoryString(true, output);
	  }

	  bool NormalizeUtf8StringValue(const der::Input& in, std::string* output) {
	    output->assign(...);
	    return NormalizeConvertedDirectoryString(false, output);
	  }
	mattm 2015/06/19 22:04:24
	Done.
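For illustration only, the combined helper suggested in this thread might look roughly like the sketch below. It is not the code that landed: the names `NormalizeDirectoryString` and `IsLegalPrintableStringChar` are hypothetical, it works on a plain `std::string` rather than `der::Input`, and it uses indices instead of the patch's iterators. The accepted character set is copied from the condition on lines 45-46 of the new file.

```cpp
#include <string>

// Hypothetical helper: true if |c| is one of the non-space, non-uppercase
// characters the patch accepts for PrintableString (a-z, ' ( ) * + , - . /
// 0-9 :, =, ?). '*' is included for compatibility, as discussed in the review.
bool IsLegalPrintableStringChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= '\'' && c <= ':') || c == '=' ||
         c == '?';
}

// Normalizes |s| in place: drops leading and trailing spaces, collapses runs
// of spaces to one space, and folds A-Z to a-z. When
// |enforce_printable_string| is true, returns false on any character outside
// the PrintableString set. |s| may be partially rewritten on failure.
bool NormalizeDirectoryString(bool enforce_printable_string, std::string* s) {
  std::string::size_type write = 0;
  bool pending_space = false;
  for (std::string::size_type read = 0; read < s->size(); ++read) {
    const char c = (*s)[read];
    if (c == ' ') {
      // Only remember the space if output already exists (skips leading
      // spaces); a pending space that is never flushed drops trailing ones.
      pending_space = write > 0;
      continue;
    }
    if (pending_space) {
      (*s)[write++] = ' ';
      pending_space = false;
    }
    if (c >= 'A' && c <= 'Z') {
      (*s)[write++] = static_cast<char>(c + ('a' - 'A'));
    } else {
      if (enforce_printable_string && !IsLegalPrintableStringChar(c))
        return false;
      (*s)[write++] = c;
    }
  }
  s->resize(write);
  return true;
}
```

With the flag set, the sketch behaves like the PrintableString path; with it cleared, like the UTF-8 path, which passes non-ASCII bytes through untouched.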
	95

	96 // Convert a UTF8String value to string object and then normalize it.

	97 bool NormalizeUtf8StringValue(const der::Input& in, std::string* output) {

	98 output->assign(reinterpret_cast<const char*>(in.UnsafeData()), in.Length());

	99 return NormalizeUtf8String(output);

	100 }

	101

	102 // Convert BMPString value to UTF-8 and then normalize it.
	Ryan Sleevi 2015/06/18 01:30:12
	STYLE: Per http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Function_Comm...
	All of these should be following descriptive ("Converts a UTF8String") rather than imperative ("Convert a UTF8String") form.
	mattm 2015/06/19 22:04:23
	Done.
	103 bool NormalizeBmpStringValue(const der::Input& in, std::string* output) {

	104 if (in.Length() % 2 != 0)

	105 return false;

	106

	107 base::string16 s16(reinterpret_cast<const base::char16*>(in.UnsafeData()),
	Ryan Sleevi 2015/06/18 01:30:12
	naming nit: My gut is that |s16| violates the naming rules, but I'm not too picky here. Is there a better name?
	mattm 2015/06/19 22:04:23
	yeah... maybe "in_16bit"? Trying to avoid something excessively verbose.
	108 in.Length() / 2);

	109 for (base::string16::iterator i = s16.begin(); i != s16.end(); ++i) {

	110 // BMPString is UCS-2 in big-endian order.

	111 *i = base::NetToHost16(*i);

	112

	113 // BMPString only supports codepoints in the Basic Multilingual Plane,

	114 // surrogates are not allowed.
	Ryan Sleevi 2015/06/18 01:30:12
	grammar nit: either ',' -> ';' or ', surrogates' -> '. Surrogates'
	mattm 2015/06/19 22:04:23
	Done.
	115 if (CBU_IS_SURROGATE(*i))

	116 return false;

	117 }

	118 if (!base::UTF16ToUTF8(s16.data(), s16.size(), output))

	119 return false;

	120 return NormalizeUtf8String(output);

	121 }
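As a hedged aside, the BMPString decoding step above (big-endian UCS-2, no surrogates) can also be written by assembling each code unit from its two bytes, which sidesteps any assumption about the alignment of the input buffer. The function name `DecodeBmpString` and the raw pointer/length interface are assumptions made for this sketch; the patch itself uses `der::Input`, `base::string16`, and `base::NetToHost16`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-alone decoder, not part of the patch. Decodes the
// big-endian UCS-2 bytes in |data| into |out|. Returns false if |length| is
// odd or if any code unit is a surrogate (0xD800-0xDFFF), since BMPString
// only covers the Basic Multilingual Plane.
bool DecodeBmpString(const unsigned char* data,
                     std::size_t length,
                     std::u16string* out) {
  if (length % 2 != 0)
    return false;
  out->clear();
  out->reserve(length / 2);
  for (std::size_t i = 0; i < length; i += 2) {
    // High byte first: this is the big-endian ("network") order.
    const std::uint16_t unit =
        static_cast<std::uint16_t>((data[i] << 8) | data[i + 1]);
    if (unit >= 0xD800 && unit <= 0xDFFF)
      return false;
    out->push_back(static_cast<char16_t>(unit));
  }
  return true;
}
```

The resulting code units would still need to be converted to UTF-8 and normalized, as the function above does with `base::UTF16ToUTF8` and `NormalizeUtf8String`.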

	122

	123 // Convert UniversalString value to UTF-8 and then normalize it.

	124 bool NormalizeUniversalStringValue(const der::Input& in, std::string* output) {

	125 if (in.Length() % 4 != 0)

	126 return false;

	127

	128 std::vector<uint32_t> s32(

	129 reinterpret_cast<const uint32_t*>(in.UnsafeData()),

	130 reinterpret_cast<const uint32_t*>(in.UnsafeData()) + in.Length() / 4);

	131 for (std::vector<uint32_t>::const_iterator i = s32.begin(); i != s32.end();

	132 ++i) {

	133 // UniversalString is UCS-4 in big-endian order.

	134 uint32_t codepoint = base::NetToHost32(*i);

	135 if (!CBU_IS_UNICODE_CHAR(codepoint))

	136 return false;

	137

	138 base::WriteUnicodeCharacter(codepoint, output);

	139 }

	140 return NormalizeUtf8String(output);

	141 }
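To make the UniversalString path concrete, here is a minimal stand-alone sketch of the same idea: read big-endian UCS-4 code points, reject anything that is not a valid Unicode character, and append the UTF-8 encoding. All three function names are hypothetical. `IsUnicodeChar` is written to approximate what `CBU_IS_UNICODE_CHAR` is understood to check (no surrogates, no noncharacters, nothing above U+10FFFF); treat that equivalence as an assumption rather than a statement about the ICU macro.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed approximation of CBU_IS_UNICODE_CHAR.
bool IsUnicodeChar(std::uint32_t cp) {
  if (cp > 0x10FFFF) return false;                 // beyond Unicode
  if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogates
  if (cp >= 0xFDD0 && cp <= 0xFDEF) return false;  // noncharacter block
  if ((cp & 0xFFFE) == 0xFFFE) return false;       // U+xxFFFE and U+xxFFFF
  return true;
}

// Appends the UTF-8 encoding of |cp| (assumed valid) to |out|.
void AppendUtf8(std::uint32_t cp, std::string* out) {
  if (cp < 0x80) {
    out->push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out->push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out->push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out->push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Converts big-endian UCS-4 bytes to UTF-8. Clears |out| first.
bool UniversalStringToUtf8(const unsigned char* data,
                           std::size_t length,
                           std::string* out) {
  if (length % 4 != 0)
    return false;
  out->clear();
  for (std::size_t i = 0; i < length; i += 4) {
    const std::uint32_t cp = (static_cast<std::uint32_t>(data[i]) << 24) |
                             (static_cast<std::uint32_t>(data[i + 1]) << 16) |
                             (static_cast<std::uint32_t>(data[i + 2]) << 8) |
                             static_cast<std::uint32_t>(data[i + 3]);
    if (!IsUnicodeChar(cp))
      return false;
    AppendUtf8(cp, out);
  }
  return true;
}
```

Unlike the function above, this sketch clears the output before appending; whether callers of the patch always pass an empty string is not something the diff shows.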

	142

	143 // Convert the string |value| to UTF-8, normalize it, and store in |output|.

	144 bool NormalizeValue(const der::Tag tag,

	145 const der::Input& value,

	146 std::string* output) {

	147 switch (tag) {

	148 case der::kPrintableString:

	149 return NormalizePrintableStringValue(value, output);

	150 case der::kUtf8String:

	151 return NormalizeUtf8StringValue(value, output);

	152 case der::kUniversalString:

	153 return NormalizeUniversalStringValue(value, output);

	154 case der::kBmpString:

	155 return NormalizeBmpStringValue(value, output);

	156 default:

	157 NOTREACHED();

	158 return false;

	159 }

	160 }

	161

	162 // Return true if |tag| is a string type that NormalizeValue can handle.

	163 bool IsNormalizable(der::Tag tag) {
	Ryan Sleevi 2015/06/18 01:30:12
	IsNormalizableDirectoryString ?
	mattm 2015/06/19 22:04:23
	Done.
	164 switch (tag) {

	165 case der::kPrintableString:

	166 case der::kUtf8String:

	167 case der::kUniversalString:

	168 case der::kBmpString:
	nharper 2015/06/17 19:16:55
	Do we care about TeletexStrings?
	Ryan Sleevi 2015/06/18 01:30:12
	IA5String as well, which comes up with domainComponent
	(Despite the obtusity of X.680 and the reference to the "International Register of Coded Character Sets to be Used with Escape Sequences" which is such a PITA to find, it's Registrations 1, 6, SPACE, and DELETE)
	https://www.itscj.ipsj.or.jp/iso-ir/001.pdf <-- Registration 1
	https://www.itscj.ipsj.or.jp/iso-ir/006.pdf <-- Registration 6
	Which are themselves just enumerating the ISO 646 space (the 32 control characters, 0-9, :-?, A-Z, a-z, DEL). See https://en.wikipedia.org/wiki/ISO/IEC_646
	Which is... SURPRISE... ASCII :) [OK, strictly speaking the values of things like '#' can vary by region, but the character itself is 0x00 - 0x1F (control chars - Registration 1), 0x20 (space), 0x21 - 0x7E (printable characters - Registration 6), and 0x7F (delete).]
	0x00 - 0x7F. ASCII :)
	If you want to try to distill that into a brief comment, be my guest, but I'd go with
	// IA5String is ISO/IEC Registrations 1 and 6 from the ISO
	// "International Register of Coded Character Sets to be used
	// with Escape Sequences", plus space and delete. That's just the
	// polite way of saying 0x00 - 0x7F, aka ASCII (or, more formally,
	// ISO/IEC 646)
	Ryan Sleevi 2015/06/18 01:30:12
	On 2015/06/17 19:16:55, nharper wrote:
	> Do we care about TeletexStrings?
	Ooops, botched commenting. No. But definitely should add a comment why
	// TeletexString isn't normalized. Section 8 of RFC 5280 briefly
	// describes the historical confusion between treating TeletexString
	// as Latin1String vs T.61, and there are even incompatibilities within
	// T.61 implementations. As this type is virtually unused, simply
	// treat it with a binary comparison, as permitted by RFC 3280/5280.
	mattm 2015/06/19 22:04:23
	On 2015/06/18 01:30:12, Ryan Sleevi wrote:
	> IA5String as well, which comes up with domainComponent
	Done.
	mattm 2015/06/19 22:04:24
	On 2015/06/18 01:30:12, Ryan Sleevi wrote:
	> Ooops, botched commenting. No. But definitely should add a comment why
	Done.
	169 return true;

	170 default:

	171 return false;

	172 }

	173 return false;

	174 }
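The IA5String discussion in the thread above boils down to "every byte is in 0x00 - 0x7F". A minimal sketch of such a check follows. The helper name `IsValidIa5String` is hypothetical and does not appear in the patch; whether the final change validates IA5String this way is not shown in this diff.

```cpp
#include <string>

// Hypothetical helper: true if every byte of |value| is in 0x00 - 0x7F,
// i.e. the ISO/IEC 646 (ASCII) range that IA5String is limited to.
bool IsValidIa5String(const std::string& value) {
  for (unsigned char c : value) {
    if (c > 0x7F)
      return false;
  }
  return true;
}
```

If IA5String were added to the normalizable types, a check like this could run before the ASCII-only case and whitespace folding.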

	175

	176 bool VerifyAttributeValueMatch(der::Parser* a, der::Parser* b) {

	177 der::Input a_value, b_value;

	178

	179 // Read the attribute types, which must be OBJECT IDENTIFIERs.

	180 if (!a->ReadTag(der::kOid, &a_value))

	181 return false;

	182 if (!b->ReadTag(der::kOid, &b_value))

	183 return false;

	184 // Attribute types must be equal.

	185 if (!a_value.Equals(b_value))

	186 return false;

	187

	188 // Read the attribute value.

	189 der::Tag a_tag, b_tag;

	190 if (!a->ReadTagAndValue(&a_tag, &a_value))

	191 return false;

	192 if (!b->ReadTagAndValue(&b_tag, &b_value))

	193 return false;

	194

	195 // There should be no more elements in the sequence after reading the

	196 // attribute type and value.

	197 if (a->HasMore() || b->HasMore())

	198 return false;

	199

	200 if (IsNormalizable(a_tag) && IsNormalizable(b_tag)) {

	201 std::string a_normalized, b_normalized;

	202 if (!NormalizeValue(a_tag, a_value, &a_normalized) ||

	203 !NormalizeValue(b_tag, b_value, &b_normalized))

	204 return false;

	205 return a_normalized == b_normalized;

	206 }

	207 // Attributes encoded with different types may be assumed to be unequal.

	208 if (a_tag != b_tag)

	209 return false;

	210 // All other types use binary comparison.

	211 return a_value.Equals(b_value);

	212 }

	213

	214 bool VerifyRDNMatch(der::Parser* a, der::Parser* b) {

	215 // Must have at least one AttributeTypeAndValue.

	216 if (!a->HasMore() || !b->HasMore())

	217 return false;

	218

	219 while (a->HasMore() && b->HasMore()) {
	davidben 2015/06/17 13:33:46
	Since these are SETs, the order of the elements may change when you normalize the elements. It's in lexicographic order of the DER representation, so you'll end up sorting on the total length first I believe? Does it turn out that everything that needs normalization doesn't change the relative order of the lengths? If they don't need to chop off that many spaces or whatever...
	If the total length doesn't change, then, so long as each attribute type appears at most once (is that guaranteed?), I believe you know it won't have to reorder.
	Actually, looking at random certificates, each RDN only has one attribute, so maybe that's actually why you don't care? (What does it mean to have multiple of them anyway?)
	Having to reorder on normalization seems like all kinds of pain though. Might just be worth a comment if this never ends up mattering? Worst that'll happen I think is that we treat two RDNs as different when, after applying the normalization rules we chose, they should have been equal.

	Ryan Sleevi 2015/06/18 00:28:44
	On 2015/06/17 13:33:46, David Benjamin wrote:
	> If the total length doesn't change, then, so long as each attribute type appears
	> at most once (is that guaranteed?), I believe you know it won't have to reorder.
	(is that guaranteed) In theory, yes. In practice, no.
	X.501 (2012), Section 9.3 specifies that "The set that forms an RDN contains exactly one AttributeTypeAndValue for each attribute which contains distinguished values in the entry; that is, a given attribute type cannot appear twice in the same RDN. An attribute value that has been designated to appear in an RDN is called a distinguished value. There may be other values of the same attribute that are not distinguished values and thus may not be used in an RDN. An RDN for a given entry is formed by using one distinguished value from each attribute that has distinguished values."
	RFC 5280 incorporates X.501 (2005) in 4.1.2.4 "The issuer field is defined as the X.501 type Name". While not directly referencing X.501's requirements on inclusions (since X.501 also had crap like primaryDistinguished bits), the above understanding is expected.
	However, with that said, the canonical example of "stupid CA crap" is
	  SEQUENCE {
	    SET {
	      DomainComponent = IA5String("com"),
	      DomainComponent = IA5String("example"),
	      DomainComponent = IA5String("ssl"),
	    }
	  }
	[Although I seem to remember an implementation violating 3280, which admittedly is specific to LDAP, and using DirectoryString-utf8string for the DC]
	So we end up with the same attribute type, but different values, all within the set. In the X.501 hierarchy, these are all part of the same level of the hierarchy, and considered alternative/equivalent naming (OK, technically, invalid naming; but if it was a DirectoryString and a SerialNumber, they'd be considered equivalent expressions of the same naming hierarchy)
	> Actually, looking at random certificates, each RDN only has one attribute, so
	> maybe that's actually why you don't care? (What does it mean to have multiple of
	> them anyway?)
	Most common example is sticking a serialNumber field at the same hierarchy as the email address and the commonName.
	However, I want to be explicit: "looking at random certificates" is not the way we want to do this. We want to strictly follow the specs, and where appropriate/necessary, relax. We also want to implement the least bits necessary, WHEN they can be safely detached.

	davidben 2015/06/18 01:58:27
	On 2015/06/18 00:28:44, Ryan Sleevi wrote:
	> However, I want to be explicit: "looking at random certificates" is not the way
	> we want to do this. We want to strictly follow the specs, and where
	> appropriate/necessary, relax. We also want to implement the least bits
	> necessary, WHEN they can be safely detached.
	I'm quite aware of that. I think you missed the point of my comment. In context, "looking at random certificates" was a cursory attempt to figure out whether multiple AttributeTypeAndValue pairs ever came up.
	I'm saying that, in order to strictly follow the specs, I believe we need to reorder things because of how SETs in DER are encoded. That is, this logic will canonicalize:
	[Assume, for the sake of discussion, that CountryName and DomainComponent's OID representations have the same length.]
	  SET {
	    CountryName = IA5String("aa")
	    DomainComponent = PrintableString(" b")
	  }
	to
	  SET {
	    CountryName = IA5String("aa")
	    DomainComponent = PrintableString("b")
	  }
	But this is wrong. That is not valid DER. SETs are unordered and serialized by sorting their elements by their DER representation. The tags of both pairs are SEQUENCE, so you compare the length next, and the first child now has a longer total length than the second. So the actual canonicalized form is:
	  SET {
	    DomainComponent = PrintableString("b")
	    CountryName = IA5String("aa")
	  }
	If we want to canonicalize and do so without violating DER, we need a reordering step. Which is going to be a huge pain, and so it may be worth considering not doing this. If we don't, that means we are making various assumptions, hence the discussion about whether multiple attributes ever come up in cases we have to canonicalize, etc. We'd also want a comment explaining the intentional omission, should we decide to make it.

	Ryan Sleevi 2015/06/18 03:30:01
	On 2015/06/18 01:58:27, David Benjamin wrote:
	> If we want to canonicalize and do so without violating DER, we need a reordering
	> step.
	No, because this code doesn't return the canonicalized form back to the caller. It just verifies a match.
	Yes, there's a bug lurking here because within the RDN, you don't match to make sure they have the same order, you just make sure that |b| has the same number of elements as |a| and that every value within |b| is present in |a|.
	I think we're talking about the same bug, but talking different solutions. Your remark is to sort them as part of canonicalization, which is one option. Another would be just a linear scan.
	> Which is going to be a huge pain, and so it may be worth considering not
	> doing this. If we don't, that means we are making various assumptions, hence the
	> discussion about whether multiple attributes ever come up in cases we have to
	> canonicalize, etc. We'd also want a comment explaining the intentional omission,
	> should we decide to make it.
	I have no idea why you say it would be a huge pain. You could linear scan |b| for every element in |a| cheaply. You could ingest both |a| and |b| into two vectors of der::Input tuples, sort on the type, and then just iterate them (with or without a linear scan substep for misencoded certs that have multiple values of the same type).
	You said "without violating DER" (so, re-encoding), but I'm not sure where that would come up. Our internal indexing can use whatever sort function we want. The only time I can think a normalized DER would come up with would be OS X integration, but then we would potentially already have issues because of the stringprep. That is, any normalization for OS API integration would be a separate function solely for that integration. Here, we just need to make sure that a and b fully intersect, and we can do so however we want.

	mattm 2015/06/19 22:04:23
	On 2015/06/18 03:30:01, Ryan Sleevi wrote:
	> I have no idea why you say it would be a huge pain. You could linear scan |b|
	> for every element in |a| cheaply.
	Did the "ingest into two vectors and match" thing. I didn't do the sort part, since it seems unlikely any sane cert would have enough elements in a RDN for it to matter.

	mattm 2015/06/19 22:04:24
	On 2015/06/17 13:33:46, David Benjamin wrote:
	> Since these are SETs, the order of the elements may change when you normalize
	> the elements.
	Good catch.
	220 der::Parser a_attr_type_and_value;

	221 der::Parser b_attr_type_and_value;

	222 if (!a->ReadSequence(&a_attr_type_and_value) ||

	223 !b->ReadSequence(&b_attr_type_and_value))

	224 return false;

	225 if (!VerifyAttributeValueMatch(&a_attr_type_and_value,

	226 &b_attr_type_and_value))

	227 return false;

	228 }

	229

	230 // If one of the RDNs has more elements than the other, not a match.

	231 if (a->HasMore() || b->HasMore())

	232 return false;

	233

	234 return true;

	235 }
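The order-independent comparison discussed in the thread on line 219 ("ingest into two vectors and match", without sorting) could be sketched as below. This is a guess at the shape of the approach, not the code from the follow-up patch set: `Attribute` and `RdnSetsMatch` are hypothetical names, and each attribute is modeled as a pair of strings (type, already-normalized value) instead of `der::Input` values.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed AttributeTypeAndValue:
// first = attribute type (OID), second = normalized value.
using Attribute = std::pair<std::string, std::string>;

// Returns true if |a| and |b| contain the same attributes, ignoring order.
// |b| is taken by value so matched entries can be removed, which keeps
// duplicate attributes from being matched twice.
bool RdnSetsMatch(const std::vector<Attribute>& a, std::vector<Attribute> b) {
  if (a.size() != b.size())
    return false;
  for (const Attribute& attr : a) {
    // Linear scan; RDNs are expected to hold only a handful of elements.
    auto it = std::find(b.begin(), b.end(), attr);
    if (it == b.end())
      return false;
    b.erase(it);
  }
  return true;
}
```

The scan is quadratic in the number of attributes, which matches the reasoning in the thread that sorting is unlikely to matter for realistic RDN sizes.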

	236

	237 } // namespace

	238

	239 // TODO(mattm): is returning false on parsing errors ok, or should it try to

	240 // fall back to binary comparison on unexpected input?

10 bool VerifyNameMatch(const der::Input& a, const der::Input& b) {	241 bool VerifyNameMatch(const der::Input& a, const der::Input& b) {

11 // TODO(mattm): use normalization as specified in RFC 5280 section 7.	242 der::Parser a_parser(a);

12 return a.Equals(b);	243 der::Parser b_parser(b);

	244 der::Parser a_rdn_sequence;

	245 der::Parser b_rdn_sequence;

	246

	247 if (!a_parser.ReadSequence(&a_rdn_sequence) ||

	248 !b_parser.ReadSequence(&b_rdn_sequence)) {

	249 return false;

	250 }

	251

	252 // No data should remain in the inputs after the RDN sequence.

	253 if (a_parser.HasMore() || b_parser.HasMore())

	254 return false;

	255

	256 // Must have at least one RDN.

	257 if (!a_rdn_sequence.HasMore() || !b_rdn_sequence.HasMore())

	258 return false;

	259

	260 while (a_rdn_sequence.HasMore() && b_rdn_sequence.HasMore()) {

	261 der::Parser a_rdn, b_rdn;

	262 if (!a_rdn_sequence.ReadConstructed(der::kSet, &a_rdn) ||

	263 !b_rdn_sequence.ReadConstructed(der::kSet, &b_rdn)) {

	264 return false;

	265 }

	266 if (!VerifyRDNMatch(&a_rdn, &b_rdn))

	267 return false;

	268 }

	269

	270 // If one of the sequences has more elements than the other, not a match.

	271 if (a_rdn_sequence.HasMore() || b_rdn_sequence.HasMore())

	272 return false;

	273

	274 return true;

13 }	275 }

14	276

15 } // namespace net	277 } // namespace net
