Issue 99183: Canonicalize IPv6 addresses.

Issue 99183: Canonicalize IPv6 addresses. (Closed)

Created:
11 years, 7 months ago by eroman

Modified:
9 years, 6 months ago

Reviewers:
brettw, wtc, pmarks

CC:
chromium-reviews_googlegroups.com

Base URL:
http://google-url.googlecode.com/svn/trunk/

Visibility:
Public.

Description

Canonicalize IPv6 addresses. BUG=http://code.google.com/p/google-url/issues/detail?id=2 Committed: http://code.google.com/p/google-url/source/detail?r=104

Patch Set 2 : add missing unit tests #

Patch Set 4 : Add support for scope-ids #

Patch Set 5 : cleanup and fix lint errors #

Patch Set 6 : only normalize scope-id on WIN32, since it isn't necessarily a number on linux #

Patch Set 7 : kill some whitespace #

Total comments: 9

Patch Set 8 : Address brettw's comments #

Total comments: 6

Patch Set 9 : remove scopeid support & fix itoa_s radix=16 for non-windows #

Patch Set 10 : Use zero-compression in the canonicalized output #

Total comments: 2

Patch Set 11 : fix case where colon wasn't appended #

Total comments: 12

Patch Set 12 : Address pmarks loop change, and brettw's comments #

Total comments: 4

Patch Set 13 : address pmarks comments #

Created: 11 years, 7 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+452 lines, -59 lines)			Patch
M	src/url_canon_host.cc	View	1 2 3 4 5 6 7 8 9 10 11 12	1 chunk	+1 line, -1 line	0 comments	Download
M	src/url_canon_internal.cc	View	9 10 11 12	1 chunk	+7 lines, -2 lines	0 comments	Download
M	src/url_canon_ip.cc	View	1 2 3 4 5 6 7 8 9 10 11 12	5 chunks	+364 lines, -49 lines	0 comments	Download
M	src/url_canon_unittest.cc	View	2 3 4 5 6 7 8 9 10 11 12	4 chunks	+80 lines, -7 lines	0 comments	Download

Messages

Total messages: 23 (0 generated)

Expand Messages | Collapse Messages

eroman

brettw: please review the whole. wtc: please review the test cases in |url_canon_unittest.cc| in case ...

11 years, 7 months ago (2009-04-30 07:03:16 UTC) #1

brettw

Sorry I didn't get to all of this yet. I'll do it tomorrow. http://codereview.chromium.org/99183/diff/1018/31 File ...

11 years, 7 months ago (2009-05-04 08:55:08 UTC) #2

brettw

I did some more, still not done. http://codereview.chromium.org/99183/diff/1018/29 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/1018/29#newcode374 Line 374: int ...

11 years, 7 months ago (2009-05-05 09:25:24 UTC) #3

eroman

http://codereview.chromium.org/99183/diff/1018/29 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/1018/29#newcode374 Line 374: int end = address.end(); On 2009/05/05 09:25:25, brettw ...

11 years, 7 months ago (2009-05-05 19:42:03 UTC) #4

http://codereview.chromium.org/99183/diff/1018/29
File src/url_canon_ip.cc (right):

http://codereview.chromium.org/99183/diff/1018/29#newcode374
Line 374: int end = address.end();
On 2009/05/05 09:25:25, brettw wrote:
> If 'address' is invalid, the length will be -1 and you'll have problems. Is it
> guaranteed to be valid?

Done.
Added a check for non-empty.

(This turns out to have worked, but for an awkward reason. While it wasn't
possible for |address| to be invalid, it was possible for it to be empty. The
empty case would work since |spec[end]| is always either ']' or '%' rather than
a memory error)

http://codereview.chromium.org/99183/diff/1018/29#newcode382
Line 382: bool is_contraction = is_colon && i != end && spec[i + 1] == ':';
On 2009/05/05 09:25:25, brettw wrote:
> "end" is 1 past the end of the string, and even if it was the end, this would
be
> wrong. I think you want "i < end - 1"

Done.
Changed to |i < end - 1|.

(This turns out to have worked for the same awkward reason as above. |spec[end]|
is always ']' or '%' rather than a memory error).

http://codereview.chromium.org/99183/diff/1018/29#newcode387
Line 387: if (is_colon || i == end) {
On 2009/05/05 09:25:25, brettw wrote:
> Same thing about "end", and in several places below. Maybe you should rename
it
> to "last" and set it to end() - 1?

Here the use of |end| is intentional.

The loop goes from (i = begin; i <= end; ++i), so "end" is inclusive. The checks
|i == end| are for ending a token when we have run out of input. This is modeled
after |DoFindIPv4Components| which does a similar type of loop.

To clarify this aspect, I have changed the loop comment:

- for (int i = begin; /* nothing */; i++) {
+ for (int i = begin; /* i <= end */; i++) {

http://codereview.chromium.org/99183/diff/1018/29#newcode420
Line 420: // There can be at most one contraction in the literal.
On 2009/05/05 09:25:25, brettw wrote:
> This is indented 1 space too much.

Fixed.

brettw

http://codereview.chromium.org/99183/diff/5001/4002 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/5001/4002#newcode540 Line 540: int cur_index_in_address = 0; // Current output location ...

11 years, 7 months ago (2009-05-06 09:13:06 UTC) #5

pmarks

Skimming the test data, I notice that your canonical format doesn't use any zero compression. ...

11 years, 7 months ago (2009-05-07 06:00:30 UTC) #6

pmarks

Also, I think you may want to drop the "%" interface identifier parsing. See RFC3986, ...

11 years, 7 months ago (2009-05-09 23:34:29 UTC) #7

wtc

On 2009/05/09 23:34:29, pmarks wrote: > Also, I think you may want to drop the ...

11 years, 7 months ago (2009-05-11 18:49:11 UTC) #8

eroman

Thanks for all the feedback! I hope to address these comments soon and upload a ...

11 years, 7 months ago (2009-05-11 18:56:10 UTC) #9

eroman

Finally got around to updating this! > Skimming the test data, I notice that your ...

11 years, 7 months ago (2009-05-13 22:36:50 UTC) #10

pmarks

http://codereview.chromium.org/99183/diff/6013/6014 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/6013/6014#newcode585 Line 585: !IsInRange(contraction_range, i + 2) && You're removing the ...

11 years, 7 months ago (2009-05-13 23:24:08 UTC) #11

pmarks

On 2009/05/13 23:24:08, pmarks wrote: > (Also, I didn't notice any tests that cover cases ...

11 years, 7 months ago (2009-05-13 23:27:39 UTC) #12

pmarks

Oh, and this needs to be asked: Why are you re-implementing inet_ntop(), instead of just ...

11 years, 7 months ago (2009-05-13 23:42:46 UTC) #13

wtc

On 2009/05/13 23:42:46, pmarks wrote: > Oh, and this needs to be asked: Why are ...

11 years, 7 months ago (2009-05-14 00:07:33 UTC) #14

pmarks

On 2009/05/14 00:07:33, wtc wrote: > Do you mean this CL is just a re-implementation ...

11 years, 7 months ago (2009-05-14 00:15:17 UTC) #15

eroman

> Oh, and this needs to be asked: Why are you re-implementing > inet_ntop(), instead ...

11 years, 7 months ago (2009-05-14 00:34:09 UTC) #16

pmarks

http://codereview.chromium.org/99183/diff/7011/7012 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/7011/7012#newcode578 Line 578: for (int i = 0; i < 16; ...

11 years, 7 months ago (2009-05-14 03:13:27 UTC) #17

brettw

On 2009/05/14 00:15:17, pmarks wrote: > Between "use the OS libraries to save code" and ...

11 years, 7 months ago (2009-05-17 04:07:05 UTC) #18

brettw

LGTM if you agree with everything http://codereview.chromium.org/99183/diff/7011/7012 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/7011/7012#newcode354 Line 354: // Reached ...

11 years, 7 months ago (2009-05-17 04:14:16 UTC) #19

LGTM if you agree with everything

http://codereview.chromium.org/99183/diff/7011/7012
File src/url_canon_ip.cc (right):

http://codereview.chromium.org/99183/diff/7011/7012#newcode354
Line 354: // Reached the end of the input, DONE.
I think this comment would make more sense as an end of line comment after the
return (since getting to where the comment is doesn't mean we reached the end).

http://codereview.chromium.org/99183/diff/7011/7012#newcode432
Line 432: uint32_t IPv6HexComponentToNumber(const CHAR* spec,
Is there any reason this can't be a uint16_t?

http://codereview.chromium.org/99183/diff/7011/7012#newcode486
Line 486: uint32_t number = IPv6HexComponentToNumber<CHAR>(
If we can change to uint16_t above, be sure to change this one as well.

http://codereview.chromium.org/99183/diff/7011/7012#newcode578
Line 578: for (int i = 0; i < 16; i += 2) {
On 2009/05/14 03:13:27, pmarks wrote:
> Here's a more efficient version of the for loop.  It lets you get rid of
> IsInRange():
> 
>   for (int i = 0; i < 16; ) {
>     if (i == contraction_range.begin && contraction_range.len > 0) {
>       // Jump over the contraction.
>       if (i == 0)
>         output->push_back(':');
>       output->push_back(':');
>       i = contraction_range.end();
> 
>     } else {
>       // Consume the next 16 bits from |address|.
>       const uint32_t x = address[i] << 8 | address[i + 1];

Maybe this should just be an int since we pass to itoa below? I'd also lose the
const.

>       i += 2;
> 
>       // Stringify the 16 bit number (at most requires 4 hex digits).
>       char str[5];
>       _itoa_s(x, str, 16);
>       for (int ch = 0; str[ch] != '\0'; ++ch)
>         output->push_back(str[ch]);
> 
>       // Put a colon after each number, except the last.
>       if (i < 16) {
>         output->push_back(':');
>       }

I wouldn't use {} here to keep it like the rest of the file.

http://codereview.chromium.org/99183/diff/7011/7014
File src/url_canon_unittest.cc (right):

http://codereview.chromium.org/99183/diff/7011/7014#newcode610
Line 610: 
Can you add a testcase with a single colon at the beginning? There is some code
to handle this and I don't see a test.

eroman

Thanks for the code reviews. http://codereview.chromium.org/99183/diff/7011/7012 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/7011/7012#newcode354 Line 354: // Reached the ...

11 years, 7 months ago (2009-05-18 20:29:26 UTC) #20

Thanks for the code reviews.

http://codereview.chromium.org/99183/diff/7011/7012
File src/url_canon_ip.cc (right):

http://codereview.chromium.org/99183/diff/7011/7012#newcode354
Line 354: // Reached the end of the input, DONE.
On 2009/05/17 04:14:16, brettw wrote:
> I think this comment would make more sense as an end of line comment after the
> return (since getting to where the comment is doesn't mean we reached the
end).

Done.

http://codereview.chromium.org/99183/diff/7011/7012#newcode432
Line 432: uint32_t IPv6HexComponentToNumber(const CHAR* spec,
On 2009/05/17 04:14:16, brettw wrote:
> Is there any reason this can't be a uint16_t?

Done.

http://codereview.chromium.org/99183/diff/7011/7012#newcode486
Line 486: uint32_t number = IPv6HexComponentToNumber<CHAR>(
On 2009/05/17 04:14:16, brettw wrote:
> If we can change to uint16_t above, be sure to change this one as well.

Done.

http://codereview.chromium.org/99183/diff/7011/7012#newcode578
Line 578: for (int i = 0; i < 16; i += 2) {
On 2009/05/17 04:14:16, brettw wrote:
> On 2009/05/14 03:13:27, pmarks wrote:
> > Here's a more efficient version of the for loop.  It lets you get rid of
> > IsInRange():
> > 
> >   for (int i = 0; i < 16; ) {
> >     if (i == contraction_range.begin && contraction_range.len > 0) {
> >       // Jump over the contraction.
> >       if (i == 0)
> >         output->push_back(':');
> >       output->push_back(':');
> >       i = contraction_range.end();
> > 
> >     } else {
> >       // Consume the next 16 bits from |address|.
> >       const uint32_t x = address[i] << 8 | address[i + 1];
> 
> Maybe this should just be an int since we pass to itoa below? I'd also lose
the
> const.

Done.

> 
> 
> >       i += 2;
> > 
> >       // Stringify the 16 bit number (at most requires 4 hex digits).
> >       char str[5];
> >       _itoa_s(x, str, 16);
> >       for (int ch = 0; str[ch] != '\0'; ++ch)
> >         output->push_back(str[ch]);
> > 
> >       // Put a colon after each number, except the last.
> >       if (i < 16) {
> >         output->push_back(':');
> >       }
> 
> I wouldn't use {} here to keep it like the rest of the file.

Done.

> 
>

http://codereview.chromium.org/99183/diff/7011/7012#newcode578
Line 578: for (int i = 0; i < 16; i += 2) {
On 2009/05/14 03:13:27, pmarks wrote:
> Here's a more efficient version of the for loop.  It lets you get rid of
> IsInRange():
> 
>   for (int i = 0; i < 16; ) {
>     if (i == contraction_range.begin && contraction_range.len > 0) {
>       // Jump over the contraction.
>       if (i == 0)
>         output->push_back(':');
>       output->push_back(':');
>       i = contraction_range.end();
> 
>     } else {
>       // Consume the next 16 bits from |address|.
>       const uint32_t x = address[i] << 8 | address[i + 1];
>       i += 2;
> 
>       // Stringify the 16 bit number (at most requires 4 hex digits).
>       char str[5];
>       _itoa_s(x, str, 16);
>       for (int ch = 0; str[ch] != '\0'; ++ch)
>         output->push_back(str[ch]);
> 
>       // Put a colon after each number, except the last.
>       if (i < 16) {
>         output->push_back(':');
>       }
>     }
>   }
> 

Done.
That is a better loop indeed, thanks!

http://codereview.chromium.org/99183/diff/7011/7014
File src/url_canon_unittest.cc (right):

http://codereview.chromium.org/99183/diff/7011/7014#newcode610
Line 610: 
On 2009/05/17 04:14:16, brettw wrote:
> Can you add a testcase with a single colon at the beginning? There is some
code
> to handle this and I don't see a test.

Done.

pmarks

http://codereview.chromium.org/99183/diff/6030/7019 File src/url_canon_ip.cc (right): http://codereview.chromium.org/99183/diff/6030/7019#newcode572 Line 572: for (int i = 0; i < 16; ...

11 years, 7 months ago (2009-05-18 21:12:48 UTC) #21

eroman

11 years, 7 months ago (2009-05-18 22:10:54 UTC) #23

http://codereview.chromium.org/99183/diff/6030/7019
File src/url_canon_ip.cc (right):

http://codereview.chromium.org/99183/diff/6030/7019#newcode572
Line 572: for (int i = 0; i < 16; ) {
On 2009/05/18 21:12:48, pmarks wrote:
> I suppose you should get rid of the space before the ), as lint doesn't seem
to
> like it.

Done.

http://codereview.chromium.org/99183/diff/6030/7021
File src/url_canon_unittest.cc (right):

http://codereview.chromium.org/99183/diff/6030/7021#newcode613
Line 613: {"[0:0::0:0:8:]", L"[:0:0::0:0:8:]", "", url_parse::Component(),
false},
On 2009/05/18 21:12:48, pmarks wrote:
> The first two strings aren't the same.  Is this intentional?

Fixed. (no, not intentional).

Expand Messages | Collapse Messages