|
|
Chromium Code Reviews|
Created:
6 years, 9 months ago by Anuj Modified:
6 years, 9 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Visibility:
Public. |
DescriptionDon't unescape BiDi control characters in URL components
As per http://tools.ietf.org/html/rfc3987#section-4.1, the BiDi control
characters are not allowed in IRI.
Add constants for the new BiDi control characters from http://www.unicode.org/reports/tr9/ in rtl.h.
BUG=337746
TBR=rsleevi
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=254091
Patch Set 1 #
Total comments: 16
Patch Set 2 : Addressed comments #
Total comments: 18
Patch Set 3 : Addressed comments - 2 #Patch Set 4 : Addressed comments - 3 #
Messages
Total messages: 29 (0 generated)
PTAL
https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode101 net/base/escape.cc:101: // char value. Nit: // Attempts to unescape the sequence at |index| within |escaped_text|. If // successful, sets |value| to the unescaped value. Returns whether unescaping // succeeded. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode104 net/base/escape.cc:104: int index, This should be a size_t. Nit: Indenting (2 lines) https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode106 net/base/escape.cc:106: const typename STR::value_type most_sig_digit( This function should also check whether escaped_text[index] == '%'. You should also check whether escaped_text.size() <= (i + 2). Doing that check here will allow simplifying the caller some; see below. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode147 net/base/escape.cc:147: if (current_char == '%' && i + 2 < max) { If you add the checks mentioned above, |max| can be eliminated entirely, and this can become: unsigned char value; if (UnescapeUnsignedCharAtIndex(escaped_text, i, &value)) { ... https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode159 net/base/escape.cc:159: // kRightToLeftOverride = "%E2%80%AE" Nit: Don't use kNames for things that are just comments. Also, give the Unicode character constants and names for these: // Per http://tools.ietf.org/html/rfc3987#section-4.1, the following BiDi // control characters are not allowed to appear unescaped in URLs: // // U+200E LEFT-TO-RIGHT MARK (%E2%80%8E) // U+200F RIGHT-TO-LEFT MARK (%E2%80%8F) // U+202A LEFT-TO-RIGHT EMBEDDING (%E2%80%AA) // U+202B RIGHT-TO-LEFT EMBEDDING (%E2%80%AB) // U+202C POP DIRECTIONAL FORMATTING (%E2%80%AC) // U+202D LEFT-TO-RIGHT OVERRIDE (%E2%80%AD) // U+202E RIGHT-TO-LEFT OVERRIDE (%E2%80%AE) https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode160 net/base/escape.cc:160: if (value == 0xE2 && i + 8 < max) { If you add the checks mentioned above, you can eliminate all the length checks here. You do need to add checks that the function call succeeded. (Those should have been present anyway.) https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode167 net/base/escape.cc:167: value == 0x8F) { Nit: Simpler: if ((value == 0x8E) || (value == 0x8F) || ((value >= 0xAA) && (value <= 0xAE))) { ... https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode173 net/base/escape.cc:173: // Restore value if BiDi control character not found. Prefer declaring a different temp to hold the second and third bytes instead of overwriting |value| and then having to restore it here.
Added more unittests. Also included the new BiDi control characters added in the Unicode TR9 as referenced by the original RFC 3987. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode101 net/base/escape.cc:101: // char value. On 2014/02/27 04:43:04, Peter Kasting wrote: > Nit: > > // Attempts to unescape the sequence at |index| within |escaped_text|. If > // successful, sets |value| to the unescaped value. Returns whether unescaping > // succeeded. Done. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode104 net/base/escape.cc:104: int index, On 2014/02/27 04:43:04, Peter Kasting wrote: > This should be a size_t. > > Nit: Indenting (2 lines) Done. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode106 net/base/escape.cc:106: const typename STR::value_type most_sig_digit( On 2014/02/27 04:43:04, Peter Kasting wrote: > This function should also check whether escaped_text[index] == '%'. > > You should also check whether escaped_text.size() <= (i + 2). Doing that check > here will allow simplifying the caller some; see below. Done. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode147 net/base/escape.cc:147: if (current_char == '%' && i + 2 < max) { On 2014/02/27 04:43:04, Peter Kasting wrote: > If you add the checks mentioned above, |max| can be eliminated entirely, and > this can become: > > unsigned char value; > if (UnescapeUnsignedCharAtIndex(escaped_text, i, &value)) { > ... Done. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode159 net/base/escape.cc:159: // kRightToLeftOverride = "%E2%80%AE" On 2014/02/27 04:43:04, Peter Kasting wrote: > Nit: Don't use kNames for things that are just comments. Also, give the Unicode > character constants and names for these: > > // Per http://tools.ietf.org/html/rfc3987#section-4.1, the following BiDi > // control characters are not allowed to appear unescaped in URLs: > // > // U+200E LEFT-TO-RIGHT MARK (%E2%80%8E) > // U+200F RIGHT-TO-LEFT MARK (%E2%80%8F) > // U+202A LEFT-TO-RIGHT EMBEDDING (%E2%80%AA) > // U+202B RIGHT-TO-LEFT EMBEDDING (%E2%80%AB) > // U+202C POP DIRECTIONAL FORMATTING (%E2%80%AC) > // U+202D LEFT-TO-RIGHT OVERRIDE (%E2%80%AD) > // U+202E RIGHT-TO-LEFT OVERRIDE (%E2%80%AE) Done. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode160 net/base/escape.cc:160: if (value == 0xE2 && i + 8 < max) { On 2014/02/27 04:43:04, Peter Kasting wrote: > If you add the checks mentioned above, you can eliminate all the length checks > here. > > You do need to add checks that the function call succeeded. (Those should have > been present anyway.) Done. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode167 net/base/escape.cc:167: value == 0x8F) { On 2014/02/27 04:43:04, Peter Kasting wrote: > Nit: Simpler: > > if ((value == 0x8E) || (value == 0x8F) || > ((value >= 0xAA) && (value <= 0xAE))) { ... Done. https://codereview.chromium.org/181483008/diff/1/net/base/escape.cc#newcode173 net/base/escape.cc:173: // Restore value if BiDi control character not found. On 2014/02/27 04:43:04, Peter Kasting wrote: > Prefer declaring a different temp to hold the second and third bytes instead of > overwriting |value| and then having to restore it here. Done.
https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h File base/i18n/rtl.h (right): https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h#newcode32 base/i18n/rtl.h:32: const char16 kPopDirectionalIsolate = 0x2069; Why are the changes in this file needed? https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:107: if (index + 2 < escaped_text.size()) { Nit: Use an early-return here, as well as on the IsHexDigit() checks below, to minimize indenting and braces. Parens around binary subexpr. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:108: char current_char = static_cast<char>(escaped_text[index]); Nit: Inline into the next statement. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:175: unsigned char tmp_value1; Nit: Just call this "temp" and use it for both tmp_value1 and tmp_value2; see below for how I'd do this. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:176: if (value == 0xD8) { Nit: Combine conditionals here and hoist the comment: // Check for ALM. unsigned char temp; if ((value == 0xD8) && UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &temp) && (temp == 0x9C)) { ... https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:180: result.append(escaped_text, i, 6); Nit: Indented too far https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:186: if (value == 0xE2) { Nit: Similarly: // Check for other BiDi control characters. if ((value == 0xE2) && UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &temp) && ((temp == 0x80) && UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && ((temp == 0x8E) || (temp == 0x8F) || ((temp >= 0xAA) && (temp <= 0xAE)))) || ((temp == 0x81) && UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && (temp >= 0xA6) && (temp <= 0xA9))) { ...
Let me know if you don't like the change in rtl.h - I will like to add it sooner or later, but won't insist for this change. https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h File base/i18n/rtl.h (right): https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h#newcode32 base/i18n/rtl.h:32: const char16 kPopDirectionalIsolate = 0x2069; On 2014/02/27 21:37:09, Peter Kasting wrote: > Why are the changes in this file needed? Not needed. It just seemed like a related thing to be updated. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:107: if (index + 2 < escaped_text.size()) { On 2014/02/27 21:37:09, Peter Kasting wrote: > Nit: Use an early-return here, as well as on the IsHexDigit() checks below, to > minimize indenting and braces. > > Parens around binary subexpr. Done. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:108: char current_char = static_cast<char>(escaped_text[index]); On 2014/02/27 21:37:09, Peter Kasting wrote: > Nit: Inline into the next statement. Done. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:175: unsigned char tmp_value1; On 2014/02/27 21:37:09, Peter Kasting wrote: > Nit: Just call this "temp" and use it for both tmp_value1 and tmp_value2; see > below for how I'd do this. Done. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:176: if (value == 0xD8) { On 2014/02/27 21:37:09, Peter Kasting wrote: > Nit: Combine conditionals here and hoist the comment: > > // Check for ALM. > unsigned char temp; > if ((value == 0xD8) && > UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &temp) && > (temp == 0x9C)) { > ... Done. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:180: result.append(escaped_text, i, 6); On 2014/02/27 21:37:09, Peter Kasting wrote: > Nit: Indented too far Done. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:186: if (value == 0xE2) { On 2014/02/27 21:37:09, Peter Kasting wrote: > Nit: Similarly: > > // Check for other BiDi control characters. > if ((value == 0xE2) && > UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &temp) && > ((temp == 0x80) && > UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && > ((temp == 0x8E) || (temp == 0x8F) || > ((temp >= 0xAA) && (temp <= 0xAE)))) || > ((temp == 0x81) && > UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && > (temp >= 0xA6) && (temp <= 0xA9))) { > ... I will pass on this change. I think compiler should be able to handle this much optimization.
https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h File base/i18n/rtl.h (right): https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h#newcode32 base/i18n/rtl.h:32: const char16 kPopDirectionalIsolate = 0x2069; On 2014/02/27 21:55:35, Anuj wrote: > On 2014/02/27 21:37:09, Peter Kasting wrote: > > Why are the changes in this file needed? > > Not needed. It just seemed like a related thing to be updated. We shouldn't declare constants we don't refer to. Let's only add something here if there's a file that references it. (Similarly, if any of the existing constants are never used, we should remove them.) https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:186: if (value == 0xE2) { On 2014/02/27 21:55:35, Anuj wrote: > On 2014/02/27 21:37:09, Peter Kasting wrote: > > Nit: Similarly: > > > > // Check for other BiDi control characters. > > if ((value == 0xE2) && > > UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &temp) && > > ((temp == 0x80) && > > UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && > > ((temp == 0x8E) || (temp == 0x8F) || > > ((temp >= 0xAA) && (temp <= 0xAE)))) || > > ((temp == 0x81) && > > UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && > > (temp >= 0xA6) && (temp <= 0xA9))) { > > ... > > > I will pass on this change. I think compiler should be able to handle this much > optimization. It's not a question of optimized code, it's a question of trying to read what's here. The use of additional temps makes this confusing and the variable names don't help. An alternate implementation that is similar to yours in structure but preserves the "shorter code" aspect would be to rename |value| to |first_byte|, |temp| to |second_byte|, and then do this: // Check for other BiDi control characters. if ((first_byte == 0xE2) && UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &second_byte) && ((second_byte == 0x80) || (second_byte == 0x81)) { unsigned char third_byte; if (UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &third_byte) && ((second_byte == 0x80) ? ((third_byte == 0x8E) || (third_byte == 0x8F) || ((third_byte >= 0xAA) && (third_byte <= 0xAE))) : ((third_byte >= 0xA6) && (third_byte <= 0xA9)))) { ...
https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h File base/i18n/rtl.h (right): https://codereview.chromium.org/181483008/diff/40001/base/i18n/rtl.h#newcode32 base/i18n/rtl.h:32: const char16 kPopDirectionalIsolate = 0x2069; On 2014/02/27 22:08:38, Peter Kasting wrote: > On 2014/02/27 21:55:35, Anuj wrote: > > On 2014/02/27 21:37:09, Peter Kasting wrote: > > > Why are the changes in this file needed? > > > > Not needed. It just seemed like a related thing to be updated. > > We shouldn't declare constants we don't refer to. Let's only add something here > if there's a file that references it. (Similarly, if any of the existing > constants are never used, we should remove them.) Done. https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/181483008/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:186: if (value == 0xE2) { On 2014/02/27 22:08:38, Peter Kasting wrote: > On 2014/02/27 21:55:35, Anuj wrote: > > On 2014/02/27 21:37:09, Peter Kasting wrote: > > > Nit: Similarly: > > > > > > // Check for other BiDi control characters. > > > if ((value == 0xE2) && > > > UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &temp) && > > > ((temp == 0x80) && > > > UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && > > > ((temp == 0x8E) || (temp == 0x8F) || > > > ((temp >= 0xAA) && (temp <= 0xAE)))) || > > > ((temp == 0x81) && > > > UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &temp) && > > > (temp >= 0xA6) && (temp <= 0xA9))) { > > > ... > > > > > > I will pass on this change. I think compiler should be able to handle this > much > > optimization. > > It's not a question of optimized code, it's a question of trying to read what's > here. The use of additional temps makes this confusing and the variable names > don't help. > > An alternate implementation that is similar to yours in structure but preserves > the "shorter code" aspect would be to rename |value| to |first_byte|, |temp| to > |second_byte|, and then do this: > > // Check for other BiDi control characters. > if ((first_byte == 0xE2) && > UnescapeUnsignedCharAtIndex(escaped_text, i + 3, &second_byte) && > ((second_byte == 0x80) || (second_byte == 0x81)) { > unsigned char third_byte; > if (UnescapeUnsignedCharAtIndex(escaped_text, i + 6, &third_byte) && > ((second_byte == 0x80) ? > ((third_byte == 0x8E) || (third_byte == 0x8F) || > ((third_byte >= 0xAA) && (third_byte <= 0xAE))) : > ((third_byte >= 0xA6) && (third_byte <= 0xA9)))) { > ... I made the exact same change just as you sent this comment :)
Lgtm on the security front. Anuj, thank you very much for taking care of this, and Peter thanks for handling the detailed review.
LGTM
The CQ bit was checked by skanuj@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/skanuj@chromium.org/181483008/80001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: mac_chromium_compile_dbg
The CQ bit was checked by jschuh@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/skanuj@chromium.org/181483008/80001
The CQ bit was unchecked by jschuh@chromium.org
The CQ bit was checked by jschuh@chromium.org
The CQ bit was unchecked by commit-bot@chromium.org
Retried try job too often on chromium_presubmit for step(s) presubmit http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=chromium_p...
@brettw - would you mind rubber-stamping this for base? it was already reviewed?
Oops, @brett may have wrote this originally but he isn't the owner. Adding @sleevi as a TBR since this has been thoroughly reviewed already. He can yell at me later if he wants.
The CQ bit was checked by jschuh@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/skanuj@chromium.org/181483008/80001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/skanuj@chromium.org/181483008/80001
The CQ bit was unchecked by commit-bot@chromium.org
Retried try job too often on win_rel for step(s) unit_tests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
The CQ bit was checked by jschuh@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/skanuj@chromium.org/181483008/80001
Message was sent while issue was closed.
Change committed as 254091 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
