|
|
DescriptionUnescape BiDi control chars while parsing data: urls
The fix for bug 337746 prevented unescaping of BiDi control
characters in URLs. This breaks the loading of data: URLs
because BiDi control chars appear escaped in the loaded HTML.
This patch adds a special case for the parsing of BiDi control
chars. This shouldn't change the way URLs are shown in the omnibox
or any other UI. URLs with BiDi control characters are always
displayed as escaped in the omnibox. This behavior is also
consistent with Firefox.
BUG=423901
Committed: https://crrev.com/c62e2eeb225afbc123db27c2d3500a7bbbab6a48
Cr-Commit-Position: refs/heads/master@{#300584}
Patch Set 1 #Patch Set 2 : Fix build #Patch Set 3 : Add a browsertest #
Total comments: 22
Patch Set 4 : mmenke and tsepez comments #Patch Set 5 : Fix test comment #
Total comments: 4
Patch Set 6 : Inline IsLastTwoBytesofThreeByteBidiControlChar #
Total comments: 6
Patch Set 7 : Remove BIDI_CONTROL_CHARS flag #
Total comments: 2
Patch Set 8 : Fix comment and rebase #
Messages
Total messages: 29 (6 generated)
meacer@chromium.org changed reviewers: + mmenke@chromium.org
Matt, since you commented on the bug, do you mind taking a look? :) Thanks!
On 2014/10/16 23:42:55, Mustafa Emre Acer wrote: > Matt, since you commented on the bug, do you mind taking a look? :) Thanks! May not get to this until Monday. I'm happy to do an initial review, but I think we should both have a security reviewer and someone who knows URLs better than I (A src/URL owner) take a pass as well, to make sure this is reasonable. I admit to being paranoid, particularly since this is related to a decision made for security reasons.
On 2014/10/16 23:55:32, mmenke wrote: > On 2014/10/16 23:42:55, Mustafa Emre Acer wrote: > > Matt, since you commented on the bug, do you mind taking a look? :) Thanks! > > May not get to this until Monday. Sure, no worries. > > I'm happy to do an initial review, but I think we should both have a security > reviewer and someone who knows URLs better than I (A src/URL owner) take a pass > as well, to make sure this is reasonable. I admit to being paranoid, > particularly since this is related to a decision made for security reasons. I'll add two more reviewers.
meacer@chromium.org changed reviewers: + brettw@chromium.org, tsepez@chromium.org
brettw for url related parts tsepez for security sanity check Could you please take a look?
This looks pretty reasonable - hadn't realized you were just going to do this when returning the body for the data URL, as opposed to changing stuff at the GURL level - this approach alleviates my concerns, but the extra scrutiny can't hurt. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... File chrome/browser/ui/browser_navigator_browsertest.cc (right): https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1382: IN_PROC_BROWSER_TEST_F(BrowserNavigatorTest, Not sure if this is the best browsertest file for this test, but otherwise, tests looks reasonable. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1385: ASSERT_TRUE(test_server()->Start()); Not needed https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1390: GURL data_url("data:text/html;charset=utf-8, <html><title>" + title + Is the space before HTML needed? https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1393: // have the title with the escaped RTL mark. If we're expect something not to be escape, shouldn't we store it in a string, and compare it to p.target_contents->GetURL().spec()? https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1399: chrome::NavigateParams p(MakeNavigateParams()); This name violates the google style guide. Suggest either params or navigate_params. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1411: nit: Remove extra line break. https://codereview.chromium.org/643963004/diff/40001/net/base/data_url_unitte... File net/base/data_url_unittest.cc (right): https://codereview.chromium.org/643963004/diff/40001/net/base/data_url_unitte... net/base/data_url_unittest.cc:191: "data:text/html;charset=utf-8,<b>\xE2\x80\x8Ftest</b>", Any reason not to just use text/plain? Don't think using HTML gets us a whole lot, not that it really hurts, either. https://codereview.chromium.org/643963004/diff/40001/net/base/data_url_unitte... net/base/data_url_unittest.cc:208: "data:text/html;charset=utf-8,<b>%E2%80%8Ftest</b>", When this is wrapped in a GURL, this becomes the same thing as the first test, right? Fine with testing it both ways, just want to make sure I'm understanding. If so, think it's worth noting in a comment.
Nothing but nits. But be sure to get buy-in from Brett, he's brought up some subtleties in the past when I've been playing with URLs. https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:176: // However, not unescaping these characters in data urls result in Nit: "Not unescaping" is a double negative. Maybe just say "escaping". Also prefer "data:" rather than "data". The comment also ties a specific notion (data urls) to a generic mechanism (BIDI_CONTROL_CHARACTER flag). The danger is that if we change the places that call this, then the comment becomes incorrect. I'm really not sure what to say here, other than there might need to be a comment about "DO NOT use BIDI_CONTROL_CHARS flag without talking to a security person" somewhere. Feel free to ignore all this. https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:182: // Check for ALM. Nit: expand ALM to Arabic Letter Mark. https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:183: if ((first_byte == 0xD8) && Nit: It took me longer to understand this code than I would have liked. Perhaps a pair of inline bools IsTwoByteBiDiControlChar() and IsThreeByteBidiControlChar() might make this clearer, as that gets you away from a single logical expression using ? && || etc. and allows for some block structure, early returns, etc.
https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... File chrome/browser/ui/browser_navigator_browsertest.cc (right): https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1382: IN_PROC_BROWSER_TEST_F(BrowserNavigatorTest, On 2014/10/17 15:24:43, mmenke wrote: > Not sure if this is the best browsertest file for this test, but otherwise, > tests looks reasonable. I looked at other browsertests but couldn't find anything data: url related. This seemed to be the best place. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1385: ASSERT_TRUE(test_server()->Start()); On 2014/10/17 15:24:43, mmenke wrote: > Not needed Copy/paste artifact from previous test case. Removed. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1390: GURL data_url("data:text/html;charset=utf-8, <html><title>" + title + On 2014/10/17 15:24:43, mmenke wrote: > Is the space before HTML needed? Removed. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1393: // have the title with the escaped RTL mark. On 2014/10/17 15:24:43, mmenke wrote: > If we're expect something not to be escape, shouldn't we store it in a string, > and compare it to p.target_contents->GetURL().spec()? Done. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1399: chrome::NavigateParams p(MakeNavigateParams()); On 2014/10/17 15:24:43, mmenke wrote: > This name violates the google style guide. Suggest either params or > navigate_params. Another copy paste artifact. Fixed other test cases too. https://codereview.chromium.org/643963004/diff/40001/chrome/browser/ui/browse... chrome/browser/ui/browser_navigator_browsertest.cc:1411: On 2014/10/17 15:24:43, mmenke wrote: > nit: Remove extra line break. Done. https://codereview.chromium.org/643963004/diff/40001/net/base/data_url_unitte... File net/base/data_url_unittest.cc (right): https://codereview.chromium.org/643963004/diff/40001/net/base/data_url_unitte... net/base/data_url_unittest.cc:191: "data:text/html;charset=utf-8,<b>\xE2\x80\x8Ftest</b>", On 2014/10/17 15:24:43, mmenke wrote: > Any reason not to just use text/plain? Don't think using HTML gets us a whole > lot, not that it really hurts, either. Done. https://codereview.chromium.org/643963004/diff/40001/net/base/data_url_unitte... net/base/data_url_unittest.cc:208: "data:text/html;charset=utf-8,<b>%E2%80%8Ftest</b>", On 2014/10/17 15:24:44, mmenke wrote: > When this is wrapped in a GURL, this becomes the same thing as the first test, > right? Fine with testing it both ways, just want to make sure I'm > understanding. If so, think it's worth noting in a comment. Added a note. https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:176: // However, not unescaping these characters in data urls result in On 2014/10/17 17:08:48, Tom Sepez wrote: > Nit: "Not unescaping" is a double negative. Maybe just say "escaping". Also > prefer "data:" rather than "data". > > The comment also ties a specific notion (data urls) to a generic mechanism > (BIDI_CONTROL_CHARACTER flag). The danger is that if we change the places that > call this, then the comment becomes incorrect. This is true. I wanted to test if the actual url being parsed is a data: url but unfortunately there is no way to know about it at this point in the code. We'll have to rely on callers to do the right thing. > > I'm really not sure what to say here, other than there might need to be a > comment about "DO NOT use BIDI_CONTROL_CHARS flag without talking to a security > person" somewhere. There was a comment to that effect in escape.h. Making it clearer and adding it here as well. > > Feel free to ignore all this. https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:182: // Check for ALM. On 2014/10/17 17:08:49, Tom Sepez wrote: > Nit: expand ALM to Arabic Letter Mark. Done. https://codereview.chromium.org/643963004/diff/40001/net/base/escape.cc#newco... net/base/escape.cc:183: if ((first_byte == 0xD8) && On 2014/10/17 17:08:49, Tom Sepez wrote: > Nit: It took me longer to understand this code than I would have liked. Perhaps > a pair of inline bools IsTwoByteBiDiControlChar() and > IsThreeByteBidiControlChar() might make this clearer, as that gets you away from > a single logical expression using ? && || etc. and allows for some block > structure, early returns, etc. Pulled these into methods (with somewhat questionable method names).
lgtm
https://codereview.chromium.org/643963004/diff/80001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/643963004/diff/80001/net/base/escape.cc#newco... net/base/escape.cc:140: unsigned char third_byte) { This function name is a lie. Its actually IsLastTwoBytesofThreeByteBidiControlCharIfSecondByteIs0x80Or0x81. Given that, I think it's better to just inline it. https://codereview.chromium.org/643963004/diff/80001/net/base/escape.cc#newco... net/base/escape.cc:145: } DCHECK_EQ(0x81, second_byte)? If this is a separate function, as you have it, it's a sanity check, if you inline the function as I suggest above, it's more of a reminder that second_byte != 0x80 implies second_byte == 0x81
https://codereview.chromium.org/643963004/diff/80001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/643963004/diff/80001/net/base/escape.cc#newco... net/base/escape.cc:140: unsigned char third_byte) { On 2014/10/20 15:46:40, mmenke wrote: > This function name is a lie. Its actually > IsLastTwoBytesofThreeByteBidiControlCharIfSecondByteIs0x80Or0x81. > > Given that, I think it's better to just inline it. Sounds reasonable, inlined. https://codereview.chromium.org/643963004/diff/80001/net/base/escape.cc#newco... net/base/escape.cc:145: } On 2014/10/20 15:46:40, mmenke wrote: > DCHECK_EQ(0x81, second_byte)? If this is a separate function, as you have it, > it's a sanity check, if you inline the function as I suggest above, it's more of > a reminder that second_byte != 0x80 implies second_byte == 0x81 Done.
LGTM, but should wait for brettw to weigh in.
https://codereview.chromium.org/643963004/diff/100001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/643963004/diff/100001/net/base/escape.cc#newc... net/base/escape.cc:214: // However, escaping these characters in data: urls result in This sentence doesn't make sense to me. Here, we're unescaping characters but this comment talks about escaping them (since they're not ASCII, they will always be escaped when converted to a data URL). The old code was avoiding unescaping them for security reasons. The new code unescapes them sometimes when converting to the actual data, but not for display purposes. The comment should make this clear. https://codereview.chromium.org/643963004/diff/100001/net/base/escape.cc#newc... net/base/escape.cc:215: // escaped BiDi control characters being displayed in the rendered html, The use of the word "render" in this patch (both here and in the description) is confusing to me. I think of rendering a URL as printing it to the user, as in the omnibox. Maybe we can use the phrase "load the data URL" or something. To blink, data URLs are loaded like any other URL. https://codereview.chromium.org/643963004/diff/100001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/643963004/diff/100001/net/base/escape.h#newco... net/base/escape.h:92: CONTROL_CHARS = 8, I wonder if adding the new flag is necessary? Can we just call these bidi things control characters? It seems there are only two cases: you want to display the URL in some safe way (no control chars, no bidi), or you want to get the binary data out of the URL and don't care what's in there (you want everything). Also consider that many data URLs will be binary data like images. There might be bidi chars in there by random chance.
https://codereview.chromium.org/643963004/diff/100001/net/base/escape.cc File net/base/escape.cc (right): https://codereview.chromium.org/643963004/diff/100001/net/base/escape.cc#newc... net/base/escape.cc:214: // However, escaping these characters in data: urls result in On 2014/10/20 18:04:42, brettw wrote: > This sentence doesn't make sense to me. Here, we're unescaping characters but > this comment talks about escaping them (since they're not ASCII, they will > always be escaped when converted to a data URL). The old code was avoiding > unescaping them for security reasons. The new code unescapes them sometimes when > converting to the actual data, but not for display purposes. The comment should > make this clear. Done. https://codereview.chromium.org/643963004/diff/100001/net/base/escape.cc#newc... net/base/escape.cc:215: // escaped BiDi control characters being displayed in the rendered html, On 2014/10/20 18:04:42, brettw wrote: > The use of the word "render" in this patch (both here and in the description) is > confusing to me. I think of rendering a URL as printing it to the user, as in > the omnibox. Maybe we can use the phrase "load the data URL" or something. To > blink, data URLs are loaded like any other URL. Edited CL description to make it clear. https://codereview.chromium.org/643963004/diff/100001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/643963004/diff/100001/net/base/escape.h#newco... net/base/escape.h:92: CONTROL_CHARS = 8, On 2014/10/20 18:04:42, brettw wrote: > I wonder if adding the new flag is necessary? Can we just call these bidi things > control characters? It seems there are only two cases: you want to display the > URL in some safe way (no control chars, no bidi), or you want to get the binary > data out of the URL and don't care what's in there (you want everything). Also > consider that many data URLs will be binary data like images. There might be > bidi chars in there by random chance. Good point. I removed the new flag and made unescaping Bidi control chars to be controlled by CONTROL_CHARS flag. Matt's question about SafeBrowsing still remains though.
LGTM, I didn't check the low-level logic.
On 2014/10/21 06:11:16, brettw wrote: > LGTM, I didn't check the low-level logic. I did validate it wasn't changed (Which the exception of the added flag), and it still LGTM.
meacer@chromium.org changed reviewers: + msw@chromium.org
msw, can you please OWNERS review chrome/browser/ui/browser_navigator_browsertest.cc?
c/b/ui lgtm with a comment nit. https://codereview.chromium.org/643963004/diff/120001/chrome/browser/ui/brows... File chrome/browser/ui/browser_navigator_browsertest.cc (right): https://codereview.chromium.org/643963004/diff/120001/chrome/browser/ui/brows... chrome/browser/ui/browser_navigator_browsertest.cc:1383: // escaped in the URL but they should be unescaped in the rendered page. nit: this test doesn't seem to check anything about the rendered page...
Thanks all. https://codereview.chromium.org/643963004/diff/120001/chrome/browser/ui/brows... File chrome/browser/ui/browser_navigator_browsertest.cc (right): https://codereview.chromium.org/643963004/diff/120001/chrome/browser/ui/brows... chrome/browser/ui/browser_navigator_browsertest.cc:1383: // escaped in the URL but they should be unescaped in the rendered page. On 2014/10/21 19:19:24, msw wrote: > nit: this test doesn't seem to check anything about the rendered page... Changed to "loaded HTML".
The CQ bit was checked by meacer@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/643963004/140001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_rel_swarming on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by meacer@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/643963004/140001
Message was sent while issue was closed.
Committed patchset #8 (id:140001)
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/c62e2eeb225afbc123db27c2d3500a7bbbab6a48 Cr-Commit-Position: refs/heads/master@{#300584} |