|
|
Created:
5 years, 3 months ago by Peter Kasting Modified:
5 years, 3 months ago Reviewers:
brettw CC:
chromium-reviews Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionCorrectly handle problematic nested escapes in URL paths.
Specifically, if unescaping in the input leads to the output URL
containing a new escaped sequence, e.g. converting the input "%%30%30" to "%00", escape the leading '%' as "%25" to ensure the output sequence is not treated as a new valid escape sequence.
This ensures that canonicalizing the same URL a second time won't make changes
to it, which is important for avoiding crashes and other bugs in a variety of
places in both debug and release builds.
BUG=533361
TEST=Visit http://andrisatteka.blogspot.com/2015/09/a-simple-string-to-crash-google-chrome.html , hover the link there, Chrome should not crash.
Committed: https://crrev.com/6982036dc15ba85cdbf7339265f030ff7582ff3b
Cr-Commit-Position: refs/heads/master@{#350086}
Patch Set 1 #Patch Set 2 : New behavior #
Total comments: 8
Patch Set 3 : Review comments #
Total comments: 2
Messages
Total messages: 23 (6 generated)
pkasting@chromium.org changed reviewers: + brettw@chromium.org
I'm not sure whether this is the most performant or elegant way to implement this, but it seems to work, and I think the concept makes sense. See also comments on bug.
I only looked at this briefly, will think about it in more detail tomorrow morning. I think it's likely this is the direction we want to go in. Does it make sense to move the % handling in that function to a helper (it's getting kind of out of control) or is there too much local context to make that look clean?
I only looked at this briefly, will think about it in more detail tomorrow morning. I think it's likely this is the direction we want to go in. Does it make sense to move the % handling in that function to a helper (it's getting kind of out of control) or is there too much local context to make that look clean?
On 2015/09/21 05:10:33, brettw wrote: > I only looked at this briefly, will think about it in more detail tomorrow > morning. I think it's likely this is the direction we want to go in. > > Does it make sense to move the % handling in that function to a helper (it's > getting kind of out of control) or is there too much local context to make that > look clean? It could maybe be moved to a helper. Haven't thought about it. As I was looking at the function more, I see that it already detects this sequence as invalid -- look in the old code at what we'd do if DecodeEscaped() failed. The comment there talks about "IE7 rejects this, IE6 etc. pass it through unchanged" and suggests that if we have problems we should consider more aggressively rejecting these URLs. That would be another, faster and simpler solution to this. Yet another solution, as suggested on the bug in comment 37, would be to detect that we'd unescaped to a sequence that was itself another escape sequence, and instead of rejecting the unescaping, replace the initial '%' in this latter sequence with an escaped '%' ("%25"). This would arguably represent the original URL "the best" of any solution here. Think carefully among these three options -- it's not obvious to me what the best answer is. In the limit I suppose we could get Anne's opinion, since he's probably done the most thinking about URLs in the last year or two.
New snap up with agreed-upon behavior, PTAL. I went ahead and added the variable to remember the last problematic '%' to minimize the performance impact during heavy unescaping. This wasn't hard and ended up making a few places a bit nicer. I also moved the complicated bits I'm adding to a new helper function for readability.
Still double-checking the rest... https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_path.cc File url/url_canon_path.cc (right): https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_path.cc#n... url/url_canon_path.cc:211: if ((next_input_index == input_len) || !Is8BitChar(spec[next_input_index])) I think the Is8BitChar can actually be a 7-bit test. In DoPartialPath below it does "(sizeof(CHAR) > sizeof(char) && uch >= 0x80)" which I think can be the check here as well. https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_unittest.cc File url/url_canon_unittest.cc (right): https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_unittest.... url/url_canon_unittest.cc:1003: #if 0 Left by mistake? https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_unittest.... url/url_canon_unittest.cc:1066: {"/%0%30", L"/%0%30", "/%2500", Component(0, 6), true}, Can you make these cases use different characters for each possible character rather than "%30" and "0", to prove we're not doing something silly duplicating one of them to get the output?
lgtm https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_path.cc File url/url_canon_path.cc (right): https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_path.cc#n... url/url_canon_path.cc:199: const char last_unescaped_char = output->at(length - 1); Can you rename this to last_appended_char or something? It may or may not have just been unescaped, so I found this name initially confusing.
Take one more look, in particular at my answer to your question about |last_unescaped_char|. https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_path.cc File url/url_canon_path.cc (right): https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_path.cc#n... url/url_canon_path.cc:199: const char last_unescaped_char = output->at(length - 1); On 2015/09/21 22:37:06, brettw wrote: > Can you rename this to last_appended_char or something? It may or may not have > just been unescaped, so I found this name initially confusing. This character is guaranteed to have just been unescaped. I'm not sure when you were thinking it wasn't? (|last_appended_character| would not only be less accurate, it would be very confusing in the |append_next_char| case when we append a new, non-unescaped character after this.) https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_path.cc#n... url/url_canon_path.cc:211: if ((next_input_index == input_len) || !Is8BitChar(spec[next_input_index])) On 2015/09/21 22:28:34, brettw wrote: > I think the Is8BitChar can actually be a 7-bit test. In DoPartialPath below it > does "(sizeof(CHAR) > sizeof(char) && uch >= 0x80)" which I think can be the > check here as well. We can actually do better than that: just check for the character being >= 0x80 without looking at sizeof(CHAR), as it's also safe to reject non-7-bit characters in the 8-bit input case. Also changed "sizeof(char)" to "1" (guaranteed by the C standard) below while I was looking at this. https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_unittest.cc File url/url_canon_unittest.cc (right): https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_unittest.... url/url_canon_unittest.cc:1003: #if 0 On 2015/09/21 22:28:34, brettw wrote: > Left by mistake? Yeah, oops :( https://codereview.chromium.org/1358433004/diff/20001/url/url_canon_unittest.... url/url_canon_unittest.cc:1066: {"/%0%30", L"/%0%30", "/%2500", Component(0, 6), true}, On 2015/09/21 22:28:34, brettw wrote: > Can you make these cases use different characters for each possible character > rather than "%30" and "0", to prove we're not doing something silly duplicating > one of them to get the output? Done.
lgtm https://codereview.chromium.org/1358433004/diff/40001/url/url_canon_path.cc File url/url_canon_path.cc (right): https://codereview.chromium.org/1358433004/diff/40001/url/url_canon_path.cc#n... url/url_canon_path.cc:199: const char last_unescaped_char = output->at(length - 1); I agree my suggestion is also confusing. I don't actually care that much, so if you don't see an obvious better suggestion, it's fine with me. I was thinking of the case where the input is "%z". The "z" itself was not unescaped, which I initially thought this was referring. https://codereview.chromium.org/1358433004/diff/40001/url/url_canon_path.cc#n... url/url_canon_path.cc:211: if ((next_input_index == input_len) || (spec[next_input_index] >= 0x80)) If this compiles, it's fine, but I was concerned you might get a warning from some systems about a condition always being true when CHAR == signed char.
On 2015/09/21 23:25:19, brettw wrote: > I agree my suggestion is also confusing. I don't actually care that much, so if > you don't see an obvious better suggestion, it's fine with me. I was thinking of > the case where the input is "%z". The "z" itself was not unescaped, which I > initially thought this was referring. I misunderstood the code, the current name seems fine.
The CQ bit was checked by pkasting@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1358433004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1358433004/40001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: mac_chromium_rel_ng on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...) win8_chromium_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win8_chromium_ng/...)
The CQ bit was checked by pkasting@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1358433004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1358433004/40001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: mac_chromium_rel_ng on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by pkasting@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1358433004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1358433004/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/6982036dc15ba85cdbf7339265f030ff7582ff3b Cr-Commit-Position: refs/heads/master@{#350086} |