|
|
DescriptionFileURLToFilePath: Don't unescape '/' and '\\'.
GURL leaves these escaped, and unescaping them in paths changes the
meaning of the path.
Added two values to the UnescapeRule enumeration:
PATH_SEPARATORS and URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS.
In followup CLs, I intend to replace all uses of URL_SPECIAL_CHARS,
in favor of one or both the two new values, and eventually remove
the value, as it's easily to use in an unsafe manner.
BUG=586657
Committed: https://crrev.com/30408ae67a9f6aea074b2883ba861613f52bd246
Cr-Commit-Position: refs/heads/master@{#377013}
Patch Set 1 #Patch Set 2 : Add test #
Total comments: 1
Patch Set 3 : Fix typo #
Total comments: 7
Patch Set 4 : Response to comments #Patch Set 5 : Add constant, fix spelling, add test, remove smewhat tangential test added earlier #
Total comments: 3
Patch Set 6 : Made comment less scary #
Messages
Total messages: 25 (10 generated)
Description was changed from ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path (And also breaks GURL's canonicalization). BUG=586657 ========== to ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. May be worth considering have GURL unescape these characters in file URLs, though that could lead to other issues. Unescaping characters that GURL doesn't escape in the first place also seems to break canonicalization (If two URLs are mapped to the same file path via FileURLToFilePath, but have different GURLs, seems like they aren't canonicalized...). BUG=586657 ==========
Description was changed from ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. May be worth considering have GURL unescape these characters in file URLs, though that could lead to other issues. Unescaping characters that GURL doesn't escape in the first place also seems to break canonicalization (If two URLs are mapped to the same file path via FileURLToFilePath, but have different GURLs, seems like they aren't canonicalized...). BUG=586657 ========== to ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. May be worth considering have GURL unescape these characters in file URLs, though that could lead to other issues. Unescaping characters that GURL doesn't escape in the first place also seems to break canonicalization (If two URLs are canonicalized to different file URLs by GURL, but mapped to the same file path via FileURLToFilePath, seems like that rather contradicts the definition of canonicalization). Suspect we have a lot of other issues there, though. BUG=586657 ==========
mmenke@chromium.org changed reviewers: + eroman@chromium.org
Eric: Not really sure where else to send this. Open to ideas (Both for whom I should send it to, and for how to solve the underlying bug). Suppose I could send it to brettw. https://codereview.chromium.org/1704163003/diff/20001/net/base/filename_util_... File net/base/filename_util_unittest.cc (right): https://codereview.chromium.org/1704163003/diff/20001/net/base/filename_util_... net/base/filename_util_unittest.cc:803: L"--test.html"}, This isn't exactly related to the fix. There's another use of URL_SPECIAL_CHARS a in GetFileNameFromURL, and this tests that we correctly sanitize it (There are tests here when these special characters are in the filename field already, but no tests for them when they come from URLs).
Description was changed from ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. May be worth considering have GURL unescape these characters in file URLs, though that could lead to other issues. Unescaping characters that GURL doesn't escape in the first place also seems to break canonicalization (If two URLs are canonicalized to different file URLs by GURL, but mapped to the same file path via FileURLToFilePath, seems like that rather contradicts the definition of canonicalization). Suspect we have a lot of other issues there, though. BUG=586657 ========== to ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. May be worth considering making GURL unescape these characters in file URLs, though that could lead to other issues. Unescaping characters that GURL doesn't escape in the first place also seems to break canonicalization (If two URLs are canonicalized to different file URLs by GURL, but mapped to the same file path via FileURLToFilePath, seems like that rather contradicts the definition of canonicalization). Suspect we have a lot of other issues there, though. BUG=586657 ==========
Eric: I'll plan to pick this back up, re-working the enum (Split URL_SPECIAL_CHARS in two, keeping URL_SPECIAL_CHARS around, but planning to remove it, after modifying all callsites to use one or both of the new values, which I will do in followup CLs), unless you want to push back more on the alternative approach of modifying GURL. Modifying GURL is kinda terrifying - the code is so interleaved, with a bunch of paths. I'm a bit concerned about doing it correctly...I still think that may be worth doing, but I'm starting to think we also want to split URL_SPECIAL_CHARS in two, since we still need to keep these characters escaped in non-file URLs, and I'm concerned about other uses of URL_SPECIAL_CHARS. Sound reasonable?
lgtm https://codereview.chromium.org/1704163003/diff/40001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/1704163003/diff/40001/net/base/escape.h#newco... net/base/escape.h:99: // that's what FireFox and IE do. Worth noting the obvious, that there are compatibility and web-security ramifications to changing this for general URLs. Apache for instance by default does not unescape these in paths: http://httpd.apache.org/docs/current/mod/core.html#allowencodedslashes Web servers may be checking for directory escaping in urls by just ../ and not the escaped variants, so if Chrome started transforming could be problematic. As I understand this comment though it is specifically for file URLs, so sounds fine. https://codereview.chromium.org/1704163003/diff/40001/net/base/escape.h#newco... net/base/escape.h:109: SPOOFING_AND_CONTROL_CHARS = 16, optional style: I find the 1 << X notation simpler to edit for bit flags. https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... File net/base/filename_util_unittest.cc (right): https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... net/base/filename_util_unittest.cc:244: {L"C:\\foo%2f..%5cbar", "file:///C:\\foo%2f..%5cbar"}, What are the unescaping rules for %2e (dot) ?
I think we have some disagreement here, and I want to reach some consensus on where we're going before I land this CL. https://codereview.chromium.org/1704163003/diff/40001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/1704163003/diff/40001/net/base/escape.h#newco... net/base/escape.h:99: // that's what FireFox and IE do. On 2016/02/18 19:37:28, eroman wrote: > Worth noting the obvious, that there are compatibility and web-security > ramifications to changing this for general URLs. > > Apache for instance by default does not unescape these in paths: > http://httpd.apache.org/docs/current/mod/core.html#allowencodedslashes > > Web servers may be checking for directory escaping in urls by just ../ and not > the escaped variants, so if Chrome started transforming could be problematic. > > As I understand this comment though it is specifically for file URLs, so sounds > fine. Actually, I'm thinking that the compatibility and web-security ramifications are more around *not* using this for URLs (Of any type), and that's where the scary comment should go instead. Unescaping slashes on GURLs or GURL components for any reason seems like something that's generally not a good idea, with the exception of Javascript URLs. We mostly unescape for either display, or picking local file names (On file or http URLs). For display, it changes the semantic meaning of a URL, so seems like a bad idea. For local files, it seems like a potential attack vector, regardless of whether we're talking file or http URLs. I think we should get rid of URL_SPECIAL_CHARS in favor of this, and a separate flag just for path characters - I've added a TODO about that, and a warning to URL_SPECIAL_CHARS. https://codereview.chromium.org/1704163003/diff/40001/net/base/escape.h#newco... net/base/escape.h:109: SPOOFING_AND_CONTROL_CHARS = 16, On 2016/02/18 19:37:28, eroman wrote: > optional style: I find the 1 << X notation simpler to edit for bit flags. SGTM, done. https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... File net/base/filename_util_unittest.cc (right): https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... net/base/filename_util_unittest.cc:244: {L"C:\\foo%2f..%5cbar", "file:///C:\\foo%2f..%5cbar"}, On 2016/02/18 19:37:28, eroman wrote: > What are the unescaping rules for %2e (dot) ? Dots are unescaped in paths, since they have no magic meaning http or file URLs (They may have a magic meaning to the OS for how to handle opening, but that's another matter). The periods removed in the other test I added are removed because of magic code to sanitize a file name for download.
https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... File net/base/filename_util_unittest.cc (right): https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... net/base/filename_util_unittest.cc:244: {L"C:\\foo%2f..%5cbar", "file:///C:\\foo%2f..%5cbar"}, On 2016/02/18 20:30:22, mmenke wrote: > On 2016/02/18 19:37:28, eroman wrote: > > What are the unescaping rules for %2e (dot) ? > > Dots are unescaped in paths, since they have no magic meaning http or file URLs > (They may have a magic meaning to the OS for how to handle opening, but that's > another matter). > > The periods removed in the other test I added are removed because of magic code > to sanitize a file name for download. But dots do have a magical meaning, which is no less magical than slashes and backslashes IMO. GURL will collapse paths during canonicalization, so for instance foo/bar/../x ---> foo/x Right? This will also occur when the dots are percent escaped, as in %2e%2e. So from a policy perspective I still don't really understand why we would choose to unescape dots, but not slashes and backslashes. Both have implications for how file paths are opened on disk. I get that this CL is trying to at least match the consistency from the point of view of GURL, and LGTM from that side. But as a larger policy I don't quite understand why special preference for slashes but not dots.
On 2016/02/18 21:17:22, eroman wrote: > https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... > File net/base/filename_util_unittest.cc (right): > > https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... > net/base/filename_util_unittest.cc:244: {L"C:\\foo%2f..%5cbar", > "file:///C:\\foo%2f..%5cbar"}, > On 2016/02/18 20:30:22, mmenke wrote: > > On 2016/02/18 19:37:28, eroman wrote: > > > What are the unescaping rules for %2e (dot) ? > > > > Dots are unescaped in paths, since they have no magic meaning http or file > URLs > > (They may have a magic meaning to the OS for how to handle opening, but that's > > another matter). > > > > The periods removed in the other test I added are removed because of magic > code > > to sanitize a file name for download. > > But dots do have a magical meaning, which is no less magical than slashes and > backslashes IMO. > > GURL will collapse paths during canonicalization, so for instance > foo/bar/../x ---> foo/x > > Right? Those are dots surrounded by slashes, so I consider that more a property of the slash than the dot. > This will also occur when the dots are percent escaped, as in %2e%2e. > > So from a policy perspective I still don't really understand why we would choose > to unescape dots, but not slashes and backslashes. "." is not a reserved character in URLs, "/" is (https://www.ietf.org/rfc/rfc2396.txt, see section 2.2), so "/" has to be kept escaped, unlike ".", or it will change the meaning of a URL. http://blah/foo%2e is a file or directory named "foo.", which is the same as "http://blah/foo.", while http://blah/foo%2f" is a file named "foo/", but that is *not* the same as http://blah/foo%2f. Also, https://blah/foo/../ and https://blah/foo/%2e%2e/ both mean the same thing as https://blah/. > Both have implications for how file paths are opened on disk. > > I get that this CL is trying to at least match the consistency from the point of > view of GURL, and LGTM from that side. But as a larger policy I don't quite > understand why special preference for slashes but not dots.
On 2016/02/18 22:11:06, mmenke wrote: > On 2016/02/18 21:17:22, eroman wrote: > > > https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... > > File net/base/filename_util_unittest.cc (right): > > > > > https://codereview.chromium.org/1704163003/diff/40001/net/base/filename_util_... > > net/base/filename_util_unittest.cc:244: {L"C:\\foo%2f..%5cbar", > > "file:///C:\\foo%2f..%5cbar"}, > > On 2016/02/18 20:30:22, mmenke wrote: > > > On 2016/02/18 19:37:28, eroman wrote: > > > > What are the unescaping rules for %2e (dot) ? > > > > > > Dots are unescaped in paths, since they have no magic meaning http or file > > URLs > > > (They may have a magic meaning to the OS for how to handle opening, but > that's > > > another matter). > > > > > > The periods removed in the other test I added are removed because of magic > > code > > > to sanitize a file name for download. > > > > But dots do have a magical meaning, which is no less magical than slashes and > > backslashes IMO. > > > > GURL will collapse paths during canonicalization, so for instance > > foo/bar/../x ---> foo/x > > > > Right? > > Those are dots surrounded by slashes, so I consider that more a property of the > slash than the dot. > > > This will also occur when the dots are percent escaped, as in %2e%2e. > > > > So from a policy perspective I still don't really understand why we would > choose > > to unescape dots, but not slashes and backslashes. > > "." is not a reserved character in URLs, "/" is > (https://www.ietf.org/rfc/rfc2396.txt, see section 2.2), so "/" has to be kept > escaped, unlike ".", or it will change the meaning of a URL. > > http://blah/foo%2e is a file or directory named "foo.", which is the same as > "http://blah/foo.", while http://blah/foo%2f" is a file named "foo/", but that > is *not* the same as http://blah/foo%2f. Not the same as http://blah/foo/, rather. > Also, https://blah/foo/../ and https://blah/foo/%2e%2e/ both mean the same thing > as https://blah/. > > > Both have implications for how file paths are opened on disk. > > > > I get that this CL is trying to at least match the consistency from the point > of > > view of GURL, and LGTM from that side. But as a larger policy I don't quite > > understand why special preference for slashes but not dots.
lgtm, but for a more pro review you would want brettw :)
On 2016/02/19 01:01:14, eroman wrote: > lgtm, but for a more pro review you would want brettw :) Will do
Description was changed from ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. May be worth considering making GURL unescape these characters in file URLs, though that could lead to other issues. Unescaping characters that GURL doesn't escape in the first place also seems to break canonicalization (If two URLs are canonicalized to different file URLs by GURL, but mapped to the same file path via FileURLToFilePath, seems like that rather contradicts the definition of canonicalization). Suspect we have a lot of other issues there, though. BUG=586657 ========== to ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. Added two values to the UnescapeRule enumeration: PATH_SEPARATORS and URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS. In followup CLs, I intend to replace all uses of URL_SPECIAL_CHARS, in favor of one or both the two new values, and eventually remove the value, as it's easily to use in an unsafe manner. BUG=586657 ==========
mmenke@chromium.org changed reviewers: + brettw@chromium.org
[+brettw]: PTAL https://codereview.chromium.org/1704163003/diff/80001/net/base/escape_unittes... File net/base/escape_unittest.cc (right): https://codereview.chromium.org/1704163003/diff/80001/net/base/escape_unittes... net/base/escape_unittest.cc:165: "Some%20random text %25-OK"}, Sorry for the reformatting - it was due to git cl format. Basically just replaced URL_SPECIAL_CHARS with URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS everywhere, added the above test case, and added two test cases at the end of the section.
lgtm https://codereview.chromium.org/1704163003/diff/80001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/1704163003/diff/80001/net/base/escape.h#newco... net/base/escape.h:92: // should be used rarely, and only with extreme caution. I think it would be worth mentioning here that the normal use-case for this is for URLs that don't assign meaning to a path like data URLs. Otherwise this comment seems overly discouraging of this type of use-case.
Thanks for the feedback! https://codereview.chromium.org/1704163003/diff/80001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/1704163003/diff/80001/net/base/escape.h#newco... net/base/escape.h:92: // should be used rarely, and only with extreme caution. On 2016/02/22 23:33:24, brettw wrote: > I think it would be worth mentioning here that the normal use-case for this is > for URLs that don't assign meaning to a path like data URLs. Otherwise this > comment seems overly discouraging of this type of use-case. Done
The CQ bit was checked by mmenke@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from eroman@chromium.org, brettw@chromium.org Link to the patchset: https://codereview.chromium.org/1704163003/#ps100001 (title: "Made comment less scary")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1704163003/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1704163003/100001
Message was sent while issue was closed.
Description was changed from ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. Added two values to the UnescapeRule enumeration: PATH_SEPARATORS and URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS. In followup CLs, I intend to replace all uses of URL_SPECIAL_CHARS, in favor of one or both the two new values, and eventually remove the value, as it's easily to use in an unsafe manner. BUG=586657 ========== to ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. Added two values to the UnescapeRule enumeration: PATH_SEPARATORS and URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS. In followup CLs, I intend to replace all uses of URL_SPECIAL_CHARS, in favor of one or both the two new values, and eventually remove the value, as it's easily to use in an unsafe manner. BUG=586657 ==========
Message was sent while issue was closed.
Committed patchset #6 (id:100001)
Message was sent while issue was closed.
Description was changed from ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. Added two values to the UnescapeRule enumeration: PATH_SEPARATORS and URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS. In followup CLs, I intend to replace all uses of URL_SPECIAL_CHARS, in favor of one or both the two new values, and eventually remove the value, as it's easily to use in an unsafe manner. BUG=586657 ========== to ========== FileURLToFilePath: Don't unescape '/' and '\\'. GURL leaves these escaped, and unescaping them in paths changes the meaning of the path. Added two values to the UnescapeRule enumeration: PATH_SEPARATORS and URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS. In followup CLs, I intend to replace all uses of URL_SPECIAL_CHARS, in favor of one or both the two new values, and eventually remove the value, as it's easily to use in an unsafe manner. BUG=586657 Committed: https://crrev.com/30408ae67a9f6aea074b2883ba861613f52bd246 Cr-Commit-Position: refs/heads/master@{#377013} ==========
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/30408ae67a9f6aea074b2883ba861613f52bd246 Cr-Commit-Position: refs/heads/master@{#377013} |