|
|
Chromium Code Reviews|
Created:
6 years, 2 months ago by Jaekyun Seok (inactive) Modified:
6 years, 2 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Project:
chromium Visibility:
Public. |
DescriptionAdd a function to escape a query part of url
GURL can contain unescaped characters in query parameters.
So this function is needed to escape them in some cases.
For example, a GURL should be escaped before it passes to
Java layer because invalid url can't be parsed there.
BUG=419257
Patch Set 1 #Patch Set 2 : Rename the function to EscapeQueryParameters #
Total comments: 1
Patch Set 3 : Follow RFC 2396 to escape a query part of url #
Total comments: 1
Messages
Total messages: 23 (3 generated)
jaekyun@chromium.org changed reviewers: + pauljensen@chromium.org, willchan@chromium.org
Please review this change. Actually the function added by this change is already being used in chrome on android, but I want to upstream it because it doesn't depend on any other C++ code in chrome on android, and I believe that it is worth being used in other places commonly. Thanks,
jaekyun@chromium.org changed reviewers: + asanka@chromium.org
Asanka, please review this change.
jaekyun@chromium.org changed reviewers: + rsleevi@chromium.org
Ryan, please review this change.
I think this API is too big of a foot gun. Adding unescaped strings to a query string and then trying to escape them after the fact is ill advised. You have no guarantee that the key/value pairs are what is intended at that point. I understand that you are using this code to solve a narrow problem, but I don't believe this is the correct fix.
On 2014/10/07 04:40:01, asanka wrote: > I think this API is too big of a foot gun. Adding unescaped strings to a query > string and then trying to escape them after the fact is ill advised. You have no > guarantee that the key/value pairs are what is intended at that point. > > I understand that you are using this code to solve a narrow problem, but I don't > believe this is the correct fix. I don't believe that we have no guarantee that the key/value pairs are what is intended at that point because the escaped url is essentially equivalent to the original url. For example, my function converts only unescaped strings in query section like the followings. http://example.com/test?q=a|b|c ==> http://example.com/test?q=a%7Cb%7Cc http://example.com/test?q=a%7Cb%7Cc ==> http://example.com/test?q=a%7Cb%7Cc (no change) http://example.com/test?q=a|b|c&q=1|2|3 ==> http://example.com/test?q=a%7Cb%7Cc&q=1%7C2%7C3 (no missing query) http://example.com/test?q=a|b|c&q2=1|2|3 ==> http://example.com/test?q=a%7Cb%7Cc&q2=1%7C2%7C3 (no change in query order) Could you please give some examples that my function makes the original intention missed?
FYI, my function doesn't escape whole query section after unifying pairs of unescaped key and unescaped value, but instead it unifies pairs each of which is made by escaping unescaped key and unescaped value. For example, "q=a|b|c&q2=1|2|3" will be converted to "q=a%7Cb%7Cc&q2=1%7C2%7C3", not to "q%3Da%7Cb%7Cc%26q2%3D1%7C2%7C3". You can see the case in the unittest.
It seems that I named the function wrongly because it didn't escape query part, but it escapes each of query parameters. So I renamed it to "EscapeQueryParameters".
Ping?
Everyone is at the networking summit, so expect delays. Sorry. On Tue, Oct 7, 2014 at 3:26 PM, <jaekyun@chromium.org> wrote: > Ping? > > https://codereview.chromium.org/632843003/ To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Sorry about the delay and also sorry about the not lgtm.
As I mentioned, this is still not a valid general API. If there are unescaped
characters in the query part or the URL, then it is no longer safe to assume
that the parameter boundaries are valid. This maybe true for the issue you are
trying to fix, but this change adds an API that claims to do something it
cannot.
A couple of notes/suggestions:
* RFC 3986 does not define key=value&key=value type values and parameters. It
defines the query part as being non-hierarchical data. The API you are using
should only be used if the URL points to a resource that is known to only accept
parameters in that format. Otherwise you may be escaping delimiters that are
needed by the app.
* The pipe ('|') has no semantic meaning to the URI. It is neither considered
'reserved' nor 'unreserved'. It should be percent encoded. This is not the API
for doing so since such characters can appear in the path portion as well. I'm
guessing that such characters in the path will also cause the same issue you are
trying to fix. This is why I think this isn't the correct fix for your issue.
* That said, if a URI parser breaks on seeing a pipe, then that parser should be
fixed.
* If a properly escaped URI is unescaped improperly then that should also be
fixed.
Please see my inline comments.
On 2014/10/08 13:42:27, asanka wrote:
> Sorry about the delay and also sorry about the not lgtm.
>
> As I mentioned, this is still not a valid general API. If there are unescaped
> characters in the query part or the URL, then it is no longer safe to assume
> that the parameter boundaries are valid. This maybe true for the issue you are
> trying to fix, but this change adds an API that claims to do something it
> cannot.
>
> A couple of notes/suggestions:
>
> * RFC 3986 does not define key=value&key=value type values and parameters. It
> defines the query part as being non-hierarchical data. The API you are using
> should only be used if the URL points to a resource that is known to only
accept
> parameters in that format. Otherwise you may be escaping delimiters that are
> needed by the app.
This is not to escape query of a general URI, but to escape query of a general
URL.
Moreover, AppendOrReplaceQueryParameter() defined in the same file also assume
key=value&key=value type values and parameters.
Actually most logic is copied from it.
So I believe that such assumption is common when handling url.
>
> * The pipe ('|') has no semantic meaning to the URI. It is neither considered
> 'reserved' nor 'unreserved'. It should be percent encoded. This is not the API
> for doing so since such characters can appear in the path portion as well. I'm
> guessing that such characters in the path will also cause the same issue you
are
> trying to fix. This is why I think this isn't the correct fix for your issue.
In https://codereview.chromium.org/615853006/, brettw said like the followings.
"Generally it doesn't escape stuff unless it needs to. In particular for the
query section, IE is very permissive and we generally match that. | wouldn't be
allowed in some other sections."
So I don't believe that a pipe can be included in the path of GURL.
>
> * That said, if a URI parser breaks on seeing a pipe, then that parser should
be
> fixed.
I don't believe that I can fix javan.net.URI for this because unescaped
characters in query part aren't allowed in RFC 3986.
>
> * If a properly escaped URI is unescaped improperly then that should also be
> fixed.
The issue isn't this case. Instead a url from a server includes unescaped
characters in query part.
On 2014/10/08 13:42:27, asanka wrote: > Sorry about the delay and also sorry about the not lgtm. > > As I mentioned, this is still not a valid general API. If there are unescaped > characters in the query part or the URL, then it is no longer safe to assume > that the parameter boundaries are valid. This maybe true for the issue you are > trying to fix, but this change adds an API that claims to do something it > cannot. Additionally, my function escapes query parameters of GURL, which means the parameters are parsed well by GURL even though there are some unescaped characters. So I believe that GURL will be invalid for a url if its parameter boundaries aren't valid, and such url won't be passed into my function.
FYI, I confirmed that GURL escaped '|' in other places automatically.
For example,
GURL("https://www.google.co.kr/a|b/a").spec() ==>
https://www.google.co.kr/a%7Cb/a
GURL("https://www.goog|le.co.kr/ab/a").spec() ==>
https://www.goog%7Cle.co.kr/ab/a
And GURL doesn't allow '|' in scheme part like the followings.
GURL("h|ttps://www.google.co.kr/ab/a").is_valid() ==> false
Asanka, do you still have a concern that my function is not a valid general API? As I explained, my function escapes query parameters of GURL, and so I think that an output GURL will be equivalent to an input GURL as long as the input is valid. Could you please give some opposing examples if you don't agree with my thought?
I'm going to echo Asanka's not LGTM. The reasons he's provided are solid. The syntax of name=value&othername=othervalue is _not_ part of either RFC 3986 OR the successor we care about, https://url.spec.whatwg.org/ It is not good to do this. https://codereview.chromium.org/632843003/diff/20001/net/base/url_util.h File net/base/url_util.h (right): https://codereview.chromium.org/632843003/diff/20001/net/base/url_util.h#newc... net/base/url_util.h:81: // name and a value of each of query parameters are escaped like %XX%XX. Grammatically, this is not valid. "Returns a new GURL based on |url|, but with unsafe characters in the name and value of query parameters of |url| escaped" (e.g. "same to |url|" doesn't read right)
I don't still understand why my function isn't valid even though other ones use the same logic to parse or update query parameters. Then do you think our existing query parsing/updating logic is wrong?
PTAL. I've uploaded a totally new patch to escape unescaped characters in a query part of url. In this patch, I followed RFC 2396, and so I believe that now my function is a valid general API. Thanks,
On 2014/10/09 21:28:19, Jaekyun Seok wrote:
> I don't still understand why my function isn't valid even though other ones
use
> the same logic to parse or update query parameters.
>
> Then do you think our existing query parsing/updating logic is wrong?
The other functions here are used to construct query strings for URLs that will
be used with known endpoints. You are introducing an API for escaping the query
portion (or parts there-of) of arbitrary URLs from unknown sources. What we've
been trying to tell you is that the latter is not possible nor safe since you
don't know what the delimiters of the query.
If you are trying to add general spec compliant cannonicalization, then the code
you should be modifying is at /url (e.g. /url/url_cannon_query.{h,cc}). The set
of characters that are currently considered to be valid query characters are
those marked as CHAR_QUERY in this table: Your goal with this CL is to introduce
https://code.google.com/p/chromium/codesearch#chromium/src/url/url_canon_inte....
Anything that doesn't have that flag will be percent escaped when GURL parses
it. This is what you are seeing happening to the pipe character in path
components of the URL.
Our URL cannonicalization is pretty permissive. This is a quality that's shared
across browsers (e.g. FF and IE). So making it stricter would require some
amount of background work to make sure we aren't breaking things. My money's on
lots of things breaking.
Your goal (from the bug) appears to be to introduce some API for sanitizing a
URL so that it will be usable with a non-permissive parser that's not associated
with the target of the URL. The correct solution (one that minimizes collateral)
might be to introduce a separate method that will sanitize the URL for its
intended purpose. For example:
* Reject schemes that are known not to be handled.
* Strip credentials.
* Strip fragments.
* Sanitize path and/or query.
Such a method can't be in /url or /net/base since it will necessarily not be
generic.
https://codereview.chromium.org/632843003/diff/80001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/632843003/diff/80001/net/base/escape.h#newcode59 net/base/escape.h:59: // the mark characters(-_.!~*'()). You are once again going to be bitten by https://url.spec.whatwg.org/#query-state which follows a slightly different algorithm that doesn't require escaping '|'.
I see. I will add a sanitizer for query part in a proper location. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
