Issue 632843003: Add a function to escape a query part of url (Closed)

Created:
6 years, 2 months ago by Jaekyun Seok (inactive)

Modified:
6 years, 2 months ago

Reviewers:
pauljensen, asanka, willchan no longer on Chromium, Ryan Sleevi

CC:
chromium-reviews, cbentzel+watch_chromium.org

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Project:
chromium

Visibility:
Public.

Description

Add a function to escape a query part of url

GURL can contain unescaped characters in query parameters, so this function is needed to escape them in some cases. For example, a GURL should be escaped before it is passed to the Java layer, because an invalid URL can't be parsed there.

BUG=419257

Patch Set 1 #

Patch Set 2 : Rename the function to EscapeQueryParameters #

Total comments: 1

Patch Set 3 : Follow RFC 2396 to escape a query part of url #

Total comments: 1

Created: 6 years, 2 months ago

Stats (+46 lines, -1 line)
M	net/base/escape.h	1 chunk	+5 lines, -0 lines	1 comment
M	net/base/escape.cc	4 chunks	+23 lines, -1 line	0 comments
M	net/base/escape_unittest.cc	1 chunk	+18 lines, -0 lines	0 comments

Messages

Total messages: 23 (3 generated)

Jaekyun Seok (inactive)

Please review this change. Actually the function added by this change is already being used ...

6 years, 2 months ago (2014-10-06 22:59:11 UTC) #2

asanka

I think this API is too big of a foot gun. Adding unescaped strings to ...

6 years, 2 months ago (2014-10-07 04:40:01 UTC) #7

Jaekyun Seok (inactive)

On 2014/10/07 04:40:01, asanka wrote: > I think this API is too big of a ...

6 years, 2 months ago (2014-10-07 05:04:02 UTC) #8

Jaekyun Seok (inactive)

FYI, my function doesn't escape whole query section after unifying pairs of unescaped key and ...

6 years, 2 months ago (2014-10-07 06:42:15 UTC) #9

Jaekyun Seok (inactive)

It seems that I named the function wrongly because it didn't escape query part, but ...

6 years, 2 months ago (2014-10-07 12:55:00 UTC) #10

willchan no longer on Chromium

Everyone is at the networking summit, so expect delays. Sorry. On Tue, Oct 7, 2014 ...

6 years, 2 months ago (2014-10-07 22:28:38 UTC) #12

asanka

Sorry about the delay and also sorry about the not lgtm. As I mentioned, this ...

6 years, 2 months ago (2014-10-08 13:42:27 UTC) #13

Jaekyun Seok (inactive)

6 years, 2 months ago (2014-10-08 22:28:42 UTC) #14

Please see my inline comments.

On 2014/10/08 13:42:27, asanka wrote:
> Sorry about the delay and also sorry about the not lgtm.
> 
> As I mentioned, this is still not a valid general API. If there are unescaped
> characters in the query part or the URL, then it is no longer safe to assume
> that the parameter boundaries are valid. This may be true for the issue you are
> trying to fix, but this change adds an API that claims to do something it
> cannot.
> 
> A couple of notes/suggestions:
> 
> * RFC 3986 does not define key=value&key=value type values and parameters. It
> defines the query part as being non-hierarchical data. The API you are using
> should only be used if the URL points to a resource that is known to only accept
> parameters in that format. Otherwise you may be escaping delimiters that are
> needed by the app.

This is not meant to escape the query of a general URI, but the query of a
general URL.
Moreover, AppendOrReplaceQueryParameter(), defined in the same file, also
assumes key=value&key=value type values and parameters.
Actually, most of the logic is copied from it.
So I believe that such an assumption is common when handling URLs.

> 
> * The pipe ('|') has no semantic meaning to the URI. It is neither considered
> 'reserved' nor 'unreserved'. It should be percent encoded. This is not the API
> for doing so since such characters can appear in the path portion as well. I'm
> guessing that such characters in the path will also cause the same issue you are
> trying to fix. This is why I think this isn't the correct fix for your issue.

In https://codereview.chromium.org/615853006/, brettw said the following:

"Generally it doesn't escape stuff unless it needs to. In particular for the
query section, IE is very permissive and we generally match that. | wouldn't be
allowed in some other sections."

So I don't believe that a pipe can be included in the path of a GURL.

> 
> * That said, if a URI parser breaks on seeing a pipe, then that parser should be
> fixed.

I don't believe that I can fix java.net.URI for this, because unescaped
characters in the query part aren't allowed by RFC 3986.

> 
> * If a properly escaped URI is unescaped improperly then that should also be
> fixed.

That isn't the issue here. Instead, a URL from a server includes unescaped
characters in the query part.

Jaekyun Seok (inactive)

On 2014/10/08 13:42:27, asanka wrote: > Sorry about the delay and also sorry about the ...

6 years, 2 months ago (2014-10-08 22:55:28 UTC) #15

Jaekyun Seok (inactive)

FYI, I confirmed that GURL escaped '|' in other places automatically. For example, GURL("https://www.google.co.kr/a|b/a").spec() ==> ...

6 years, 2 months ago (2014-10-09 11:31:08 UTC) #16

Jaekyun Seok (inactive)

Asanka, do you still have a concern that my function is not a valid general ...

6 years, 2 months ago (2014-10-09 20:59:22 UTC) #17

Ryan Sleevi

I'm going to echo Asanka's not LGTM. The reasons he's provided are solid. The syntax ...

6 years, 2 months ago (2014-10-09 21:11:05 UTC) #18

Jaekyun Seok (inactive)

I don't still understand why my function isn't valid even though other ones use the ...

6 years, 2 months ago (2014-10-09 21:28:19 UTC) #19

Jaekyun Seok (inactive)

PTAL. I've uploaded a totally new patch to escape unescaped characters in a query part ...

6 years, 2 months ago (2014-10-10 04:59:25 UTC) #20

asanka

6 years, 2 months ago (2014-10-10 20:58:26 UTC) #21

On 2014/10/09 21:28:19, Jaekyun Seok wrote:
> I don't still understand why my function isn't valid even though other ones use
> the same logic to parse or update query parameters.
>
> Then do you think our existing query parsing/updating logic is wrong?

The other functions here are used to construct query strings for URLs that will
be used with known endpoints. You are introducing an API for escaping the query
portion (or parts thereof) of arbitrary URLs from unknown sources. What we've
been trying to tell you is that the latter is neither possible nor safe, since
you don't know what the delimiters of the query are.
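
To make the delimiter concern concrete, here is a minimal sketch in plain standard C++. It is not Chromium code, and `SplitQuery` is a hypothetical helper written only for this illustration. It splits a query on '&' and '=' the way a key=value&key=value consumer would. Once a value arrives unescaped, the split can no longer tell a delimiter from data.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of net/base: splits a query string into
// key/value pairs, assuming the key=value&key=value convention.
std::vector<std::pair<std::string, std::string>> SplitQuery(
    const std::string& query) {
  std::vector<std::pair<std::string, std::string>> pairs;
  std::size_t start = 0;
  while (start <= query.size()) {
    std::size_t end = query.find('&', start);
    if (end == std::string::npos)
      end = query.size();
    const std::string part = query.substr(start, end - start);
    if (!part.empty()) {
      const std::size_t eq = part.find('=');
      if (eq == std::string::npos)
        pairs.emplace_back(part, "");  // Key with no value.
      else
        pairs.emplace_back(part.substr(0, eq), part.substr(eq + 1));
    }
    start = end + 1;
  }
  return pairs;
}
```

With a properly escaped input such as "title=Tom%26Jerry&lang=en" the helper yields two pairs. With the unescaped "title=Tom&Jerry&lang=en" it yields three, and "Jerry" becomes a key of its own. Escaping after the fact cannot recover which reading the sender intended.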

If you are trying to add general spec compliant canonicalization, then the code
you should be modifying is at /url (e.g. /url/url_canon_query.{h,cc}). The set
of characters that are currently considered to be valid query characters are
those marked as CHAR_QUERY in this table:
https://code.google.com/p/chromium/codesearch#chromium/src/url/url_canon_inte....
Anything that doesn't have that flag will be percent escaped when GURL parses
it. This is what you are seeing happening to the pipe character in path
components of the URL.
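
As a rough illustration of flag-driven escaping, the sketch below uses a tiny stand-in for a per-character table. The flag names `kCharQuery` and `kCharPath`, the function names, and the character sets are all assumptions made for this example; the real table and the exact CHAR_QUERY set live in /url and are not reproduced here.

```cpp
#include <string>

// Hypothetical flags. The names and the sets below are assumptions for
// illustration, not the contents of the real table in /url.
enum CharFlags : unsigned {
  kCharQuery = 1u << 0,  // Allowed unescaped in a query.
  kCharPath = 1u << 1,   // Allowed unescaped in a path.
};

// Returns the (assumed) flags for one character.
unsigned FlagsFor(unsigned char c) {
  const bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9');
  if (alnum || c == '-' || c == '.' || c == '_' || c == '~' || c == '/' ||
      c == '=' || c == '&')
    return kCharQuery | kCharPath;
  if (c == '|')
    return kCharQuery;  // Assumed: tolerated in a query, not in a path.
  return 0;             // Everything else is escaped everywhere.
}

// Percent-escapes every character that lacks |required_flag|.
std::string EscapeUnlessFlagged(const std::string& input,
                                unsigned required_flag) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : input) {
    if (FlagsFor(c) & required_flag) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0xF]);
    }
  }
  return out;
}
```

Under these assumed sets, "a|b" is left alone when checked against `kCharQuery` and becomes "a%7Cb" when checked against `kCharPath`, which mirrors the behaviour described above for the pipe character.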

Our URL canonicalization is pretty permissive. This is a quality that's shared
across browsers (e.g. FF and IE). So making it stricter would require some
amount of background work to make sure we aren't breaking things. My money's on
lots of things breaking.

Your goal (from the bug) appears to be to introduce some API for sanitizing a
URL so that it will be usable with a non-permissive parser that's not associated
with the target of the URL. The correct solution (one that minimizes collateral)
might be to introduce a separate method that will sanitize the URL for its
intended purpose. For example:
* Reject schemes that are known not to be handled.
* Strip credentials.
* Strip fragments.
* Sanitize path and/or query.

Such a method can't be in /url or /net/base since it will necessarily not be
generic.
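
The four steps listed above could look roughly like the following sketch. It is standard C++ only and deliberately not generic. `SanitizeUrlForStrictParser` and its helpers are hypothetical names, the accepted schemes (http and https) are an assumption, and the input is assumed to be an already canonical absolute URL of the form scheme://[userinfo@]host[/path][?query][#fragment].

```cpp
#include <cstddef>
#include <string>

// Assumed allow-list: RFC 3986 unreserved and sub-delims characters plus
// ":@/?" and '%'. Existing '%' sequences are trusted, not validated.
bool IsAllowedInPathOrQuery(unsigned char c) {
  static const std::string kExtra = "-._~!$&'()*+,;=:@/?%";
  const bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9');
  return alnum || kExtra.find(static_cast<char>(c)) != std::string::npos;
}

std::string EscapePathAndQuery(const std::string& input) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : input) {
    if (IsAllowedInPathOrQuery(c)) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0xF]);
    }
  }
  return out;
}

// Hypothetical purpose-specific sanitizer. Returns false if the URL is
// rejected; otherwise writes the sanitized URL to |out|.
bool SanitizeUrlForStrictParser(const std::string& url, std::string* out) {
  const std::string kSeparator = "://";
  const std::size_t scheme_end = url.find(kSeparator);
  if (scheme_end == std::string::npos)
    return false;
  // 1. Reject schemes the consumer is known not to handle (assumed set;
  //    a canonical URL already has a lowercase scheme).
  const std::string scheme = url.substr(0, scheme_end);
  if (scheme != "http" && scheme != "https")
    return false;

  const std::string rest = url.substr(scheme_end + kSeparator.size());
  std::size_t authority_end = rest.find_first_of("/?#");
  if (authority_end == std::string::npos)
    authority_end = rest.size();
  std::string authority = rest.substr(0, authority_end);
  std::string tail = rest.substr(authority_end);

  // 2. Strip credentials: drop everything up to the last '@'.
  const std::size_t at = authority.rfind('@');
  if (at != std::string::npos)
    authority.erase(0, at + 1);

  // 3. Strip the fragment.
  const std::size_t hash = tail.find('#');
  if (hash != std::string::npos)
    tail.erase(hash);

  // 4. Sanitize path and query for the strict consumer.
  *out = scheme + kSeparator + authority + EscapePathAndQuery(tail);
  return true;
}
```

Because the accepted schemes and the escape set depend on what the consuming parser tolerates, a function like this belongs next to that consumer rather than in /url or /net/base.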

asanka

https://codereview.chromium.org/632843003/diff/80001/net/base/escape.h File net/base/escape.h (right): https://codereview.chromium.org/632843003/diff/80001/net/base/escape.h#newcode59 net/base/escape.h:59: // the mark characters(-_.!~*'()). You are once again going ...

6 years, 2 months ago (2014-10-10 20:59:16 UTC) #22

Jaekyun Seok (inactive)

6 years, 2 months ago (2014-10-13 00:11:45 UTC) #23

I see. I will add a sanitizer for query part in a proper location.