|
|
Created:
5 years, 2 months ago by Lasse Reichstein Nielsen Modified:
4 years ago CC:
reviews_dartlang.org, nweiz Base URL:
https://github.com/dart-lang/sdk.git@master Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionAdd data-URI support class to dart:core (next to Uri).
R=floitsch@google.com, fschneider@google.com
Committed: https://github.com/dart-lang/sdk/commit/bbc66c2c41e61d82d6b85fdd16e2b0f1204c2a33
Committed: https://github.com/dart-lang/sdk/commit/ed0b187d583493432b10aaaac2844f05f4bcce3e
Patch Set 1 #Patch Set 2 : Update CHANGELOG.md #
Total comments: 60
Patch Set 3 : Moving to new dart:uri library - NOT WORKING! #Patch Set 4 : Closer to working. #Patch Set 5 : Added library now working (thanks fschneider). Back to the main content. #
Total comments: 2
Patch Set 6 : Address comments. #Patch Set 7 : Add more tests, refactor, rename to DataUriHelper. Need better name! #
Total comments: 2
Patch Set 8 : Moving back to dart:core, rename to UriData. #
Total comments: 12
Patch Set 9 : Address comments. #Patch Set 10 : Fix tests - dart2js doesn't do real bit-ops, jsshell broken on windows. #
Total comments: 3
Patch Set 11 : fix test expectations #Patch Set 12 : Remove VM assertion that appears to be incorrect. #
Total comments: 3
Messages
Total messages: 31 (9 generated)
lrn@google.com changed reviewers: + floitsch@google.com, sgjesse@google.com
kevmoo@google.com changed reviewers: + kevmoo@google.com
Please add corresponding updates to the CHANGELOG :-)
Are we going to run with this? Natalie just sent a CL to add it to http_parser: https://codereview.chromium.org/1390353008/
I'm moving Uri and DataUri to a new dart:uri library (with core exporting Uri as normal), but apart from that, it is my current plan. Ping everybody else.
nweiz@google.com changed reviewers: + nweiz@google.com
DBC https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2633: class DataUri { It seems strange that this isn't a subclass of Uri. It certainly satisfies the "is a" criterion, and I'd expect to be able to pass a data URI anywhere I could pass a normal URI. Having to explicitly convert back and forth sounds awkward. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2676: factory DataUri.fromString(String content, Why can't you base64-encode text, or percent-encode binary data? If my text contains a lot of non-URL-safe characters (particularly non-ASCII characters) that would need to be encoded anyway, base64 may well end up being more compact. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2678: Iterable<DataUriParameter> parameters}) { Right now, the encoding is implicitly always UTF-8, but the corresponding charset parameter isn't automatically included. Ideally, the user would be able to specify an encoding and a charset parameter would be added based on that. Otherwise, this should add charset=UTF-8, and the documentation should be explicit about what encoding is used. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2684: buffer.write(Uri.encodeComponent(content)); URI.encodeComponent doesn't encode a number of reserved characters ("!~*'()"). I assume most parsers will do the right thing anyway, but this is a place where the implementation diverges from the spec. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2714: factory DataUri.fromUri(Uri uri) { This should document when it will throw FormatExceptions, especially since some syntax errors are detected eagerly and some are not. Same goes for parse(). https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2730: } According to https://simonsapin.github.io/data-urls/ (which is referenced by browser vendors and so is at least somewhat authoritative), the query should be included in the parsing algorithm and the fragment should be ignored. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2749: identical(mimeType, "text/plain") || Consider omitting the text/plain mime type, since it's the default anyway. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2915: encoding: LATIN1)); Why does this assume a LATIN1 encoding? It should be based on the parameters, and per spec it should default to ASCII. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2983: String contentAsString({Encoding encoding: UTF8}) { The encoding should be taken from the URI's parameters, at least by default. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3000: * A data URI may contain parameters between the the MIMI type and the Nit: "MIMI" -> "MIME" https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3008: var end = _separatorIndices[i]; It looks like this will incorrectly accept invalid URIs that mix up semicolons, commas, and maybe equals as well. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3010: String value = Uri._uriDecode(_text, start: equals + 1, end: end); What about whitespace? If the spec is interpreted like other similar specs, there should be implicit whitespace between all tokens which will end up encoded as percent-escapes, and thus find its way into the slices of text this uses. If we want to match browser behavior (which we should), we ought to strip this whitespace. Also, mime types and parameters that are whitespace-only should be disallowed. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3111: class DataUriParameter { Why isn't this just a map? Maps are much easier to use than custom data types. https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... File tests/corelib/data_uri_test.dart (right): https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:20: } Also test: * Percent-encoding non-token characters in the media type and its parameters. * Non-ASCII media type and parameters. * Valid whitespace around various tokens. * Invalid pure-whitespace tokens. https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:83: bytes[i] = i; Won't the latter half of this list be outside the range of Uint8? Shouldn't that throw an error? https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:124: Expect.equals("data:;base64,", uri.text); Why are these in testErrors? https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:136: Expect.listEquals([0], uri.contentAsBytes()); Also these?
Good comments, not through them yet. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2676: factory DataUri.fromString(String content, You can't base-64 encode text - base64 encoding only works on bytes and strings are not bytes. I could add an implicit UTF-8 encoding of the text, but I prefer to keep this simple and explicit. If you need to UTF-8 encode, you choose to do that. The other direction: bytes to percent-escaped is possible, but rarely useful (unless you happen to know that your bytes contain a lot of ASCII characters that won't be escaped, and in that case I don't mind asking you to create a string from them). https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2678: Iterable<DataUriParameter> parameters}) { Good point. The default, if nothing is written, is charset=US-ASCII which won't be correct here. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2684: buffer.write(Uri.encodeComponent(content)); ACK. The syntax for parameter keys an values are RFC 2045 tokens which do not contain percent encodings (they may contain percent, but it wouldn't mean it's encoded). We should not encode here, just validate, and do it against the _tokenCharTable, and fail on any invalid character. If you need to encode something, you should do it explicitly before calling here. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2714: factory DataUri.fromUri(Uri uri) { Good point. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2730: } True - '?' is a valid uric, so the query should be included. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2749: identical(mimeType, "text/plain") || Good idea. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2915: encoding: LATIN1)); This function is not creating text, only bytes, so "%c4" should evaluate to the byte 0xc4. This is really a hack to treat escapes as the bytes they represent, which is what LATIN-1 encoding does. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2983: String contentAsString({Encoding encoding: UTF8}) { I really, really don't want to parse the "charset" header. It can take a lot of values that we won't be able to satisfy anyway. I prefer to leave the parsing of parameters to the user who can then err appropriately if the parameters are not understandable. On the other hand, we could promise to understand any of the names accepted by Encoding.getByName (basically ASCII, LATIN-1 and UTF-8 by a number of names), and if we can't understand the charset parameter, we can fall back on LATIN-1 (because LATIN-1 decoding can't fail - all byte sequences are valid - so we just give you the bytes). If we do that, we should also have a getter that gives you the encoding that would be used (maybe "Encoding get charsetEncoding"). https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3000: * A data URI may contain parameters between the the MIMI type and the :) https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3008: var end = _separatorIndices[i]; The parser should avoid that. This assumes that the separator indices are correct and provided by the parser. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3010: String value = Uri._uriDecode(_text, start: equals + 1, end: end); The URI grammars don't generally allow space characters. There is no implicit whitespace between the tokens - spaces are simply not allowed. If you include a space character, it needs to be escaped, but then it will also be treated as meaningful - escaped characters are included literally. Are you sure browsers allow whitespace in data URIs? I don't want to validate the parameters (I have no idea what they mean anyway), so as long as they are syntactically correct, it should be fine. Space is not allowed in RFC 2045 token (which parameter key and value must be). Percent is allowed, but probably shouldn't mean percent-encoding, so I guess we should just check each string against token chars and fail if they are invalid. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3111: class DataUriParameter { Because the same parameter name may occur more than once. Maps are not suitable for that (HTTP headers have have that problem).
https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2676: factory DataUri.fromString(String content, On 2015/10/16 14:38:45, Lasse Reichstein Nielsen wrote: > You can't base-64 encode text - base64 encoding only works on bytes and strings > are not bytes. > > I could add an implicit UTF-8 encoding of the text, but I prefer to keep this > simple and explicit. If you need to UTF-8 encode, you choose to do that. I was thinking something parallel to File.writeAsString and similar APIs: it takes a string as well as an encoding parameter that defaults to UTF-8. > The other direction: bytes to percent-escaped is possible, but rarely useful > (unless you happen to know that your bytes contain a lot of ASCII characters > that won't be escaped, and in that case I don't mind asking you to create a > string from them). Making users convert to a string won't work in general; Dart's UTF-16 strings can't fully represent binary data in a way that will be correctly translated to UTF-8 (or any other encoding). https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2684: buffer.write(Uri.encodeComponent(content)); On 2015/10/16 14:38:45, Lasse Reichstein Nielsen wrote: > ACK. The syntax for parameter keys an values are RFC 2045 tokens which do not > contain percent encodings (they may contain percent, but it wouldn't mean it's > encoded). > We should not encode here, just validate, and do it against the _tokenCharTable, > and fail on any invalid character. > If you need to encode something, you should do it explicitly before calling > here. You should certainly percent-encode *here*, because it's the data. But the data URI spec says you should also percent-encode parameters: "parameter values should use the URL Escaped encoding instead of quoted string if the parameter values contain any 'tspecial'.". https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2915: encoding: LATIN1)); On 2015/10/16 14:38:45, Lasse Reichstein Nielsen wrote: > This function is not creating text, only bytes, so "%c4" should evaluate to the > byte 0xc4. This is really a hack to treat escapes as the bytes they represent, > which is what LATIN-1 encoding does. It would be good to document that in a comment. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2983: String contentAsString({Encoding encoding: UTF8}) { On 2015/10/16 14:38:45, Lasse Reichstein Nielsen wrote: > I really, really don't want to parse the "charset" header. > It can take a lot of values that we won't be able to satisfy anyway. > I prefer to leave the parsing of parameters to the user who can then err > appropriately if the parameters are not understandable. > > On the other hand, we could promise to understand any of the names accepted by > Encoding.getByName (basically ASCII, LATIN-1 and UTF-8 by a number of names), > and if we can't understand the charset parameter, we can fall back on LATIN-1 > (because LATIN-1 decoding can't fail - all byte sequences are valid - so we just > give you the bytes). > > If we do that, we should also have a getter that gives you the encoding that > would be used (maybe "Encoding get charsetEncoding"). I think never failing by default is less useful than generally doing the right thing by default. I'd say: * If the user passes in an encoding, use that. * Otherwise, if the charset declares a recognized encoding, use that. * Otherwise, fail with a useful message. If the user wants to be sure that they absolutely never fail to decode, they can still do so, but the default is to provide a good error message in cases where we're reasonably confident that a correct decoding isn't possible using dart:convert. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3008: var end = _separatorIndices[i]; On 2015/10/16 14:38:45, Lasse Reichstein Nielsen wrote: > The parser should avoid that. This assumes that the separator indices are > correct and provided by the parser. That might be true. It's a good thing to write tests for, though. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3010: String value = Uri._uriDecode(_text, start: equals + 1, end: end); On 2015/10/16 14:38:45, Lasse Reichstein Nielsen wrote: > The URI grammars don't generally allow space characters. There is no implicit > whitespace between the tokens - spaces are simply not allowed. If you include a > space character, it needs to be escaped, but then it will also be treated as > meaningful - escaped characters are included literally. > > Are you sure browsers allow whitespace in data URIs? Chrome accepts literal spaces in data URIs, but not "%20". I suppose because we don't support literal spaces in Uri objects anyway we don't need to worry about it. > I don't want to validate the parameters (I have no idea what they mean anyway), > so as long as they are syntactically correct, it should be fine. Space is not > allowed in RFC 2045 token (which parameter key and value must be). Percent is > allowed, but probably shouldn't mean percent-encoding, so I guess we should just > check each string against token chars and fail if they are invalid. The data URI spec says that percent-encoding can be used to escape non-token chars in parameter values. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3111: class DataUriParameter { On 2015/10/16 14:38:45, Lasse Reichstein Nielsen wrote: > Because the same parameter name may occur more than once. > Maps are not suitable for that (HTTP headers have have that problem). Can they? The MIME spec isn't explicit about this, but it seems to imply that it's possible to find a single canonical value of a parameter with a given name. Do you know of content types that require the use of multiple parameters with the same name?
lrn@google.com changed reviewers: + iposva@google.com
@iposva: Every changed file except uri.dart is only about creating a new dart:uri library. Please check the VM parts. @loitsch, @nweiz: PTAL again. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2633: class DataUri { The Uri class is designed only for hierarchial URIs and data: URIs are not hierarchial, so it is not really an "is-a" relation. They both relate to the generic URI concept, but they are different and have widely different meanings. If you have a URI string with a "data:" scheme, you can create a DataUri object to access its parts, just as you can create a Uri object for an "http:" schemed URI. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2676: factory DataUri.fromString(String content, True. We probably should have an encoding here: The "content" string is converted to bytes, and there is no reason to fix that to UTF-8. (Well, except that we have to write the code to do the encoding since the functions in Uri only works on strings). The content of a data: URI is always a sequence of bytes, so encoding information is important - so we should use it and store it. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2678: Iterable<DataUriParameter> parameters}) { I've changed this to add an "Encoding charset" parameter which is used to convert the string to bytes (and which may throw if the string isn't compatible). It defaults to US-ASCII if you don't specify anything. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2684: buffer.write(Uri.encodeComponent(content)); Ack, rereading the RFC again. And again. It's almost as if it's incompletely specified :) The format of a data URI is: 'data:' (type '/' subtype)? (';' attribute '=' value)* (';base64')? ',' uric* where type/subtype/attribute/value comes from RFC 2045 and uric comes from RFC 2396. The type, subtype, attribute and value are percent-encoded if they contain characters that are not token characters (ASCII minus SPACE, CTLs and tspecial) - or '%' itself, I guess. If the content (a sequence of bytes) is not base-64 encoded (RFC 4648), then it percent escapes (RFC 2396) non-uric characters. It's not clear what happens if a non-ASCII character is included in attribute or value. It needs to be encoded to bytes somehow because that's all we can represent with percent escapes. There is an RFC(2231) which defines how to add encoding to attributes (foo*=utf-8'en-us'%A8%94 - the 'foo*' means that the name is foo and the content is special). We probably don't want to support that. I think I'll just UTF-8 + percent encode parameter keys and values if they contain non-ASCII characters, and percent-encode non-token ASCII chars. For the data part, the allowed character is uric := reserved | unreserved | escape , so it doesn't need to escape reserved characters. It's still not exactly the correct characters, I'll make a new table with the correct characters to not escape (which is annoyingly close to Uri._encodFullTable, but doesn't contain `#`. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2714: factory DataUri.fromUri(Uri uri) { Documented on parse, referenced here. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2730: } True. Fixed. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2915: encoding: LATIN1)); No longer necessary using the new BASE64 decoder (but it does assume that percent-escapes only occurs in the padding). https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2983: String contentAsString({Encoding encoding: UTF8}) { Sounds reasonable. If there is no charset parameter, the default is US-ASCII, so the failure should only happen if we have a present-but-unrecognizable charset parameter. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3008: var end = _separatorIndices[i]; It's not even possible to test it, because it's all hidden behind private functions and constructors. I'll add some assertions to ensure that invariants hold. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3010: String value = Uri._uriDecode(_text, start: equals + 1, end: end); > The data URI spec says that percent-encoding can be used to escape non-token > chars in parameter values. ACK, so we must escape existing percent characters. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3010: String value = Uri._uriDecode(_text, start: equals + 1, end: end); It seems Chrome distinguishes between "data:text/html;%20charset=UTF-8,<%c2%80>" and "data:text/html;charset=UTF-8,<%c2%80>", where only the latter is interpreted as UTF-8, so whitespace is not ignored. (What is shown on in the address bar is not the plain URI, it does some tweaks to make it more readable). https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3111: class DataUriParameter { Good point. It doesn't actually look like parameters can be repeated, so I'll change it to a Map<String, String>. https://codereview.chromium.org/1381033002/diff/80001/sdk/lib/uri/uri.dart File sdk/lib/uri/uri.dart (right): https://codereview.chromium.org/1381033002/diff/80001/sdk/lib/uri/uri.dart#ne... sdk/lib/uri/uri.dart:2678: /** The entire content of the data URI, including the leading `data:`. */ This getter is identical to toString. Should I just remove it? (Considering that I forgot about it completely when writing tests, using toString everywhere instead). https://codereview.chromium.org/1381033002/diff/80001/sdk/lib/uri/uri.dart#ne... sdk/lib/uri/uri.dart:2917: List<int> contentAsBytes() { This sounds like it should be a getter, but the corresponding "contentAsString" is a function because it takes an encoding as argument. WTD?
https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2633: class DataUri { On 2015/10/28 13:55:47, Lasse Reichstein Nielsen wrote: > The Uri class is designed only for hierarchial URIs and data: URIs are not > hierarchial, so it is not really an "is-a" relation. They both relate to the > generic URI concept, but they are different and have widely different meanings. That's not communicated anywhere in the documentation, and it's certainly not communicated by the name. If it only supported hierarchical URLs, then it should be called "Url", not "Uri". As it is, the name and the documentation clearly say that it does represent the generic URI concept, even if it has some getters that are particularly useful for the hierarchical subset. That also doesn't match the way it's used in practice. APIs like Isolate.spawnUri take Uris, and work perfectly well when they're non-hierarchical—they even tend to support Data URIs in particular. It seems perverse to say that these core methods support data URIs but not the canonical Dart object used to represent data URIs. Even this class itself supports fromUri and toUri. How do those methods make sense if a Uri should only be hierarchical? > If you have a URI string with a "data:" scheme, you can create a DataUri object > to access its parts, just as you can create a Uri object for an "http:" schemed > URI. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2684: buffer.write(Uri.encodeComponent(content)); On 2015/10/28 13:55:47, Lasse Reichstein Nielsen wrote: > Ack, rereading the RFC again. And again. > It's almost as if it's incompletely specified :) > > The format of a data URI is: > 'data:' (type '/' subtype)? (';' attribute '=' value)* > (';base64')? ',' uric* > > where type/subtype/attribute/value comes from RFC 2045 and uric comes from RFC > 2396. > The type, subtype, attribute and value are percent-encoded if they contain > characters that are not token characters (ASCII minus SPACE, CTLs and tspecial) > - or '%' itself, I guess. > If the content (a sequence of bytes) is not base-64 encoded (RFC 4648), then it > percent escapes (RFC 2396) non-uric characters. > > It's not clear what happens if a non-ASCII character is included in attribute or > value. It needs to be encoded to bytes somehow because that's all we can > represent with percent escapes. > There is an RFC(2231) which defines how to add encoding to attributes > (foo*=utf-8'en-us'%A8%94 - the 'foo*' means that the name is foo and the content > is special). We probably don't want to support that. > > I think I'll just UTF-8 + percent encode parameter keys and values if they > contain non-ASCII characters, and percent-encode non-token ASCII chars. I think that's the right behavior. I believe there's a blanket rule for URIs that percent-escapes should be interpreted as UTF-8 unless otherwise specified. > For the data part, the allowed character is uric := reserved | unreserved | > escape , so it doesn't need to escape reserved characters. It's still not > exactly the correct characters, I'll make a new table with the correct > characters to not escape (which is annoyingly close to Uri._encodFullTable, but > doesn't contain `#`.
Just noticed this while trying to get up to speed on this CL and the reasoning for creating yet another dart: library. -Ivan https://codereview.chromium.org/1381033002/diff/120001/sdk/lib/uri/uri.dart File sdk/lib/uri/uri.dart (right): https://codereview.chromium.org/1381033002/diff/120001/sdk/lib/uri/uri.dart#n... sdk/lib/uri/uri.dart:224: bool isRegName(int ch) { Why is this not a static private function? Where is this function being used?
https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2633: class DataUri { On 2015/10/29 00:28:36, nweiz wrote: > On 2015/10/28 13:55:47, Lasse Reichstein Nielsen wrote: > > The Uri class is designed only for hierarchial URIs and data: URIs are not > > hierarchial, so it is not really an "is-a" relation. They both relate to the > > generic URI concept, but they are different and have widely different > meanings. > > > That's not communicated anywhere in the documentation, and it's certainly not > communicated by the name. If it only supported hierarchical URLs, then it should > be called "Url", not "Uri". As it is, the name and the documentation clearly say > that it does represent the generic URI concept, even if it has some getters that > are particularly useful for the hierarchical subset. > > That also doesn't match the way it's used in practice. APIs like > Isolate.spawnUri take Uris, and work perfectly well when they're > non-hierarchical—they even tend to support Data URIs in particular. It seems > perverse to say that these core methods support data URIs but not the canonical > Dart object used to represent data URIs. > > Even this class itself supports fromUri and toUri. How do those methods make > sense if a Uri should only be hierarchical? Good points. So this class is not really a "data URI", it's more like a "data URI helper" class. It's not a URI, but it gives access to the scheme specific structure of Data URIs. It's still not functionality I want to merge into the URI class, or that I would want to share with the URI functionality, so I guess the best option is to find a better name for the class. DataUriHelper is ... bad. ("Helper", "Manager", "Impl" etc. are non-descriptive words added to distinguish something from something else that would otherwise have the same name). DataUriParser makes more sense, but misses that it is a way to create them. Suggestions? https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... File tests/corelib/data_uri_test.dart (right): https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:20: } Good points. Also, I should test the "base64" and "percentEncode" parameters on the constructors. > * Percent-encoding non-token characters in the media type and its parameters. > * Non-ASCII media type and parameters. Extended testInvalidCharacters to include non-ASCII characters. It already tests non-token ASCII characters, including space. > * Valid whitespace around various tokens. > * Invalid pure-whitespace tokens. I still don't think there is such a thing as "valid whitespace" - all whitespace will be percent-encoded, and therefore not be whitespace, but escapes. I should catch empty attribute and value strings (property keys) somehow, it's not allowed by the "token" production. https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:83: bytes[i] = i; On 2015/10/15 21:09:04, nweiz wrote: > Won't the latter half of this list be outside the range of Uint8? Shouldn't that > throw an error? Storing into an Uint8List will truncate, so I get 0..255 twice, just to make sure that we allow more values after any value. https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:124: Expect.equals("data:;base64,", uri.text); Mostly as sanity checks. I'll move them to main and extend them a little. https://codereview.chromium.org/1381033002/diff/20001/tests/corelib/data_uri_... tests/corelib/data_uri_test.dart:136: Expect.listEquals([0], uri.contentAsBytes()); That's really base-64 decoding now, so I'll just remove them. https://codereview.chromium.org/1381033002/diff/120001/sdk/lib/uri/uri.dart File sdk/lib/uri/uri.dart (right): https://codereview.chromium.org/1381033002/diff/120001/sdk/lib/uri/uri.dart#n... sdk/lib/uri/uri.dart:224: bool isRegName(int ch) { I don't think it is being used at all. That's actually odd, must be a merge problem. There is a static _isRegNameChar function.
lrn@google.com changed reviewers: + rnystrom@google.com - floitsch@google.com, iposva@google.com, kevmoo@google.com
PTAL
On 2015/11/06 14:03:53, Lasse Reichstein Nielsen wrote: > PTAL Please add corresponding changes to the CHANGELOG - I see you're touching a couple of classes – I'd love the API delta to be very clear
floitsch@google.com changed reviewers: + floitsch@google.com
LGTM, although I missed the argument, why DataUri should be in dart:core. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2315: if (end == null) end = text.length; if we have start and end, shouldn't we at least check that start <= end? https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2320: simple = codeUnit != _PERCENT && codeUnit != _PLUS; I prefer: if (codeUnit == _PERCENT || codeUnit == _PLUS) { simple = false; break; (and remove the && simple check in the loop-condition) } In the end it's doing the same, but imho it's more obvious of what happens, since everything is in one place. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:784: base64: base64).uri; move ".uri" to the next line, so it's visible. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:810: percentEncoded: percentEncoded).uri; ditto. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:1420: static bool _isUnreservedChar(int char) { Move that function closer to the table. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:2378: if (end == null) end = text.length; Nit: end ??= text.length; https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:2928: * The [Uri] that this `UriData` is giving access to.. Remove ".' https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:3157: // parse parameters and/or "base64". Nit: "Parse"
https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2315: if (end == null) end = text.length; This is a private function, I usually don't check parameters there because I'm the only one calling it. If I'm worried, I'll add an assert. I also usually don't make them have optional parameters. I'll convert this to non-optional parameters, and add an assert for good measure. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2320: simple = codeUnit != _PERCENT && codeUnit != _PLUS; On 2015/11/07 02:56:26, floitsch wrote: > I prefer: > > if (codeUnit == _PERCENT || codeUnit == _PLUS) { > simple = false; > break; (and remove the && simple check in the loop-condition) > } > > In the end it's doing the same, but imho it's more obvious of what happens, > since everything is in one place. Done. https://codereview.chromium.org/1381033002/diff/20001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2633: class DataUri { On 2015/11/03 18:02:52, Lasse Reichstein Nielsen wrote: > On 2015/10/29 00:28:36, nweiz wrote: > > On 2015/10/28 13:55:47, Lasse Reichstein Nielsen wrote: > > > The Uri class is designed only for hierarchial URIs and data: URIs are not > > > hierarchial, so it is not really an "is-a" relation. They both relate to the > > > generic URI concept, but they are different and have widely different > > meanings. > > > > > > That's not communicated anywhere in the documentation, and it's certainly not > > communicated by the name. If it only supported hierarchical URLs, then it > should > > be called "Url", not "Uri". As it is, the name and the documentation clearly > say > > that it does represent the generic URI concept, even if it has some getters > that > > are particularly useful for the hierarchical subset. > > > > That also doesn't match the way it's used in practice. APIs like > > Isolate.spawnUri take Uris, and work perfectly well when they're > > non-hierarchical—they even tend to support Data URIs in particular. It seems > > perverse to say that these core methods support data URIs but not the > canonical > > Dart object used to represent data URIs. > > > > Even this class itself supports fromUri and toUri. How do those methods make > > sense if a Uri should only be hierarchical? > > > Good points. > So this class is not really a "data URI", it's more like a "data URI helper" > class. It's not a URI, but it gives access to the scheme specific structure of > Data URIs. > > It's still not functionality I want to merge into the URI class, or that I would > want to share with the URI functionality, so I guess the best option is to find > a better name for the class. > > DataUriHelper is ... bad. ("Helper", "Manager", "Impl" etc. are non-descriptive > words added to distinguish something from something else that would otherwise > have the same name). > > DataUriParser makes more sense, but misses that it is a way to create them. > > Suggestions? https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:784: base64: base64).uri; On 2015/11/07 02:56:26, floitsch wrote: > move ".uri" to the next line, so it's visible. Done. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:810: percentEncoded: percentEncoded).uri; On 2015/11/07 02:56:27, floitsch wrote: > ditto. Done. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:1420: static bool _isUnreservedChar(int char) { On 2015/11/07 02:56:27, floitsch wrote: > Move that function closer to the table. Done. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:2378: if (end == null) end = text.length; Removed completely. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:2928: * The [Uri] that this `UriData` is giving access to.. On 2015/11/07 02:56:26, floitsch wrote: > Remove ".' Done. https://codereview.chromium.org/1381033002/diff/140001/sdk/lib/core/uri.dart#... sdk/lib/core/uri.dart:3157: // parse parameters and/or "base64". On 2015/11/07 02:56:27, floitsch wrote: > Nit: "Parse" Done.
Message was sent while issue was closed.
Committed patchset #9 (id:160001) manually as bbc66c2c41e61d82d6b85fdd16e2b0f1204c2a33 (presubmit successful).
Message was sent while issue was closed.
Description was changed from ========== Add data-URI support class to dart:core (next to Uri). R=floitsch@google.com Committed: https://github.com/dart-lang/sdk/commit/bbc66c2c41e61d82d6b85fdd16e2b0f1204c2a33 ========== to ========== Add data-URI support class to dart:core (next to Uri). R=floitsch@google.com Committed: https://github.com/dart-lang/sdk/commit/bbc66c2c41e61d82d6b85fdd16e2b0f1204c2a33 ==========
Will land again when the (unrelated?) VM bug it triggers is fixed. https://codereview.chromium.org/1381033002/diff/180001/tests/try/poi/serializ... File tests/try/poi/serialize_test.dart (right): https://codereview.chromium.org/1381033002/diff/180001/tests/try/poi/serializ... tests/try/poi/serialize_test.dart:589: } Fragile test, breaks if you touch dart:core. Should be fixed (but *sigh* it's easier to fix the test than to mark it as failing and create an issue for it). Notice: I do not understand this test, I just make the expectation match the actual result, and hope it's correct. https://codereview.chromium.org/1381033002/diff/180001/tools/testing/dart/tes... File tools/testing/dart/test_options.dart (right): https://codereview.chromium.org/1381033002/diff/180001/tools/testing/dart/tes... tools/testing/dart/test_options.dart:885: if (option.keys.contains(name)) { That's exactly what "contains" does. https://codereview.chromium.org/1381033002/diff/180001/tools/testing/dart/tes... File tools/testing/dart/test_suite.dart (right): https://codereview.chromium.org/1381033002/diff/180001/tools/testing/dart/tes... tools/testing/dart/test_suite.dart:227: var jsshellDir = '${TestUtils.dartDir.toNativePath()}/tools/testing/bin'; Didn't work on Windows. I guess we don't continously run tests with jsshell on Windows.
On 2015/11/11 16:05:12, Lasse Reichstein Nielsen wrote: > Will land again when the (unrelated?) VM bug it triggers is fixed. > This should not block you. Land it, supppressing the failing test and filing an issue with the VM (if not already done).
lrn@google.com changed reviewers: + fschneider@google.com, regis@google.com
https://codereview.chromium.org/1381033002/diff/220001/runtime/vm/flow_graph_... File runtime/vm/flow_graph_optimizer.cc (left): https://codereview.chromium.org/1381033002/diff/220001/runtime/vm/flow_graph_... runtime/vm/flow_graph_optimizer.cc:182: ASSERT(!call->ic_data()->IssuedJSWarning()); This was hit by standalone/javascript_compatibility_test/{20,21,50,51} - unless the optimization threshold was raised from 5 to 7 in that test.
lgtm https://codereview.chromium.org/1381033002/diff/220001/runtime/vm/flow_graph_... File runtime/vm/flow_graph_optimizer.cc (left): https://codereview.chromium.org/1381033002/diff/220001/runtime/vm/flow_graph_... runtime/vm/flow_graph_optimizer.cc:182: ASSERT(!call->ic_data()->IssuedJSWarning()); On 2015/11/12 11:48:25, Lasse Reichstein Nielsen wrote: > This was hit by standalone/javascript_compatibility_test/{20,21,50,51} - unless > the optimization threshold was raised from 5 to 7 in that test. The reason you hit this is because the string interpolation function is now optimized earlier. The assert is not valid -- a call that throws a JS compatibility warning error may have empty IC data, but the IssuedJSWarning flag set.
Message was sent while issue was closed.
Committed patchset #12 (id:220001) manually as ed0b187d583493432b10aaaac2844f05f4bcce3e (presubmit successful).
Message was sent while issue was closed.
kevmoo@google.com changed reviewers: + kevmoo@google.com
Message was sent while issue was closed.
https://codereview.chromium.org/1381033002/diff/220001/CHANGELOG.md File CHANGELOG.md (right): https://codereview.chromium.org/1381033002/diff/220001/CHANGELOG.md#newcode9 CHANGELOG.md:9: * `dart:core` Tiny nit: in the future keep these updates sorted by library name. But having the changelog update at all is freakin' awesome. Thanks! |