sdk/lib/core/uri.dart - Issue 1381033002: Add data-URI support class to dart:core (next to Uri).

Side by Side Diff: sdk/lib/core/uri.dart

Issue 1381033002: Add data-URI support class to dart:core (next to Uri). (Closed) Base URL: https://github.com/dart-lang/sdk.git@master

Patch Set: Update CHANGELOG.md Created 5 years, 2 months ago

OLD	NEW
1 // Copyright (c) 2012, the Dart project authors. Please see the AUTHORS file	1 // Copyright (c) 2012, the Dart project authors. Please see the AUTHORS file

2 // for details. All rights reserved. Use of this source code is governed by a	2 // for details. All rights reserved. Use of this source code is governed by a

3 // BSD-style license that can be found in the LICENSE file.	3 // BSD-style license that can be found in the LICENSE file.

4	4

5 part of dart.core;	5 part of dart.core;

6	6

7 /**	7 /**

8 * A parsed URI, such as a URL.	8 * A parsed URI, such as a URL.

9 *	9 *

10 * See also:	10 * See also:

(...skipping 2291 matching lines...)
2302 *	2302 *

2303 * This function is similar to the JavaScript-function `decodeURI`.	2303 * This function is similar to the JavaScript-function `decodeURI`.

2304 *	2304 *

2305 * If [plusToSpace] is `true`, plus characters will be converted to spaces.	2305 * If [plusToSpace] is `true`, plus characters will be converted to spaces.

2306 *	2306 *

2307 * The decoder will create a byte-list of the percent-encoded parts, and then	2307 * The decoder will create a byte-list of the percent-encoded parts, and then

2308 * decode the byte-list using [encoding]. The default encodingis UTF-8.	2308 * decode the byte-list using [encoding]. The default encodingis UTF-8.

2309 */	2309 */

2310 static String _uriDecode(String text,	2310 static String _uriDecode(String text,

2311 {bool plusToSpace: false,	2311 {bool plusToSpace: false,

2312 Encoding encoding: UTF8}) {	2312 Encoding encoding: UTF8,

	2313 int start: 0,

	2314 int end}) {

	2315 if (end == null) end = text.length;
	floitsch 2015/11/07 02:56:26 If we have start and end, shouldn't we at least check that start <= end?
	Lasse Reichstein Nielsen 2015/11/09 10:27:38 This is a private function, and I usually don't check parameters there because I'm the only one calling it. If I'm worried, I'll add an assert. I also usually don't give private functions optional parameters. I'll convert this to non-optional parameters, and add an assert for good measure.
2313 // First check whether there is any characters which need special handling.	2316 // First check whether there is any characters which need special handling.

2314 bool simple = true;	2317 bool simple = true;

2315 for (int i = 0; i < text.length && simple; i++) {	2318 for (int i = start; i < end && simple; i++) {

2316 var codeUnit = text.codeUnitAt(i);	2319 var codeUnit = text.codeUnitAt(i);

2317 simple = codeUnit != _PERCENT && codeUnit != _PLUS;	2320 simple = codeUnit != _PERCENT && codeUnit != _PLUS;
	floitsch 2015/11/07 02:56:26 I prefer: if (codeUnit == _PERCENT \|\| codeUnit == _PLUS) { simple = false; break; } (and remove the && simple check in the loop condition). In the end it's doing the same, but IMHO it makes what happens more obvious, since everything is in one place.
	Lasse Reichstein Nielsen 2015/11/09 10:27:38 Done.
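
The two review suggestions on this scan (an explicit early `break`, and an assert on the range) can be sketched together as follows. This is only an illustration in current Dart syntax, not the code that landed; `isSimpleRange` and the two constants are hypothetical names standing in for the private members of `Uri`.

```dart
const int percent = 0x25; // '%' (stands in for the patch's _PERCENT)
const int plus = 0x2B; // '+' (stands in for the patch's _PLUS)

/// Returns true if the code units in text[start, end) contain neither
/// '%' nor '+', i.e. the range needs no percent-decoding.
bool isSimpleRange(String text, int start, int end) {
  // Private helpers usually skip argument checks; an assert documents
  // the expectation without costing anything in production mode.
  assert(0 <= start && start <= end && end <= text.length);
  var simple = true;
  for (var i = start; i < end; i++) {
    final codeUnit = text.codeUnitAt(i);
    if (codeUnit == percent || codeUnit == plus) {
      simple = false;
      break; // Everything about the exit is in one place.
    }
  }
  return simple;
}
```

The loop condition no longer mentions `simple`, so the exit logic lives entirely in the body, which is the readability point raised in the comment.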
2318 }	2321 }

2319 List<int> bytes;	2322 List<int> bytes;

2320 if (simple) {	2323 if (simple) {

2321 if (encoding == UTF8 \|\| encoding == LATIN1) {	2324 if (encoding == UTF8 \|\| encoding == LATIN1) {

2322 return text;	2325 return text.substring(start, end);

	2326 } else if (start == 0 && end == text.length) {

	2327 bytes = text.codeUnits;

2323 } else {	2328 } else {

2324 bytes = text.codeUnits;	2329 var decoder = encoding.decoder;

	2330 var result;

	2331 var conversionSink = decoder.startChunkedConversion(

	2332 new ChunkedConversionSink((list) {

	2333 result = list.join();

	2334 }));

	2335 if (conversionSink is ByteConversionSink) {

	2336 conversionSink.addSlice(text.codeUnits, start, end, true);

	2337 } else {

	2338 conversionSink.add(text.codeUnits.sublist(start, end));

	2339 conversionSink.close();

	2340 }

	2341 return result;

2325 }	2342 }

2326 } else {	2343 } else {

2327 bytes = new List();	2344 bytes = new List();

2328 for (int i = 0; i < text.length; i++) {	2345 for (int i = start; i < end; i++) {

2329 var codeUnit = text.codeUnitAt(i);	2346 var codeUnit = text.codeUnitAt(i);

2330 if (codeUnit > 127) {	2347 if (codeUnit > 127) {

2331 throw new ArgumentError("Illegal percent encoding in URI");	2348 throw new ArgumentError("Illegal percent encoding in URI");

2332 }	2349 }

2333 if (codeUnit == _PERCENT) {	2350 if (codeUnit == _PERCENT) {

2334 if (i + 3 > text.length) {	2351 if (i + 3 > text.length) {

2335 throw new ArgumentError('Truncated URI');	2352 throw new ArgumentError('Truncated URI');

2336 }	2353 }

2337 bytes.add(_hexCharPairToByte(text, i + 1));	2354 bytes.add(_hexCharPairToByte(text, i + 1));

2338 i += 2;	2355 i += 2;

(...skipping 251 matching lines...)
2590 // 0123456789:; = ?	2607 // 0123456789:; = ?

2591 0xafff, // 0x30 - 0x3f 1111111111110101	2608 0xafff, // 0x30 - 0x3f 1111111111110101

2592 // @ABCDEFGHIJKLMNO	2609 // @ABCDEFGHIJKLMNO

2593 0xffff, // 0x40 - 0x4f 1111111111111111	2610 0xffff, // 0x40 - 0x4f 1111111111111111

2594 // PQRSTUVWXYZ _	2611 // PQRSTUVWXYZ _

2595 0x87ff, // 0x50 - 0x5f 1111111111100001	2612 0x87ff, // 0x50 - 0x5f 1111111111100001

2596 // abcdefghijklmno	2613 // abcdefghijklmno

2597 0xfffe, // 0x60 - 0x6f 0111111111111111	2614 0xfffe, // 0x60 - 0x6f 0111111111111111

2598 // pqrstuvwxyz ~	2615 // pqrstuvwxyz ~

2599 0x47ff]; // 0x70 - 0x7f 1111111111100010	2616 0x47ff]; // 0x70 - 0x7f 1111111111100010

	2617

2600 }	2618 }

	2619

	2620 // --------------------------------------------------------------------

	2621 // Data URI

	2622 // --------------------------------------------------------------------

	2623

	2624 /**

	2625 * A representation of a `data:` URI.

	2626 *

	2627 * Data URIs are non-hierarchial URIs that contain can contain any data.

	2628 * They are defined by [RFC 2397](https://tools.ietf.org/html/rfc2397).

	2629 *

	2630 * This class allows parsing the URI text and extracting individual parts of the

	2631 * URI, as well as building the URI text from structured parts.

	2632 */

	2633 class DataUri {
	nweiz 2015/10/15 21:09:03 It seems strange that this isn't a subclass of Uri. It certainly satisfies the "is a" criterion, and I'd expect to be able to pass a data URI anywhere I could pass a normal URI. Having to explicitly convert back and forth sounds awkward.
	Lasse Reichstein Nielsen 2015/10/28 13:55:47 The Uri class is designed only for hierarchical URIs, and data: URIs are not hierarchical, so it is not really an "is-a" relation. They both relate to the generic URI concept, but they are different and have widely different meanings. If you have a URI string with a "data:" scheme, you can create a DataUri object to access its parts, just as you can create a Uri object for an "http:" schemed URI.
	nweiz 2015/10/29 00:28:36
	> The Uri class is designed only for hierarchical URIs, and data: URIs are not hierarchical, so it is not really an "is-a" relation. They both relate to the generic URI concept, but they are different and have widely different meanings.
	That's not communicated anywhere in the documentation, and it's certainly not communicated by the name. If it only supported hierarchical URLs, then it should be called "Url", not "Uri". As it is, the name and the documentation clearly say that it does represent the generic URI concept, even if it has some getters that are particularly useful for the hierarchical subset. That also doesn't match the way it's used in practice. APIs like Isolate.spawnUri take Uris, and work perfectly well when they're non-hierarchical; they even tend to support data URIs in particular. It seems perverse to say that these core methods support data URIs but not the canonical Dart object used to represent data URIs. Even this class itself supports fromUri and toUri. How do those methods make sense if a Uri should only be hierarchical?
	Lasse Reichstein Nielsen 2015/11/03 18:02:52 Good points. So this class is not really a "data URI", it's more like a "data URI helper" class. It's not a URI, but it gives access to the scheme-specific structure of data URIs. It's still not functionality I want to merge into the URI class, or that I would want to share with the URI functionality, so I guess the best option is to find a better name for the class. DataUriHelper is ... bad. ("Helper", "Manager", "Impl" etc. are non-descriptive words added to distinguish something from something else that would otherwise have the same name.) DataUriParser makes more sense, but misses that it is also a way to create them. Suggestions?
	Lasse Reichstein Nielsen 2015/11/09 10:27:37 On 2015/11/03 18:02:52, Lasse Reichstein Nielsen wrote: > [...] > Suggestions?
	2634 static const int _noScheme = -1;

	2635 /**

	2636 * Contains the text content of a `data:` URI, with or without a

	2637 * leading `data:`.

	2638 *

	2639 * If [_separatorIndices] starts with `4` (the index of the `:`), then

	2640 * there is a leading `data:`, otherwise _separatorIndices starts with

	2641 * `-1`.

	2642 */

	2643 final String _text;

	2644

	2645 /**

	2646 * List of the separators (';', '=' and ',') in the text.

	2647 *

	2648 * Starts with the index of the index of the `:` in `data:` of the mimeType.

	2649 * That is always either -1 or 4, depending on whether `_text` includes the

	2650 * `data:` scheme or not.

	2651 *

	2652 * The first speparator ends the mime type. We don't bother with finding

	2653 * the '/' inside the mime type.

	2654 *

	2655 * Each two separators after that marks a parameter key and value.

	2656 *

	2657 * If there is a single separator left, it ends the "base64" marker.

	2658 *

	2659 * So the following separators are found for a text:

	2660 *

	2661 * data:text/plain;foo=bar;base64,ARGLEBARGLE=

	2662 *     ^          ^   ^   ^      ^

	2663 *

	2664 */

	2665 List<int> _separatorIndices;
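
To make the separator-index layout described in the comment above concrete, here is a minimal sketch that computes such a list for a well-formed data URI that includes the `data:` prefix. It is not the patch's parser: `findSeparators` is a hypothetical name, it is written in current Dart, and it assumes every parameter has the form `key=value` without validating that.

```dart
/// Hypothetical helper: returns the indices of ':', then each ';' and '='
/// in the header, and finally the ',' that starts the data.
List<int> findSeparators(String text) {
  if (!text.startsWith('data:')) {
    throw FormatException("Does not start with 'data:'", text, 0);
  }
  final indices = <int>[4]; // Index of the ':' in "data:".
  for (var i = 5; i < text.length; i++) {
    final char = text[i];
    if (char == ',') {
      indices.add(i); // The comma ends the header; data follows.
      return indices;
    }
    if (char == ';' || char == '=') indices.add(i);
  }
  throw FormatException('Missing comma before the data', text, text.length);
}
```

For `data:text/plain;foo=bar;base64,ARGLEBARGLE=` this yields `[4, 15, 19, 23, 30]`. The list has odd length exactly when a lone `;base64` separator is present, which is the property the `isBase64` getter in this patch relies on.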

	2666

	2667 DataUri._(this._text,

	2668 this._separatorIndices);

	2669

	2670 /** The entire content of the data URI, including the leading `data:`. */

	2671 String get text => _separatorIndices[0] == _noScheme ? "data:$_text" : _text;

	2672

	2673 /**

	2674 * Creates a `data:` URI containing the contents as percent-encoded text.

	2675 */

	2676 factory DataUri.fromString(String content,
	nweiz 2015/10/15 21:09:03 Why can't you base64-encode text, or percent-encode binary data? If my text contains a lot of non-URL-safe characters (particularly non-ASCII characters) that would need to be encoded anyway, base64 may well end up being more compact.
	Lasse Reichstein Nielsen 2015/10/16 14:38:45 You can't base-64 encode text - base64 encoding only works on bytes and strings are not bytes. I could add an implicit UTF-8 encoding of the text, but I prefer to keep this simple and explicit. If you need to UTF-8 encode, you choose to do that. The other direction: bytes to percent-escaped is possible, but rarely useful (unless you happen to know that your bytes contain a lot of ASCII characters that won't be escaped, and in that case I don't mind asking you to create a string from them).
	nweiz 2015/10/19 19:51:20
	> I could add an implicit UTF-8 encoding of the text, but I prefer to keep this simple and explicit. If you need to UTF-8 encode, you choose to do that.
	I was thinking of something parallel to File.writeAsString and similar APIs: it takes a string as well as an encoding parameter that defaults to UTF-8.
	> The other direction: bytes to percent-escaped is possible, but rarely useful (unless you happen to know that your bytes contain a lot of ASCII characters that won't be escaped, and in that case I don't mind asking you to create a string from them).
	Making users convert to a string won't work in general; Dart's UTF-16 strings can't fully represent binary data in a way that will be correctly translated to UTF-8 (or any other encoding).
	Lasse Reichstein Nielsen 2015/10/28 13:55:47 True. We probably should have an encoding here: the "content" string is converted to bytes, and there is no reason to fix that to UTF-8. (Well, except that we have to write the code to do the encoding, since the functions in Uri only work on strings.) The content of a data: URI is always a sequence of bytes, so encoding information is important - so we should use it and store it.
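
The "string plus encoding parameter" shape discussed in this thread could look roughly like the sketch below. It is an assumption-laden illustration rather than the API that was eventually committed: `dataUriFromString` and `_isUnreserved` are hypothetical names, the MIME type is fixed to `text/plain`, and it uses the lowercase `utf8`/`base64` constants of current `dart:convert` instead of the 2015 `UTF8`/`BASE64`.

```dart
import 'dart:convert';

/// Hypothetical helper: converts [content] to bytes with [encoding], then
/// writes the bytes either percent-encoded or base64-encoded.
String dataUriFromString(String content,
    {Encoding encoding = utf8, bool useBase64 = false}) {
  final bytes = encoding.encode(content); // May throw if not encodable.
  final header = 'data:text/plain;charset=${encoding.name}';
  if (useBase64) return '$header;base64,${base64.encode(bytes)}';
  final buffer = StringBuffer('$header,');
  for (final byte in bytes) {
    if (_isUnreserved(byte)) {
      buffer.writeCharCode(byte);
    } else {
      buffer.write('%');
      buffer.write(byte.toRadixString(16).toUpperCase().padLeft(2, '0'));
    }
  }
  return buffer.toString();
}

/// Conservative choice for the sketch: only RFC 3986 unreserved characters
/// are left unescaped.
bool _isUnreserved(int byte) =>
    (byte >= 0x30 && byte <= 0x39) || // 0-9
    (byte >= 0x41 && byte <= 0x5A) || // A-Z
    (byte >= 0x61 && byte <= 0x7A) || // a-z
    byte == 0x2D || byte == 0x2E || byte == 0x5F || byte == 0x7E; // - . _ ~
```

Because the bytes are produced before any escaping happens, the same string gives different URIs under different encodings (for example, `é` becomes `%C3%A9` with `utf8` but `%E9` with `latin1`), which is why the thread concludes that the charset has to be recorded in the URI.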
	2677 {mimeType: "text/plain",

	2678 Iterable<DataUriParameter> parameters}) {
	nweiz 2015/10/15 21:09:03 Right now, the encoding is implicitly always UTF-8, but the corresponding charset parameter isn't automatically included. Ideally, the user would be able to specify an encoding and a charset parameter would be added based on that. Otherwise, this should add charset=UTF-8, and the documentation should be explicit about what encoding is used.
	Lasse Reichstein Nielsen 2015/10/16 14:38:45 Good point. The default, if nothing is written, is charset=US-ASCII which won't be correct here.
	Lasse Reichstein Nielsen 2015/10/28 13:55:47 I've changed this to add an "Encoding charset" parameter which is used to convert the string to bytes (and which may throw if the string isn't compatible). It defaults to US-ASCII if you don't specify anything.
	2679 StringBuffer buffer = new StringBuffer();

	2680 List indices = [_noScheme];

	2681 _writeUri(mimeType, parameters, buffer, indices);

	2682 indices.add(buffer.length);

	2683 buffer.write(',');

	2684 buffer.write(Uri.encodeComponent(content));
	nweiz 2015/10/15 21:09:04 URI.encodeComponent doesn't encode a number of reserved characters ("!~'()"). I assume most parsers will do the right thing anyway, but this is a place where the implementation diverges from the spec.
	Lasse Reichstein Nielsen 2015/10/16 14:38:45 ACK. The syntax for parameter keys and values are RFC 2045 tokens, which do not contain percent encodings (they may contain percent, but it wouldn't mean it's encoded). We should not encode here, just validate, and do it against the _tokenCharTable, and fail on any invalid character. If you need to encode something, you should do it explicitly before calling here.
	nweiz 2015/10/19 19:51:20 You should certainly percent-encode here, because it's the data. But the data URI spec says you should also percent-encode parameters: "parameter values should use the URL Escaped encoding instead of quoted string if the parameter values contain any 'tspecial'."
	Lasse Reichstein Nielsen 2015/10/28 13:55:47 Ack, rereading the RFC again. And again. It's almost as if it's incompletely specified :) The format of a data URI is: 'data:' (type '/' subtype)? (';' attribute '=' value)* (';base64')? ',' uric* where type/subtype/attribute/value comes from RFC 2045 and uric comes from RFC 2396. The type, subtype, attribute and value are percent-encoded if they contain characters that are not token characters (ASCII minus SPACE, CTLs and tspecial) - or '%' itself, I guess. If the content (a sequence of bytes) is not base-64 encoded (RFC 4648), then it percent escapes (RFC 2396) non-uric characters. It's not clear what happens if a non-ASCII character is included in attribute or value. It needs to be encoded to bytes somehow because that's all we can represent with percent escapes. There is an RFC (2231) which defines how to add encoding to attributes (foo*=utf-8'en-us'%A8%94 - the '*' means that the name is foo and the content is special). We probably don't want to support that. I think I'll just UTF-8 + percent encode parameter keys and values if they contain non-ASCII characters, and percent-encode non-token ASCII chars. For the data part, the allowed character is uric := reserved \| unreserved \| escape, so it doesn't need to escape reserved characters. It's still not exactly the correct characters; I'll make a new table with the correct characters to not escape (which is annoyingly close to Uri._encodFullTable, but doesn't contain `#`).
	nweiz 2015/10/29 00:28:36
	> I think I'll just UTF-8 + percent encode parameter keys and values if they contain non-ASCII characters, and percent-encode non-token ASCII chars.
	I think that's the right behavior. I believe there's a blanket rule for URIs that percent-escapes should be interpreted as UTF-8 unless otherwise specified.
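
The parameter-encoding rule the thread converges on (UTF-8 encode, then percent-escape every byte that is not an RFC 2045 token character, plus `%` and `#`) can be sketched like this. `encodeToken` and `_isTokenByte` are hypothetical names in current Dart; the patch itself routes this through `Uri._uriEncode` with a `_tokenCharTable`, whose exact contents are not shown here.

```dart
import 'dart:convert';

/// The "tspecials" of RFC 2045: ( ) < > @ , ; : \ " / [ ] ? =
const String _tspecials = '()<>@,;:\\"/[]?=';

/// True for printable ASCII that may appear unescaped in a token.
/// '%' (0x25) and '#' (0x23) are excluded so that escapes stay
/// unambiguous and the value cannot start a fragment.
bool _isTokenByte(int byte) =>
    byte > 0x20 &&
    byte < 0x7F &&
    byte != 0x25 &&
    byte != 0x23 &&
    !_tspecials.codeUnits.contains(byte);

/// Hypothetical helper: percent-encodes a parameter key or value.
String encodeToken(String value) {
  final buffer = StringBuffer();
  for (final byte in utf8.encode(value)) {
    if (_isTokenByte(byte)) {
      buffer.writeCharCode(byte);
    } else {
      buffer
        ..write('%')
        ..write(byte.toRadixString(16).toUpperCase().padLeft(2, '0'));
    }
  }
  return buffer.toString();
}
```

A value such as `utf-8` passes through unchanged, while `a=b` becomes `a%3Db`, so an encoded key or value can never contain one of the separators that the parser looks for.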
	2685 return new DataUri._(buffer.toString(), indices);

	2686 }

	2687

	2688 /**

	2689 * Creates a `data:` URI string containing the base-64 encoded content bytes.

	2690 *

	2691 * It defaults to having the mime-type `application/octet-stream`.

	2692 */

	2693 factory DataUri.fromBytes(List<int> bytes,

	2694 {mimeType: "application/octet-stream",

	2695 Iterable<DataUriParameter> parameters}) {

	2696 StringBuffer buffer = new StringBuffer();

	2697 List indices = [_noScheme];

	2698 _writeUri(mimeType, parameters, buffer, indices);

	2699 indices.add(buffer.length);

	2700 buffer.write(';base64,');

	2701 indices.add(buffer.length - 1);

	2702 BASE64.encoder.startChunkedConversion(

	2703 new StringConversionSink.fromStringSink(buffer))

	2704 .addSlice(bytes, 0, bytes.length, true);

	2705 return new DataUri._(buffer.toString(), indices);

	2706 }

	2707

	2708 /**

	2709 * Creates a `DataUri` from a [Uri] which must have `data` as [Uri.scheme].

	2710 *

	2711 * The [uri] must have scheme `data` and no authority, query or fragment,

	2712 * and the path must be valid as a data URI.

	2713 */

	2714 factory DataUri.fromUri(Uri uri) {
	nweiz 2015/10/15 21:09:03 This should document when it will throw FormatExceptions, especially since some syntax errors are detected eagerly and some are not. Same goes for parse().
	Lasse Reichstein Nielsen 2015/10/16 14:38:45 Good point.
	Lasse Reichstein Nielsen 2015/10/28 13:55:47 Documented on parse, referenced here.
	2715 if (uri.scheme != "data") {

	2716 throw new ArgumentError.value(uri, "uri",

	2717 "Scheme must be 'data'");

	2718 }

	2719 if (uri.hasAuthority) {

	2720 throw new ArgumentError.value(uri, "uri",

	2721 "Data uri must not have authority");

	2722 }

	2723 if (uri.hasQuery) {

	2724 throw new ArgumentError.value(uri, "uri",

	2725 "Data uri must not have a query part");

	2726 }

	2727 if (uri.hasFragment) {

	2728 throw new ArgumentError.value(uri, "uri",

	2729 "Data uri must not have a fragment part");

	2730 }
	nweiz 2015/10/15 21:09:03 According to https://simonsapin.github.io/data-urls/ (which is referenced by browser vendors and so is at least somewhat authoritative), the query should be included in the parsing algorithm and the fragment should be ignored.
	Lasse Reichstein Nielsen 2015/10/16 14:38:45 True - '?' is a valid uric, so the query should be included.
	Lasse Reichstein Nielsen 2015/10/28 13:55:47 True. Fixed.
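
A minimal sketch of the behaviour agreed on here (treat the query as part of the data URI text, ignore the fragment), using the public `Uri` API of current Dart. `dataUriTextOf` is a hypothetical name and the patch's revised `fromUri` may be structured differently.

```dart
/// Hypothetical helper: returns the part of a data URI after "data:",
/// with any query re-attached and any fragment dropped.
String dataUriTextOf(Uri uri) {
  if (uri.scheme != 'data') {
    throw ArgumentError.value(uri, 'uri', "Scheme must be 'data'");
  }
  // '?' is a valid uric, so what Uri splits off as a query is really
  // part of the data; the fragment is not part of the data URI proper.
  return uri.hasQuery ? '${uri.path}?${uri.query}' : uri.path;
}
```

With this, `Uri.parse('data:text/plain,a?b#c')` is read as the data URI text `text/plain,a?b`.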
	2731 return _parse(uri.path, 0);

	2732 }

	2733

	2734 /**

	2735 * Writes the initial part of a `data:` uri, from after the "data:"

	2736 * until just before the ',' before the data, or before a `;base64,`

	2737 * marker.

	2738 *

	2739 * Of an [indices] list is passed, separator indices are stored in that

	2740 * list.

	2741 */

	2742 static void _writeUri(String mimeType,

	2743 Iterable<DataUriParameter> parameters,

	2744 StringBuffer buffer, List indices) {

	2745 if (mimeType == null) {

	2746 mimeType = "text/plain";

	2747 }

	2748 if (mimeType.isEmpty \|\|

	2749 identical(mimeType, "text/plain") \|\|
	nweiz 2015/10/15 21:09:03 Consider omitting the text/plain mime type, since it's the default anyway.
	Lasse Reichstein Nielsen 2015/10/16 14:38:45 Good idea.
	2750 identical(mimeType, "application/octet-stream")) {

	2751 buffer.write(mimeType); // Common cases need no escaping.

	2752 } else {

	2753 int slashIndex = _validateMimeType(mimeType);

	2754 if (slashIndex < 0) {

	2755 throw new ArgumentError.value(mimeType, "mimeType",

	2756 "Invalid MIME type");

	2757 }

	2758 buffer.write(Uri._uriEncode(_tokenCharTable,

	2759 mimeType.substring(0, slashIndex)));

	2760 buffer.write("/");

	2761 buffer.write(Uri._uriEncode(_tokenCharTable,

	2762 mimeType.substring(slashIndex + 1)));

	2763 }

	2764 if (parameters != null) {

	2765 for (var parameter in parameters) {

	2766 if (indices != null) indices.add(buffer.length);

	2767 buffer.write(';');

	2768 // Encode any non-RFC2045-token character as well as '%' and '#'.

	2769 buffer.write(Uri._uriEncode(_tokenCharTable, parameter.key));

	2770 if (indices != null) indices.add(buffer.length);

	2771 buffer.write('=');

	2772 buffer.write(Uri._uriEncode(_tokenCharTable, parameter.value));

	2773 }

	2774 }

	2775 }

	2776

	2777 /**

	2778 * Checks mimeType is valid-ish (`token '/' token`).

	2779 *

	2780 * Returns the index of the slash, or -1 if the mime type is not

	2781 * considered valid.

	2782 *

	2783 * Currently only looks for slashes, all other characters will be

	2784 * percent-encoded as UTF-8 if necessary.

	2785 */

	2786 static int _validateMimeType(String mimeType) {

	2787 int slashIndex = -1;

	2788 for (int i = 0; i < mimeType.length; i++) {

	2789 var char = mimeType.codeUnitAt(i);

	2790 if (char != Uri._SLASH) continue;

	2791 if (slashIndex < 0) {

	2792 slashIndex = i;

	2793 continue;

	2794 }

	2795 return -1;

	2796 }

	2797 return slashIndex;

	2798 }

	2799

	2800 /**

	2801 * Creates a [Uri] with the content of [DataUri.fromString].

	2802 *

	2803 * The resulting URI will have `data` as scheme and the remainder

	2804 * of the data URI as path.

	2805 *

	2806 * Equivalent to creating a `DataUri` using `new DataUri.fromString` and

	2807 * calling `toUri` on the result.

	2808 */

	2809 static Uri uriFromString(String content,

	2810 {mimeType: "text/plain",

	2811 Iterable<DataUriParameter> parameters}) {

	2812 var buffer = new StringBuffer();

	2813 _writeUri(mimeType, parameters, buffer, null);

	2814 buffer.write(',');

	2815 buffer.write(Uri.encodeComponent(content));

	2816 return new Uri(scheme: "data", path: buffer.toString());

	2817 }

	2818

	2819 /**

	2820 * Creates a [Uri] with the content of [DataUri.fromBytes].

	2821 *

	2822 * The resulting URI will have `data` as scheme and the remainder

	2823 * of the data URI as path.

	2824 *

	2825 * Equivalent to creating a `DataUri` using `new DataUri.fromBytes` and

	2826 * calling `toUri` on the result.

	2827 */

	2828 static Uri uriFromBytes(List<int> bytes,

	2829 {mimeType: "text/plain",

	2830 Iterable<DataUriParameter> parameters}) {

	2831 var buffer = new StringBuffer();

	2832 _writeUri(mimeType, parameters, buffer, null);

	2833 buffer.write(';base64,');

	2834 BASE64.encoder.startChunkedConversion(buffer)

	2835 .addSlice(bytes, 0, bytes.length, true);

	2836 return new Uri(scheme: "data", path: buffer.toString());

	2837 }

	2838

	2839 /**

	2840 * Parses a string as a `data` URI.

	2841 */

	2842 static DataUri parse(String uri) {

	2843 if (!uri.startsWith("data:")) {

	2844 throw new FormatException("Does not start with 'data:'", uri, 0);

	2845 }

	2846 return _parse(uri, 5);

	2847 }

	2848

	2849 /**

	2850 * Converts a `DataUri` to a [Uri].

	2851 *

	2852 * Returns a `Uri` with scheme `data` and the remainder of the data URI

	2853 * as path.

	2854 */

	2855 Uri toUri() {

	2856 String content = _text;

	2857 int colonIndex = _separatorIndices[0];

	2858 if (colonIndex >= 0) {

	2859 content = _text.substring(colonIndex + 1);

	2860 }

	2861 return new Uri._internal("data", null, null, null, content, null, null);

	2862 }

	2863

	2864 /**

	2865 * The MIME type of the data URI.

	2866 *

	2867 * A data URI consists of a "media type" followed by data.

	2868 * The mediatype starts with a MIME type and can be followed by

	2869 * extra parameters.

	2870 *

	2871 * Example:

	2872 *

	2873 * data:text/plain;encoding=utf-8,Hello%20World!

	2874 *

	2875 * This data URI has the media type `text/plain;encoding=utf-8`, which is the

	2876 * MIME type `text/plain` with the parameter `encoding` with value `utf-8`.

	2877 * See [RFC 2045](https://tools.ietf.org/html/rfc2045) for more detail.

	2878 *

	2879 * If the first part of the data URI is empty, it defaults to `text/plain`.

	2880 */

	2881 String get mimeType {

	2882 int start = _separatorIndices[0] + 1;

	2883 int end = _separatorIndices[1];

	2884 if (start == end) return "text/plain";

	2885 return Uri._uriDecode(_text, start: start, end: end);

	2886 }

	2887

	2888 /**

	2889 * Whether the data is base64 encoded or not.

	2890 */

	2891 bool get isBase64 => _separatorIndices.length.isOdd;

	2892

	2893 /**

	2894 * The content part of the data URI, as its actual representation.

	2895 *

	2896 * This string may contain percent escapes.

	2897 */

	2898 String get contentText => _text.substring(_separatorIndices.last + 1);

	2899

	2900 /**

	2901 * The content part of the data URI as bytes.

	2902 *

	2903 * If the data is base64 encoded, it will be decoded to bytes.

	2904 *

	2905 * If the data is not base64 encoded, it will be decoded by unescaping

	2906 * percent-escaped characters and returning byte values of each unescaped

	2907 * character. The bytes will not be, e.g., UTF-8 decoded.

	2908 */

	2909 List<int> contentAsBytes() {

	2910 String text = _text;

	2911 int start = _separatorIndices.last + 1;

	2912 if (isBase64) {

	2913 if (text.endsWith("%3D")) {

	2914 return BASE64.decode(Uri._uriDecode(text, start: start,

	2915 encoding: LATIN1));
	nweiz 2015/10/15 21:09:04
	Why does this assume a LATIN1 encoding? It should be based on the parameters, and per spec it should default to ASCII.

	Lasse Reichstein Nielsen 2015/10/16 14:38:45
	This function is not creating text, only bytes, so "%c4" should evaluate to the byte 0xc4. This is really a hack to treat escapes as the bytes they represent, which is what LATIN-1 encoding does.

	nweiz 2015/10/19 19:51:20
	It would be good to document that in a comment.

	Lasse Reichstein Nielsen 2015/10/28 13:55:47
	No longer necessary using the new BASE64 decoder (but it does assume that percent-escapes only occur in the padding).
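The Latin-1 point in this thread can be illustrated with a small sketch. It is not the patch's code: it is written against current Dart (`latin1` rather than the `LATIN1` constant in this patch set), the helper name `percentDecodeToBytes` is hypothetical, and it uses the public `Uri.decodeQueryComponent` in place of the private `Uri._uriDecode`. Unlike the call in the patch, `Uri.decodeQueryComponent` also turns `+` into a space, so the sketch is only meaningful for input without `+`.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical helper: percent-decodes [text] and returns one byte per
/// resulting character.
///
/// Latin-1 maps every byte 0x00..0xFF to the code point with the same value,
/// so decoding the escapes as Latin-1 and reading the code units back
/// recovers the raw bytes without any further text decoding.
Uint8List percentDecodeToBytes(String text) {
  // Assumption: [text] contains no '+', which this call would turn into ' '.
  final decoded = Uri.decodeQueryComponent(text, encoding: latin1);
  return Uint8List.fromList(decoded.codeUnits);
}
```

Under these assumptions, `percentDecodeToBytes('%c4')` yields the single byte 0xc4, which is the behavior described in the reply above.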
	2916 }

	2917 return BASE64.decode(text.substring(start));

	2918 }

	2919

	2920 // Not base64, do percent-decoding and return the remaining bytes.

	2921 // Compute result size.

	2922 const int percent = 0x25;

	2923 int length = text.length - start;

	2924 for (int i = start; i < text.length; i++) {

	2925 var codeUnit = text.codeUnitAt(i);

	2926 if (codeUnit == percent) {

	2927 i += 2;

	2928 length -= 2;

	2929 }

	2930 }

	2931 // Fill result array.

	2932 Uint8List result = new Uint8List(length);

	2933 if (length == text.length) {

	2934 result.setRange(0, length, text.codeUnits, start);

	2935 return result;

	2936 }

	2937 int index = 0;

	2938 for (int i = start; i < text.length; i++) {

	2939 var codeUnit = text.codeUnitAt(i);

	2940 if (codeUnit != percent) {

	2941 result[index++] = codeUnit;

	2942 } else {

	2943 if (i + 2 < text.length) {

	2944 var digit1 = _hexDigit(text.codeUnitAt(i + 1));

	2945 var digit2 = _hexDigit(text.codeUnitAt(i + 2));

	2946 if (digit1 >= 0 && digit2 >= 0) {

	2947 int byte = digit1 * 16 + digit2;

	2948 result[index++] = byte;

	2949 i += 2;

	2950 continue;

	2951 }

	2952 }

	2953 throw new FormatException("Invalid percent escape", text, i);

	2954 }

	2955 }

	2956 assert(index == result.length);

	2957 return result;

	2958 }

	2959

	2960 // Converts a UTF-16 code-unit to its value as a hex digit.

	2961 // Returns -1 for non-hex digits.

	2962 int _hexDigit(int char) {

	2963 const int char_0 = 0x30;

	2964 const int char_a = 0x61;

	2965

	2966 int digit = char ^ char_0;

	2967 if (digit <= 9) return digit;

	2968 char = ((char | 0x20) - char_a) & 0xFFFF;

	2969 if (char < 6) return 10 + char;

	2970 return -1;

	2971 }

	2972

	2973 /**

	2974 * Returns a string created from the content of the data URI.

	2975 *

	2976 * If the content is base64 encoded, it will be decoded to bytes and then

	2977 * decoded to a string using [encoding].

	2978 *

	2979 * If the content is not base64 encoded, it will first have percent-escapes

	2980 * converted to bytes and then the character codes and byte values are

	2981 * decoded using [encoding].

	2982 */

	2983 String contentAsString({Encoding encoding: UTF8}) {
	nweiz 2015/10/15 21:09:03
	The encoding should be taken from the URI's parameters, at least by default.

	Lasse Reichstein Nielsen 2015/10/16 14:38:45
	I really, really don't want to parse the "charset" header. It can take a lot of values that we won't be able to satisfy anyway. I prefer to leave the parsing of parameters to the user who can then err appropriately if the parameters are not understandable.
	On the other hand, we could promise to understand any of the names accepted by Encoding.getByName (basically ASCII, LATIN-1 and UTF-8 by a number of names), and if we can't understand the charset parameter, we can fall back on LATIN-1 (because LATIN-1 decoding can't fail - all byte sequences are valid - so we just give you the bytes).
	If we do that, we should also have a getter that gives you the encoding that would be used (maybe "Encoding get charsetEncoding").

	nweiz 2015/10/19 19:51:20
	I think never failing by default is less useful than generally doing the right thing by default. I'd say:
	* If the user passes in an encoding, use that.
	* Otherwise, if the charset declares a recognized encoding, use that.
	* Otherwise, fail with a useful message.
	If the user wants to be sure that they absolutely never fail to decode, they can still do so, but the default is to provide a good error message in cases where we're reasonably confident that a correct decoding isn't possible using dart:convert.

	Lasse Reichstein Nielsen 2015/10/28 13:55:47
	Sounds reasonable. If there is no charset parameter, the default is US-ASCII, so the failure should only happen if we have a present-but-unrecognizable charset parameter.
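The resolution order agreed on in this thread could look roughly like the sketch below. It is an illustration only, written against current Dart: the function name `resolveEncoding` and its parameters are hypothetical, and the choice of `FormatException` for the failure case is an assumption, since the thread only asks for "a useful message".

```dart
import 'dart:convert';

/// Hypothetical helper: picks the encoding used to decode data-URI content.
///
/// [explicit] is an encoding passed in by the caller, if any.
/// [charset] is the value of the URI's charset parameter, or null if absent.
Encoding resolveEncoding({Encoding? explicit, String? charset}) {
  // 1. An encoding passed in by the user always wins.
  if (explicit != null) return explicit;
  // 2. No charset parameter at all: the default is US-ASCII.
  if (charset == null) return ascii;
  // 3. A charset that dart:convert recognizes by name.
  final byName = Encoding.getByName(charset);
  if (byName != null) return byName;
  // 4. Present but unrecognizable: fail with a message instead of guessing.
  throw FormatException('Unrecognized charset parameter', charset);
}
```

A caller that must never fail can still pass `explicit: latin1`, because Latin-1 decoding accepts every byte sequence.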
	2984 String text = _text;

	2985 int start = _separatorIndices.last + 1;

	2986 if (isBase64) {

	2987 var converter = BASE64.decoder.fuse(encoding.decoder);

	2988 if (text.endsWith("%3D")) {

	2989 return converter.convert(Uri._uriDecode(text, start: start,

	2990 encoding: LATIN1));

	2991 }

	2992 return converter.convert(text.substring(start));

	2993 }

	2994 return Uri._uriDecode(text, start: start, encoding: encoding);

	2995 }

	2996

	2997 /**

	2998 * An iterable over the parameters of the data URI.

	2999 *

	3000 * A data URI may contain parameters between the the MIMI type and the
	nweiz 2015/10/15 21:09:03
	Nit: "MIMI" -> "MIME"

	Lasse Reichstein Nielsen 2015/10/16 14:38:45
	:)
	3001 * data. This iterates through those parameters, returning each as a

	3002 * [DataUriParameter] pair of key and value.

	3003 */

	3004 Iterable<DataUriParameter> get parameters sync* {

	3005 for (int i = 3; i < _separatorIndices.length; i += 2) {

	3006 var start = _separatorIndices[i - 2] + 1;

	3007 var equals = _separatorIndices[i - 1];

	3008 var end = _separatorIndices[i];
	nweiz 2015/10/15 21:09:04
	It looks like this will incorrectly accept invalid URIs that mix up semicolons, commas, and maybe equals as well.

	Lasse Reichstein Nielsen 2015/10/16 14:38:45
	The parser should avoid that. This assumes that the separator indices are correct and provided by the parser.

	nweiz 2015/10/19 19:51:20
	That might be true. It's a good thing to write tests for, though.

	Lasse Reichstein Nielsen 2015/10/28 13:55:47
	It's not even possible to test it, because it's all hidden behind private functions and constructors. I'll add some assertions to ensure that invariants hold.
	3009 String key = Uri._uriDecode(_text, start: start, end: equals);

	3010 String value = Uri._uriDecode(_text, start: equals + 1, end: end);
	nweiz 2015/10/15 21:09:03
	What about whitespace? If the spec is interpreted like other similar specs, there should be implicit whitespace between all tokens which will end up encoded as percent-escapes, and thus find its way into the slices of text this uses. If we want to match browser behavior (which we should), we ought to strip this whitespace. Also, mime types and parameters that are whitespace-only should be disallowed.

	Lasse Reichstein Nielsen 2015/10/16 14:38:45
	The URI grammars don't generally allow space characters. There is no implicit whitespace between the tokens - spaces are simply not allowed. If you include a space character, it needs to be escaped, but then it will also be treated as meaningful - escaped characters are included literally.
	Are you sure browsers allow whitespace in data URIs?
	I don't want to validate the parameters (I have no idea what they mean anyway), so as long as they are syntactically correct, it should be fine. Space is not allowed in RFC 2045 token (which parameter key and value must be). Percent is allowed, but probably shouldn't mean percent-encoding, so I guess we should just check each string against token chars and fail if they are invalid.

	nweiz 2015/10/19 19:51:20
	> Are you sure browsers allow whitespace in data URIs?
	Chrome accepts literal spaces in data URIs, but not "%20". I suppose because we don't support literal spaces in Uri objects anyway we don't need to worry about it.
	> Percent is allowed, but probably shouldn't mean percent-encoding, so I guess we should just check each string against token chars and fail if they are invalid.
	The data URI spec says that percent-encoding can be used to escape non-token chars in parameter values.

	Lasse Reichstein Nielsen 2015/10/28 13:55:47
	> The data URI spec says that percent-encoding can be used to escape non-token chars in parameter values.
	ACK, so we must escape existing percent characters.

	Lasse Reichstein Nielsen 2015/10/28 13:55:47
	It seems Chrome distinguishes between "data:text/html;%20charset=UTF-8,<%c2%80>" and "data:text/html;charset=UTF-8,<%c2%80>", where only the latter is interpreted as UTF-8, so whitespace is not ignored. (What is shown in the address bar is not the plain URI, it does some tweaks to make it more readable).
	3011 yield new DataUriParameter(key, value);

	3012 }

	3013 }

	3014

	3015 static DataUri _parse(String text, int start) {

	3016 assert(start == 0 || start == 5);

	3017 assert((start == 5) == text.startsWith("data:"));

	3018

	3019 /// Character codes.

	3020 const int comma = 0x2c;

	3021 const int slash = 0x2f;

	3022 const int semicolon = 0x3b;

	3023 const int equals = 0x3d;

	3024 List indices = [start - 1];

	3025 int slashIndex = -1;

	3026 var char;

	3027 int i = start;

	3028 for (; i < text.length; i++) {

	3029 char = text.codeUnitAt(i);

	3030 if (char == comma || char == semicolon) break;

	3031 if (char == slash) {

	3032 if (slashIndex < 0) {

	3033 slashIndex = i;

	3034 continue;

	3035 }

	3036 throw new FormatException("Invalid MIME type", text, i);

	3037 }

	3038 }

	3039 if (slashIndex < 0 && i > start) {

	3040 // An empty MIME type is allowed, but if non-empty it must contain

	3041 // exactly one slash.

	3042 throw new FormatException("Invalid MIME type", text, i);

	3043 }

	3044 while (char != comma) {

	3045 // parse parameters and/or "base64".

	3046 indices.add(i);

	3047 i++;

	3048 int equalsIndex = -1;

	3049 for (; i < text.length; i++) {

	3050 char = text.codeUnitAt(i);

	3051 if (char == equals) {

	3052 if (equalsIndex < 0) equalsIndex = i;

	3053 } else if (char == semicolon || char == comma) {

	3054 break;

	3055 }

	3056 }

	3057 if (equalsIndex >= 0) {

	3058 indices.add(equalsIndex);

	3059 } else {

	3060 // Have to be final "base64".

	3061 var lastSeparator = indices.last;

	3062 if (char != comma ||

	3063 i != lastSeparator + 7 /* "base64,".length */ ||

	3064 !text.startsWith("base64", lastSeparator + 1)) {

	3065 throw new FormatException("Expecting '='", text, i);

	3066 }

	3067 break;

	3068 }

	3069 }

	3070 indices.add(i);

	3071 return new DataUri._(text, indices);

	3072 }

	3073

	3074 String toString() => text;

	3075

	3076 // Table of the `token` characters of RFC 2045 in a URI.

	3077 //

	3078 // A token is any US-ASCII character except SPACE, control characters and

	3079 // `tspecial` characters. The `tspecial` category is:

	3080 // '(', ')', '<', '>', '@', ',', ';', ':', '\', '"', '/', '[', ']', '?', '='.

	3081 //

	3082 // In a data URI, we also need to escape '%' and '#' characters.

	3083 static const _tokenCharTable = const [

	3084 // LSB MSB

	3085 // | |

	3086 0x0000, // 0x00 - 0x0f 00000000 00000000

	3087 0x0000, // 0x10 - 0x1f 00000000 00000000

	3088 // ! $ &' *+ -.

	3089 0x6cd2, // 0x20 - 0x2f 01001011 00110110

	3090 // 01234567 89

	3091 0x03ff, // 0x30 - 0x3f 11111111 11000000

	3092 // ABCDEFG HIJKLMNO

	3093 0xfffe, // 0x40 - 0x4f 01111111 11111111

	3094 // PQRSTUVW XYZ ^_

	3095 0xc7ff, // 0x50 - 0x5f 11111111 11100011

	3096 // `abcdefg hijklmno

	3097 0xffff, // 0x60 - 0x6f 11111111 11111111

	3098 // pqrstuvw xyz{|}~

	3099 0x7fff]; // 0x70 - 0x7f 11111111 11111110

	3100 }
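A table like `_tokenCharTable` above packs 16 characters into each entry, one bit per character with the least significant bit first, as the "LSB MSB" header suggests. The sketch below shows one way such a table could be consulted. It is an assumption about how the table is meant to be indexed, not code from the patch; `tokenCharTable`, `isTokenChar` and `isToken` are hypothetical public names, and the table values are copied from the patch.

```dart
/// Copy of the table values from the patch: entry N covers the characters
/// 0xN0..0xNF, with bit k set when character 0xN0 + k is allowed.
const List<int> tokenCharTable = [
  0x0000, // 0x00 - 0x0f
  0x0000, // 0x10 - 0x1f
  0x6cd2, // 0x20 - 0x2f
  0x03ff, // 0x30 - 0x3f
  0xfffe, // 0x40 - 0x4f
  0xc7ff, // 0x50 - 0x5f
  0xffff, // 0x60 - 0x6f
  0x7fff, // 0x70 - 0x7f
];

/// Whether [codeUnit] is marked as allowed in the table.
/// The high bits select the entry, the low four bits select the bit in it.
bool isTokenChar(int codeUnit) =>
    codeUnit < 128 &&
    (tokenCharTable[codeUnit >> 4] & (1 << (codeUnit & 0x0f))) != 0;

/// Whether [s] is non-empty and consists only of allowed characters.
bool isToken(String s) => s.isNotEmpty && s.codeUnits.every(isTokenChar);
```

With this indexing, letters, digits and `-` are accepted, while the separators `;` and `=` and the characters `%` and `#` are rejected and would need escaping in a parameter key or value.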

	3101

	3102 /**

	3103 * A parameter of a data URI.

	3104 *

	3105 * A parameter is a key and a value.

	3106 *

	3107 * The key and value are the actual values to be encoded into the URI.

	3108 * They will be escaped if necessary when creating a data URI,

	3109 * and have been unescaped when extracted from a data URI.

	3110 */

	3111 class DataUriParameter {
	nweiz 2015/10/15 21:09:03
	Why isn't this just a map? Maps are much easier to use than custom data types.

	Lasse Reichstein Nielsen 2015/10/16 14:38:45
	Because the same parameter name may occur more than once. Maps are not suitable for that (HTTP headers have that problem).

	nweiz 2015/10/19 19:51:20
	Can they? The MIME spec isn't explicit about this, but it seems to imply that it's possible to find a single canonical value of a parameter with a given name. Do you know of content types that require the use of multiple parameters with the same name?

	Lasse Reichstein Nielsen 2015/10/28 13:55:47
	Good point. It doesn't actually look like parameters can be repeated, so I'll change it to a Map<String, String>.
	3112 /** Parameter key. */

	3113 final String key;

	3114 /** Parameter value. */

	3115 final String value;

	3116 DataUriParameter(this.key, this.value);

	3117

	3118 /**

	3119 * Creates an iterable of parameters from a map from key to value.

	3120 *

	3121 * Parameter keys are not required to be unique in a data URI, but

	3122 * when they are, a map can be used to represent the parameters, and

	3123 * this function provides a way to access the map pairs as parameter

	3124 * values.

	3125 */

	3126 static Iterable<DataUriParameter> fromMap(Map<String, String> headers) sync* {

	3127 for (String key in headers.keys) {

	3128 yield new DataUriParameter(key, headers[key]);

	3129 }

	3130 }

	3131 }
