|
|
Created:
4 years, 6 months ago by Lasse Reichstein Nielsen Modified:
4 years, 5 months ago CC:
reviews_dartlang.org Base URL:
https://github.com/dart-lang/sdk.git@master Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionAdd fast-mode Uri class.
Optimize parser and make it recognize a class of URIs that don't need extra
handling: no escapes, no funny characters, already (nearly or completely)
normalized.
Have a class specifically for those URIs which retains the original
input string without having allocate any further strings.
R=floitsch@google.com
Committed: https://github.com/dart-lang/sdk/commit/00090a0c7237b636a73f048e53966879e74ec55a
Committed: https://github.com/dart-lang/sdk/commit/323ca7e410765f98a88d078c8c8c388fd972eba9
Patch Set 1 #
Total comments: 34
Patch Set 2 : Extract Uri interface class, address comments. #Patch Set 3 : Fix bug in resolution when base path doesn't contain slash. #Patch Set 4 : Tweak operator== #
Total comments: 25
Patch Set 5 : Address comments. Add more tests. Fix bug in old code found by tests. #Patch Set 6 : Add changelog. Fix code depending on now fixed bug in Uri.resolve. #
Messages
Total messages: 15 (4 generated)
lrn@google.com changed reviewers: + floitsch@google.com
I see a >100% speedup on parsing all URIs occuring in import/export/part statements in the Dart SDK repository (manually extracted). The analyzer package has their own `FastUri` class because the default class is too slow. I hope this will be an improvement for them, and everybody else. The state-based tokenizer tables can be created more efficiently from pre-computed tables, but I prefer to have the code that creates them in the repository - then we can always create tables from that, and if we need to change them again, we can go back to code that generate them again.
LGTM. Are you sure that the tests cover both URI classes now? What about the resolution that needs to go from simple to non-simple? https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:417: return Uriata._parse(uri.substring(start + 5, end), 0, null).uri; UriData I'm guessing you didn't run the tests? https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:420: var indices = _scanUri(uri, start, end); Provide example of what the indices could look like. (probably also as dartdoc on the '_scanUri' function. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:422: int schemeEnd = indices[_schemeEndIndex]; // >0 if has scheme Finish with ".". https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:434: if (fragmentStart < queryStart) queryStart = fragmentStart; It's not clear why this wouldn't be done by the _scanUri function. Same for the other derived indices. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:453: } Is it necessary to enter the 'else' below? if not -> "else if". Otherwise add comment or move this check into the else below. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:499: // Http URIs should not have an explicit port of 80 Finish with ".". https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3411: const int _nonSimpleEndStates = 14; This needs a comment and should be separated from the other constants. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3423: void _setChars(Uint8List target, String chars, int value) { Needs dartdocs. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3426: target[char ^ 0x60] = value; Needs explanation what the ^ 0x60 does. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3430: void _setRange(Uint8List target, String range, int value) { Needs dartdocs. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3436: List<Uint8List> _createTables() { Needs documentation. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3444: const pchar = "$unreserved$subDelims"; what does the "p" stand for? https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3615: // use 0x1f (nee 0x7f) to represent all unhandled characters. Use https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3616: if (char > 0x5f) char = 0x1f; // TODO: check if negating test is better. TODO(ldap) https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3626: indices[_portStartIndex] = start; // equals effective pathStart if no auth. "Equals" https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3647: bool _isUpperCase(int char) { unused? https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:4019: // TODO: Deprecate Remove Kill Crush Destroy! TODO(ldap). Also, make this more informative.
Restructured the file to extract a Uri interface instead of using the Uri class. Also moved some constants closer to where they are used, and reused constants further out to be in scope everywhere. It was a bad idea to have _SimpleUri implement Uri when it had all the private members too. And more tests and derived bug-fixes. So, PTAL again https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:417: return Uriata._parse(uri.substring(start + 5, end), 0, null).uri; I did. Turns out we never tested a data URI starting with a non-lower-case "data" - and that we didn't accept it. Fixed now. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:420: var indices = _scanUri(uri, start, end); On 2016/06/28 00:28:03, floitsch wrote: > Provide example of what the indices could look like. > (probably also as dartdoc on the '_scanUri' function. Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:422: int schemeEnd = indices[_schemeEndIndex]; // >0 if has scheme Just removed, it's documented in the _scanUri docs. (And it was really ">start"). https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:434: if (fragmentStart < queryStart) queryStart = fragmentStart; It absolutely could. More to the point, the _scanUri function should really be inlined here - it's the only place it's used, and the separation isn't logical (as you say, this feels like it belongs closer to the scanning). I think I'll keep it as-is for now, but add a "TODO(lrn): consider inlining _scanUri" https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:453: } On 2016/06/28 00:28:03, floitsch wrote: > Is it necessary to enter the 'else' below? > if not -> "else if". > Otherwise add comment or move this check into the else below. Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:499: // Http URIs should not have an explicit port of 80 On 2016/06/28 00:28:03, floitsch wrote: > Finish with ".". Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3411: const int _nonSimpleEndStates = 14; Commented and moved. Most of the states have been moved into the _createTable function. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3423: void _setChars(Uint8List target, String chars, int value) { On 2016/06/28 00:28:03, floitsch wrote: > Needs dartdocs. Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3426: target[char ^ 0x60] = value; Now described on the _scannerTables declaration. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3430: void _setRange(Uint8List target, String range, int value) { On 2016/06/28 00:28:03, floitsch wrote: > Needs dartdocs. Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3436: List<Uint8List> _createTables() { On 2016/06/28 00:28:04, floitsch wrote: > Needs documentation. Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3444: const pchar = "$unreserved$subDelims"; This refers to the "pchar" character group in RFC 3986. Added documentation. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3615: // use 0x1f (nee 0x7f) to represent all unhandled characters. On 2016/06/28 00:28:03, floitsch wrote: > Use Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3616: if (char > 0x5f) char = 0x1f; // TODO: check if negating test is better. Or check it. It isn't faster. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3626: indices[_portStartIndex] = start; // equals effective pathStart if no auth. On 2016/06/28 00:28:04, floitsch wrote: > "Equals" Done. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:3647: bool _isUpperCase(int char) { Indeed. Gone. https://codereview.chromium.org/2086613003/diff/1/sdk/lib/core/uri.dart#newco... sdk/lib/core/uri.dart:4019: // TODO: Deprecate Remove Kill Crush Destroy! Done. Whoops. :)
... or wait until I fix a bug in path-merging with a relative base path.
Fixed, now PTAL.
LGTM. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:785: // easier to do here because we already have extracted variables from the I think you should do the normalizations in _scanUri. It can't cost that much to read it out there too, and to write it back. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1154: int parseHex(int start, int end) { newline before and after nested functions. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1200: try { I would assume that the try/catch costs performance. The first one shouldn't even be an exceptions since having an ip4 address is valid. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1432: // The default port for the scheme of this Uri.. Remove trailing ".". https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1441: String get query => (_query == null) ? "" : _query; _query ?? "" https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1443: String get fragment => (_fragment == null) ? "" : _fragment; _fragment ?? "" https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3180: int dataDelta = _startsWithData(uri, 0); startsWithData should return a bool, and you shouldn't take the substring. (see my other comments). https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3397: assert((start == 5) == text.startsWith("data:")); (start == 5) == text.toLowerCase().startsWith("data:"); https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3872: /// - [_portStartIndex]: Either [start] if the URI has no authority component, or long line. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:4135: _Uri._makeQuery(query, 0, _stringOrNullLength(query), queryParameters); long line. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:4373: int _startsWithData(String text, int start) { This should return a bool. (return delta == 0 || delta == 0x20) https://codereview.chromium.org/2086613003/diff/60001/tests/corelib/uri_test.... File tests/corelib/uri_test.dart (right): https://codereview.chromium.org/2086613003/diff/60001/tests/corelib/uri_test.... tests/corelib/uri_test.dart:198: base = Uri.parse("s://h/p?q#f%20"); // A non-simpe base finish with ".".
Description was changed from ========== Add fast-mode Uri class. Optimize parser and make it recognize a class of URIs that don't need extra handling: no escapes, no funny characters, already (nearly or completely) normalized. Have a class specifically for those URIs which retains the original input string without having allocate any further strings. ========== to ========== Add fast-mode Uri class. Optimize parser and make it recognize a class of URIs that don't need extra handling: no escapes, no funny characters, already (nearly or completely) normalized. Have a class specifically for those URIs which retains the original input string without having allocate any further strings. R=floitsch@google.com Committed: https://github.com/dart-lang/sdk/commit/00090a0c7237b636a73f048e53966879e74ec55a ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001) manually as 00090a0c7237b636a73f048e53966879e74ec55a (presubmit successful).
Message was sent while issue was closed.
https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:785: // easier to do here because we already have extracted variables from the It probably shouldn't but in practice I see a ~10% regression on my benchmark (which is basically parsing all URIs seen in import statements in the Dart SDK repository, weighted by count). I went in the other direction and inlined _scanUri. It might still cost 2-3% but it might also just be noise. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1154: int parseHex(int start, int end) { On 2016/06/29 23:41:47, floitsch wrote: > newline before and after nested functions. Done. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1200: try { Rewritten to not use try/catch. This means that the error message for an invalid IPv4 address inside an IPv6 address changes to say "invalid IPv4 address", but I guess that's acceptable. It only happens if the IPv6 address part contains a ".", so it is likely to be intended as an IPv4 address. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1432: // The default port for the scheme of this Uri.. On 2016/06/29 23:41:47, floitsch wrote: > Remove trailing ".". Done. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1441: String get query => (_query == null) ? "" : _query; On 2016/06/29 23:41:47, floitsch wrote: > _query ?? "" Done. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:1443: String get fragment => (_fragment == null) ? "" : _fragment; On 2016/06/29 23:41:47, floitsch wrote: > _fragment ?? "" Done. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3180: int dataDelta = _startsWithData(uri, 0); Wouldn't work. I need to distinguish "data:" from "Data:". https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3397: assert((start == 5) == text.startsWith("data:")); We *need* to have a lower-case scheme. We promise to do scheme-normalization, so code will expect .toString() to return something starting with "data:", and the code is allowed to it against lower-case strings. So, storing "Data:..." is a complete waste of space because we have to ignore it everywhere and remove it when doing toString. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:3872: /// - [_portStartIndex]: Either [start] if the URI has no authority component, or Reworded and moved after inlining _scanUri. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:4135: _Uri._makeQuery(query, 0, _stringOrNullLength(query), queryParameters); On 2016/06/29 23:41:47, floitsch wrote: > long line. Done. https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:4373: int _startsWithData(String text, int start) { Again, I actually need to distinguish the two cases. This may be an overly tricky way to do it, but it gets the job done. Would it be better if I had named constants for "starts with lower case data" (0) and "starts with non-lower case data" (0x20)? https://codereview.chromium.org/2086613003/diff/60001/tests/corelib/uri_test.... File tests/corelib/uri_test.dart (right): https://codereview.chromium.org/2086613003/diff/60001/tests/corelib/uri_test.... tests/corelib/uri_test.dart:198: base = Uri.parse("s://h/p?q#f%20"); // A non-simpe base Argh, missed that one. Will fix.
Message was sent while issue was closed.
Description was changed from ========== Add fast-mode Uri class. Optimize parser and make it recognize a class of URIs that don't need extra handling: no escapes, no funny characters, already (nearly or completely) normalized. Have a class specifically for those URIs which retains the original input string without having allocate any further strings. R=floitsch@google.com Committed: https://github.com/dart-lang/sdk/commit/00090a0c7237b636a73f048e53966879e74ec55a ========== to ========== Add fast-mode Uri class. Optimize parser and make it recognize a class of URIs that don't need extra handling: no escapes, no funny characters, already (nearly or completely) normalized. Have a class specifically for those URIs which retains the original input string without having allocate any further strings. R=floitsch@google.com Committed: https://github.com/dart-lang/sdk/commit/00090a0c7237b636a73f048e53966879e74ec55a Committed: https://github.com/dart-lang/sdk/commit/323ca7e410765f98a88d078c8c8c388fd972eba9 ==========
Message was sent while issue was closed.
Committed patchset #6 (id:100001) manually as 323ca7e410765f98a88d078c8c8c388fd972eba9 (presubmit successful).
Message was sent while issue was closed.
https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/2086613003/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:4373: int _startsWithData(String text, int start) { On 2016/06/30 10:27:31, Lasse Reichstein Nielsen wrote: > Again, I actually need to distinguish the two cases. > This may be an overly tricky way to do it, but it gets the job done. > Would it be better if I had named constants for "starts with lower case data" > (0) and "starts with non-lower case data" (0x20)? I think that would be better. Even just changing the name of the function would already be an improvement.
Message was sent while issue was closed.
sra@google.com changed reviewers: + sra@google.com
Message was sent while issue was closed.
This CL adds 9k to swarm and 23k to some angular2 dart2js code size benchmarks. Is there a way to add fast paths without adding so much code? I think the old URI class already cost too much code size for dart2js programs (considering browsers already have various methods to process URIs). This CL makes the dart:core URI tax even higher. |