|
|
Created:
5 years, 1 month ago by sra1 Modified:
5 years, 1 month ago CC:
reviews_dartlang.org, Siggi Cherem (dart-lang), floitsch Base URL:
https://github.com/dart-lang/sdk.git@master Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionShort-circuit Uri._uriEncode for components that don't need encoding.
For dart2js:
Uri.pathSegments.easy: +792%
Uri.pathSegments.hard.escaped: +8%
Uri.pathSegments.hard.normalized: +152%
For VM:
Uri.pathSegments.easy: +117%
Uri.pathSegments.hard.escaped: -8%
Uri.pathSegments.hard.normalized: +35%
We could probably get a little more in the browser if we make the check a patched '_platformUriEncode' method, but the 'encoding' parameter makes it impossible to remove _uriEncode completely.
R=lrn@google.com
Committed: https://github.com/dart-lang/sdk/commit/6ddbaa55a4bab95baf49ea54b5d50d85a4995dcf
Patch Set 1 : #
Total comments: 4
Messages
Total messages: 18 (6 generated)
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Description was changed from ========== Short-circuit Uri._uriEncode for components that don't need encoding. For dart2js: Uri.pathSegments.easy: +792% Uri.pathSegments.hard.escaped: +8% Uri.pathSegments.hard.normalized: +152% For VM: Uri.pathSegments.easy: +117% Uri.pathSegments.hard.escaped: -8% Uri.pathSegments.hard.normalized: +35% We could probably get a little more in the browser if we make the check a patched '_platformUriEncode' method, but the 'encoding' parameter makes it impossible to remove _uriEncode completely. ========== to ========== Short-circuit Uri._uriEncode for components that don't need encoding. For dart2js: Uri.pathSegments.easy: +792% Uri.pathSegments.hard.escaped: +8% Uri.pathSegments.hard.normalized: +152% For VM: Uri.pathSegments.easy: +117% Uri.pathSegments.hard.escaped: -8% Uri.pathSegments.hard.normalized: +35% We could probably get a little more in the browser if we make the check a patched '_platformUriEncode' method, but the 'encoding' parameter makes it impossible to remove _uriEncode completely. ==========
sra@google.com changed reviewers: + lrn@google.com
Patchset #1 (id:40001) has been deleted
I would rather avoid using a RegExp as part of the parser. Instead it should be possible to loop without doing anything until we see the first character which needs to be encoded, and if we never see on, we can just use the original string. At least I'd prefer to compare the performance to this solution.
Another thing we can do is to special-case the entire function for the most common encodings (UTF-8, LATIN-1 and ASCII), so we can work directly on the original string instead of running the encoding first. I guess LATIN-1 and ASCII might even be able to share code. This assumes that non-ASCII characters are rare in practice. https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2267: if (_needsNoEncoding.hasMatch(text)) return text; This should be a regexp that matches canonicalTable. This function handles encoding for all possible tables.
For a different approach, please try comparing with: https://codereview.chromium.org/1425373005
https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2267: if (_needsNoEncoding.hasMatch(text)) return text; Ah, finally realized that the only characters used are the ones that are not encoded by any encoding table.
For even more optimization, please compare with https://codereview.chromium.org/1409053007 as well.
This CL adds 461 bytes and gives 8x on the common easy case. The first suggestion does not improve performance and adds 1629 bytes. The second suggestion does not compile, but adding 200 lines of code will surely increase the size of the generated app by too much. A non-regexp alternative adds 912 bytes and has equivalent performance: https://codereview.chromium.org/1428163002/ Can we proceed with this CL and open an issue to add a VM vs patch?
On 2015/11/03 19:00:57, sra1 wrote: > This CL adds 461 bytes and gives 8x on the common easy case. > > The first suggestion does not improve performance and adds 1629 bytes. > > The second suggestion does not compile, but adding 200 lines of code will surely > increase the size of the generated app by too much. > > A non-regexp alternative adds 912 bytes and has equivalent performance: > https://codereview.chromium.org/1428163002/ > > Can we proceed with this CL and open an issue to add a VM vs patch? Fair enough, it does work, and the VM's RegExps aren't slow any more. I'll see if I can optimize it enough for the VM to make it worth doing a platform split. LGTM.
Message was sent while issue was closed.
Committed patchset #1 (id:60001) manually as 6ddbaa55a4bab95baf49ea54b5d50d85a4995dcf (presubmit successful).
Message was sent while issue was closed.
iposva@google.com changed reviewers: + iposva@google.com
Message was sent while issue was closed.
Please provide a non-regexp solution. Thanks, -Ivan https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2256: static final RegExp _needsNoEncoding = new RegExp(r'^[\-\.0-9A-Z_a-z~]*$'); Please refrain from using RegExp in core library code. They are extremely expensive to generate and should be avoided at all cost. Also they do not necessarily execute fast depending on the execution platform or OS. Measuring on a desktop gives you bad information in that matter.
Message was sent while issue was closed.
Lasse: considering Ivan's comment, can you set up platform patches?
Message was sent while issue was closed.
Yes, will do. https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart File sdk/lib/core/uri.dart (right): https://codereview.chromium.org/1411413008/diff/60001/sdk/lib/core/uri.dart#n... sdk/lib/core/uri.dart:2267: if (_needsNoEncoding.hasMatch(text)) return text; Only do this if encoding is UTF8, LATIN1 or ASCII (or even only if it's UTF8 since that's the common case). Otherwise we don't know that an "A" is encoded as 0x41. I doubt we'll have EBCDIC encoding (letters are above 128), but to be fair, we should allow for it. It's too late to remove the encoding argument (if we want to reuse the function, at least, there is only one call that actually passed the encoding). If the optional parameters cause extra code, just make them non-optional. We can call with ..., UTF8, false) in the places where the parameter is not passed. |