|
|
Description[es6] Optimize TypedArray.subarray()
````
var array = new Uint8Array(65000);
var startDate = Date.now();
var counter = 0;
while (counter++ < 50000000) {
array.subarray(start, end);
}
var endDate = Date.now();
print(endDate - startDate);
````
4200 ms -> 3500 ms (16.67%)
BUG=
Committed: https://crrev.com/1e2aecf3635a5fb01607fa65511d67902735d90c
Cr-Commit-Position: refs/heads/master@{#30770}
Patch Set 1 #Patch Set 2 : macro version #Patch Set 3 : remove unused references #
Total comments: 1
Messages
Total messages: 17 (3 generated)
karl@skomski.com changed reviewers: + jkummerow@chromium.org, littledan@chromium.org
PTAL
I'm not sure about this one. Replacing MathMax() calls with verbose hand-written comparisons for performance reasons is sad. Can't we have a max() function that's fast enough to be usable? Also, given that .subarray() ends up having to call the runtime anyway, it'd be a prime candidate for moving the entire implementation to C++, which would probably be even faster than micro-optimizing the JS bits. Dan, would you like to land this as an incremental improvement anyway? Karl, is there any non-microbenchmark use case where this has an impact, or did you just optimize it "because we can"?
On 2015/09/11 09:52:26, Jakob wrote: > I'm not sure about this one. > > Replacing MathMax() calls with verbose hand-written comparisons for performance > reasons is sad. Can't we have a max() function that's fast enough to be usable? > > Also, given that .subarray() ends up having to call the runtime anyway, it'd be > a prime candidate for moving the entire implementation to C++, which would > probably be even faster than micro-optimizing the JS bits. > > Dan, would you like to land this as an incremental improvement anyway? > > Karl, is there any non-microbenchmark use case where this has an impact, or did > you just optimize it "because we can"? I optimized it since I want to use subarray in node.js to replace the custom slice implementation. https://github.com/nodejs/node/pull/2777 I can move it to C++.
On 2015/09/11 at 10:04:48, karl wrote: > On 2015/09/11 09:52:26, Jakob wrote: > > I'm not sure about this one. > > > > Replacing MathMax() calls with verbose hand-written comparisons for performance > > reasons is sad. Can't we have a max() function that's fast enough to be usable? > > > > Also, given that .subarray() ends up having to call the runtime anyway, it'd be > > a prime candidate for moving the entire implementation to C++, which would > > probably be even faster than micro-optimizing the JS bits. > > > > Dan, would you like to land this as an incremental improvement anyway? > > > > Karl, is there any non-microbenchmark use case where this has an impact, or did > > you just optimize it "because we can"? > > I optimized it since I want to use subarray in node.js to replace the custom slice implementation. > https://github.com/nodejs/node/pull/2777 > > I can move it to C++. Agreed that this is sad. Inlining JS natives better might make this unnecessary, but that's hard. I'm not really sure what moving it to C++ would get you. The important part (invoking the constructor) already ends up calling out to C++. To me, this looks like a case where the Javascript is doing small surface API work and nothing computationally significant, so if we can work things out cleanly and efficiently in JS, I think it'd make sense to leave it there long-term. What if we made a macro which did a simple max calculation, without all the NaN-related logic and multiple arguments logic, inline? Would that get the same speedup? This macro could be applied in many places. It could be defined in macros.py for use in any JS natives file. I confess, I have already pulled a patch from the Node.js team which included a similar change whose main benefit was avoiding calling MathMax https://codereview.chromium.org/1231673008 . If we could fix this problem in general with a macro, that would be great, and much easier to use everywhere than rewriting everything in C++ (even if we should eventually do that). karl, what other TypedArray performance issues are you facing?
On 2015/09/11 13:53:15, Dan Ehrenberg wrote: > Agreed that this is sad. Inlining JS natives better might make this unnecessary, > but that's hard. Inlining is not the only problem. As you allude to below, Math.min/max also have pretty crazy spec-defined semantics, so will never be as fast as a simple comparison. > I'm not really sure what moving it to C++ would get you. The highlights: - you don't have to worry about JS-spec-induced speed traps for simple things like max() - you don't have to worry about ICs, deopts, inlineability, and crafting platform-specific machine code for compiler intrinsics either - cross-process (!) memory sharing as opposed to per-context/per-isolate memory cost - runtime call overhead upper-bounded to 1 call - it's nice to have one consistent, safe, canonical implementation of everything > The important part > (invoking the constructor) already ends up calling out to C++. To me, this looks > like a case where the Javascript is doing small surface API work and nothing > computationally significant, so if we can work things out cleanly and > efficiently in JS, I think it'd make sense to leave it there long-term. The rule of thumb is: if it's 100% pure JS, or if it's most naturally expressed in JS because it requires JS semantics, implement it in JS; otherwise prefer C++. Since this uses intrinsics and runtime calls already, it falls into the latter bucket. > What if we made a macro which did a simple max calculation, without all the > NaN-related logic and multiple arguments logic, inline? Would that get the same > speedup? This macro could be applied in many places. It could be defined in > macros.py for use in any JS natives file. That would mitigate some of the immediate pain here and would constitute an incremental improvement. Doesn't have to happen in this CL, though. > I confess, I have already pulled a patch from the Node.js team which included a > similar change whose main benefit was avoiding calling MathMax > https://codereview.chromium.org/1231673008 . If we could fix this problem in > general with a macro, that would be great, and much easier to use everywhere > than rewriting everything in C++ (even if we should eventually do that). > > karl, what other TypedArray performance issues are you facing?
On 2015/09/11 15:03:45, Jakob wrote: > On 2015/09/11 13:53:15, Dan Ehrenberg wrote: > > Agreed that this is sad. Inlining JS natives better might make this > unnecessary, > > but that's hard. > > Inlining is not the only problem. As you allude to below, Math.min/max also have > pretty crazy spec-defined semantics, so will never be as fast as a simple > comparison. > > > I'm not really sure what moving it to C++ would get you. > > The highlights: > - you don't have to worry about JS-spec-induced speed traps for simple things > like max() > - you don't have to worry about ICs, deopts, inlineability, and crafting > platform-specific machine code for compiler intrinsics either > - cross-process (!) memory sharing as opposed to per-context/per-isolate memory > cost > - runtime call overhead upper-bounded to 1 call > - it's nice to have one consistent, safe, canonical implementation of everything ```` RUNTIME_FUNCTION(Runtime_TypedArraySubArrayImpl) { HandleScope scope(isolate); DCHECK(args.length() == 3); if (!args[0]->IsJSTypedArray()) { THROW_NEW_ERROR_RETURN_FAILURE( isolate, NewTypeError(MessageTemplate::kNotTypedArray)); } Handle<JSTypedArray> buffer(JSTypedArray::cast(args[0])); CONVERT_SMI_ARG_CHECKED(begin, 1); CONVERT_SMI_ARG_CHECKED(end, 2); int buffer_length = buffer->length_value(); if (begin < 0) { begin = Max(0, buffer_length + begin); } else { begin = Min(buffer_length, begin); } if (end < 0) { end = Max(0, buffer_length + end); } else { end = Min(buffer_length, end); } if (end < begin) { end = begin; } int new_length = end - begin; int byte_offset = NumberToInt32(buffer->byte_offset()); size_t begin_byte_offset = byte_offset + begin * buffer->element_size(); return *isolate->factory()->NewJSTypedArray( buffer->type(), buffer->GetBuffer(), begin_byte_offset, new_length); } ```` I tried a simple c++ implementation but it's actually 3 (30%) seconds slower even with removing the excessive checks in NewJSTypedArray. Still did the the TO_INTEGER checks in JS because I did not found a good replacement in runtime-utils.h. Back to this CR I pushed a simple macro version that it as fast as the verbose if variant. PTAL
Since the subarray patch landed in nodejs it would be great to land this, too. If the macro variant looks good I could also make a new CR that replaces MathMax or MathMin with the macro in ArrayCopyWithin, Slice etc.
On 2015/09/16 at 15:01:08, karl wrote: > Since the subarray patch landed in nodejs it would be great to land this, too. > If the macro variant looks good I could also make a new CR that replaces MathMax or MathMin > with the macro in ArrayCopyWithin, Slice etc. I'm not sure what macro you're referring to. Looks like copyWithin calls MathMax/MathMin, and ComputeSpliceStartIndex uses its own comparisons.
On 2015/09/16 at 15:44:45, Dan Ehrenberg wrote: > On 2015/09/16 at 15:01:08, karl wrote: > > Since the subarray patch landed in nodejs it would be great to land this, too. > > If the macro variant looks good I could also make a new CR that replaces MathMax or MathMin > > with the macro in ArrayCopyWithin, Slice etc. > > I'm not sure what macro you're referring to. Looks like copyWithin calls MathMax/MathMin, and ComputeSpliceStartIndex uses its own comparisons. Oh, I misread. Yes, this macro variant looks good to me, and applying it in more cases where it affects performance seems like a good idea.
lgtm
The CQ bit was checked by karl@skomski.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1331993004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1331993004/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/1e2aecf3635a5fb01607fa65511d67902735d90c Cr-Commit-Position: refs/heads/master@{#30770}
Message was sent while issue was closed.
mstarzinger@chromium.org changed reviewers: + mstarzinger@chromium.org
Message was sent while issue was closed.
Hmmm ... https://codereview.chromium.org/1331993004/diff/40001/src/macros.py File src/macros.py (right): https://codereview.chromium.org/1331993004/diff/40001/src/macros.py#newcode160 src/macros.py:160: macro MAX_SIMPLE(argA, argB) = (argA < argB ? argB : argA); Please be aware that this macro will expand "MAX_SIMPLE(a + b, 0)" to "a + b < 0 ? a + b : 0" which will evaluate the addition twice in the baseline compiler (which is the only compiler where this macro makes any sense in the first place). If you wanted to guard against that, then the %IS_VAR marker would be the right thing. But then you would need to adapt the use-sites accordingly. And I see that this is now being used left and right in the builtin JavaScript files and almost all use-sites are affected. |