Issue 396733006: Optimize the multiplication of two 32-bit unsigned integers. (Closed)

Created:
6 years, 5 months ago by regis

Modified:
6 years, 5 months ago

Reviewers:
Vyacheslav Egorov (Google), Cutch, srdjan, sra1

CC:
reviews_dartlang.org, vm-dev_dartlang.org

Base URL:
http://dart.googlecode.com/svn/branches/bleeding_edge/dart/

Visibility:
Public.

Description

Optimize the multiplication of two 32-bit unsigned integers.

R=johnmccutchan@google.com, srdjan@google.com, vegorov@google.com

Committed: https://code.google.com/p/dart/source/detail?r=38400

Patch Set 5 : #

Created: 6 years, 5 months ago

Stats (+86 lines, -56 lines)
M	runtime/vm/flow_graph_optimizer.cc	3 chunks	+2 lines, -17 lines	0 comments
M	runtime/vm/intermediate_language.h	1 chunk	+2 lines, -1 line	0 comments
M	runtime/vm/intermediate_language_arm.cc	5 chunks	+30 lines, -13 lines	0 comments
M	runtime/vm/intermediate_language_ia32.cc	6 chunks	+52 lines, -25 lines	0 comments

Messages

Total messages: 16 (0 generated)

sra1

https://codereview.chromium.org/396733006/diff/1/runtime/vm/intermediate_language_ia32.cc File runtime/vm/intermediate_language_ia32.cc (right): https://codereview.chromium.org/396733006/diff/1/runtime/vm/intermediate_language_ia32.cc#newcode5937 runtime/vm/intermediate_language_ia32.cc:5937: __ shrl(right_hi, Immediate(30)); Maybe I misunderstand, but... Why test ...

6 years, 5 months ago (2014-07-16 00:58:27 UTC) #2

regis

6 years, 5 months ago (2014-07-16 02:23:22 UTC) #3

srdjan

LGTM with comments https://codereview.chromium.org/396733006/diff/20001/runtime/vm/flow_graph_optimizer.cc File runtime/vm/flow_graph_optimizer.cc (right): https://codereview.chromium.org/396733006/diff/20001/runtime/vm/flow_graph_optimizer.cc#newcode2041 runtime/vm/flow_graph_optimizer.cc:2041: HasOnlyTwoOf(ic_data, kInt32x4Cid)) { How do you ...

6 years, 5 months ago (2014-07-16 16:55:23 UTC) #4

Cutch

The Uint32 work isn't complete. Will discuss offline. https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_language_arm.cc File runtime/vm/intermediate_language_arm.cc (right): https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_language_arm.cc#newcode6356 runtime/vm/intermediate_language_arm.cc:6356: __ ...

6 years, 5 months ago (2014-07-16 18:09:48 UTC) #5

Vyacheslav Egorov (Google)

not lgtm lets not add optimistic special cases that have no disabling mechanism and that ...

6 years, 5 months ago (2014-07-16 19:51:22 UTC) #7

regis

6 years, 5 months ago (2014-07-16 23:12:42 UTC) #8

Thanks all.

PTAL

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/flow_graph_op...
File runtime/vm/flow_graph_optimizer.cc (right):

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/flow_graph_op...
runtime/vm/flow_graph_optimizer.cc:2041: HasOnlyTwoOf(ic_data, kInt32x4Cid)) {
On 2014/07/16 16:55:23, srdjan wrote:
> How do you prevent repeated deoptimizations? Maybe check
> 'ic_data.HasDeoptReason' as seen above?

It's done above in the same way as for ADD and SUB.

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/flow_graph_op...
runtime/vm/flow_graph_optimizer.cc:2042: operands_type = kInt32x4Cid;
On 2014/07/16 16:55:23, srdjan wrote:
> Optional alternative (something like this):
> 
> } else if (HasOnlyTwoOf(ic_data, kInt32x4Cid)) {
>   if (op_kind == Token::kMUL) {
>     return;
>   }
>   operands_type = kInt32x4Cid;
> }

Done, but with an assert that the op is not kMUL, rather than an early return.

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_...
File runtime/vm/intermediate_language_arm.cc (right):

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_...
runtime/vm/intermediate_language_arm.cc:6356: __ smull(out, IP, left, right);
On 2014/07/16 18:09:48, Cutch wrote:
> Have you tested this code path?

Yes, I have run all the tests, including the RSA benchmark with
--unbox_mints=true and false.

I have not tested the path when we do not have ARMv7, however. But I changed
this code not to check for overflow and can use the MUL instruction available on
all supported ARM versions.

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_...
File runtime/vm/intermediate_language_ia32.cc (right):

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_...
runtime/vm/intermediate_language_ia32.cc:5939: __ j(NOT_ZERO, deopt);
On 2014/07/16 19:51:22, Vyacheslav Egorov (Google) wrote:
> I am very wary of this kind of optimistic optimizations that have no built-in
> mechanism for disabling them. If somebody writes multiplication that from time
> to time multiplies smth non uint32-ish then their code will be stuck in the
> deoptimization loop.
>
> We saw example of why such situations are not desirable with Uint32List loads.
>
> Additionally this feels somewhat backwards to me: should not we just ensure
> that bigint arithmetic never emits this kind of multiplication? We started
> UInt32 support precisely for this reason. If it does not hit the code that we
> are writing ourselves then maybe we should fix it first.

Fair enough. I have added a range check, so that this code is only generated if
the inputs are guaranteed to be uint32. In other words, the code does not check
the inputs anymore. However, the code can still deopt if the result is too
large, but I think this is OK and no different than deopt for addition or
subtraction.

The new bigint implementation is not yet complete and critical parts will be
hand coded in assembly. For now, 16-bit positive smis are multiplied and result
in positive mints. In order for these to be handled by uint32 ops, corresponding
mint ops must exist to be replaced. This is why I had to implement this mint
code.
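As a rough model of the behaviour described above (inputs already known to be uint32, only the result checked), here is a minimal C++ sketch. The helper name TryMulUint32 is hypothetical and is not taken from the patch; returning false stands in for a deoptimization, and the check shown is an assumption about what "too large" means for a signed 64-bit mint.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the patch: models an unboxed mint multiply
// whose inputs are already guaranteed to be uint32, so only the result is
// checked. Returns false where compiled code would deoptimize.
inline bool TryMulUint32(uint32_t left, uint32_t right, int64_t* out) {
  assert(out != nullptr);
  // The product of two 32-bit values always fits in 64 unsigned bits.
  const uint64_t product = static_cast<uint64_t>(left) * right;
  // A mint is a signed 64-bit value, so the top bit must be clear.
  if (product > static_cast<uint64_t>(INT64_MAX)) {
    return false;  // Result too large for a mint: stand-in for a deopt.
  }
  *out = static_cast<int64_t>(product);
  return true;
}
```

Under this model only products of 2^63 or more fail, which requires both inputs to be close to the top of the uint32 range.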

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_...
runtime/vm/intermediate_language_ia32.cc:6255: __ mull(right);  // Result in EDX:EAX, CF set if EDX != 0.
On 2014/07/16 18:09:48, Cutch wrote:
> Uint32 operations are only used in places where we know we don't care about
> overflow.

I have removed the overflow check.
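For readers following along, a Uint32 multiply without an overflow check can be sketched in C++ as below. This is only an illustration of the wraparound semantics being discussed; MulUint32Wrapping is a hypothetical name, and the comments relating it to EAX/EDX are an assumption based on the `mull` comment quoted above.

```cpp
#include <cstdint>

// Hypothetical model of a Uint32 multiply with no overflow check: compute the
// full 64-bit product, keep the low 32 bits (EAX after IA32 `mull`) and
// discard the high 32 bits (EDX).
inline uint32_t MulUint32Wrapping(uint32_t left, uint32_t right) {
  const uint64_t wide = static_cast<uint64_t>(left) * right;
  return static_cast<uint32_t>(wide & 0xFFFFFFFFu);
}
```

The result is the product modulo 2^32, which is what an expression like (x * y) & 0xFFFFFFFF asks for.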

Vyacheslav Egorov (Google)

LGTM to unblock you. Please fix ARM before landing and revert changes related to range ...

6 years, 5 months ago (2014-07-16 23:51:35 UTC) #9

regis

6 years, 5 months ago (2014-07-18 02:44:30 UTC) #10

On 2014/07/16 23:51:35, Vyacheslav Egorov (Google) wrote:
> LGTM to unblock you.
>
> Please fix ARM before landing and revert changes related to range checking in
> optimizer.
>
> Fair point about needing MintOperation right now if we want our narrowing pass
> to find it. We can make that pass to operate on "non-lowered" arithmetic
> though.
>
> Ideally given two integers x and y stuff like (x * y) & 0xFFFFFFFF should
> always result in Uint32 operation independent of whether x * y can be
> optimistically lowered to Mint operation or not.
>
> https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_...
> File runtime/vm/intermediate_language_arm.cc (right):
>
> https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_...
> runtime/vm/intermediate_language_arm.cc:6142: __ smull(out_lo, out_hi, left_lo, right_lo);
> should not this be umul given that we look at hi? (smul and umul match at lo
> part but not on hi).
>
> I feel like this miscomputes result of 0xFFFFFFFF * 0xFFFFFFFF which would be
> 1 after smul but should not be.
>
> Please write a pure Dart test on this.
>
> https://codereview.chromium.org/396733006/diff/40001/runtime/vm/flow_graph_op...
> File runtime/vm/flow_graph_optimizer.cc (right):
>
> https://codereview.chromium.org/396733006/diff/40001/runtime/vm/flow_graph_op...
> runtime/vm/flow_graph_optimizer.cc:2057: if ((left_range == NULL) ||
> Unfortunately range information is not available at this point yet so this
> check would always fail. You would have to resort for runtime check.

I'll chat with John to see what we can do for uint32 operations.

For now, I have changed the mint multiplication to support two signed 32-bit
inputs, rather than two positive 32-bit inputs. It does not make a huge
difference, but it may be less surprising. Let me know what you think.
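To make the signed versus unsigned distinction concrete, here is a small C++ sketch of the two widening multiplies. The function names are hypothetical; they are meant to mirror what ARM SMULL and UMULL compute, not the code in the patch.

```cpp
#include <cstdint>

// Models ARM SMULL: signed 32x32 -> 64-bit product. The result always fits,
// since the largest magnitude is (-2^31) * (-2^31) = 2^62.
inline int64_t SignedMulWide(int32_t left, int32_t right) {
  return static_cast<int64_t>(left) * static_cast<int64_t>(right);
}

// Models ARM UMULL: unsigned 32x32 -> 64-bit product.
inline uint64_t UnsignedMulWide(uint32_t left, uint32_t right) {
  return static_cast<uint64_t>(left) * static_cast<uint64_t>(right);
}

// Low and high 32-bit halves of a 64-bit result (out_lo / out_hi).
inline uint32_t LowWord(uint64_t v) { return static_cast<uint32_t>(v); }
inline uint32_t HighWord(uint64_t v) { return static_cast<uint32_t>(v >> 32); }
```

With the bit pattern 0xFFFFFFFF in both operands, the signed version sees -1 * -1 and yields 1 (high word 0), while the unsigned version yields 0xFFFFFFFE00000001 (high word 0xFFFFFFFE). The low words agree, which is the mismatch pointed out in the quoted review comment. With signed 32-bit inputs the signed product is exact and can never overflow a mint.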

This works almost as well for me in the bigint implementation. There is one case
that is not covered anymore in the square function where I need to multiply a
32-bit unsigned digit by 2. Either we can solve that with better uint32 ops or
by detecting that one operand is a power of two.

Ideally, the mint multiplication should calculate the 128-bit product and deopt
if it does not fit in 64 bits, but that is expensive. I'd prefer to rely on
bigint for arbitrarily large inputs.

I think that using the BSR instruction (CLZ on ARM) to calculate the sum of the
highest bits of the inputs and detect overflow is also too complicated.
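For reference, the highest-bit approach mentioned here could look roughly like the C++ sketch below. It is a simplified illustration for unsigned 64-bit operands, not something from the patch: the names MulFit, BitLength and EstimateMulFit are hypothetical, and BitLength uses a portable loop in place of BSR/CLZ.

```cpp
#include <cstdint>

enum class MulFit { kFits, kOverflows, kUnknown };

// Number of significant bits in v (0 for v == 0). For nonzero v this equals
// 64 - CLZ(v); a loop is used here instead of a compiler intrinsic.
inline int BitLength(uint64_t v) {
  int n = 0;
  while (v != 0) {
    ++n;
    v >>= 1;
  }
  return n;
}

// Estimates whether a * b fits in 64 unsigned bits from the bit lengths alone.
inline MulFit EstimateMulFit(uint64_t a, uint64_t b) {
  if (a == 0 || b == 0) return MulFit::kFits;
  const int bits = BitLength(a) + BitLength(b);
  if (bits <= 64) return MulFit::kFits;       // a * b < 2^bits <= 2^64
  if (bits >= 66) return MulFit::kOverflows;  // a * b >= 2^(bits - 2) >= 2^64
  return MulFit::kUnknown;                    // bits == 65: cannot tell
}
```

The inconclusive case when the bit lengths sum to 65 still needs a fallback check, which is part of what makes this approach more complicated than it first appears.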

Please, let me know if you are more comfortable with this current version.

Thanks,
Regis

regis

https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_language_arm.cc File runtime/vm/intermediate_language_arm.cc (right): https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_language_arm.cc#newcode6142 runtime/vm/intermediate_language_arm.cc:6142: __ smull(out_lo, out_hi, left_lo, right_lo); On 2014/07/16 23:51:35, Vyacheslav ...

6 years, 5 months ago (2014-07-18 02:44:57 UTC) #11

Vyacheslav Egorov (Google)

LGTM https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_language_arm.cc File runtime/vm/intermediate_language_arm.cc (right): https://codereview.chromium.org/396733006/diff/20001/runtime/vm/intermediate_language_arm.cc#newcode6142 runtime/vm/intermediate_language_arm.cc:6142: __ smull(out_lo, out_hi, left_lo, right_lo); > I agree ...

6 years, 5 months ago (2014-07-18 11:15:35 UTC) #12

srdjan

https://codereview.chromium.org/396733006/diff/60001/runtime/vm/flow_graph_optimizer.cc File runtime/vm/flow_graph_optimizer.cc (right): https://codereview.chromium.org/396733006/diff/60001/runtime/vm/flow_graph_optimizer.cc#newcode2041 runtime/vm/flow_graph_optimizer.cc:2041: ASSERT(op_kind != Token::kMUL); Why can this be guaranteed?

6 years, 5 months ago (2014-07-18 16:32:54 UTC) #13

Cutch

lgtm https://codereview.chromium.org/396733006/diff/60001/runtime/vm/flow_graph_optimizer.cc File runtime/vm/flow_graph_optimizer.cc (right): https://codereview.chromium.org/396733006/diff/60001/runtime/vm/flow_graph_optimizer.cc#newcode2041 runtime/vm/flow_graph_optimizer.cc:2041: ASSERT(op_kind != Token::kMUL); On 2014/07/18 16:32:54, srdjan wrote: ...

6 years, 5 months ago (2014-07-18 16:39:39 UTC) #14

regis

Thanks! https://codereview.chromium.org/396733006/diff/60001/runtime/vm/flow_graph_optimizer.cc File runtime/vm/flow_graph_optimizer.cc (right): https://codereview.chromium.org/396733006/diff/60001/runtime/vm/flow_graph_optimizer.cc#newcode2041 runtime/vm/flow_graph_optimizer.cc:2041: ASSERT(op_kind != Token::kMUL); On 2014/07/18 16:39:39, Cutch wrote: ...

6 years, 5 months ago (2014-07-18 17:59:43 UTC) #15

regis

6 years, 5 months ago (2014-07-18 18:00:02 UTC) #16

Message was sent while issue was closed.

Committed patchset #5 manually as r38400 (presubmit successful).
