Issue 1350223006: binary-operator-reducer: reduce mul+div(shift)

Issue 1350223006: binary-operator-reducer: reduce mul+div(shift) (Closed)

Created:
5 years, 3 months ago by fedor.indutny

Modified:
5 years ago

Reviewers:
titzer, Michael Starzinger, danno

CC:
danno, v8-reviews_googlegroups.com

Base URL:
https://chromium.googlesource.com/v8/v8.git@master

Target Ref:
refs/pending/heads/master

Project:
v8

Visibility:
Public.

More Reviews

Description

binary-operator-reducer: reduce mul+div(shift) Reduction Input: ChangeInt32ToFloat64=> TruncateFloat64ToInt32 Float64Mul=> ChangeInt32ToFloat64=> Float64Div=>TruncateFloat64ToInt32 Output: => TruncateInt64ToInt32 Int64Mul => Int64Shr => TruncateInt64ToInt32 Test code: function mul(a, b) { var l = a & 0x3ffffff; var h = b & 0x3ffffff; var m = l * h; var rl = m & 0x3ffffff; var rh = (m / 0x4000000) | 0; return rl | rh; } mul(1, 2); var a0 = mul(0x3ffffff, 0x3ffffff); mul(0x0, 0x0); %OptimizeFunctionOnNextCall(mul); var a1 = mul(0x3ffffff, 0x3ffffff); print(a0 + ' == ' + a1); BUG= R=mstarzinger@chromium.org Committed: https://crrev.com/461e5b49d022335a7fc4e9d172397a4bd48b93d4 Cr-Commit-Position: refs/heads/master@{#31899}

Patch Set 1 #

Patch Set 2 : ensure no extra precision #

Total comments: 8

Patch Set 3 : backwards-only matching #

Patch Set 4 : TruncateInt64ToFloat64 #

Total comments: 10

Patch Set 5 : remove Revisit #

Patch Set 6 : fixes #

Patch Set 7 : add test #

Patch Set 8 : use matchers #

Patch Set 9 : check range both ways #

Total comments: 2

Patch Set 10 : compiler: move round + truncate to machine-reducer #

Patch Set 11 : make test conditional #

Created: 5 years, 1 month ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+260 lines, -14 lines)			Patch
M	BUILD.gn	View	1 2 3 4 5 6	1 chunk	+2 lines, -0 lines	0 comments	Download
A +	src/compiler/binary-operator-reducer.h	View	1 2 3 4 5 6 7 8 9	3 chunks	+8 lines, -14 lines	0 comments	Download
A	src/compiler/binary-operator-reducer.cc	View	1 2 3 4 5 6 7 8 9	1 chunk	+128 lines, -0 lines	0 comments	Download
M	src/compiler/machine-operator-reducer.cc	View	1 2 3 4 5 6 7 8 9	1 chunk	+1 line, -0 lines	0 comments	Download
M	src/compiler/pipeline.cc	View	1 2 3 4 5 6	2 chunks	+4 lines, -0 lines	0 comments	Download
A	test/unittests/compiler/binary-operator-reducer-unittest.cc	View	1 2 3 4 5 6 7 8 9 10	1 chunk	+94 lines, -0 lines	0 comments	Download
M	test/unittests/compiler/machine-operator-reducer-unittest.cc	View	1 2 3 4 5 6 7 8 9	1 chunk	+12 lines, -0 lines	0 comments	Download
M	test/unittests/compiler/node-test-utils.h	View	1 2 3 4 5 6	3 chunks	+5 lines, -0 lines	0 comments	Download
M	test/unittests/compiler/node-test-utils.cc	View	1 2 3 4 5 6	3 chunks	+3 lines, -0 lines	0 comments	Download
M	test/unittests/unittests.gyp	View	1 2 3 4 5 6	1 chunk	+1 line, -0 lines	0 comments	Download
M	tools/gyp/v8.gyp	View	1 2 3 4 5 6	1 chunk	+2 lines, -0 lines	0 comments	Download

Messages

Total messages: 61 (9 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

fedor.indutny

Hello! This is my first patch for the Turbofan. I'm sure there are lots of ...

5 years, 3 months ago (2015-09-19 05:25:48 UTC) #1

fedor.indutny

Just in case, generated code looks like this: 0x3ed0cc13aa8a 74 488b5de8 REX.W movq rbx,[rbp-0x18] 0x3ed0cc13aa8e ...

5 years, 3 months ago (2015-09-19 06:12:04 UTC) #2

fedor.indutny

I think there might be a problem with the increased precision in current version of ...

5 years, 3 months ago (2015-09-19 06:13:24 UTC) #3

fedor.indutny

Just limited the maximum precision to the 52 bits. Hopefully, the implementation is not that ...

5 years, 3 months ago (2015-09-19 07:51:15 UTC) #4

danno

Thanks for the patch. Could you perhaps provide some motivation for this CL, i.e. is ...

5 years, 3 months ago (2015-09-23 14:21:02 UTC) #6

fedor.indutny

Of course! This is the very heavily used loop in my bn.js library that I'm ...

5 years, 3 months ago (2015-09-23 17:41:14 UTC) #7

titzer

On 2015/09/28 17:51:26, fedor.indutny wrote: > Daniel, > > Kindly reminding you about this. > ...

5 years, 2 months ago (2015-09-29 11:13:59 UTC) #9

titzer

https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc#newcode41 src/compiler/binary-operator-reducer.cc:41: Reduction BinaryOperatorReducer::ReduceFloat64Mul(Node* node) { If you do this optimization ...

5 years, 2 months ago (2015-09-29 11:18:06 UTC) #11

titzer

https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc#newcode119 src/compiler/binary-operator-reducer.cc:119: node->InputAt(1)->InputAt(0)); TurboFan also only supports Int64Mul on 64-bit platforms ...

5 years, 2 months ago (2015-09-29 11:22:00 UTC) #12

fedor.indutny

On 2015/09/29 11:13:59, titzer wrote: > On 2015/09/28 17:51:26, fedor.indutny wrote: > > Daniel, > ...

5 years, 2 months ago (2015-10-02 02:49:52 UTC) #13

fedor.indutny

Answered your questions. Please let me know how it could be improved, going to start ...

5 years, 2 months ago (2015-10-02 02:54:30 UTC) #14

titzer

On 2015/10/02 02:49:52, fedor.indutny wrote: > On 2015/09/29 11:13:59, titzer wrote: > > On 2015/09/28 ...

5 years, 2 months ago (2015-10-06 23:10:39 UTC) #15

On 2015/10/02 02:49:52, fedor.indutny wrote:
> On 2015/09/29 11:13:59, titzer wrote:
> > On 2015/09/28 17:51:26, fedor.indutny wrote:
> > > Daniel,
> > > 
> > > Kindly reminding you about this.
> > > 
> > > Thanks.
> > 
> > Hi Fedor, I think the right place for this optimization would be in
> > compiler/machine-operator-reducer.cc, which handles reductions for that
level
> of
> > operators.
> 
> Hello!
> 
> I'm afraid machine-operator-reducer.cc can't be used for this, as this
reduction
> requires AdvancedReducer and MachineOperatorReducer is a child of Reducer.
> 
> > 
> > Can you explain a bit more about this optimization and how general you
> estimate
> > it to be?
> > 
> 
> This optimization is very "big number library"-specific. I'm sure that many
> other libraries may benefit from it, and there is tons of production crypto
code
> that relies on such libraries. So I would say that impact should be quite big.
> 
> BN.js operates on 26bit limbs, because you may multiply them with a single
> instruction without precision loss (52bits of precision in JS float type).
This
> is kind of compromise between fast addition of big numbers, and fast
> multiplication, and the main use case is elliptic curves which often use 256
bit
> numbers. Using 26bit limbs instead of (let's say) 16bit ones means that BN.js
> will be using 10 limbs instead of 16, and since multiplication is O(n^2) - it
> makes a big difference in performance.
> 
> The only problem with 26bit limb multiplication is that it converts
intermediate
> results to floating point numbers. This CL solves this by detecting the edge
> case, where floating point multiplication is happening without the loss in
> precision, and the results are converted to integers. It removes floating
points
> ops altogether, thus speeding up things for BN.js
> 
> I hope these comments explains it, if not - please let me know!
> 
> > Also, you will need to write some unittests and ideally some test JS code as
> > well for us to have confidence in it.
> 
> Surely will do it, thanks!

Hi, thanks for the explanation, the context helped a lot.

Some suggestions: you should be able to use the range types to get a more
precise bound on the inputs to the multiplication.
That will save some pattern matching for truncations on the inputs..

In general, you can't ignore uses from FrameStates, since they can observe
intermediate values. You really need to compute
whether there are any observations of the intermediate values to make this
optimization safe.

titzer

https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc#newcode58 src/compiler/binary-operator-reducer.cc:58: for (Node* use : node->uses()) { On 2015/10/02 02:54:30, ...

5 years, 2 months ago (2015-10-06 23:14:11 UTC) #16

fedor.indutny

On 2015/10/06 23:10:39, titzer wrote: > On 2015/10/02 02:49:52, fedor.indutny wrote: > > On 2015/09/29 ...

5 years, 2 months ago (2015-10-06 23:45:05 UTC) #17

On 2015/10/06 23:10:39, titzer wrote:
> On 2015/10/02 02:49:52, fedor.indutny wrote:
> > On 2015/09/29 11:13:59, titzer wrote:
> > > On 2015/09/28 17:51:26, fedor.indutny wrote:
> > > > Daniel,
> > > > 
> > > > Kindly reminding you about this.
> > > > 
> > > > Thanks.
> > > 
> > > Hi Fedor, I think the right place for this optimization would be in
> > > compiler/machine-operator-reducer.cc, which handles reductions for that
> level
> > of
> > > operators.
> > 
> > Hello!
> > 
> > I'm afraid machine-operator-reducer.cc can't be used for this, as this
> reduction
> > requires AdvancedReducer and MachineOperatorReducer is a child of Reducer.
> > 
> > > 
> > > Can you explain a bit more about this optimization and how general you
> > estimate
> > > it to be?
> > > 
> > 
> > This optimization is very "big number library"-specific. I'm sure that many
> > other libraries may benefit from it, and there is tons of production crypto
> code
> > that relies on such libraries. So I would say that impact should be quite
big.
> > 
> > BN.js operates on 26bit limbs, because you may multiply them with a single
> > instruction without precision loss (52bits of precision in JS float type).
> This
> > is kind of compromise between fast addition of big numbers, and fast
> > multiplication, and the main use case is elliptic curves which often use 256
> bit
> > numbers. Using 26bit limbs instead of (let's say) 16bit ones means that
BN.js
> > will be using 10 limbs instead of 16, and since multiplication is O(n^2) -
it
> > makes a big difference in performance.
> > 
> > The only problem with 26bit limb multiplication is that it converts
> intermediate
> > results to floating point numbers. This CL solves this by detecting the edge
> > case, where floating point multiplication is happening without the loss in
> > precision, and the results are converted to integers. It removes floating
> points
> > ops altogether, thus speeding up things for BN.js
> > 
> > I hope these comments explains it, if not - please let me know!
> > 
> > > Also, you will need to write some unittests and ideally some test JS code
as
> > > well for us to have confidence in it.
> > 
> > Surely will do it, thanks!
> 
> Hi, thanks for the explanation, the context helped a lot.
> 
> Some suggestions: you should be able to use the range types to get a more
> precise bound on the inputs to the multiplication.
> That will save some pattern matching for truncations on the inputs..

Hm... this is a good idea. However I skip cast to float altogether if the match
has succeeded. Therefore I still need to know whether to insert cast to int, or
just skip cast to float.

> 
> In general, you can't ignore uses from FrameStates, since they can observe
> intermediate values. You really need to compute
> whether there are any observations of the intermediate values to make this
> optimization safe.

May I ask you to enlighten me a bit on FrameStates? I have seen them through the
code base, but not exactly sure what they capture, and what is their function?

fedor.indutny

On 2015/10/06 23:14:11, titzer wrote: > https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc > File src/compiler/binary-operator-reducer.cc (right): > > https://codereview.chromium.org/1350223006/diff/20001/src/compiler/binary-operator-reducer.cc#newcode58 > ...

5 years, 2 months ago (2015-10-06 23:46:28 UTC) #18

fedor.indutny

I have one question. If I will do it at TruncateFloatToInt32 and Float64Div instead - ...

5 years, 2 months ago (2015-10-07 00:13:52 UTC) #19

fedor.indutny

Updated CL, now it matches from the very bottom of replacement, and iterates only inputs. ...

5 years, 2 months ago (2015-10-07 00:48:16 UTC) #20

fedor.indutny

After some consideration, I'm not sure if the way it is done in the second ...

5 years, 2 months ago (2015-10-11 17:07:31 UTC) #21

fedor.indutny

On 2015/10/11 17:07:31, fedor.indutny wrote: > After some consideration, I'm not sure if the way ...

5 years, 2 months ago (2015-10-11 18:28:26 UTC) #22

titzer

https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc#newcode47 src/compiler/binary-operator-reducer.cc:47: if (!machine()->Is64()) return NoChange(); You can use a BinopMatcher ...

5 years, 2 months ago (2015-10-16 18:44:06 UTC) #23

fedor.indutny

https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc#newcode50 src/compiler/binary-operator-reducer.cc:50: node->InputAt(1)->opcode() != IrOpcode::kChangeInt32ToFloat64) { But the idea here is ...

5 years, 2 months ago (2015-10-16 18:52:35 UTC) #24

titzer

https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc#newcode82 src/compiler/binary-operator-reducer.cc:82: Reduction BinaryOperatorReducer::ReduceFloat52Div(Node* node) { On 2015/10/16 18:52:35, fedor.indutny wrote: ...

5 years, 2 months ago (2015-10-16 19:51:54 UTC) #25

fedor.indutny

https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc#newcode82 src/compiler/binary-operator-reducer.cc:82: Reduction BinaryOperatorReducer::ReduceFloat52Div(Node* node) { On 2015/10/16 19:51:54, titzer wrote: ...

5 years, 2 months ago (2015-10-16 19:54:19 UTC) #26

titzer

On 2015/10/16 19:54:19, fedor.indutny wrote: > https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc > File src/compiler/binary-operator-reducer.cc (right): > > https://codereview.chromium.org/1350223006/diff/60001/src/compiler/binary-operator-reducer.cc#newcode82 > ...

5 years, 2 months ago (2015-10-16 20:41:34 UTC) #27

fedor.indutny

I think I have addressed everything said, except the ranges, which I am not sure ...

5 years, 2 months ago (2015-10-16 22:30:38 UTC) #28

fedor.indutny

On 2015/10/16 22:30:38, fedor.indutny wrote: > I think I have addressed everything said, except the ...

5 years, 1 month ago (2015-10-29 22:41:45 UTC) #29

titzer

On 2015/10/16 22:30:38, fedor.indutny wrote: > I think I have addressed everything said, except the ...

5 years, 1 month ago (2015-10-30 19:52:57 UTC) #30

fedor.indutny

On 2015/10/30 19:52:57, titzer wrote: > On 2015/10/16 22:30:38, fedor.indutny wrote: > > I think ...

5 years, 1 month ago (2015-11-03 22:20:24 UTC) #31

fedor.indutny

On 2015/11/03 22:20:24, fedor.indutny wrote: > On 2015/10/30 19:52:57, titzer wrote: > > On 2015/10/16 ...

5 years, 1 month ago (2015-11-03 23:14:09 UTC) #32

fedor.indutny

Kindly reminding you about this. I really would like to get my first turbofan patch ...

5 years, 1 month ago (2015-11-08 02:24:57 UTC) #33

titzer

Logic looks all good now. Thanks for writing tests. Are you using any of the ...

5 years, 1 month ago (2015-11-09 04:17:18 UTC) #34

fedor.indutny

On 2015/11/09 04:17:18, titzer wrote: > Logic looks all good now. Thanks for writing tests. ...

5 years, 1 month ago (2015-11-09 04:22:51 UTC) #35

titzer

On 2015/11/09 04:22:51, fedor.indutny wrote: > On 2015/11/09 04:17:18, titzer wrote: > > Logic looks ...

5 years, 1 month ago (2015-11-09 18:08:40 UTC) #36

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1350223006/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1350223006/140001

5 years, 1 month ago (2015-11-09 18:20:13 UTC) #38

commit-bot: I haz the power

Try jobs failed on following builders: v8_linux_gcc_compile_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_gcc_compile_rel/builds/8231)

5 years, 1 month ago (2015-11-09 18:26:09 UTC) #40

fedor.indutny

Titzer, There was a minor warning that I have fixed. While fixing it I have ...

5 years, 1 month ago (2015-11-09 18:49:34 UTC) #41

titzer

lgtm other than one comment https://codereview.chromium.org/1350223006/diff/160001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/160001/src/compiler/binary-operator-reducer.cc#newcode120 src/compiler/binary-operator-reducer.cc:120: Reduction BinaryOperatorReducer::ReduceTruncateFloat64ToInt32(Node* node) { ...

5 years, 1 month ago (2015-11-09 18:53:08 UTC) #42

fedor.indutny

https://codereview.chromium.org/1350223006/diff/160001/src/compiler/binary-operator-reducer.cc File src/compiler/binary-operator-reducer.cc (right): https://codereview.chromium.org/1350223006/diff/160001/src/compiler/binary-operator-reducer.cc#newcode120 src/compiler/binary-operator-reducer.cc:120: Reduction BinaryOperatorReducer::ReduceTruncateFloat64ToInt32(Node* node) { On 2015/11/09 18:53:08, titzer wrote: ...

5 years, 1 month ago (2015-11-09 18:57:58 UTC) #43

titzer

On 2015/11/09 19:14:07, fedor.indutny wrote: > Should I land it? We use the commit queue. ...

5 years, 1 month ago (2015-11-09 19:21:33 UTC) #45

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1350223006/180001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1350223006/180001

5 years, 1 month ago (2015-11-09 19:21:43 UTC) #48

commit-bot: I haz the power

Try jobs failed on following builders: v8_linux_rel on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_rel/builds/11543)

5 years, 1 month ago (2015-11-09 19:36:43 UTC) #50

fedor.indutny

Heh, test obviously does not work on 32bit platforms. Added check. Please take a look.

5 years, 1 month ago (2015-11-09 19:44:19 UTC) #51

titzer

On 2015/11/09 19:44:19, fedor.indutny wrote: > Heh, test obviously does not work on 32bit platforms. ...

5 years, 1 month ago (2015-11-09 19:46:40 UTC) #52

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1350223006/200001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1350223006/200001

5 years, 1 month ago (2015-11-09 19:47:12 UTC) #55

fedor.indutny

On 2015/11/09 19:46:40, titzer wrote: > On 2015/11/09 19:44:19, fedor.indutny wrote: > > Heh, test ...

5 years, 1 month ago (2015-11-09 19:50:28 UTC) #56

commit-bot: I haz the power

Patchset 11 (id:??) landed as https://crrev.com/461e5b49d022335a7fc4e9d172397a4bd48b93d4 Cr-Commit-Position: refs/heads/master@{#31899}

5 years, 1 month ago (2015-11-09 20:42:48 UTC) #58

fedor.indutny

On 2015/11/09 20:42:48, commit-bot: I haz the power wrote: > Patchset 11 (id:??) landed as ...

5 years, 1 month ago (2015-11-09 20:43:54 UTC) #59

Benedikt Meurer

Unfortunately this is also unsound for the reasons outlined in https://codereview.chromium.org/1473073004/ I think most of ...

5 years ago (2015-11-26 06:14:12 UTC) #60

Benedikt Meurer

5 years ago (2015-11-26 06:14:36 UTC) #61

Message was sent while issue was closed.

A revert of this CL (patchset #11 id:200001) has been created in
https://codereview.chromium.org/1478923002/ by bmeurer@chromium.org.

The reason for reverting is: This is also unsound for the reasons outlined in
https://codereview.chromium.org/1473073004/
Will help Fedor to implement a solution based on simplified operators..

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages