Issue 222403002: ARM: Avoid VMSR instruction when converting to clamped uint8 (Closed)

Created:
6 years, 8 months ago by oetuaho-nv

Modified:
6 years, 8 months ago

Reviewers:
Jarin, jbramley, rmcilroy

CC:
v8-dev

Base URL:
git://github.com/v8/v8.git@master

Visibility:
Public.

Description

ARM: Avoid VMSR instruction when converting to clamped uint8

Setting the FPSCR flags is expensive on some CPUs. Instead, do rounding to the nearest integer by adding 0.5 and truncating towards zero. Additional checks are needed to make sure that numbers ending in exactly .5 get rounded to the nearest even number according to WebIDL spec, but this case is relatively rare and even then the benefits outweigh the costs.

Performance increases ranging from 2x to 9x were seen in measurements with a hot-loop benchmark depending on the CPU architecture.

This new code path is used on VFP3-capable processors, which have the fixed-point conversion functions. Generic ARM assembler, disassembler, and simulator functions are added for the conversion between fixed point and floating point.

A bug in encoding the immediate value in the vcvt_f64_s32 instruction is also fixed by this change. This did not cause behavioral issues before, since in existing uses of the function the order of the bits in the immediate value does not matter, as they are all 1.

BUG=3253
LOG=N

Patch Set 1

Total comments: 3

Created: 6 years, 8 months ago

Stats (+251 lines, -53 lines)
Status	File	Chunks	Lines changed	Comments
M	src/arm/assembler-arm.h	1 chunk	+4 lines, -1 line	0 comments
M	src/arm/assembler-arm.cc	2 chunks	+46 lines, -12 lines	0 comments
M	src/arm/constants-arm.h	1 chunk	+1 line, -0 lines	0 comments
M	src/arm/disasm-arm.cc	4 chunks	+55 lines, -7 lines	0 comments
M	src/arm/macro-assembler-arm.cc	1 chunk	+51 lines, -22 lines	3 comments
M	src/arm/simulator-arm.h	1 chunk	+1 line, -0 lines	0 comments
M	src/arm/simulator-arm.cc	3 chunks	+77 lines, -7 lines	0 comments
M	test/cctest/test-assembler-arm.cc	4 chunks	+10 lines, -2 lines	0 comments
M	test/cctest/test-disasm-arm.cc	1 chunk	+6 lines, -2 lines	0 comments

Messages

Total messages: 10 (0 generated)

jbramley

https://codereview.chromium.org/222403002/diff/1/src/arm/macro-assembler-arm.cc File src/arm/macro-assembler-arm.cc (left): https://codereview.chromium.org/222403002/diff/1/src/arm/macro-assembler-arm.cc#oldcode3819 src/arm/macro-assembler-arm.cc:3819: // Set rounding mode to round to the nearest ...

6 years, 8 months ago (2014-04-03 09:22:48 UTC) #2

oetuaho-nv

Thanks for the quick review, one comment inline. I can look into this alternative solution ...

6 years, 8 months ago (2014-04-03 16:02:45 UTC) #3

Rodolph Perfetta

6 years, 8 months ago (2014-04-03 18:29:20 UTC) #4

On 2014/04/03 16:02:45, oetuaho-nv wrote:
> Thanks for the quick review, one comment inline.
>
> I can look into this alternative solution and submit a new version once I have
> code and test results ready, and submit the assembler bug fix separately.
>
> https://codereview.chromium.org/222403002/diff/1/src/arm/macro-assembler-arm.cc
> File src/arm/macro-assembler-arm.cc (left):
>
> https://codereview.chromium.org/222403002/diff/1/src/arm/macro-assembler-arm....
> src/arm/macro-assembler-arm.cc:3819: // Set rounding mode to round to the
> nearest integer by clearing bits[23:22].
> On 2014/04/03 09:22:48, jbramley wrote:
> > Shouldn't we be in the right rounding mode already? ECMAScript maths
> > operations are supposed to be done in round-to-nearest mode, and this is
> > also the default FPSCR setting. If we've changed it explicitly somewhere
> > else (for whatever reason), we're probably not doing normal ECMAScript
> > maths properly.
> >
> > If I'm correct about that, the whole thing collapses down:
> >
> > // Handle inputs >= 255 (including +infinity).
> > mov(result_reg, 255);
> > Vmov(double_scratch, 255.0, result_reg);
> > VFPCompareAndSetFlags(input_reg, double_scratch);
> > b(ge, &done);
> >
> > // All other inputs will clamp to the range [0-255]: NaN and -infinity both
> > // produce 0.
> > vcvt_u32_f64(double_scratch.low(), input_reg, kFPSCRRounding);
> > Vmov(result_reg, double_scratch.low());
> >
> > This is more-or-less equivalent to what we did in ClampDoubleToUint8 in
> > src/arm64/macro-assembler-arm64.cc, though the available instructions make
> > it much simpler in A64.
>
> This would be an excellent solution, but it seems the FPSCR state can be wrong
> when entering this function. VFPEnsureFPSCRState doesn't currently set the
> rounding mode, and I suppose it can be messed with by outside code. But if
> setting the FPSCR state in VFPEnsureFPSCRState will be enough, this solution
> could work.

The rounding mode should be round to nearest or other arithmetic operations are
potentially incorrect. You could add a check in debug mode to
VFPEnsureFPSCRState to check the rounding mode is round to nearest.
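To make the behaviour of the simplified sequence quoted above concrete, it can be modelled on the host roughly as follows. This is a sketch under stated assumptions, not V8 code: `ClampDoubleToUint8Simplified` is a hypothetical name, `std::nearbyint` stands in for a conversion that uses the current rounding mode, and the explicit `!(input > 0.0)` test stands in for the saturating behaviour of the unsigned conversion (NaN and negative inputs produce 0).

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical host-side model of the "compare, then convert" sequence.
uint8_t ClampDoubleToUint8Simplified(double input) {
  // Handle inputs >= 255 (including +infinity).
  if (input >= 255.0) return 255;

  // Stand-in for the saturating unsigned conversion: NaN, -infinity and
  // all other non-positive inputs produce 0.
  if (!(input > 0.0)) return 0;

  // std::nearbyint rounds in the current floating-point rounding mode,
  // which is round-to-nearest (ties to even) unless something changed it.
  return static_cast<uint8_t>(std::nearbyint(input));
}
```

The model only gives ties-to-even results while the rounding mode is round to nearest, which is exactly the precondition being discussed in this thread.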

jbramley

6 years, 8 months ago (2014-04-04 09:43:05 UTC) #5

On 2014/04/03 18:29:20, Rodolph Perfetta wrote:
> On 2014/04/03 16:02:45, oetuaho-nv wrote:
> > Thanks for the quick review, one comment inline.
> >
> > I can look into this alternative solution and submit a new version once I
> > have code and test results ready, and submit the assembler bug fix
> > separately.
> >
> > https://codereview.chromium.org/222403002/diff/1/src/arm/macro-assembler-arm.cc
> > File src/arm/macro-assembler-arm.cc (left):
> >
> > https://codereview.chromium.org/222403002/diff/1/src/arm/macro-assembler-arm....
> > src/arm/macro-assembler-arm.cc:3819: // Set rounding mode to round to the
> > nearest integer by clearing bits[23:22].
> > On 2014/04/03 09:22:48, jbramley wrote:
> > > Shouldn't we be in the right rounding mode already? ECMAScript maths
> > > operations are supposed to be done in round-to-nearest mode, and this is
> > > also the default FPSCR setting. If we've changed it explicitly somewhere
> > > else (for whatever reason), we're probably not doing normal ECMAScript
> > > maths properly.
> > >
> > > If I'm correct about that, the whole thing collapses down:
> > >
> > > // Handle inputs >= 255 (including +infinity).
> > > mov(result_reg, 255);
> > > Vmov(double_scratch, 255.0, result_reg);
> > > VFPCompareAndSetFlags(input_reg, double_scratch);
> > > b(ge, &done);
> > >
> > > // All other inputs will clamp to the range [0-255]: NaN and -infinity
> > > // both produce 0.
> > > vcvt_u32_f64(double_scratch.low(), input_reg, kFPSCRRounding);
> > > Vmov(result_reg, double_scratch.low());
> > >
> > > This is more-or-less equivalent to what we did in ClampDoubleToUint8 in
> > > src/arm64/macro-assembler-arm64.cc, though the available instructions
> > > make it much simpler in A64.
> >
> > This would be an excellent solution, but it seems the FPSCR state can be
> > wrong when entering this function. VFPEnsureFPSCRState doesn't currently
> > set the rounding mode, and I suppose it can be messed with by outside code.
> > But if setting the FPSCR state in VFPEnsureFPSCRState will be enough, this
> > solution could work.

I can't find any use of vmsr that sets the rounding mode to anything else. A few
tests hit it, but the rounding mode is correct in every case. Under what
conditions can FPSCR be wrong here?

> The rounding mode should be round to nearest or other arithmetic operations
> are potentially incorrect. You could add a check in debug mode to
> VFPEnsureFPSCRState to check the rounding mode is round to nearest.

That sounds like a good idea.
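As a rough illustration of what such a debug-mode check amounts to, here is a portable host-side analogue using the standard `<cfenv>` interface. It is only a sketch: the helper names are hypothetical, and the real check would read the FPSCR inside VFPEnsureFPSCRState rather than call the C library.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: true when the current floating-point rounding mode
// is round to nearest.
inline bool RoundingModeIsToNearest() {
  return std::fegetround() == FE_TONEAREST;
}

// Hypothetical debug check: assert() compiles away when NDEBUG is defined,
// so release builds pay nothing for it.
inline void AssertRoundToNearest() {
  assert(RoundingModeIsToNearest());
}
```

Calling `AssertRoundToNearest()` at the point where the FP state is normalised would catch an embedder that changed the rounding mode, without adding any work to release builds.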

oetuaho-nv

On 2014/04/04 09:43:05, jbramley wrote: > I can't find any use of vmsr that sets ...

6 years, 8 months ago (2014-04-04 14:14:17 UTC) #6

jbramley

On 2014/04/04 14:14:17, oetuaho-nv wrote: > On 2014/04/04 09:43:05, jbramley wrote: > > I can't ...

6 years, 8 months ago (2014-04-04 14:34:42 UTC) #7

oetuaho-nv

On 2014/04/04 14:34:42, jbramley wrote: > On 2014/04/04 14:14:17, oetuaho-nv wrote: > > On 2014/04/04 ...

6 years, 8 months ago (2014-04-04 15:16:07 UTC) #8

jbramley

6 years, 8 months ago (2014-04-04 15:36:56 UTC) #9

On 2014/04/04 15:16:07, oetuaho-nv wrote:
> On 2014/04/04 14:34:42, jbramley wrote:
> > On 2014/04/04 14:14:17, oetuaho-nv wrote:
> > > On 2014/04/04 09:43:05, jbramley wrote:
> > > > I can't find any use of vmsr that sets the rounding mode to
> > > > anything else. A few tests hit it, but the rounding mode is
> > > > correct in every case. Under what conditions can FPSCR be wrong
> > > > here?
> > > 
> > > Yes, it's not being set by other code generated by v8, but is
> > > there anything preventing other code that runs in the same
> > > execution context setting the FPSCR to another value?
> > 
> > Sort of, yes. AAPCS (the procedure-call standard) says that the
> > rounding-mode bits may be modified by "specific support functions
> > that affect the global state of the application". That is, we would
> > have to do it explicitly somewhere.
> 
> To me, that reads as you can't rely on the bits being set a certain
> way, unless I'm understanding "global state of the application" wrong
> somehow. Isn't this state shared with the entire renderer main thread
> in the case of Chromium, for example, which is running all sorts of
> code?

Yes, indeed, but by the same logic, if a project (such as Chromium)
wants to change the arithmetic settings, it must ensure that this change
is acceptable for its components (such as V8).

An assertion in VFPEnsureFPSCRState would be sufficient to enforce this,
I think.

> I'm also curious about the NaN bit, according to AAPCS bit 25 is one
> of the "other" bits that must not be modified, yet it is modified in
> VFPEnsureFPSCRState?

Yes, I wondered about that too. We modify it because we rely on
default-NaN mode to optimize some NaN canonicalization operations. AAPCS
says that this is non-compliant, but in this case it is probably a
worthwhile non-compliance.

We restore it when returning to (or calling) C++ code so the impact will
be limited to signal handlers. In general, C++ code doesn't care much
about default-NaN mode. This could be a problem later, but I think it's
quite unlikely.
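For reference, the two FPSCR fields discussed in this thread (the rounding-mode bits[23:22] and the default-NaN bit 25) can be expressed as plain bit masks. The sketch below works on a 32-bit value only; the constant and function names are hypothetical and are not claimed to match the ones in src/arm/constants-arm.h.

```cpp
#include <cstdint>

// Hypothetical names for the FPSCR fields mentioned in the review.
constexpr uint32_t kRoundingModeShift = 22;
constexpr uint32_t kRoundingModeMask = 3u << kRoundingModeShift;  // bits[23:22]
constexpr uint32_t kDefaultNaNBit = 1u << 25;                     // bit 25

// Round to nearest is selected when both rounding-mode bits are clear.
constexpr bool IsRoundToNearest(uint32_t fpscr) {
  return (fpscr & kRoundingModeMask) == 0;
}

// Default-NaN mode is enabled when bit 25 is set.
constexpr bool IsDefaultNaNMode(uint32_t fpscr) {
  return (fpscr & kDefaultNaNBit) != 0;
}

// Clearing bits[23:22] selects round to nearest and leaves other bits alone.
constexpr uint32_t WithRoundToNearest(uint32_t fpscr) {
  return fpscr & ~kRoundingModeMask;
}
```

With helpers like these, the proposed assertion reduces to checking `IsRoundToNearest(fpscr)` on the value read from the register, independently of whether default-NaN mode is currently enabled.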

oetuaho-nv

6 years, 8 months ago (2014-04-09 14:10:30 UTC) #10

I have now uploaded a new CL that implements the suggested approach:
https://codereview.chromium.org/230473005

I'm closing this one.