gcc/gmp/mpn/sparc64/README - Issue 3050029: [gcc] GCC 4.5.0=>4.5.1

Unified Diff: gcc/gmp/mpn/sparc64/README

Issue 3050029: [gcc] GCC 4.5.0=>4.5.1 (Closed) Base URL: ssh://git@gitrw.chromium.org:9222/nacl-toolchain.git

Patch Set: Created 10 years, 5 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Index: gcc/gmp/mpn/sparc64/README

diff --git a/gcc/gmp/mpn/sparc64/README b/gcc/gmp/mpn/sparc64/README

deleted file mode 100644

index 19072996de48b13fc6b706fa6102c93e203bb481..0000000000000000000000000000000000000000

--- a/gcc/gmp/mpn/sparc64/README

+++ /dev/null

@@ -1,114 +0,0 @@

-This file is part of the GNU MP Library.

-The GNU MP Library is free software; you can redistribute it and/or modify

-it under the terms of the GNU Lesser General Public License as published by

-the Free Software Foundation; either version 3 of the License, or (at your

-option) any later version.

-The GNU MP Library is distributed in the hope that it will be useful, but

-WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY

-or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public

-License for more details.

-You should have received a copy of the GNU Lesser General Public License

-along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.

-This directory contains mpn functions for 64-bit V9 SPARC

-RELEVANT OPTIMIZATION ISSUES

-Notation:

- IANY = shift/add/sub/logical/sethi

- IADDLOG = add/sub/logical/sethi

- MEM = ld*/st*

- FA = fadd*/fsub*/f*to*/fmov*

- FM = fmul*

-UltraSPARC can issue four instructions per cycle, with these restrictions:

-* Two IANY instructions, but only one of these may be a shift. If there is a

- shift and an IANY instruction, the shift must precede the IANY instruction.

-* One FA.

-* One FM.

-* One branch.

-* One MEM.

-* IANY/IADDLOG/MEM must be insn 1, 2, or 3 in an issue bundle. Taken branches

- should not be in slot 4, since that makes the delay insn come from separate

- bundle.

-* If two IANY/IADDLOG instructions are to be executed in the same cycle and one

- of these is setting the condition codes, that instruction must be the second

- one.

-To summarize, ignoring branches, these are the bundles that can reach the peak

-execution speed:

-insn1 iany iany mem iany iany mem iany iany mem

-insn2 iaddlog mem iany mem iaddlog iany mem iaddlog iany

-insn3 mem iaddlog iaddlog fa fa fa fm fm fm

-insn4 fa/fm fa/fm fa/fm fm fm fm fa fa fa

-The 64-bit integer multiply instruction mulx takes from 5 cycles to 35 cycles,

-depending on the position of the most significant bit of the first source

-operand. When used for 32x32->64 multiplication, it needs 20 cycles.

-Furthermore, it stalls the processor while executing. We stay away from that

-instruction, and instead use floating-point operations.

-Floating-point add and multiply units are fully pipelined. The latency for

-UltraSPARC-1/2 is 3 cycles and for UltraSPARC-3 it is 4 cycles.

-Integer conditional move instructions cannot dual-issue with other integer

-instructions. No conditional move can issue 1-5 cycles after a load. (This

-might have been fixed for UltraSPARC-3.)

-The UltraSPARC-3 pipeline is very simular to he one of UltraSPARC-1/2 , but is

-somewhat slower. Branches execute slower, and there may be other new stalls.

-But integer multiply doesn't stall the entire CPU and also has a much lower

-latency. But it's still not pipelined, and thus useless for our needs.

-STATUS

-* mpn_lshift, mpn_rshift: The current code runs at 2.0 cycles/limb on

- UltraSPARC-1/2 and 2.65 on UltraSPARC-3. For UltraSPARC-1/2, the IEU0

- functional unit is saturated with shifts.

-* mpn_add_n, mpn_sub_n: The current code runs at 4 cycles/limb on

- UltraSPARC-1/2 and 4.5 cycles/limb on UltraSPARC-3. The 4 instruction

- recurrency is the speed limiter.

-* mpn_addmul_1: The current code runs at 14 cycles/limb asymptotically on

- UltraSPARC-1/2 and 17.5 cycles/limb on UltraSPARC-3. On UltraSPARC-1/2, the

- code sustains 4 instructions/cycle. It might be possible to invent a better

- way of summing the intermediate 49-bit operands, but it is unlikely that it

- will save enough instructions to save an entire cycle.

- The load-use of the u operand is not enough scheduled for good L2 cache

- performance. The UltraSPARC-1/2 L1 cache is direct mapped, and since we use

- temporary stack slots that will conflict with the u and r operands, we miss

- to L2 very often. The load-use of the std/ldx pairs via the stack are

- perhaps over-scheduled.

- It would be possible to save two instructions: (1) The mov could be avoided

- if the std/ldx were less scheduled. (2) The ldx of the r operand could be

- split into two ld instructions, saving the shifts/masks.

- It should be possible to reach 14 cycles/limb for UltraSPARC-3 if the fp

- operations where rescheduled for this processor's 4-cycle latency.

-* mpn_mul_1: The current code is a straightforward edit of the mpn_addmul_1

- code. It would be possible to shave one or two cycles from it, with some

- labour.

-* mpn_submul_1: Simpleminded code just calling mpn_mul_1 + mpn_sub_n. This

- means that it runs at 18 cycles/limb on UltraSPARC-1/2 and 23 cycles/limb on

- UltraSPARC-3. It would be possible to either match the mpn_addmul_1

- performance, or in the worst case use one more instruction group.

-* US1/US2 cache conflict resolving. The direct mapped L1 date cache of US1/US2

- is a problem for mul_1, addmul_1 (and a prospective submul_1). We should

- allocate a larger cache area, and put the stack temp area in a place that

- doesn't cause cache conflicts.

« no previous file with comments | « gcc/gmp/mpn/sparc32/v9/submul_1.asm ('k') | gcc/gmp/mpn/sparc64/add_n.asm » ('j') | no next file with comments »