Index: gcc/gmp/mpn/pa32/README |
diff --git a/gcc/gmp/mpn/pa32/README b/gcc/gmp/mpn/pa32/README |
deleted file mode 100644 |
index 72158d30ea1496d71fb1161beee63f02c7a3d3bd..0000000000000000000000000000000000000000 |
--- a/gcc/gmp/mpn/pa32/README |
+++ /dev/null |
@@ -1,151 +0,0 @@ |
-Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc. |
- |
-This file is part of the GNU MP Library. |
- |
-The GNU MP Library is free software; you can redistribute it and/or modify |
-it under the terms of the GNU Lesser General Public License as published by |
-the Free Software Foundation; either version 3 of the License, or (at your |
-option) any later version. |
- |
-The GNU MP Library is distributed in the hope that it will be useful, but |
-WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY |
-or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public |
-License for more details. |
- |
-You should have received a copy of the GNU Lesser General Public License |
-along with the GNU MP Library. If not, see http://www.gnu.org/licenses/. |
- |
- |
- |
- |
- |
- |
-This directory contains mpn functions for various HP PA-RISC chips. Code |
-that runs faster on the PA7100 and later implementations, is in the pa7100 |
-directory. |
- |
-RELEVANT OPTIMIZATION ISSUES |
- |
- Load and Store timing |
- |
-On the PA7000 no memory instructions can issue the two cycles after a store. |
-For the PA7100, this is reduced to one cycle. |
- |
-The PA7100 has a lookup-free cache, so it helps to schedule loads and the |
-dependent instruction really far from each other. |
- |
-STATUS |
- |
-1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the |
- instructions below (but some sw pipelining is needed to avoid the |
- xmpyu-fstds delay): |
- |
- fldds s1_ptr |
- |
- xmpyu |
- fstds N(%r30) |
- xmpyu |
- fstds N(%r30) |
- |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- |
- addc |
- stws res_ptr |
- addc |
- stws res_ptr |
- |
- addib Loop |
- |
-2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb |
- (asymptotically) on the PA7100, using the instructions below. With proper |
- sw pipelining and the unrolling level below, the speed becomes 8 |
- cycles/limb. |
- |
- fldds s1_ptr |
- fldds s1_ptr |
- |
- xmpyu |
- fstds N(%r30) |
- xmpyu |
- fstds N(%r30) |
- xmpyu |
- fstds N(%r30) |
- xmpyu |
- fstds N(%r30) |
- |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- ldws N(%r30) |
- addc |
- addc |
- addc |
- addc |
- addc %r0,%r0,cy-limb |
- |
- ldws res_ptr |
- ldws res_ptr |
- ldws res_ptr |
- ldws res_ptr |
- add |
- stws res_ptr |
- addc |
- stws res_ptr |
- addc |
- stws res_ptr |
- addc |
- stws res_ptr |
- |
- addib |
- |
-3. For the PA8000 we have to stick to using 32-bit limbs before compiler |
- support emerges. But we want to use 64-bit operations whenever possible, |
- in particular for loads and stores. It is possible to handle mpn_add_n |
- efficiently by rotating (when s1/s2 are aligned), masking+bit field |
- inserting when (they are not). The speed should double compared to the |
- code used today. |
- |
- |
- |
- |
-LABEL SYNTAX |
- |
-The HP-UX assembler takes labels starting in column 0 with no colon, |
- |
- L$loop ldws,mb -4(0,%r25),%r22 |
- |
-Gas on hppa GNU/Linux however requires a colon, |
- |
- L$loop: ldws,mb -4(0,%r25),%r22 |
- |
-This is covered by using LDEF() from asm-defs.m4. An alternative would be |
-to use ".label" which is accepted by both, |
- |
- .label L$loop |
- ldws,mb -4(0,%r25),%r22 |
- |
-but that's not as nice to look at, not if you're used to assembler code |
-having labels in column 0. |
- |
- |
- |
- |
-REFERENCES |
- |
-Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998, |
-part number 92432-90012. |
- |
- |
- |
----------------- |
-Local variables: |
-mode: text |
-fill-column: 76 |
-End: |