openssl/crypto/des/asm/readme - Issue 2072073002: Delete bundled copy of OpenSSL and replace with README.

Unified Diff: openssl/crypto/des/asm/readme

Issue 2072073002: Delete bundled copy of OpenSSL and replace with README. (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/openssl@master

Patch Set: Delete bundled copy of OpenSSL and replace with README. Created 4 years, 6 months ago

Index: openssl/crypto/des/asm/readme

diff --git a/openssl/crypto/des/asm/readme b/openssl/crypto/des/asm/readme

deleted file mode 100644

index 1beafe253b17fe52985f7c4de6f7b4577f1f1bfb..0000000000000000000000000000000000000000

--- a/openssl/crypto/des/asm/readme

+++ /dev/null

@@ -1,131 +0,0 @@

-First up, let me say I don't like writing in assembler. It is not portable,

-dependent on the particular CPU architecture release and is generally a pig

-to debug and get right. Having said that, the x86 architecture is probably

-the most important for speed due to the number of boxes and since

-it appears to be the worst architecture to get

-good C compilers for. So due to this, I have lowered myself to do

-assembler for the inner DES routines in libdes :-).

-The file to implement in assembler is des_enc.c. Replace the following

-4 functions

-des_encrypt1(DES_LONG data[2],des_key_schedule ks, int encrypt);

-des_encrypt2(DES_LONG data[2],des_key_schedule ks, int encrypt);

-des_encrypt3(DES_LONG data[2],des_key_schedule ks1,ks2,ks3);

-des_decrypt3(DES_LONG data[2],des_key_schedule ks1,ks2,ks3);

-They encrypt/decrypt the 64 bits held in 'data' using

-the 'ks' key schedules. The only difference between the 4 functions is that

-des_encrypt2() does not perform IP() or FP() on the data (this is an

-optimization for when doing triple DES) and des_encrypt3() and des_decrypt3()

-perform triple DES. The triple DES routines are in here because it does

-make a big difference to have them located near the des_encrypt2 function

-at link time.

-Now as we all know, there are lots of different operating systems running on

-x86 boxes, and unfortunately they normally try to make sure their assembler

-formatting is not the same as other people's.

-The 4 main formats I know of are

-Microsoft  Windows 95/Windows NT

-Elf        Includes Linux and FreeBSD(?).

-a.out      The older Linux.

-Solaris    Same as Elf but different comments :-(.

-Now I was not overly keen to write 4 different copies of the same code,

-so I wrote a few perl routines to output the correct assembler, given

-a target assembler type. This code is ugly and is just a hack.

-The libraries are x86unix.pl and x86ms.pl.

-des586.pl, des686.pl and des-som[23].pl are the programs to actually

-generate the assembler.

-So to generate elf assembler

-perl des-som3.pl elf >dx86-elf.s

-For Windows 95/NT

-perl des-som2.pl win32 >win32.asm

-[ update 4 Jan 1996 ]

-I have added another way to do things.

-perl des-som3.pl cpp >dx86-cpp.s

-generates a file that will be included by dx86unix.cpp when it is compiled.

-To build for elf, a.out, solaris, bsdi etc,

-cc -E -DELF asm/dx86unix.cpp | as -o asm/dx86-elf.o

-cc -E -DSOL asm/dx86unix.cpp | as -o asm/dx86-sol.o

-cc -E -DOUT asm/dx86unix.cpp | as -o asm/dx86-out.o

-cc -E -DBSDI asm/dx86unix.cpp | as -o asm/dx86bsdi.o

-This was done to cut down the number of files in the distribution.

-Now the ugly part. I acquired my copy of Intel's

-"Optimizations for Intel's 32-Bit Processors" and found a few interesting

-things. First, the aim of the exercise is to 'extract' one byte at a time

-from a word and do an array lookup. This involves getting the byte from

-the 4 locations in the word and moving it to a new word and doing the lookup.

-The most obvious way to do this is

-xor eax, eax # clear word

-mov al, cl # get low byte

-xor edi, DWORD PTR 0x100+des_SP[eax] # xor in word

-mov al, ch # get next byte

-xor edi, DWORD PTR 0x300+des_SP[eax] # xor in word

-shr ecx, 16

-which seems ok. For the pentium, this system appears to be the best.

-One has to do instruction interleaving to keep both functional units

-operating, but it is basically very efficient.

-Now the crunch. When a full register is used after a partial write, eg.

-mov al, cl

-xor edi, DWORD PTR 0x100+des_SP[eax]

-386 - 1 cycle stall

-486 - 1 cycle stall

-586 - 0 cycle stall

-686 - at least 7 cycle stall (page 22 of the above mentioned document).

-So the technique that produces the best results on a pentium, according to

-the documentation, will produce hideous results on a pentium pro.

-To get around this, des686.pl will generate code that is not as fast on

-a pentium but should be very good on a pentium pro.

-mov eax, ecx # copy word

-shr ecx, 8 # line up next byte

-and eax, 0fch # mask byte

-xor edi, DWORD PTR 0x100+des_SP[eax] # xor in array lookup

-mov eax, ecx # get word

-shr ecx, 8 # line up next byte

-and eax, 0fch # mask byte

-xor edi, DWORD PTR 0x300+des_SP[eax] # xor in array lookup

-Due to the execution units in the pentium, this actually works quite well.

-For a pentium pro it should be very good. This is the type of output

-Visual C++ generates.

-There is a third option. Instead of using

-mov al, ch

-which is bad on the pentium pro, one may be able to use

-movzx eax, ch

-which may not incur the partial write penalty. On the pentium,

-this instruction takes 4 cycles so is not worth using but on the

-pentium pro it appears it may be worth while. I need access to one to

-experiment :-).

-eric (20 Oct 1996)

-22 Nov 1996 - I have asked people to run the 2 different versions on pentium

-pros and it appears that the intel documentation is wrong. The

-mov al,bh is still faster on a pentium pro, so just use des586.pl

-instead of des686.pl

-3 Dec 1996 - I added des_encrypt3/des_decrypt3 because I have moved these

-functions into des_enc.c, since it does make a massive performance

-difference on some boxes to have the functions' code located close to

-the des_encrypt2() function.

-9 Jan 1997 - des-som2.pl is now the correct perl script to use for

-pentiums. It contains an inner loop from

-Svend Olaf Mikkelsen <svolaf@inet.uni-c.dk> which does raw ecb DES calls at

-273,000 per second. He had a previous version at 250,000 and the best

-I was able to get was 203,000. The content has not changed, this is all

-due to instruction sequencing (and actual instruction choice) which is able

-to keep both functional units of the pentium going.

-We may have lost the ugly register usage restrictions when x86 went 32 bit

-but for the pentium they have been replaced by evil instruction ordering tricks.

-13 Jan 1997 - des-som3.pl, more optimizations from Svend Olaf.

-raw DES at 281,000 per second on a pentium 100.
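
Note on the `and eax, 0fch` mask in the deleted readme's des686.pl listing: the masked value appears to be used as a byte offset into des_SP, with the 0x100 and 0x300 constants selecting a sub-table. The C sketch below is only an illustration of that reading. It assumes des_SP is laid out as eight contiguous tables of 64 32-bit words (the readme does not state this), and the names `sp`, `lookup_by_offset` and `lookup_by_index` are hypothetical. The table contents are dummy values, not the real DES S/P tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for des_SP: 8 sub-tables of 64 32-bit words,
 * so each sub-table occupies 64 * 4 = 0x100 bytes (assumed layout). */
static uint32_t sp[8][64];

/* Fill the tables with distinct dummy values (not real DES data). */
static void sp_fill(void)
{
    for (uint32_t t = 0; t < 8; t++)
        for (uint32_t i = 0; i < 64; i++)
            sp[t][i] = (t << 24) ^ (i * 0x01010101u);
}

/* Assembler-style lookup: the masked byte is a byte offset,
 * as in "and eax, 0fch" followed by "0x100*t + des_SP[eax]". */
static uint32_t lookup_by_offset(uint32_t u, unsigned t)
{
    uint32_t word;
    size_t off;

    assert(t < 8);
    off = 0x100u * t + (u & 0xfcu);
    memcpy(&word, (const unsigned char *)sp + off, sizeof word);
    return word;
}

/* C-style lookup: the same entry addressed by table and 6-bit index. */
static uint32_t lookup_by_index(uint32_t u, unsigned t)
{
    assert(t < 8);
    return sp[t][(u >> 2) & 0x3f];
}

/* Returns 1 if both lookups agree for every low byte and sub-table. */
static int sp_self_check(void)
{
    sp_fill();
    for (unsigned t = 0; t < 8; t++)
        for (uint32_t u = 0; u < 256; u++)
            if (lookup_by_offset(u, t) != lookup_by_index(u, t))
                return 0;
    return 1;
}
```

Under that layout, masking with 0xfc keeps the 6-bit index already scaled by 4, so no separate shift is needed before the memory access.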
