|
From msuinfo!agate!doc.ic.ac.uk!pipex!sunic!trane.uninett.no!nntp.uio.no!ifi.uio.no!thork Fri Oct 15:42:39 1993 Nntp-Posting-Host: hymir.ifi.uio.no Newsgroups: sci.crypt Path: msuinfo!agate!doc.ic.ac.uk!pipex!sunic!trane.uninett.no!nntp.uio.no!ifi.uio.no!thork From: thork@ifi.uio.no (Thor Oddleif Kristoffersen) Subject: Fast DES in software Message-ID: <1993Oct13.181153.21261@ifi.uio.no> Sender: thork@ifi.uio.no (Thor Oddleif Kristoffersen) Organization: Dept. of Informatics, University of Oslo, Norway Date: Wed, 13 Oct 1993 18:11:53 GMT Lines: 21 Originator: thork@hymir.ifi.uio.no Anyone looking for fast implementations of DES in software should take a look at an article in the latest issue of Computers & Security (Vol. 12, No. 5), titled "More efficient software implementations of (generalized) DES". For example, a PowerBook 180 (MC68030, 33MHz, 70ns RAM) running their implementation of triple-DES in ECB- or CBC- mode achieves a throughput of 384kbits/s. The authors tested their code on the Mac architecture only, but another person ported it to the 8086. The throughput for an 80386/16 and single-DES in CBC-mode was 136.5kbits/s. Not as fast as the Mac figures, but only 8086 instructions were used, so there's probably a lot of room for 386/486- specific optimization. I didn't see any mention of a RISC implementation, though. -- Thor Kristoffersen - Oslo, Norway - thork@ifi.uio.no From msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!munnari.oz.au!bunyip.cc.uq.oz.psych.psy.uq.oz.au!eay Fri Oct 15 22:42:39 1993 Newsgroups: sci.crypt Path: msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!munnari.oz.au!bunyip.cc.uq.oz!psych.psy.uq.oz.au!eay From: eay@psych.psy.uq.oz.au (Eric Young) Subject: Re: Fast DES in software Message-ID: <CEvzLA.1CM@bunyip.cc.uq.oz.au> Sender: news@bunyip.cc.uq.oz.au (Bunyip News) Nntp-Posting-Host: grunt.psy.uq.oz.au Organization: Psychology Department, University of Queensland References: <1993Oct13.181153.21261@ifi.uio.no> Date: Thu, 14 Oct 1993 12:19:58 GMT Lines: 37 In article <1993Oct13.181153.21261@ifi.uio.no>, thork@ifi.uio.no (Thor Oddleif Kristoffersen) writes> Anyone looking for fast implementations of DES in software |> should take a look at an article in the latest issue of |> Computers & Security (Vol. 12, No. 5), titled "More efficient |> software implementations of (generalized) DES". |> The authors tested their code on the Mac architecture only, |> but another person ported it to the 8086. The throughput |> for an 80386/16 and single-DES in CBC-mode was 136.5kbits/s. I only have numbers for 80486sx/33Hz MSDOS Turbo C v 2.0 of 495kbits/s. Divide by 2 for clock speed, divide by 1.5 for 386 -> 486 speedup and you still get 165kbits/s. This is for a C DES library that was first posted to comp.sources.misc 18 months ago. I don't have access to the paper (yet) but there are at least 3 or 4 independently developed 'fast' publicly available version of DES that have speeds in this range. If anyone is interested in the speedups being done in the various 'net available' implementations, I have put in ftp.psy.uq.oz.au pub/DES/ALGORITHM, a very, very brief (and not easy to read) description of the speedups that I am aware off. If the above mentioned paper contains anything new, could some-one please email what they are. BTW for RISC processors (the times listed below are for C code with no CPU specific optimisation, there are public version of DES that are a bit faster but they normally contain #defines for specific architecture) DEC Alpha DEC 4000/610 AXP OSF/1 v 1.3 - gcc v 2.3.3 DES ecb bytes per sec = 1223712.35 ( 6.5uS) or 9.5 mega bits/s sun sparc 10/30 - gcc -O2 DES ecb bytes per sec = 555949.47 ( 14.4uS) or 4.3 mega bits/s PA-RISC 1.1 HP 710 cc DES ecb bytes per sec = 505971.82 ( 19.7uS) or 3.9 mega bits/s eric (who is constantly amazed how people don't realise how much good code is freely avaliable on the net) -- Eric Young | Systems programmer - Psychology Dept. Queensland Uni. AARNet | eay@psych.psy.uq.oz.au From msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!mcsun!sunic!trane.uninett.no!n.uio.no!ifi.uio.no!thork Fri Oct 15 22:42:39 1993 Nntp-Posting-Host: hymir.ifi.uio.no Newsgroups: sci.crypt Path: msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!mcsun!sunic!trane.uninett.no!p.uio.no!ifi.uio.no!thork From: thork@ifi.uio.no (Thor Oddleif Kristoffersen) Subject: Re: Fast DES in software Message-ID: <1993Oct15.151553.22063@ifi.uio.no> Sender: thork@ifi.uio.no (Thor Oddleif Kristoffersen) Organization: Dept. of Informatics, University of Oslo, Norway References: <1993Oct13.181153.21261@ifi.uio.no> <CEvzLA.1CM@bunyip.cc.uq.oz.au> Date: Fri, 15 Oct 1993 15:15:53 GMT Lines: 78 Originator: thork@hymir.ifi.uio.no In article <CEvzLA.1CM@bunyip.cc.uq.oz.au>, eay@psych.psy.uq.oz.au (Eric Young) writes: > In article <1993Oct13.181153.21261@ifi.uio.no>, thork@ifi.uio.no (Thor Oddleif Kristoffersen) writ > |> Anyone looking for fast implementations of DES in software > |> should take a look at an article in the latest issue of > |> Computers & Security (Vol. 12, No. 5), titled "More efficient > |> software implementations of (generalized) DES". > |> The authors tested their code on the Mac architecture only, > |> but another person ported it to the 8086. The throughput > |> for an 80386/16 and single-DES in CBC-mode was 136.5kbits/s. > I only have numbers for 80486sx/33Hz MSDOS Turbo C v 2.0 of > 495kbits/s. Divide by 2 for clock speed, divide by 1.5 for 386 -> 486 > speedup and you still get 165kbits/s. > This is for a C DES library that was first posted to comp.sources.misc > 18 months ago. I don't have access to the paper (yet) but there are at > least 3 or 4 independently developed 'fast' publicly available version of > DES that have speeds in this range. > If anyone is interested in the speedups being done in the various > 'net available' implementations, I have put in ftp.psy.uq.oz.au > pub/DES/ALGORITHM, a very, very brief (and not easy to read) > description of the speedups that I am aware off. If the above > mentioned paper contains anything new, could some-one please > email what they are. First of all, the paper mostly discusses generalized DES (G-DES), i.e. DES with arbitrary permutations and substitutions. If the specific E permutation in DES is used, performance can be enhanced since one can take advantage of its special structure. The authors maintain, however, that this is the only optimization which will give DES better performance than G-DES. A whole range of techniques for the optimization of G-DES is covered in the paper. Some of these are previously known: 1. The entire cipher structure can be transformed so that the E and P permutations become adjacent and so can be combined into one. There are two ways of doing this, the 32-bit model and the 48-bit model. 2. Permutations can be implemented with small look-up tables which are combined with OR or EOR. Some (presumably) unknown techniques are several variants of pre- computing key specific tables. This saves the EORing of subkeys in each round. I am quoting the section headings: 7.1 EORing subkeys to one of the smaller tables. (*) 7.2 EORing to the appropriate bits of all smaller tables. (*) 7.3 Reordering the S-box entries with respect to the subkeys. 7.4 Moving the subkeys to make use of Sections 7.1 and 7.2 directly. (*) see 2 above. Then there's a known technique for DES: The E permutation can be implemented (on a 32-bit architecture) by one copy, one 8-bit rotate and two ANDs if the S-boxes are implemented as four tables with 2^{12} entries of 4 bytes. This can then be combined with the precomputation of key specific tables mentioned above. As far as I can see, some of these techniques are mentioned in your ALGORITHM file, but not all. Btw, I later discovered that the figure of 136.5kbit/s I quoted for a 386/16 in my first posting was for G-DES. Code for DES wasn't tested on the 8086 architecture, but extrapolating from the Macintosh figures the speed increase will be a factor of 1.58, so the actual throughput for a 386/16 is probably around 215kbits/s. Besides, the move from 16-bit to 32-bit operands should improve things considerably. > Eric Young | Systems programmer - Psychology Dept. Queensland Uni. > AARNet | eay@psych.psy.uq.oz.au -- Thor Kristoffersen - Oslo, Norway - thork@ifi.uio.no