TUCoPS :: Crypto :: fastdes.txt

Fast DES routines available from various Internet sites

From msuinfo!agate!doc.ic.ac.uk!pipex!sunic!trane.uninett.no!nntp.uio.no!ifi.uio.no!thork Fri Oct 15:42:39 1993
Nntp-Posting-Host: hymir.ifi.uio.no
Newsgroups: sci.crypt
Path: msuinfo!agate!doc.ic.ac.uk!pipex!sunic!trane.uninett.no!nntp.uio.no!ifi.uio.no!thork
From: thork@ifi.uio.no (Thor Oddleif Kristoffersen)
Subject: Fast DES in software
Message-ID: <1993Oct13.181153.21261@ifi.uio.no>
Sender: thork@ifi.uio.no (Thor Oddleif Kristoffersen)
Organization: Dept. of Informatics, University of Oslo, Norway
Date: Wed, 13 Oct 1993 18:11:53 GMT
Lines: 21
Originator: thork@hymir.ifi.uio.no


Anyone looking for fast implementations of DES in software
should take a look at an article in the latest issue of
Computers & Security (Vol. 12, No. 5), titled "More efficient
software implementations of (generalized) DES".

For example, a PowerBook 180 (MC68030, 33MHz, 70ns RAM)
running their implementation of triple-DES in ECB- or CBC-
mode achieves a throughput of 384kbits/s.

The authors tested their code on the Mac architecture only,
but another person ported it to the 8086. The throughput
for an 80386/16 and single-DES in CBC-mode was 136.5kbits/s.
Not as fast as the Mac figures, but only 8086 instructions
were used, so there's probably a lot of room for 386/486-
specific optimization.

I didn't see any mention of a RISC implementation, though.

-- 
Thor Kristoffersen   -   Oslo, Norway   -   thork@ifi.uio.no

From msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!munnari.oz.au!bunyip.cc.uq.oz.psych.psy.uq.oz.au!eay Fri Oct 15 22:42:39 1993
Newsgroups: sci.crypt
Path: msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!munnari.oz.au!bunyip.cc.uq.oz!psych.psy.uq.oz.au!eay
From: eay@psych.psy.uq.oz.au (Eric Young)
Subject: Re: Fast DES in software
Message-ID: <CEvzLA.1CM@bunyip.cc.uq.oz.au>
Sender: news@bunyip.cc.uq.oz.au (Bunyip News)
Nntp-Posting-Host: grunt.psy.uq.oz.au
Organization: Psychology Department, University of Queensland
References:  <1993Oct13.181153.21261@ifi.uio.no>
Date: Thu, 14 Oct 1993 12:19:58 GMT
Lines: 37

In article <1993Oct13.181153.21261@ifi.uio.no>, thork@ifi.uio.no (Thor Oddleif Kristoffersen) writes> Anyone looking for fast implementations of DES in software
|> should take a look at an article in the latest issue of
|> Computers & Security (Vol. 12, No. 5), titled "More efficient
|> software implementations of (generalized) DES".
|> The authors tested their code on the Mac architecture only,
|> but another person ported it to the 8086. The throughput
|> for an 80386/16 and single-DES in CBC-mode was 136.5kbits/s.
I only have numbers for 80486sx/33Hz MSDOS Turbo C v 2.0 of
495kbits/s.  Divide by 2 for clock speed, divide by 1.5 for 386 -> 486
speedup and you still get 165kbits/s.
This is for a C DES library that was first posted to comp.sources.misc
18 months ago.  I don't have access to the paper (yet) but there are at
least 3 or 4 independently developed 'fast' publicly available version of
DES that have speeds in this range.
If anyone is interested in the speedups being done in the various
'net available' implementations, I have put in ftp.psy.uq.oz.au
pub/DES/ALGORITHM, a very, very brief (and not easy to read)
description of the speedups that I am aware off.  If the above
mentioned paper contains anything new, could some-one please
email what they are.

BTW for RISC processors (the times listed below are for C code with no CPU
    specific optimisation, there are public version of DES that are
    a bit faster but they normally contain #defines for specific architecture)
	DEC Alpha DEC 4000/610 AXP OSF/1 v 1.3 - gcc v 2.3.3
	DES ecb bytes per sec =   1223712.35 (  6.5uS) or 9.5 mega bits/s
	sun sparc 10/30 - gcc -O2
	DES ecb bytes per sec =    555949.47 ( 14.4uS) or 4.3 mega bits/s
	PA-RISC 1.1 HP 710 cc
	DES ecb bytes per sec =    505971.82 ( 19.7uS) or 3.9 mega bits/s

eric (who is constantly amazed how people don't realise how much
      good code is freely avaliable on the net)
-- 
Eric Young | Systems programmer - Psychology Dept. Queensland Uni.
AARNet     | eay@psych.psy.uq.oz.au

From msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!mcsun!sunic!trane.uninett.no!n.uio.no!ifi.uio.no!thork Fri Oct 15 22:42:39 1993
Nntp-Posting-Host: hymir.ifi.uio.no
Newsgroups: sci.crypt
Path: msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!mcsun!sunic!trane.uninett.no!p.uio.no!ifi.uio.no!thork
From: thork@ifi.uio.no (Thor Oddleif Kristoffersen)
Subject: Re: Fast DES in software
Message-ID: <1993Oct15.151553.22063@ifi.uio.no>
Sender: thork@ifi.uio.no (Thor Oddleif Kristoffersen)
Organization: Dept. of Informatics, University of Oslo, Norway
References: <1993Oct13.181153.21261@ifi.uio.no> <CEvzLA.1CM@bunyip.cc.uq.oz.au>
Date: Fri, 15 Oct 1993 15:15:53 GMT
Lines: 78
Originator: thork@hymir.ifi.uio.no


In article <CEvzLA.1CM@bunyip.cc.uq.oz.au>, eay@psych.psy.uq.oz.au (Eric Young) writes:
> In article <1993Oct13.181153.21261@ifi.uio.no>, thork@ifi.uio.no (Thor Oddleif Kristoffersen) writ
> |> Anyone looking for fast implementations of DES in software
> |> should take a look at an article in the latest issue of
> |> Computers & Security (Vol. 12, No. 5), titled "More efficient
> |> software implementations of (generalized) DES".
> |> The authors tested their code on the Mac architecture only,
> |> but another person ported it to the 8086. The throughput
> |> for an 80386/16 and single-DES in CBC-mode was 136.5kbits/s.
> I only have numbers for 80486sx/33Hz MSDOS Turbo C v 2.0 of
> 495kbits/s.  Divide by 2 for clock speed, divide by 1.5 for 386 -> 486
> speedup and you still get 165kbits/s.
> This is for a C DES library that was first posted to comp.sources.misc
> 18 months ago.  I don't have access to the paper (yet) but there are at
> least 3 or 4 independently developed 'fast' publicly available version of
> DES that have speeds in this range.
> If anyone is interested in the speedups being done in the various
> 'net available' implementations, I have put in ftp.psy.uq.oz.au
> pub/DES/ALGORITHM, a very, very brief (and not easy to read)
> description of the speedups that I am aware off.  If the above
> mentioned paper contains anything new, could some-one please
> email what they are.

First of all, the paper mostly discusses generalized DES (G-DES),
i.e. DES with arbitrary permutations and substitutions. If the
specific E permutation in DES is used, performance can be enhanced
since one can take advantage of its special structure. The authors
maintain, however, that this is the only optimization which will
give DES better performance than G-DES.

A whole range of techniques for the optimization of G-DES is covered
in the paper. Some of these are previously known:

1. The entire cipher structure can be transformed so that the E and
P permutations become adjacent and so can be combined into one.
There are two ways of doing this, the 32-bit model and the 48-bit
model.

2. Permutations can be implemented with small look-up tables which
are combined with OR or EOR.

Some (presumably) unknown techniques are several variants of pre-
computing key specific tables. This saves the EORing of subkeys in
each round. I am quoting the section headings:

7.1 EORing subkeys to one of the smaller tables. (*)
7.2 EORing to the appropriate bits of all smaller tables. (*)
7.3 Reordering the S-box entries with respect to the subkeys.
7.4 Moving the subkeys to make use of Sections 7.1 and 7.2 directly.

(*) see 2 above.

Then there's a known technique for DES:

The E permutation can be implemented (on a 32-bit architecture)
by one copy, one 8-bit rotate and two ANDs if the S-boxes are
implemented as four tables with 2^{12} entries of 4 bytes.

This can then be combined with the precomputation of key specific
tables mentioned above.

As far as I can see, some of these techniques are mentioned in
your ALGORITHM file, but not all.

Btw, I later discovered that the figure of 136.5kbit/s I quoted for
a 386/16 in my first posting was for G-DES. Code for DES wasn't
tested on the 8086 architecture, but extrapolating from the
Macintosh figures the speed increase will be a factor of 1.58, so
the actual throughput for a 386/16 is probably around 215kbits/s.
Besides, the move from 16-bit to 32-bit operands should improve
things considerably.

> Eric Young | Systems programmer - Psychology Dept. Queensland Uni.
> AARNet     | eay@psych.psy.uq.oz.au

-- 
Thor Kristoffersen   -   Oslo, Norway   -   thork@ifi.uio.no




TUCoPS is optimized to look best in Firefox® on a widescreen monitor (1440x900 or better).
Site design & layout copyright © 1986-2024 AOH