|
Original Message Date: 06 Jun 91 13:48:41 From: Uucp on 1:125/777 To: Tom Jennings on 1:125/111 Subj: Ralph Merkle's paper ^AINTL 1:125/111 1:125/777 From: toad.com!gnu (John Gilmore) To: pozar, tom Date: Thu, 6 Jun 91 11:51:59 PDT From postnews Thu Jul 13 03:10:10 1989 Subject: Merkle's "A Software Encryption Function" now published and available Newsgroups: sci.crypt Ralph Merkle called me today to let me know that Xerox was not going to let him submit his paper on a nice new set of encryption and hash functions to a journal for publication. The story is that a division of Xerox sells a lot of stuff to NSA and they threatened to pull their business if Xerox publishes it. There is no law that says NSA can stop Xerox from publishing it -- it's just a "business decision" on Xerox's part. Happily, however, I do not sell anything to the NSA. And I have a copy of the paper, which was distributed for several months by Xerox, without any conditions, before NSA even heard of it. The work was not government-sponsored or classified; there is no law that lets the government suppress it. As a courtesy to Xerox Corporation I could avoid publishing this paper. However, I prefer to extend the courtesy to the person who did the work, Ralph Merkle, who would like to see it released and used. I do thank Xerox for supporting his excellent work and hope that they will continue to do so. Mr. Merkle did not ask or suggest that I publish the paper, and should bear none of the blame (if any). I have published and distributed a number of copies of this paper, and I hereby offer to sell a copy of this paper to anyone who sends me $10 (cash preferred, checks accepted) and a return address. Send your requests to: Merkle Paper Publishing PO Box 170608 San Francisco, CA, USA 94117-0608 Since the paper is "published and is generally available to the public through subscriptions which are available without restriction to any individual who desires to obtain or purchase the published information", it is exempt from State Department export control under 22 CFR 120.18 and 22 CFR 125.1 (a), and is exportable to all destintions under Commerce Department General License GTDA under 15 CFR 379.3(a). It can therefore be sent to foreign as well as US domestic individuals. I believe that the availability of fast, secure cryptography to the worldwide public will enable us to build much more secure computer systems and networks, increasing individual privacy as well as making viruses and worms much harder to write. For example, the Snefru one-way hash function described in the paper would be a good choice for validating copies of programs downloaded from BBS systems or the net, to detect virus contamination. If UUCP and TCP/IP links could be encrypted with Mr. Merkle's Khafre or Khufu ciphers, simple monitoring of phone wires or Ethernets would not yield login passwords, private mail, and other serious security violations. The technology exists; all that stands in our way is a bureacracy that has no *legal* power to restrict us, if we follow the published rules. John Gilmore From postnews Thu Jul 13 03:31:56 1989 Subject: Online copy of Merkle's "A Software Encryption Function" Newsgroups: sci.crypt References: <7981@hoptoad.uucp> [See the parent article <7981@hoptoad.uucp> for the legalese. This netnews article is exportable under 22 CFR 120.18, 22 CFR 125.1 (a), and 15 CFR 379.3(a). It can be sent to all foreign as well as US domestic destinations. -- John] A Software Encryption Function by Ralph C. Merkle Xerox PARC 3333 Coyote Hill Road Palo Alto, CA 94304 ABSTRACT Encryption hardware is not available on most computer systems in use today. Despite this fact, there is no well accepted encryption function designed for software implementation -- instead, hardware designs are emulated in software and the resulting performance loss is tolerated. The obvious solution is to design an encryption function for implementation in software. Such an encryption function is presented here -- on a SUN 4/260 it can encrypt at 4 to 8 megabits per second. The combination of modern processor speeds and a faster algorithm make software encryption feasible in applications which previously would have required hardware. This will effectively reduce the cost and increase the availability of cryptographic protection. Introduction The computer community has long recognized the need for security and the essential role that encryption must play. Widely adopted, standard encryption functions will make a great contribution to security in the distributed heavily networked environment which is already upon us. IBM recognized the coming need in the 1970's and proposed the Data Encryption Standard. or DES [19]. Although controversy about its key size has persisted, DES has successfully resisted all public attack and been widely accepted. After some debate its use as a U.S. Federal Standard has recently been reaffirmed until 1992 [14]. However, given the inherent limitations of the 56 bit key size used in DES [16] it seems clear that the standard will at least have to be revised at some point. A recent review of DES by the Office of Technology Assessment [15] quotes Dennis Branstad as saying the 'useful lifetime' of DES would be until the late 199O's. Despite the widespread acceptance of DES the most popular software commercial encryption packages (for, e.g., the IBM PC or the Apple Macintosh) typically offer both DES encryption and their own home-grown encryption function. The reason is simple -- DES is often 50 to 100 times slower than the home-grown alternative. While some of this performance differential is due to a sub-optimal DES implementation or a faster but less secure home-grown encryption function, it seems that DES is inherently 5 to 10 times slower than an equivalent, equally secure encryption function designed for software implementation. This is not to fault DES -- one of the design objectives in DES was quite explicitly a fast hardware implementation: when hardware is available, DES is an excellent choice. However, a number of design decisions were made in DES reflecting the hardware orientation which result in slow software performance -- making the current extensive use of DES in software both unplanned-for and rather anomalous. Having offered a rationale for an encryption function designed for software implementation, we now turn to the design principles, followed by the actual design. Basic Principles The basic design principles in DES seem sound. The fact that DES has not been publicly broken speaks in their favor. However, upon examining specific design decisions in DES, we find that several should be revised -- either in light of the software orientation of the new encryption function, or because of the decreased cost of hardware since the early '70's. Examining the basic design decisions one by one, and in no particular order, we can decide what reasonably should be changed. The selection of a 56 bit key size is too small, a problem which can be easily remedied. This subject has already been debated extensively, and while 56 bits seems to offer just sufficient protection for many commercial applications, the negligible cost of increasing the key size virtually dictates that it be done. The extensive use of permutations is expensive in software, and should be eliminated -- provided that a satisfactory alternative can be found. While permutations are cheap in hardware and provide an effective way to spread information (also called `diffusion' [21]) they are not the best choice for software. In the faster implementations of DES, the permutations are implemented by table look-ups on several bits at once. That is, the 48-to-32 bit permutation P is implemented by looking up several bits at once in a table -- where each individual table entry is 32 bits wide and the table has been pre-computed to contain the permutation of the bits looked up. Using a table-lookup in a software encryption function seems a good idea and can effectively provide the desired `diffusion' -- however there seems no reason to limit such a table to being a permutation, Having once paid the cost of looking up an entry in a table, it seems preferable that the entry should contain as much information as possible rather than being arbitrarily restricted to a small range of possibilities. Each individual S-box in DES provides only 64 entries of 4 bits each, or 32 bytes per S-box. Memory sizes have greatly increased since the mid 1970's when DES was designed, and larger S-boxes seem appropriate. More subtly, DES uses 8 S-boxes and looks up 8 different values in them simultaneously. While this is appropriate for hardware (where the 8 lookups can occur in parallel) it seems an unreasonable restriction for software. In software, each table lookup must follow the preceding lookups anyway -- for that is the nature of sequential program execution. It seems more valuable cryptographically to make each lookup depend upon the preceding lookup, because in software such dependency is nearly free. This means that the cascade of unpredictable changes that are so central to DES-type encryption functions can achieve greater depth with fewer lookups. Looked at another way, DES has a maximum circuit depth of 16 S-boxes, even though it has a total of 128 S-box lookups. If those same 128 S-box operations were done sequentially, with the output of each lookup operation altering the input to the next lookup, then the maximum circuit depth would be 128 S-boxes -- eight times as many and almost certainly providing greater cryptographic strength. This change would have very little impact on the running time of a software implementation on a typical sequential processor. A larger S-box size and sequential (rather than parallel) S-box usage should be adopted. The initial and final permutations in DES are widely viewed as cryptographically pointless -- or at least not very important. They are therefore discarded. The key schedule in DES has received mild criticism for not being sufficiently complicated[9]. In practice, all of the faster DES software implementations pre-compute the key schedule. This pre-computation seems a good idea when large volumes of data are being encrypted -- the pre-computation allows a more leisurely and careful arrangement of the encryption tables and means the actual encryption function can more rapidly scramble the data with less effort. A more complex key pre-computation therefore seems desirable. Finally, the design criteria used for the DES S-boxes were kept secret. Even though there is no particular reason to believe that they conceal a trap door, it would seem better if the criteria for S-box selection were made explicit, and some sort of assurances provided that the S-boxes were actually chosen randomly in accordance with the published criteria. This would both quiet the concerns about trap doors, and also allow a fuller and more explicit consideration of the S-box selection criteria. With this overview of design principles we can now proceed to the design. Khufu, Khafre and Snefru There are actually two encryption functions named Khufu and Khafre, and a one-way hash function named Snefru (all three names were taken from the Pharoahs of ancient Egypt following a suggestion by Dan Greene. To quote the Encyclopedia Britannica, "The ideal pyramid was eventually built by Snefru's successor, Khufu, and the first --- the Great Pyramid at Giza --- was the finest and must successful." Khafre was Khufu's son). The basic hardware model around which they are optimized is a 32-bit register oriented microprocessor. The basic operations are 32-bit load, store, shift, rotate, `xor' and `and'. The two encryption functions are optimized for somewhat different tasks, but use very similar design principles. Khufu is designed for fast bulk encryption of large amounts of data. To achieve the fastest possible speed, the tables used in encryption are pre-computed. This pre-computation is moderately expensive, and makes Khufu unsuited for the encryption of small amounts of data. The other encryption function -- Khafre -- does not require any pre-computation. This means Khafre can efficiently encrypt small amounts of data. On the other hand, Khafre is somewhat slower than Khufu for the encryption of large volumes of data because it takes more time to encrypt each block. The one-way hash function -- Snefru -- is designed to rapidly reduce large blocks of data to a small residue (perhaps 128 or 256 bits). Snefru requires no pre-computation and therefore can be efficiently applied to small arguments. Snefru will be used exclusively for authentication purposes and does not provide secrecy. We first discuss the design of Khufu. Khufu Khufu is a block cipher operating on 64-bit blocks. Although increasing block size was a very tempting design alternative, the 64-bit block size of DES has not been greatly criticized. More important, many systems built around DES assume that the block size is 64 bits. The pain of using a different encryption function is better minimized if the new encryption function can be easily 'plugged in' in place of the old -- which can be done if the block size is the same and the key size is larger. The new encryption function essentially looks exactly like the old encryption function -- but with some new keys added to the key space. Increasing the block size might have forced changes in more than just a single subroutine -- it might (for example) have forced changes in data formats in a communications systems. Khufu, like DES, is a multi-round encryption function in which two 32-bit halves (called L and R for Left and Right) are used alternately in the computations. Each half is used as input to a function F, whose output is XORed with the other half -- the two halves are exchanged and the computation repeated until the result appears to be random (no statistically detectable patterns). Khufu uses a different F-function than DES -- and uses different F-functions during the course of encryption. One round of DES uses an F-function defined by 8 table lookups and associated permutations. By contrast, one round of Khufu uses a single table lookup in a larger S-box. In addition, in the first step of encryption (prior to the main loop) the plaintext is XORed with 64 bits of key material, and again in the final step of encryption (following the main loop) the 64-bit block is XORed with another 64 bits of key material to produce the ciphertext. In somewhat more detail, the 64-bit plaintext is first divided into two 32-bit pieces designated L and R. L and R are then XORed with two 32-bit pieces of auxiliary key material. Then the main loop is started, in which the bottom byte from L is used as the input to a 256-entry S-box. Each S-box entry is 32-bits wide, The selected 32-bit entry is XORed with R. L is then rotated to bring a new byte into position, after which L and R are swapped. The S-box itself is changed to a new S-box after every 8 rounds (which we shall sometimes call an `octet'). This means that the number of S-boxes required depends on the number of rounds of encryption being used. Finally, after the main loop has been completed, we again XOR L and R with two new 32-bit auxiliary key values to produce the ciphertext. For efficiency reasons, we restrict the number of rounds to be a multiple of 8, i.e., an integral number of octets. If the main encryption loop is always executed a multiple of 8 times, then it can be unrolled 8 times -- which is substantially more efficient than the definitionally correct but inefficient versions given in this paper. For this reason, the variable `enough' given below must be an exact multiple of 8. Various integer calculations will not work correctly for values of 'enough' that are not multiples of 8. Encryption of a single 64-bit plaintext by Khufu can be viewed algorithmically as follows: L, R: int32; enough: integer; -- the security parameter, default of 16 seems appropriate. -- values of 8, 16, 24, 32, 40, 48, 56, and 64 are possible. SBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32; -- key material AuxiliaryKeys: ARRAY[1 .. 4] OF int32; -- additional key material rotateSchedule: ARRAY [1 .. 8] := [16,16,8,8,16,16,24,24]; octet: integer; -- really (round + 7)/8 -- it keeps track of which 8-round 'octet' we are currently in L := L XOR AuxiliaryKeys[l]; R := R XOR AuxiliaryKeys[2]; octet := 1; FOR round := 1 to enough DO -- Note that 'enough' must be a multiple of 8 Begin R := R XOR SBoxes[octet] [L AND #FF]; L := RotateRight[L, rotateSchedule[(round-1) mod 8 + 1] ]; SWAP[L,R]; if (round mod 8 = 0) then octet := octet + 1; End; L := L XOR AuxiliaryKeys[3]; R := R XOR AuxiliaryKeys[4]; Notationally, it will be convenient to index the different variables at different rounds. This indexing is explicitly given by re-writing the above algorithm and replacing L and R with arrays. In addition, we add the array 'i' to denote the indices used to index into the S-box. L, R: ARRAY [0 .. enough] OF int32; enough: integer; -- the security parameter, default of 16 seems appropriate. -- values of 8, 16, 24, 32, 40, 48, 56, and 64 are possible. i: ARRAY[0 .. enough] OF int8; -- 8-bit bytes SBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32; -- key material AuxiliaryKeys: ARRAY[1 .. 4] OF int32: -- additional key material rotateSchedule: ARRAY [1 .. 8] := [16,16,8,8,16,16,24,24]; octet: integer; -- really (round + 7)/8 -- it keeps track of which 8-round 'octet' we are currently in L[0] := L[-1] XOR AuxiliaryKeys[l]; R[0] := R[-1] XOR AuxiliaryKeys[2]; octet := 1; FOR round := 1 to enough DO -- Note that `enough' must be a multiple of 8 Begin i[round] := L[round-1] AND #FF L[round] := R[round-1] XOR SBoxes[octet] [ i[round] ]; R[round] := RotateRight[ L[round-1], rotateSchedule[(round-1) mod 8 + 1] ]; if (round mod 8 = 0) then octet := octet+ 1; End; L[enough + 1] := L[enough] XOR AuxiliaryKeys[3]; R[enough + 1] := R[enough] XOR AuxiliaryKeys[4]; The plaintext is (by definition) [L[-1], R[-1]], while the ciphertext is [L[enough + 1], R[enough + 1]]. By definition, round 1 computes L[1] and R[1] from L[0] and R[0], using index 1 -- or i[1]. Similarly, round n computes L[n] and R[n] from L[n-1] and R[n-1] using i[n]. We shall sometimes say that 'round' 0 computes L[0] and R[0] from L[-1] and R[-1], and that 'round' enough + 1 computes L[enough + 1] and R[enough + 1] from L[enough] and R[enough]. The primary purpose of the rotation schedule is to bring new bytes into position so that all 8 bytes of input are used in the first 8 rounds (or first octet). This means that a change in any single input bit is guaranteed to force the use of a different S-box entry within 8 rounds, and so initiate the cascade of unpredictable changes needed to hide the input. A secondary purpose of the rotation schedule is to maximize the number of rotates by 16 because they tend to be faster on many microprocessors. For example, the 68000 has a SWAP instruction which is equivalent to rotating a 32-bit register by 16 bits. Also, rotation by 16 tends to be very fast on processors with 16 bit registers -- simply by altering one`s viewpoint about which register contains the lower 16 bits and which register contains the upper 16 bits it is possible to perform this operation with no instructions at all. The final purpose of the rotation schedule was to restore the data to its original rotational position after each octet of 8 rounds. Thus, the sum of the rotations is equal to 0 module 32. A different S-box is used after each octet of encryption. This has two beneficial effects: first, it means that the same S-box entry will never be used twice with the same rotational alignment. That is, if a single S-box were used for all octets, then it might be that i[1] (the index used to select an S-box entry on the first round) and i[9] might be the same -- and therefore the same S-box entry would be used in rounds 1 and 9. These identical S-box entries would cancel each other out because a value XORed with itself produces 0. (If i[1] = i[9], then SBox[i[1]] XOR ...stuff... XOR SBox[i[9]] would equal ...stuff...) Both i[1] and i[9] would have had no effect on the encryption process. This would weaken the encryption function. If, however, the S-box is changed after every octet then even if i[1] = i[9], cancellation is very unlikely to occur (because SBoxes[1][i[1]] is almost certainly different from SBoxes[2][i[9]], even though i[1] = i[9]). A second beneficial effect is to insure that the encryption process is entirely different during the second octet than in the first octet. If the same S-box were used, then the second octet would compute the same function as the first octet -- which can be a serious weakness. The parameter 'enough' is used because encryption must continue for enough rounds to obscure and conceal the data. The exact number of rounds that is sufficient will no doubt be a matter of considerable debate -- it is left as a parameter precisely so that those who wish greater security can use more rounds, while those who are satisfied with fewer rounds can encrypt and decrypt data more rapidly. It seems very unlikely that fewer than 8 rounds (one octet) will ever be used, nor more than 64 rounds (8 octets). The author expects that almost all applications will use 16, 24, or 32 rounds. Values of 'enough' that are not multiples of 8 are banned. Pre-Computing the S-Boxes While 128 bits of key material is used at the start and finish of the encryption process (e.g., 64 bits at the start and 64 bits at the finish from the 128-bit array 'AuxiliaryKeys'), most of the key material is mixed in implicitly during the encryption process by selection of entries from the S- boxes. All the S-boxes (along with the 128 bits of auxiliary key material) are pre-computed from a (presumably short) user supplied key. The S-boxes ARE most of the key. This raises the question of how the S-boxes are computed and what properties they have. While the specific method of computing the S-boxes is complex, the essential idea is simple: generate the S-boxes in a pseudo-random fashion from a user supplied key so that they satisfy one property: all four of the one-byte columns in each S-box must be permutations. Intuitively, we require that selection of a different S-box entry change all four bytes produced by the S-box. More formally, (where `#' means 'not equal to' and SBoxes[o][i][k] refers to the kth byte in the ith 32-bit entry of the SBox used during octet 'o'): for all o, i, j, k; i # j implies SBoxes[o][i][k] # SBoxes[o][j][k]. We can divide the pre-computation of a pseudo-random S-box satisfying the desired properties into two parts: first, we generate a stream of good pseudo-random bytes; second, we use the stream of pseudo-random bytes to generate four pseudo-random permutations that map 8 bits to 8 bits. These four pseudo-random permutations ARE the generated S-box. We repeat this process and compute additional S-boxes until we have enough for the number of rounds of encryption that we anticipate. We could generate a stream of pseudo-random bytes using an encryption function -- but we have no S-box to use in such an encryption function! To circumvent this circularity problem, we can assume the existence of a single 'initial' S-box. Although we must get this initial S-box from somewhere, for the moment we assume it exists and satisfies the properties described earlier. We will discuss where it comes from later. We (rather arbitrarily) adopt a 64-byte 'state' value for our pseudo-random byte-stream generator. That is, the user-provided key is used to initialize a 64-byte block (which effectively limits the key size to 512 bits -- this does not seem to be a significant limit). This 64-byte block is then encrypted using Khufu (using the standard S-box for all octets, and setting the auxiliary keys to 0) in cipher block chaining mode. (Although use of a single S-box for all rounds will result in occasional cancellations as described earlier, this is acceptable for this particular application.) This provides 64 pseudo-random bytes. When these 64 bytes have been used, the 64-byte block is again encrypted, providing an additional 64 pseudo-random bytes. This process is repeated as long as more pseudo-random bytes are needed. Now that we have a stream of pseudo-random bytes, we must convert them into the needed permutations. We adopt the algorithm given in Knuth Vol II. In this algorithm, we start with some pre-existing (not necessarily random) permutation. For our purposes, we can start with the initial S-box. We then interchange each element in the initial permutation with some other randomly chosen element, thus producing a random permutation. In a pseudo programming language we have: FOR octet := 1 to enough/8 DO SBox:= initialSBox; FOR column := 0 to 3 DO BEGIN FOR i := 0 to 255 DO BEGIN randomRow := RandomInRange[i,255]; -- returns a random number -- between i and 255, inclusive SwapBytes [SBox[i,column], SBox[randomRow,column]]; END; END; SBoxes[octet] := SBox; END; The routine 'RandomInRange' uses the stream of random bytes to actually generate a number in the requested range. Khafre The design of Khafre is similar to the design of Khufu except that Khafre does not pre-compute its S-box. Instead, Khafre uses a set of standard S-boxes (discussed in the next section -- note that the standard S-boxes are different from the one initial S-box). The use of standard S-boxes means that Khafre can quickly encrypt a single 64-bit block without the lengthy pre-computation used in Khufu; however it also means that some new mechanism of mixing in key material must be adopted because the standard S-boxes can not serve as the key. The mechanism of key-mixing is simple -- key material is XORed with the 64-bit data block before the first round and thereafter following every 8 rounds. A consequence of this method is that the key must be a multiple of 64 bits -- it is expected that 64-bit and 128-bit key sizes will typically be used in commercial applications. Arbitrarily large key sizes can be used, though this will slow down encryption. We can summarize Khafre as follows: L, R: int32; standardSBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32; key: ARRAY [0 .. enoughKey-1] OF ARRAY [0 .. 1] of int32: keyIndex: [0 .. enoughKey-1]; rotateSchedule: ARRAY [1 .. 8] := [16,16,8,8,16,16,24,24]; L := L XOR key[0][0]; R := R XOR key[0][1]; keyIndex := 1 MOD enoughKey; round := 1; octet := 1; WHILE (round <= enough) DO Begin L := L XOR standardSBoxes[octet] [R AND #FF]; R := RotateRight[R, rotateSchedule[round mod 8 + 1] ]; SWAP[L,R]; IF round MOD 8 = 0 THEN BEGIN L := L XOR RotateRight[ key[keyIndex][0], octet]; R := R XOR RotateRight[ key[keyIndex][1], octet]; keyIndex := keyIndex + 1; IF keyIndex = enoughKey THEN keyIndex := 0; octet := octet + 1; END; round := round + 1; End; IF keyIndex # 0 THEN ERROR; We again require that the number of rounds be a multiple of 8 for efficiency reasons. In addition, we will require that the key size evenly divide the number of octets + 1, That is, we require that enoughKey MOD (enough/8 + 1) = 0. (This requirement is shown in the code by the final test and error condition). This condition is imposed to insure that decryption can be done correctly. If we did not impose this condition, then we would have to compute the correct value of 'keyIndex' to use when we started to decrypt. For example, if we used a 128-bit key for 32 rounds to encrypt a 64-bit plaintext, then the final entry used in the key array would be key[1]. If, however, we had encrypted for 24 rounds the final entry would have been key[0]. Computing the correct value of keyIndex with which to start the decryption process would require additional computational steps -- it would require a MOD operation which is computationally expensive. To avoid this, we simply impose the condition that we can always decrypt by starting from the last entry in the key array (keyIndex = enoughKey-1). This effectively imposes the constraint that (enough/8 + 1)/enoughKey is exactly 0. Khafre will probably require more rounds than Khufu to achieve a similar level of security because it uses a fixed S-box. In addition, each Khafre round is somewhat more complex than each Khufu round. As a consequence of these two factors, Khafre will take longer than Khufu to encrypt each 64-bit block. In compensation for this slower encryption speed, Khafre does not require pre-computation of the S-box and so will encrypt small amounts of data more quickly than Khufu. Snefru Snefru is a one-way hash function. One-way hash functions are fundamentally intended for authentication, not secrecy, and so their design is somewhat different than that of conventional encryption functions. However, many of the same basic principles are applicable. Snefru in particular uses many of the design principles used in Khufu and Khafre. One way hash functions (also known as MDC's (Manipulation Detection Codes)) have been generally known for some time. The first definition was apparently given by Merkle [1,2] who also gave a method of constructing one-way hash functions from random block ciphers. A more recent overview was given by Jueneman, Matyas, and Meyer[11] and Jueneman[4]. The major use of one-way hash functions is for authentication. If a value y can be authenticated, then we can authenticate x by computing: y = F(x) No other input x' can be found (although they probably exist) which will generate y. A 128 bit y can authenticate an arbitrarily large x. This property is crucial for the convenient authentication of large amounts of information. Broadly speaking, there are two definitions for one-way hash functions. The first definition is for a `weak' one-way hash function. A weak one-way hash function is a function F such that: 1.) F can be applied to any argument of any size. For notational convenience, F applied to more than one argument is equivalent to F applied to the bit-wise concatenation of all its arguments. 2.) F produces a fixed size output. (The output might be 64 bits). 3.) Given F and x, it is easy to compute F(x). 4.) Given F and a 'suitably chosen' (e.g., random) x, it is computationally infeasible to find a x' # x such that F(x) = F(x'). The phrase 'computationally infeasible' used above can be replaced with any one of several more precise definitions -- each definition will in turn result in a somewhat different definition of a one way hash function. Snefru is intended to be a 'random' one-way hash function -- e.g., for all practical purposes it can be modelled by a very large table of random numbers, or by a `demon' who selects a random number whenever we wish to compute some output value for Snefru. This is discussed further in [18]. In a weak one way hash function there is no guarantee that it's computationally infeasible to find pairs of inputs that produce the same output. That is, it might be that F(z) = F(z') for some inputs z and z' that someone in fact found. However, if x # z and x # z', then this doesn't matter. Clearly, we must prevent someone from deliberately picking x so that it is equal to a value of z or z' which they know has this undesirable property. What's more, it must be difficult to generate too many z-z' pairs (and it clearly must be impossible to generate z-z' pairs in which we can effectively control the value of z or z') or even a randomly chosen x would not be safe. Thus, choosing x in a random fashion should make it unlikely that anyone can find an x' such that F(x)=F(x'). It is possible, however, to choose x in a non-random fashion and still be safe, if we assume that F is random (a reasonable assumption for Snefru), then any method of choosing x that does not depend on F is effectively random with respect to F and therefore suitable. This observation is sometimes important in practice, for it allows simpler (therefore faster and cheaper) methods of selecting x. Weak one-way hash functions also suffer from the problem that repeated use weakens them. That is, if a weak one-way hash function is used to hash 1,000 different random values for x (which might represent 1,000 different messages signed by a 1,000 different people) then the chances of finding an x' such that one of the thousand values of x satisfies F(x) = F(x') might be 1,000 times higher. As more people sign more messages with the same weak one-way hash function, the overall security of the system is degraded. This undesirable state of affairs can be dealt with by parameterization, which is the approach taken with Snefru. We simply define a family of one-way hash functions with the property that each member Fi of the family is different from all other members of the family, and so any information about how to break Fi will provide no help in breaking Fj for i # j. If the system is designed so that every use of a weak one-way hash function is parameterized by a different parameter, then overall system security can be kept high. The alternative definition is of a `strong' one-way hash function. A strong one-way hash function is a function F such that: 1.) F can be applied to any argument of any size. For notational convenience, F applied to more than one argument is equivalent to F applied to the bit-wise concatenation of all its arguments. 2.) F produces a fixed size output. (The output might be 128 bits). 3.) Given F and x, it is easy to compute F(x). 4.) Given F, it is computationally infeasible to find any pair x, x' such that x # x' and F(x) = F(x'). Strong one-way hash functions are easier to use in systems design than weak one-way hash functions because there are no pre-conditions on the selection of x, and they provide the full claimed level of security even when used repeatedly (or misused either because of error or sub-optimal system design). Many authors recommend the exclusive use of strong one-way hash functions[4,11] because of the practical difficulties of insuring that the pre-conditions on x imposed by a weak one-way hash function are met, and the increased problems in insuring that different parameters are selected for each use. In practice, the constraints imposed by use of a weak one- way hash function imply that x cannot be chosen by an agent who has a motive to break the system. If A signs a message which B provides, then A will have to 'randomize' the message by some means before signing it. If A fails to randomize the message, then B can select a 'rigged' message, and later surprise A by exhibiting a novel message and showing that A signed it. Weak one-way hash functions can be quite valuable in some system designs, although they must be used with caution. Parameterized weak one-way hash functions in particular appear to be under-rated. Snefru can be used as either a weak or a strong one-way hash function, with or without parameterization depending on the desires of the system designer. A one-way hash function F accepts an arbitrarily large input -- however, it is much easier to specify a function that accepts a fixed size input. We will, therefore, follow a two-step procedure to define F. First, we will define a fixed size function F0 which has the same properties as F but which accepts a fixed size input, and then we will use F0 to construct F. By definition, F0 has properties 2, 3, and 4 listed above for F; but replaces property 1 (which says that F can have an unlimited input size) with the simpler property that F0 can accept only a fixed size input. A fixed-size strong one-way hash function F0 has the following properties: 1.) F0 can be applied to a fixed size input argument (the input might be 256 bits). The size of the input must be larger than the size of the output. For notational convenience, F0 applied to more than one argument is equivalent to F0 applied to the bit-wise concatenation of all its arguments. 2.) F0 produces a fixed size output. (The output might be 128 bits). 3.) Given F0 and x, it is easy to compute F0(x). 4.) Given F0, it is computationally infeasible to find any pair x, x' such that x # x' and F0(x) = F0(x'). It is often convenient if the following property is also satisfied: 5.) Given F0(x), and a randomly chosen x, it is computationally infeasible to determine x. If we view x as an array, then we can define F(x) in terms of F0 in the following fashion: FUNCTION F(x[1 .. n]) BEGIN result := 0; FOR i := 1 to n DO result := F0(result,x[i]); END DO; RETURN(result); END; (Note that x can be padded with zero's at the end to insure that x[n] is of the correct size). We have added property 5 to our fixed-size strong one-way hash function because it will be convenient and useful to use F0 as a normal one-way function -- that is, the input is chosen randomly and is kept secret, while the output is made public. When used in this way, it is not necessary that we reduce the size of the input. For this reason, we will only require that F0 have this property, and will not attempt to show that F also has this property. We can show that F must satisfy properties 1 through 4 if F0 satisfies properties 2 through 4, and if F0 accepts only a fixed size input. From the program defining F it is obvious that it will accept an input of arbitrary size, so property 1 is satisfied. It is also obvious that F will produce a fixed size output -- which is the size of 'result' -- so property 2 is satisfied. Property 3 follows because computation of F(x) requires time linear in the size of x -- (which we actually take as the definition of 'easy') -- and because computation of F0 is 'easy'. The fact that property 4 holds can be shown inductively as follows: Clearly, if n = 1, then property 4 holds for it holds for F0. Assume, then, that the property holds for n-1, and we wish to prove it for n. We know that: result := F0( F(x[1 .. n-1]), x[n]) From the fact that property 4 holds for F0 it follows that neither F(x[1 .. n-1]) nor x[n] could have been changed -- if they had been, we would have two inputs to F0 that produced the same output. But if F(x[1 .. n-1]) has not been changed, then x[1 .. n-1] has not been changed, by the induction hypothesis. Q.E.D. The preceding paragraphs mean simply that in order to design a strong one-way hash function we need to design a fixed-size strong one-way hash function F0, from which we can then build F. In what follows, we shall define F0 and present intuitive arguments that it is difficult to break. Traditionally, one-way hash functions have been designed by treating the input, x, as the key and then encrypting a fixed size block. We will pursue a different approach. In this approach, we will treat the input, x, as the input to a block cipher which uses a fixed key. We will then encrypt x and exclusive or the output of the encryption function with x. That is, F0(x) is defined as: E(0, x) XOR x where E(key, plaintext) is a 'good' conventional block cipher. We then retain as many bits of the output as we desire. This general approach has been used before[12,23] but must still be justified. We prefer this approach to the more traditional approach (in which x is treated as the key to an encryption function) because it is faster. In the traditional approach, it is necessary to mix the key with a fixed plaintext block during the encryption process. This mixing process requires additional steps. In particular, the key must be repeatedly re-loaded from memory to be re-mixed with the block being encrypted. (By way of example, consider that the 56 bit key used in DES is actually expanded internally into a 768-bit key, so that each bit of key material can be mixed into the block being encrypted several times). On the other hand, if we treat x as the input to a block cipher then we can use a fixed key, need do no key mixing, and can still provide excellent security. To show that security is preserved using this method, we will appeal to the intuition that a 'good' encryption function appears to be random. That is, any change to the input will produce an unpredictable and apparently random change in the output. E(0, x) is totally unrelated to E(0, x XOR 1) -- changing a single bit produced a 'random' change. We presume that there is no effective method of analyzing E and that therefore it must be viewed as a 'black box' -- it's possible to encrypt or decrypt, but it's not possible to make any prediction about the value that will be produced (unless we've already encrypted or decrypted that particular value). If E is random in the sense discussed above, then E(0, x) is random -- even if x is highly structured. Therefore E(0, x) XOR x is random and cannot be predicted from a knowledge of x. To determine an x' such that F0(x) = F0(x'), x' must (by definition) satisfy the equation: E(0, x) XOR x = y = E(0, x') XOR x' However, if we assume that the only operations we can perform for the cryptanalysis are encryption and decryption of a block, i.e., the computation of E(0,w) or D(0,w) (where D stands for Decryption) for any value of w that we choose, then our chances of actually finding x' are little better than chance. We can select some w by any method we desire, and then compute E(0,w) -- but this will produce a nearly random value which, when XORed with w, will have a very small probability of being equal to y. If we operate in the reverse fashion and compute D(0,w) XOR w it too, for the same reason, will only equal y by chance. The critical question that we cannot answer with absolute reliability is whether F0 is in fact 'random' in the sense needed for the foregoing proof. This question (at present) can only be answered empirically by means of a certificational attack -- we have been unable to break this system, and so we hope that it cannot be broken. We will in fact propose 3 fixed-size one-way hash functions: HASH128, HASH256, and HASH512 which hash down 128, 256, or 512 bits, respectively. We will then define the final hash function, HASH, in terms of these four basic functions. The primary purpose of providing 4 one-way hash functions is efficiency -- if we wish to hash only 128 bits it is significantly more efficient to use a function which accepts a 128 bit input than to use a function which accepts a 512 bit input and zero-pad the 128-bit value out to 512 bits. In addition, HASH, HASH128 and HASH256 accept a 64-bit parameter, while HASH512 accepts a 96-bit parameter. The reader can ignore these parameters without loss of continuity, the general ideas presented below do not depend on them. It is quite possible (and simpler) to design a system that does not use these parameters -- however, we might have to double the amount of authentication information that we store. That is, the output would have to be around 128 bits to provide good security[4]. Whether or not this is a significant problem will depend on the specific system being designed and the particular way the hash functions are being used. In essence the additional parameters can be used to make exhaustive search attacks harder. The basic idea will be familiar to those who have thought about the problems of applying a one-way function to passwords -- if the same one-way function is used for all users, then the fact that two users used the same password is readily apparent if you examine the password file that contains the encrypted result for each user. Furthermore, by applying the one-way function to all the words in a dictionary, it is possible to recover all passwords that are English words. However, if each one-way function used to encrypt each user's password is slightly different from all the other one-way functions, then any would-be system cracker must encrypt each word in the dictionary not with a globally known and fixed one-way function, but with each individual user's one-way function before recovering all passwords that are English words. In a similar way, if HASH is used throughout a system, then we can reasonably expect that many arguments x0, x1, x2,.... will have been hashed into many results y0, y1, y2,.... Now, if an interloper wishes to attack the system he need only find a false xbad that maps onto some legitimate yi: that is, HASH(xi) = yi = HASH(xbad). This clearly is a problem because it might let the interloper 'authenticate' the false xbad when in fact it is not authentic. The more y's there are, the greater the probability that a randomly chosen x will map onto one of them. Thus, the more HASH is used, the more popular it becomes, the easier it will be to find an xbad for some legitimate xi such that HASH(xi) = yi = HASH(xbad). If, however, each use of HASH is parameterized with a different value, then the interloper must attack each use of HASH as though it were an entirely different hash function -- as indeed it will be if the hash function is correctly designed. More precisely, if the interloper finds an xbad such that HASH(p,xi) = yi = HASH(p',xbad) this will do him absolutely no good as long as p and p' are not equal. This parameter must be passed from HASH to the basic functions HASH128, and HASH256 which are used to define it. HASH512 must not only receive the parameter passed from HASH, but must also receive an additional parameter because HASH512 is used repeatedly when hashing down a large argument. That is, hashing a one-gigabyte file will need some 16,000,000 usages of HASH512 -- if two of these uses produced the same result, then we could simply delete the part of the file between them and produce a shortened file that would still produce the same result. To prevent this, all 16,000,000 uses of HASH512 invoked by HASH to hash down the one-gigabyte file must accept an additional parameter to insure that each use is different. We start with the design of HASH512 -- the largest fixed-size one-way hash function and therefore the most efficient for hashing large blocks of data. Conceptually, the hash functions all return 128 bits. If fewer bits are needed (e.g., adequate security is provided by 128 bits), then trailing bits are discarded. In practice, it will often be more efficient to return only as many bits as are required. HASH512 accepts an input, x, and two parameters which are named p64 (a 64-bit parameter) and p32 (a 32-bit parameter). Because HASH512 accepts p64 and p32 it is necessary to add these arguments to the conventional block cipher E which is used to define HASH512. (Note that we assume E returns a full 512-bit value -- conceptually we discard any unneeded bits after E512 has been computed). We can now provide a definition of HASH512 in terms of E512 as follows: HASH512(p64, p32, x) = leading 128 bits of ( E512(p64, p32, 0, x) XOR x) If we now specify E512(p64, p32, 0, x) -- a conventional block cipher that encrypts a 512-bit block with parameters 'p64' and 'p32' and which uses a fixed key -- our task is complete. In essence, we extend the design of Khufu by assuming a block size of 512 bits (16 32-bit words). This block size was selected as a compromise between two factors. We can more efficiently hash more data if the block size is large. On the other hand, if the block size is too large it won't fit into the registers on the computer implementing the hash function, so parts of the block will have to be repeatedly loaded and stored. Most modern RISC chips have many registers (more than 16 32-bit registers), so on most of these chips the entire 512-bit block being encrypted can be kept in registers. There will then be no need to load or store parts of the block during computation of the hash function. We modify the basic encryption loop for Khufu as follows: instead of two halves, L and R, we have 16 'halves' in an array. We designate this array as 'Block[0 .. 15]'. Because we have also added a 64-bit parameter to this fixed-size one-way hash function we need to mix in this parameter as well. The algorithm for doing this is given below: Function E512(p64, p32, 0, x) p64: INT64; p32: 1NT32; x: INT512; -- a 512 bit 'integer'. BEGIN blockSize = 512; -- the block size in bits blockSizeInBytes = blockSize/8; -- the block size in 8-bit bytes, here just 64 bytes blockSizeInWords = blockSize/32 -- the blocksize in 32-bit words, here just 16 words p64upper, p64lower: int32; -- the 64-bit parameter, which is represented as two 32-bit halves tempBlock, Block: array [0 .. blockSizeInWords-1] of int32; StandardSBoxes: ARRAY [1 .. enough/8] OF ARRAY [0 .. 255] OF int32; -- Fixed for all time rotateSchedule: ARRAY [1 .. 4] := [16,8,16,24]; index: integer; byteInWord: integer; sBoxEntry: int32; p64upper := Upper32BitsOf(p64); p64lower := Lower32BitsOf(p64); Block := x; -- note that x must be 512 bits or smaller. The -- trailing bits in Block are zero-filled FOR index + 1 to enough/blockSizeInBytes - 1 DO -- Mix in the parameters that parameterizes this instance Block[0] + Block[0] XOR p64lower; Block[1] + Block[1] XOR p64upper; Block[2] + Block[2] XOR p32; FOR byteInWord := 1 to 4 DO -- for each of the four columns FOR i := 0 to blockSizeInWords-1 DO next := (i+1) MOD blockSizeInWords; last := (i-1) MOD blockSizeInWords: -- Pick sboxes in sequence of 1,1,2,2,1,1,2,2,1,1,2,2,... -- ...1,1,2,2,3,3,4,4,3,3,4,4,... etc. Note that -- using the S-boxes in this sequence prevents self cancellation -- if the same entry is used twice. SBoxEntry := standardSBoxes[ 2*index-1 + (i MOD 2)] [Block[i].bottomByte]; Block[next] + Block[next] XOR SBoxEntry; Block[last] + Block[last] XOR SBoxEntry; ENDLOOP; -- rotate all the 32-bit words in the block at once by the correct amount FOR i: INTEGER IN [0.. wordCount) DO Block[i] + RotateRight[Block[i], rotateSchedule[byteInWord]]; ENDLOOP; ENDLOOP; -- end of byteInWord going from 1 to 4 ENDLOOP; -- end of index going from 1 to enough/blockSizeInBytes-1 -- flip the Block. The first word and the last word are interchanged, etc. tempBlock := Block; For i := 0 to blockSizeInWords-1 DO BEGIN Block[i] := tempBlock[blockSizeInWords-i-1]; END; RETURN(Block); End; For efficiency reasons, it is expected that in an actual implementation the inner loops would be unrolled blocksize or 4*blocksize times. This would mean that all shifts would be by fixed amounts (if unrolled 4*blocksize times) and that no array indexing, divisions or mod computations would actually be performed in the unrolled loop because the values would be known at compile time. The array 'Block' would be kept in 16 registers, and reference to individual array elements (because the array indices would be known at compile time) would be to specific registers. HASH512 can be used to efficiently hash large amounts of data. If only small amounts of data need be hashed, then the related functions HASH256 and HASH128 should be used. They are precisely the same as HASH512 except that 'blockSize' is changed from 512 bits to 256 or 128 bits as appropriate, and p32 (the 32-bit parameter) is always 0. Finally, we define HASH(p, x) in terms of the fixed size hash functions. To insure that HASH will always be computed efficiently, we first determine the size of the argument, x. If the argument is small, we use the appropriate fixed-size hash function for speed of computation. If the argument is large, we use HASH512 repeatedly to reduce its size. We now define HASH as follows: Function HASH(p64,x): INT128; x: ARRAY[0 .. n-1] OF INT512; -- note that this declaration actually defines n p64: INT64; p32: INT32; -- a 32-bit integer variable that counts how many times HASH512 is used hashArray: ARRAY[0 .. 3] OF INT128; -- an array of 128-bit 'integers' hashLoc: integer; -- index into hashArray BEGIN p32 := 0; -- SizeOf returns the size of its argument in bits -- Yes, it is somewhat inconsistent to view 'x' as an array of 512-bit elements and -- also allow it to have a length of less than 512 bits if there is only one element -- in the array - but this is a very high-level programming language that understands -- such oddities. IF SizeOf(x) <= 128 THEN RETURN(HASH128(p64, x)) ELSE IF SizeOf(x) <= 256 THEN RETURN(HASH256(p64, x)) ELSE IF SizeOf(x) <= 512 THEN RETURN(HASH512(p64, p32, x)) ELSE -- actually have to reduce a large amount of data BEGIN p32 := 0; -- count of number of calls to HASH512 starts at 0 hashLoc := 0; -- start using hashArray[0] hashArray := 0; -- zero out all entries in this array FOR i := 0 to n-1 DO BEGIN hashArray[hashLoc] := HASH512(p64,p32,x[i]); p32 := p32 + 1; hashLoc := hashLoc + 1; IF hashLoc >= 4 THEN BEGIN hashArray[0] := HASH512(p64,p32,hashArray); p32 := p32+1; hashLoc := 1; END; END; END; RETURN(HASH512(hashArray,p64,p32)); END; HASH is more complex than the function F which we discussed earlier for two reasons. First, the precise pattern used to hash down a large x is different; and second it uses a more efficient mechanism for small arguments. The fundamental concepts, however, are identical. Making the Initial and Standard S-Boxes We need an initial S-box to generate a pseudo-random stream of bytes. We also need a set of standard S-boxes to use in Khafre and Snefru during the encryption process. In all three applications, we need assurances about how the S-boxes were generated. This was a major question in DES -- whether any structure (intentional or accidental) might be present in the S-boxes that would weaken the encryption function. Because the method of selecting the DES S-boxes was kept secret, the published articles on the structure of DES are necessarily incomplete. Published discussions of the structure in the DES S-boxes makes it clear that very strong selection criteria were used, and much of the structure actually found can reasonably be attributed to design principles intended to strengthen DES. The purpose behind some of the structure detected is unclear; though it does not appear to weaken DES it would be useful to know if the structure serves some purpose or whether it occurred as an unintended consequence of the particular method chosen to actually generate the S-boxes. To avoid these questions, the standard S-boxes will be generated from the initial S-box according to the standard (and public) algorithm for generating a set of S-boxes from a key. The key selected for the standard S-boxes will be the null (all 0) key. In this way, not only the standard S- boxes but also the algorithm for generating them are made public and can be examined to determine if there are any weaknesses. The initial S-box must be generated from some stream of random numbers. In order to insure that the initial S-box does not have hidden or secret structure, we adopt the following rules: 1.) The program that generates the initial S-box from a stream of random numbers will be public. 2.) The stream of random numbers used as input to the program should be above reproach -- it should be selected in such a fashion that it could not reasonably have been tampered with in a fashion that might allow insertion of a trap-door or other weakness. The first criteria is met rather simply by publishing the algorithm used to generate the standard S-box (publication is planned in the near future). The second criteria is met by using the random numbers published in 1955 by the RAND corporation in 'A Million Random Digits with 100,000 Normal Deviates' (available on magnetic tape for a nominal fee). Given this approach, insertion of a trap-door would require (1) that the publicly known programs that generate the initial and standard S-boxes insert the trap door under the very nose of a watching world or that (2) the trap-door have been planned and inserted into the random numbers generated by RAND in 1955, over 30 years prior to the design of Khufu (at a time when Khufu's chief designer found successfully negotiating a flight of stairs an absorbing challenge). Methods of Cryptanalysis Questions about the security of a new cryptographic algorithm are inevitable. Often, these questions are of the form 'Have you considered the following attack...' It is therefore useful to describe the attacks that were considered during the design process. This serves two purposes. First, it reassures those who find their attack has already been considered (and presumably found non-threatening). Second, it tells those who are considering a new attack that the matter might not be settled and is worth pursuing further. A second question typically asked is 'How many rounds are enough?' This will vary with three factors: the value of the data being encrypted, the encryption speed (delay) that is acceptable, and the estimated cost of cryptanalysis. This last cost is inferred by considering how many rounds are sufficient to thwart various certificational attacks. Attacks can be broadly divided into a number of categories -- starting with chosen plaintext, known plaintext and ciphertext only. We shall primarily consider attacks of the chosen plaintext variety -- a system secure against chosen plaintext attacks is presumably also secure against the two weaker attacks. Some consideration will be given to known plaintext and ciphertext only attacks. Protection against casual browsers is valuable and can be provided more cheaply (i.e., with fewer rounds in the encryption process and hence less delay). An attack by a casual browser is better modeled by a ciphertext only attack. At the same time, the cryptographic resources the casual browser is likely to bring to bear are markedly inferior. Finally, the cost of encryption (in user inconvenience or delay) might be significant and the value of the data might not justify much inconvenience -- the choice might be between rapid encryption that offers protection against casual attack and no encryption at all. Without further ado, and in no particular order, we discuss the major attacks considered during the design phase. We first consider attacks against Khufu with a reduced number of rounds. We shall here consider attacks against an 8 round Khufu and will start with a chosen plaintext attack. We assume that the objective is to determine the entries in the S-box and the values of the auxiliary keys. While it might theoretically be possible to take advantage of the fact that the S-box was generated in a pseudo-random fashion from a smaller key (effectively limited to 64 bytes) this has so far not proven to be the case. The pseudo-random method of generating the S-box from the key is sufficiently complex that the 64-byte to 1024-byte expansion involved in this process appears quite strong. This is probably due to the relaxed computational time requirements on the pre-computation, i.e., the pre-computation is probably over-kill, but in most applications an additional fixed delay of some tens of milliseconds probably won't be noticed, so it wasn't further optimized. An 8 round encryption can be readily broken under a chosen plaintext attack by noting that each round of the encryption process is affected by only a single byte of the initial plaintext. Therefore, given 8 rounds and 8 bytes of plaintext, some byte of plaintext is used last; e.g., in the 8th round. By encrypting two plaintext blocks that differ only in this last byte, we obtain two ciphertext blocks in which the encryption process differs only in the 8th round, and therefore in which the two left halves have the same difference as two S-box entries. That is, if the two ciphertext left halves are designated L[8] and L'[8] and if the indices of the S-box entries used in the 8th rounds are designated i[8] and i'[8], then L[8] XOR L'[8] equals SBox[i[8]] XOR SBox[i'[8]]. L[8] and L'[8] are known, as are i[8] and i'[8], so this provides an equation about two S-box entries. After we recover roughly 256 equations we will almost be able to solve for the 256 entries in the S-box. At this point, the recovery of the S-box will not quite be complete -- we can arbitrarily set the value of a single S-box entry and determine values for the rest of the entries that will satisfy the equations we have. Further equations will not help us, for if we have one solution to the equations, we can generate another solution by complementing the ith bit in every proposed S-box entry. The new set of values will also satisfy the equations, for every equation XOR's two S-box entries, and hence complementing the ith bit in both entries will leave the XOR of the two bits unchanged. We need another method for resolving this last ambiguity. This is conceptually easy (in the worst case, we could simply examine all 2**32 possibilities) but an efficient algorithm is difficult to explain in a short space -- we therefore leave this as an exercise for the reader. Once the S-box entries are known, it is also relatively simple to determine the auxiliary keys. If we consider a known plaintext attack against an 8 round encryption, we find the problem is more difficult. Certainly, we could request a large number of plaintext-ciphertext pairs and hope that at least some of the pairs differed only in the final few bytes (e.g., the bytes that are used only on the 7th and 8th rounds of encryption) but this would require many millions of such pairs. This, of course, presumes that the plaintext is selected randomly -- which implies that cipher block chaining (or some other pseudo-randomization method) is used to insure that patterns in the plaintext are eliminated prior to encryption. Direct encryption (without some 'whitening' or pre- randomization) of sufficient text would probably result in 8-byte blocks that differed only in a single byte -- which might allow use of the method described above. The following paragraph condenses an entire Ph.D. thesis and some speculative ideas. It can be skipped without loss of continuity. A statistical attack of the 'hill-climbing' variety [22] might prove successful. In such attacks the discrete points in the key-space are embedded in a continuous space. Once this is done, it is possible to search the space using continuous optimization methods that are difficult to apply to a discrete space. A version of such an attack that would be relatively easy to implement on commercially available computers might view the encryption process as acting on bytes. In the cryptanalysis, each byte (whether it be a byte of plaintext, of ciphertext, a byte in the S-box, or a byte in some intermediate computation) would be represented by a vector with 256 entries, each entry being a real number in the range from 0.0 to 1.0 inclusive. Intuitively, these real numbers represent the probability that the byte takes on the corresponding value, That is, if the 23rd entry in such a vector were .3, then there would be a 30% chance that the byte associated with that vector was 23. (The sum of the entries for a given vector should be 1.0, i.e., the probability that the byte has some value between 0 and 255 inclusive should be 1). Because the S-box is initially completely unknown, we would associate each byte in the S-box with a vector all of whose entries were equal to 1/256. By representing each actual byte in the computation with a vector of real numbers, we have effectively mapped the discrete cryptographic encryption process onto a continuous process. Given a plaintext-ciphertext pair, we can map the discrete encryption process into a continuous analog computation. We can 'encrypt' the continuous plaintext with the continuous 'S-box' and produce a continuous 'ciphertext'. The continuous plaintext, because it is known, would be represented by 8 vectors each of which had a single 1 entry in the correct place, with all other entries being 0. Because the initial continuous approximation to the S-box does not correspond to any discrete S-box, the result of the encryption would be continuous 'ciphertext' which would consist of 8 vectors with some (initially flat) distribution showing the probable values for the bytes in the actual ciphertext. Because the computed probability distribution for the ciphertext will differ from the actual ciphertext (which is known) it is possible to generate an error signal. If we compute many continuous 'ciphertexts' from many plaintexts and compare all the continuous 'ciphertexts' with the actually known ciphertext values, then we can generate an aggregate error signal that reflects all the available information. By changing the probability distributions associated with the bytes in the S-box (e.g., by moving to adjacent points in the continuous key space) it is possible to change this error signal. Minimizing the error signal as a function of the S-box values is an optimization problem, and there are many algorithms in the literature which can be applied. The simplest approach would be to find the direction in the continuous key space which best minimizes the error, and then travel some distance in that direction. This is simply the method of steepest descent. In general, the solution of non-linear optimization problems is difficult, and whether or not such an approach will work with an 8 round Khufu is an interesting and as yet unanswered question. Because such an attack must depend on statistical 'roughness' in the encryption process, and because (as discussed later) 8 rounds is significantly 'rough', such an attack might work. Finally, a ciphertext only attack against an 8-round Khufu results in a very difficult problem. A modification of the attack sketched above might be mounted in which the ciphertext is 'decrypted' and the resulting 'plaintext' is then viewed statistically to measure how 'close' it is to being reasonable. For example, we could score the 'decryption' by multiplying the probability vector for each byte of 'plaintext' times the actual probability distribution (assuming that some reasonable approximation to this distribution is known). Fundamentally, hill climbing and other statistical attacks must rely on statistically detectable differences between various alternatives. If the statistics are flat, then such techniques will fail. An important question with Khufu is the number of rounds required to achieve a statistically flat distribution. Preliminary results indicate that 16 rounds produces flat statistics. The use of auxiliary keys were largely adopted for three reasons: first, it seemed intuitively reasonable that randomization of the input by XORing an unknown quantity would assist in the encryption process. Second, four additional register-to-register XOR operations are cheap to implement. Finally, the auxiliary keys foil a specific chosen ciphertext attack. This attack depends on the observation that, although the S-box has 256 entries, the encryption process does not use all entries for each plaintext-ciphertext pair. Even worse, although a typical 8-round encryption will use 8 different S-box entries it doesn't have to: some entries could be repeated. In the worst case, a single entry would be repeated 8 times -- which would effectively mean that only 32 bits of key material was used. If the auxiliary keys were not present then we could simply guess at the value of one of the S-box entries, and then confirm our guess if we could find a plaintext-ciphertext pair that used only that entry for all 8 rounds. Because each of the 8 rounds uses a single byte from the plaintext, we could actually construct the plaintext needed to confirm our guess (if the auxiliary keys were not present). For example, if we guess that the 0th S-box entry has some specific value, then we would select the first byte of our specially-built plaintext (by 'first byte' we mean i[1], or that byte of plaintext used as an index into the S-box in the first round) to be 0. Then, knowing what happens in the first round, we can select the second byte of plaintext so that the 0th entry is again selected on the second round -- which would tell us what happens in the third round. By repeating this process for 8 rounds, we can construct a plaintext which, when enciphered, will tell us whether or not the 0th S-box entry does or does not have a specific value. After trying 2**32 values we will surely find the correct one. If we then repeat this whole process for the 1st entry, and then the 2nd entry, etc, we could determine the values of all the entries in the S-box. The auxiliary keys prevent this attack because they effectively inject 64 bits of key material into the encryption process prior to selecting S-box entries. Thus, correctly guessing a 32-bit S-box entry is insufficient because we would also have to guess the 64-bit value XORed with the plaintext prior to encryption. If we guessed a single such bit incorrectly, then the incorrectly guessed bit would (within the first 8 rounds) cause selection of an uncontrolled S-box entry which would then initiate the uncontrolled avalanche of changes that we rely upon to provide cryptographic strength. Although this attack is actually rather inefficient compared with our first chosen ciphertext attack, it does point out that there is no guarantee that multiple different S-box entries have actually been used during encryption. Instead, we must assure ourselves that the risk of this occurring is sufficiently low by explicitly computing the probability of its occurrence. Another attack in this general class is the cancellation attack. In this attack, we alter the first byte of the 8 bytes in the plaintext, and then attempt to cancel the effects of this alteration by altering the other 32-bit half in a compensating fashion. That is, by altering the first byte of plaintext used in the first round, we cause a change in the second round that we can understand. Because we can also change the other 32-bit half, this understandable change in the second round can be cancelled. (Notice that the auxiliary keys have very little impact on this attack). Now, if the first byte were 3, and we changed it to a 5, then this would produce a predictable change in the value XORed with the other 32-bit half, R, in the first round. This first round is computed as: i[1] := L[0] AND #FF: L[1] := R[0] XOR SBox[ i[1] ]; For the first plaintext we encrypted, this would become: L[1] := R[0] XOR SBox[3]; while for the second plaintext encrypted, this would become: L'[1] := R'[0] XOR SBox[5]; Therefore, if we select R'[0] = R[0] XOR SBox[3] XOR SBox[5], then the second equation becomes: L'[1] := R[0] XOR SBox[3] XOR SBox[5] XOR SBox[5] or L'[1] := R[0] XOR SBox[3] But this means that L'[1] = R[0] XOR SBox[3] = L[1] In other words, L[1] and L'[1] are identical -- by knowing SBox[3] XOR SBox[5] we were able to cancel out the change that should have taken place in L[1]. This, of course, means that the avalanche of changes upon which encryption so critically depends has been thwarted at the very start. Notice that after the first round of encryption, the two blocks differ only in the first byte -- that is, the byte used in the first round. After 8 rounds of encryption, the resulting ciphertexts will also differ in only this one byte. In practice, this attack seems to require that you first guess the correct value of SBox[i] XOR SBox[j] for two different values of i and j. This is a 32-bit value, and so on average it seems necessary to try 2**32 different values before encountering the correct one. After 8 rounds of encryption, however, the fact that we have determined the correct 32-bit 'cancellation value' will be obvious because the final 64 bits of ciphertext generated by the two different plaintexts will differ in only a single byte. It might not at first be obvious, but we can in fact modify this attack so that only 2 * 2**16 plaintext- ciphertext pairs are required in order to find the correct cancellation value. Although as described above, it would seem that we need 2**32 pairs of plaintext-ciphertext pairs to test each possible 32-bit cancellation value, this is not the case. We can generate two lists of plaintext-ciphertext pairs, and then by selecting one plaintext-ciphertext pair from one list and the other plaintext- ciphertext pair from the other list, we can generate 2**32 possible combinations of entries from the two lists. If we select the plaintexts used to generate the lists carefully, then every 32-bit cancellation value can be represented by one entry from the first list, and one entry from the second list. When we consider this attack on a 16 round Khufu it is much weaker. If we can determine the correct 32-bit cancellation value it will cause collapse of the encryption process up until the changed byte is again used. If the first byte has been changed, then it will again be used on the 9th round -- this means that in a 16-round Khufu a cancellation attack will effectively strip off 8 rounds. The remaining 8 rounds must then provide sufficient cryptographics strength to resist attack. Empirical statistical tests indicate that 8 rounds in which changes take place in the first one or two rounds will result in apparently random output -- though of course, this result demonstrates only that the output was random with respect to the specific statistical tests used, not that all possible statistical tests would reveal no pattern. An attack proposed by Dan Greene is based on the observation that each 32-bit half is being XORed with values selected (possibly with a rotation) from the S-box. Once the key has been chosen this S-box is fixed -- so at most 256 different values can be used and each value can be rotated (in the first 8 rounds of Khufu) in only four different ways. That is, we are at best applying a fixed and rather limited set of operations to each half. If we focus on the right half R, (and if we neglect the effect of the auxiliary keys) then we find that: R8 = R0 XOR ROTATE[SBox[i1,0] XOR ROTATE[SBox[i3],16] XOR ROTATE[SBox[i5],24] XOR ROTATE[SBox[i7],8]] R8 designates the right half following 8 rounds of Khufu, i.e., the right half of the ciphertext. R0 designates the right half before encryption begins, i.e., the right half of the plaintext. Although the indices used to select the S-box entries are computed during encryption, we are going to ignore their actual values. Instead, we will assume that i1, i3, i5 and i7 are selected randomly. This should not weaken the encryption function, so any cryptanalytic success we have using this assumption indicates weakness in the original system as well. If we define Y = Y[i1, i3, i5, i7] = ROTATE[SBox[i1,0] XOR ROTATE[SBox[i3],16] XOR ROTATE[SBox[i5],24] XOR ROTATE[SBox[i7],8]] we can re-express the earlier equation as: R8 XOR R0 = Y[i1, i3, i5, i7] The left side of this equation is readily computed from a plaintext-ciphertext pair, and with enough such pairs we can compute the probability distribution of (R8 XOR R0). The right side should determine the same distribution (if we assume the actual indices are more or less random -- which should be a good approximation if the plaintext is random!). The 4 8-bit indices clearly could generate at most 2 possible values for Y, but it seems more plausible that some values for Y will be produced more than once while other values for Y will not be produced at all. That is to say, the distribution of Y's will not be uniform. If we can compute this distribution from enough plaintext-ciphertext pairs, and if it is non-uniform, could we then cryptanalyze an 8 round Khufu? Statistical evidence gathered on a 16-round Khufu suggests that this attack will fail for 16 rounds, but its success for 8 rounds is still unclear. Even given the distribution of Y's it is not clear (at the time of writing) how to determine the actual S-box entries. Summary An 8-round Khufu can be broken by several attacks, though it is reasonably resistant t0 ciphertext only attack. A 16-round Khufu has so far resisted cryptanalysis. Preliminary statistical analysis suggests that a 16-round Khufu produces random output. We are hopeful at the current writing that a 16-round Khufu will be useful for general commercial encryption, and that a 32-round Khufu can be used for high security applications. This conclusion is tentative and requires further investigation. The analysis of Khafre and Snefru have so far been less detailed -- as a consequence, we do not yet have recommendations on the number of rounds that should be used with them. It seems probable that Khafre will require more rounds of encryption to provide equivalent security than Khufu, because the S-boxes used with Khafre are public. Khufu, by contrast, generates different S-boxes for each key and keeps the S-boxes secret -- and so uses more key material per round than Khafre. Although we believe at the time of writing that the conclusions given above are sound, any reader seriously considering use of these encryption functions is advised that (1) waiting for one or two years following their publication should allow sufficient time for their examination by the public cryptographic community and (2) current information about their status should be obtained by contacting the author. Acknowledgements The author would like to particularly thank Dan Greene for his many comments and the many hours of discussion about cryptography in general and the various design proposals for Khufu in particular. Thanks are also due to Luis Rodriguez, who implemented the C version of Khufu and gathered most of the statistics. I would also like to thank Dan Swinehart and the many researchers at PARC who have shown both interest and support during the many months this effort has required. Bibliography 1.) `Secrecy, Authentication, and Public Key Systems', Stanford Ph.D. thesis, 1979, by Ralph C. Merkle. 2.) `A Certified Digital Signature', unpublished paper, 1979. 3.) Moti Yung, private communication. 4.) `A High Speed Manipulation Detection Code', by Robert R. Jueneman, Advances in Cryptology - CRYPTO '86, Springer Verlag, Lecture Notes on Computer Science, Vol. 263, page 327 to 346. 5.) `Another Birthday Attack' by Don Coppersmith, Advances in Cryptology - CRYPTO '85, Springer Verlag, Lecture Notes on Computer Science, Vol. 218, pages 14 to 17. 6.) `A digital signature based on a conventional encryption function', by Ralph C. Merkle, Advances in Cryptology CRYPTO 87, Springer Verlag, Lecture Notes on Computer Science, Vol. 293, page 369-378. 7.) `Cryptography and Data Security', by Dorothy E. R. Denning, Addison-Wesley 1982, page 170. 8.) `On the security of multiple encryption', by Ralph C. Merkle, CACM Vol. 24 No. 7, July 1981 pages 465 to 467. 9.) `Results of an initial attempt to cryptanalyze the NBS Data Encryption Standard', by Martin Hellman et. al., Information Systems lab. report SEL 76-042, Stanford University 1976. 10.) `Communication Theory of Secrecy Systems', by C. E. Shannon, Bell Sys. Tech. Jour. 28 (Oct. 1949) 656-715 11.) `Message Authentication' by R. R. Jueneman, S. M. Matyas, C. H. Meyer, IEEE Communications Magazine, Vol. 23, No, 9, September 1985 pages 29-40. 12.) `Generating strong one-way functions with cryptographic algorithm', by S. M. Matyas, C. H. Meyer, and J. Oseas, IBM Technical Disclosure Bulletin, Vol. 27, No. 10A, March 1985 pages 5658-5659 13.) `Analysis of Jueneman's MDC Scheme', by Don Coppersmith, preliminary version June 9, 1988. Analysis of the system presented in [4]. 14.) `The Data Encryption Standard: Past and Future' by M. E. Smid and D. K. Branstad, Proc. of the IEEE, Vol 76 No. 5 pp 550-559, May 1988 15.) `Defending Secrets, Sharing Data; New Locks and Keys for Electronic Information', U.S. Congress, Office of Technology Assessment, OTA-CIT-310, U.S. Government Printing Office, October 1987 16.) `Exhaustive cryptanalysis of the NBS data encryption standard', by Whitfield Diffie and Martin Hellman, Computer, June 1977, pages 74-78 17.) `Cryptography: a new dimension in data security', by Carl H. Meyer and Stephen M. Matyas, Wiley 1982. 18.) 'One Way Hash Functions and DES', by Ralph C. Merkle, Xerox Disclosure Bulletin. 19.) 'Data Encryption Standard (DES)', National Bureau of Standards (U.S.), Federal Information Processing Standards Publication 46, National Technical Information Service, Springfield, VA, Apr. 1977 20.) ??? 21.) `Cryptography and Computer Privacy', by H. Feistel, Sci. Amer. Vol. 228, No. 5 pp 15-23, May 1973 22.) `Maximum Likelihood Estimation Applied to Cryptanalysis', by Dov Andelman, Stanford Ph.D. Thesis, 1979 23.) IBM has recently proposed a specific one-way hash function which has so far resisted attack.