Hyperlinked definitions and discussions of many cryptographic, mathematical, logic, statistics, and electronics terms used in cipher construction and analysis.
Generally used for power distribution because the changing current supports the use of transformers. Utilities can thus transport power at high voltage and low current, which minimizes "ohmic" or I^2 R losses. The high voltages are then reduced at power substations and again by pole transformers for delivery to the consumer.
One example is byte addition modulo 256, which simply adds two byte values, each in the range 0..255, and produces the remainder after division by 256, again a value in the byte range of 0..255. Subtraction is also an "additive" combiner.
Another example is bit-level exclusive-OR, which is addition mod 2. A byte-level exclusive-OR is a mod 2 polynomial addition.
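For example, a small Python sketch of byte-level additive combining (the function names are arbitrary, and a real stream cipher would take its confusion values from a keyed RNG):

   def combine_add(plain, confusion):
       return (plain + confusion) % 256      # byte addition mod 256

   def extract_add(cipher, confusion):
       return (cipher - confusion) % 256     # subtraction undoes the combining

   def combine_xor(plain, confusion):
       return plain ^ confusion              # bit-level addition mod 2; its own inverse

   p, c = 0x41, 0x9C
   assert extract_add(combine_add(p, c), c) == p
   assert combine_xor(combine_xor(p, c), c) == p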
Knuth, D. 1981. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. 2nd ed. 26-31. Addison-Wesley: Reading, Massachusetts.
Marsaglia, G. and L. Tsay. 1985. Matrices and the Structure of Random Number Sequences. Linear Algebra and its Applications. 67: 147-156.
Advantages include:
In addition, a vast multiplicity of independent cycles has the potential of confusing even a "quantum computer," should such a thing become possible.
For Degree-n Primitive, and Bit Width w:

   Total States:       2^(nw)
   Non-Init States:    2^(n(w-1))
   Number of Cycles:   2^((n-1)(w-1))
   Length Each Cycle:  (2^n - 1) * 2^(w-1)
   Period of LSB:      2^n - 1
The binary addition of two bits with no carry input is just XOR, so the lsb of an Additive RNG has the usual maximal length period.
A degree-127 Additive RNG using 127 elements of 32 bits each has 2^4064 unique states. Of these, 2^3937 are disallowed by initialization (the lsb's are all "0") but this is just one unusable state out of 2^127. There are still 2^3906 cycles which each have almost 2^158 steps. (The Cloak2 stream cipher uses an Additive RNG with 9689 elements of 32 bits, and so has 2^310048 unique states. These are mainly distributed among 2^300328 different cycles with almost 2^9720 steps each.)
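A minimal Python sketch of an additive RNG follows; it uses the small, well-known lag pair (55, 24) from Knuth rather than the degree-127 or degree-9689 generators above, and the 32-bit width and simple seeding shown here are arbitrary choices for illustration:

   class AdditiveRNG:
       def __init__(self, seed_words):       # 55 words of 32 bits each
           assert len(seed_words) == 55 and any(w & 1 for w in seed_words)
           self.state = list(seed_words)
           self.i = 0                        # index of the oldest element, x[n-55]

       def step(self):
           s = self.state
           # x[n] = x[n-55] + x[n-24] (mod 2^32); s[(i+31) % 55] holds x[n-24]
           s[self.i] = (s[self.i] + s[(self.i + 31) % 55]) & 0xFFFFFFFF
           out = s[self.i]
           self.i = (self.i + 1) % 55
           return out

   seed = [(i * 2654435761 + 1) & 0xFFFFFFFF for i in range(55)]   # element 0 has its lsb set
   rng = AdditiveRNG(seed)
   print([hex(rng.step()) for _ in range(4)])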
Note that any LFSR, including the Additive RNG, is very weak when used alone. But when steps are taken to hide the sequence (such as using a jitterizer and Dynamic Substitution combining) the result can have significant strength.
Assume the flat plane defined by two arbitrary unit vectors e1, e2 and a common origin O; this is a coordinate "frame." Assume a grid of lines parallel to each frame vector, separated by unit lengths (a "metric" which may differ for each vector). If the vectors happen to be perpendicular, we have a Cartesian coordinate system, but in any case we can locate any point on the plane by its position on the grid.
An affine transformation can change the origin, the angle between the vectors, and unit vector lengths. Shapes in the original frame thus become "pinched," "squashed" or "stretched" images under the affine transformation. This same sort of thing generalizes to higher degree expressions.
The Handbook of Mathematics says that if e1, e2, e3 are linearly independent vectors, any vector a can be expressed uniquely in the form a = a1e1 + a2e2 + a3e3 where the ai are the affine coordinates. (p.518)
The VNR Concise Encyclopedia of Mathematics says "All transformations that lead to a uniquely soluble system of linear equations are called affine transformations." (p.534)
   a_n x_n  +  a_(n-1) x_(n-1)  +  ...  +  a_1 x_1  +  a_0

where the operations are mod 2: addition is Exclusive-OR, and multiplication is AND.
Note that all of the variables x_i are to the first power only, and each coefficient a_i simply enables or disables its associated variable. The result is a single Boolean value, but the constant term a_0 can produce either possible output polarity.
Here are all possible 3-variable affine Boolean functions (each of which may be inverted by complementing the constant term):
   affine      truth table
   c           0 0 0 0 0 0 0 0
   x0          0 1 0 1 0 1 0 1
   x1          0 0 1 1 0 0 1 1
   x1+x0       0 1 1 0 0 1 1 0
   x2          0 0 0 0 1 1 1 1
   x2+x0       0 1 0 1 1 0 1 0
   x2+x1       0 0 1 1 1 1 0 0
   x2+x1+x0    0 1 1 0 1 0 0 1
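The table above can be regenerated by evaluating a2x2 + a1x1 + a0x0 with mod 2 operations; a small Python sketch (constant term omitted, since complementing the output gives the other polarity):

   def affine_truth_table(a2, a1, a0):
       """Output of the affine function for x = 0..7, with AND as multiply and XOR as add."""
       table = []
       for x in range(8):
           x2, x1, x0 = (x >> 2) & 1, (x >> 1) & 1, x & 1
           table.append((a2 & x2) ^ (a1 & x1) ^ (a0 & x0))
       return table

   for a2 in (0, 1):
       for a1 in (0, 1):
           for a0 in (0, 1):
               print((a2, a1, a0), affine_truth_table(a2, a1, a0))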
Transistors are analog amplifiers which are basically linear over a reasonable range and so require DC power. In contrast, Relays are classically mechanical devices with direct metal-to-metal moving connections, and so can generally handle higher power and AC.
   DEC HEX CTRL CMD    DEC HEX CHAR    DEC HEX CHAR    DEC HEX CHAR
     0  00  ^@  NUL     32  20  SPC     64  40   @      96  60   `
     1  01  ^A  SOH     33  21   !      65  41   A      97  61   a
     2  02  ^B  STX     34  22   "      66  42   B      98  62   b
     3  03  ^C  ETX     35  23   #      67  43   C      99  63   c
     4  04  ^D  EOT     36  24   $      68  44   D     100  64   d
     5  05  ^E  ENQ     37  25   %      69  45   E     101  65   e
     6  06  ^F  ACK     38  26   &      70  46   F     102  66   f
     7  07  ^G  BEL     39  27   '      71  47   G     103  67   g
     8  08  ^H  BS      40  28   (      72  48   H     104  68   h
     9  09  ^I  HT      41  29   )      73  49   I     105  69   i
    10  0a  ^J  LF      42  2a   *      74  4a   J     106  6a   j
    11  0b  ^K  VT      43  2b   +      75  4b   K     107  6b   k
    12  0c  ^L  FF      44  2c   ,      76  4c   L     108  6c   l
    13  0d  ^M  CR      45  2d   -      77  4d   M     109  6d   m
    14  0e  ^N  SO      46  2e   .      78  4e   N     110  6e   n
    15  0f  ^O  SI      47  2f   /      79  4f   O     111  6f   o
    16  10  ^P  DLE     48  30   0      80  50   P     112  70   p
    17  11  ^Q  DC1     49  31   1      81  51   Q     113  71   q
    18  12  ^R  DC2     50  32   2      82  52   R     114  72   r
    19  13  ^S  DC3     51  33   3      83  53   S     115  73   s
    20  14  ^T  DC4     52  34   4      84  54   T     116  74   t
    21  15  ^U  NAK     53  35   5      85  55   U     117  75   u
    22  16  ^V  SYN     54  36   6      86  56   V     118  76   v
    23  17  ^W  ETB     55  37   7      87  57   W     119  77   w
    24  18  ^X  CAN     56  38   8      88  58   X     120  78   x
    25  19  ^Y  EM      57  39   9      89  59   Y     121  79   y
    26  1a  ^Z  SUB     58  3a   :      90  5a   Z     122  7a   z
    27  1b  ^[  ESC     59  3b   ;      91  5b   [     123  7b   {
    28  1c  ^\  FS      60  3c   <      92  5c   \     124  7c   |
    29  1d  ^]  GS      61  3d   =      93  5d   ]     125  7d   }
    30  1e  ^^  RS      62  3e   >      94  5e   ^     126  7e   ~
    31  1f  ^_  US      63  3f   ?      95  5f   _     127  7f  DEL
Also see: commutative and distributive.
Classically, attacks were neither named nor classified; there was just: "here is a cipher, and here is the attack." And while this gradually developed into named attacks, there is no overall attack taxonomy. Currently, attacks are often classified by the information available to the attacker or constraints on the attack, and then by strategies which use the available information. Not only ciphers, but also cryptographic hash functions can be attacked, generally with very different strategies.
We are to attack a cipher which enciphers plaintext into ciphertext or deciphers the opposite way, under control of a key. The available information necessarily constrains our attack strategies.
The goal of an attack is to reveal some unknown plaintext, or the key (which will reveal the plaintext). An attack which succeeds with less effort than a brute-force search we call a break. An "academic" ("theoretical," "certificational") break may involve impractically large amounts of data or resources, yet still be called a "break" if the attack would be easier than brute force. (It is thus possible for a "broken" cipher to be much stronger than a cipher with a short key.) Sometimes the attack strategy is thought to be obvious, given a particular informational constraint, and is not further classified.
Many attacks try to isolate unknown small components or aspects so they can be solved separately, a process known as divide and conquer. Also see: security.
For a known population, the number of repetitions expected at each level has long been understood to be a binomial expression. But if we are sampling in an attempt to establish the effective size of an unknown population, we have two problems:
Fortunately, there is an unexpected and apparently previously unknown combinatoric relationship between the population and the number of combinations of occurrences of repeated values. This allows us to convert any number of triples and higher n-reps to the number of 2-reps which have the same probability. So if we have a double, and then get another of the same value, we have a triple, which we can convert into three 2-reps. The total number of 2-reps from all repetitions (the augmented 2-reps value) is then used to predict population.
We can relate the number of samples s to the population N through the expected number of augmented doubles Ead:
   Ead(N,s) = s(s-1) / 2N

This equation is exact, provided we interpret all the exact n-reps in terms of 2-reps. For example, a triple is interpreted as three doubles; the augmentation from 3-reps to 2-reps is (3 C 2) or 3. The augmented result is the sum of the contributions from all higher repetition levels:
         n    i
   ad = SUM  ( ) r[i]
        i=2   2

where ad is the number of augmented doubles, and r[i] is the exact repetition count at the i-th level.
And this leads to an equation for predicting population:
   Nad(s,ad) = s(s-1) / 2 ad

This predicts the population Nad based on a mean value of augmented doubles ad. Clearly, we expect the number of samples to be far larger than the number of augmented doubles, but an error in the augmented doubles ad should produce a proportionally similar error in the predicted population Nad. We typically develop ad to high precision by averaging the results of many large trials.
However, since the trials should have approximately a simple Poisson distribution (which has only a single parameter), we could be a bit more clever and fit the results to the expected distribution, thus perhaps developing a bit more accuracy.
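A small Python sketch of the whole procedure (the population, sample size, and trial count are arbitrary illustrations):

   import random
   from collections import Counter
   from math import comb

   def augmented_doubles(samples):
       reps = Counter(Counter(samples).values())       # i -> number of exact i-reps
       return sum(comb(i, 2) * r for i, r in reps.items() if i >= 2)

   N, s, trials = 10000, 1000, 200
   mean_ad = sum(augmented_doubles([random.randrange(N) for _ in range(s)])
                 for _ in range(trials)) / trials

   print("mean augmented doubles:", mean_ad)           # expect s(s-1)/2N = 49.95
   print("estimated population: ", s * (s - 1) / (2 * mean_ad))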
Also see the article: Estimating Population from Repetitions in Accumulated Random Samples, and the Population Estimation Worksheets in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
One form of message authentication computes a CRC hash across the plaintext data, and appends the CRC remainder (or result) to the plaintext data: this adds a computed redundancy to an arbitrary message. The CRC result is then enciphered along with the data. When the message is deciphered, if a second CRC operation produces the same result, the message can be assumed unchanged.
Note that a CRC is a fast, linear hash. Messages with particular CRC result values can be constructed rather easily. However, if the CRC is hidden behind strong ciphering, an Opponent is unlikely to be able to change the CRC value systematically or effectively. In particular, this means that the CRC value will need more protection than a simple exclusive-OR stream cipher or the exclusive-OR approach to handling short last blocks in a block cipher.
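A small Python demonstration of that linearity, assuming the standard CRC-32 available as zlib.crc32: for three equal-length messages, the exclusive-OR of the three CRC values equals the CRC of the exclusive-OR of the three messages, which is exactly the sort of structure an Opponent can exploit when the CRC is not well hidden:

   import zlib

   def xor3(a, b, c):                       # assumes three equal-length messages
       return bytes(x ^ y ^ z for x, y, z in zip(a, b, c))

   a = b"transfer $0000100 to account 12345"
   b = b"transfer $0999999 to account 12345"
   c = b"transfer $0000100 to account 99999"

   assert zlib.crc32(xor3(a, b, c)) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)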
A similar approach to message authentication uses a nonlinear cryptographic hash function. These also add a computed redundancy to the message, but generally require significantly more computation than a CRC. It is thought to be exceedingly difficult to construct messages with a particular cryptographic hash result, so the hash result perhaps need not be hidden by encryption.
One form of cryptographic hash is DES CBC mode: using a key different than that used for encryption, the final block of ciphertext is the hash of the message. This obviously doubles the computation when both encryption and authentication are needed. And since any cryptographic hash is vulnerable to birthday attacks, the small 64-bit block size implies that we should be able to find two different messages with the same hash value by constructing and hashing "only" about 2^32 different messages.
Another approach to message authentication is to use an authenticating block cipher; this is often a block cipher which has a large block, with some "extra data" inserted in an "authentication field" as part of the plaintext before enciphering each block. The "extra data" can be some transformation of the key, the plaintext, and/or a sequence number. This essentially creates a homophonic block cipher: If we know the key, many different ciphertexts will produce the same plaintext field, but only one of those will have the correct authentication field.
The usual approach to authentication in a public key cipher is to encipher with the private key. The resulting ciphertext can then be deciphered by the public key, which anyone can know. Since even the wrong key will produce a "deciphered" result, it is also necessary to identify the resulting plaintext as a valid message; in general this will also require redundancy in the form of a hash value in the plaintext. The process provides no secrecy, but only a person with access to the private key could have enciphered the message.
The classical approach to user authentication is a password; this is "something you know." One can also make use of "something you have" (such as a secure ID card), or "something you are" (biometrics).
The classic problem with passwords is that they must be remembered by ordinary people, and so carry a limited amount of uniqueness. Easy-to-remember passwords are often common language phrases, and so often fall to a dictionary attack. More modern approaches involve using a Diffie-Hellman key exchange, plus the password, thus minimizing exposure to a dictionary attack. This does require a program on the user end, however.
In secret key ciphers, key authentication is inherent in secure key distribution.
In public key ciphers, public keys are exposed and often delivered insecurely. But someone who uses the wrong key may unknowingly have "secure" communications with an Opponent, as in a man-in-the-middle attack. It is thus absolutely crucial that public keys be authenticated or certified as a separate process. Normally this implies the need for a Certification Authority or CA.
"As the input moves through successive layers the pattern of 1's generated is amplified and results in an unpredictable avalanche. In the end the final output will have, on average, half 0's and half 1's . . . ." [p.22]
Feistel, H. 1973. Cryptography and Computer Privacy. Scientific American. 228(5): 15-23.
Also see mixing, diffusion, overall diffusion, strict avalanche criterion, complete, S-box, and the bit changes section of the Ciphers By Ritter / JavaScript computation pages.
"For a given transformation to exhibit the avalanche effect, an average of one half of the output bits should change whenever a single input bit is complemented." [p.523]
Webster, A. and S. Tavares. 1985. On the Design of S-Boxes. Advances in Cryptology -- CRYPTO '85. 523-534.
Also see the bit changes section of the Ciphers By Ritter / JavaScript computation pages.
Back Door
"A function is balanced if, when all input vectors are equally likely, then all output vectors are equally likely."
Lloyd, S. 1990. Properties of binary functions. Advances in Cryptology -- EUROCRYPT '90. 124-139.
There is some desire to generalize this definition to describe multiple-input functions. (Is a function "balanced" if, for one value on the first input, all output values can be produced, but for another value on the first input, only some output values are possible?) Presumably a two-input balanced function would be balanced for either input fixed at any value, which would essentially be a Latin square or a Latin square combiner.
A Balanced Block Mixer is an m-input-port m-output-port mechanism with various properties:
If we have a two port mixer, with input ports labeled A and B, output ports labeled X and Y, and some irreducible mod 2 polynomial p of degree appropriate to the port size, a Balanced Block Mixer is formed by the equations:
X = 3A + 2B (mod 2)(mod p),
Y = 2A + 3B (mod 2)(mod p).
This particular BBM is a self-inverse or involution, and so can be used without change whether enciphering or deciphering. One possible value for p for mixing 8-bit values is 100011011.
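A small Python sketch of these two equations, assuming 8-bit ports and p = 100011011 (hex 11B); applying the mixing twice returns the original pair, confirming the involution:

   def gf_mul(a, b, poly=0x11B):
       """Multiply two 8-bit values as mod 2 polynomials, reduced mod poly."""
       r = 0
       while b:
           if b & 1:
               r ^= a
           b >>= 1
           a <<= 1
           if a & 0x100:
               a ^= poly
       return r

   def bbm_mix(a, b):                       # X = 3A + 2B, Y = 2A + 3B (mod 2)(mod p)
       return gf_mul(3, a) ^ gf_mul(2, b), gf_mul(2, a) ^ gf_mul(3, b)

   a, b = 0x5A, 0xC3
   x, y = bbm_mix(a, b)
   assert bbm_mix(x, y) == (a, b)           # self-inverse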
Balanced Block Mixing functions probably should be thought of as orthogonal Latin squares. For example, here is a tiny nonlinear "2-bit" BBM:
   3 1 2 0     0 3 2 1     30 13 22 01
   0 2 1 3     2 1 0 3  =  02 21 10 33
   1 3 0 2     1 2 3 0     11 32 03 20
   2 0 3 1     3 0 1 2     23 00 31 12
Suppose we wish to mix (1,3); 1 selects the second row up in both squares, and 3 selects the rightmost column, thus selecting (2,0) as the output. Since there is only one occurrence of (2,0) among all entry pairs, this discrete mixing function is reversible, as well as being balanced on both inputs.
Cryptographic advantages of balanced block mixing include the fact that each output is always balanced with respect to either input, and that no information is lost in the mixing. This allows us to use balanced block mixing as the "butterfly" operations in a fast Walsh-Hadamard transform or the well-known FFT. By using the mixing patterns of these transforms, we can mix 2n elements such that each input is guaranteed to affect each and every output in a balanced way. And if we use keying to generate the tables, we can have a way to mix huge blocks in small nonlinear mixing tables with overall mixing guarantees.
Also see Mixing Cipher, Dynamic Substitution Combiner, Variable Size Block Cipher, and the Active Balanced Block Mixing in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
In a statically-balanced combiner, any particular result value can be produced by any value on one input, simply by selecting some appropriate value for the other input. In this way, knowledge of only the output value provides no information -- not even statistical information -- about either input.
The common examples of cryptographic combiner, including byte exclusive-OR (mod 2 polynomial addition), byte addition (integer addition mod 256), or other "additive" combining, are perfectly balanced. Unfortunately, these simple combiners are also very weak, being inherently linear and without internal state.
A Latin square combiner is an example of a statically-balanced reversible nonlinear combiner with massive internal state. A Dynamic Substitution Combiner is an example of a dynamically or statistically-balanced reversible nonlinear combiner with substantial internal state.
       0 1 2 3 4 5 6 7 8 9 a b c d e f
    0  A B C D E F G H I J K L M N O P
    1  Q R S T U V W X Y Z a b c d e f
    2  g h i j k l m n o p q r s t u v
    3  w x y z 0 1 2 3 4 5 6 7 8 9 + /

   use "=" for padding
We can do FWT's in "the bottom panel" at the end of Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
Here is every bent sequence of length 4, first in {0,1} notation, then in {1,-1} notation, with their FWT results:
   bent {0,1}        FWT          bent {1,-1}        FWT
   0 0 0 1      1 -1 -1  1      1  1  1 -1      2  2  2 -2
   0 0 1 0      1  1 -1 -1      1  1 -1  1      2 -2  2  2
   0 1 0 0      1 -1  1 -1      1 -1  1  1      2  2 -2  2
   1 0 0 0      1  1  1  1     -1  1  1  1      2 -2 -2 -2
   1 1 1 0      3  1  1 -1     -1 -1 -1  1     -2 -2 -2  2
   1 1 0 1      3 -1  1  1     -1 -1  1 -1     -2  2 -2  2
   1 0 1 1      3  1 -1  1     -1  1 -1 -1     -2 -2  2 -2
   0 1 1 1      3 -1 -1 -1      1 -1 -1 -1     -2  2  2  2

These sequences, like all true bent sequences, are not balanced, and the zeroth element of the {0,1} FWT is the number of 1's in the sequence.
Here are some bent sequences of length 16:
   bent {0,1}    0 1 0 0  0 1 0 0  1 1 0 1  0 0 1 0
   FWT           6,-2,2,-2,2,-2,2,2,-2,-2,2,-2,-2,2,-2,-2
   bent {1,-1}   1 -1 1 1  1 -1 1 1  -1 -1 1 -1  1 1 -1 1
   FWT           4,4,-4,4,-4,4,-4,-4,4,4,-4,4,4,-4,4,4

   bent {0,1}    0 0 1 0  0 1 0 0  1 0 0 0  1 1 1 0
   FWT           6,2,2,-2,-2,2,-2,2,-2,-2,-2,-2,2,2,-2,-2
   bent {1,-1}   1 1 -1 1  1 -1 1 1  -1 1 1 1  -1 -1 -1 1
   FWT           4,-4,-4,4,4,-4,4,-4,4,4,4,4,-4,-4,4,4
Bent sequences are said to have the highest possible uniform nonlinearity. But, to put this in perspective, recall that we expect a random sequence of 16 bits to have 8 bits different from any particular sequence, linear or otherwise. That is also the maximum possible nonlinearity, and here we actually get a nonlinearity of 6.
There are various more or less complex constructions for these sequences. In most cryptographic uses, bent sequences are modified slightly to achieve balance.
Bernoulli trials have a Binomial distribution.
Possibly also the confusing counterpart to unary when describing the number of inputs or arguments to a function, but dyadic is almost certainly a better choice.
   P(k,n,p)  =  C(n,k) p^k (1-p)^(n-k)
This ideal distribution is produced by evaluating the probability function for all possible k, from 0 to n.
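For example, in Python (the n and p values are arbitrary):

   from math import comb

   def binomial(k, n, p):
       return comb(n, k) * p**k * (1 - p)**(n - k)

   n, p = 20, 0.5
   dist = [binomial(k, n, p) for k in range(n + 1)]
   assert abs(sum(dist) - 1.0) < 1e-12                  # a proper distribution sums to 1
   print(max(range(n + 1), key=lambda k: dist[k]))      # the most probable k is n*p = 10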
If we have an experiment which we think should produce a binomial distribution, and then repeatedly and systematically find very improbable test values, we may choose to reject the null hypothesis that the experimental distribution is in fact binomial.
Also see the binomial section of the Ciphers By Ritter / JavaScript computation pages.
Also see: birthday paradox.
The "paradox" is resolved by noting that we have a 1/365 chance
of success for each possible pairing of students, and there
are 253 possible pairs or
combinations of 23 things taken 2 at
a time. (To count the number of pairs, we can choose any of the 23
students as part of the pair, then any of the 22 remaining students
as the other part. But this counts each pair twice, so we have
We can compute the overall probability of success from the
probability of failure
We can relate the probability of finding at least one "double" of some birthday (Pd) to the expected number of doubles (Ed) as:
   Pd = 1 - e^-Ed ,  so

   Ed = -Ln( 1 - Pd ) ,  and

   365 * -Ln( 0.5 )  =  365 * 0.693  =  253 .
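These numbers are easy to check; a small Python sketch, using the usual independence approximation for the 253 pairings:

   from math import comb, log

   pairs = comb(23, 2)                      # 253 possible pairings of 23 students
   p_fail = (364 / 365) ** pairs            # no pair shares a birthday (approximation)
   print(pairs, 1 - p_fail)                 # about a 0.5 chance of at least one match

   print(365 * -log(0.5))                   # about 253 pairings needed for Pd = 0.5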
Also see: Estimating Population from Repetitions in Accumulated Random Samples, my "birthday" article.
A 64-bit block supports 2^64 or about 1.8 * 10^19 different block values.
It is not normally possible to block-cipher just a single bit or a single byte of a block. An arbitrary stream of data can always be partitioned into one or more fixed-size blocks, but it is likely that at least one block will not be completely filled. Using fixed-size blocks generally means that the associated system must support data expansion in enciphering, if only by one block. Handling even minimal data expansion may be difficult in some systems.
A block cipher is a transformation between plaintext block values and ciphertext block values, and is thus an emulated simple substitution on huge block-wide values. Within a particular block size, both plaintext and ciphertext have the same set of possible values, and when the ciphertext values have the same ordering as the plaintext, ciphering is obviously ineffective. So effective ciphering depends upon re-arranging the ciphertext values from the plaintext ordering, and this is a permutation of the plaintext values. A block cipher is keyed by constructing a particular permutation of ciphertext values.
In an ideal block cipher, changing even a single bit of the input block will change all bits of the ciphertext result, each with independent probability 0.5. This means that about half of the bits in the output will change for any different input block, even for differences of just one bit. This is overall diffusion and is present in a block cipher, but not in a stream cipher. Data diffusion is a simple consequence of the keyed invertible simple substitution nature of the ideal block cipher.
Improper diffusion of data throughout a block cipher can have serious strength implications. One of the functions of data diffusion is to hide the different effects of different internal components. If these effects are not in fact hidden, it may be possible to attack each component separately, and break the whole cipher fairly easily.
A large message can be ciphered by partitioning the plaintext into blocks of a size which can be ciphered. This essentially creates a stream meta-cipher which repeatedly uses the same block cipher transformation. Of course, it is also possible to re-key the block cipher for each and every block ciphered, but this is usually expensive in terms of computation and normally unnecessary.
A message of arbitrary size can always be partitioned into some number of whole blocks, with possibly some space remaining in the final block. Since partial blocks cannot be ciphered, some random padding can be introduced to fill out the last block, and this naturally expands the ciphertext. In this case it may also be necessary to introduce some sort of structure which will indicate the number of valid bytes in the last block.
Proposals for using a block cipher supposedly without data expansion may involve creating a tiny stream cipher for the last block. One scheme is to re-encipher the ciphertext of the preceding block, and use the result as the confusion sequence. Of course, the cipher designer still needs to address the situation of files which are so short that they have no preceding block. Because the one-block version is in fact a stream cipher, we must be very careful to never re-use a confusion sequence. But when we only have one block, there is no prior block to change as a result of the data. In this case, ciphering several very short files could expose those files quickly. Furthermore, it is dangerous to encipher a CRC value in such a block, because exclusive-OR enciphering is transparent to the field of mod 2 polynomials in which the CRC operates. Doing this could allow an Opponent to adjust the message CRC in a known way, and so change the message without being exposed by the authentication.
Another proposal for eliminating data expansion consists of ciphering blocks until the last short block, then re-positioning the ciphering window to end at the last of the data, thus re-ciphering part of the prior block. This is a form of chaining and establishes a sequentiality requirement which requires that the last block be deciphered before the next-to-the-last block. Or we can make enciphering inconvenient and deciphering easy, but one way will be a problem. And this approach cannot handle very short messages: its minimum size is one block. Yet any general-purpose ciphering routine will encounter short messages. Even worse, if we have a short message, we still need to somehow indicate the correct length of the message, and this must expand the message, as we saw before. Thus, overall, this seems a somewhat dubious technique.
On the other hand, it does show a way to chain blocks for authentication in a large-block cipher: We start out by enciphering the data in the first block. Then we position the next ciphering to start inside the ciphertext of the previous block. Of course this would mean that we would have to decipher the message in reverse order, but it would also propagate any ciphertext changes through the end of the message. So if we add an authentication field at the end of the message (a keyed value known on both ends), and that value is recovered upon deciphering (this will be the first block deciphered) we can authenticate the whole message. But we still need to handle the last block padding problem and possibly also the short message problem.
Ciphering raw plaintext data can be dangerous when the cipher has a small block size. Language plaintext has a strong, biased distribution of symbols and ciphering raw plaintext would effectively reduce the number of possible plaintext blocks. Worse, some plaintexts would be vastly more probable than others, and if some known plaintext were available, the most-frequent blocks might already be known. In this way, small blocks can be vulnerable to classic codebook attacks which build up the ciphertext equivalents for many of the plaintext phrases. This sort of attack confronts a particular block size, and for these attacks Triple-DES is no stronger than simple DES, because they both have the same block size.
The usual way of avoiding these problems is to randomize the plaintext block with an operating mode such as CBC. This can ensure that the plaintext data which is actually ciphered is evenly distributed across all possible block values. However, this also requires an IV which thus expands the ciphertext.
Another approach is to apply data compression to the plaintext before enciphering. If this is to be used instead of plaintext randomization, the designer must be very careful that the data compression does not contain regular features which could be exploited by The Opponents.
An alternate approach is to use blocks of sufficient size for them to be expected to have a substantial amount of uniqueness or "entropy." If we expect plaintext to have about one bit of entropy per byte of text, we might want a block size of at least 64 bytes before we stop worrying about an uneven distribution of plaintext blocks. This is now a practical block size.
Typically computed by using a fast Walsh-Hadamard transform on the Boolean-valued truth table of the function. This produces the unexpected distance to every possible affine Boolean function (of the given length). Scanning those results for the maximum value implies the minimum distance to some particular affine sequence.
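A small Python sketch of that measurement: a fast Walsh-Hadamard transform over the {1,-1} form of the truth table, with nonlinearity taken as half the table length minus half the largest magnitude; the example value is one of the length-16 bent sequences shown earlier:

   def fwt(v):                              # in-place butterflies; length must be a power of 2
       v = list(v)
       h = 1
       while h < len(v):
           for i in range(0, len(v), h * 2):
               for j in range(i, i + h):
                   v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
           h *= 2
       return v

   def nonlinearity(truth_table):           # truth_table is a list of 0/1 values
       w = fwt([1 - 2 * b for b in truth_table])
       return (len(truth_table) - max(abs(x) for x in w)) // 2

   print(nonlinearity([0,1,0,0, 0,1,0,0, 1,1,0,1, 0,0,1,0]))   # a bent sequence: 6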
Especially useful in S-box analysis, where the nonlinearity for the table is often taken to be the minimum of the nonlinearity values computed for each output bit.
Also see the Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
A cipher is "broken" when the information in a message can be extracted without the key, or when the key itself can be recovered. The strength of a cipher can be considered to be the minimum effort required for a break, by any possible attack. A break is particularly significant when the work involved need not be repeated on every message.
The use of the term "break" can be misleading when an impractical amount of work is required to achieve the break. This case might be better described a "theoretical" or "certificational" weakness.
Recognizing plaintext may or may not be easy. Even when the key length of a cipher is sufficient to prevent brute force attack, that key will be far too small to produce every possible plaintext from a given ciphertext (see perfect secrecy). Combined with the fact that language is redundant, this means that very few of the decipherings will be words in proper form. Of course, if the plaintext is not language, but is instead computer code, compressed text, or even ciphertext from another cipher, recognizing a correct deciphering can be difficult.
Brute force is the obvious way to attack a cipher, and the way any cipher can be attacked, so ciphers are designed to have a large enough keyspace to make this much too expensive to use in practice. Normally, the design strength of a cipher is based on the cost of a brute-force attack.
Capacitor
Typically, two conductive "plates" or metal foils separated by a thin insulator, such as air, paper, or ceramic. An electron charge on one plate attracts the opposite charge on the other plate, thus "storing" charge. A capacitor can be used to collect a small current over long time, and then release a high current for a short time, as used in a camera strobe or "flash."
In CBC mode the ciphertext value of the preceding block is exclusive-OR combined with the plaintext value for the current block. This has the effect of distributing the combined block values evenly among all possible block values, and so prevents codebook attacks.
On the other hand, ciphering the first block generally requires an IV or initial value to start the process. The IV necessarily expands the ciphertext, which may or may not be a problem. And the IV must be dynamically random-like so that statistics cannot be developed on the first block of each message sent under the same key.
In CBC mode, each random-like confusing value is the ciphertext from each previous block. Clearly this ciphertext is exposed to The Opponent, so there would seem to be little benefit associated with hiding the IV, which is just the first of these values. But if The Opponent knows the first sent plaintext, and can intercept and change the message IV, The Opponent can manipulate the first block of received plaintext. Because the IV does not represent a message enciphering, manipulating this value does not also change any previous block.
Accordingly, the IV may be sent enciphered or may be specifically authenticated in some way. Alternately, the complete body of the plaintext message may be authenticated, often by a CRC. The CRC remainder should be block ciphered, perhaps as part of the plaintext.
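A minimal Python sketch of the chaining structure; block_encrypt here is a hypothetical stand-in for any keyed block cipher on fixed-size blocks, and only the exclusive-OR chaining from the IV is the point:

   def xor_block(a, b):
       return bytes(x ^ y for x, y in zip(a, b))

   def cbc_encrypt(block_encrypt, key, iv, blocks):
       prev, out = iv, []
       for block in blocks:
           prev = block_encrypt(key, xor_block(block, prev))   # combine with prior ciphertext, then cipher
           out.append(prev)
       return out

   # Toy stand-in "cipher" for demonstration only -- NOT a real block cipher:
   toy = lambda key, block: bytes((x + key) % 256 for x in block)
   print(cbc_encrypt(toy, 7, b"\x00" * 8, [b"AAAAAAAA", b"AAAAAAAA"]))   # equal blocks, different ciphertexts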
CFB is closely related to OFB, and is intended to provide some of the characteristics of a stream cipher from a block cipher. CFB generally forms an autokey stream cipher. CFB is a way of using a block cipher to form a random number generator. The resulting pseudorandom confusion sequence can be combined with data as in the usual stream cipher.
CFB assumes a shift register of the block cipher block size. An IV or initial value first fills the register, and then is ciphered. Part of the result, often just a single byte, is used to cipher data, and the resulting ciphertext is also shifted into the register. The new register value is ciphered, producing another confusion value for use in stream ciphering.
One disadvantage of this, of course, is the need for a full block-wide ciphering operation, typically for each data byte ciphered. The advantage is the ability to cipher individual characters, instead of requiring accumulation into a block before processing.
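A minimal Python sketch of byte-wide CFB; block_encrypt is again a hypothetical stand-in for a keyed block cipher, and taking just the first byte of each ciphering is one common convention:

   def cfb_encrypt(block_encrypt, key, iv, data):
       register, out = bytearray(iv), bytearray()
       for byte in data:
           confusion = block_encrypt(key, bytes(register))[0]  # one byte of the ciphered register
           c = byte ^ confusion
           out.append(c)
           register = register[1:] + bytes([c])                # shift the ciphertext byte in
       return bytes(out)

   toy = lambda key, block: bytes((x + key) % 256 for x in block)   # stand-in only
   print(cfb_encrypt(toy, 7, b"\x00" * 8, b"attack at dawn"))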
Chaos
In physics, the "state" of an analog physical system cannot be fully measured, which always leaves some remaining uncertainty to be magnified on subsequent steps. And, in many cases, a physical system may be slightly affected by thermal noise and thus continue to accumulate new information into its "state."
In a computer, the state of the digital system is explicit and complete, and there is no uncertainty. No noise is accumulated. All operations are completely deterministic. This means that, in a computer, even a "chaotic" computation is completely predictable and repeatable.
In the usual case, many independent samples are counted by category or separated into value-range "bins." The reference distribution gives us the number of values to expect in each bin. Then we compute the X^2 test statistic related to the difference between the distributions:

   X^2 = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )
("SQR" is the squaring function, and we require that each expectation not be zero.) Then we use a tabulation of chi-square statistic values to look up the probability that a particular X2 value or lower (in the c.d.f.) would occur by random sampling if both distributions were the same. The statistic also depends upon the "degrees of freedom," which is almost always one less than the final number of bins. See the chi-square section of the Ciphers By Ritter / JavaScript computation pages.
The c.d.f. percentage for a particular chi-square value is the area of the statistic distribution to the left of the statistic value; this is the probability of obtaining that statistic value or less by random selection when testing two distributions which are exactly the same. Repeated trials which randomly sample two identical distributions should produce about the same number of X^2 values in each quarter of the distribution (0% to 25%, 25% to 50%, 50% to 75%, and 75% to 100%). So if we repeatedly find only very high percentage values, we can assume that we are probing different distributions. And even a single very high percentage value would be a matter of some interest.
Any statistic probability can be expressed either as the proportion of the area to the left of the statistic value (this is the "cumulative distribution function" or c.d.f.), or as the area to the right of the value (this is the "upper tail"). Using the upper tail representation for the X^2 distribution can make sense because the usual chi-squared test is a "one tail" test where the decision is always made on the upper tail. But the "upper tail" has an opposite "sense" to the c.d.f., where higher statistic values always produce higher percentage values. Personally, I find it helpful to describe all statistics by their c.d.f., thus avoiding the use of a wrong "polarity" when interpreting any particular statistic. While it is easy enough to convert from the c.d.f. to the complement or vice versa (just subtract from 1.0), we can base our arguments on either form, since the statistical implications are the same.
It is often unnecessary to use a statistical test if we just want to know whether a function is producing something like the expected distribution: We can look at the binned values and generally get a good idea about whether the distributions change in similar ways at similar places. A good rule-of-thumb is to expect chi-square totals similar to the number of bins, but distinctly different distributions often produce huge totals far beyond the values in any table, and computing an exact probability for such cases is simply irrelevant. On the other hand, it can be very useful to perform 20 to 40 independent experiments to look for a reasonable statistic distribution, rather than simply making a "yes / no" decision on the basis of what might turn out to be a rather unusual result.
Since we are accumulating discrete bin-counts, any fractional expectation will always differ from any actual count. For example, suppose we expect an even distribution, but have many bins and so only accumulate enough samples to observe about 1 count for every 2 bins. In this situation, the absolute best sample we could hope to see would be something like (0,1,0,1,0,1,...), which would represent an even, balanced distribution over the range. But even in this best possible case we would still be off by half a count in each and every bin, so the chi-square result would not properly characterize this best possible sequence. Accordingly, we need to accumulate enough samples so that the quantization which occurs in binning does not appreciably affect the accuracy of the result. Normally I try to expect at least 10 counts in each bin.
But when we have a reference distribution that trails off toward zero, inevitably there will be some bins with few counts. Taking more samples will just expand the range of bins, some of which will be lightly filled in any case. We can avoid quantization error by summing both the observations and expectations from multiple bins, until we get a reasonable expectation value (again, I like to see 10 counts or more). In this way, the "tails" of the distribution can be more properly (and legitimately) characterized.
A good cipher can transform secret information into a multitude of different intermediate forms, each of which represents the original information. Any of these intermediate forms or ciphertexts can be produced by ciphering the information under a particular key value. The intent is that the original information only be exposed by one of the many possible keyed interpretations of that ciphertext. Yet the correct interpretation is available merely by deciphering under the appropriate key.
A cipher appears to reduce the protection of secret information to enciphering under some key, and then keeping that key secret. This is a great reduction of effort and potential exposure, and is much like keeping your valuables in your house, and then locking the door when you leave. But there are also similar limitations and potential problems.
With a good cipher, the resulting ciphertext can be stored or transmitted otherwise exposed without also exposing the secret information hidden inside. This means that ciphertext can be stored in, or transmitted through, systems which have no secrecy protection. For transmitted information, this also means that the cipher itself must be distributed in multiple places, so in general the cipher cannot be assumed to be secret. With a good cipher, only the deciphering key need be kept secret.
We seek to hide distinctions of size, because operation is independent of size, and because size effects are usually straightforward. We thus classify serious block ciphers as keyed simple substitution, just like newspaper amusement ciphers, despite their obvious differences in strength and construction. This allows us to compare the results from an ideal tiny cipher to those from a large cipher construction; the grouping thus can provide benchmark characteristics for measuring large cipher constructions.
We could of course treat each cipher as an entity unto itself, or relate ciphers by their dates of discovery, the tree of developments which produced them, or by known strength. But each of these criteria is more or less limited to telling us "this cipher is what it is." We already know that. What we want to know is what other ciphers function in a similar way, and then whatever is known about those ciphers. In this way, every cipher need not be an island unto itself, but instead can be judged and compared in a related community of similar techniques.
Our primary distinction is between ciphers which handle all the data at once (block ciphers), and those which handle some, then some more, then some more (stream ciphers). We thus see the usual repeated use of a block cipher as a stream meta-cipher which has the block cipher as a component. It is also possible for a stream cipher to be re-keyed or re-originate frequently, and so appear to operate on "blocks." Such a cipher, however, would not have the overall diffusion we normally associate with a block cipher, and so might usefully be regarded as a stream meta-cipher with a stream cipher component.
The goal is not to give each cipher a label, but instead to seek insight. Each cipher in a particular general class carries with it the consequences of that class. And because these groupings ignore size, we are free to generalize from the small to the large and so predict effects which may be unnoticed in full-size ciphers.
(Note that this definition is somewhat broader than the now common understanding of a huge, and thus emulated, Simple Substitution. But there are ciphers which require blocked plaintext and which do not emulate Simple Substitution, and calling these something other than "block" ciphers negates the advantage of a taxonomy.)
Ciphertext expansion is the general situation: Stream ciphers need a message key, and block ciphers with a small block need some form of plaintext randomization, which generally needs an IV to protect the first block. Only block ciphers with a large size block generally can avoid ciphertext expansion, and then only if each block can be expected to hold sufficient uniqueness or "entropy" to prevent a codebook attack.
It is certainly true that in most situations of new construction a few extra bytes are not going to be a problem. However, in some situations, and especially when a cipher is to be installed into an existing system, the ability to encipher data without requiring additional storage can be a big advantage. Ciphering data without expansion supports the ciphering of data structures which have been defined and fixed by the rest of the system, provided only that one can place the cipher at the interface "between" two parts of the system. This is also especially efficient, as it avoids the process of acquiring a different, larger, amount of store for each ciphering. Such an installation also can apply to the entire system, and not require the re-engineering of all applications to support cryptography in each one.
In an analog system we might produce a known delay by slowly charging a capacitor and measuring the voltage across it continuously until the voltage reaches the desired level. A big problem with this is that the circuit becomes increasingly susceptible to noise at the end of the interval.
In a digital system we create a delay by simply counting clock cycles. Since all external operations are digital, noise effects are virtually eliminated, and we can easily create accurate delays which are as long as the count in any counter we can build.
Coding is a very basic part of modern computation and generally implies no secrecy or information hiding. Some codes are "secret codes," however, and then the transformation between the information and the coding is kept secret. Also see: cryptography and substitution.
The usual ciphertext-only approach depends upon the plaintext having strong statistical biases which make some values far more probable than others, and also more probable in the context of particular preceding known values. Such attacks can be defeated if the plaintext data are randomized and thus evenly and independently distributed among the possible values. (This may have been the motivation for the use of a random confusion sequence in a stream cipher.)
When a codebook attack is possible on a block cipher, the complexity of the attack is controlled by the size of the block (that is, the number of elements in the codebook) and not the strength of the cipher. This means that a codebook attack would be equally effective against either DES or Triple-DES.
One way a block cipher can avoid a codebook attack is by having a large block size which will contain an unsearchable amount of plaintext "uniqueness" or entropy. Another approach is to randomize the plaintext block, often by using an operating mode such as CBC.
    n
   ( )  =  C(n,k)  =  n! / (k! (n-k)!)
    k
See the combinations section of the Ciphers By Ritter / JavaScript computation pages. Also see permutation.
Consider a block cipher: For any given size block, there is some fixed number of possible messages. Since every enciphering must be reversible (deciphering must work), we have a 1:1 mapping between plaintext and ciphertext blocks. The set of all plaintext values and the set of all ciphertext values is the same set; particular values just have different meanings in each set.
Keying gives us no more ciphertext values, it only re-uses the values which are available. Thus, keying a block cipher consists of selecting a particular arrangement or permutation of the possible block values. Permutations are a combinatoric topic. Using combinatorics we can talk about the number of possible permutations or keys in a block cipher, or in cipher components like substitution tables.
Permutations can be thought of as the number of unique arrangements of a given length on a particular set. Other combinatoric concepts include binomials and combinations (the number of unique given-length subsets of a given set).
Reversible combiners are used to encipher plaintext into ciphertext in a stream cipher. The ciphertext is then deciphered into plaintext using a related inverse or extractor mechanism.
Irreversible or non-invertible combiners are often used to mix multiple RNG's into a single confusion sequence, also for use in stream cipher designs.
Also see balanced combiner, additive combiner and complete, and The Story of Combiner Correlation: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.
Also see: associative and distributive.
Completeness does not require that an input bit change an output bit for every input value (which would not make sense anyway, since every output bit must be changed at some point, and if they all had to change at every point, we would have all the output bits changing, instead of the desired half). The inverse of a complete function is not necessarily also complete.
As originally defined in Kam and Davida:
"For every possible key value, every output bit ci of the SP network depends upon all input bits p1,...,pn and not just a proper subset of the input bits." [p.748]Kam, J. and G. Davida. 1979. Structured Design of Substitution-Permutation Encryption Networks. IEEE Transactions on Computers. C-28(10): 747-753.
The most successful components are extremely general and can be used in many different ways. Even as a brick is independent of the infinite variety of brick buildings, a flip-flop is independent of the infinite variety of logic machines which use flip-flops.
The source of the ability to design and build a wide variety of different electronic logic machines is the ability to interconnect and use a few very basic but very general parts.
Electronic components include
Cryptographic system components include:
A logic machine with:
Also see: source code, object code and software.
In number theory we say that integer a (exactly) divides integer b (denoted a | b) when there is some integer k such that ka = b.
In number theory we say that integer a is congruent to integer b modulo m, denoted a = b (mod m), when m divides (a - b).
Used in the analysis of signal processing to develop the response of a processing system to a complicated real-valued input signal. The input signal is first separated into some number of discrete impulses. Then the system response to an impulse -- the output level at each unit time delay after the impulse -- is determined. Finally, the expected response is computed as the sum of the contributions from each input impulse, multiplied by the magnitude of each impulse. This is an approximation to the convolution integral with an infinite number of infinitesimal delays. Although originally accomplished graphically, the process is just polynomial multiplication.
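For example, a small Python sketch of discrete convolution done exactly as polynomial multiplication (the example signals are arbitrary):

   def convolve(signal, impulse_response):
       out = [0.0] * (len(signal) + len(impulse_response) - 1)
       for i, s in enumerate(signal):
           for j, h in enumerate(impulse_response):
               out[i + j] += s * h           # each input impulse scales a delayed response
       return out

   print(convolve([1, 2, 3], [1, 1]))        # [1.0, 3.0, 5.0, 3.0]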
It is apparently possible to compute the convolution of two sequences by taking the FFT of each, multiplying these results term-by-term, then taking the inverse FFT. While there is an analogous relationship in the FWT, in this case the "delays" between the sequences represent mod 2 distance differences, which may or may not be useful.
One way to evaluate the correlation of two real-valued sequences is to multiply them together term-by-term and sum all results. If we do this for all possible "delays" between the two sequences, we get a "vector" or 1-dimensional array of correlations which is a convolution. Then the maximum value represents the delay with the best correlation.
"The correlation coefficient associated with a pair of Boolean functions f(a) and g(a) is denoted by C(f,g) and is given by
C(f,g) = 2 * prob(f(a) = g(a)) - 1 ."
Daemen, J., R. Govaerts and J. Vanderwalle. 1994. Correlation Matrices. Fast Software Encryption. 276. Springer-Verlag.
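A small Python sketch of this definition, evaluated over full truth tables given as 0/1 lists; the example functions are taken from the affine table earlier in this glossary:

   def correlation(f, g):
       agree = sum(1 for a, b in zip(f, g) if a == b)
       return 2 * agree / len(f) - 1

   x0    = [0, 1, 0, 1, 0, 1, 0, 1]          # the affine function x0
   x1x0  = [0, 1, 1, 0, 0, 1, 1, 0]          # x1+x0
   print(correlation(x0, x0), correlation(x0, x1x0))   # 1.0 and 0.0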
A CRC is essentially a fast remainder operation over a huge numeric value which is the data. (For best speed, the actual computation occurs as mod 2 polynomial operations.) The CRC result is an excellent (but linear) hash value corresponding to the data.
No CRC has any appreciable strength, but some applications -- even in cryptography -- need no strength:
Because there is no theory which guarantees strength for any conventional cipher, ciphers traditionally have been considered "strong" when they have been used for a long time with "nobody" knowing how to break them easily. Cryptanalysis seeks to improve this process by applying the known attack strategies to new ciphers, and by actively seeking new ones. It is normal to assume that at least known-plaintext is available; often, defined-plaintext is assumed. The result is typically some value for the amount of "work" which will achieve a "break" (even if that value is impractical); this is "the" strength of the cipher.
But while cryptanalysis can prove "weakness" for a given level of effort, cryptanalysis cannot prove that there is no simpler attack:
Lack of proof of weakness is not proof of strength.
Indeed, when ciphers are used for real, The Opponents can hardly be expected to advertise a successful break, but will instead work hard to reassure users that their ciphers are still secure. The fact that apparently "nobody" knows how to break a cipher is somewhat less reassuring from this viewpoint. In this context, using a wide variety of different ciphers can make good sense: This reduces the value of the information protected by any particular cipher, which thus reduces the rewards from even a successful attack. Having numerous ciphers also requires The Opponents to field far greater resources to identify, analyze, and automate breaking (when possible) of each different cipher.
Many academic attacks are essentially theoretical, involving huge amounts of data and computation. But even when a direct technical attack is practical, that may be the most difficult, expensive and time-consuming way to obtain the desired information. Other methods include making a paper copy, stealing a copy, bribery, coercion, and electromagnetic monitoring. No cipher can keep secret something which has been otherwise revealed. Information security thus involves far more than just cryptography, and even a cryptographic system is more than just a cipher. Even finding that information has been revealed does not mean that a cipher has been broken.
At one time it was reasonable to say: "Any cipher a man can make, another man can break." However, with the advent of serious computer-based cryptography, that statement is no longer valid, provided that every detail is properly handled. This, of course, often turns out to not be the case.
Cryptography includes at least:
Modern cryptography generally depends upon translating a message into one of an astronomical number of different intermediate representations, or ciphertexts, as selected by a key. If all possible intermediate representations have similar appearance, it may be necessary to try all possible keys to find the one which deciphers the message. By creating mechanisms with an astronomical number of keys, we can make this approach impractical.
Cryptography may also be seen as a zero-sum game, where a cryptographer competes against a cryptanalyst. We might call this the cryptography war.
Note that the successful cryptanalyst must keep good attacks secret, or the opposing cryptographer will just produce a stronger cipher. This means that the cryptographer is in the odd position of never knowing whether his or her best cipher designs are successful, or which side is winning.
Cryptographers are often scientists who are trained to ignore unsubstantiated claims. But there will be no substantiation when a cipher system is attacked and broken for real, yet continued use will endanger all messages so "protected." Thus, it is a very reasonable policy to not adopt a widely-used cipher, and to change ciphers periodically.
dB
Most electronic devices require DC -- at least internally -- for proper operation, so a substantial part of modern design is the "power supply" which converts 120 VAC wall power into 12 VDC, 5 VDC and/or 3 VDC as needed by the circuit and active devices.
Contrary to naive expectations, a complex system almost never performs as desired when first realized. Both hardware and software system design environments generally deal with systems which are not working. (When a system really works, the design and development process is generally over.) Debugging involves identifying problems, analyzing the source of those problems, then changing the construction to fix the problem. (Hopefully, the fix will not itself create new problems.) This form of interactive analysis can be especially difficult because the realized design may not actually be what is described in the schematics, flow-charts, or other working documents: To some extent the real system is unknown.
When a system has many problems, the problems tend to interact, which can make the identification of a particular cause very difficult. This can be managed by "shrinking" the system: first by partitioning the design into components and testing those components, and then by temporarily disabling or removing sections so as to identify the section in which the problem lies. Eventually, with enough testing, partitioning and analysis, the source of any problem can be identified. Some "problems," however, turn out to be the unexpected implications of a complex design and are sometimes accepted as "features" rather than the alternative of a complete design overhaul.
A defined plaintext attack can be a problem for systems which allow unauthorized users to present arbitrary messages for ciphering. Such attack can be made difficult by allowing only authorized users to encipher data, by allowing only a few messages to be enciphered between key changes, by changing keys frequently, and by enciphering each message in a different random message key.
If we choose two values completely independently, we have a DF of 2. But if we must choose two values such that the second is twice the first, we can choose only the first value independently. Imposing a relationship on one of the sampled values means that we will have a DF of one less than the number of samples, even though we may end up with apparently similar sample values.
In a typical goodness of fit test such as chi-square, the reference distribution (the expected counts) is normalized to give the same number of counts as the experiment. This is a constraint, so if we have N bins, we will have a DF of N - 1.
When voltages or currents are measured, power changes as the square of these values, so a decibel is twenty times the base-10 logarithm of the ratio of two voltages or currents.
Also see Differential Cryptanalysis: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.
Normally we speak of data diffusion, in which changing a tiny part of the plaintext data may affect the whole ciphertext. But we can also speak of key diffusion, in which changing even a tiny part of the key should change each bit in the ciphertext with probability 0.5.
Perhaps the best diffusing component is substitution, but this diffuses only within a single substituted value. Substitution-permutation ciphers get around this by moving the bits of each substituted element to other elements, substituting again, and repeating. But this only provides guaranteed diffusion if particular substitution tables are constructed. Another alternative is to use some sort of Balanced Block Mixing which has an inherently guaranteed diffusion, or a Variable Size Block Cipher construction. Also see Overall Diffusion.
If we have a discrete distribution, with a finite number of possible result values, we can speak of "frequency" and "probability" distributions: The "frequency distribution" is the expected number of occurrences for each possible value, in a particular sample size. The "probability distribution" is the probability of getting each value, normalized so that the probabilities of all possible values sum to 1.0.
Here is a graph of a typical "discrete probability distribution" or "discrete probability density function," which displays the probability of getting a particular statistic value for the case "nothing unusual found":
 0.1|      ***
    |     *   *          Y = Probability of X
  Y |    **   **         y = P(x)
    |  ****     ****
 0.0 ------------------- X
Unfortunately, it is not really possible to think in the same way about continuous distributions: Since continuous distributions have an infinite number of possible values, the probability of getting any particular value is zero. For continuous distributions, we instead talk about the probability of getting a value in some subrange of the overall distribution. We are often concerned with the probability of getting a particular value or below, or the probability of a particular value or above.
Here is a graph of the related "cumulative probability distribution" or "cumulative distribution function" (c.d.f.) for the case "nothing unusual found":
 1.0|           ******
    |         **          Y = Probability (0.0 to 1.0) of finding
  Y |        *            a value which is x or less
    |      **
 0.0 -******------------ X
The c.d.f. is just the sum of all probabilities for a given value or less. This is the usual sort of function used to interpret a statistic: Given some result, we can look up the probability of a lesser value (normally called p) or a greater value (its complement, 1 - p).
Usually, a test statistic is designed so that extreme values are not likely to occur by chance in the case "nothing unusual found" which is the null hypothesis. So if we do find extreme values, we have a strong argument that the results were not due simply to random sampling or other random effects, and may choose to reject the null hypothesis and thus accept the alternative hypothesis.
Common discrete distributions include the binomial distribution, and the Poisson distribution.
Also see: associative and commutative.
This is a particular danger in cryptosystems, since most ciphers are built from less-complex parts. Indeed, a major role of cryptographic design is to combine small component parts into a larger complex system which cannot be split apart.
One way to have a dynamic key in a block cipher is to include the key value along with the plaintext data. But this is normally practical only with blocks of huge size, or variable size blocks.
Another way to have a dynamic key in a block cipher is to add a confusion layer which mixes the key value with the block. For example, exclusive-OR could be used to mix a 64-bit key with a 64-bit data block.
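As a minimal sketch of that idea (illustrative only, not any particular cipher's construction), a 64-bit key value can be mixed into a 64-bit block with exclusive-OR, and the same operation removes it again:

    # Exclusive-OR key confusion layer; XOR is its own inverse.
    def xor_layer(block64, key64):
        return block64 ^ key64

    mixed = xor_layer(0x0123456789ABCDEF, 0xFEDCBA9876543210)
    print(hex(xor_layer(mixed, 0xFEDCBA9876543210)))   # recovers the original block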
Dynamic Substitution is the use of an invertible substitution table in which the arrangement of the entries changes dynamically during operation. This is particularly useful as a strong replacement for the strengthless exclusive-OR combiner in stream ciphers.
The arrangement of a keyed table starts out unknown to an Opponent. From the Opponent's point of view, each table entry could be any possible value with uniform probability. But after the first value is mapped through that table, the used transformation (table entry) is at least potentially exposed, and can no longer be considered completely unknown. Dynamic Substitution acts to make the used transformation again completely unknown and unbiased, by allowing it to take on any possible mapping. As a first approximation, the amount of information leaked about table contents is replaced by information used to re-define each used entry.
In the usual case, an invertible substitution table is keyed by shuffling under the control of a random number generator. One combiner input value is used to select a value from within that table to be the result or output. The other combiner input value is used simply to select an entry, and then the values at the two selected entries are exchanged. So as soon as a plaintext mapping is used, it is immediately reset to any possibility, and the more often any plaintext value occurs, the more often that transformation changes.
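Here is a minimal Python sketch of that combining step, under the arrangement just described; the 256-entry table, the keying through random.Random, and the example values are all illustrative assumptions:

    import random

    class DynSub:
        def __init__(self, key):
            self.table = list(range(256))
            random.Random(key).shuffle(self.table)   # key the table by shuffling

        def combine(self, data, confusion):
            out = self.table[data]                   # one input selects the output entry
            j = confusion                            # the other input selects another entry
            self.table[data], self.table[j] = self.table[j], self.table[data]
            return out                               # the used entry has now been exchanged

    ds = DynSub(b"example key")
    print([ds.combine(65, c) for c in (17, 200, 3)])   # data value 65 gives three different results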
Also see Balanced Block Mixing, and Variable Size Block Cipher.
Since each block -- plaintext or ciphertext -- contains exactly the same number of 1's and 0's, every possible plaintext block is just some permutation of any possible ciphertext block. And since any possible plaintext block can be produced from any ciphertext block in a vast plethora of different ways, the keying sequence is hidden even from known plaintext. And defined plaintext is easily defeated with the usual message key. To the extent that every possible plaintext block can be produced, the cipher approaches perfect secrecy.
See the article Transposition Cipher with Pseudo-Random Shuffling: The Dynamic Transposition Combiner.
ECB
ECB is the naive method of applying a block cipher, in that the plaintext is simply partitioned into appropriate size blocks, and each block is enciphered separately and independently. When we have a small block size, ECB is generally unwise, because language text has biased statistics which will result in some block values being re-used frequently, and this repetition will show up in the raw ciphertext. This is the basis for a successful codebook attack.
On the other hand, if we have a large block, we may expect it to contain enough uniqueness or "entropy" (at least, say, 64 bits) to prevent a codebook attack. In that case, ECB mode has the advantage of supporting independent ciphering of each block. This, in turn, supports various things, like the use of multiple ciphering hardware operating in parallel for higher speeds.
As another example, modern packet-switching network technologies often deliver raw packets out of order. The packets will be re-ordered eventually, but having out-of-sequence packets can be a problem for low-level ciphering if the blocks are not ciphered independently.
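A toy Python sketch of the ECB partitioning follows; the stand-in block transformation E is only a keyed permutation chosen for illustration, not a real cipher, and padding is ignored:

    BLOCK = 8                                   # 64-bit blocks

    def E(block, key):
        x = int.from_bytes(block, "big")
        x = ((x ^ key) * 6364136223846793005) % 2 ** 64   # any keyed permutation
        return x.to_bytes(BLOCK, "big")

    def ecb_encipher(plaintext, key):
        assert len(plaintext) % BLOCK == 0      # padding not handled in this sketch
        return b"".join(E(plaintext[i:i + BLOCK], key)
                        for i in range(0, len(plaintext), BLOCK))

    ct = ecb_encipher(b"HELLO...HELLO...", 99)
    print(ct[:8] == ct[8:])     # True: repeated plaintext blocks repeat in the ciphertext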
It is important to distinguish between a true electromagnetic field, and the simpler and range-limited electric and magnetic fields produced by an electrical clock, motor, or power lines. It is also important to distinguish between the light-like expanding or "radiating" property of an electromagnetic field, and the damaging ionizing radiation produced by a radioactive source.
As far as we know -- and a great many experiments have been conducted on this -- electromagnetic waves are not life-threatening (unless they transfer enough power to dangerously heat the water in our cells). The belief that electromagnetic fields are not dangerous is also reasonable, since light itself is an electromagnetic wave, and life on Earth developed in the context of the electromagnetic field from the Sun. Indeed, plants actually use that field to their and our great benefit.
   H(X) = -SUM( p[i] log2 p[i] )
Although entropy is sometimes taken as a measure of randomness, calculating entropy requires a knowledge of the probabilities of each value, which we often can obtain only by sampling. This means that we do not really know the "true" probabilities, but only those we see in our samples. And the "true" probabilities may change through time.
By itself, calculated entropy also does not detect any underlying order that might exist between value probabilities, such as a correlation, or a linear relationship, or any other aspect of cryptographically-weak randomness. The "true entropy" of a random number generator is just the number of bits in the state of that generator, as opposed to an entropy computation on the sequence it produces. So a high entropy value does not imply that a really-random source really is random, or indeed that the value has any relationship to the amount of cryptographic randomness present.
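A minimal Python sketch of the computation, using sampled frequencies as the probability estimates (with exactly the caveat above that these are not the "true" probabilities):

    from collections import Counter
    from math import log2

    def sample_entropy(data):
        counts = Counter(data)
        n = len(data)
        # H(X) = -SUM( p[i] log2 p[i] ), with p[i] estimated from the sample
        return -sum((c / n) * log2(c / n) for c in counts.values())

    print(sample_entropy(b"AAAABBBB"))   # 1.0 bit per symbol
    print(sample_entropy(b"AAAAAAAB"))   # about 0.544 bits per symbol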
Here we have all three possible sequences from a non-ergodic process: across we have the average of symbols through time (the "temporal average"), and down we have the average of symbols in a particular position over all possible sequences (the "ensemble average"):
   A B A B A B ...    p(A) = 0.5, p(B) = 0.5, p(E) = 0.0
   B A B A B A ...    p(A) = 0.5, p(B) = 0.5, p(E) = 0.0
   E E E E E E ...    p(A) = 0.0, p(B) = 0.0, p(E) = 1.0
   ^ ^ ^ ^ ^ ^
   +-+-+-+-+-+----    p(A) = 0.3, p(B) = 0.3, p(E) = 0.3

   (From: Pierce, J. 1961. Symbols, Signals and Noise. Ch. 3)

When a process is non-ergodic, the measurements we take over time from one or a few sequences may not represent all the sequences which may be encountered.
Factorial
See the factorials section of the Ciphers By Ritter / JavaScript computation pages.
If two Boolean functions are not correlated, we expect them to agree half the time, which we might call the "expected distance." When two Boolean functions are correlated, they will have a distance greater or less than the expected distance, and we might call this difference the unexpected distance or UD. The UD can be positive or negative, representing distance to a particular affine function or its complement.
It is easy to do a Fast Walsh Transform by hand. (Well, I say "easy," then always struggle when I actually do it.) Let's do the FWT of function f: (1 0 0 1 1 1 0 0): First note that f has a binary power length, as required. Next, each pair of elements is modified by an "in-place butterfly"; that is, the values in each pair produce two results which replace the original pair, wherever they were originally located. The left result will be the two values added; the right will be the first less the second. That is,
(a',b') = (a+b, a-b)
So for the values (1,0), we get (1+0, 1-0) which is just (1,1). We start out pairing adjacent elements, then every other element, then every 4th element, and so on until the correct pairing is impossible, as shown:
   original   1   0   0   1   1   1   0   0
              ^---^   ^---^   ^---^   ^---^
   first      1   1   1  -1   2   0   0   0
              ^-------^       ^-------^
                  ^-------^       ^-------^
   second     2   0   0   2   2   0   2   0
              ^---------------^
                  ^---------------^
                      ^---------------^
                          ^---------------^
   final      4   0   2   2   0   0  -2   2
The result is the "unexpected distance" to each affine Boolean function. The higher the absolute value, the greater the "linearity"; if we want the nonlinearity, we must subtract the absolute value of each unexpected distance from the expected value, which is half the number of bits in the function. Note that the range of possible values increases by a factor of 2 (in both positive and negative directions) in each sublayer mixing; this is information expansion, which we often try to avoid in cryptography.
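The same computation, as a small Python sketch (an assumed helper, not code from any particular source); it reproduces the hand result above:

    def fwt(f):
        # In-place fast Walsh transform; the length must be a power of 2.
        a = list(f)
        n = len(a)
        span = 1
        while span < n:
            for start in range(0, n, span * 2):
                for i in range(start, start + span):
                    x, y = a[i], a[i + span]
                    a[i], a[i + span] = x + y, x - y   # the add/subtract butterfly
            span *= 2
        return a

    print(fwt([1, 0, 0, 1, 1, 1, 0, 0]))   # [4, 0, 2, 2, 0, 0, -2, 2]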
Also see: Walsh-Hadamard Transforms: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and the Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.
The FWT provides a strong mathematical basis for block cipher mixing such that all input values will have an equal chance to affect all output values. Cryptographic mixing then occurs in butterfly operations based on balanced block mixing structures which replace the simple add / subtract butterfly in the FWT and confine the value ranges so information expansion does not occur. A related concept is the well-known FFT, which can use exactly the same mixing patterns as the FWT.
Normally, in a Feistel construction, the input block is split into two parts, one of which drives a transformation whose result is exclusive-OR combined into the other block. Then the "other block" value feeds the same transformation, whose result is exclusive-OR combined into the first block. This constitutes 2 of perhaps 16 "rounds."
    L          R
    |          |
    |--> F --> +     round 1
    |          |
    + <-- F <--|     round 2
    |          |
    v          v
    L'         R'
One advantage of the Feistel construction is that the transformation does not need to be invertible. To reverse any particular layer, it is only necessary to apply the same transformation again, which will undo the changes of the original exclusive-OR.
A disadvantage of the Feistel construction is that diffusion depends upon the internal transformation. There is no guarantee of overall diffusion, and the number of rounds required is often found by experiment.
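A minimal two-round Feistel sketch in Python; the round function F and the constants here are stand-ins chosen only for illustration (F need not be invertible, which is the point):

    def F(half, subkey):
        # Any keyed mixing will do; it does not need to be invertible.
        return (half * 0x9E3779B1 + subkey) & 0xFFFFFFFF

    def feistel_encipher(L, R, subkeys):
        for k in subkeys:
            L, R = R, L ^ F(R, k)      # XOR F of one half into the other, then swap
        return L, R

    def feistel_decipher(L, R, subkeys):
        for k in reversed(subkeys):
            L, R = R ^ F(L, k), L      # applying the same F again undoes each round
        return L, R

    L, R = feistel_encipher(0x01234567, 0x89ABCDEF, [11, 22])
    print([hex(v) for v in feistel_decipher(L, R, [11, 22])])   # the original halves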
Also see the Fenced DES section of the Ciphers By Ritter page, and A Keyed Shuffling System for Block Cipher Cryptography.
Fencing layers are also used in other types of cipher.
While exceedingly valuable, the FFT tends to run into practical problems in use which can require a deep understanding of the process. For example, the transform assumes that the waveform is "stationary" and thus repetitive and continuous, which is rarely the case. As another example, sampling a continuous wave can create spurious "frequency" values related to the sampling and not the wave itself. Also the range of possible values increases by a factor of 2 (in both positive and negative directions) in every sublayer mixing; this is information expansion, which we often try to avoid in cryptography.
The FFT provides a strong mathematical basis for block cipher mixing such that all input values will have an equal chance to affect all output values. Cryptographic mixing then occurs in butterfly operations based on balanced block mixing structures which replace the simple add / subtract butterfly in the FFT and confine the value ranges so information expansion does not occur. A related concept is the fast Walsh-Hadamard transform (FWT), which can use exactly the same mixing patterns as the FFT.
In general, a field supports the four basic operations (addition, subtraction, multiplication and division), and satisfies the normal rules of arithmetic. An operation on any two elements in a field produces a result which is also an element in the field.
Examples of fields include rings of integers modulo some prime. Here are multiplication tables under mod 2, mod 3 and mod 4:
      0 1        0 1 2        0 1 2 3
   0  0 0     0  0 0 0     0  0 0 0 0
   1  0 1     1  0 1 2     1  0 1 2 3
              2  0 2 1     2  0 2 0 2
                           3  0 3 2 1

In a field, each element must have an inverse, and the product of an element and its inverse is 1. This means that every non-zero row and column of the multiplication table for a field must contain a 1. Since row 2 of the mod 4 table does not contain a 1, the set of integers mod 4 is not a field.
The order of a field is the number of elements in that field. The integers mod p form a finite field of order p. Similarly, mod 2 polynomials will form a field with respect to an irreducible polynomial, and will have order 2^n, which is a very useful size.
   f(x) = A[0] + SUM ( A[n] cos nx + B[n] sin nx )

Alternately, over the interval [a, a+2c]:
   f(x) = a[0] + SUM ( a[n] cos(n PI x/c) + b[n] sin(n PI x/c) )

   a[n] = 1/c INTEGRAL[a,a+2c]( f(x) cos(n PI x/c) dx )
   b[n] = 1/c INTEGRAL[a,a+2c]( f(x) sin(n PI x/c) dx )
The use of sine and cosine functions is particularly interesting, since each term represents a single frequency oscillation. So to the extent that we can represent an amplitude waveform as a series of sine and cosine functions, we thus describe the frequency spectrum associated with that waveform. This frequency spectrum describes the frequencies which must be handled by a circuit to reproduce the original waveform. This illuminating computation is called a Fourier transform.
In a cryptographic context, one of the interesting parts of the Fourier transform is that it represents a thorough mixing of each input value to every output value.
Gain
Typically we have mod 2 polynomials with results reduced "modulo" an irreducible "generator" polynomial g of degree n. This is analogous to creating a field from the integers modulo some prime p.
For example, consider GF(2^4) using the generator polynomial x^4 + x + 1, or 10011, which is a degree-4 irreducible. First we multiply two elements as usual:
            1 0 1 1
          * 1 1 0 0
          ----------
                  0
                0
        1 0 1 1
      1 0 1 1
      ---------------
      1 1 1 0 1 0 0

Then we "reduce" the result modulo the generator polynomial:
                          1 1 1
                  ----------------
      1 0 0 1 1 ) 1 1 1 0 1 0 0
                  1 0 0 1 1
                  ---------
                    1 1 1 0 0
                    1 0 0 1 1
                    ---------
                      1 1 1 1 0
                      1 0 0 1 1
                      ---------
                        1 1 0 1
                        =========
So, if I did the arithmetic right, the result is the remainder, 1101. I refer to this as arithmetic "mod 2, mod p".
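The same "mod 2, mod p" arithmetic as a small Python sketch, where bit i of an integer holds the coefficient of x^i; the helper names are just illustrative:

    def polymul2(a, b):
        # Mod 2 polynomial multiply: shift and XOR (mod 2 addition is XOR).
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            b >>= 1
        return r

    def polymod2(a, g):
        # Reduce a modulo the generator polynomial g.
        gdeg = g.bit_length() - 1
        while a and a.bit_length() - 1 >= gdeg:
            a ^= g << (a.bit_length() - 1 - gdeg)   # cancel the leading term
        return a

    # The example above: 1011 * 1100 reduced by 10011 (x^4 + x + 1).
    print(bin(polymod2(polymul2(0b1011, 0b1100), 0b10011)))   # 0b1101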
An irreducible is sufficient to form a finite field. However, some special irreducibles are also primitive, and these create "maximal length" sequences in LFSR's.
Goodness-of-fit tests can at best tell us whether one distribution is or is not the same as the other, and they say even that only with some probability. It is important to be very careful about experiment design, so that, almost always, "nothing unusual found" is the goal we seek. When we can match distributions, we are obviously able to state exactly what the experimental distribution should be and is. But there are many ways in which distributions can differ, and simply finding a difference is not evidence of a specific effect. (See null hypothesis.)
A group is basically a mapping from two elements in the group, through the group operation m, into the same group:

   m: G x G -> G
Hamming Distance
A hash of data will produce a particular hash value, which then can be included in the message before it is sent (or stored). When the data are received (or read) and the hash value computed, this should match the included hash value. So if the hash is different, something has changed, and the usual solution is to request the data be sent again. But the hash value is typically much smaller than the data, so there must be "many" different data sets which will produce that same value. This means that "error detection" inherently cannot detect all possible errors, and this is quite independent of any "linearity" in the hash computation.
An excellent example of a hash function is a CRC operation. CRC is a linear function without cryptographic strength, but does have a strong mathematical basis which is lacking in ad hoc methods. Strength is not needed when keys are processed into the state used in a random number generator, because if either the key or the state becomes known, the keyed cipher has been broken.
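As a concrete (and deliberately non-cryptographic) example, here is a tiny Python sketch using the standard-library CRC-32 to detect an accidental change; the message bytes are made up:

    import zlib

    data = b"some message data"
    stored = zlib.crc32(data)              # hash value sent or stored with the data

    received = b"some message datA"        # one character changed in transit
    print(zlib.crc32(received) == stored)  # False: the change is detected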
In contrast, a cryptographic hash function must be "strong" in the sense that it must be "computationally infeasible" to find two input values which produce the same hash result. In general, this means that the hash result should be 128 bits or more in size.
Sometimes a cryptographic hash function is described as being "collision free," which is a misnomer. A collision occurs when two different texts produce exactly the same hash result. Given enough texts, collisions will of course occur, precisely because any fixed-size result has only so many possible code values. The intent is that collisions be hard to find and particular hash values impossible to create at will.
Each hex value represents exactly four bits, which can be particularly convenient. Also see: binary, octal, and decimal.
A form of homophonic substitution is available in a large block cipher, where a homophonic selection field is enciphered along with the plaintext. Any of the possible values for that field naturally will produce a unique ciphertext. After deciphering any of those ciphertexts, the homophonic selection field could be deleted, and the exact same plaintext recovered. Note that the ability to produce a multitude of different encipherings for exactly the same data is related to the concept of a key.
IDEA
The disturbing aspect of the IDEA design is the extensive use of almost linear operations, and no nonlinear tables at all. While technically nonlinear, the internal operations seem like they might well be linear enough to be attacked.
There are various examples:
Also see: perfect secrecy. From Claude Shannon.
Typically a coil or multiple turns of conductor wound on a magnetic or ferrous core. Current in the conductor creates a magnetic field, thus "storing" energy. When power is removed, the magnetic field collapses to maintain the current flow; this can produce high voltages, as in automobile spark coils.
In some realizations, an intermediate block might be wired connections between layer hardware. In the context of a general purpose computer, an intermediate block might represent the movement of data between operations, or perhaps transient storage in the original block.
   +----------+          +----------+
   |          |   INTO   | Y        |
   |    X     |          |  +----+  |
   |          |    f     |  |f(X)|  |
   |          |   --->   |  +----+  |
   +----------+          +----------+
   g(f(x)) = x = f^-1(f(x)).

Only functions which are one-to-one can have an inverse.
A cipher which takes plaintext to ciphertext, and ciphertext back to plaintext, using the exact same operation.
A polynomial form of the ever-popular "Sieve of Eratosthenes" can be used to build a table of irreducibles through degree 16. That table can then be used to check any potential irreducible through degree 32. While slow, this can be a simple, clear validation of other techniques.
Also see primitive polynomial.
An IV often can be seen as a design-specific form of message key. Sometimes, iterative ciphering under different IV values can provide sufficient keying to perform the message key function.
Generally, an IV must accompany the ciphertext, and so always expands the ciphertext by the size of the IV.
Jitterizer
The name is taken from the use of an oscilloscope on digital circuits, where a signal which is not "in sync" is said to "jitter." Mechanisms designed to restore synchronization are called "synchronizers," so mechanisms designed to cause jitter can legitimately be called "jitterizers."
KB
In cryptography we have various kinds of keys, including a User Key (the key which a user actually remembers), which may be the same as an Alias Key (the key for an alias file which relates correspondent names with their individual keys). We may also have an Individual Key (the key actually used for a particular correspondent); a Message Key (normally a random value which differs for each and every message); a Running Key (the confusion sequence in a stream cipher, normally produced by a random number generator); and perhaps other forms of key as well.
In general, the value of a cryptographic key is used to initialize the state of a cryptographic mechanism.
Ideally, a key will be an equiprobable selection among a huge number of possibilities. This is the fundamental strength of cryptography, the "needle in a haystack" of false possibilities. But if a key is in some way not a random selection, but is instead biased, the most-likely keys can be examined first, thus reducing the complexity of the search and the effective keyspace.
In most cases, a key will exhibit diffusion across the message; that is, changing even one bit of a key should change every bit in the message with probability 0.5. A key with lesser diffusion may succumb to some sort of divide and conquer attack.
Although this problem is supposedly "solved" by the advent of the public key cipher, in fact, the necessary public key validation is almost as difficult as the original problem. Although public keys can be exposed, they must represent who they claim to represent, or a "spoofer" or man-in-the-middle can operate undetected.
Nor does it make sense to give each individual a separate secret key, when a related group of people would have access to the same files anyway. Typically, a particular group has the same secret key, which will of course be changed when any member leaves. Typically, each individual would have a secret key for each group with whom he or she associates.
Cryptography is based on the idea that if we have a huge number of keys, and select one at random, The Opponents generally must search about half of the possible keys to find the correct one; this is a brute force attack.
Although brute force is not the only possible attack, it is the one attack which will always exist. Therefore, the ability to resist a brute force attack is normally the "design strength" of a cipher. All other attacks should be made even more expensive. To make a brute force attack expensive, a cipher simply needs a keyspace large enough to resist such an attack. Of course, a brute force attack may use new computational technologies such as DNA or "molecular computation." Currently, 120 bits is large enough to prevent even unimaginably large uses of such new technology.
It is probably just as easy to build efficient ciphers which use huge keys as it is to build ciphers which use small keys, and the cost of storing huge keys is probably trivial. Thus, large keys may be useful when this leads to a better cipher design, perhaps with less key processing. Such keys, however, cannot be considered better at resisting a brute force attack than a 120-bit key, since 120 bits is already sufficient.
A substitution table is keyed by creating a particular ordering from each different key. This can be accomplished by shuffling the table under the control of a random number generator which is initialized from the key.
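A minimal Python sketch of that keying step; random.Random stands in for the cryptographic random number generator, purely for illustration:

    import random

    def keyed_table(key):
        table = list(range(256))        # the identity substitution
        rng = random.Random(key)        # RNG state initialized from the key
        rng.shuffle(table)              # shuffle into a key-specific ordering
        return table

    print(keyed_table(b"my key")[:8])   # a different ordering for each key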
A known plaintext attack is especially dangerous to the usual stream cipher which has an additive combiner, because the known plaintext can be "subtracted" from the ciphertext, thus completely exposing the confusion sequence. This is the sequence produced by the cryptographic random number generator, and can be used to attack that generator. This sort of attack can generally be prevented by using a Dynamic Substitution Combiner instead of the usual additive combiner.
It is surprisingly reasonable that The Opponent might well have some known plaintext (and related ciphertext): This might be the return address on a letter, a known report, or even some suspected words. Sometimes the cryptosystem will carry unauthorized messages like birthday greetings which are then exposed, due to their apparently innocuous content.
The "one-sided" statistics are:
   K+ = SQRT(N) * MAX( S(x[j]) - F(x[j]) )
      = SQRT(N) * MAX( ((j+1)/n) - F(x[j]) )

   K- = SQRT(N) * MAX( F(x[j]) - S(x[j]) )
      = SQRT(N) * MAX( F(x[j]) - (j/n) )
And the "two-sided" KS statistic is:
K = SQRT(N) * MAX( ABS( S(x[j]) - F(x[j]) ) ) = MAX( K+, K- )
It appears that the "one-sided" KS distribution is far easier to compute precisely, and may be preferred on that basis.
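A small Python sketch of those statistics for sorted samples x[0..n-1] against a hypothesized c.d.f. F (here the uniform c.d.f. on [0,1), with made-up sample values):

    from math import sqrt

    def ks_stats(samples, F):
        xs = sorted(samples)
        n = len(xs)
        kplus  = sqrt(n) * max((j + 1) / n - F(xs[j]) for j in range(n))
        kminus = sqrt(n) * max(F(xs[j]) - j / n       for j in range(n))
        return kplus, kminus, max(kplus, kminus)   # the two-sided K

    print(ks_stats([0.11, 0.42, 0.58, 0.89], lambda v: v))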
See the Kolmogorov-Smirnov section of the Ciphers By Ritter / JavaScript computation pages.
Latency
The effect of latency on throughput can often be reduced by pipelining or partitioning the main operation into many small sub-operations, and running each of those in parallel, or at the same time. As each operation finishes, that result is latched and saved temporarily, pending the availability of the next sub-operation hardware. The result is throughput limited only by the longest sub-operation instead of the overall operation.
   2 0 1 3
   1 3 0 2
   0 2 3 1
   3 1 2 0
Also see: Latin Squares: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.
A Latin square combiner is inherently balanced, because for any particular value of one input, the other input can produce any possible output value. A Latin square can be treated as an array of substitution tables, each of which are invertible, and so can be reversed for use in a suitable extractor. As usual with cryptographic combiners, if we know the output and a specific one of the inputs, we can extract the value of the other input.
For example, a tiny Latin square combiner might combine two 2-bit values each having the range zero to three (0..3). That Latin square would contain four different symbols (here 0, 1, 2, and 3), and thus be a square of order 4:
   2 0 1 3
   1 3 0 2
   0 2 3 1
   3 1 2 0
With this square we can combine the values 0 and 2 by selecting the top row (row 0) and the third column (column 2) and returning the value 1.
When extracting, we will know a specific one (but only one) of the two input values, and the result value. Suppose we know that row 0 was selected during combining, and that the output was 1: We can check for the value 1 in each column at row 0 and find column 2, but this involves searching through all columns. We can avoid this overhead by creating the row-inverse of the original Latin square (the inverse of each row), in the well-known way we would create the inverse of any invertible substitution. For example, in row 0 of the original square, selection 0 is the value 2, so, in the row-inverse square, selection 2 should be the value 0, and so on:
   1 2 0 3
   2 0 3 1
   0 3 1 2
   3 1 2 0
Then, knowing we are in row 0, the value 1 is used to select the second column, returning the unknown original value of 2.
A practical Latin square combiner might combine two bytes, and thus be a square of order 256, with 65,536 byte entries. In such a square, each 256-element column and each 256-element row would contain each of the values from 0 through 255 exactly once.
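Here is the order-4 example above as a small Python sketch, including the row-inverse construction used for extraction:

    square = [[2, 0, 1, 3],
              [1, 3, 0, 2],
              [0, 2, 3, 1],
              [3, 1, 2, 0]]

    # Row-inverse: if selection c in row r gives value v, then in the
    # inverse square, selection v in row r gives back c.
    inverse = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            inverse[r][square[r][c]] = c

    def combine(a, b):              # a selects the row, b the column
        return square[a][b]

    def extract(a, result):         # knowing one input and the result
        return inverse[a][result]

    print(combine(0, 2))            # 1, as in the example
    print(extract(0, 1))            # 2, recovering the other input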
Layers can be confusion layers (which simply change the block value), diffusion layers (which propagate changes across the block in at least one direction) or both. In some cases it is useful to do multiple operations as a single layer to avoid the need for internal temporary storage blocks.
There are various ways a relationship can be linear. One way is to consider a, x, and b as integers. Another is for them to be polynomial elements of GF(2^n). Yet another is to consider a to be an n by n matrix, with x and b as n-element vectors. There are probably various other ways as well.
Linearity also depends upon our point of view: For example, integer addition is linear in the integers, but when expressed as mod 2 operations, the exact same computation producing the exact same results is not considered linear.
In cryptography the issue may not be as much one of strict mathematical linearity as it is the "distance" between a function and some linear approximation (see Boolean function nonlinearity). True linear functions are used because they are easy and fast, but they are also exceedingly weak. Of course XOR is linear and trivial, yet is used all the time in arguably strong ciphers. But a design using linear components must have other nonlinear components to provide strength.
Also see: Linear Complexity: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.
In an n-element shift register (SR), if the last element is connected to the first element, a set of n values can circulate around the SR in n steps. But if the values in two of the elements are combined by exclusive-OR and that result connected to the first element, it is possible to get an almost-perfect maximal length sequence of 2^n - 1 steps. (The all-zeros state will produce another all-zeros state, and so the system will "lock up" in a degenerate cycle.) Because there are only 2^n different states of n binary values, every state value but one must occur exactly once, which is a statistically-satisfying result. Moreover, the values so produced are a perfect permutation of the "counting" numbers (1..2^n - 1).
               A Linear Feedback Shift Register

            +----+    +----+    +----+    +----+    +----+
   "a0" +-<-| a5 |<---| a4 |<-*-| a3 |<---| a2 |<---| a1 |<--+
        |   +----+    +----+  | +----+    +----+    +----+   |
        |                     v                               |
        +-------------------> (+) ----------------------------+

              1         0      1        0         0        1
In the figure we have a LFSR of degree 5, consisting of 5 storage elements a[5]..a[1] and the feedback computation a[0]=a[5]+a[3]. The stored values may be bits and the operation (+) addition mod 2. A clock edge will simultaneously shift all elements left, and load element a[1] with the feedback result as it was before the clock changed the register. Each SR element is just a time-delayed replica of the element before it, and here the element subscript conveniently corresponds to the delay. We can describe this logically:
   a[1][t+1] = a[5][t] + a[3][t];
   a[2][t+1] = a[1][t];
   a[3][t+1] = a[2][t];
   a[4][t+1] = a[3][t];
   a[5][t+1] = a[4][t];
Normally the time distinction is ignored, and we can write more generally, for some feedback polynomial C and state polynomial A of degree n:
           n
   a[0] = SUM  c[i]*a[i]
          i=1
The feedback polynomial shown here is 101001, a degree-5 poly running from c[5]..c[0] which is also irreducible. Since we have degree 5 which is a Mersenne prime, C is also primitive. So C produces a maximal length sequence of exactly 31 steps, provided only that A is not initialized as zero. Whenever C is irreducible, the reversed polynomial (here 100101) is also irreducible, and will also produce a maximal length sequence.
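A quick Python sketch of the degree-5 example: it steps the register a[1]..a[5] with the feedback a[5]+a[3] and confirms the 31-step period (the output tap and starting state are arbitrary choices):

    def lfsr_steps(init, steps):
        a = [0] + list(init)            # a[1]..a[5]; a[0] is only a placeholder
        out = []
        for _ in range(steps):
            fb = a[5] ^ a[3]            # the feedback computation, mod 2
            out.append(a[5])            # take output from the last element
            a[2:6] = a[1:5]             # shift left: each element gets its neighbor
            a[1] = fb                   # load the feedback result
        return out

    seq = lfsr_steps([0, 0, 0, 0, 1], 62)
    print(seq[:31] == seq[31:])         # True: the sequence repeats with period 31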
LFSR's are often used to generate the confusion sequence for stream ciphers, but this is very dangerous: LFSR's are inherently linear and thus weak. Knowledge of the feedback polynomial and only n element values (from known plaintext) is sufficient to run the sequence backward or forward. And knowledge of only 2n elements is sufficient to develop an unknown feedback polynomial. This means that LFSR's should not be used as stream ciphers without in some way isolating the sequence from analysis. Also see jitterizer and additive RNG.
Also devices which realize symbolic logic, such as Boolean logic, a logic of TRUE or FALSE values. Also see: subjective, objective, contextual, absolute, inductive reasoning, deductive reasoning, and fallacy.
   INPUT   NOT
     0      1
     1      0

   INPUT   AND   OR   XOR
    0 0     0    0     0
    0 1     0    1     1
    1 0     0    1     1
    1 1     1    1     0
These Boolean values can be stored as a bit, and can be associated with 0 or 1, FALSE or TRUE, NO or YES, etc.
M-Sequence
The Man-in-the-Middle (MITM) Attack is mainly applicable to public key systems, and focuses on the idea that many people will send their public keys on the network. The bad part of this is a lack of key authentication, because the Man-in-the-Middle can send a key just as easily, and pretend to be the other end. Then, if one uses that key, one has secure communication with The Opponent, instead of the far end. The MITM can receive a message, decipher it, read it, re-encipher it in the correct public key, and send it along. In this way, neither end need know anything is wrong, yet The Opponent is reading the mail.
Perhaps the worst part of this is that a successful MITM attack does not involve any attack on the actual ciphering. And this means that all proofs or confidence in the security of particular ciphering mechanisms are totally irrelevant to the security of a system which supports MITM attacks.
The way to avoid MITM attacks is to certify public keys, but this is inconvenient and time-consuming. Unless the cipher requires keys to be certified, this is rarely done. The worst part of this is that a successful MITM attack consumes few resources, need not "break" the cipher itself, and may provide just the kind of white-collar desktop intelligence a bureaucracy would love.
It is interesting to note that, regardless of how inconvenient it may be to share keys for a secret-key cipher, this is an inherent authentication which prevents MITM attacks.
   f: X -> Y ,

the mapping or function or transformation f takes any value in the domain X into some value in the range, which is contained in Y. For each element x in X, a mapping associates a single element y in Y. Element f(x) in Y is the image of element x in X.
If no two values of x in X produce the same result f(x), f is one-to-one or injective.
   f^-1(f(x)) = x.
There are some problems with a strictly mathematical approach to cryptography:
On the other hand, mathematics is irreplaceable in providing the tools to pick out and describe structure in apparently strong cipher designs. Mathematics can identify specific strength problems, and evaluate potential fixes. But there appears to be no real hope of evaluating strength with respect to every possible attack, even using mathematics.
Although mathematical cryptography has held out the promise of providing provable security, in over 50 years of work, no practical cipher has been generally accepted as having proven strength. See, for example: one time pad.
A mechanism can be seen as a process or an implementation for performing that process (such as electronic hardware, computer software, hybrids, or the like).
Although perhaps looked down upon by those of the mathematical cryptography persuasion, mechanistic cryptography certainly does use mathematics to design and predict performance. But rather than being restricted to arithmetic operations, mechanistic cryptography tends to use a wide variety of mechanically-simple components which may not have concise mathematical descriptions. Rather than simply implementing a system of math expressions, complexity is constructed from the various efficient components available to digital computation.
Mersenne Primes:

      2     107     9689    216091
      3     127     9941    756839
      5     521    11213    859433
      7     607    19937   1257787
     13    1279    21701   1398269
     17    2203    23209
     19    2281    44497
     31    3217    86243
     61    4253   110503
     89    4423   132049
Normally, the message key is a large random value which becomes the key for ciphering the data in a single message. Normally, the message key itself is enciphered under the User Key or other key for that link. The receiving end first deciphers the message key, then uses that value as the key for deciphering the message data. Alternately, the random value itself may be sent unenciphered, but is then enciphered or hashed (under a keyed cryptographic hash) to produce a value used as the data ciphering key.
The message key assures that the actual data is ciphered under a key which is an arbitrary selection from a huge number of possible keys; it therefore prevents weakness due to user key selection. A message key is used exactly once, no matter how many times the same message is enciphered, so at most, a successful attack on a message key exposes just one message. The internal construction of a random message key cannot be controlled by a user, and thus prevents all attacks based on repeated ciphering under a single key. To the extent that the message key value really is random and is never exposed on either end, the message key is much more easily protected than ordinary text (see ideal secrecy). In a sense, a message key is the higher-level concept of an IV, which is necessarily distinct for each particular design.
Below, we have a toy 32-bit-block Mixing Cipher. Plaintext at the top is transformed into ciphertext at the bottom. Each "S" is an 8-bit substitution table, and each table (and now each mixing operation also) is individually keyed.
Horizontal lines connect elements which are to be mixed together: Each *---* represents a single Balanced Block Mixing or BBM. Each BBM takes two elements, mixes them, and returns two mixed values. The mixed results then replace the original values in the selected positions just like the "butterfly" operations used in some FFT's.
          A 32-Bit Mixing Cipher

    |   |   |   |    <- Input Block (Plaintext)
    S   S   S   S    <- Fencing
    |   |   |   |
    *---*   *---*    <- 2 BBM Mixings
    |   |   |   |
    *-------*   |    <- 1 BBM Mixing
    |   *-------*    <- 1 BBM Mixing
    |   |   |   |
    S   S   S   S    <- Fencing
    |   |   |   |
    *-------*   |
    |   *-------*
    |   |   |   |
    *---*   *---*
    |   |   |   |
    S   S   S   S    <- Fencing
    |   |   |   |    <- Output Block (Ciphertext)
By mixing each element with another, and then each pair with another pair and so on, every element is eventually mixed with every other element. Each BBM mixing is dyadic, so each "sub-level" is a mixing of twice as many elements as the sublevel before it. A block of n elements is thus fully mixed in log2(n) such sub-levels.
The pattern of these mixings is exactly like some implementations of the FFT, and thus the term "FFT-style." Also see the articles in the Mixing Ciphers section on the Ciphers By Ritter pages.
   0 + 0 = 0     0 * 0 = 0
   0 + 1 = 1     0 * 1 = 0
   1 + 0 = 1     1 * 0 = 0
   1 + 1 = 0     1 * 1 = 1

   1 + 1 + 1 = 1

Subtraction mod 2 is the same as addition mod 2. The operations + and * can also be considered the logic functions XOR and AND respectively.
   Addition and Subtraction:

        1 0 1 1
      + 0 1 0 1
      + 1 1 0 0
      ---------
        0 0 1 0

   Multiplication:

            1 0 1 1
          * 1 1 0 0
          ----------
                  0
                0
        1 0 1 1
      1 0 1 1
      ---------------
      1 1 1 0 1 0 0

Polynomial multiplication is not the same as repeated polynomial addition. But there is a fast approach to squaring mod 2 polynomials:
               a  b  c  d
               a  b  c  d
          -----------------
               ad bd cd dd
            ac bc cc dc
         ab bb cb db
      aa ba ca da
      ----------------------
      a  0  b  0  c  0  d

To square a mod 2 polynomial, all we have to do is "insert" a zero between every column. Note that aa = a for a = 0 or a = 1, and ab = ba, so either 0 + 0 = 0 or 1 + 1 = 0.
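A tiny Python sketch of that shortcut, treating bit i of an integer as the coefficient of x^i (the helper name is just illustrative):

    def square_by_spreading(a):
        # Squaring a mod 2 polynomial just moves each coefficient bit to
        # twice its position, i.e. inserts a zero between every column.
        r, pos = 0, 0
        while a:
            r |= (a & 1) << pos
            a >>= 1
            pos += 2
        return r

    print(bin(square_by_spreading(0b1011)))   # (x^3+x+1)^2 = x^6+x^2+1 -> 0b1000101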
   Division:

                            1 0 1 1
                    ----------------
          1 1 0 0 ) 1 1 1 0 1 0 0
                    1 1 0 0
                    ---------
                        1 0 1 0
                        1 1 0 0
                        ---------
                          1 1 0 0
                          1 1 0 0
                          ---------
                                  0
The decision about whether the divisor "goes into" the dividend is based exclusively on the most-significant (leftmost) digit. This makes polynomial division far easier than integer division.
Mod 2 polynomials behave much like integers in that one polynomial may or may not divide another without remainder. This means that we can expect to find analogies to integer "primes," which we call irreducible polynomials.
Mod 2 polynomials do not constitute a field; clearly, the size of a multiplication is unbounded. However, a finite field of polynomials can be created by choosing an irreducible modulus polynomial, thus producing a Galois field GF(2^n).
Multiple encryption using different keys can be a way to increase strength. And multiple encryption using different ciphers can reduce the probability of using a single cipher which has been broken in secret. In both cases, the cost is additional ciphering operations.
Unfortunately, multiple encryption using just two (2) ciphers may not be much advantage: If we assume The Opponents know which ciphers are used, they can manipulate both the plaintext and the ciphertext to search for a match (a "meet-in-the-middle" attack strategy). One way to avoid this is to use three (3) cipherings, as in Triple DES.
Multiple encryption also can be dangerous, if a single cipher is used with the same key each time. Some ciphers are involutions which both encipher and decipher with the same process; these ciphers will decipher a message if it is enciphered a second time under the same key. This is typical of classic additive synchronous stream ciphers, as it avoids the need to have separate encipher and decipher operations. But it also can occur with block ciphers operated in stream-cipher-like modes such as OFB, for exactly the same reason.
Nomenclator
The logically contrary alternative hypothesis H1 is sometimes formulated with the specific hope that something unusual will be found, but this can be very tricky to get right. Many statistical tests (such as goodness-of-fit tests) can only indicate whether something matches what we expect, or does not. But any number of things can cause a mismatch, including a fundamentally flawed experiment. A simple mismatch does not normally imply the presence of a particular quality.
Even in the best possible situation, random sampling will produce a range or distribution of test statistic values. Often, even the worst possible statistic value can be produced by an unlucky sampling of the best possible data. It is thus important to know what distribution to expect because of the sampling alone, so if we find a different distribution, that will be evidence supporting the alternative hypothesis H1.
If we collect enough statistic values, we should see them occur in the ideal distribution for that particular statistic. So if we call the upper 5 percent of the distribution "failure" (this is the significance level) we not only expect but in fact require such "failure" to occur about 1 time in 20. If it does not, we will in fact have detected something unusual, something which might even indicate problems in the experimental design.
If we have only a small number of samples, and do not run repeated trials, a relatively few chance events can produce an improbable statistic value, which might cause us to reject a valid null hypothesis, and so commit a type I error.
On the other hand, if there is a systematic deviation in the underlying distribution, only a very specific type of random sampling could mask that problem. With few samples and trials, though, the chance random masking of a systematic problem is still possible, and could lead to a type II error.
Object Code
Somewhat easier to learn than hexadecimal, since no new numeric symbols are needed, but octal can only represent three bits at a time. This generally means that the leading digit will not take all values, and that means that the representation of the top part of two concatenated values will differ from its representation alone, which can be confusing. Also see: binary and decimal.
OFB is closely related to CFB, and is intended to provide some of the characteristics of a stream cipher from a block cipher. OFB is a way of using a block cipher to form a random number generator. The resulting pseudorandom confusion sequence can be combined with data as in the usual stream cipher.
OFB assumes a shift register of the block cipher block size. An IV or initial value first fills the register, and then is ciphered. Part of the result, often just a single byte, is used to cipher data, and also is shifted into the register. The resulting new register value is ciphered, producing another confusion value for use in stream ciphering.
One disadvantage of this, of course, is the need for a full block-wide ciphering operation, typically for each data byte ciphered. The advantage is the ability to cipher individual characters, instead of requiring accumulation into a block before processing.
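A toy Python sketch of byte-wide OFB along those lines; the stand-in 64-bit "block cipher" E and its keying are placeholders chosen only so the example runs, not a real cipher:

    BLOCK_BYTES = 8

    def E(block, key):
        # Any keyed block permutation would do here.
        x = int.from_bytes(block, "big")
        x = (x * 6364136223846793005 + key) & (2 ** 64 - 1)
        return x.to_bytes(BLOCK_BYTES, "big")

    def ofb8(data, key, iv):
        reg = bytearray(iv)                 # shift register, one block wide
        out = bytearray()
        for p in data:
            o = E(bytes(reg), key)[0]       # one byte of the ciphered register
            out.append(p ^ o)               # combine with the data byte
            reg = reg[1:] + bytes([o])      # shift the output byte (not ciphertext) in
        return bytes(out)

    iv = bytes(range(BLOCK_BYTES))
    ct = ofb8(b"attack at dawn", 12345, iv)
    print(ofb8(ct, 12345, iv))              # the same operation deciphers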
A realized one time pad (OTP) is essentially a stream cipher with a really random confusion sequence used exactly once. The confusion sequence is the key, and it is as long as the data. Since this amount of keying material can be awkward to transfer and keep, we often see "pseudo" one-time pad designs which attempt to correct this deficiency. Normally, the point is to achieve the theoretical advantages of a one-time pad without the costs; the problem with this is that the one-time pad theory of strength no longer applies. These variations are best seen as classic stream cipher designs.
In a realized one time pad, the confusion sequence must be unpredictable (not generated from a small key value) and must be transported to the far end and held at both locations in absolute secrecy like any other secret key. But where a normal secret key might range perhaps from 16 bytes to 160 bytes, there must be as much OTP sequence as there will be data (which might well be megabytes). And a normal secret key could itself be sent under a key (as in a message key or under a public key). But an OTP sequence cannot be sent under a key, since this would make the OTP as weak as the key, in which case we might as well use a normal cipher. All this implies very significant inconveniences, costs, and risks, well beyond what one would at first expect, so even the realized one time pad is generally considered impractical, except in very special situations.
In a realized one time pad, the confusion sequence itself must be random for, if not, it will be somewhat predictable. And, although we have a great many statistical randomness tests, there is no test which can certify a sequence as either random or unpredictable. This means that a sequence which we assume to be random may not be the unpredictable sequence we need, and we can never know for sure. (This might be considered an argument for using a combiner with strength, such as a Latin square or Dynamic Substitution.) In practice, the much touted "mathematically proven unbreakability" of the one time pad depends upon an assumption of randomness and unpredictability which we can neither test nor prove.
The one time pad sometimes seems to have yet another level of strength above the usual stream cipher, the ever-increasing amount of "unpredictability" or entropy in the confusion sequence, leading to an indefinite unicity distance. In contrast, the typical stream cipher will produce a long sequence from a relatively small amount of initial state, and it can be argued that the entropy of an RNG is just the number of bits in its initial state. In theory, this might mean that the initial state or key used in the stream cipher could be identified after somewhat more than that same amount of data had been enciphered. But it is also perfectly possible for an unsuspected problem to occur in a really-random generator, and then the more sequence generated, the more apparent and useful that problem might be to an Opponent.
Nor does even a theoretical one time pad imply unconditional security: Consider A sending the same message to B and C, using, of course, two different pads. Now, suppose the Opponents can acquire plaintext from B and intercept the ciphertext to C. If the system is using the usual additive combiner, the Opponents can reconstruct the pad between A and C. Now they can send C any message they want, and encipher it under the correct pad. And C will never question such a message, since everyone knows that a one time pad provides "absolute" security as long as the pad is kept secure. Note that both A and C have indeed kept their pad secure, and they are the only ones who had that pad.
Various companies offer one time pad programs, and sometimes also the keying or "pad" material.
   +----------+          +----------+
   |          |   ONTO   |          |
   |    X     |          | Y = f(X) |
   |          |    f     |          |
   |          |   --->   |          |
   +----------+          +----------+
It can be argued that block cipher operating modes are stream "meta-ciphers" in which the streamed transformation is of full block cipher width, instead of the usual stream cipher bit- or byte-width transformations.
   3 1 2 0     0 3 2 1     30 13 22 01
   0 2 1 3     2 1 0 3  =  02 21 10 33
   1 3 0 2     1 2 3 0     11 32 03 20
   2 0 3 1     3 0 1 2     23 00 31 12
Also see Balanced Block Mixing.
Overall diffusion means that the ciphertext will appear to change at random even between related message blocks, thus hiding message relationships which might be used to attack the cipher.
Overall diffusion can be measured statistically in a realized cipher and used to differentiate between better and worse designs. Overall diffusion does not, by itself, define a good cipher, but it is required in a good block cipher.
Also see diffusion, avalanche, strict avalanche criterion and complete.
Padding
In more conventional computing, some additional data needed to fill-out a fixed-size data structure. This meaning also exists in cryptography, where the last block of a fixed-size block cipher often must be padded to fill the block.
The concept behind patenting is to establish intellectual property in a way somewhat related to a mining claim or real estate. An inventor of a machine or process can file a claim on the innovation, provided that it is not previously published, and that someone else does not already have such a claim. Actual patents normally do not claim an overall machine, but just the newly-innovative part, and wherever that part is used, it must be licensed from the inventor. It is common for an inventor to refine earlier work patented by someone else, but if the earlier patent has not expired, the resulting patent often cannot be practiced without a license from the earlier patent holder.
Someone who comes up with a patentable invention and wishes to give up their rights can simply publish a full description of the invention. Simple publication should prevent an application from anyone who has not already established legal proof that they previously came up with the same invention. In the U.S., publication also apparently sets a 1-year clock running for an application to be filed by anyone who does have such proof. But coming up with an invention does not take away someone else's rights: if they came up with the same thing first, they may have a year to file, and their case might take several years to prosecute and issue.
In the U.S., a patent is a non-renewable grant, previously lasting 17 years from issue date, now lasting 20 years from application date. Both an application fee and an issue fee are required, as are periodic "maintenance" fees throughout the life of the patent. There are four main requirements:
A U.S. patent is not available if -- before the invention date -- the invention was:
A U.S. patent is not available if -- more than a year before the application date -- the invention was:
When the same invention is claimed by different inventors, deciding who has "priority" to be awarded the patent can require legally provable dates for both "conception" and "reduction to practice":
Also see: prior art and our claims tutorial.
In practice, a patent is rarely the intrusive prohibitive right that it may at first appear to be, because patents are really about money and respect. Ideally, a patent rewards the inventor for doing research and development, and then disclosing an invention to the public; it is also a legal recognition of a contribution to society. If someone infringes a patent in a way which affects sales, or which implies that the inventor cannot do anything about it, the patent holder can be expected to show some interest. But when little or no money is involved, a patent can be infringed repeatedly with little or no response, and typically this will have no effect on future legal action.
This simple introduction cannot begin to describe the complexity involved in filing and prosecuting a patent application. Your author does not recommend going it alone, unless one is willing to put far more time into learning about it and doing it than one could possibly imagine.
Normally the offender will be contacted, and there may be a settlement and proper licensing, or the offender may be able to design around the patent, or the offender may simply stop infringing. Should none of these things occur, the appropriate eventual response is a patent infringement lawsuit in federal court.
There are some examples:
Also see: ideal secrecy. From Claude Shannon.
   P(n) = n*(n-1)*(n-2)*...*2*1 = n!

or n-factorial possible permutations. The number of permutations of n things taken k at a time is:
P(n,k) = n! / (n-k)!
See the permutations section of the Ciphers By Ritter / JavaScript computation pages. Also see combination and symmetric group.
A block cipher can be seen as a transformation between plaintext block values and ciphertext block values, and is thus an emulated simple substitution on huge block-wide values. Both plaintext and ciphertext have the same set of possible block values, and when the ciphertext values have the same ordering as the plaintext, ciphering is obviously ineffective. So effective ciphering depends upon re-arranging the ciphertext values from the plaintext ordering, which is a permutation of the plaintext values. A block cipher is keyed by constructing a particular permutation of ciphertext values.
Within an explicit table, an arbitrary permutation (one of the set of all possible permutations) can be produced by shuffling the elements under the control of a random number generator. If, as usual, the random number generator has been initialized from a key, a particular permutation can be produced for each particular key; thus, each key selects a particular permutation.
Also, the second part of substitution-permutation block ciphers: First, substitution operations diffuse information across the width of each substitution. Next, "permutation" operations act to re-arrange the bits of the substituted result (more clearly described as a set of transpositions); this ends a single round. In subsequent rounds, further substitutions and transpositions occur until the block is thoroughly mixed and overall diffusion hopefully achieved.
One problem with PGP is a relatively unworkable facility for authenticating public keys. While the users can compare a cryptographic hash of a key, this requires communication through a different channel, which is more than most users are willing to do. The result is a system which generally supports man-in-the-middle attacks, and these do not require "breaking" either of the ciphers.
A common frequency response has half the output voltage at twice the frequency. But this is actually one-quarter the power and so is a -6 dB / octave drop. For pink noise, the desired voltage ratio per octave is 0.707, which is a -3 dB / octave drop in power.
The probability of finding exactly k successes when we have expectation u is:
   P(k,u) = u^k e^-u / k!

where e is the base of natural logarithms:

   e = 2.71828...

and u is:

   u = n p

again for n independent trials, when each trial has success probability p. In the Poisson distribution, u is also both the mean and the variance.
The ideal distribution is produced by evaluating the probability function for all possible k, from 0 to n.
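A small Python sketch of that evaluation (the expectation value is made up for the example):

    from math import exp, factorial

    def poisson(k, u):
        # P(k,u) = u^k e^-u / k!
        return u ** k * exp(-u) / factorial(k)

    u = 2.0                                    # u = n p
    dist = [poisson(k, u) for k in range(10)]
    print(sum(dist))                           # approaches 1.0 as more k are included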
If we have an experiment which we think should produce a Poisson distribution, and then repeatedly and systematically find very improbable test values, we may choose to reject the null hypothesis that the experimental distribution is in fact Poisson.
Also see the Poisson section of the Ciphers By Ritter / JavaScript computation pages.
   c[n]x^n + . . . + c[1]x + c[0]

The c's or coefficients are elements of some field F. The degree n is the value of the exponent of the highest power term. A mod 2 polynomial of degree n has n+1 bits representing the coefficients for each power: n, n-1, ..., 1, 0.
Perhaps the most insightful part of this is that the addition of coefficients for a particular power does not "carry" into other coefficients or columns.
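A small illustration in Python, assuming the common trick of holding a mod 2 polynomial as an integer bitmask (bit i is the coefficient of x^i); the sum is then just exclusive-OR:

  a = 0b1011           # x^3 + x + 1
  b = 0b0110           # x^2 + x
  print(bin(a ^ b))    # 0b1101 = x^3 + x^2 + 1; the x terms cancel, nothing carries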
In DC electronics, simply voltage times current. In AC electronics, the instantaneous product of voltage times current, integrated over a repetitive cycle. In either case the result is in watts, denoted W.
All primitive polynomials are irreducible, but irreducibles are not necessarily primitive, unless 2^n - 1 is prime for degree n (that is, unless the degree is a Mersenne prime exponent). One way to find a primitive polynomial is thus to select such a degree and find an irreducible using Algorithm A of Ben-Or:
1.  Generate a monic random polynomial gx of degree n over GF(q);
2.  ux := x;
3.  for k := 1 to (n DIV 2) do
4.     ux := ux^q mod gx;
5.     if GCD(gx, ux-x) <> 1 then go to 1 fi;
6.  od
Ben-Or, M. 1981. Probabilistic algorithms in finite fields. Proceedings of the 22nd IEEE Foundations of Computer Science Symposium. 394-398.
The result is a certified irreducible. GF(q) represents the Galois Field to the prime base q; for mod 2 polynomials, q is 2. These computations require mod 2 polynomial arithmetic operations for polynomials of large degree; "ux^q" is a polynomial squared, and "mod gx" is a polynomial division. A "monic" polynomial has a leading coefficient of 1; this is a natural consequence of mod 2 polynomials of any degree. The first step assigns the polynomial "x" to the variable ux; the polynomial "x" is x^1, otherwise known as "10".
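For illustration only, here is one possible Python sketch of Algorithm A for q = 2, again holding mod 2 polynomials as integer bitmasks; the helper names (pm_mod, pm_gcd, pm_mulmod) are not from the original text:

  import random

  def pm_mod(a, b):
      """Remainder of mod 2 polynomial a divided by b."""
      db = b.bit_length() - 1
      while a.bit_length() - 1 >= db:
          a ^= b << (a.bit_length() - 1 - db)
      return a

  def pm_gcd(a, b):
      """GCD of two mod 2 polynomials (Euclid's algorithm)."""
      while b:
          a, b = b, pm_mod(a, b)
      return a

  def pm_mulmod(a, b, m):
      """Product of mod 2 polynomials a and b, reduced mod m."""
      a, r = pm_mod(a, m), 0
      while b:
          if b & 1:
              r ^= a
          b >>= 1
          a <<= 1
          if a >> (m.bit_length() - 1):    # one conditional reduction suffices
              a ^= m
      return r

  def ben_or_irreducible(n):
      """Return a random degree-n irreducible mod 2 polynomial."""
      while True:
          gx = (1 << n) | random.getrandbits(n)   # monic, random, degree n
          ux = 0b10                               # the polynomial "x"
          for _ in range(n // 2):
              ux = pm_mulmod(ux, ux, gx)          # ux := ux^2 mod gx
              if pm_gcd(gx, ux ^ 0b10) != 1:      # GCD(gx, ux - x) <> 1
                  break
          else:
              return gx                           # survived every test

  print(bin(ben_or_irreducible(9)))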
To get primitives of degree n where 2^n - 1 is not prime, we certify irreducibles P of degree n. To do this, we must factor the value 2^n - 1.
Small primes can be found through the ever-popular Sieve of Eratosthenes, which can also be used to develop a list of small primes used for testing individual values. A potential prime need only be divided by each prime equal to or less than the square-root of the value of interest; if any remainder is zero, the number is not prime.
In a U.S. application for patent, we are interested in the state of the open or public art as it existed as of the invention date, and also one year prior to the filing date. It is that art -- and not something hidden or something later -- against which the new application must be judged. Many things which seem "obvious" in retrospect were really quite innovative at the time they were done.
The usual random number generator is actually pseudorandom. Given the initial state, the entire subsequent sequence is completely pre-determined, but nevertheless exhibits many of the expected characteristics of a random sequence. Pseudorandomness supports generating the exact same cryptographic sequence repeatedly at different times or locations. Pseudorandomness is generally produced by a mathematical process, which may provide good assurances as to the resulting statistics, assurances which a really random generator generally cannot provide.
Either key can be used for enciphering or deciphering. Usually the exposed key is called the "public" key, and the retained hidden key is called the "private" key. The public key is distributed widely, so anyone can use it to encipher a message which presumably can only be deciphered by the hidden private key on the other end. Note that the enciphering end normally does not possess a key which will decipher a message which was just enciphered.
The whole scheme of course depends upon the idea that the private key cannot be developed from knowledge of the public key. The cipher also must resist both known-plaintext and defined-plaintext attack (since anyone can generate any amount of plaintext and encipher it). A public key cipher is vastly slower than a secret key cipher, and so is normally used simply to deliver the message key or session key for a conventional or secret key cipher.
Although at first proclaimed as a solution to the key distribution problem, it soon became apparent that someone could pretend to be someone else, and send out a "spoofed" public key. When people use that key, the spoofer could receive the message, decipher and read it, then re-encipher the message under the correct key and send it to the correct destination. This is known as a man-in-the-middle (MITM) attack.
A MITM attack is unusual in that it can penetrate cipher security without "breaking" either the public key cipher or the internal secret key cipher, and takes almost no computational effort. This is extremely serious because it means that the use of even "unbreakable" ciphers is not sufficient to guarantee privacy. All the effort spent on proving the strength of either cipher is simply wasted when a MITM attack is possible, and MITM attacks are only possible with public key ciphers.
To prevent spoofing, public keys must be authenticated (or validated or certified) as representing who they claim to represent. This can be almost as difficult as the conventional key distribution problem and generally requires complex protocols. And a failure in a key certification protocol can expose a system which uses "unbreakable" ciphers. In contrast, the simple use of an "unbreakable" secret key cipher (with hand-delivered keys) is sufficient to guarantee security. This is a real, vital difference between ciphering models.
Random
Randomness is an attribute of the process which generates or selects "random" numbers rather than the numbers themselves. But the numbers do carry the ghost of their creation: If values really are randomly generated with the same probability, we expect to find almost the same number of occurrences of each value or each sequence of the same length. Over many values and many sequences we expect to see results form in distributions which accord with our understanding of random processes. So if we do not find these expectations in the resulting numbers, we may have reason to suspect that the generating process is not random. Unfortunately, any such suspicion is necessarily statistical in nature, and cannot produce absolute proof in either direction: Randomness can produce any relationship between values, including apparent correlations (or their lack) which do not in fact represent the systematic production of the generator. (Also see the discussions of randomness testing in Statistics and Null Hypothesis, the article Chi-Square Bias in Runs-Up/Down RNG Tests, also Randomness Tests: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and Randomness Links, in Ritter's Net Links page.)
From one point of view, there are no "less random" or "more random" sequences, since any sequence can be produced by a random process. And any sequence (at least any particular sequence) also can be produced by a deterministic computational random number generator. (We note that such generators are specifically designed to and do pass statistical randomness tests.) So the difference is not in the sequences, per se, but instead in the generators: For one thing, an RNG sequence is deterministic and therefore may somehow be predicted. But, in practice, extensive analysis could show deviations from randomness in either the deterministic RNG designs or the nondeterministic really random generation equipment, and this could make even a nondeterministic generator somewhat predictable.
There are "more complex" and "less complex" sequences according to various measures. For example:
We should note that the subset of sequences which have a high linear complexity leaves a substantial subset which does not. So if we avoid sequences with low linear complexity, any sequence we do accept must be more probable than it would be in the unfiltered set of all possible sequences. In this case, the expected higher uncertainty of the sequence itself is at least partly offset by the certainty that such a sequence will be used. Similar logic applies to S-box measurement and selection.
Oddly -- and much like strength in ciphers -- the "unpredictable" part of randomness is contextual and subjective, rather than the absolute and objective qualities we like in Science. While the sequence from a complex RNG can appear random, if we know the secret of the generator construction, and its state, we can predict the sequence exactly. But often we are in the position of seeing the sequence alone, without knowing the source, the construction, or the internal state. So while we might see a sequence as "random," that same sequence might be absolutely predictable (and thus not random) to someone who knows "the secret."
In practice, most random number generators are deterministic computational mechanisms, and each number is directly determined from the previous state of the mechanism. Such a sequence is often called pseudo-random, to distinguish it from a really random sequence, somehow composed of actually unrelated values.
A computational random number generator will always generate the same sequence if it is started in the same state. So if we initialize the state from a key, we can use the random number generator to shuffle a table into a particular order which we can reconstruct any time we have the same key. (See, for example: A Keyed Shuffling System for Block Cipher Cryptography.)
Note that random number generators are designed to pass the many statistical tests of randomness; clearly, such tests do not indicate a really random sequence. Moreover, if we define "random" as "the absence of any pattern," the only way we could validate such a sequence is by checking for every possible pattern. But there are too many patterns, so "real" randomness would seem to be impossible to check experimentally. (Also see the discussions of randomness testing in Statistics and Null Hypothesis.)
Also see the article: The Efficient Generation of Cryptographic Confusion Sequences, plus RNG Implementations: A Literature Survey, RNG Surveys: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and Randomness Links, in Ritter's Net Links page.
A discrete random variable takes on a finite set of values. The probability of each value is the frequency function or probability density function, and the graph of the frequency function is the frequency distribution.
Examples of a really random source might include radioactive decay, Johnson or thermal noise, shot noise from a Zener diode or reverse-biased junction in breakdown, etc. Clearly, some sort of circuitry will be required to detect these generally low-level events, and the quality of the result is often directly related to the design of the electronic processing. Other sources of randomness might be precise keystroke timing, and the accumulated hash of text of substantial size. Also called physically random and truly random. As opposed to pseudorandom (see random number generator).
Really random values are particularly important as message key objects, or as a sequence for use in a realized one-time pad.
Also see: Random Number Machines: A Literature Survey and Random Electrical Noise: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and Randomness Links, in Ritter's Net Links page.
Based on number-theoretic concepts and using huge numerical values, an RSA key must be perhaps ten times or more as long as a secret key for similar security.
Salt
Normally, we cannot hope to examine the full population, and so must instead investigate samples of the population, with the hope that they represent the larger whole. Often, random sampling occurs "with replacement"; effectively, each individual sample is returned to the population before the next sample is drawn.
One possible S-box is the identity transformation (0->0, 1->1, 2->2, ...) which clearly has no effect at all, while every other transformation has at least some effect. So different S-boxes obviously can contain different amounts of some qualities. Qualities often mentioned include avalanche and Boolean function nonlinearity. However, one might expect that different ciphering structures will need different table characteristics to a greater or less degree. So the discussion of S-box strength always occurs within the context of a particular cipher construction.
With respect to avalanche, any input change -- even one bit -- will select a different table entry. Over all possible input values and changes, the number of output bits changed will have a binomial distribution. (See the bit changes section of the Ciphers By Ritter / JavaScript computation pages.) So, in this respect, all tables are equal.
On the other hand, it is possible to arrange tables so that single-bit input changes are guaranteed to produce at least two-bit output changes, and this would seem to improve avalanche. But we note that this is probable even with a randomly-constructed table, so we have to ask just how much this guarantee has improved things. In a Feistel cipher, it seems like this might reduce the number of needed rounds by one. But in actual operation, the plaintext block is generally randomized, as in CBC-mode. This means that the probability of getting a single-bit change in operation is very low anyway.
It is true that cipher avalanche is tested using single-bit input changes, and that is the way avalanche is defined. The point of this is to assure that every output bit is "affected" by every input bit. But I see this as more of an experimental requirement than an operational issue that need be optimized.
With respect to Boolean function nonlinearity, as tables get larger it becomes very difficult -- and essentially impossible -- to find tables with ideal nonlinearity values. This means that we are always accepting a compromise value, and this is especially the case if the table must also have high values of other S-box qualities.
Even randomly-constructed tables tend to have reasonable nonlinearity values. We might expect an 8-bit table to have a nonlinearity of about 100 (that is, 100 bits must change in one of the eight 256-bit output functions to reach the closest affine Boolean function). Experimental measurement of the nonlinearity of 1,000,000 random 8-bit tables shows exactly one table with a nonlinearity as low as 78, and the computed probability of an actually linear table (nonlinearity zero) is something like 10^-72 or 2^-242.
The NSA-designed 8-bit table in the Skipjack cipher has a computed nonlinearity of 104. While not quite the highest value we could find, it is in the top 2.5 percent of the distribution, and it seems improbable that this occurred by accident. We might assume that this table is representative of the modern understanding of the needs of a Feistel design with a fixed table. If so, we might conclude that good nonlinearity (or something very much like it) is a necessary, if not quite sufficient, part of the design.
It is "easy" to construct keyed S-boxes, by shuffling under the control of a keyed cryptographic random number generator. (See, for example: A Keyed Shuffling System for Block Cipher Cryptography.) This has the significant advantage of providing no fixed tables for The Opponent to understand and attack.
One question is whether one should attempt to measure and discard tables with poorer qualities than others. My personal feeling is that the ciphering structure should be strong enough to handle the expected random table distribution without added measurement and selection.
Also see: S-Box Design: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.
While full-size ciphers can never be exhaustively tested, tiny cipher models can be approached experimentally, and any flaws in them probably will be present in the full-scale versions we propose to use. Just as mathematics works the same for numbers large or small, a backdoor cipher built from fixed construction rules must have the same sort of backdoor, whether built large or small.
For block ciphers, the real block size must be at least 128 bits, and the experimental block size probably should be between 8 and 16 bits. Such tiny ciphers can be directly compared to keyed substitution tables of the same size, which are the ideal theoretical model of a block cipher.
Potentially, scalability does far more than just simplify testing: Scalability is an enabling technology that supports experimental analysis which is otherwise impossible.
In a secret key cipher, secrecy implies the use of a strong cipher. Secrecy in communication requires the secure distribution of secret keys to both ends (this is the key distribution problem).
In a public key cipher, the ability to expose keys apparently solves the key distribution problem. But communications secrecy requires that public keys be authenticated (certified) as belonging to their supposed owner. This must occur to cryptographic levels of assurance, because failure leads to immediate vulnerability under a man-in-the-middle attack. The possibility of this sort of attack is very disturbing, because it needs little computation, and does not involve breaking any cipher, which makes all discussion of cipher strength simply irrelevant.
A secure cryptosystem physically or logically prevents unauthorized disclosure of its protected data. This is independent of whether the attacker is a government agent, a criminal, a private detective, some corporate security person, or a friend of an ex-lover. Real security does not care who the attacker is, or what their motive may be, but instead protects against the threat itself. Limited security, on the other hand, often seeks to guess the identity, capabilities and motives of the attacker, and concentrates resources at those points.
There is, of course, no absolute security. But we can have real security against particular, defined threats. Also see: strength.
On the other hand, it can be a mistake to use even a public and well-reviewed cipher, if the cipher protects enough valuable information to support a substantial investment in analysis and equipment to break the cipher. A reasonable alternative is to select from among a wide variety of conceptually different ciphers, each of which thus carries far less information of far less value and so may not warrant a substantial attack investment.
Right-Shifting Shift Register (SR)

             +----+  +----+         +----+
 Carry In -->| A0 |->| A1 |-> ... ->| An |--> Carry Out
             +----+  +----+         +----+
In digital hardware versions, elements are generally bits, and the stored values actually move from element to element in response to a clock. Analog hardware versions include the charge-coupled devices (CCD's) used in cameras, where the analog values from lines of sensors are sampled in parallel, then serialized and stepped off the chip to be digitized and processed.
In software versions, elements are often bytes or larger values, and the values may not actually move during stepping. Instead, the values may reside in a circular array, and one or more offsets into that array may step. In this way, even huge amounts of state can be "shifted" by changing a single index or pointer.
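A minimal Python sketch of that idea (the class and method names are illustrative): stepping moves only an index, never the stored values themselves:

  class CircularShiftRegister:
      def __init__(self, values):
          self.buf = list(values)
          self.head = 0                    # index of the logical element A0

      def step(self, new_value):
          """Shift in new_value; the oldest element falls off the far end."""
          self.head = (self.head - 1) % len(self.buf)
          self.buf[self.head] = new_value

      def __getitem__(self, i):
          return self.buf[(self.head + i) % len(self.buf)]

  sr = CircularShiftRegister([0, 0, 0, 0])
  sr.step(7)
  print(sr[0], sr[3])                      # 7 0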
Within a computer environment, it is easy to shuffle an arbitrary number of symbols using a random number generator, and the algorithm of Durstenfeld, which is described in Knuth II:
Durstenfeld, R. 1964. Algorithm 235, Random Permutation, Procedure SHUFFLE. Communications of the ACM. 7: 420.
Knuth, D. 1981. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. 2nd ed. 139. Reading, Mass: Addison-Wesley.
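A minimal Python sketch of the Durstenfeld shuffle driven by a keyed deterministic generator; random.Random merely stands in for a keyed cryptographic RNG here and is not suitable for real ciphering:

  import random

  def keyed_shuffle(table, key):
      rng = random.Random(key)             # stand-in for a keyed cryptographic RNG
      for i in range(len(table) - 1, 0, -1):
          j = rng.randrange(i + 1)         # 0 <= j <= i
          table[i], table[j] = table[j], table[i]
      return table

  print(keyed_shuffle(list(range(8)), key="my key"))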
Basically, the "Sieve of Eratosthenes" starts out with a table of numbers from 1 to some limit, all of which are potential primes, and the knowledge that 2 is a prime. Since 2 is a prime, no other prime can have 2 as a factor, so we run though the table discarding all multiples of 2. The next remaining number above 2 is 3, which we accept as a prime, and then run through the table crossing off all multiples of 3. The next remaining is 5, so we cross off all multiples of 5, and so on. After we cross-off each prime up to the square-root of the highest value in the table, the table will contain only primes.
A similar process works with small polynomials, and small polynomial fields, to find irreducible polynomials.
Perhaps the original classical form of cipher, in which each plaintext character is enciphered as some different character. In essence, the order of the alphabet is scrambled or permuted, and the particular scrambled order (or the scrambling process which creates that particular order) is the cipher key. Normally we think of scrambling alphabetic letters, but any computer coding can be scrambled similarly.
Small, practical examples of simple substitution are easily realized in hardware or software. In software, we can have a table of values each of which can be indexed or selected by element number. In hardware, we can simply have addressable memory. Given an index value, we can select the element at the index location, and read or change the value of the selected element.
A substitution table will be initialized to contain exactly one occurrence of each possible symbol or character. This allows enciphering to be reversed and the ciphertext deciphered. For example, suppose we substitute a two-bit quantity, thus a value 0..3, in a particular table as follows:
2 3 1 0.
The above substitution table takes an input value to an output value by selecting a particular element. For example, an input of 0 selects 2 for output, and an input of 2 selects 1. If this is our enciphering, we can decipher with an inverse table. Since 0 is enciphered as 2, 2 must be deciphered as 0, and since 2 is enciphered as 1, 1 must be deciphered as 2, with the whole table as follows:
3 2 0 1.
Mathematically, a simple substitution is a mapping (from input to output) which is one-to-one and onto, and is therefore invertible.
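A tiny Python sketch using the example table above, building the inverse table by reversing the mapping:

  table = [2, 3, 1, 0]                       # encipher: index -> value
  inverse = [0] * len(table)
  for i, v in enumerate(table):
      inverse[v] = i                         # decipher: value -> index

  plaintext = [0, 1, 2, 3]
  ciphertext = [table[p] for p in plaintext]
  assert [inverse[c] for c in ciphertext] == plaintext
  print(ciphertext, inverse)                 # [2, 3, 1, 0] [3, 2, 0, 1]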
In statistics, the current symbol from a sequence, or a value which selects or conditions possible outcomes (see: Markov process).
We normally measure "state" in units of information or bits, and 8 bits of "state" can support 28 or 256 different state-value combinations or states.
Also see: deterministic and keyspace.
A useful statistic will have some known (or at least explorable) probability distribution for the case "nothing unusual found." This allows the statistic value to be interpreted as the probability of finding that value or less, for the case "nothing unusual found." Then, if improbable statistic values occur repeatedly and systematically, we can infer that something unusual is being found, leading to the rejection of the null hypothesis.
It is also possible to explore different distributions for the same statistic under different conditions. This can provide a way to guess which condition was in force when the data were obtained.
The usual role of statistics is to identify particular systematic events in the context of expected random variations that may conceal such events. This often occurs in a context of difficult and costly experimentation, and there is a premium on results which are so good that they stand above the noise; it may be that not much is lost if a weak positive is ignored.
In contrast, cryptography and randomness generally support vast amounts of testing at low cost, and we seek weak indications. In this context, we may find it more useful to conduct many tests and collect many statistic values, then visually and mathematically compare the experimental distribution to the ideal for that statistic.
A statistical distribution usually represents what we should expect from random data or random sampling. If we have random data, statistic values exceeding 95% of the distribution (often called failure) should occur about 1 time in 20. And since that one time may happen on the very first test, it is only prudent to conduct many tests and accumulate results which are more likely to represent reality than any one result from a single test.
In statistical randomness testing, "failure" should and must occur with the appropriate frequency. Thus, the failure to fail is itself a failure! This means that the very concept of statistical "failure" often may be inappropriate for cryptographic use. Grading a result as "pass" or "fail" discards all but one bit of information. Further, a pass / fail result is a Bernoulli trial, which would take many, many similar tests to properly characterize. So it may be more appropriate to collect 20 or more statistic probability values, and then compare the accumulation to the expected distribution for that statistic. This will provide a substantial basis for asserting that the sampled process either did or did not produce the same statistic distribution as a random process.
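One possible sketch of that approach in Python, using a simple coin-flip chi-square statistic (one degree of freedom) purely for illustration; each trial yields one probability value, and over many trials the values should fill equal-width bins about equally:

  import math, random

  def chisq_p_1dof(x):
      """Chi-square CDF for one degree of freedom: probability of x or less."""
      return math.erf(math.sqrt(x / 2.0))

  def one_trial(flips=1000):
      heads = sum(random.getrandbits(1) for _ in range(flips))
      expected = flips / 2.0
      x = 2 * (heads - expected) ** 2 / expected    # chi-square, 1 dof
      return chisq_p_1dof(x)

  p_values = [one_trial() for _ in range(100)]
  bins = [0] * 10
  for p in p_values:
      bins[min(int(p * 10), 9)] += 1
  print(bins)    # each bin should hold roughly 10 of the 100 values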
Due to random sampling, any statistical result is necessarily a probability, rather than certainty. An "unlucky" sampling can produce statistical results which imply the opposite of reality. In general, statistics simply cannot provide the 100 percent certainty which is traditionally expected of mathematical "proof."
In a conventional stream cipher, each element (for example, each byte) of the message is ciphered independently, and does not affect any other element.
In a few stream cipher designs, the value of one message byte may change the enciphering of subsequent message bytes; this is forward data diffusion. But a stream cipher cannot change the enciphering of previous message bytes. In contrast, changing even the last bit in a block cipher block will generally change about half of the earlier bits within that same block. Changing a bit in one block may even affect later blocks if we have some sort of stream meta-cipher composed of block cipher transformations, like CBC.
Note that a stream cipher generally does not need data diffusion for strength, as does a block cipher. In a block cipher, it may be possible to separate individual components of the cipher if their separate effects are not hidden by diffusion. But a stream cipher generally re-uses the same transformation, and has no multiple data components to hide.
The classic stream cipher is very simple, consisting of a keyed random number generator which produces a random-like confusion sequence or running key. That sequence is then combined with plaintext data in a simple additive combiner to produce ciphertext.
When an exclusive-OR combiner is used, exactly the same construction will also decipher the ciphertext. But if The Opponents have some known-plaintext and associated ciphertext, they can easily produce the original confusion sequence. This, along with their expected knowledge of the cipher design, may allow them to attack and expose the confusion generator. If this is successful, it will, of course, break the system until the RNG is re-keyed.
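A minimal Python sketch of such a cipher and of the known-plaintext exposure; random.Random stands in for the keyed confusion generator and is not cryptographic:

  import random

  def xor_stream(data, key):
      rng = random.Random(key)                          # stand-in confusion generator
      return bytes(b ^ rng.randrange(256) for b in data)

  plaintext  = b"attack at dawn"
  ciphertext = xor_stream(plaintext, key=1234)
  assert xor_stream(ciphertext, key=1234) == plaintext  # same operation deciphers

  # Known plaintext plus ciphertext reveals the running key directly:
  confusion = bytes(p ^ c for p, c in zip(plaintext, ciphertext))
  rng = random.Random(1234)
  assert confusion == bytes(rng.randrange(256) for _ in plaintext)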
The ultimate stream cipher is the one-time pad, in which a really random sequence is never re-used. But if a sequence is re-used, The Opponent can generally combine the two ciphertexts, eliminating the confusion sequence, and producing the combined result of two plaintexts. Such a combination is normally easy to attack and penetrate.
The re-use of confusion sequence is extremely dangerous in a stream cipher design. In general, all stream cipher designs must use a message key to assure that the cipher is keyed with a random value for every new ciphering. This does, of course, expand the ciphertext by the size of the message key.
Another alternative in stream cipher design is to use a stronger combiner, such as Latin square or Dynamic Substitution combining. This can drastically reduce the complexity required in the confusion generator, which normally provides all stream cipher strength. Each of these stronger combiners is nonlinear, with substantial internal state, and the designer may elect to use multiple combinings in sequence, or a selection among different combiners. Neither of these approaches makes much sense with an additive combiner.
Cipher "strength" is often taken as an absolute universal negative, the simple non-existence of any attack which could succeed, assuming some level of attack resources. But this means that overall "strength" may be forever impossible to measure, because there is no hope of enumerating and evaluating every possible attack.
Because we have no tools for the discussion of strength under all possible attacks, cipher "strength" is normally discussed in the context of particular attacks. Each known attack approach can be elaborated for a particular cipher, and a value calculated for the effort required to break the cipher in that way; this may set an "upper bound" on the unknown strength of the cipher (although some "elaborations" are clearly better than others). And while this is certainly better than not knowing the strength with respect to known attacks, such attacks may not represent the actual threat to the cipher in the field. (A cipher may even be said to have different "contextual strengths," depending on the knowledge available to different Opponents.) In general, we never know the "lower bound" or "true" strength of a cipher. So, unless a cipher is shown to be weaker than we can accept, cryptanalysis provides no useful information about cipher strength.
It is sometimes argued that "our guys" are just as good as the Opponents, who thus could not break a cipher with less effort than we know. Or it is said that if a better break were known, that secret necessarily would get out. When viewed in isolation such statements are clearly false reasoning, yet these are the sort of assumptions that are often implicitly used to assert strength after cryptanalysis.
Since we cannot know the true situation, for a proper security analysis we must instead assume that our Opponents have more time, are better trained, are better equipped, and may even be smarter than our guys. Further, the Opponents are quite likely to function as a well-motivated group with a common goal and which can keep secrets; clearly, this is a far different situation than the usual academic cryptanalysis. So, again, cryptanalysis by "our guys" provides no information about the strength of the cipher as seen by our Opponents.
Technical strength is just one of the many possibilities for weakness in a cipher system, and perhaps even the least likely. It is surprisingly difficult to construct a cipher system without "holes," despite using good ciphers, and The Opponents get to exploit any overlooked problems. Users must be educated in security, and must actively keep secrets or there will be nothing to protect. In contrast, cryptanalysis is very expensive, success is never assured, and even many of the known attacks are essentially impossible in practice.
Nevertheless, it is a disturbing fact that we do not know and cannot guarantee a "true" strength for any cipher. But there are approaches which may reduce the probability of technical weakness and the extent of any loss:
In general, we can consider a cipher to be a large key-selected transformation between plaintext and ciphertext, with two main types of strength:
Strength is the effectiveness of fixed defense in the cryptography war. In real war, a strong defense might be a fortification at the top of a mountain which could only be approached on a single long and narrow path. Unfortunately, in real military action, time after time, making assumptions about what the opponent "could not" do turned out to be deadly mistakes. In cryptography we can at least imagine that someday we might prove that all approaches but one are actually impossible, and then guard that last approach; see mathematical cryptography.
It is sometimes convenient to see security as a fence around a restricted compound: We can beef up the front gate, and in some way measure that increase in "strength." But none of that matters if someone cuts through elsewhere, or tunnels under, or jumps over. Until we can produce a cipher design which reduces all the possible avenues of attack to exactly one, it will be very difficult to measure "strength."
One possibility might be to construct ciphers in layers of different puzzles: Now, the obvious point of having multiple puzzles is to require multiple solutions before the cipher is broken. But a perhaps less obvious point is to set up the design so that the solution to one puzzle requires The Opponent to commit (in an information sense) in a way that prevents the solution to the next puzzle.
Also see design strength, perfect secrecy, ideal secrecy, and security.
As introduced in Webster and Tavares:
"If a cryptographic function is to satisfy the strict avalanche criterion, then each output bit should change with a probability of one half whenever a single input bit is complemented." [p.524]
Webster, A. and S. Tavares. 1985. On the Design of S-Boxes. Advances in Cryptology -- CRYPTO '85. 523-534.
Although the SAC has tightened the understanding of "avalanche," even SAC can be taken too literally. Consider the scaled-down block cipher model of a small invertible keyed substitution table: Any input bit-change thus selects a different table element, and so produces a random new value (over all possible keys). But when we compare the new value with the old, we find that typically half the bits change, and sometimes all the bits change, but never is there no change at all. This is a tiny bias toward change.
If we have a 2-bit (4-element) table, there are 4 values, but after we take one as the original, there are only 3 changed values, not 4. We will see changes of 1 bit, 1 bit, and 2 bits. But this is a change expectation of 2/3 for each output bit, instead of exactly 1/2 as one might interpret from SAC. Although this bias is clearly size-related, its source is invertibility and the definition of change. Thus, even a large block cipher must have some bias, though it is unlikely that we could measure enough cases to see it. The point is that one can extend some of these definitions well beyond their intended role.
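For illustration, the 2/3 figure is easy to confirm by enumeration; a small Python sketch over all invertible 2-bit tables and every pair of distinct inputs:

  from itertools import permutations

  bits_changed = comparisons = 0
  for table in permutations(range(4)):            # all 24 invertible 2-bit tables
      for x in range(4):
          for other in range(4):
              if other != x:
                  diff = table[x] ^ table[other]  # compare old and new outputs
                  bits_changed += bin(diff).count("1")
                  comparisons += 1
  print(bits_changed / (2 * comparisons))         # 0.666...: a 2/3 change rate per bit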
Cryptography recognizes four types of substitution:
One of the advantages of S-P construction is that the "permutation" stage can be simply a re-arrangement of wires, taking almost no time. Such a stage is more clearly described as a limited set of "transpositions," rather than the more general "permutation" term. Since substitutions are also permutations (albeit with completely different costs and effects), one might fairly describe such a cipher as a "permutation-permutation cipher," which is not particularly helpful.
A disadvantage of the S-P construction is the need for special substitution patterns which support diffusion. S-P ciphers diffuse bit-changes across the block round-by-round; if one of the substitution table output bits does not change, then no change can be conducted to one of the tables in the next round, which has the effect of reducing the complexity of the cipher. Consequently, special tables are required in S-P designs, but even special tables can only reduce and not eliminate the effect. See Complete.
For the same range of input and output values, two invertible substitution tables differ only in the order or permutation of the values in the table. There are 256 factorial different byte-substitution tables, which is a keyspace of 1684 bits.
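That keyspace figure can be checked directly; a one-line Python sketch:

  from math import factorial, log2
  print(log2(factorial(256)))   # about 1684 bits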
A keyed simple substitution table of sufficient size is the ideal block cipher. Unfortunately, with 128-bit blocks being the modern minimum for strength, there would be 2^128 entries in that table, which is completely out of the question.
A keyed substitution table of practical size can only be thought of as a weak block cipher by itself, but it can be part of a combination of components which produce a stronger cipher. And since an invertible substitution table is the ideal tiny block cipher, it can be used for direct experimental comparison to a scalable block cipher of that same tiny size.
Suppose we consider a block cipher to be a key-selected permutation of the block values: One question of interest is whether our cipher construction could, if necessary, reach every possible permutation, the symmetric group.
It is now easy to construct large hardware or software systems which are almost unmanageably complex and never error-free. But a good design and development approach can produce systems with far fewer problems. One such approach is:
This is both easier and harder than it looks: there are many ways to decompose a large system, and finding an effective and efficient decomposition can take both experience and trial-and-error. But many of the possible decompositions define components which are less testable or even untestable, so the testability criterion greatly reduces the search.
Testing is no panacea: we cannot hope to find all possible bugs this way. But in practice we can hope to find 90 percent or more of the bugs simply by actually testing each component. (Component testing means that we are forced to think about what each component does, and about its requirements and limits. Then we have to make the realized component conform to those tests, which were based on our theoretical concepts. This will often expose problems, whether in the implementation, the tests, or the concepts.) By testing all components, when we put the system together, we can hope to avoid having to debug multiple independent problems simultaneously.
Other important system design concepts include:
Table Selection Combiner
Some amount of current change seems inevitable when switching occurs, and modern digital computation is based on such switching. But the amount of electromagnetic radiation emitted depends upon the amount of current switched, the length of the conductor, and the speed of the switching (that is, dI/dt, or the rate-of-change in current). In normal processing the amount of radiated energy is very small, but the value can be much larger when fast power drivers are used to send signals across cables of some length. This typically results in broadband noise which can be sensed with a shortwave receiver, a television, or an AM portable radio. Such receivers can be used to monitor attempts at improving the shielding.
Ideally, equipment would be fully enclosed in an electrically unbroken conducting surface. In practice, the conductive enclosure may be sheet metal or screening, with holes for shielded cables. Shielding occurs not primarily from metal per se, but instead from the flow of electrical current in that metal. When an electromagnetic wave passes through a conductive surface, it induces a current, and that current change creates a similar but opposing electromagnetic wave which nearly cancels the original. The metallic surface must conduct in all directions to properly neutralize waves at every location and from every direction.
Stock computer enclosures often have huge unshielded openings which are hidden by a plastic cover. These should be covered with metal plates or screening, making sure that good electrical contact occurs at all places around the edges. Note that assuring good electrical connections can be difficult with aluminum, which naturally forms a thin but hard and non-conductive surface oxide. It is important to actually monitor emission levels with receivers both before and after any change, and extreme success can be very difficult. We can at least make sure that the shielding is tight (that it electrically conducts to all the surrounding metal), that it is as complete as possible, and that external cables are effectively shielded.
Cable shielding extends the conductive envelope around signal wires and into the envelope surrounding the equipment the wire goes to. Any electromagnetic radiation from within a shield will tend to produce an opposing current in the shield conductor which will "cancel" the original radiation. But if a cable shield is not connected at both ends, no opposing current can flow, and no electromagnetic shielding will occur, despite having a metallic "shield" around the cable. It is thus necessary to assure that each external cable has a shield, and that the shield is connected to a conductive enclosure at both ends. (Note that some equipment may have an isolating capacitor between the shield and chassis ground to minimize "ground loop" effects when the equipment at each end of the cable connects to different AC sockets.) When shielding is impossible, it can be useful to place ferrite beads or rings around cables to promote a balanced and therefore essentially non-radiating signal flow.
Perhaps the most worrisome emitter on a personal computer is the display cathode ray tube (CRT). Here we have a bundle of three electron beams, serially modulated, with reasonable current, switching quickly, and repeatedly tracing the exact same picture typically 60 times a second. This produces a recognizable substantial signal, and the repetition allows each display point to be compared across many different receptions, thus removing noise and increasing the effective range of the unintended communication. All things being equal, a liquid-crystal display should radiate a far smaller and also more-complex signal than a desktop CRT.
Originally, a bipolar version with three terminals: Emitter (e), Collector (c), and Base (b). Current flow through the base-emitter junction (Ibe) is amplified by the current gain or beta (B) of the device in allowing current to flow through the collector-base junction and on through the emitter (Ice).
In a sense, a bipolar transistor consists of two back-to-back diodes: the base-collector junction (operated in reverse bias) and the base-emitter junction (operated in forward bias) which influence each other. Current through the base-emitter junction releases either electrons or "holes" which are then drawn to the collector junction by the higher potential there, thus increasing collector current. The current ratio between the base input and the collector output is amplification.
Field-Effect Transistors (FET's, as in MOSFET, etc.) have an extremely high input impedance, taking essentially no input current, and may be more easily fabricated in integrated circuits than bipolars. In an FET, Drain (d) and Source (s) contacts connect to a "doped" semiconductor channel. Extremely close to that channel, but still insulated from it, is a conductive area connected to a Gate (g) contact. Voltage on the gate creates an electrostatic field which interacts with current flowing in the drain-source channel, and can act to turn that current ON or OFF, depending on channel material (P or N), doping (enhancement or depletion), and gate polarity. Sometimes the drain and source terminals are interchangeable, and sometimes the source is connected to the substrate. Instead of an insulated gate, we can also have a reverse-biased diode junction, as in a JFET.
N-channel FET's generally work better than p-channel devices. JFET's can only have "depletion mode," which means that, with the gate grounded to the source, they are ON. N-channel JFET devices go OFF with a negative voltage on the gate. Normally, MOSFET devices are "enhancement mode" and are OFF with their gate grounded. N-channel MOSFET devices go ON with a positive voltage (0.5 to 5v) on the gate. Depletion mode n-channel MOSFET devices are possible, but not common.
In a true security sense, it is impossible to fully trust anyone: Everyone has their weaknesses, their oversights, their own agendas. But normally "trust" involves some form of commitment by the other party to keep any secrets that occur. Normally the other party is constrained in some way, either by their own self-interest, or by contractual, legal, or other consequences of the failure of trust. The idea that there can be any realistic trust between two people who have never met, are not related, have no close friends in common, are not in the same employ, and are not contractually bound, can be a very dangerous delusion. It is important to recognize that no trust is without limit, and those limits are precisely the commitment of the other party, bolstered by the consequences of betrayal. Trust without consequences is necessarily a very weak trust.
Unary
Given any two random Boolean sequences of the same length, we "expect" to find about half of the bits the same, and about half different. This means that the expected Hamming distance between two sequences is half their length.
With respect to Boolean function nonlinearity, the expected distance is not only what we expect, it is also the best we can possibly do, because each affine Boolean function comes in both complemented and uncomplemented versions. So if more than half the bits differ between a random function and one version, then less than half must differ to the other version. This makes the expected distance the ideal reference point for nonlinearity.
Since the FWT automatically produces the difference between the expected distance and the distance to each possible affine Boolean function (of the given length), I call this the unexpected distance. Each term is positive or negative, depending on which version is more correlated to the given sequence, and the absolute value of this is a measure of linearity. But since we generally want nonlinearity, we typically subtract the unexpected value from half the length of the sequence.
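A minimal Python sketch for a single Boolean function (one output bit of an S-box); the fwt helper is illustrative. Here the truth table is first mapped from 0/1 to +1/-1, so half of each resulting transform term is the signed unexpected distance to the corresponding affine function:

  def fwt(v):
      """In-place fast Walsh-Hadamard transform; len(v) must be a power of 2."""
      h = 1
      while h < len(v):
          for start in range(0, len(v), h * 2):
              for i in range(start, start + h):
                  a, b = v[i], v[i + h]
                  v[i], v[i + h] = a + b, a - b
          h *= 2
      return v

  truth_table = [0, 0, 0, 1, 1, 1, 1, 0]            # an example 3-input Boolean function
  spectrum = fwt([1 - 2 * b for b in truth_table])  # map 0/1 to +1/-1, then transform
  unexpected = max(abs(t) for t in spectrum) // 2   # largest unexpected distance
  nonlinearity = len(truth_table) // 2 - unexpected
  print(nonlinearity)                               # 2 for this function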
"If a secrecy system with a finite key is used, and N letters of cryptogram intercepted, there will be, for the enemy, a certain set of messages with certain probabilities, that this cryptogram could represent. As N increases the field usually narrows down until eventually there is a unique 'solution' to the cryptogram; one message with probability essentially unity while all others are practically zero. A quantity H(N) is defined, called the equivocation, which measures in a statistical way how near the average cryptogram of N letters is to a unique solution; that is, how uncertain the enemy is of the original message after intercepting a cryptogram of N letters." [p.659]
"This gives a way of calculating approximately how much intercepted material is required to obtain a solution to the secrecy system. It appears
. . . that with ordinary languages and the usual types of ciphers (not codes) this 'unicity distance' is approximately H(K)/D. Here H(K) is a number measuring the 'size' of the key space. If all keys are a priori equally likely, H(K) is the logarithm of the number of possible keys. D is the redundancy of the language . . . ." "In simple substitution with a random key H(K) is log10 26! or about 20 and D (in decimal digits per letter) is about .7 for English. Thus unicity occurs at about 30 letters." [p.660]
Shannon, C. 1949. Communication Theory of Secrecy Systems. Bell System Technical Journal. 28: 656-715.
A uniform distribution is the most important distribution in cryptography. For example, a cryptographer strives to make every possible plaintext an equally likely interpretation of any ciphertext (see ideal secrecy). A cryptographer also strives to make every possible key equally likely, given any amount of known plaintext.
On the other hand, a uniform distribution with respect to one quality is not necessarily uniform with respect to another. For example, while keyed shuffling can provably produce any possible permutation with equal probability (a uniform distribution of different tables), those tables will have a Boolean function nonlinearity distribution which is decidedly not uniform. And we might well expect a different non-uniform distribution for every different quality we measure.
Variable Size Block Cipher
A block cipher which supports ciphering in blocks of dynamically variable size. The block size may vary only in steps of some element size (for example, a byte), but blocks could be arbitrarily large.
Three characteristics distinguish a true variable size block cipher from designs which are merely imprecise about the size of block or element they support or the degree to which they support overall diffusion:
Also see Dynamic Substitution Combiner and Balanced Block Mixing.
Walsh Functions
Also see: Fast Walsh-Hadamard Transform.
White noise is normally described as a relative power density in volts squared per hertz. White noise power varies directly with bandwidth, so white noise would have twice as much power in the next higher octave as in the current one. The introduction of a white noise audio signal can destroy high-frequency loudspeakers.
XOR