TUCoPS :: Crypto :: glossary.htm

Ritter's Crypto Glossary
Ritter's Crypto Glossary and Dictionary of Technical Cryptography

Ritter's Crypto Glossary and
Dictionary of Technical Cryptography

Technical Cryptographic Terms Explained

Hyperlinked definitions and discussions of many cryptographic, mathematic, logic, statistics, and electronics terms used in cipher construction and analysis.

A Ciphers By Ritter Page

Terry Ritter

Current Edition: 1999 Jan 19

For a basic introduction to cryptography, see Learning About Cryptography. Please feel free to send comments and suggestions for improvement to: ritter@io.com. You may wish to help support this work by patronizing Ritter's Crypto Bookshop.


Absolute, AC, Additive Combiner, Additive RNG, Affine, Affine Boolean Function, Alphabet, Alternative Hypothesis, Amplifier, Amplitude, Analog, AND, ASCII, Associative, Asymmetric Cipher, Attack, Augmented Repetitions, Authentication, Authenticating Block Cipher, Autokey, Avalanche, Avalanche Effect
Back Door, Balance, Balanced Block Mixer, Balanced Block Mixing, Balanced Combiner, Base-64, Bel, Bent Function, Bernoulli Trials, Bijective, Binary, Binomial Distribution, Birthday Attack, Birthday Paradox, Bit, Block, Block Cipher, Block Size, Boolean, Boolean Function, Boolean Function Nonlinearity, Boolean Logic, Boolean Mapping, Break, Brute Force Attack, Bug, Byte
Capacitor, CBC, c.d.f., CFB, Chain, Chaos, Chi-Square, Cipher, Cipher Taxonomy, Ciphering, Ciphertext, Ciphertext Expansion, Ciphony, Circuit, Clock, Closed, Code, Codebook, Codebook Attack, Combination, Combinatoric, Combiner, Commutative, Complete, Component, Computer, Conductor, Confusion, Confusion Sequence, Congruence, Contextual, Conventional Cipher, Convolution, Correlation, Correlation Coefficient, CRC, Cryptanalysis, Cryptanalyst, Cryptographer, Cryptographic Mechanism, Cryptography, Cryptography War, Cryptology, Current
dB, DC, Debug, Decipher, Decryption, Deductive Reasoning, Defined Plaintext Attack, Degrees of Freedom, DES, Decibel, Decimal, Design Strength, Deterministic, Dictionary Attack, Differential Cryptanalysis, Diffusion, Digital, Diode, Distribution, Distributive, Divide and Conquer, Domain, Dyadic, Dynamic Keying, Dynamic Substitution Combiner, Dynamic Transposition
ECB, Electric Field, Electromagnetic Field, Electronic, Encipher, Encryption, Entropy, Ergodic, Extractor, Exclusive-OR
Factorial, Fallacy, Fast Walsh Transform, FCSR, Feistel Construction, Fenced DES, Fencing, Fencing Layer, FFT, Field, Finite Field, Flip-Flop, Fourier Series, Fourier Theorem, Fourier Transform, Frequency, Function, FWT
Gain, Galois Field, Gate, GF 2n, Goodness of Fit, Group
Hamming Distance, Hardware, Hash, Hexadecimal (Hex), Homophonic, Homophonic Substitution
IDEA, Ideal Secrecy, i.i.d., Inductive Reasoning, Inductor, Injective, Insulator, Integer, Intermediate Block, Interval, Into, Inverse, Invertible, Involution, Irreducible, IV
KB, Kb, Kerckhoff's Requirements, Key, Key Distribution Problem, Keyspace, Keyed Substitution, Known Plaintext Attack, Kolmogorov-Smirnov
Latency, Latin Square, Latin Square Combiner, Layer, LFSR, Linear, Linear Complexity, Linear Feedback Shift Register, Linear Logic Function, Logic, Logic Function, LSB
M-Sequence, Machine Language, Magnetic Field, Man-in-the-Middle Attack, Mapping, Markov Process, Mathematical Cryptography, Maximal Length, MB, Mb, Mechanism, Mechanistic Cryptography, Mersenne Prime, Message Digest, Message Key, MITM, Mixing, Mixing Cipher, Mod 2, Mod 2 Polynomial, Mode, Modulo, Monadic, Monoalphabetic Substitution, Monographic, Multiple Encryption
Nominclator, Nominal, Nonlinearity, NOT, Null Hypothesis
Object Code, Objective, Octal, Octave, OFB, One Time Pad, One-To-One, One Way Diffusion, Onto, Opcode, Operating Mode, Opponent, OR, Order, Ordinal, Orthogonal, Orthogonal Latin Squares, OTP, Overall Diffusion
Padding, Password, Patent, Patent Infringement, Perfect Secrecy, Permutation, PGP, Physically Random, Pink Noise, Plaintext, Poisson Distribution, Polyalphabetic Combiner, Polyalphabetic Substitution, Polygram Substitution, Polygraphic, Polynomial, Polyphonic, Population, Population Estimation, Power, Primitive, Primitive Polynomial, Prime, Prior Art, PRNG, Process, Pseudorandom, Public Key Cipher
Random, Random Number Generator, Random Variable, Range, Really Random, Relay, Research Hypothesis, Resistor, Ring, Root, RMS, Root Mean Square, RNG, Round, RSA, Running Key
Salt, Sample, S-Box, Scalable, Secrecy, Secret Code, Secret Key Cipher, Security, Security Through Obscurity, Semiconductor, Semigroup, Session Key, Set, Shift Register, Shuffle, Sieve of Eratosthenes, Significance, Simple Substitution, Software, Source Code, State, Stationary Process, Statistic, Statistics, Steganography, Stochastic, Stream Cipher, Strength, Strict Avalanche Criterion (SAC), Subjective, Substitution, Substitution-Permutation, Substitution Table, Superencryption, Surjective, Switch, Switching Function, Symmetric Cipher, Symmetric Group, System, System Design
Table Selection Combiner, TEMPEST, Transformer, Transistor, Transposition, Trap Door, Triple DES, Truly Random, Trust, Truth Table, Type I Error, Type II Error
Unary, Unexpected Distance, Unicity Distance, Uniform Distribution
Variable Size Block Cipher, Voltage
Walsh Functions, Weight, Whitening White Noise Wire

In the study of logic, something observed similarly by most observers, or something agreed upon, or which has the same value each time measured. Something not in dispute, unarguable, and independent of other state. As opposed to contextual.

Alternating Current: Electrical power which repeatedly reverses direction of flow. As opposed to DC.

Generally used for power distribution because the changing current supports the use of transformers. Utilities can thus transport power at high voltage and low current, which minimize "ohmic" or I2R losses. The high voltages are then reduced at power substations and again by pole transformers for delivery to the consumer.

Additive Combiner
An additive combiner uses numerical concepts similar to addition to mix multiple values into a single result.

One example is byte addition modulo 256, which simply adds two byte values, each in the range 0..255, and produces the remainder after division by 256, again a value in the byte range of 0..255. Subtraction is also an "additive" combiner.

Another example is bit-level exclusive-OR which is addition mod 2. A byte-level exclusive-OR is a polynomial addition.

Additive RNG
(Additive random number generator.) A LFSR-based RNG typically using multi-bit elements and integer addition (instead of XOR) combining. References include:
Knuth, D. 1981. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. 2nd ed. 26-31. Addison-Wesley: Reading, Massachusetts.
Marsaglia, G. and L. Tsay. 1985. Matrices and the Structure of Random Number Sequences. Linear Algebra and its Applications. 67: 147-156.

Advantages include:

  • A long, mathematically proven cycle length.
  • Especially efficient software implementations.
  • Almost arbitrary initialization (some element must have its least significant bit set).
  • A simple design which is easy to get right.

In addition, a vast multiplicity of independent cycles has the potential of confusing even a "quantum computer," should such a thing become possible.

   For Degree-n Primitive, and Bit Width w

   Total States:       2nw

   Non-Init States:    2n(w-1)

   Number of Cycles:   2(n-1)(w-1)

   Length Each Cycle:  (2n-1)2(w-1)

   Period of LSB:      2n-1

The binary addition of two bits with no carry input is just XOR, so the lsb of an Additive RNG has the usual maximal length period.

A degree-127 Additive RNG using 127 elements of 32 bits each has 24064 unique states. Of these, 23937 are disallowed by initialization (the lsb's are all "0") but this is just one unusable state out of 2127. There are still 23906 cycles which each have almost 2158 steps. (The Cloak2 stream cipher uses an Additive RNG with 9689 elements of 32 bits, and so has 2310048 unique states. These are mainly distributed among 2300328 different cycles with almost 29720 steps each.)

Note that any LFSR, including the Additive RNG, is very weak when used alone. But when steps are taken to hide the sequence (such as using a jitterizer and Dynamic Substitution combining) the result can have significant strength.

Generally speaking, linear. Sometimes affine generalizes "linearity" to expressions of multiple independent variables, with only a single-variable expression being called "linear." From analytic and algebraic geometry.

Assume the flat plane defined by two arbitrary unit vectors e1, e2 and a common origin O; this is a coordinate "frame." Assume a grid of lines parallel to each frame vector, separated by unit lengths (a "metric" which may differ for each vector). If the vectors happen to be perpendicular, we have a Cartesian coordinate system, but in any case we can locate any point on the plane by its position on the grid.

An affine transformation can change the origin, the angle between the vectors, and unit vector lengths. Shapes in the original frame thus become "pinched," "squashed" or "stretched" images under the affine transformation. This same sort of thing generalizes to higher degree expressions.

The Handbook of Mathematics says that if e1, e2, e3 are linearly independent vectors, any vector a can be expressed uniquely in the form a = a1e1 + a2e2 + a3e3 where the ai are the affine coordinates. (p.518)

The VNR Concise Encyclopedia of Mathematics says "All transformations that lead to a uniquely soluble system of linear equations are called affine transformations." (p.534)

Affine Boolean Function
A Boolean function which can be represented in the form:
anxn + an-1xn-1 + ... + a1x1 + a0
where the operations are mod 2: addition is Exclusive-OR, and multiplication is AND.

Note that all of the variables xi are to the first power only, and each coefficient ai simply enables or disables its associated variable. The result is a single Boolean value, but the constant term a0 can produce either possible output polarity.

Here are all possible 3-variable affine Boolean functions (each of which may be inverted by complementing the constant term):

     affine    truth table

          c    0  0  0  0  0  0  0  0

         x0    0  1  0  1  0  1  0  1

      x1       0  0  1  1  0  0  1  1

      x1+x0    0  1  1  0  0  1  1  0

   x2          0  0  0  0  1  1  1  1

   x2+   x0    0  1  0  1  1  0  1  0

   x2+x1       0  0  1  1  1  1  0  0

   x2+x1+x0    0  1  1  0  1  0  0  1

The set of symbols under discussion.

Alternative Hypothesis
In statistics, the statement formulated so that the logically contrary statement, the null hypothesis H0 has a test statistic with a known distribution for the case when there is nothing unusual to detect. Also called the research hypothesis H1, and logically identical to "NOT-H0" or "H0 is not true."

a component or device intended to sense a signal and produce a larger version of that signal. In general, any amplifying device is limited by available power, frequency response, and device maximums for voltage, current, and power dissipation.

Transistors are analog amplifiers which are basically linear over a reasonable range and so require DC power. In contrast, Relays are classically mechanical devices with direct metal-to-metal moving connections, and so can handle generally higher power and AC current.

The signal level, or height.

Pertaining to continuous values. As opposed to digital or discrete quantities.

A Boolean logic function which is also mod 2 multiplication.

A public code for converting between 7-bit values 0..127 (or 00..7f hex) and text characters. ASCII is an acronym for American Standard Code for Information Interchange.


  0  00  ^@  NUL     32  20  SPC     64  40  @      96  60  '

  1  01  ^A  SOH     33  21   !      65  41  A      97  61  a

  2  02  ^B  STX     34  22   "      66  42  B      98  62  b

  3  03  ^C  ETX     35  23   #      67  43  C      99  63  c

  4  04  ^D  EOT     36  24   $      68  44  D     100  64  d

  5  05  ^E  ENQ     37  25   %      69  45  E     101  65  e

  6  06  ^F  ACK     38  26   &      70  46  F     102  66  f

  7  07  ^G  BEL     39  27   '      71  47  G     103  67  g

  8  08  ^H  BS      40  28   (      72  48  H     104  68  h

  9  09  ^I  HT      41  29   )      73  49  I     105  69  i

 10  0a  ^J  LF      42  2a   *      74  4a  J     106  6a  j

 11  0b  ^K  VT      43  2b   +      75  4b  K     107  6b  k

 12  0c  ^L  FF      44  2c   ,      76  4c  L     108  6c  l

 13  0d  ^M  CR      45  2d   -      77  4d  M     109  6d  m

 14  0e  ^N  SO      46  2e   .      78  4e  N     110  6e  n

 15  0f  ^O  SI      47  2f   /      79  4f  O     111  6f  o

 16  10  ^P  DLE     48  30   0      80  50  P     112  70  p

 17  11  ^Q  DC1     49  31   1      81  51  Q     113  71  q

 18  12  ^R  DC2     50  32   2      82  52  R     114  72  r

 19  13  ^S  DC3     51  33   3      83  53  S     115  73  s

 20  14  ^T  DC4     52  34   4      84  54  T     116  74  t

 21  15  ^U  NAK     53  35   5      85  55  U     117  75  u

 22  16  ^V  SYN     54  36   6      86  56  V     118  76  v

 23  17  ^W  ETB     55  37   7      87  57  W     119  77  w

 24  18  ^X  CAN     56  38   8      88  58  X     120  78  x

 25  19  ^Y  EM      57  39   9      89  59  Y     121  79  y

 26  1a  ^Z  SUB     58  3a   :      90  5a  Z     122  7a  z

 27  1b  ^[  ESC     59  3b   ;      91  5b  [     123  7b  {

 28  1c  ^\  FS      60  3c   <      92  5c  \     124  7c  |

 29  1d  ^]  GS      61  3d   =      93  5d  ]     125  7d  }

 30  1e  ^^  RS      62  3e   >      94  5e  ^     126  7e

 31  1f  ^_  US      63  3f   ?      95  5f  _     127  7f  DEL

A dyadic operation in which two sequential operations on three arguments can first operate on either the first two or the last two arguments, producing the same result in either case: (a + b) + c = a + (b + c).

Also see: commutative and distributive.

Asymmetric Cipher
A public key cipher.

General ways in which a cryptanalyst may try to "break" or penetrate the secrecy of a cipher. These are not algorithms; they are just approaches as a starting place for constructing specific algorithms.

Classically, attacks were neither named nor classified; there was just: "here is a cipher, and here is the attack." And while this gradually developed into named attacks, there is no overall attack taxonomy. Currently, attacks are often classified by the information available to the attacker or constraints on the attack, and then by strategies which use the available information. Not only ciphers, but also cryptographic hash functions can be attacked, generally with very different strategies.

Informational Constraints

We are to attack a cipher which enciphers plaintext into ciphertext or deciphers the opposite way, under control of a key. The available information necessarily constrains our attack strategies.

  • Ciphertext Only: We have only ciphertext to work with. Sometimes the statistics of the ciphertext provide insight and can lead to a break.
  • Known Plaintext: We have some, or even an extremely large amount, of plaintext and the associated ciphertext.
  • Defined Plaintext: We can submit arbitrary messages to be ciphered and capture the resulting ciphertext. (Also Chosen Plaintext and Adaptive Chosen Plaintext.)
  • Defined Ciphertext: We can submit arbitrary messages to be deciphered and see the resulting plaintext. (Also Chosen Ciphertext and Adaptive Chosen Ciphertext.)
  • Chosen Key: We can specify a change in any particular key bit, or some other relationship between keys.
  • Timing: We can measure the duration of ciphering operations and use that to reveal the key or data.
  • Fault Analysis: We can induce random faults into the ciphering machinery, and use those to expose the key.
  • Man-in-the-Middle: We can subvert the routing capabilities of a computer network, and pose as the other side to each of the communicators. (Usually a key authentication attack on public key systems.)

Attack Strategies

The goal of an attack is to reveal some unknown plaintext, or the key (which will reveal the plaintext). An attack which succeeds with less effort than a brute-force search we call a break. An "academic" ("theoretical," "certificational") break may involve impractically large amounts of data or resources, yet still be called a "break" if the attack would be easier than brute force. (It is thus possible for a "broken" cipher to be much stronger than a cipher with a short key.) Sometimes the attack strategy is thought to be obvious, given a particular informational constraint, and is not further classified.

  • Brute Force (also Exhaustive Key Search): Try to decipher ciphertext under every possible key until readable messages are produced. (Also "brute force" any searchable-size part of a cipher.)
  • Codebook (the classic "codebreaking" approach): Collect a codebook of transformations between plaintext and ciphertext.
  • Differential Cryptanalysis: Find a statistical correlation between key values and cipher transformations (typically the Exclusive-OR of text pairs), then use sufficient defined plaintext to develop the key.
  • Linear Cryptanalysis: Find a linear approximation to the keyed S-boxes in a cipher, and use that to reveal the key.
  • Meet-in-the-Middle: Given a two-level multiple encryption, search for the keys by collecting every possible result for enciphering a known plaintext under the first cipher, and deciphering the known ciphertext under the second cipher; then find the match.
  • Key Schedule: Choose keys which produce known effects in different rounds.
  • Birthday (usually a hash attack): Use the birthday paradox, the idea that it is much easier to find two values which match than it is to find a match to some particular value.
  • Formal Coding (also Algebraic): From the cipher design, develop equations for the key in terms of known plaintext, then solve those equations.
  • Correlation: In a stream cipher, distinguish between data and confusion, or between different confusion streams, from a statistical imbalance in a combiner.
  • Dictionary: Form a list of the most-likely keys, then try those keys one-by-one (a way to improve brute force).
  • Replay: Record and save some ciphertext blocks or messages (especially if the content is known), then re-send those blocks when useful.

Many attacks try to isolate unknown small components or aspects so they can be solved separately, a process known as divide and conquer. Also see: security.

Augmented Repetitions
When sampling with replacement, eventually we again find some object or value which has been found before. We call such an occurrence a "repetition." A value found exactly twice is a double, or "2-rep"; a value found three times is a triple or "3-rep," and so on.

For a known population, the number of repetitions expected at each level has long been understood to be a binomial expression. But if we are sampling in an attempt to establish the effective size of an unknown population, we have two problems:

  1. The binomial equations which predict expected repetitions do not reverse well to predict population, and
  2. Exact repetitions discard information and so are less accurate than we would like. For example, if we have a double and then find another of that value, we now have a triple, and one less double. So if we are using doubles to predict population, the occurrence of a triple influences the predicted population in exactly the wrong direction.

Fortunately, there is an unexpected and apparently previously unknown combinatoric relationship between the population and the number of combinations of occurrences of repeated values. This allows us to convert any number of triples and higher n-reps to the number of 2-reps which have the same probability. So if we have a double, and then get another of the same value, we have a triple, which we can convert into three 2-reps. The total number of 2-reps from all repetitions (the augmented 2-reps value) is then used to predict population.

We can relate the number of samples s to the population N through the expected number of augmented doubles Ead:

     Ead(N,s) = s(s-1) / 2N .

This equation is exact, provided we interpret all the exact n-reps in terms of 2-reps. For example, a triple is interpreted as three doubles; the augmentation from 3-reps to 2-reps is (3 C 2) or 3. The augmented result is the sum of the contributions from all higher repetition levels:

           n    i

     ad = SUM  ( ) r[i] .

          i=2   2

where ad is the number of augmented doubles, and r[i] is the exact repetition count at the i-th level.

And this leads to an equation for predicting population:

     Nad(s,ad) = s(s-1) / 2 ad .

This predicts the population Nad as based on a mean value of augmented doubles ad. Clearly, we expect the number of samples to be far larger than the number of augmented doubles, but an error in the augmented doubles ad should produce a proportionally similar error in the predicted population Nad. We typically develop ad to high precision by averaging the results of many large trials.

However, since the trials should have approximately a simple Poisson distribution (which has only a single parameter), we could be a bit more clever and fit the results to the expected distribution, thus perhaps developing a bit more accuracy.

Also see the article: Estimating Population from Repetitions in Accumulated Random Samples, and the Population Estimation Worksheets in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.

One of the objectives of cryptography: Assurance that a message has not been modified in transit or storage (message authentication or message integrity). Also key authentication for public keys. Also user or source identification, which may verify the right to send the message in the first place.

Message Authentication

One form of message authentication computes a CRC hash across the plaintext data, and appends the CRC remainder (or result) to the plaintext data: this adds a computed redundancy to an arbitrary message. The CRC result is then enciphered along with the data. When the message is deciphered, if a second CRC operation produces the same result, the message can be assumed unchanged.

Note that a CRC is a fast, linear hash. Messages with particular CRC result values can be constructed rather easily. However, if the CRC is hidden behind strong ciphering, an Opponent is unlikely to be able to change the CRC value systematically or effectively. In particular, this means that the CRC value will need more protection than a simple exclusive-OR stream cipher or the exclusive-OR approach to handling short last blocks in a block cipher.

A similar approach to message authentication uses a nonlinear cryptographic hash function. These also add a computed redundancy to the message, but generally require significantly more computation than a CRC. It is thought to be exceedingly difficult to construct messages with a particular cryptographic hash result, so the hash result perhaps need not be hidden by encryption.

One form of cryptographic hash is DES CBC mode: using a key different than that used for encryption, the final block of ciphertext is the hash of the message. This obviously doubles the computation when both encryption and authentication are needed. And since any cryptographic hash is vulnerable to birthday attacks, the small 64-bit block size implies that we should be able to find two different messages with the same hash value by constructing and hashing "only" about 232 different messages.

Another approach to message authentication is to use an authenticating block cipher; this is often a block cipher which has a large block, with some "extra data" inserted in an "authentication field" as part of the plaintext before enciphering each block. The "extra data" can be some transformation of the key, the plaintext, and/or a sequence number. This essentially creates a homophonic block cipher: If we know the key, many different ciphertexts will produce the same plaintext field, but only one of those will have the correct authentication field.

The usual approach to authentication in a public key cipher is to encipher with the private key. The resulting ciphertext can then be deciphered by the public key, which anyone can know. Since even the wrong key will produce a "deciphered" result, it is also necessary to identify the resulting plaintext as a valid message; in general this will also require redundancy in the form of a hash value in the plaintext. The process provides no secrecy, but only a person with access to the private key could have enciphered the message.

User Authentication

The classical approach to user authentication is a password; this is "something you know." One can also make use of "something you have" (such as a secure ID card), or "something you are" (biometrics).

The classic problem with passwords is that they must be remembered by ordinary people, and so carry a limited amount of uniqueness. Easy-to-remember passwords are often common language phrases, and so often fall to a dictionary attack. More modern approaches involve using a Diffie-Hellman key exchange, plus the password, thus minimizing exposure to a dictionary attack. This does require a program on the user end, however.

Key Authentication

In secret key ciphers, key authentication is inherent in secure key distribution.

In public key ciphers, public keys are exposed and often delivered insecurely. But someone who uses the wrong key may unknowingly have "secure" communications with an Opponent, as in a man-in-the-middle attack. It is thus absolutely crucial that public keys be authenticated or certified as a separate process. Normally this implies the need for a Certification Authority or CA.

Authenticating Block Cipher
A block cipher mechanism which inherently contains an authentication value or field.

A cipher whose key is produced by message data. One common form is "ciphertext feedback," where ciphertext is "fed back" into the state of the random number generator used to produce the confusion sequence for a stream cipher.

The observed property of a block cipher constructed in layers or "rounds" with respect to a tiny change in the input. The change of a single input bit generally produces multiple bit-changes after one round, many more bit-changes after another round, until, eventually, about half of the block will change. An analogy is drawn to an avalanche in snow, where a small initial effect can lead to a dramatic result. As originally described by Feistel:
"As the input moves through successive layers the pattern of 1's generated is amplified and results in an unpredictable avalanche. In the end the final output will have, on average, half 0's and half 1's . . . ." [p.22]

Feistel, H. 1973. Cryptography and Computer Privacy. Scientific American. 228(5): 15-23.

Also see mixing, diffusion, overall diffusion, strict avalanche criterion, complete, S-box, and the bit changes section of the Ciphers By Ritter / JavaScript computation pages.

Avalanche Effect
The result of avalanche. As described by Webster and Tavares:
"For a given transformation to exhibit the avalanche effect, an average of one half of the output bits should change whenever a single input bit is complemented." [p.523]

Webster, A. and S. Tavares. 1985. On the Design of S-Boxes. Advances in Cryptology -- CRYPTO '85. 523-534.

Also see the bit changes section of the Ciphers By Ritter / JavaScript computation pages.

Back Door

A cipher design fault, planned or accidental, which allows the apparent strength of the design to be easily avoided by those who know the trick. When the design background of a cipher is kept secret, a back door is often suspected. Similar to trap door.

A term used in S-box and Boolean function analysis. As described by Lloyd:
"A function is balanced if, when all input vectors are equally likely, then all output vectors are equally likely."

Lloyd, S. 1990. Properties of binary functions. Advances in Cryptology -- EUROCRYPT '90. 124-139.

There is some desire to generalize this definition to describe multiple-input functions. (Is a function "balanced" if, for one value on the first input, all output values can be produced, but for another value on the first input, only some output values are possible?) Presumably a two-input balanced function would be balanced for either input fixed at any value, which would essentially be a Latin square or a Latin square combiner.

Balanced Block Mixer
A process or any implementation (for example, hardware, computer software, hybrids, or the like) for performing Balanced Block Mixing.

Balanced Block Mixing
The block mixing mechanism described in U.S. Patent 5,623,549 (see the BBM articles on the Ciphers By Ritter page).

A Balanced Block Mixer is an m-input-port m-output-port mechanism with various properties:

  1. The overall mapping is one-to-one and invertible: Every possible input value (over all ports) to the mixer produces a different output value (including all ports), and every possible output value is produced by a different input value;

  2. Each output port is a function of every input port;

  3. Any change to any one of the input ports will produce a change to every output port;

  4. Stepping any one input port through all possible values (while keeping the other input ports fixed) will step every output port through all possible values.

If we have a two port mixer, with input ports labeled A and B, output ports labeled X and Y, and some irreducible mod 2 polynomial p of degree appropriate to the port size, a Balanced Block Mixer is formed by the equations:

X = 3A + 2B (mod 2)(mod p),
Y = 2A + 3B (mod 2)(mod p).

This particular BBM is a self-inverse or involution, and so can be used without change whether enciphering or deciphering. One possible value for p for mixing 8-bit values is 100011011.

Balanced Block Mixing functions probably should be thought of as orthogonal Latin squares. For example, here is a tiny nonlinear "2-bit" BBM:

   3 1 2 0   0 3 2 1       30  13  22  01

   0 2 1 3   2 1 0 3   =   02  21  10  33

   1 3 0 2   1 2 3 0       11  32  03  20

   2 0 3 1   3 0 1 2       23  00  31  12

Suppose we wish to mix (1,3); 1 selects the second row up in both squares, and 3 selects the rightmost column, thus selecting (2,0) as the output. Since there is only one occurrence of (2,0) among all entry pairs, this discrete mixing function is reversible, as well as being balanced on both inputs.

Cryptographic advantages of balanced block mixing include the fact that each output is always balanced with respect to either input, and that no information is lost in the mixing. This allows us to use balanced block mixing as the "butterfly" operations in a fast Walsh-Hadamard transform or the well-known FFT. By using the mixing patterns of these transforms, we can mix 2n elements such that each input is guaranteed to affect each and every output in a balanced way. And if we use keying to generate the tables, we can have a way to mix huge blocks in small nonlinear mixing tables with overall mixing guarantees.

Also see Mixing Cipher, Dynamic Substitution Combiner, Variable Size Block Cipher, and the Active Balanced Block Mixing in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.

Balanced Combiner
In the context of cryptography, a combiner mixes two input values into a result value. A balanced combiner must provide a balanced relationship between each input and the result.

In a statically-balanced combiner, any particular result value can be produced by any value on one input, simply by selecting some appropriate value for the other input. In this way, knowledge of only the output value provides no information -- not even statistical information -- about either input.

The common examples of cryptographic combiner, including byte exclusive-OR (mod 2 polynomial addition), byte addition (integer addition mod 256), or other "additive" combining, are perfectly balanced. Unfortunately, these simple combiners are also very weak, being inherently linear and without internal state.

A Latin square combiner is an example of a statically-balanced reversible nonlinear combiner with massive internal state. A Dynamic Substitution Combiner is an example of a dynamically or statistically-balanced reversible nonlinear combiner with substantial internal state.

A public code for converting between 6-bit values 0..63 (or 00..3f hex) and text symbols accepted by most computers:

        0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f

   0    A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P

   1    Q  R  S  T  U  V  W  X  Y  Z  a  b  c  d  e  f

   2    g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v

   3    w  x  y  z  0  1  2  3  4  5  6  7  8  9  +  /

   use "=" for padding

The base-10 logarithm of the ratio of two power values (which is also the same as the difference between the log of each power value). The basis for the more-common term decibel: One bel equals 10 decibels.

Bent Function
A bent function is a Boolean function whose fast Walsh transform has the same absolute value in each term (except, possibly, the zeroth). This means that the bent function has the same distance from every possible affine Boolean function.

We can do FWT's in "the bottom panel" at the end of Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.

Here is every bent sequence of length 4, first in {0,1} notation, then in {1,-1} notation, with their FWT results:

   bent {0,1}       FWT           bent {1,-1}       FWT

   0  0  0  1    1 -1 -1  1        1  1  1 -1    2  2  2 -2

   0  0  1  0    1  1 -1 -1        1  1 -1  1    2 -2  2  2

   0  1  0  0    1 -1  1 -1        1 -1  1  1    2  2 -2  2

   1  0  0  0    1  1  1  1       -1  1  1  1    2 -2 -2 -2

   1  1  1  0    3  1  1 -1       -1 -1 -1  1   -2 -2 -2  2

   1  1  0  1    3 -1  1  1       -1 -1  1 -1   -2  2 -2  2

   1  0  1  1    3  1 -1  1       -1  1 -1 -1   -2 -2  2 -2

   0  1  1  1    3 -1 -1 -1        1 -1 -1 -1   -2  2  2  2

These sequences, like all true bent sequences, are not balanced, and the zeroth element of the {0,1} FWT is the number of 1's in the sequence.

Here are some bent sequences of length 16:

   bent {0,1}    0 1 0 0   0 1 0 0   1 1 0 1   0 0 1 0

    FWT          6,-2,2,-2,2,-2,2,2,-2,-2,2,-2,-2,2,-2,-2

   bent {1,-1}   1 -1 1 1   1 -1 1 1   -1 -1 1 -1   1 1 -1 1

    FWT          4,4,-4,4,-4,4,-4,-4,4,4,-4,4,4,-4,4,4

   bent {0,1}    0 0 1 0   0 1 0 0   1 0 0 0   1 1 1 0

    FWT          6,2,2,-2,-2,2,-2,2,-2,-2,-2,-2,2,2,-2,-2

   bent {1,-1}   1 1 -1 1   1 -1 1 1   -1 1 1 1   -1 -1 -1 1

    FWT          4,-4,-4,4,4,-4,4,-4,4,4,4,4,-4,-4,4,4

Bent sequences are said to have the highest possible uniform nonlinearity. But, to put this in perspective, recall that we expect a random sequence of 16 bits to have 8 bits different from any particular sequence, linear or otherwise. That is also the maximum possible nonlinearity, and here we actually get a nonlinearity of 6.

There are various more or less complex constructions for these sequences. In most cryptographic uses, bent sequences are modified slightly to achieve balance.

Bernoulli Trials
In statistics, observations or sampling with replacement which has exactly two possible outcomes, typically called "success" and "failure." Bernoulli trials have these characteristics:
  • Each trial is independent,
  • Each outcome is determined only by chance, and
  • The probability of success is fixed.

Bernoulli trials have a Binomial distribution.

A mapping f: X -> Y which is both one-to-one and onto. For each unique x in X there is corresponding unique y in Y. An invertible mapping function.

From the Latin for "dual" or "pair." Dominantly used to indicate "base 2": The numerical representation in which each digit has an alphabet of only two symbols: 0 and 1. This is just one particular coding or representation of a value which might otherwise be represented (with the exact same value) as octal (base 8), decimal (base 10), or hexadecimal (base 16). Also see bit and Boolean.

Possibly also the confusing counterpart to unary when describing the number of inputs or arguments to a function, but dyadic is almost certainly a better choice.

Binomial Distribution
In statistics, the probability of finding exactly k successes in n independent Bernoulli trials, when each trial has success probability p:

               n    k      n-k

   P(k,n,p) = ( )  p  (1-p)


This ideal distribution is produced by evaluating the probability function for all possible k, from 0 to n.

If we have an experiment which we think should produce a binomial distribution, and then repeatedly and systematically find very improbable test values, we may choose to reject the null hypothesis that the experimental distribution is in fact binomial.

Also see the binomial section of the Ciphers By Ritter / JavaScript computation pages.

Birthday Attack
A form of attack in which it is necessary to obtain two identical values from a large population. The "birthday" part is the realization that it is far easier to find an arbitrary matching pair than to match any particular value. Often a hash attack.

Also see: birthday paradox.

Birthday Paradox
The apparent paradox that, in a schoolroom of only 23 students, there is a 50 percent probability that at least two will have the same birthday. The "paradox" is that we have an even chance of success with at most 23 different days represented.

The "paradox" is resolved by noting that we have a 1/365 chance of success for each possible pairing of students, and there are 253 possible pairs or combinations of 23 things taken 2 at a time. (To count the number of pairs, we can choose any of the 23 students as part of the pair, then any of the 22 remaining students as the other part. But this counts each pair twice, so we have 23 * 22 / 2 = 253 different pairs.)

We can compute the overall probability of success from the probability of failure (1 - 1/365 = 0.99726) multiplied by itself for each pair. The overall probability of failure is thus 0.99726253 (0.99726 to the 253rd power) or 0.4995. So the success probability for 253 pairs is 0.5005.

We can relate the probability of finding at least one "double" of some birthday (Pd) to the expected number of doubles (Ed) as:

   Pd = 1 - e-Ed  ,


   Ed = -Ln( 1 - Pd )


   365 * -Ln( 0.5 ) = 365 * 0.693 = 253  .

Also see: Estimating Population from Repetitions in Accumulated Random Samples, my "birthday" article.

A contraction of "binary digit." The smallest possible unit of information. A Boolean value: True or False; Yes or No; one or zero; Set or Cleared. Virtually all information to be communicated or stored digitally is coded in some way which fundamentally relies on individual bits. Alphabetic characters are often stored in eight bits, which is a byte.

Some amount of data treated as a single unit. For example, the DES block cipher has a 64-bit block. So DES ciphers 64 bits (8 bytes or typically 8 ASCII characters) at once.

A 64-bit block supports 264 or about 1.8 x 1019 block values or code values. Each different permutation of those values can be considered a complete code. A block cipher has the ability to select from among many such codes using a key.

It is not normally possible to block-cipher just a single bit or a single byte of a block. An arbitrary stream of data can always be partitioned into one or more fixed-size blocks, but it is likely that at least one block will not be completely filled. Using fixed-size blocks generally means that the associated system must support data expansion in enciphering, if only by one block. Handling even minimal data expansion may be difficult in some systems.

Block Cipher
A cipher which requires the accumulation of data (in a block) before ciphering can complete. Other than simple transposition ciphers, this seems to be the province of ciphers designed to emulate a keyed simple substitution with a table of size far too large to realize. A block cipher operates on a block of data (for example, multiple bytes) in a single ciphering, as opposed to a stream cipher, which operates on bytes or bits as they occur. Block ciphers can be called "codebook-style" ciphers. Also see Variable Size Block Cipher.

A block cipher is a transformation between plaintext block values and ciphertext block values, and is thus an emulated simple substitution on huge block-wide values. Within a particular block size, both plaintext and ciphertext have the same set of possible values, and when the ciphertext values have the same ordering as the plaintext, ciphering is obviously ineffective. So effective ciphering depends upon re-arranging the ciphertext values from the plaintext ordering, and this is a permutation of the plaintext values. A block cipher is keyed by constructing a particular permutation of ciphertext values.

Block Cipher Data Diffusion

In an ideal block cipher, changing even a single bit of the input block will change all bits of the ciphertext result, each with independent probability 0.5. This means that about half of the bits in the output will change for any different input block, even for differences of just one bit. This is overall diffusion and is present in a block cipher, but not in a stream cipher. Data diffusion is a simple consequence of the keyed invertible simple substitution nature of the ideal block cipher.

Improper diffusion of data throughout a block cipher can have serious strength implications. One of the functions of data diffusion is to hide the different effects of different internal components. If these effects are not in fact hidden, it may be possible to attack each component separately, and break the whole cipher fairly easily.

Partitioning Messages into Fixed Size Blocks

A large message can be ciphered by partitioning the plaintext into blocks of a size which can be ciphered. This essentially creates a stream meta-cipher which repeatedly uses the same block cipher transformation. Of course, it is also possible to re-key the block cipher for each and every block ciphered, but this is usually expensive in terms of computation and normally unnecessary.

A message of arbitrary size can always be partitioned into some number of whole blocks, with possibly some space remaining in the final block. Since partial blocks cannot be ciphered, some random padding can be introduced to fill out the last block, and this naturally expands the ciphertext. In this case it may also be necessary to introduce some sort of structure which will indicate the number of valid bytes in the last block.

Block Partitioning without Expansion

Proposals for using a block cipher supposedly without data expansion may involve creating a tiny stream cipher for the last block. One scheme is to re-encipher the ciphertext of the preceding block, and use the result as the confusion sequence. Of course, the cipher designer still needs to address the situation of files which are so short that they have no preceding block. Because the one-block version is in fact a stream cipher, we must be very careful to never re-use a confusion sequence. But when we only have one block, there is no prior block to change as a result of the data. In this case, ciphering several very short files could expose those files quickly. Furthermore, it is dangerous to encipher a CRC value in such a block, because exclusive-OR enciphering is transparent to the field of mod 2 polynomials in which the CRC operates. Doing this could allow an Opponent to adjust the message CRC in a known way, thus avoiding authentication exposure.

Another proposal for eliminating data expansion consists of ciphering blocks until the last short block, then re-positioning the ciphering window to end at the last of the data, thus re-ciphering part of the prior block. This is a form of chaining and establishes a sequentiality requirement which requires that the last block be deciphered before the next-to-the-last block. Or we can make enciphering inconvenient and deciphering easy, but one way will be a problem. And this approach cannot handle very short messages: its minimum size is one block. Yet any general-purpose ciphering routine will encounter short messages. Even worse, if we have a short message, we still need to somehow indicate the correct length of the message, and this must expand the message, as we saw before. Thus, overall, this seems a somewhat dubious technique.

On the other hand, it does show a way to chain blocks for authentication in a large-block cipher: We start out by enciphering the data in the first block. Then we position the next ciphering to start inside the ciphertext of the previous block. Of course this would mean that we would have to decipher the message in reverse order, but it would also propagate any ciphertext changes through the end of the message. So if we add an authentication field at the end of the message (a keyed value known on both ends), and that value is recovered upon deciphering (this will be the first block deciphered) we can authenticate the whole message. But we still need to handle the last block padding problem and possibly also the short message problem.

Block Size and Plaintext Randomization

Ciphering raw plaintext data can be dangerous when the cipher has a small block size. Language plaintext has a strong, biased distribution of symbols and ciphering raw plaintext would effectively reduce the number of possible plaintexts blocks. Worse, some plaintexts would be vastly more probable than others, and if some known plaintext were available, the most-frequent blocks might already be known. In this way, small blocks can be vulnerable to classic codebook attacks which build up the ciphertext equivalents for many of the plaintext phrases. This sort of attack confronts a particular block size, and for these attacks Triple-DES is no stronger than simple DES, because they both have the same block size.

The usual way of avoiding these problems is to randomize the plaintext block with an operating mode such as CBC. This can ensure that the plaintext data which is actually ciphered is evenly distributed across all possible block values. However, this also requires an IV which thus expands the ciphertext.

Another approach is to apply data compression to the plaintext before enciphering. If this is to be used instead of plaintext randomization, the designer must be very careful that the data compression does not contain regular features which could be exploited by The Opponents.

An alternate approach is to use blocks of sufficient size for them to be expected to have a substantial amount of uniqueness or "entropy." If we expect plaintext to have about one bit of entropy per byte of text, we might want a block size of at least 64 bytes before we stop worrying about an uneven distribution of plaintext blocks. This is now a practical block size.

TRUE or FALSE; one bit of information.

Boolean Function
A function which produces a Boolean result. The individual output bits of an S-box can each be considered to be separate Boolean functions.

Boolean Function Nonlinearity
The number of bits which must change in the truth table of a Boolean function to reach the closest affine Boolean function. This is the Hamming distance from the closest "linear" function.

Typically computed by using a fast Walsh-Hadamard transform on the Boolean-valued truth table of the function. This produces the unexpected distance to every possible affine Boolean function (of the given length). Scanning those results for the maximum value implies the minimum distance to some particular affine sequence.

Especially useful in S-box analysis, where the nonlinearity for the table is often taken to be the minimum of the nonlinearity values computed for each output bit.

Also see the Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.

Boolean Logic
The logic which applies to variables which have only two possible values. Also the digital hardware devices which realize such logic, and are used to implement a electronic digital computers.

Boolean Mapping
A mapping of some number n Boolean variables into some number m Boolean results. For example, an S-box.

The result of a successful cryptanalytic attack. To destroy the advantage of a cipher in hiding information.

A cipher is "broken" when the information in a message can be extracted without the key, or when the key itself can be recovered. The strength of a cipher can be considered to be the minimum effort required for a break, by any possible attack. A break is particularly significant when the work involved need not be repeated on every message.

The use of the term "break" can be misleading when an impractical amount of work is required to achieve the break. This case might be better described a "theoretical" or "certificational" weakness.

Block Size
The amount of data in a block. For example, the size of the DES block is 64 bits or 8 bytes or 8 octets.

Brute Force Attack
A form of attack in which each possibility is tried until success is obtained. Typically, a ciphertext is deciphered under different keys until plaintext is recognized. On average, this may take about half as many decipherings as there are keys.

Recognizing plaintext may or may not be easy. Even when the key length of a cipher is sufficient to prevent brute force attack, that key will be far too small to produce every possible plaintext from a given ciphertext (see perfect secrecy). Combined with the fact that language is redundant, this means that very few of the decipherings will be words in proper form. Of course, if the plaintext is not language, but is instead computer code, compressed text, or even ciphertext from another cipher, recognizing a correct deciphering can be difficult.

Brute force is the obvious way to attack a cipher, and the way any cipher can be attacked, so ciphers are designed to have a large enough keyspace to make this much too expensive to use in practice. Normally, the design strength of a cipher is based on the cost of a brute-force attack.

Technical slang for "error in design or implementation." An unexpected system flaw. Debugging is a normal part of system development and interactive system design.

A collection of eight bits. Also called an "octet." A byte can represent 256 different values or symbols. The common 7-bit ASCII codes used to represent characters in computer use are generally stored in a byte; that is, one byte per character.


A basic electronic component which acts as a reservoir for electrical power in the form of voltage. A capacitor thus acts to "even out" the voltage across its terminals, and to "conduct" voltage changes from one terminal to the other. A capacitor "blocks" DC and conducts AC in proportion to frequency. Capacitance is measured in Farads: A current of 1 Amp into a capacitance of 1 Farad produces a voltage change of 1 Volt per Second across the capacitor.

Typically, two conductive "plates" or metal foils separated by a thin insulator, such as air, paper, or ceramic. An electron charge on one plate attracts the opposite charge on the other plate, thus "storing" charge. A capacitor can be used to collect a small current over long time, and then release a high current for a short time, as used in a camera strobe or "flash."

Also see inductor and resistor.

CBC or Cipher Block Chaining is an operating mode for block ciphers. CBC mode is essentially a crude meta-stream cipher which streams block transformations.

In CBC mode the ciphertext value of the preceding block is exclusive-OR combined with the plaintext value for the current block. This has the effect of distributing the combined block values evenly among all possible block values, and so prevents codebook attacks.

On the other hand, ciphering the first block generally requires an IV or initial value to start the process. The IV necessarily expands the ciphertext, which may or may not be a problem. And the IV must be dynamically random-like so that statistics cannot be developed on the first block of each message sent under the same key.

In CBC mode, each random-like confusing value is the ciphertext from each previous block. Clearly this ciphertext is exposed to The Opponent, so there would seem to be little benefit associated with hiding the IV, which is just the first of these values. But if The Opponent knows the first sent plaintext, and can intercept and change the message IV, The Opponent can manipulate the first block of received plaintext. Because the IV does not represent a message enciphering, manipulating this value does not also change any previous block.

Accordingly, the IV may be sent enciphered or may be specifically authenticated in some way. Alternately, the complete body of the plaintext message may be authenticated, often by a CRC. The CRC remainder should be block ciphered, perhaps as part of the plaintext.

In statistics, cumulative distribution function. A function which gives the probability of obtaining a particular value or lower.

CFB or Ciphertext FeedBack is an operating mode for a block cipher.

CFB is closely related to OFB, and is intended to provide some of the characteristics of a stream cipher from a block cipher. CFB generally forms an autokey stream cipher. CFB is a way of using a block cipher to form a random number generator. The resulting pseudorandom confusion sequence can be combined with data as in the usual stream cipher.

CFB assumes a shift register of the block cipher block size. An IV or initial value first fills the register, and then is ciphered. Part of the result, often just a single byte, is used to cipher data, and the resulting ciphertext is also shifted into the register. The new register value is ciphered, producing another confusion value for use in stream ciphering.

One disadvantage of this, of course, is the need for a full block-wide ciphering operation, typically for each data byte ciphered. The advantage is the ability to cipher individual characters, instead of requiring accumulation into a block before processing.

An operation repeated in a sequence, such that each result depends upon the previous result, or an initial value. One example is the CBC operating mode.


The unexpected ability to find numerical relationships in physical processes formerly considered random. Typically these take the form of iterative applications of fairly simple computations. In a chaotic system, even tiny changes in state eventually lead to major changes in state; this is called "sensitive dependence on initial conditions." It has been argued that every good computational random number generator is "chaotic" in this sense.

In physics, the "state" of an analog physical system cannot be fully measured, which always leaves some remaining uncertainty to be magnified on subsequent steps. And, in many cases, a physical system may be slightly affected by thermal noise and thus continue to accumulate new information into its "state."

In a computer, the state of the digital system is explicit and complete, and there is no uncertainty. No noise is accumulated. All operations are completely deterministic. This means that, in a computer, even a "chaotic" computation is completely predictable and repeatable.

In statistics, a goodness of fit test used for comparing two distributions. Mainly used on nominal and ordinal measurements. Also see: Kolmogorov-Smirnov.

In the usual case, many independent samples are counted by category or separated into value-range "bins." The reference distribution gives us the the number of values to expect in each bin. Then we compute a X2 test statistic related to the difference between the distributions:

   X2 = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )

("SQR" is the squaring function, and we require that each expectation not be zero.) Then we use a tabulation of chi-square statistic values to look up the probability that a particular X2 value or lower (in the c.d.f.) would occur by random sampling if both distributions were the same. The statistic also depends upon the "degrees of freedom," which is almost always one less than the final number of bins. See the chi-square section of the Ciphers By Ritter / JavaScript computation pages.

The c.d.f. percentage for a particular chi-square value is the area of the statistic distribution to the left of the statistic value; this is the probability of obtaining that statistic value or less by random selection when testing two distributions which are exactly the same. Repeated trials which randomly sample two identical distributions should produce about the same number of X2 values in each quarter of the distribution (0% to 25%, 25% to 50%, 50% to 75%, and 75% to 100%). So if we repeatedly find only very high percentage values, we can assume that we are probing different distributions. And even a single very high percentage value would be a matter of some interest.

Any statistic probability can be expressed either as the proportion of the area to the left of the statistic value (this is the "cumulative distribution function" or c.d.f.), or as the area to the right of the value (this is the "upper tail"). Using the upper tail representation for the X2 distribution can make sense because the usual chi-squared test is a "one tail" test where the decision is always made on the upper tail. But the "upper tail" has an opposite "sense" to the c.d.f., where higher statistic values always produce higher percentage values. Personally, I find it helpful to describe all statistics by their c.d.f., thus avoiding the use of a wrong "polarity" when interpreting any particular statistic. While it is easy enough to convert from the c.d.f. to the complement or vise versa (just subtract from 1.0), we can base our arguments on either form, since the statistical implications are the same.

It is often unnecessary to use a statistical test if we just want to know whether a function is producing something like the expected distribution: We can look at the binned values and generally get a good idea about whether the distributions change in similar ways at similar places. A good rule-of-thumb is to expect chi-square totals similar to the number of bins, but distinctly different distributions often produce huge totals far beyond the values in any table, and computing an exact probability for such cases is simply irrelevant. On the other hand, it can be very useful to perform 20 to 40 independent experiments to look for a reasonable statistic distribution, rather than simply making a "yes / no" decision on the basis of what might turn out to be a rather unusual result.

Since we are accumulating discrete bin-counts, any fractional expectation will always differ from any actual count. For example, suppose we expect an even distribution, but have many bins and so only accumulate enough samples to observe about 1 count for every 2 bins. In this situation, the absolute best sample we could hope to see would be something like (0,1,0,1,0,1,...), which would represent an even, balanced distribution over the range. But even in this best possible case we would still be off by half a count in each and every bin, so the chi-square result would not properly characterize this best possible sequence. Accordingly, we need to accumulate enough samples so that the quantization which occurs in binning does not appreciably affect the accuracy of the result. Normally I try to expect at least 10 counts in each bin.

But when we have a reference distribution that trails off toward zero, inevitably there will be some bins with few counts. Taking more samples will just expand the range of bins, some of which will be lightly filled in any case. We can avoid quantization error by summing both the observations and expectations from multiple bins, until we get a reasonable expectation value (again, I like to see 10 counts or more). In this way, the "tails" of the distribution can be more properly (and legitimately) characterized.

In general, a key-selected secret transformation between plaintext and ciphertext. Specifically, a secrecy mechanism or process which operates on individual characters or bits independent of semantic content. As opposed to a secret code, which generally operates on words, phrases or sentences, each of which may carry some amount of complete meaning. Also see: cryptography, block cipher, stream cipher, a cipher taxonomy, and substitution.

A good cipher can transform secret information into a multitude of different intermediate forms, each of which represents the original information. Any of these intermediate forms or ciphertexts can be produced by ciphering the information under a particular key value. The intent is that the original information only be exposed by one of the many possible keyed interpretations of that ciphertext. Yet the correct interpretation is available merely by deciphering under the appropriate key.

A cipher appears to reduce the protection of secret information to enciphering under some key, and then keeping that key secret. This is a great reduction of effort and potential exposure, and is much like keeping your valuables in your house, and then locking the door when you leave. But there are also similar limitations and potential problems.

With a good cipher, the resulting ciphertext can be stored or transmitted otherwise exposed without also exposing the secret information hidden inside. This means that ciphertext can be stored in, or transmitted through, systems which have no secrecy protection. For transmitted information, this also means that the cipher itself must be distributed in multiple places, so in general the cipher cannot be assumed to be secret. With a good cipher, only the deciphering key need be kept secret.

A Cipher Taxonomy
For the analysis of cipher operation it is useful to collect ciphers into groups based on their functioning (or intended functioning). The goal is to group ciphers which are essentially similar, so that as we gain an understanding of one cipher, we can apply that understanding to others in the same group. We thus classify not by the components which make up the cipher, but instead on the "black-box" operation of the cipher itself.

We seek to hide distinctions of size, because operation is independent of size, and because size effects are usually straightforward. We thus classify serious block ciphers as keyed simple substitution, just like newspaper amusement ciphers, despite their obvious differences in strength and construction. This allows us to compare the results from an ideal tiny cipher to those from a large cipher construction; the grouping thus can provide benchmark characteristics for measuring large cipher constructions.

We could of course treat each cipher as an entity unto itself, or relate ciphers by their dates of discovery, the tree of developments which produced them, or by known strength. But each of these criteria is more or less limited to telling us "this cipher is what it is." We already know that. What we want to know is what other ciphers function in a similar way, and then whatever is known about those ciphers. In this way, every cipher need not be an island unto itself, but instead can be judged and compared in a related community of similar techniques.

Our primary distinction is between ciphers which handle all the data at once (block ciphers), and those which handle some, then some more, then some more (stream ciphers). We thus see the usual repeated use of a block cipher as a stream meta-cipher which has the block cipher as a component. It is also possible for a stream cipher to be re-keyed or re-originate frequently, and so appear to operate on "blocks." Such a cipher, however, would not have the overall diffusion we normally associate with a block cipher, and so might usefully be regarded as a stream meta-cipher with a stream cipher component.

The goal is not to give each cipher a label, but instead to seek insight. Each cipher in a particular general class carries with it the consequences of that class. And because these groupings ignore size, we are free to generalize from the small to the large and so predict effects which may be unnoticed in full-size ciphers.

    A block cipher requires the accumulation of some amount of data or multiple data elements for ciphering to complete. (Sometimes stream ciphers accumulate data for convenience, as in cylinder ciphers, which nevertheless logically cipher each character independently.)

    (Note that this definition is somewhat broader than the now common understanding of a huge, and thus emulated, Simple Substitution. But there are ciphers which require blocked plaintext and which do not emulate Simple Substitution, and calling these something other than "block" ciphers negates the advantage of a taxonomy.)

      • A "codebook" or "simple substitution."
      • Each code value becomes a distinguishable element. Thus, substitution generally converts a collection of independent elements to a single related unit.
      • Keying constitutes a permutation or re-arrangement of the fixed set of possible code values.
      • Avalanche or data diffusion is a natural consequence of an arbitrary selection among all possible code values.
      • The usual complete binary substitution distributes bit-changes between code values binomially, and this effect can be sampled and examined statistically.
      • Avalanche is two-way diffusion in the sense that "later" plaintext can change "earlier" ciphertext.
      • A conventional block cipher is built from small components with a design intended to simulate a substitution table of a size vastly larger than anything which could be practically realized.

      1. Transposition Cipher
        • Clearly, it is necessary for all message elements which will be transposed to be collected before operations begin; this is the block cipher signature.
        • Any possible transposition is necessarily a subset of an arbitrary substitution; thus, transposition can be seen as a particular keying subset of substitution.
        • Notice, however, that the usual avalanche signature of substitution is not present, and of course the actual data values are not changed at all by transposition, just moved about.
        • Also notice that we are close to using the idea of permutation in two very different ways: first as a particular n-bit to n-bit substitution, and second as a particular re-arrangement of characters in the block. These have wildly different ciphering effects.

    • A stream cipher does not need to accumulate some amount of data or multiple data elements for ciphering to complete. (Since we define only two main "types" of cipher, a stream cipher is the opposite of a block cipher and vise versa. It is extremely important that the definitions for block and stream ciphering enclose the universe of all possible ciphers.)
    • A stream cipher has the ability to transform individual elements one-by-one. The actual transformation usually is a block transformation, and may be repeated with the same or different keying.
    • In a stream cipher, data diffusion may or may not occur, but if it does, it is necessarily one-way (from earlier to later elements).
    • Since elements are ciphered one-by-one, changing part of the plaintext can affect that part and possibly later parts of the ciphertext; this is a stream cipher signature.
    • The simple re-use of a block transformation to cover more data than a single block is a stream operation.

      • With a truly random sequence, used once, we have a one time pad.
      • With a pseudorandom confusion sequence and a simple additive combiner, we have a Vernam cipher.
      • A simple additive transformation becomes weak upon the second character ciphered, or immediately, under known plaintext, making strength dependent on the confusion sequence.
      • More complex transformations imply the need for correspondingly less strong confusion sequences.

      1. Autokey
        • Normally the use of ciphertext, but also perhaps plaintext, as the cipher key.
        • Can create a random-like confusion stream which will re-synchronize after ciphertext data loss.
        • Under known-plaintext, the common "ciphertext feedback" version exposes both the confusion sequence and the input which creates that sequence. This is a lot of pressure on a single transformation.

      • The repeated use of a single fixed substitution.
      • A conventional block cipher simulates a large substitution.
      • A substitution becomes weak when its code values are re-used.
      • Code value re-use can be minimized by randomizing the plaintext block (e.g., CBC). This distributes the plaintext evenly across the possible block values, but at some point the transformation itself must change or be exposed.
      • Another alternative is to use a very large block so that code value re-use is made exceedingly unlikely. A large block also has room for a dynamic keying field which would make code value re-use even more unlikely.

      • The use of multiple fixed substitutions.
      • By itself, the use of multiple alphabets in a regular sequence is inherently not much stronger than just a single alphabet.
      • It is of course possible to select an alphabet or transformation at pseudo-random, for example by re-keying DES after every block ciphered. This brings back sequence strength as an issue, and opens up the sequence generator starting state as an IV.
      • A related possibility is the use of a Latin square combiner which effectively selects among a balanced set of different fixed substitution alphabets.

      1. Cylinder
        • A cipher which has or simulates the use of a number of different alphabet disks on a common rod.
        • Primary keying is the arrangement of the alphabet around each disk, and the selection and arrangement of disks.
        • By entering the plaintext on one row, any of n-1 other rows can be sent as ciphertext; this selection is an IV.
        • If the plaintext data are redundant, it is possible to avoid sending the IV by selecting the one of n-1 possible decipherings which shows redundancy. But this is not generally possible when ciphering arbitrary binary data.
        • If an IV is selected first, each character ciphering in that "chunk" is independent of each other ciphering. There is no data diffusion.
        • In general, each disk is used at fixed periodic intervals through the text, which is weak.
        • The ciphertext selection is homophonic, in the sense that different ciphertext rows each represent exactly the same plaintext.
        • Cylinder operation is not polyphonic in the usual sense: While a single ciphertext can imply any other row is plaintext, generally only one row has a reasonable plaintext meaning.

    4. DYNAMIC
      • The use of one (monoalphabetic) or multiple (polyalphabetic) substitutions which change during ciphering.

      • The iterative re-use of a stream cipher with a new random IV on each iteration so as to eventually achieve the effect of a message key.
      • Each iteration seemingly must expand the ciphertext by the size of the IV, although this is probably about the same expansion we would have with a message key.
      • Unfortunately, each iteration will take some time.

The use of a cipher. The general term which includes both enciphering and deciphering.

The result of enciphering. Ciphertext will contain the same information as the original plaintext, but hide the original information, typically under the control of a key. Without the key it should be impractical to recover the original information from the ciphertext.

Ciphertext Expansion
When the ciphertext is larger than the original plaintext.

Ciphertext expansion is the general situation: Stream ciphers need a message key, and block ciphers with a small block need some form of plaintext randomization, which generally needs an IV to protect the first block. Only block ciphers with a large size block generally can avoid ciphertext expansion, and then only if each block can be expected to hold sufficient uniqueness or "entropy" to prevent a codebook attack.

It is certainly true that in most situations of new construction a few extra bytes are not going to be a problem. However, in some situations, and especially when a cipher is to be installed into an existing system, the ability to encipher data without requiring additional storage can be a big advantage. Ciphering data without expansion supports the ciphering of data structures which have been defined and fixed by the rest of the system, provided only that one can place the cipher at the interface "between" two parts of the system. This is also especially efficient, as it avoids the process of acquiring a different, larger, amount of store for each ciphering. Such an installation also can apply to the entire system, and not require the re-engineering of all applications to support cryptography in each one.

Audio or voice encryption. A contraction of "ciphered telephony."

The "circular" flow of electrons from a power source, through conductors and components and back to the power source. Or the arrangement of components which allows such flow and performs some function.

A repetitive or cyclic timing signal to coordinate state changes in a digital system. A clock can coordinate the movement of data and results through various stages of processing. Although a clock signal is digital, the source of the repetitive signal is almost always an analog circuit.

In an analog system we might produce a known delay by slowly charging a capacitor and measuring the voltage across it continuously until the voltage reaches the desired level. A big problem with this is that the circuit becomes increasingly susceptible to noise at the end of the interval.

In a digital system we create a delay by simply counting clock cycles. Since all external operations are digital, noise effects are virtually eliminated, and we can easily create accurate delays which are as long as the count in any counter we can build.

An operation on a set which produces only elements in that set.

Symbols or values which stand for symbols, values, sequences, or even operations (as in computer "opcodes"). As opposed to a cipher, which operates only on individual characters or bits, classically, codes also represent words, phrases, and entire sentences. One application was to decrease the cost of telegraph messages. In modern usage, a code is often simply a correspondence between information (such as character symbols) and values (such as the ASCII code or Base-64), although computer opcodes do have independent meanings and variable lengths.

Coding is a very basic part of modern computation and generally implies no secrecy or information hiding. Some codes are "secret codes," however, and then the transformation between the information and the coding is kept secret. Also see: cryptography and substitution.

Literally, the listing or "book" of code transformations. More generally, any collection of such transformations. Classically, letters, common words and useful phrases were numbered in a codebook; messages transformed into those numbers were "coded messages." Also see nomenclator. A "codebook style cipher" refers to a block cipher.

Codebook Attack
A form of attack in which The Opponent simply tries to build or collect a codebook of all the possible transformations between plaintext and ciphertext under a single key. This is the classic approach we normally think of as "codebreaking."

The usual ciphertext-only approach depends upon the plaintext having strong statistical biases which make some values far more probable than others, and also more probable in the context of particular preceding known values. Such attacks can be defeated if the plaintext data are randomized and thus evenly and independently distributed among the possible values. (This may have been the motivation for the use of a random confusion sequence in a stream cipher.)

When a codebook attack is possible on a block cipher, the complexity of the attack is controlled by the size of the block (that is, the number of elements in the codebook) and not the strength of the cipher. This means that a codebook attack would be equally effective against either DES or Triple-DES.

One way a block cipher can avoid a codebook attack is by having a large block size which will contain an unsearchable amount of plaintext "uniqueness" or entropy. Another approach is to randomize the plaintext block, often by using an operating mode such as CBC.

The mathematical term for any particular subset of symbols, independent of order. (Also called the binomial coefficient.) The number of combinations of n things, taken k at a time, read "n choose k" is:


   ( ) = C(n,k) =  n! / (k! (n-k)!)


See the combinations section of the Ciphers By Ritter / JavaScript computation pages. Also see permutation.

Combinatorics is a branch of mathematics, like analysis or number theory. Combinatorics is often related to counting the subsets of finite sets. One result is to help us to understand the probability of a particular subset in the universe of possible values.

Consider a block cipher: For any given size block, there is some fixed number of possible messages. Since every enciphering must be reversible (deciphering must work), we have a 1:1 mapping between plaintext and ciphertext blocks. The set of all plaintext values and the set of all ciphertext values is the same set; particular values just have different meanings in each set.

Keying gives us no more ciphertext values, it only re-uses the values which are available. Thus, keying a block cipher consists of selecting a particular arrangement or permutation of the possible block values. Permutations are a combinatoric topic. Using combinatorics we can talk about the number of possible permutations or keys in a block cipher, or in cipher components like substitution tables.

Permutations can be thought of as the number of unique arrangements of a given length on a particular set. Other combinatoric concepts include binomials and combinations (the number of unique given-length subsets of a given set).

In a cryptographic context, a combiner is a mechanism which mixes two data sources into a single result. A "combiner style cipher" refers to a stream cipher.

Reversible combiners are used to encipher plaintext into ciphertext in a stream cipher. The ciphertext is then deciphered into plaintext using a related inverse or extractor mechanism.

Irreversible or non-invertible combiners are often used to mix multiple RNG's into a single confusion sequence, also for use in stream cipher designs.

Also see balanced combiner, additive combiner and complete, and The Story of Combiner Correlation: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.

A dyadic operation in which exchanging the two argument values must produce the same result: a + b = b + a.

Also see: associative and distributive.

A term used in S-box analysis to describe a property of the value arrangement in an invertible substitution or, equivalently, a block cipher. If we have some input value, and then change one bit in that value, we expect about half the output bits to change; this is the result of diffusion; when partial diffusion is repeated we develop avalanche; and the ultimate result is strict avalanche. Completeness tightens this concept and requires that changing a particular input bit produce a change in a particular output bit, at some point in the transformation (that is, for at least one input value). Completeness requires that this relationship occur at least once for every combination of input bit and output bit. It is tempting to generalize the definition to apply to multi-bit element values, where this makes more sense.

Completeness does not require that an input bit change an output bit for every input value (which would not make sense anyway, since every output bit must be changed at some point, and if they all had to change at every point, we would have all the output bits changing, instead of the desired half). The inverse of a complete function is not necessarily also complete.

As originally defined in Kam and Davida:

"For every possible key value, every output bit ci of the SP network depends upon all input bits p1,...,pn and not just a proper subset of the input bits." [p.748]
Kam, J. and G. Davida. 1979. Structured Design of Substitution-Permutation Encryption Networks. IEEE Transactions on Computers. C-28(10): 747-753.

A part of a larger construction; a building-block in an overall design or system. Modern digital design is based on the use of a few general classes of pre-defined, fully-specified parts. Since even digital logic can use or even require analog values internally, by enclosing these values the logic component can hide complexity and present the appearance of a fully digital device.

The most successful components are extremely general and can be used in many different ways. Even as a brick is independent of the infinite variety of brick buildings, a flip-flop is independent of the infinite variety of logic machines which use flip-flops.

The source of the ability to design and build a wide variety of different electronic logic machines is the ability to interconnect and use a few very basic but very general parts.

Electronic components include

Cryptographic system components include:

Originally the job title for a person who performed a laborious sequence of arithmetic computations. Now a machine for performing such calculations.

A logic machine with:

  1. Some limited set of fundamental computations. Typical operations include simple arithmetic and Boolean logic. Each operation is selected by a particular operation code value or "opcode." This is a hardware interpretation of the opcode.

  2. The ability to follow a list of instructions or commands, performing each in sequence. Thus capable of simulating a wide variety of far more complex "instructions."

  3. The ability to execute or perform at least some instructions conditionally, based on parameter values or intermediate results.

  4. The ability to store values into a numbered "address space" which is far larger than the instruction set, and later to recover those values when desired.

Also see: source code, object code and software.

A material in which electron flow occurs easily. Typically a metal; usually copper, sometimes silver, brass or even aluminum. A wire. As opposed to an insulator.

Those parts of a cipher mechanism which change the correspondence between input values and output values. In contrast to diffusion.

Confusion Sequence
The sequence combined with data in a stream cipher. Normally produced by a random number generator, it is also called a "running key."

In the study of logic, an observed fact dependent upon other facts not being observed. Or a statement which is conditionally true, provided other unmentioned conditions have the appropriate state. As opposed to absolute.

Conventional Cipher
A secret key cipher.

Casually speaking, the remainder after a division of integers.

In number theory we say than integer a (exactly) divides integer b (denoted a | b) if and only if there is an integer k such that ak = b.

In number theory we say that integer a is congruent to integer b modulo m, denoted a = b (mod m), if and only if m | (a - b). Here m is the divisor or modulus.

Polynomial multiplication. A multiplication of each term against each other term, with no "carries" from term to term. Also see correlation.

Used in the analysis of signal processing to develop the response of a processing system to a complicated real-valued input signal. The input signal is first separated into some number of discrete impulses. Then the system response to an impulse -- the output level at each unit time delay after the impulse -- is determined. Finally, the expected response is computed as the sum of the contributions from each input impulse, multiplied by the magnitude of each impulse. This is an approximation to the convolution integral with an infinite number of infinitesimal delays. Although originally accomplished graphically, the process is just polynomial multiplication.

It is apparently possible to compute the convolution of two sequences by taking the FFT of each, multiplying these results term-by-term, then taking the inverse FFT. While there is an analogous relationship in the FWT, in this case the "delays" between the sequences represent mod 2 distance differences, which may or may not be useful.

In general, the probability that two sequences of symbols will, in any position, have the same symbol. We expect two random binary sequences to have the same symbols about half the time.

One way to evaluate the correlation of two real-valued sequences is to multiply them together term-by-term and sum all results. If we do this for all possible "delays" between the two sequences, we get a "vector" or 1-dimensional array of correlations which is a convolution. Then the maximum value represents the delay with the best correlation.

Correlation Coefficient
The value from -1 to +1 describing the correlation of two binary sequences, averaged over the length of interest. Correlation coefficient values are related to the probability that, given a symbol from one sequence, the other sequence will have that same symbol. A value of:
  • -1 implies a 0.0 probability (the second sequence is the complement of the first),
  • 0 implies a 0.5 probability (the sequences are uncorrelated), and
  • +1 implies a 1.0 probability (the sequences are the same).

"The correlation coefficient associated with a pair of Boolean functions f(a) and g(a) is denoted by C(f,g) and is given by

C(f,g) = 2 * prob(f(a) = g(a)) - 1 ."

Daemen, J., R. Govaerts and J. Vanderwalle. 1994. Correlation Matrices. Fast Software Encryption. 276. Springer-Verlag.

Cyclic Redundancy Check: A fast error-check hash based on mod 2 polynomial operations.

A CRC is essentially a fast remainder operation over a huge numeric value which is the data. (For best speed, the actual computation occurs as mod 2 polynomial operations.) The CRC result is an excellent (but linear) hash value corresponding to the data.

No CRC has any appreciable strength, but some applications -- even in cryptography -- need no strength:

  • One example is authentication, provided the linear CRC hash result is protected by a block cipher.
  • Another example is key processing, where the uncertainty in a User Key phrase of arbitrary size is collected into a hash result of fixed size. In general, the hash result would be just as good for The Opponent as the original key phrase, so no strength shield could possibly improve the situation.
  • A third example is the accumulation of the uncertainty in slightly uncertain physically random events. When true randomness is accumulated, it is already as unknowable as any strength shield could make it.

That aspect of cryptology which concerns the strength analysis of a cryptographic system, and the penetration or breaking of a cryptographic system. Also "codebreaking."

Because there is no theory which guarantees strength for any conventional cipher, ciphers traditionally have been considered "strong" when they have been used for a long time with "nobody" knowing how to break them easily. Cryptanalysis seeks to improve this process by applying the known attack strategies to new ciphers, and by actively seeking new ones. It is normal to assume that at least known-plaintext is available; often, defined-plaintext is assumed. The result is typically some value for the amount of "work" which will achieve a "break" (even if that value is impractical); this is "the" strength of the cipher.

But while cryptanalysis can prove "weakness" for a given level of effort, cryptanalysis cannot prove that there is no simpler attack:

Lack of proof of weakness is not proof of strength.

Indeed, when ciphers are used for real, The Opponents can hardly be expected to advertise a successful break, but will instead work hard to reassure users that their ciphers are still secure. The fact that apparently "nobody" knows how to break a cipher is somewhat less reassuring from this viewpoint. In this context, using a wide variety of different ciphers can make good sense: This reduces the value of the information protected by any particular cipher, which thus reduces the rewards from even a successful attack. Having a numerous ciphers also requires The Opponents to field far greater resources to identify, analyze, and automate breaking (when possible) of each different cipher.

Many academic attacks are essentially theoretical, involving huge amounts of data and computation. But even when a direct technical attack is practical, that may be the most difficult, expensive and time-consuming way to obtain the desired information. Other methods include making a paper copy, stealing a copy, bribery, coercion, and electromagnetic monitoring. No cipher can keep secret something which has been otherwise revealed. Information security thus involves far more than just cryptography, and even a cryptographic system is more than just a cipher. Even finding that information has been revealed does not mean that a cipher has been broken.

At one time it was reasonable to say: "Any cipher a man can make, another man can break." However, with the advent of serious computer-based cryptography, that statement is no longer valid, provided that every detail is properly handled. This, of course, often turns out to not be the case.

Someone who attacks ciphers with cryptanalysis. A "codebreaker." Often called the Opponent by cryptographers, in recognition of the (serious) game of thrust and parry between these parties.

Someone who creates ciphers using cryptography.

Cryptographic Mechanism
A process for enciphering and/or deciphering, or an implementation (for example, hardware, computer software, hybrid, or the like) for performing that process. See also cryptography and mechanism.

Greek for "hidden writing." The art and science of transforming information into an intermediate form which secures that information while in storage or in transit. A part of cryptology, further divided into secret codes and ciphers. As opposed to steganography, which seeks to hide the existence of any message, cryptography seeks to render a message unintelligible even when the message is completely exposed.

Cryptography includes at least:

Cryptography may also include:
  • nonrepudiation (the inability to deny sending a message),
  • access control (user or source authentication), and
  • availability (keeping security services available).

Modern cryptography generally depends upon translating a message into one of an astronomical number of different intermediate representations, or ciphertexts, as selected by a key. If all possible intermediate representations have similar appearance, it may be necessary to try all possible keys to find the one which deciphers the message. By creating mechanisms with an astronomical number of keys, we can make this approach impractical.

Cryptography may also be seen as a zero-sum game, where a cryptographer competes against a cryptanalyst. We might call this the cryptography war.

Cryptography War
Cryptography may be seen as a dynamic battle between cryptographer and cryptanalyst. The cryptographer tries to produce a cipher which can retain secrecy. Then, when it becomes worthwhile, one or more cryptanalysts try to penetrate that secrecy by attacking the cipher. Fortunately for the war, even after fifty years of mathematical cryptology, not one practical cipher has been accepted as proven secure in practice. (See, for example, the one-time pad.)

Note that the successful cryptanalyst must keep good attacks secret, or the opposing cryptographer will just produce a stronger cipher. This means that the cryptographer is in the odd position of never knowing whether his or her best cipher designs are successful, or which side is winning.

Cryptographers are often scientists who are trained to ignore unsubstantiated claims. But there will be no substantiation when a cipher system is attacked and broken for real, yet continued use will endanger all messages so "protected." Thus, it is a very reasonable policy to not adopt a widely-used cipher, and to change ciphers periodically.

The field of study which generally includes steganography, cryptography and cryptanalysis.

The measure of electron flow, in amperes. Current is analogous to the amount of water flow, as opposed to pressure or voltage. A flowing electrical current will create a magnetic field around the conductor. A changing electrical current may create an electromagnetic field.



Direct Current: Electrical power which flows in one direction, more or less constantly. As opposed to AC.

Most electronic devices require DC -- at least internally -- for proper operation, so a substantial part of modern design is the "power supply" which converts 120 VAC wall power into 12 VDC, 5 VDC and/or 3 VDC as needed by the circuit and active devices.

The interactive analytical process of correcting the design of a complex system. A normal part of the development process, although when bugs are not caught during development, they can remain in production systems.

Contrary to naive expectations, a complex system almost never performs as desired when first realized. Both hardware and software system design environments generally deal with systems which are not working. (When a system really works, the design and development process is generally over.) Debugging involves identifying problems, analyzing the source of those problems, then changing the construction to fix the problem. (Hopefully, the fix will not itself create new problems.) This form of interactive analysis can be especially difficult because the realized design may not actually be what is described in the schematics, flow-charts, or other working documents: To some extent the real system is unknown.

When a system has many problems, the problems tend to interact, which can make the identification of a particular cause very difficult. This can be managed by "shrinking" the system: first by partitioning the design into components and testing those components, and then by temporarily disabling or removing sections so as to identify the section in which the problem lies. Eventually, with enough testing, partitioning and analysis, the source of any problem can be identified. Some "problems," however, turn out to be the unexpected implications of a complex design and are sometimes accepted as "features" rather than the alternative of a complete design overhaul.

The process which can reveal the information or plaintext hidden in message ciphertext (provided it is the correct process, with the proper key). The inverse of encipher.

The general term for extracting information which was hidden by encryption.

Deductive Reasoning
In the study of logic, reasoning about a particular case from one or more general statements; a proof. Also see: inductive reasoning and fallacy.

Defined Plaintext Attack
A form of attack in which the Opponent can present arbitrary plaintext to be enciphered, and then capture the resulting ciphertext. The ultimate form of known plaintext attack.

A defined plaintext attack can be a problem for systems which allow unauthorized users to present arbitrary messages for ciphering. Such attack can be made difficult by allowing only authorized users to encipher data, by allowing only a few messages to be enciphered between key changes, by changing keys frequently, and by enciphering each message in a different random message key.

Degrees of Freedom
In statistics, the number of completely independent values in a sample. The number of sampled values or observations or bins, less the number of defined or freedom-limiting relationships or "constraints" between those values.

If we choose two values completely independently, we have a DF of 2. But if we must choose two values such that the second is twice the first, we can choose only the first value independently. Imposing a relationship on one of the sampled value means that we will have a DF of one less than the number of samples, even though we may end up with apparently similar sample values.

In a typical goodness of fit test such as chi-square, the reference distribution (the expected counts) is normalized to give the same number of counts as the experiment. This is a constraint, so if we have N bins, we will have a DF of N - 1.

The particular block cipher which is the U.S. Data Encryption Standard. A 64-bit block cipher with a 56-bit key organized as 16 rounds of operations.

Ten times the base-10 logarithm of the ratio of two power values. Denoted by dB. One-tenth of a bel.

When voltages or currents are measured, power changes as the square of these values, so a decibel is twenty times the base-10 logarithm of the ratio of two voltages or currents.

Base 10: The numerical representation in which each digit has an alphabet of ten symbols, usually 0 through 9. Also see: binary, octal, and hexadecimal.

Design Strength
The keyspace; the effort required for a brute force attack.

A process whose sequence of operations is fully determined by its initial state. A mechanical or clockwork-like process whose outcome is inevitable, given its initial setting. Pseudorandom.

Dictionary Attack
Typically an attack on a secret password. A dictionary of common passwords is developed, and a brute force attack conducted on the target with each common password.

Differential Cryptanalysis
A form of attack in which the difference between values (or keys) is used to gain some information about the system.

Also see Differential Cryptanalysis: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.

Diffusion is the property of an operation such that changing one bit (or byte) of the input will change adjacent or near-by bits (or bytes) after the operation. In a block cipher, diffusion propagates bit-changes from one part of a block to other parts of the block. Diffusion requires mixing, and the step-by-step process of increasing diffusion is described as avalanche. Diffusion is in contrast to confusion.

Normally we speak of data diffusion, in which changing a tiny part of the plaintext data may affect the whole ciphertext. But we can also speak of key diffusion, in which changing even a tiny part of the key should change each bit in the ciphertext with probability 0.5.

Perhaps the best diffusing component is substitution, but this diffuses only within a single substituted value. Substitution-permutation ciphers get around this by moving the bits of each substituted element to other elements, substituting again, and repeating. But this only provides guaranteed diffusion if particular substitution tables are constructed. Another alternative is to use some sort of Balanced Block Mixing which has an inherently guaranteed diffusion, or a Variable Size Block Cipher construction. Also see Overall Diffusion.

Pertaining to discrete or distinct finite values. As opposed to analog or continuous quantities.

An electronic device with two terminals which allows current to flow in only one direction.

In statistics, the range of values which a random variable, and the probability that each value or range of values will occur. Also the probability of test statistic values for the case "nothing unusual found," which is the null hypothesis.

If we have a discrete distribution, with a finite number of possible result values, we can speak of "frequency" and "probability" distributions: The "frequency distribution" is the expected number of occurrences for each possible value, in a particular sample size. The "probability distribution" is the probability of getting each value, normalized to a probability of 1.0 over the sum of all possible values.

Here is a graph of a typical "discrete probability distribution" or "discrete probability density function," which displays the probability of getting a particular statistic value for the case "nothing unusual found":

   0.1|        ***

      |       *   *         Y = Probability of X

    Y |     **     **       y = P(x)

      | ****         ****

   0.0 -------------------


Unfortunately, it is not really possible to think in the same way about continuous distributions: Since continuous distributions have an infinite number of possible values, the probability of getting any particular value is zero. For continuous distributions, we instead talk about the probability of getting a value in some subrange of the overall distribution. We are often concerned with the probability of getting a particular value or below, or the probability of a particular value or above.

Here is a graph of the related "cumulative probability distribution" or "cumulative distribution function" (c.d.f.) for the case "nothing unusual found":

   1.0|            ******

      |          **           Y = Probability (0.0 to 1.0) of finding

    Y |         *                 a value which is x or less

      |       **

   0.0 -******------------


The c.d.f. is just the sum of all probabilities for a given value or less. This is the usual sort of function used to interpret a statistic: Given some result, we can look up the probability of a lesser value (normally called p) or a greater value (called q = 1.0 - p).

Usually, a test statistic is designed so that extreme values are not likely to occur by chance in the case "nothing unusual found" which is the null hypothesis. So if we do find extreme values, we have a strong argument that the results were not due simply to random sampling or other random effects, and may choose to reject the null hypothesis and thus accept the alternative hypothesis.

Common discrete distributions include the binomial distribution, and the Poisson distribution.

The case of a dyadic operation, which may be called "multiplication," which can be applied to equations involving another dyadic operation, which may be called "addition," such that: a(b + c) = ab + ac and (b + c)a = ba + bc.

Also see: associative and commutative.

Divide and Conquer
The general concept of being able to split a complexity into several parts, each part naturally being less complex than the total. If this is possible, The Opponent may be able to solve all of the parts far easier than the supposedly complex whole. Often part of an attack.

This is a particular danger in cryptosystems, since most ciphers are built from less-complex parts. Indeed, a major role of cryptographic design is to combine small component parts into a larger complex system which cannot be split apart.

The set of all arguments x which can be applied to a mapping. Also see range.

Relating to dyad, which is Greek for dual or having two parts. In particular, a function with two inputs or arguments. Also see: monadic, unary and binary.

Dynamic Keying
That aspect of a cipher which allows a key to be changed with minimal overhead. A dynamically-keyed block cipher might impose little or no additional computation to change a key on a block-by-block basis. The dynamic aspect of keying could be just one of multiple keying mechanisms in the same cipher.

One way to have a dynamic key in a block cipher is to include the key value along with the plaintext data. But this is normally practical only with blocks of huge size, or variable size blocks.

Another way to have a dynamic key in a block cipher is to add a confusion layer which mixes the key value with the block. For example, exclusive-OR could be used to mix a 64-bit key with a 64-bit data block.

Dynamic Substitution Combiner
The combining mechanism described in U.S. Patent 4,979,832 (see the Dynamic Substitution articles on the Ciphers By Ritter page).

Dynamic Substitution is the use of an invertible substitution table in which the arrangement of the entries changes dynamically during operation. This is particularly useful as a strong replacement for the strengthless exclusive-OR combiner in stream ciphers.

The arrangement of a keyed table starts out unknown to an Opponent. From the Opponent's point of view, each table entry could be any possible value with uniform probability. But after the first value is mapped through that table, the used transformation (table entry) is at least potentially exposed, and no longer can be considered a completely unknown probability. Dynamic Substitution acts to make the used transformation again completely unknown and unbiased, by allowing it to take on any possible mapping. As a first approximation, the amount of information leaked about table contents is replaced by information used to re-define each used entry.

In the usual case, an invertible substitution table is keyed by shuffling under the control of a random number generator. One combiner input value is used to select a value from within that table to be the result or output. The other combiner input value is used simply to select an entry, and then the values at the two selected entries are exchanged. So as soon as a plaintext mapping is used, it is immediately reset to any possibility, and the more often any plaintext value occurs, the more often that transformation changes.

Also see Balanced Block Mixing, and Variable Size Block Cipher.

Dynamic Transposition
A block cipher which first creates an exact bit-balance within each block, and then shuffles the bits within a block, each block being permuted independently from a keyed random number generator.

Since each block -- plaintext or ciphertext -- contains exactly the same number of 1's and 0's, every possible plaintext block is just some permutation of any possible ciphertext block. And since any possible plaintext block can be produced from any ciphertext block in a vast plethora of different ways, the keying sequence is hidden even from known plaintext. And defined plaintext is easily defeated with the usual message key. To the extent that every possible plaintext block can be produced, the cipher approaches perfect secrecy.

See the article Transposition Cipher with Pseudo-Random Shuffling: The Dynamic Transposition Combiner.


ECB or Electronic Code Book is an operating mode for block ciphers. Presumably the name comes from the observation that a block cipher under a fixed key functions much like a physical codebook: Each possible plaintext block value has a corresponding ciphertext value, and vise versa.

ECB is the naive method of applying a block cipher, in that the plaintext is simply partitioned into appropriate size blocks, and each block is enciphered separately and independently. When we have a small block size, ECB is generally unwise, because language text has biased statistics which will result in some block values being re-used frequently, and this repetition will show up in the raw ciphertext. This is the basis for a successful codebook attack.

On the other hand, if we have a large block, we may expect it to contain enough (at least, say, 64 bits) uniqueness or "entropy" to prevent a codebook attack. In that case, ECB mode has the advantage of supporting independent ciphering of each block. This, in turn, supports various things, like the use of multiple ciphering hardware operating in parallel for higher speeds.

As another example, modern packet-switching network technologies often deliver raw packets out of order. The packets will be re-ordered eventually, but having out-of-sequence packets can be a problem for low-level ciphering if the blocks are not ciphered independently.

Electric Field
The fundamental physical force resulting from the attraction of opposing charges.

Electromagnetic Field
The remarkable self-propagating physical field consisting of energy distributed between electric and magnetic fields. Energy in the electric or potential field collapses and creates or "charges up" a magnetic field. Energy in the magnetic field collapses and "charges up" an electric field. This process allows physical electrical and magnetic fields -- two fairly short-range phenomena -- to "propagate" and thus carry energy over relatively large distances at the speed of light. Examples include light, "radio" waves (including TV, cell phones, etc.), and microwave cooking.

It is important to distinguish between a true electromagnetic field, and the simpler and range-limited electric and magnetic fields produced by an electrical clock, motor, or power lines. It is also important to distinguish between the light-like expanding or "radiating" property of an electromagnetic field, and the damaging ionizing radiation produced by a radioactive source.

As far as we know -- and a great many experiments have been conducted on this -- electromagnetic waves are not life-threatening (unless they transfer enough power to dangerously heat the water in our cells). The belief that electromagnetic fields are not dangerous is also reasonable, since light itself is an electromagnetic wave, and life on Earth developed in the context of the electromagnetic field from the Sun. Indeed, plants actually use that field to their and our great benefit.

Having to do with the control and use of physical electrons, as electrical potential or voltage, electrical flow or current, and generally both. See hardware and component.

The process which will transform information or plaintext into one of plethora of intermediate forms or ciphertext, as selected by a key. The inverse of decipher.

The general term for hiding information in secret code or cipher.

In information theory, our "uncertainty" as to the value of a random variable. Given the non-zero probability (p) of each value (i), we can calculate an entropy (H) in bits for random variable X as:

   H(X) = -SUM( pi log2 pi )

Although entropy is sometimes taken as a measure of randomness, calculating entropy requires a knowledge of the probabilities of each value which we often can attain only by sampling. This means that we do not really know the "true" probabilities, but only those we see in our samples. And the "true" probabilities may change through time.

By itself, calculated entropy also does not detect any underlying order that might exist between value probabilities, such as a correlation, or a linear relationship, or any other aspect of cryptographically-weak randomness. The "true entropy" of a random number generator is just the number of bits in the state of that generator, as opposed to an entropy computation on the sequence it produces. So a high entropy value does not imply that a really-random source really is random, or indeed have any relationship to the amount of cryptographic randomness present.

In statistics and information theory, a particularly "simple" and easily modelled stationary (homogenous) stochastic (random) process (function) in which the "temporal average" is the same as the "ensemble average." In general, a process in which no state is prevented from re-occurring. Ergodic processes are the basis for many important results in information theory, and are thus a technical requirement before those results can be applied.

Here we have all three possible sequences from a non-ergodic process: across we have the average of symbols through time (the "temporal average"), and down we have the average of symbols in a particular position over all possible sequences (the "ensemble average"):

   A B A B A B ...   p(A) = 0.5, p(B) = 0.5, p(E) = 0.0

   B A B A B A ...   p(A) = 0.5, p(B) = 0.5, p(E) = 0.0

   E E E E E E ...   p(A) = 0.0, p(B) = 0.0, p(E) = 1.0

   ^ ^ ^ ^ ^ ^

   +-+-+-+-+-+----   p(A) = 0.3, p(B) = 0.3, p(E) = 0.3

(From:  Pierce, J.  1961.  Symbols, Signals and Noise.  Ch. 3)

When a process is non-ergodic, the measurements we take over time from one or a few sequences may not represent all the sequences which may be encountered.

In a cryptographic context, an extractor is a mechanism which produces the inverse effect of a combiner. This allows data to be enciphered in a combiner, and then deciphered in an extractor. Sometimes an extractor is exactly the same as the combiner, as is the case for exclusive-OR.

A Boolean logic function which is also mod 2 addition. Also called XOR.


The factorial of natural number n, written n!, is the product of all integers from 1 to n.

See the factorials section of the Ciphers By Ritter / JavaScript computation pages.

In the philosophical study of logic, apparently-reasonable arguments which lead to false conclusions. Also see: inductive reasoning and deductive reasoning. Including:
  1. Fallacies of Insufficient Evidence
    1. Accident -- a special circumstance makes a rule inapplicable
    2. Hasty Generalization
    3. non causa pro causa ("False Cause")
      • post hoc ergo propter hoc ("after this therefore because of this")
      • reductio ad absurdum -- the assumption that a particular one of multiple assumptions is necessarily false if the argument leads to a contradiction.
    4. ad ignorantium ("Appeal to Ignorance") -- a belief which is assumed true because it is not proven false.
    5. Card Stacking -- a deliberate withholding of evidence which does not support the author's conclusions.
  2. Fallacies of Irrelevance (ignoratio elenchi) -- ignoring the question
    1. ad hominem ("Name Calling").
    2. ad populum ("Plain Folks") -- an appeal to the prejudices and biases of the audience.
    3. ad misericordiam ("Appeal to Pity")
    4. ad verecundiam ("Inappropriate Authority") -- a testimonial from someone with expertise in a different field.
    5. tu quoque ("You Did It Too").
    6. ad baculum ("Appeal to force") -- e.g., threats.
    7. Red Herring -- information used to throw the discussion off track.
    8. Opposition ("Guilt by Association") -- to condemn an idea because of who is for it.
    9. Genetic -- attacking the source of the idea, rather than the idea itself.
    10. Bandwagon
  3. Fallacies of Ambiguity
    1. Equivocation -- the use of a word in a sense different than that understood by the reader.
    2. Amphiboly -- some sentences admit more than one interpretation.
    3. Accent -- some sentences have different meanings depending on which word is stressed.
    4. Composition -- the implication that what is true of the parts must also be true of the whole.
    5. Division -- the implication that what is true of the whole must be true of its parts.
    6. False Analogy
  4. Fallacies of the Misuse of Logic
    1. petitio principii ("Begging the Question") -- restating one of the premises as the conclusion; assuming the truth of a proposition which needs to be proven.
      • circulus in probando ("Circular Argument")
    2. non sequitur ("Does Not Follow") -- the stated conclusion does not follow from the evidence supplied.
    3. plurimum interrogationum ("Complex Question") -- e.g., "When did you stop beating your wife?"
    4. Garbled Syllogism -- an illogical argument phrased in logical terms.
    5. Either-Or -- assuming a question has only two sides.

Fast Walsh Transform
(Also Walsh-Hadamard transform.) When applied to a Boolean function, a Fast Walsh Transform is essentially a correlation count between the given function and each Walsh function. Since the Walsh functions are essentially the affine Boolean functions, the FWT computes the unexpected distance from a given function to each affine function. It does this in time proportional to n log n, for functions of n bits, with n some power of 2.

If two Boolean functions are not correlated, we expect them to agree half the time, which we might call the "expected distance." When two Boolean functions are correlated, they will have a distance greater or less than the expected distance, and we might call this difference the unexpected distance or UD. The UD can be positive or negative, representing distance to a particular affine function or its complement.

It is easy to do a Fast Walsh Transform by hand. (Well, I say "easy," then always struggle when I actually do it.) Let's do the FWT of function f: (1 0 0 1 1 1 0 0): First note that f has a binary power length, as required. Next, each pair of elements is modified by an "in-place butterfly"; that is, the values in each pair produce two results which replace the original pair, wherever they were originally located. The left result will be the two values added; the right will be the first less the second. That is,

   (a',b') = (a+b, a-b)

So for the values (1,0), we get (1+0, 1-0) which is just (1,1). We start out pairing adjacent elements, then every other element, then every 4th element, and so on until the correct pairing is impossible, as shown:

   original      1   0   0   1   1   1   0   0

                 ^---^   ^---^   ^---^   ^---^

      first      1   1   1  -1   2   0   0   0

                 ^-------^       ^-------^

                     ^-------^       ^-------^

     second      2   0   0   2   2   0   2   0





      final      4   0   2   2   0   0  -2   2

The result is the "unexpected distance" to each affine Boolean function. The higher the absolute value, the greater the "linearity"; if we want the nonlinearity, we must subtract the absolute value of each unexpected distance from the expected value, which is half the number of bits in the function. Note that the range of possible values increases by a factor of 2 (in both positive and negative directions) in each sublayer mixing; this is information expansion, which we often try to avoid in cryptography.

Also see: Walsh-Hadamard Transforms: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and the Active Boolean Function Nonlinearity Measurement in JavaScript page of the Ciphers By Ritter / JavaScript computation pages.

The FWT provides a strong mathematical basis for block cipher mixing such that all input values will have an equal chance to affect all output values. Cryptographic mixing then occurs in butterfly operations based on balanced block mixing structures which replace the simple add / subtract butterfly in the FWT and confine the value ranges so information expansion does not occur. A related concept is the well-known FFT, which can use exactly the same mixing patterns as the FWT.

Feedback with Carry Shift Register. A sequence generator analogous to a LFSR, but separately storing and using a "carry" value from the computation.

Feistel Construction
The Feistel construction is the widely-known method of constructing block ciphers used in DES. Horst Feistel worked for IBM in the 60's and 70's, and was awarded a number of crypto patents, including: 3,768,359, 3,768,360, and 4,316,055.

Normally, in a Feistel construction, the input block is split into two parts, one of which drives a transformation whose result is exclusive-OR combined into the other block. Then the "other block" value feeds the same transformation, whose result is exclusive-OR combined into the first block. This constitutes 2 of perhaps 16 "rounds."

     L          R

     |          |

     |--> F --> +    round 1

     |          |

     + <-- F <--|    round 2

     |          |

     v          v

     L'         R'

One advantage of the Feistel construction is that the transformation does not need to be invertible. To reverse any particular layer, it is only necessary to apply the same transformation again, which will undo the changes of the original exclusive-OR.

A disadvantage of the Feistel construction is that diffusion depends upon the internal transformation. There is no guarantee of overall diffusion, and the number of rounds required is often found by experiment.

Fenced DES
A block cipher with three layers, in which the outer layers consist of fencing tables, and the inner layer consists of DES used as a component. For block widths over 64 bits, Balanced Block Mixing technology assures that any bit change is propagated to each DES operation.

Also see the Fenced DES section of the Ciphers By Ritter page, and A Keyed Shuffling System for Block Cipher Cryptography.

Fencing is a term-of-art which describes a layer of substitution tables. In schematic or data-flow diagrams, the row of tiny substitution boxes stands like a picket fence between the data on each side.

Fencing Layer
A fencing layer is a variable size block cipher layer composed of small (and therefore realizable) substitutions. Typically the layer contains many separate keyed substitution tables. To make the layer extensible, either the substitutions can be re-used in some order, or in some pre-determined sequence, or the table to be used at each position selected by some computed value.

Fencing layers are also used in other types of cipher.

Fast Fourier Transform. A numerically advantageous way of computing a Fourier transform. Basically a way of transforming information from amplitude values sampled periodically through time, into amplitude values sampled periodically through complex frequency. The FFT performs this transformation in time proportional to n log n, for some n a power of 2.

While exceedingly valuable, the FFT tends to run into practical problems in use which can require a deep understanding of the process. For example, the transform assumes that the waveform is "stationary" and thus repetitive and continuous, which is rarely the case. As another example, sampling a continuous wave can create spurious "frequency" values related to the sampling and not the wave itself. Also the range of possible values increases by a factor of 2 (in both positive and negative directions) in every sublayer mixing; this is information expansion, which we often try to avoid in cryptography.

The FFT provides a strong mathematical basis for block cipher mixing such that all input values will have an equal chance to affect all output values. Cryptographic mixing then occurs in butterfly operations based on balanced block mixing structures which replace the simple add / subtract butterfly in the FFT and confine the value ranges so information expansion does not occur. A related concept is the fast Walsh-Hadamard transform (FWT), which can use exactly the same mixing patterns as the FFT.

In abstract algebra, a commutative ring in which all non-zero elements have a multiplicative inverse. (This means we can divide.)

In general, a field supports the four basic operations (addition, subtraction, multiplication and division), and satisfies the normal rules of arithmetic. An operation on any two elements in a field is a result which is also an element in the field.

Examples of fields include rings of integers modulo some prime. Here are multiplication tables under mod 2, mod 3 and mod 4:

        0  1             0  1  2             0  1  2  3

    0   0  0        0    0  0  0        0    0  0  0  0

    1   0  1        1    0  1  2        1    0  1  2  3

                    2    0  2  1        2    0  2  0  2

                                        3    0  3  2  1

In a field, each element must have an inverse, and the product of an element and its inverse is 1. This means that every non-zero row and column of the multiplication table for a field must contain a 1. Since row 2 of the mod 4 table does not contain a 1, the set of integers mod 4 is not a field.

The order of a field is the number of elements in that field. The integers mod p form a finite field of order p. Similarly, mod 2 polynomials will form a field with respect to an irreducible polynomial, and will have order 2n, which is a very useful size.

Finite Field
A Galois field: A mathematical field of non-infinite order. As opposed to an infinite field, such as the integers, rationals, reals and complex numbers.

  • In a finite field, every nonzero element x can be squared, cubed, and so on, and at some power will eventually become 1. The smallest (positive) power n at which xn = 1 is the order of element x. This of course makes x an "nth root of unity," in that it satisfies the equation xn = 1.
  • A finite field of order q will have one or more primitive elements a whose order is q-1 and whose powers cover all nonzero field elements.
  • For every element x in a finite field of order q, xq = x.

A class of digital logic component which has a single bit of state with various control signals to effect a state change. There are several common versions:

  • Latch -- the output follows the input, but only while the clock input is "1"; lowering the clock prevents the output from changing.

  • SR FF -- Set / Reset; typically created by cross-connecting two 2-input NAND gates, in which case the inputs are complemented: a "0" on the S input forces a stable "1" state, which is held until a "0" on the R input forces a "0".

  • D or "delay" FF -- senses the input value at the time of a particular clock transition.

  • JK FF -- the J input is an AND enable for a clocked or synchronous transition to "1"; the K input is an AND enable for a clocked transition to "0"; and often there are S and R inputs to force "1" or "0" (respectively) asynchronously.

Fourier Series
An infinite series in which the terms are constants (A, B) multiplied by sine or cosine functions of integer multiples (n) of the variable (x). One way to write this would be:

   f(x) = A0 + SUM (An cos nx + Bn sin nx)

Alternately, over the interval [a, a+2c]:

   f(x) = a0 + SUM ( an cos(n PI x/c) + bn sin(n PI x/c) )

   an = 1/c INTEGRAL[a,a+2c]( f(x) cos(n PI x/c) dx )

   bn = 1/c INTEGRAL[a,a+2c]( f(x) sin(n PI x/c) dx )

Fourier Theorem
Under suitable conditions any periodic function can be represented by a Fourier series. (Various other "orthogonal functions" are now known.)

The use of sine and cosine functions is particularly interesting, since each term represents a single frequency oscillation. So to the extent that we can represent an amplitude waveform as a series of sine and cosine functions, we thus describe the frequency spectrum associated with that waveform. This frequency spectrum describes the frequencies which must be handled by a circuit to reproduce the original waveform. This illuminating computation is called a Fourier transform.

Fourier Transform
The Fourier transform relates amplitude samples at periodic discrete times to amplitude samples at periodic discrete frequencies. There are thus two representations: the amplitude vs. time waveform, and the amplitude vs. complex frequency (magnitude and phase) spectrum. Exactly the same information is present in either representation, and the transform supports converting either one into the other. This computation is efficiently performed by the FFT.

In a cryptographic context, one of the interesting parts of the Fourier transform is that it represents a thorough mixing of each input value to every output value.

The number of repetitions or cycles per second. Now measured in Hertz (Hz); previously called cycles-per-second (cps).

A mapping; sometimes specifically confined to numbers.

Fast Walsh Transform.


The amplitude change due to amplification. A negative gain is in fact a loss.

Galois Field
Finite field. First encountered by the 19-year-old student Evariste Galois, in 1830 France, a year or so before dying in a duel.

A digital logic component which is a simple logic function, possibly with a complemented output. Some common Boolean logic gates include:

GF 2n
The Galois field or finite field of 2n polynomials of degree n-1 or less.

Typically we have mod 2 polynomials with results reduced "modulo" an irreducible "generator" polynomial g of degree n. This is analogous to creating a field from the integers modulo some prime p.

For example, consider GF(24) using the generator polynomial x4 + x + 1, or 10011, which is a degree-4 irreducible. First we multiply two elements as usual:

          1 0 1 1

        * 1 1 0 0




      1 0 1 1

    1 0 1 1


    1 1 1 0 1 0 0

Then we "reduce" the result modulo the generator polynomial:

                        1 1 0


    1 0 0 1 1 ) 1 1 1 0 1 0 0

                1 0 0 1 1


                  1 1 1 0 0

                  1 0 0 1 1


                    1 1 1 1 0

                    1 0 0 1 1


                        1 1 0 1


So, if I did the arithmetic right, the result is the remainder, 1101. I refer to this as arithmetic "mod 2, mod p".

An irreducible is sufficient to form a finite field. However, some special irreducibles are also primitive, and these create "maximal length" sequences in LFSR's.

Goodness of Fit
In statistics, a test used to compare two distributions. For nominal or "binned" measurements, a chi-square test is common. For ordinal or ordered measurements, a Kolmogorov-Smirnov test is appropriate.

Goodness-of-fit tests can at best tell us whether one distribution is or is not the same as the other, and they say even that only with some probability. It is important to be very careful about experiment design, so that, almost always, "nothing unusual found" is the goal we seek. When we can match distributions, we are obviously able to state exactly what the experimental distribution should be and is. But there are many ways in which distributions can differ, and simply finding a difference is not evidence of a specific effect. (See null hypothesis.)

In abstract algebra, a nonempty set G with one dyadic (two-input, one-output) operation which we choose to call "multiplication" and denote * as usual. If elements (not necessarily numbers) a, b are in R, then ab (or a*b) is also in R. The following properties hold:
  1. Multiplication is associative: (ab)c = a(bc)
  2. There is a multiplicative identity: for e in G, ea = ae = a
  3. There is a multiplicative inverse: for a in G, there is an a-1 in G such that a-1a = e = aa-1

A group is basically a mapping from two elements in the group, through the group operation m, into the same group:

     m:G x G -> G

Hamming Distance

A measure of the difference or "distance" between two binary sequences of equal length; in particular, the number of bits which differ between the sequences. This is the weight or the number of 1-bits in the exclusive-OR of the two sequences.

The physical realization of computation. Typically, the electronic digital logic, power supply, and various electro-mechanical components such as disk drives, switches, and possibly relays which make up a computer or other digital system. As opposed to software. See system design and debug.

A classic computer operation which forms a fixed-size result from an arbitrary amount of data. Ideally, even the smallest change to the input data will change about half of the bits in the result. Often used for table look-up, so that very similar language terms or phrases will be well-distributed throughout the table. Also often used for error-detection, and, known as a message digest, authentication.

A hash of data will produce a particular hash value, which then can be included in the message before it is sent (or stored). When the data are received (or read) and the hash value computed, this should match the included hash value. So if the hash is different, something has changed, and the usual solution is to request the data be sent again. But the hash value is typically much smaller than the data, so there must be "many" different data sets which will produce that same value. This means that "error detection" inherently cannot detect all possible errors, and this is quite independent of any "linearity" in the hash computation.

An excellent example of a hash function is a CRC operation. CRC is a linear function without cryptographic strength, but does have a strong mathematical basis which is lacking in ad hoc methods. Strength is not needed when keys are processed into the state used in a random number generator, because if either the key or the state becomes known, the keyed cipher has been broken.

In contrast, a cryptographic hash function must be "strong" in the sense that it must be "computationally infeasible" to find two input values which produce the same hash result. In general, this means that the hash result should be 128 bits or more in size.

Sometimes a cryptographic hash function is described as being "collision free," which is a misnomer. A collision occurs when two different texts produce exactly the same hash result. Given enough texts, collisions will of course occur, precisely because any fixed-size result has only so many possible code values. The intent is that collisions be hard to find and particular hash values impossible to create at will.

Hexadecimal (Hex)
Base 16. The numerical representation in which each digit has an alphabet of sixteen symbols, generally 0 through 9, plus A through F, or "a" through "f".

Each hex value represents exactly four bits, which can be particularly convenient. Also see: binary, octal, and decimal.

Greek for "the same sound." The concept of having different letter sequences which are pronounced alike. In cryptography, a cipher which translates a single plaintext symbol into any one of multiple ciphertext symbols which all have the same meaning. Also see polyphonic, polygraphic and monographic.

Homophonic Substitution
A type of substitution in which an original symbol is replaced by any one of multiple unique symbols. Intended to combat the property of simple substitution in which the most-frequent symbols in the plaintext always produce the most-frequent symbols in the ciphertext.

A form of homophonic substitution is available in a large block cipher, where a homophonic selection field is enciphered along with the plaintext. Any of the possible values for that field naturally will produce a unique ciphertext. After deciphering any of those ciphertexts, the homophonic selection field could be deleted, and the exact same plaintext recovered. Note that the ability to produce a multitude of different encipherings for exactly the same data is related to the concept of a key.


The secret key block cipher used in PGP. Designed by James Massey and Xuejia Lai in several installments, called PES, IPES and IDEA. It is round-based, with a 64-bit block size, a 128-bit key, and no internal tables.

The disturbing aspect of the IDEA design is the extensive use of almost linear operations, and no nonlinear tables at all. While technically nonlinear, the internal operations seem like they might well be linear enough to be attacked.

Ideal Secrecy
The strength delivered by even a simple cipher when each and every plaintext is equally probable and independent of every other plaintext.

There are various examples:

  • The use of CBC mode in DES: By making every plaintext block equally probable, DES is greatly strengthened against codebook attack.
  • The transmission of random message key values: To the extent that every value is equally probable, even a very simple cipher is sufficient to protect those values.
  • The use of a keyed simple substitution of the ciphertext to add strength, as used in the Penknife stream cipher design.
  • The use of data compression to reduce the redundancy in a message before ciphering: This of course can only reduce language redundancy. (Also, many compression techniques send pre-defined tables before the data and so are not suitable in this application.)

Also see: perfect secrecy. From Claude Shannon.

In statistics: Independent, Identically Distributed. Generally related to the random sampling of a single distribution.

Inductive Reasoning
In the study of logic, reasoning from the observation of some particular cases to produce a general statement. While often incorrect, inductive reasoning does provide a way to go beyond known truth to new statements which may then be tested. And certain types of inductive reasoning can be assigned a correctness probability using statisticical techniques. Also see: deductive reasoning and fallacy.

A basic electronic component which acts as a reservoir for electrical power in the form of current. An inductor thus acts to "even out" the current flowing through it, and to "emphasize" current changes across the terminals. An inductor conducts DC and opposes AC in proportion to frequency. Inductance is measured in Henrys: A voltage of 1 Volt across an inductance of 1 Henry produces a current change of 1 Ampere per Second through the inductor.

Typically a coil or multiple turns of conductor wound on a magnetic or ferrous core. Current in the conductor creates a magnetic field, thus "storing" charge. When power is removed, the magnetic field collapses to maintain the current flow; this can produce high voltages, as in automobile spark coils.

Also see capacitor and resistor.

One-to-one. A mapping f: X -> Y where no two values x in X produce the same result f(x) in Y. A one-to-one mapping is invertible for those values of X which produce unique results f(x), but there may not be a full inverse mapping g: Y -> X.

A material in which electron flow is difficult or impossible. Classically air or vacuum, or wood, paper, glass, ceramic, plastic, etc. As opposed to a conductor.

An element in the set consisting of counting numbers: 1, 2, 3, ..., their negatives: -1, -2, -3, ..., and zero.

Intermediate Block
In the context of a layered block cipher, the data values produced by one layer then used by the next.

In some realizations, an intermediate block might be wired connections between layer hardware. In the context of a general purpose computer, an intermediate block might represent the movement of data between operations, or perhaps transient storage in the original block.

In statistics, measurements in which the numerical value has meaning. Also see: nominal, and ordinal.

A mapping f: X -> Y which only partially covers Y. An inverse mapping g: Y -> X may not exist if, for example, multiple elements x in X produce the same f(x) in Y.

   +----------+          +----------+

   |          |   INTO   | Y        |

   |    X     |          |   +----+ |

   |          |    f     |   |f(X)| |

   |          |   --->   |   +----+ |

   +----------+          +----------+

A mapping or function g(y) or f -1(y), related to some function f(x) such that for each x in X:
   g(f(x)) = x = f-1(f(x)).

Only functions which are one-to-one can have an inverse.

A mapping or function which has an inverse. A transformation which can be reversed.

A type of mapping which is a self-inverse.

A cipher which takes plaintext to ciphertext, and ciphertext back to plaintext, using the exact same operation.

A polynomial only evenly divisible by itself and 1. The polynomial analogy to integer primes. Often used to generate a residue class field for polynomial operations.

A polynomial form of the ever-popular "Sieve of Eratosthenes" can be used to build table of irreducibles through degree 16. That table can then be used to check any potential irreducible through degree 32. While slow, this can be a simple, clear validation of other techniques.

Also see primitive polynomial.

"Initial value," "initializing value" or "initialization vector." An external value needed to start off cipher operations. Most often associated with CBC mode.

An IV often can be seen as a design-specific form of message key. Sometimes, iterative ciphering under different IV values can provide sufficient keying to perform the message key function.

Generally, an IV must be accompany the ciphertext, and so always expands the ciphertext by the size of the IV.


A particular cryptographic mechanism intended to complicate the sequence produced by a linear random number generator by deleting elements from the sequence at pseudo-random.

The name is taken from the use of an oscilloscope on digital circuits, where a signal which is not "in sync" is said to "jitter." Mechanisms designed to restore synchronization are called "synchronizers," so mechanisms designed to cause jitter can legitimately be called "jitterizers."


Kilobyte. 210 or 1024 bytes.

Kilobit. 210 or 1024 bits.

Kerckhoff's Requirements
General cryptosystem requirements formulated in 1883 (from the Handbook of Applied Cryptography):

  1. The system should be, if not theoretically unbreakable, unbreakable in practice. (Of course there are no realized systems which are "theoretically unbreakable," but there is also little point in using a known breakable cipher.)
  2. Compromise of the system details should not inconvenience the correspondents. (Nowadays we generally assume that the Opponent will have full details of the cipher, since, for a cipher to be widely used, it must be present at many locations and is therefore likely to be exposed. We also assume that the Opponent will have some amount of known-plaintext to work with.)
  3. The key should be rememberable without notes and easily changed. (This is still an issue. Hashing allows us to use long language phrases, but the best approach may someday be to have both a hardware key card and a key phrase.)
  4. The cryptogram should be transmissible by telegraph. (This is not very important nowadays, since even binary ciphertext can be converted into ASCII for transmission if necessary.)
  5. The encryption apparatus should be portable and operable by a single person. (Software encryption approaches this ideal.)
  6. The system should be easy, requiring neither the knowledge of a long list of rules nor mental strain. (Software encryption has the potential to approach this, but often fails to do so. We might think of the need to certify public keys, which is still often left up to the user, and thus often does not occur.)

The general concept of protecting things with a "lock," thus making those things available only if one has the correct "key." In a cipher, the ability to select a particular transformation between a plaintext message and a corresponding ciphertext. By using a particular key, we can create any one of many different ciphertexts for the exact same message. And if we know the correct key, we can transform the ciphertext back into the original message. By supporting a vast number of different key possibilities (a large keyspace), we hope to make it impossible for someone to decipher the message by trying every key in a brute force attack.

In cryptography we have various kinds of keys, including a User Key (the key which a user actually remembers), which may be the same as an Alias Key (the key for an alias file which relates correspondent names with their individual keys). We may also have an Individual Key (the key actually used for a particular correspondent); a Message Key (normally a random value which differs for each and every message); a Running Key (the confusion sequence in a stream cipher, normally produced by a random number generator); and perhaps other forms of key as well.

In general, the value of a cryptographic key is used to initialize the state of a cryptographic mechanism.

Ideally, a key will be a equiprobable selection among a huge number of possibilities. This is the fundamental strength of cryptography, the "needle in a haystack" of false possibilities. But if a key is in some way not a random selection, but is instead biased, the most-likely keys can be examined first, thus reducing the complexity of the search and the effective keyspace.

In most cases, a key will exhibit diffusion across the message; that is, changing even one bit of a key should change every bit in the message with probability 0.5. A key with lesser diffusion may succumb to some sort of divide and conquer attack.

Key Distribution Problem
The problem of distributing keys to both ends of a communication path, especially in the case of secret key ciphers, since secret keys must be transported and held in absolute secrecy. Also the problem of distributing vast numbers of keys, if each user is given a separate key.

Although this problem is supposedly "solved" by the advent of the public key cipher, in fact, the necessary public key validation is almost as difficult as the original problem. Although public keys can be exposed, they must represent who they claim to represent, or a "spoofer" or man-in-the-middle can operate undetected.

Nor does it make sense to give each individual a separate secret key, when a related group of people would have access to the same files anyway. Typically, a particular group has the same secret key, which will of course be changed when any member leaves. Typically, each individual would have a secret key for each group with whom he or she associates.

The number of distinct key-selected transformations supported by a particular cipher. Normally described in terms of bits, as in the number of bits needed to count every distinct key. This is also the amount of state required to support a state value for each key. The keyspace in bits is the log2 (the base-2 logarithm) of the number of different keys, provided that all keys are equally probable.

Cryptography is based on the idea that if we have a huge number of keys, and select one at random, The Opponents generally must search about half of the possible keys to find the correct one; this is a brute force attack.

Although brute force is not the only possible attack, it is the one attack which will always exist. Therefore, the ability to resist a brute force attack is normally the "design strength" of a cipher. All other attacks should be made even more expensive. To make a brute force attack expensive, a cipher simply needs a keyspace large enough to resist such an attack. Of course, a brute force attack may use new computational technologies such as DNA or "molecular computation." Currently, 120 bits is large enough to prevent even unimaginably large uses of such new technology.

It is probably just as easy to build efficient ciphers which use huge keys as it is to build ciphers which use small keys, and the cost of storing huge keys is probably trivial. Thus, large keys may be useful when this leads to a better cipher design, perhaps with less key processing. Such keys, however, cannot be considered better at resisting a brute force attack than a 120-bit key, since 120 bits is already sufficient.

Keyed Substitution
Two substitution tables of the same size with the same values can differ only in the ordering or permutation of the values in the tables. A huge keying potential exists: The typical "n-bit-wide" substitution table has 2n elements, and (2n)! ("two to the nth factorial") different permutations or key possibilities. A single 8-bit substitution table has a keyspace of 1648 bits.

A substitution table is keyed by creating a particular ordering from each different key. This can be accomplished by shuffling the table under the control of a random number generator which is initialized from the key.

Known Plaintext Attack
A type of attack in which the cryptanalyst has some quantity of related plaintext and ciphertext. This allows the ciphering transformation to be examined directly.

A known plaintext attack is especially dangerous to the usual stream cipher which has an additive combiner, because the known plaintext can be "subtracted" from the ciphertext, thus completely exposing the confusion sequence. This is the sequence produced by the cryptographic random number generator, and can be used to attack that generator. This sort of attack can generally be prevented by using a Dynamic Substitution Combiner instead of the usual additive combiner.

It is surprisingly reasonable that The Opponent might well have some known plaintext (and related ciphertext): This might be the return address on a letter, a known report, or even some suspected words. Sometimes the cryptosystem will carry unauthorized messages like birthday greetings which are then exposed, due to their apparently innocuous content.

In statistics, a goodness of fit test used to compare two distributions of ordinal data, where measurements may be re-arranged and placed in order. Also see chi-square.

  • n independent samples are collected and arranged in numerical order in array X as x[0]..x[n-1].
  • S(x[j]) is the fraction of the n observations which are less than or equal to x[j]; in the ordered array this is just ((j+1)/n).
  • F(x) is the reference cumulative distribution, the probability that a random value will be less than or equal to x. Here we want F(x[j]), the fraction of the distribution to the left of x[j] which is a value from the array.

The "one-sided" statistics are:

   K+ = SQRT(N) * MAX( S(x[j]) - F(x[j]) )

      = SQRT(N) * MAX( ((j+1)/n) - F(x[j]) )

   K- = SQRT(N) * MAX( F(x[j]) - S(x[j]) )

      = SQRT(N) * MAX( F(x[j]) - (j/n) )

And the "two-sided" KS statistic is:

   K = SQRT(N) * MAX( ABS( S(x[j]) - F(x[j]) ) )

     = MAX( K+, K- )

It appears that the "one-sided" KS distribution is far easier to compute precisely, and may be preferred on that basis.

See the Kolmogorov-Smirnov section of the Ciphers By Ritter / JavaScript computation pages.


A form of delay. Typically a hardware term, latency often refers to the time need to perform an operation. In the past, operation delay has largely been dominated by the time taken for gate switching transistors to turn on and off. Currently, operation delay is more often dominated by the time it takes to transport the electrical signals to and from gates on long, thin conductors.

The effect of latency on throughput can often be reduced by pipelining or partitioning the main operation into many small sub-operations, and running each of those in parallel, or at the same time. As each operation finishes, that result is latched and saved temporarily, pending the availability of the next sub-operation hardware. The result is throughput limited only by the longest sub-operation instead of the overall operation.

Latin Square
A Latin square of order n is an n by n array containing symbols from some alphabet of size n, arranged such that each symbol appears exactly once in each row and exactly once in each column. Also see Latin square combiner and orthogonal Latin squares.

   2  0  1  3

   1  3  0  2

   0  2  3  1

   3  1  2  0

Also see: Latin Squares: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.

Latin Square Combiner
A cryptographic combining mechanism in which one input selects a column and the other input selects a row in an existing Latin square; the value of the selected element is the combiner result.

A Latin square combiner is inherently balanced, because for any particular value of one input, the other input can produce any possible output value. A Latin square can be treated as an array of substitution tables, each of which are invertible, and so can be reversed for use in a suitable extractor. As usual with cryptographic combiners, if we know the output and a specific one of the inputs, we can extract the value of the other input.

For example, a tiny Latin square combiner might combine two 2-bit values each having the range zero to three (0..3). That Latin square would contain four different symbols (here 0, 1, 2, and 3), and thus be a square of order 4:

   2  0  1  3

   1  3  0  2

   0  2  3  1

   3  1  2  0

With this square we can combine the values 0 and 2 by selecting the top row (row 0) and the third column (column 2) and returning the value 1.

When extracting, we will know a specific one (but only one) of the two input values, and the result value. Suppose we know that row 0 was selected during combining, and that the output was 1: We can check for the value 1 in each column at row 0 and find column 2, but this involves searching through all columns. We can avoid this overhead by creating the row-inverse of the original Latin square (the inverse of each row), in the well-known way we would create the inverse of any invertible substitution. For example, in row 0 of the original square, selection 0 is the value 2, so, in the row-inverse square, selection 2 should be the value 0, and so on:

   1  2  0  3

   2  0  3  1

   0  3  1  2

   3  1  2  0

Then, knowing we are in row 0, the value 1 is used to select the second column, returning the unknown original value of 2.

A practical Latin square combiner might combine two bytes, and thus be a square of order 256, with 65,536 byte entries. In such a square, each 256-element column and each 256-element row would contain each of the values from 0 through 255 exactly once.

In the context of block cipher design, a layer is particular transformation or set of operations applied across the block. In general, a layer is applied once, and different layers have different transformations. As opposed to rounds, where a single transformation is repeated in each round.

Layers can be confusion layers (which simply change the block value), diffusion layers (which propagate changes across the block in at least one direction) or both. In some cases it is useful to do multiple operations as a single layer to avoid the need for internal temporary storage blocks.

Linear Feedback Shift Register.

Like a line; having an equation of the form ax + b .

There are various ways a relationship can be linear. One way is to consider a, x, and b as integers. Another is for them to be polynomial elements of GF(2n). Yet another is to consider a to be an n by n matrix, with x and b as n-element vectors. There are probably various other ways as well.

Linearity also depends upon our point of view: For example, integer addition is linear in the integers, but when expressed as mod 2 operations, the exact same computation producing the exact same results is not considered linear.

In cryptography the issue may not be as much one of strict mathematical linearity as it is the "distance" between a function and some linear approximation (see Boolean function nonlinearity). True linear functions are used because they are easy and fast, but they are also exceedingly weak. Of course XOR is linear and trivial, yet is used all the time in arguably strong ciphers. But a design using linear components must have other nonlinear components to provide strength.

Linear Complexity
The length of the shortest Linear Feedback Shift Register which can produce a given sequence.

Also see: Linear Complexity: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.

Linear Feedback Shift Register
An efficient structure for producing sequences, often used in random number generator applications.

In an n-element shift register (SR), if the last element is connected to the first element, a set of n values can circulate around the SR in n steps. But if the values in two of the elements are combined by exclusive-OR and that result connected to the first element, it is possible to get an almost-perfect maximal length sequence of 2n-1 steps. (The all-zeros state will produce another all-zeros state, and so the system will "lock up" in a degenerate cycle.) Because there are only 2n different states of n binary values, every state value but one must occur exactly once, which is a statistically-satisfying result. Moreover, the values so produced are a perfect permutation of the "counting" numbers (1..2n-1).

     A Linear Feedback Shift Register

         +----+    +----+    +----+    +----+    +----+  "a0"

     +-<-| a5 |<---| a4 |<-*-| a3 |<---| a2 |<---| a1 |<--+

     |   +----+    +----+  | +----+    +----+    +----+   |

     |                     v                              |

     +------------------> (+) ----------------------------+

       1         0         1         0         0         1

In the figure we have a LFSR of degree 5, consisting of 5 storage elements a[5]..a[1] and the feedback computation a[0]=a[5]+a[3]. The stored values may be bits and the operation (+) addition mod 2. A clock edge will simultaneously shift all elements left, and load element a[1] with the feedback result as it was before the clock changed the register. Each SR element is just a time-delayed replica of the element before it, and here the element subscript conveniently corresponds to the delay. We can describe this logically:

   a[1][t+1] = a[5][t] + a[3][t];

   a[2][t+1] = a[1][t];

   a[3][t+1] = a[2][t];

   a[4][t+1] = a[3][t];

   a[5][t+1] = a[4][t];

Normally the time distinction is ignored, and we can write more generally, for some feedback polynomial C and state polynomial A of degree n:


   a[0] = SUM  c[i]*a[i]


The feedback polynomial shown here is 101001, a degree-5 poly running from c[5]..c[0] which is also irreducible. Since we have degree 5 which is a Mersenne prime, C is also primitive. So C produces a maximal length sequence of exactly 31 steps, provided only that A is not initialized as zero. Whenever C is irreducible, the reversed polynomial (here 100101) is also irreducible, and will also produce a maximal length sequence.

LFSR's are often used to generate the confusion sequence for stream ciphers, but this is very dangerous: LFSR's are inherently linear and thus weak. Knowledge of the feedback polynomial and only n element values (from known plaintext) is sufficient to run the sequence backward or forward. And knowledge of only 2n elements is sufficient to develop an unknown feedback polynomial. This means that LFSR's should not be used as stream ciphers without in some way isolating the sequence from analysis. Also see jitterizer and additive RNG.

Linear Logic Function
A Boolean switching or logic function which can be realized using only XOR and AND types of functions, which correspond to addition mod 2 and multiplication mod 2, respectively.

A branch of philosophy related to distinguishing between correct and incorrect reasoning. Even an invalid argument can sometimes produce a correct conclusion. But a valid argument must always produce a correct conclusion.

Also devices which realize symbolic logic, such as Boolean logic, a logic of TRUE or FALSE values. Also see: subjective, objective, contextual, absolute, inductive reasoning, deductive reasoning, and fallacy.

Logic Function
Fundamental digial logic operations. The fundamental two-input (dyadic) one-output Boolean functions are AND and OR. The fundamental one-input (monadic) one-output operation is NOT. These can be used in various ways to build exclusive-OR (XOR), which is also widely used as a fundamental function. Here we show the truth tables for the fundamental functions:


     0      1

     1      0

  INPUT    AND   OR    XOR

   0 0      0     0     0

   0 1      0     1     1

   1 0      0     1     1

   1 1      1     1     0

These Boolean values can be stored as a bit, and can be associated with 0 or 1, FALSE or TRUE, NO or YES, etc.

Least-Significant Bit. Typically the rightmost bit.


A maximal length shift register sequence.

Machine Language
Also "machine code." A computer program in the form of the numeric values or "operation codes" ("opcodes") which the computer can directly execute as instructions, commands, or "orders." Thus, the very public code associated with the instructions available in a particular computer. Also the programming of a computer at the bit or hexadecimal level, below even assembly language. Also see source code and object code.

Magnetic Field
The fundamental physical force resulting from moving charges. Also see: electromagnetic field.

Man-in-the-Middle Attack
The original model used to analyze cryptosystems assumed that an Opponent could listen to the ciphertext traffic, and perhaps even interfere with it, but not that messages could be intercepted and completely hidden. Unfortunately, this is in fact the situation in a store-and-forward computer network like the Internet. Routing is not secure on the Internet, and it is at least conceivable that messages between two people could be routed through connections on the other side of the world. This might be exploited to make such messages flow through a particular computer for special processing.

The Man-in-the-Middle (MITM) Attack is mainly applicable to public key systems, and focuses on the idea that many people will send their public keys on the network. The bad part of this is a lack of key authentication, because the Man-in-the-Middle can send a key just as easily, and pretend to be the other end. Then, if one uses that key, one has secure communication with The Opponent, instead of the far end. The MITM can receive a message, decipher it, read it, re-encipher it in the correct public key, and send it along. In this way, neither end need know anything is wrong, yet The Opponent is reading the mail.

Perhaps the worst part of this is that a successful MITM attack does not involve any attack on the actual ciphering. And this means that all proofs or confidence in the security of particular ciphering mechanisms is totally irrelevant to the security of a system which supports MITM attacks.

The way to avoid MITM attacks is to certify public keys, but this is inconvenient and time-consuming. Unless the cipher requires keys to be certified, this is rarely done. The worst part of this is that a successful MITM attack consumes few resources, need not "break" the cipher itself, and may provide just the kind of white-collar desktop intelligence a bureaucracy would love.

It is interesting to note that, regardless of how inconvenient it may be to share keys for a secret-key cipher, this is an inherent authentication which prevents MITM attacks.

Given sets X and Y, and operation f

     f: X -> Y  ,

the mapping or function or transformation f takes any value in the domain X into some value in the range, which is contained in Y. For each element x in X, a mapping associates a single element y in Y. Element f(x) in Y is the image of element x in X.

  • If f(X) covers all elements in Y, f is a mapping of X onto Y, and is surjective.

  • If f(X) only partially covers Y, f is a mapping of X into Y.

If no two values of x in X produce the same result f(x), f is one-to-one or injective.

Markov Process
In statistics, a stochastic (random) process (function) in which all possible outcomes are defined by the current state, independent of all previous states. Also see: stationary process.

Mathematical Cryptography
Cryptography based on mathematical operations, such as taking extremely large values to extremely large powers, modulo the product of two primes. Normally heavily involved with number theory. As opposed to mechanistic cryptography.

There are some problems with a strictly mathematical approach to cryptography:

  1. Mathematical symbology has evolved for concise expression. It is thus not "isomorphic" to the complexity of the implementation, and so is not a good vehicle for the design-time trade-off of computation versus strength.
  2. Most mathematical operations are useful or "beautiful" relationships specifically intended to support understanding in either direction, as opposed to relationships which might be particularly difficult to reverse or infer. So when using the traditional operations for cryptography, we must first defeat the very properties which made these operations so valuable in their normal use.
  3. Mathematics has evolved to produce, describe and expose structure, as in useful or "beautiful" large-scale relationships and groupings. But, in a sense, relationships and groupings are the exact opposite of the fine-grained completely random mappings that cryptography would like to see. Such mappings are awkward to express mathematically, and contain little of the structure which mathematics is intended to describe.
  4. There may be an ingrained tendency in math practitioners, based on long practice, to construct math-like relationships, and such relationships are not desirable in this application. So when using math to construct cryptography, we may first have to defeat our own training and tendencies to group, understand and simplify.

On the other hand, mathematics is irreplaceable in providing the tools to pick out and describe structure in apparently strong cipher designs. Mathematics can identify specific strength problems, and evaluate potential fixes. But there appears to be no real hope of evaluating strength with respect to every possible attack, even using mathematics.

Although mathematical cryptography has held out the promise of providing provable security, in over 50 years of work, no practical cipher has been generally accepted as having proven strength. See, for example: one time pad.

Megabyte. 220 or 1,048,576 bytes.

Megabit. 220 or 1,048,576 bits.

Maximal Length
A linear feedback shift register (LFSR) sequence of 2n-1 steps (assuming a bit-wide shift register of n bits. This means that every binary value the register can hold, except zero, will occur on some step, and then not occur again until all other values have been produced. A maximal-length LFSR can be considered a binary counter in which the count values have been shuffled or enciphered. And while the sequence from a normal binary counter is perfectly balanced, the sequence from a maximal-length LFSR is almost perfectly balanced. Also see M-sequence.

The logical concept of a machine, which may be realized either as a physical machine, or as a sequence of logical commands executed by a physical machine.

A mechanism can be seen as a process or an implementation for performing that process (such as electronic hardware, computer software, hybrids, or the like).

Mechanistic Cryptography
Cryptography based on mechanisms, or machines. As opposed to mathematical cryptography.

Although perhaps looked down upon by those of the mathematical cryptography persuasion, mechanistic cryptography certainly does use mathematics to design and predict performance. But rather than being restricted to arithmetic operations, mechanistic cryptography tends to use a wide variety of mechanically-simple components which may not have concise mathematical descriptions. Rather than simply implementing a system of math expressions, complexity is constructed from the various efficient components available to digital computation.

Mersenne Prime
A prime p for which 2p - 1 is also prime. For example, 5 is a Mersenne prime because 25 - 1 = 31, and 31 is prime. For mod 2 polynomials of Mersenne prime degree, every irreducible is also primitive.

Mersenne Primes:

    2         107          9689         216091

    3         127          9941         756839

    5         521         11213         859433

    7         607         19937        1257787

   13        1279         21701        1398269

   17        2203         23209

   19        2281         44497

   31        3217         86243

   61        4253        110503

   89        4423        132049

Message Digest
A small value which represents an entire message for purposes of authentication; a hash.

Message Key
A key transported with the message and used for deciphering the message. (The idea of a "session key" is very similar, but lasts across multiple messages.)

Normally, the message key is a large random value which becomes the key for ciphering the data in a single message. Normally, the message key itself is enciphered under the User Key or other key for that link. The receiving end first deciphers the message key, then uses that value as the key for deciphering the message data. Alternately, the random value itself may be sent unenciphered, but is then enciphered or hashed (under a keyed cryptographic hash) to produce a value used as the data ciphering key.

The message key assures that the actual data is ciphered under a key which is an arbitrary selection from a huge number of possible keys; it therefore prevents weakness due to user key selection. A message key is used exactly once, no matter how many times the same message is enciphered, so at most, a successful attack on a message key exposes just one message. The internal construction of a random message key cannot be controlled by a user, and thus prevents all attacks based on repeated ciphering under a single key. To the extent that the message key value really is random and is never exposed on either end, the message key is much more easily protected than ordinary text (see ideal secrecy). In a sense, a message key is the higher-level concept of an IV, which is necessarily distinct for each particular design.

Man In The Middle.

The act of transforming multiple input values into one or more output values, such that changing any input value will change the output value. There is no implication that the result must be balanced, but effective mixing may need to be, in some sense, complete. Also see Mixing Cipher, combiner, Latin square combiner, and Balanced Block Mixing.

Mixing Cipher
A block cipher based on Balanced Block Mixing of small elements in FFT-like or FWT-like mixing patterns.

Below, we have a toy 32-bit-block Mixing Cipher. Plaintext at the top is transformed into ciphertext at the bottom. Each "S" is an 8-bit substitution table, and each table (and now each mixing operation also) is individually keyed.

Horizontal lines connect elements which are to be mixed together: Each *---* represents a single Balanced Block Mixing or BBM. Each BBM takes two elements, mixes them, and returns two mixed values. The mixed results then replace the original values in the selected positions just like the "butterfly" operations used in some FFT's.

   A 32-Bit Mixing Cipher

    |   |   |   |    <- Input Block (Plaintext)

    S   S   S   S    <- Fencing

    |   |   |   |

    *---*   *---*    <- 2 BBM Mixings

    |   |   |   |

    *-------*   |    <- 1 BBM Mixing

    |   *-------*    <- 1 BBM Mixing

    |   |   |   |

    S   S   S   S    <- Fencing

    |   |   |   |

    *-------*   |

    |   *-------*

    |   |   |   |

    *---*   *---*

    |   |   |   |

    S   S   S   S    <- Fencing

    |   |   |   |    <- Output Block (Ciphertext)

By mixing each element with another, and then each pair with another pair and so on, every element is eventually mixed with every other element. Each BBM mixing is dyadic, so each "sub-level" is a mixing of twice as many elements as the sublevel before it. A block of n elements is thus fully mixed in log2 n sublevels, and each result element is equally influenced equally by each and every input element.

The pattern of these mixings is exactly like some implementations of the FFT, and thus the term "FFT-style." Also see the articles in the Mixing Ciphers section on the Ciphers By Ritter pages.

Mod 2
The field formed from the set of integers {0,1} with operations + and * producing the remainder after dividing by modulus 2. Thus:

     0 + 0 = 0

     0 + 1 = 1

     1 + 0 = 1

     1 + 1 = 0

     1 + 1 + 1 = 1

     0 * 0 = 0

     0 * 1 = 0

     1 * 0 = 0

     1 * 1 = 1

Subtraction mod 2 is the same as addition mod 2. The operations + and * can also be considered the logic functions XOR and AND respectively.

Mod 2 Polynomial
A polynomial in which the coefficients are taken mod 2. The four arithmetic operations addition, subtraction, multiplication and division are supported. As usual, mod 2 subtraction is the same as mod 2 addition. Each column of coefficients is added separately, without "carrys" to an adjacent column:

Addition and Subtraction:

       1 0 1 1

     + 0 1 0 1

     + 1 1 0 0


       0 0 1 0


          1 0 1 1

        * 1 1 0 0




      1 0 1 1

    1 0 1 1


    1 1 1 0 1 0 0

Polynomial multiplication is not the same as repeated polynomial addition. But there is a fast approach to squaring mod 2 polynomials:

               a  b  c  d

               a  b  c  d


              ad bd cd dd

           ac bc cc dc

        ab bb cb db

     aa ba ca da


      a  0  b  0  c  0  d

To square a mod 2 polynomial, all we have to do is "insert" a zero between every column. Note that aa = a for a = 0 or a = 1, and ab = ba, so either 0 + 0 = 0 or 1 + 1 = 0.


                    1 0 1 1


    1 1 0 0 ) 1 1 1 0 1 0 0

              1 1 0 0


                  1 0 1 0

                  1 1 0 0


                    1 1 0 0

                    1 1 0 0



The decision about whether the divisor "goes into" the dividend is based exclusively on the most-significant (leftmost) digit. This makes polynomial division far easier than integer division.

Mod 2 polynomials behave much like integers in that one polynomial may or may not divide another without remainder. This means that we can expect to find analogies to integer "primes," which we call irreducible polynomials.

Mod 2 polynomials do not constitute a field; clearly, the size of a multiplication is unbounded. However, a finite field of polynomials can be created by choosing an irreducible modulus polynomial, thus producing a Galois field GF 2n.

One possibility is: block cipher operating mode.

Casually, the remainder after an integer division by a modulus; see congruence. When the modulus is prime, this may generate a useful field.

Relating to monad, which is Greek for single or one. In particular, a function with a single input or argument, also called unary. Also see: dyadic.

Monoalphabetic Substitution
Substitution using a single alphabet. Also called simple substitution. As opposed to Polyalphabetic Substitution.

Greek for "single letter." A cipher which translates one plaintext symbol at a time into ciphertext. As opposed to polygraphic; also see homophonic and polyphonic.

Multiple Encryption
Enciphering or encrypting a message more than once. This usually has the strength advantage of producing a very random-like ciphertext from the first pass, which is of course the "plaintext" for the next pass.

Multiple encryption using different keys can be a way to increase strength. And multiple encryption using different ciphers can reduce the probability of using a single cipher which has been broken in secret. In both cases, the cost is additional ciphering operations.

Unfortunately, multiple encryption using just two (2) ciphers may not be much advantage: If we assume The Opponents know which ciphers are used, they can manipulate both the plaintext and the ciphertext to search for a match (a "meet-in-the-middle" attack strategy). One way to avoid this is to use three (3) cipherings, as in Triple DES.

Multiple encryption also can be dangerous, if a single cipher is used with the same key each time. Some ciphers are involutions which both encipher and decipher with the same process; these ciphers will decipher a message if it is enciphered a second time under the same key. This is typical of classic additive synchronous stream ciphers, as it avoids the need to have separate encipher and decipher operations. But it also can occur with block ciphers operated in stream-cipher-like modes such as OFB, for exactly the same reason.


Originally, a list of transformations from names to symbols or numbers for diplomatic communications. Later, typically a list of transformations from names, polygraphic syllables, and monographic letters, to numbers. Usually the monographic transformations had multiple or homophonic alternatives for frequently-used letters. Generally smaller than a codebook, due to the use of the syllables instead of a comprehensive list of phrases. A sort of early manual cipher with some characteristics of a code, that operated like a codebook.

In statistics, measurements which are in categories or "bins." Also see: ordinal, and interval.

The extent to which a function is not linear. See Boolean function nonlinearity.

A Boolean logic function which is the "complement" or the mod 2 addition of 1.

Null Hypothesis
In statistics, the particular statement or hypothesis H0 which is accepted unless a statistic testing that hypothesis produces evidence to the contrary. Normally, the null hypothesis is accepted when the associated statistical test indicates "nothing unusual found."

The logically contrary alternative hypothesis H1 is sometimes formulated with the specific hope that something unusual will be found, but this can be very tricky to get right. Many statistical tests (such as goodness-of-fit tests) can only indicate whether something matches what we expect, or does not. But any number of things can cause a mismatch, including a fundamentally flawed experiment. A simple mismatch does not normally imply the presence of a particular quality.

Even in the best possible situation, random sampling will produce a range or distribution of test statistic values. Often, even the worst possible statistic value can be produced by an unlucky sampling of the best possible data. It is thus important to know what distribution to expect because of the sampling alone, so if we find a different distribution, that will be evidence supporting the alternative hypothesis H1.

If we collect enough statistic values, we should see them occur in the ideal distribution for that particular statistic. So if we call the upper 5 percent of the distribution "failure" (this is the significance level) we not only expect but in fact require such "failure" to occur about 1 time in 20. If it does not, we will in fact have detected something unusual, something which might even indicate problems in the experimental design.

If we have only a small number of samples, and do not run repeated trials, a relatively few chance events can produce an improbable statistic value, which might cause us to reject a valid null hypothesis, and so commit a type I error.

On the other hand, if there is a systematic deviation in the underlying distribution, only a very specific type of random sampling could mask that problem. With few samples and trials, though, the chance random masking of a systematic problem is still possible, and could lead to a type II error.

Object Code

Typically, machine language instructions represented in a form which can be "linked" with other routines. Also see source code.

In the study of logic, reality observed without interpretation. As opposed to subjective or interpreted reality. Alternately, a goal.

Base 8: The numerical representation in which each digit has an alphabet of eight symbols, generally 0 through 7.

Somewhat easier to learn than hexadecimal, since no new numeric symbols are needed, but octal can only represent three bits at a time. This generally means that the leading digit will not take all values, and that means that the representation of the top part of two concatenated values will differ from its representation alone, which can be confusing. Also see: binary and decimal.

A frequency ratio of 2:1. From an 8-step musical scale.

OFB or Output FeedBack is an operating mode for a block cipher.

OFB is closely related to CFB, and is intended to provide some of the characteristics of a stream cipher from a block cipher. OFB is a way of using a block cipher to form a random number generator. The resulting pseudorandom confusion sequence can be combined with data as in the usual stream cipher.

OFB assumes a shift register of the block cipher block size. An IV or initial value first fills the register, and then is ciphered. Part of the result, often just a single byte, is used to cipher data, and also is shifted into the register. The resulting new register value is ciphered, producing another confusion value for use in stream ciphering.

One disadvantage of this, of course, is the need for a full block-wide ciphering operation, typically for each data byte ciphered. The advantage is the ability to cipher individual characters, instead of requiring accumulation into a block before processing.

One Time Pad
The term "one time pad" (OTP) is rather casually used for two fundamentally different types of cipher:

  1. The Theoretical One Time Pad: a theoretical random source produces values which are combined with data to produce ciphertext. In a theoretical discussion of this concept, we can simply assume perfect randomness in the source, and this assumption supports a mathematical proof that the cipher is unbreakable. But the theoretical result applies to reality only if we can prove the assumption is valid in reality. Unfortunately, we cannot do this, because provably perfect randomness apparently cannot be attained in practice. So the theoretical OTP does not really exist, except as a goal.

  2. The Realized One Time Pad: a really random source produces values which are combined with data to produce ciphertext. But because we can neither assume nor prove perfect, theoretical-class randomness in any real generator, this cipher does not have the mathematical proof of the theoretical system. Thus, a realized one time pad is NOT proven unbreakable, although it may in fact be unbreakable in practice. In this sense, it is much like other realized ciphers.

A realized one time pad (OTP) is essentially a stream cipher with a really random confusion sequence used exactly once. The confusion sequence is the key, and it is as long as the data. Since this amount of keying material can be awkward to transfer and keep, we often see "pseudo" one-time pad designs which attempt to correct this deficiency. Normally, the point is to achieve the theoretical advantages of a one-time pad without the costs; the problem with this is that the one-time pad theory of strength no longer applies. These variations are best seen as classic stream cipher designs.

In a realized one time pad, the confusion sequence must be unpredictable (not generated from a small key value) and must be transported to the far end and held at both locations in absolute secrecy like any other secret key. But where a normal secret key might range perhaps from 16 bytes to 160 bytes, there must be as much OTP sequence as there will be data (which might well be megabytes). And a normal secret key could itself be sent under a key (as in a message key or under a public key). But an OTP sequence cannot be sent under a key, since this would make the OTP as weak as the key, in which case we might as well use a normal cipher. All this implies very significant inconveniences, costs, and risks, well beyond what one would at first expect, so even the realized one time pad is generally considered impractical, except in very special situations.

In a realized one time pad, the confusion sequence itself must be random for, if not, it will be somewhat predictable. And, although we have a great many statistical randomness tests, there is no test which can certify a sequence as either random or unpredictable. This means that a sequence which we assume to be random may not be the unpredictable sequence we need, and we can never know for sure. (This might be considered an argument for using a combiner with strength, such as a Latin square or Dynamic Substitution.) In practice, the much touted "mathematically proven unbreakability" of the one time pad depends upon an assumption of randomness and unpredictability which we can neither test nor prove.

The one time pad sometimes seems to have yet another level of strength above the usual stream cipher, the ever-increasing amount of "unpredictability" or entropy in the confusion sequence, leading to an indefinite unicity distance. In contrast, the typical stream cipher will produce a long sequence from a relatively small amount of initial state, and it can be argued that the entropy of an RNG is just the number of bits in its initial state. In theory, this might mean that the initial state or key used in the stream cipher could be identified after somewhat more than that same amount of data had been enciphered. But it is also perfectly possible for an unsuspected problem to occur in a really-random generator, and then the more sequence generated, the more apparent and useful that problem might be to an Opponent.

Nor does even a theoretical one time pad imply unconditional security: Consider A sending the same message to B and C, using, of course, two different pads. Now, suppose the Opponents can acquire plaintext from B and intercept the ciphertext to C. If the system is using the usual additive combiner, the Opponents can reconstruct the pad between A and C. Now they can send C any message they want, and encipher it under the correct pad. And C will never question such a message, since everyone knows that a one time pad provides "absolute" security as long as the pad is kept secure. Note that both A and C have done this, and they are the only ones who had that pad.

Various companies offer one time pad programs, and sometimes also the keying or "pad" material.

Injective. A mapping f: X -> Y where no two values x in X produce the same result f(x) in Y. A one-to-one mapping is invertible for those values of X which produce unique results f(x), but there may not be a full inverse mapping g: Y -> X.

One Way Diffusion
In the context of a block cipher, a one way diffusion layer will carry any changes in the data block in a direction from one side of the block to the other, but not in the opposite direction. This is the usual situation for fast, effective diffusion layer realizations.

Surjective. A mapping f: X -> Y where f(x) covers all elements in Y. Not necessarily invertible, since multiple elements x in X could produce the same f(x) in Y.

   +----------+          +----------+

   |          |   ONTO   |          |

   |    X     |          | Y = f(X) |

   |          |    f     |          |

   |          |   --->   |          |

   +----------+          +----------+

Operation code: a value which selects one operation from among a set of possible operations. This is an encoding of functions as values. These values may be interpreted by a computer to perform the selected operations in their given sequence and produce a desired result. Also see: software and hardware.

Operating Mode
With respect to block ciphers, a way to handle messages which are larger than the defined block size. Usually this means one of the four block cipher "applications" defined for use with DES:
  • ECB or Electronic CodeBook;
  • CBC or Cipher Block Chaining;
  • CFB or Ciphertext FeedBack; and
  • OFB or Output FeedBack.

It can be argued that block cipher operating modes are stream "meta-ciphers" in which the streamed transformation is of full block cipher width, instead of the usual stream cipher bit- or byte-width transformations.

A term used by some cryptographers to refer to the opposing cryptanalyst or opposing team. Sometimes used in preference to "the enemy."

A Boolean logic function which is also nonlinear under mod 2 addition.

In mathematics, typically the number of elements in a structure, or the number of steps required to traverse a cyclic structure.

In statistics, measurements which are ordered from smallest to largest. Also see: nominal, and interval.

At right angles; on an independent dimension. Two structures which each express an independent dimension.

Orthogonal Latin Squares
Two Latin squares of order n, which, when superimposed, form each of the n2 possible ordered pairs of n symbols exactly once. At most, n-1 Latin squares may be mutually orthogonal.

   3 1 2 0   0 3 2 1       30  13  22  01

   0 2 1 3   2 1 0 3   =   02  21  10  33

   1 3 0 2   1 2 3 0       11  32  03  20

   2 0 3 1   3 0 1 2       23  00  31  12

Also see Balanced Block Mixing.

One Time Pad.

Overall Diffusion
That property of an ideal block cipher in which a change of even a single message or plaintext bit will change every ciphertext bit with probability 0.5. In practice, a good block cipher will approach this ideal. This means that about half of the output bits should change for any possible change to the input block.

Overall diffusion means that the ciphertext will appear to change at random even between related message blocks, thus hiding message relationships which might be used to attack the cipher.

Overall diffusion can be measured statistically in a realized cipher and used to differentiate between better and worse designs. Overall diffusion does not, by itself, define a good cipher, but it is required in a good block cipher.

Also see diffusion, avalanche, strict avalanche criterion and complete.


In classical cryptography, random data added to the start and end of messages so as to conceal the length of the message, and the position where coding actually starts.

In more conventional computing, some additional data needed to fill-out a fixed-size data structure. This meaning also exists in cryptography, where the last block of a fixed-size block cipher often must be padded to fill the block.

A key, in the form of a word. Also "pass phrase," for multiple-word keys. See: user authentication.

The legal right, formally granted by a government, to exclude others from making, selling or using the particular invention described in the patent deed. (The term "selling" is generally understood to cover free distribution.) Note that a patent is not the right to make the invention, if it is covered by other unexpired patents. A patent constitutes the open publication of an invention, in return for a limited-term monopoly on its use. A patent is said to protect the application of an idea (as opposed to the idea itself), and is distinct from copyright, which protects the expression of an idea.

The concept behind patenting is to establish intellectual property in a way somewhat related to a mining claim or real estate. An inventor of a machine or process can file a claim on the innovation, provided that it is not previously published, and that someone else does not already have such a claim. Actual patents normally do not claim an overall machine, but just the newly-innovative part, and wherever that part is used, it must be licensed from the inventor. It is common for an inventor to refine earlier work patented by someone else, but if the earlier patent has not expired, the resulting patent often cannot be practiced without a license from the earlier patent holder.

Someone who comes up with a patentable invention and wishes to give up their rights can simply publish a full description of the invention. Simple publication should prevent an application from anyone who has not already established legal proof that they previously came up with the same invention. In the U.S., publication also apparently sets a 1-year clock running for an application to be filed by anyone who does have such proof. But coming up with an invention does not take away someone else's rights if they came up with the same thing first, they may have a year to file, and their case might take several years to prosecute and issue.

In the U.S., a patent is a non-renewable grant, previously lasting 17 years from issue date, now lasting 20 years from application date. Both an application fee and an issue fee are required, as are periodic "maintenance" fees throughout the life of the patent. There are four main requirements:

  1. Statutory Class (35 USC 101): The invention must be either:
    • a process,
    • a machine,
    • a manufacture,
    • a composition of materials, or
    • a new use for one of the above.

  2. Utility (35 USC 101): The invention must be of some use.

  3. Novelty (35 USC 102): The invention must have some aspect which is different from all previous inventions and public knowledge.

    A U.S. patent is not available if -- before the invention date -- the invention was:

    • Publicly known or used the United States of America, or
    • Described in a printed publication (e.g., available at a public library) anywhere
    (35 USC 102(a)).

    A U.S. patent is not available if -- more than a year before the application date -- the invention was:

    • In public use or on sale in the United States of America, or
    • Described in a printed publication (e.g., available at a public library) anywhere
    (35 USC 102(b)).

  4. Unobviousness (35 USC 103): The invention must have not been obvious to someone of ordinary skill in the field of the invention at the time of the invention. Unobviousness has various general arguments, such as:
    • Unexpected Results,
    • Unappreciated Advantage.
    • Solution of Long-Felt and Unsolved Need, and
    • Contrarian Invention (contrary to teachings of the prior art),
    among many others.

When the same invention is claimed by different inventors, deciding who has "priority" to be awarded the patent can require legally provable dates for both "conception" and "reduction to practice":

  • Conception can be proven by disclosure to others, preferably in documents which can be signed and dated as having been read and understood. The readers can then testify as to exactly what was known and when it was known.
  • Reduction to Practice may be the patent application itself, or requires others either to watch the invention operate or to make it operate on behalf of the inventor. These events also should be carefully recorded in written documents with signatures and dates.
"In determining priority of invention, there shall be considered not only the respective dates of conception and reduction to practice of the invention, but also the reasonable diligence of one who was first to conceive and last to reduce to practice . . ." (35 USC 102(g)).

Also see: prior art and our claims tutorial.

In practice, a patent is rarely the intrusive prohibitive right that it may at first appear to be, because patents are really about money and respect. Ideally, a patent rewards the inventor for doing research and development, and then disclosing an invention to the public; it is also a legal recognition of a contribution to society. If someone infringes a patent in a way which affects sales, or which implies that the inventor cannot do anything about it, the patent holder can be expected to show some interest. But when little or no money is involved, a patent can be infringed repeatedly with little or no response, and typically this will have no effect on future legal action.

This simple introduction cannot begin to describe the complexity involved in filing and prosecuting a patent application. Your author does not recommend going it alone, unless one is willing to put far more time into learning about it and doing it than one could possibly imagine.

Patent Infringement
Patent infringement occurs when someone makes, sells, or uses a patented invention without license from the patent holder.

Normally the offender will be contacted, and there may be a settlement and proper licensing, or the offender may be able to design around the patent, or offender may simply stop infringing. Should none of these things occur, the appropriate eventual response is a patent infringement lawsuit in federal court.

Perfect Secrecy
The unbreakable strength delivered by a cipher in which all possible ciphertexts may be key-selected with equal probability given any possible plaintext. This means that no ciphertext can imply any particular plaintext any more than any other. This sort of cipher needs as much keying information as there is message information to be protected. A cipher with perfect secrecy has at least as many keys as messages, and may be seen as a (huge) Latin square.

There are some examples:

  • (Theoretically) the one-time pad with a perfectly random pad generator.
  • The Dynamic Transposition cipher approaches perfect secrecy in that every ciphertext is a bit-permuted balanced block. Thus, every possible plaintext block is just a particular permutation of any ciphertext block. Since the permutation is created by a keyed RNG, we expect any particular permutation to "never" re-occur, and be easily protected from defined plaintext attack with the usual message key. We also expect that the RNG itself will be protected by the vast number of different sequences which could produce the exact same bit-pattern for any ciphertext result.

Also see: ideal secrecy. From Claude Shannon.

The mathematical term for a particular arrangement of symbols, objects, or other elements. With n symbols, there are

   P(n) = n*(n-1)*(n-2)*...*2*1 = n!

or n- factorial possible permutations. The number of permutations of n things taken k at a time is:

   P(n,k) = n! / (n-k)!

See the permutations section of the Ciphers By Ritter / JavaScript computation pages. Also see combination and symmetric group.

A block cipher can be seen as a transformation between plaintext block values and ciphertext block values, and is thus an emulated simple substitution on huge block-wide values. Both plaintext and ciphertext have the same set of possible block values, and when the ciphertext values have the same ordering as the plaintext, ciphering is obviously ineffective. So effective ciphering depends upon re-arranging the ciphertext values from the plaintext ordering, which is a permutation of the plaintext values. A block cipher is keyed by constructing a particular permutation of ciphertext values.

Within an explicit table, an arbitrary permutation (one of the set of all possible permutations) can be produced by shuffling the elements under the control of a random number generator. If, as usual, the random number generator has been initialized from a key, a particular permutation can be produced for each particular key; thus, each key selects a particular permutation.

Also, the second part of substitution-permutation block ciphers: First, substitution operations diffuse information across the width of each substitutions. Next, "permutation" operations act to re-arrange the bits of the substituted result (more clearly described as a set of transpositions); this ends a single round. In subsequent rounds, further substitutions and transpositions occur until the block is thoroughly mixed and overall diffusion hopefully achieved.

A popular public key cipher system using both RSA and IDEA ciphers. RSA is used to tranfer a random key; IDEA is used to actually protect the message.

One problem with PGP is a relatively unworkable facility for authenticating public keys. While the users can compare a cryptographic hash of a key, this requires communication through a different channel, which is more than most users are willing to do. The result is a system which generally supports man-in-the-middle attacks, and these do not require "breaking" either of the ciphers.

Physically Random
A random value or sequence derived from a physical source, typically thermal-electrical noise. Also called really random and truly random.

Pink Noise
A random-like signal in which the magnitude of the spectrum at each frequency is proportional to the inverse of the frequency, or 1/f. At twice the frequency, we have half the energy, which is -3 dB. This is a frequency-response slope of -3 dB / octave, or -10 dB / decade. As opposed to white noise, which has the same energy at all frequencies, pink noise has more low-frequency or "red" components, and so is called "pink."

A common frequency response has half the output voltage at twice the frequency. But this is actually one-quarter the power and so is a -6 dB / octave drop. For pink noise, the desired voltage drop per octave is 0.707.

Plaintext is the original, readable message. It is convenient to think of plaintext as being actual language characters, but may be any other symbols or values (such as arbitrary computer data) which need to be protected.

Poisson Distribution
In statistics, a simplified form of the binomial distribution, justified when we have:

  1. a large number of trials n,
  2. a small probability of success p, and
  3. an expectation np much smaller than SQRT(n).

The probability of finding exactly k successes when we have expectation u is:

             k  -u

   P(k,u) = u  e    /  k!

where e is the base of natural logarithms:

   e = 2.71828...

and u is:

   u = n p

again for n independent trials, when each trial has success probability p. In the Poisson distribution, u is also both the mean and the variance

The ideal distribution is produced by evaluating the probability function for all possible k, from 0 to n.

If we have an experiment which we think should produce a Poisson distribution, and then repeatedly and systematically find very improbable test values, we may choose to reject the null hypothesis that the experimental distribution is in fact Poisson.

Also see the Poisson section of the Ciphers By Ritter / JavaScript computation pages.

Polyalphabetic Combiner
A combining mechanism in which one input selects a substitution alphabet (or table), and another input selects a value from within the selected alphabet, said value becoming the combining result. Also called a Table Selection Combiner.

Polyalphabetic Substitution
A type of substitution in which multiple distinct simple substitution alphabets are used.

Polygram Substitution
A type of substitution in which one or more symbols are substituted for one or more symbols. The most general possible substitution.

Greek for "multiple letters." A cipher which translates multiple plaintext symbols at a time into ciphertext. As opposed to monographic; also see homophonic and polyphonic.

Mathematically, an expression in the standard form of:

     cnxn + . . . + c1x + c0

The c's or coefficients are elements of some field F. The degree n is the value of the exponent of the highest power term. A mod 2 polynomial of degree n has n+1 bits representing the coefficients for each power: n, n-1, ..., 1, 0.

Perhaps the most insightful part of this is that the addition of coefficients for a particular power does not "carry" into other coefficients or columns.

Greek for "multiple sounds." The concept of having a letter sequence which is pronounced in distinctly different ways, depending on context. In cryptography, a cipher which uses a single ciphertext symbol to represent multiple different plaintext symbols. Also see homophonic, polygraphic and monographic.

In statistics, the size, or the number of distinct elements in the possibly hidden universe of elements which we can only know by sampling.

Population Estimation
In statistics, techniques used to predict the population based only on information from random samples on that population. See augmented repetitions.

In statistics, the probability of rejecting a false null hypothesis, and thus accepting a true alternative hypothesis.

In DC electronics, simply voltage times current. In AC electronics, the instantaneous product of voltage times current, integrated over a repetitive cycle. In either case the result is in watts, denoted W.

A value within a finite field which, when taken to increasing powers, produces all field values except zero. A primitive binary polynomial will be irreducible, but not all irreducibles are necessarily primitive.

Primitive Polynomial
An irreducible polynomial, primitive within a given field, which generates a maximal length sequence in linear feedback shift register (LFSR) applications.

All primitive polynomials are irreducible, but irreducibles are not necessarily primitive, unless the degree of the polynomial is a Mersenne prime. One way to find a primitive polynomial is to select an appropriate Mersenne prime degree and find an irreducible using Algorithm A of Ben Or:

     1.  Generate a monic random polynomial gx of degree n over GF(q);

     2.  ux := x;

     3.  for k := 1 to (n DIV 2) do

     4.     ux := ux^q mod gx;

     5.     if GCD(gx, ux-x) <> 1 then go to 1 fi;

     6.     od

Ben-Or, M. 1981. Probabilistic algorithms in finite fields. Proceedings of the 22nd IEEE Foundations of Computer Science Symposium. 394-398.

The result is a certified irreducible. GF(q) represents the Galois Field to the prime base q; for mod 2 polynomials, q is 2. These computations require mod 2 polynomial arithmetic operations for polynomials of large degree; "uxq" is a polynomial squared, and "mod gx" is a polynomial division. A "monic" polynomial has a leading coefficient of 1; this is a natural consequence of mod 2 polynomials of any degree. The first step assigns the polynomial "x" to the variable ux; the polynomial "x" is x1, otherwise known as "10".

To get primitives of non-Mersenne prime degree n, we certify irreducibles P of degree n. To do this, we must factor the value 2n - 1 (which can be a difficult problem, in general). Then, for each factor d of 2n - 1 we create the polynomial T(d) which is xd + 1; this is a polynomial with just two bits set: bit d and bit 0. If P evenly divides T(d) for some divisor d, P cannot be primitive. So if P does not divide any T(d) for all distinct divisors d of 2n - 1, P is primitive.

In general, a positive integer which is evenly divisible only by itself and 1.

Small primes can be found though the ever-popular Sieve of Eratosthenes, which can also be used to develop a list of small primes used for testing individual values. A potential prime need only be divided by each prime equal to or less than the square-root of the value of interest; if any remainder is zero, the number is not prime.

Large primes can be found by probabilistic tests.

Prior Art
In patents, the knowledge published or otherwise available to the public as of some date. Traditionally, this "knowledge" is in ink-on-paper articles or patents, both of which have provable release dates. Private "in house" journals available only within a company generally would not be prior art, nor would information which has been kept secret. Normally, we expect prior art information to be available in a public library.

In a U.S. application for patent, we are interested in the state of the open or public art as it existed as of the invention date, and also one year prior to the filing date. It is that art -- and not something hidden or something later -- against which the new application must be judged. Many things which seem "obvious" in retrospect were really quite innovative at the time they were done.

Pseudo Random Number Generator. In general, pseudorandomness is the norm. Any computer random number generator which is not explicitly labeled as physically random, really random, or other such description, is almost certainly pseudorandom.

In statistics, a sequence of values; a source or generator of such a sequence; a function.

A value or sequence of values typically produced by a random number generator, a deterministic computational mechanism. As opposed to really random. Also see random.

The usual random number generator is actually pseudorandom. Given the initial state, the entire subsequent sequence is completely pre-determined, but nevertheless exhibits many of the expected characteristics of a random sequence. Pseudorandomness supports generating the exact same cryptographic sequence repeatedly at different times or locations. Pseudorandomness is generally produced by a mathematical process, which may provide good assurances as to the resulting statistics, assurances which a really random generator generally cannot provide.

Public Key Cipher
Also called an asymmetric cipher or a two-key cpher. A cipher which uses one key to encipher a message, and a different key to decipher the resulting ciphertext. This allows the enciphering key to be exposed, without exposing the message. As opposed to a secret key cipher.

Either key can be used for enciphering or deciphering. Usually the exposed key is called the "public" key, and the retained hidden key is called the "private" key. The public key is distributed widely, so anyone can use it to encipher a message which presumably can only be deciphered by the hidden private key on the other end. Note that the enciphering end normally does not possess a key which will decipher a message which was just enciphered.

The whole scheme of course depends upon the idea that the private key cannot be developed from knowledge of the public key. The cipher also must resist both known-plaintext and defined-plaintext attack (since anyone can generate any amount of plaintext and encipher it). A public key cipher is vastly slower than a secret key cipher, and so is normally used simply to deliver the message key or session key for a conventional or secret key cipher.

Although at first proclaimed as a solution to the key distribution problem, it soon became apparent that someone could pretend to be someone else, and send out a "spoofed" public key. When people use that key, the spoofer could receive the message, decipher and read it, then re-encipher the message under the correct key and send it to the correct destination. This is known as a man-in-the-middle (MITM) attack.

A MITM attack is unusual in that it can penetrate cipher security without "breaking" either the public key cipher or the internal secret key cipher, and takes almost no computational effort. This is extremely serious because it means that the use of even "unbreakable" ciphers is not sufficient to guarantee privacy. All the effort spent on proving the strength of either cipher is simply wasted when a MITM attack is possible, and MITM attacks are only possible with public key ciphers.

To prevent spoofing, public keys must be authenticated (or validated or certified) as representing who they claim to represent. This can be almost as difficult as the conventional key distribution problem and generally requires complex protocols. And a failure in a key certification protocol can expose a system which uses "unbreakable" ciphers. In contrast, the simple use of an "unbreakable" secret key cipher (with hand-delivered keys) is sufficient to guarantee security. This is a real, vital difference between ciphering models.


A process which selects unpredictably, each time independent of all previous times, from among multiple possible results; or a result from such a process. Ideally, an arbitrary stateless selection from among equiprobable outcomes, thus producing a uniform distribution of values. The absence of pattern. Also see pseudorandom.

Randomness is an attribute of the process which generates or selects "random" numbers rather than the numbers themselves. But the numbers do carry the ghost of their creation: If values really are randomly generated with the same probability, we expect to find almost the same number of occurrences of each value or each sequence of the same length. Over many values and many sequences we expect to see results form in distributions which accord with our understanding of random processes. So if we do not find these expectations in the resulting numbers, we may have reason to suspect that the generating process is not random. Unfortunately, any such suspicion is necessarily statistical in nature, and cannot produce absolute proof in either direction: Randomness can produce any relationship between values, including apparent correlations (or their lack) which do not in fact represent the systematic production of the generator. (Also see the discussions of randomness testing in Statistics and Null Hypothesis, the article Chi-Square Bias in Runs-Up/Down RNG Tests, also Randomness Tests: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and Randomness Links, in Ritter's Net Links page.)

From one point of view, there are no "less random" or "more random" sequences, since any sequence can be produced by a random process. And any sequence (at least any particular sequence) also can be produced by a deterministic computational random number generator. (We note that such generators are specifically designed to and do pass statistical randomness tests.) So the difference is not in the sequences, per se, but instead in the generators: For one thing, an RNG sequence is deterministic and therefore may somehow be predicted. But, in practice, extensive analysis could show deviations from randomness in either the deterministic RNG designs or the nondeterministic really random generation equipment, and this could make even a nondeterministic generator somewhat predictable.

There are "more complex" and "less complex" sequences according to various measures. For example:

  • Linear complexity grades sequences on the size of the minimum shift-register state needed to produce the sequence.
  • Kolmogorov-Chaitin complexity grades sequences on the size of the description of the algorithm needed to produce the sequence.
These measures produce values related to the amount of pattern in a sequence, or the extent to which a sequence can be predicted by some algorithmic model. Such values describe the uncertainty of a sequence, and are in this way related to entropy.

We should note that the subset of sequences which have a high linear complexity leaves a substantial subset which does not. So if we avoid sequences with low linear complexity, any sequence we do accept must be more probable than it would be in the unfiltered set of all possible sequences. In this case, the expected higher uncertainty of the sequence itself is at least partly offset by the certainty that such a sequence will be used. Similar logic applies to S-box measurement and selection.

Oddly -- and much like strength in ciphers -- the "unpredictable" part of randomness is contextual and subjective, rather than the absolute and objective qualities we like in Science. While the sequence from a complex RNG can appear random, if we know the secret of the generator construction, and its state, we can predict the sequence exactly. But often we are in the position of seeing the sequence alone, without knowing the source, the construction, or the internal state. So while we might see a sequence as "random," that same sequence might be absolutely predictable (and thus not random) to someone who knows "the secret."

Random Number Generator
A random number generator is a standard computational tool which creates a sequence of apparently unrelated numbers which are often used in statistics and other computations.

In practice, most random number generators are deterministic computational mechanisms, and each number is directly determined from the previous state of the mechanism. Such a sequence is often called pseudo-random, to distinguish it from a really random, sequence somehow composed of actually unrelated values.

A computational random number generator will always generate the same sequence if it is started in the same state. So if we initialize the state from a key, we can use the random number generator to shuffle a table into a particular order which we can reconstruct any time we have the same key. (See, for example: A Keyed Shuffling System for Block Cipher Cryptography.)

Note that random number generators are designed to pass the many statistical tests of randomness; clearly, such tests do not indicate a really random sequence. Moreover, if we define "random" as "the absence of any pattern," the only way we could validate such a sequence is by checking for every possible pattern. But there are too many patterns, so "real" randomness would seem to be impossible to check experimentally. (Also see the discussions of randomness testing in Statistics and Null Hypothesis.)

Also see the article: The Efficient Generation of Cryptographic Confusion Sequences, plus RNG Implementations: A Literature Survey, RNG Surveys: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and Randomness Links, in Ritter's Net Links page.

Random Variable
In statistics, a term or label for an unknown value. Also used when each of the possible values have some known probability.

A discrete random variable takes on a finite set of values. The probability of each value is the frequency function or probability density function, and the graph of the frequency function is the frequency distribution.

The set of the results from a mapping for all possible arguments. Also see: domain.

Really Random
A random value or sequence derived from a source which is expected to produce no predictable or repeatable relationship between values.

Examples of a really random source might include radioactive decay, Johnson or thermal noise, shot noise from a Zener diode or reverse-biased junction in breakdown, etc. Clearly, some sort of circuitry will be required to detect these generally low-level events, and the quality of the result is often directly related to the design of the electronic processing. Other sources of randomness might be precise keystroke timing, and the accumulated hash of text of substantial size. Also called physically random and truly random. As opposed to pseudorandom (see random number generator).

Really random values are particularly important as message key objects, or as a sequence for use in a realized one-time pad.

Also see: Random Number Machines: A Literature Survey and Random Electrical Noise: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page, and Randomness Links, in Ritter's Net Links page.

Classically, an electro-mechanical component consisting of a mechanical switch operated by the magnetic force produced by an electromagnet, a conductor wound around an iron dowel or core. A relay is at least potentially a sort of mechanical (slow) and nonlinear amplifier which is well-suited to power control.

Research Hypothesis
In statistics, the statement formulated so that the logically contrary statement, the null hypothesis H0 has a test statistic with a known distribution for the case when there is nothing unusual to detect. Also called the alternative hypothesis H1, and logically identical to "NOT-H0" or "H0 is not true."

A basic electronic component in which voltage and current are linearly related by Ohm's Law: E = IR. Resistors can thus be used to limit current I given voltage E: (I = E/R), or to produce voltage E from current I: (E = IR). Two resistors in series can divide voltage Ein to produce the output voltage Eo: ( Eo = Ein(R1/(R1+R2)) ).

Also see capacitor and inductor.

In abstract algebra, a nonempty set R with two dyadic (two-input, one-output) operations which we choose to call "addition" and "multiplication" and denote + and * as usual. If elements (not necessarily numbers) a, b are in R, then a+b is in R, and ab (or a*b) are also in R. The following properties hold:
  1. Addition is commutative: a + b = b + a
  2. Addition is associative: (a + b) + c = a + (b + c)
  3. There is a "zero" or additive identity: a + 0 = a
  4. There is an additive inverse: for any a there is an x in R such that a + x = 0
  5. Multiplication is associative: (ab)c = a(bc)
  6. Multiplication is distributive: a(b + c) = ab + ac and (b + c)a = ba + ca
  7. In a commutative ring, multiplication is commutative: ab = ba
  8. In a ring with unity, there is a multiplicative identity: for e in R, ea = ae = a

A solution: A value which, when substituted for a variable in a mathematical equation, makes the statement true.

root mean square.

Root Mean Square
The square root of the integral of instantaneous values squared. Thus, when measuring voltage or current, a value proportional to the average power in watts, even in a complex waveform.

Random Number Generator.

In the context of block cipher design, a term often associated with a Feistel block cipher such as DES. A round is the set of operations which are repeated multiple times to produce the final data. For example, DES uses 16 generally identical rounds, each of which performs a number of operations. As opposed to a layer, which is not applied repeatedly.

The name of an algorithm published by Ron Rivest, Adi Shamir, and Len Adleman (thus, R.S.A.). The first major public key system.

Based on number-theoretic concepts and using huge numerical values, a RSA key must be perhaps ten times or more as long as a secret key for similar security.

Running Key
The confusion sequence in a stream cipher.


An unnecessarily cute and sadly non-descriptive name for an arbitrary value, unique to a particular computer or installation, prepended to a password before hash authentication. The "salt" acts to complicate attacks on the password user-identification process by giving the same password different hash results on different systems. Ideally, this would be a sort of keying for a secure hash.

In statistics, one or more elements, typically drawn at random from some population.

Normally, we cannot hope to examine the full population, and so must instead investigate samples of the population, with the hope that they represent the larger whole. Often, random sampling occurs "without replacement"; effectively, each individual sample is returned to the population before the next sample is drawn.

Substitution box or table; typically a component of a cryptographic system. "S-box" is a rather non-specific term, however, since S-boxes can have more inputs than outputs, or more outputs than inputs, each of which makes a single invertible table impossible. The S-boxes used in DES contain multiple invertible substitution tables, with the particular table used at any time being data-selected.

One possible S-box is the identity transformation (0->0, 1->1, 2->2, ...) which clearly has no effect at all, while every other transformation has at least some effect. So different S-boxes obviously can contain different amounts of some qualities. Qualities often mentioned include avalanche and Boolean function nonlinearity. However, one might expect that different ciphering structures will need different table characteristics to a greater or less degree. So the discussion of S-box strength always occurs within the context of a particular cipher construction.

S-Box Avalanche

With respect to avalanche, any input change -- even one bit -- will select a different table entry. Over all possible input values and changes, the number of output bits changed will have a binomial distribution. (See the bit changes section of the Ciphers By Ritter / JavaScript computation pages.) So, in this respect, all tables are equal.

On the other hand, it is possible to arrange tables so that single-bit input changes are guaranteed to produce at least two-bit output changes, and this would seem to improve avalanche. But we note that this is probable even with a randomly-constructed table, so we have to ask just how much this guarantee has improved things. In a Feistel cipher, it seems like this might reduce the number of needed rounds by one. But in actual operation, the plaintext block is generally randomized, as in CBC-mode. This means that the probability of getting a single-bit change in operation is very low anyway.

It is true that cipher avalanche is tested using single-bit input changes, and that is the way avalanche is defined. The point of this is to assure that every output bit is "affected" by every input bit. But I see this as more of an experimental requirement than an operational issue that need be optimized.

S-Box Nonlinearity

With respect to Boolean function nonlinearity, as tables get larger it becomes very difficult -- and essentially impossible -- to find tables with ideal nonlinearity values. This means that we are always accepting a compromise value, and this is especially the case if the table must also have high values of other S-box qualities.

Even randomly-constructed tables tend to have reasonable nonlinearity values. We might expect an 8-bit table to have a nonlinearity of about 100 (that is, 100 bits must change in one of the eight 256-bit output functions to reach the closest affine Boolean function). Experimental measurement of the nonlinearity of 1,000,000 random 8-bit tables shows exactly one table with a nonlinearity as low as 78, and the computed probability of an actually linear table (nonlinearity zero) is something like 10-72 or 2-242.

The NSA-designed 8-bit table in Skipjack cipher has a computed nonlinearity of 104. While not quite the highest value we could find, it is in the top 2.5 percent of the distribution, and it seems improbable that this occurred by accident. We might assume that this table is representative of the modern understanding of the needs of a Feistel design with a fixed table. If so, we might conclude that good nonlinearity (or something very much like it) is a necessary, if not quite sufficient, part of the design.

Keyed S-Boxes

It is "easy" to construct keyed S-boxes, by shuffling under the control of a keyed cryptographic random number generator. (See, for example: A Keyed Shuffling System for Block Cipher Cryptography.) This has the significant advantage of providing no fixed tables for The Opponent to understand and attack.

One question is whether one should attempt to measure and discard tables with poorer qualities than others. My personal feeling is that the ciphering structure should be strong enough to handle the expected random table distribution without added measurement and selection.

Also see: S-Box Design: A Literature Survey, in the Literature Surveys and Reviews section of the Ciphers By Ritter page.

A cipher design which can produce both large real ciphers and tiny experimental versions from the exact same construction rules. Scalability is about more than just variable size: Scalability is about establishing a uniform structural identity which is size-independent, so that we achieve a strong cipher near the top, and a tiny but accurate model that we can investigate near the bottom.

While full-size ciphers can never be exhaustively tested, tiny cipher models can be approached experimentally, and any flaws in them probably will be present in the full-scale versions we propose to use. Just as mathematics works the same for numbers large or small, a backdoor cipher built from fixed construction rules must have the same sort of backdoor, whether built large or small.

For block ciphers, the real block size must be at least 128 bits, and the experimental block size probably should be between 8 and 16 bits. Such tiny ciphers can be directly compared to keyed substitution tables of the same size, which are the ideal theoretical model of a block cipher.

Potentially, scalability does far more than just simplify testing: Scalability is an enabling technology that supports experimental analysis which is otherwise impossible.

One of the objectives of cryptography: Keeping private information private. Also see: trust.

In a secret key cipher, secrecy implies the use of a strong cipher. Secrecy in communication requires the secure distribution of secret keys to both ends (this is the key distribution problem).

In a public key cipher, the ability to expose keys apparently solves the key distribution problem. But communications secrecy requires that public keys be authenticated (certified) as belonging to their supposed owner. This must occur to cryptographic levels of assurance, because failure leads to immediate vulnerability under a man-in-the-middle attack. The possibility of this sort of attack is very disturbing, because it needs little computation, and does not involve breaking any cipher, which makes all discussion of cipher strength simply irrelevant.

Secret Code
A coding in which the correspondence between symbol and code value is kept secret.

Secret Key Cipher
Also called a symmetric cipher or conventional cipher. A cipher in which the exact same key is used to encipher a message, and then decipher the resulting ciphertext. As opposed to a public key cipher.

Protection of a vital quality (such as secrecy, or safety, or even wealth) from infringement, and the resulting relief from fear and anxiety. The ability to engage and defeat attempts to damage, weaken, or destroy a vital quality. Security, in the form of assuring the secrecy of information while in storage or transit, is the fundamental role of cryptography.

A secure cryptosystem physically or logically prevents unauthorized disclosure of its protected data. This is independent of whether the attacker is a government agent, a criminal, a private detective, some corporate security person, or a friend of an ex-lover. Real security does not care who the attacker is, or what their motive may be, but instead protects against the threat itself. Limited security, on the other hand, often seeks to guess the identity, capabilities and motives of the attacker, and concentrates resources at those points.

There is, of course, no absolute security. But we can have real security against particular, defined threats. Also see: strength.

Security Through Obscurity
A phrase which normally refers to inventing a new cipher which is supposedly strong, then keeping the cipher secret so it "cannot be attacked." One problem with this strategy is that it prevents public review of the cipher design, which means that the cipher may have serious weaknesses. And it may be much easier for The Opponent to obtain the supposedly secret ciphering program than it would be to break a serious cipher (see Kerckhoff's second requirement).

On the other hand, it can be a mistake to use even a public and well-reviewed cipher, if the cipher protects enough valuable information to support a substantial investment in analysis and equipment to break the cipher. A reasonable alternative is to select from among a wide variety of conceptually different ciphers, each of which thus carries far less information of far less value and so may not warrant a substantial attack investment.

A material which is between conductor and insulator with respect to ease of electron flow. The obvious examples are silicon and germanium.

A set with an associative dyadic operation which happens to be closed.

Session Key
A key which lasts for the period of a work "session." A message key used for multiple messages.

A collection of distinguishable elements, usually, but not necessarily, numbers.

Shift Register
An array of storage elements in which the values in each element may be "shifted" into an adjacent element. (A new value is shifted into the "first" element, and the value in the "last" element is normally lost, or perhaps captured off-chip.) (See LFSR.)

   Right-Shifting Shift Register (SR)

               +----+  +----+         +----+

   Carry In -->| A0 |->| A1 |-> ... ->| An |--> Carry Out

               +----+  +----+         +----+

In digital hardware versions, elements are generally bits, and the stored values actually move from element to element in response to a clock. Analog hardware versions include the charge-coupled devices (CCD's) used in cameras, where the analog values from lines of sensors are sampled in parallel, then serialized and stepped off the chip to be digitized and processed.

In software versions, elements are often bytes or larger values, and the values may not actually move during stepping. Instead, the values may reside in a circular array, and one or more offsets into that array may step. In this way, even huge amounts of state can be "shifted" by changing a single index or pointer.

Generally, the concept of "mixing up" a set of objects, symbols or elements, as in shuffling cards. Mathematically, each possible arrangement of elements is a particular permutation.

Within a computer environment, it is easy to shuffle an arbitrary number of symbols using a random number generator, and the algorithm of Durstenfeld, which is described in Knuth II:

Durstenfeld, R. 1964. Algorithm 235, Random Permutation, Procedure SHUFFLE. Communications of the ACM. 7: 420.

Knuth, D. 1981. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. 2nd ed. 139. Reading, Mass: Addison-Wesley.

Sieve of Eratosthenes
A way to find relatively small primes. Although small primes are less commonly useful in cryptography than large (say, 100+ digit) primes, they can at least help to validate implementations of the procedures used to find large primes.

Basically, the "Sieve of Eratosthenes" starts out with a table of numbers from 1 to some limit, all of which are potential primes, and the knowledge that 2 is a prime. Since 2 is a prime, no other prime can have 2 as a factor, so we run though the table discarding all multiples of 2. The next remaining number above 2 is 3, which we accept as a prime, and then run through the table crossing off all multiples of 3. The next remaining is 5, so we cross off all multiples of 5, and so on. After we cross-off each prime up to the square-root of the highest value in the table, the table will contain only primes.

A similar process works with small polynomials, and small polynomial fields, to find irreducible polynomials.

In statistics, the probability of committing a type I error, the rejection of a true null hypothesis. Given the probability distribution of the test statistic for the case "nothing unusual found," that area which is sufficiently unlikely that values in this critical region would lead to rejecting the null hypothesis, and thus accepting the alternative hypothesis.

Simple Substitution
A type of substitution in which each possible symbol is given a unique replacement symbol.

Perhaps the original classical form of cipher, in which each plaintext character is enciphered as some different character. In essence, the order of the alphabet is scrambled or permuted, and the particular scrambled order (or the scrambling process which creates that particular order) is the cipher key. Normally we think of scrambling alphabetic letters, but any computer coding can be scrambled similarly.

Small, practical examples of simple substitution are easily realized in hardware or software. In software, we can have a table of values each of which can be indexed or selected by element number. In hardware, we can simply have addressable memory. Given an index value, we can select the element at the index location, and read or change the value of the selected element.

A substitution table will be initialized to contain exactly one occurrence of each possible symbol or character. This allows enciphering to be reversed and the ciphertext deciphered. For example, suppose we substitute a two-bit quantity, thus a value 0..3, in a particular table as follows:

   2  3  1  0.

The above substitution table takes an input value to an output value by selecting a particular element. For example, an input of 0 selects 2 for output, and an input of 2 selects 1. If this is our enciphering, we can decipher with an inverse table. Since 0 is enciphered as 2, 2 must be deciphered as 0, and since 2 is enciphered as 1, 1 must be deciphered as 2, with the whole table as follows:

   3  2  0  1.

Mathematically, a simple substitution is a mapping (from input to output) which is one-to-one and onto, and is therefore invertible.

The description of a logic machine. The original textual composition is called source code, the file of compiled opcode values is called object code, and the final linked result is pure "machine code" or machine language Note that, by itself, software does not and can not function; but instead relies upon hardware for all functionality. When "software" is running, there is no software there: there is only hardware memory, with hardware bits which can be sensed and stored, hardware counters and registers, and hardware digital logic to make decisions. See: computer, system, system design, and debug.

Source Code
The textual representation of a computer program as it is written by a programmer. Nowadays, source is typically in a high-level language like C, C++ or Pascal, but inevitably some programmers must work "close to the machine" in assembly language. The "code" part of this is presumably an extension of the idea that, ultimately, all computer programs are executed as "machine code" or machine language. This consists of numeric values or "operation codes" ("opcodes") which select the instruction to be executed, and so represent a very public code for those instructions. Also see object code.

Information storage, or "memory." In abstract machine theory, retained information, generally used to influence future events.

In statistics, the current symbol from a sequence, or a value which selects or conditions possible outcomes (see: Markov process).

We normally measure "state" in units of information or bits, and 8 bits of "state" can support 28 or 256 different state-value combinations or states.

Also see: deterministic and keyspace.

Stationary Process
In statistics, a stochastic (random) process (function) whose general statistics do not change over time; in which every sub-sequence is representative of the whole; a homogenous process. This may not be true of a Markov process. Also see: ergodic.

A computation or process intended to reduce diverse results into a one-dimensional ordering of values for better understanding and comparison. Also the value result of such a computation. See statistics.

A useful statistic will have some known (or at least explorable) probability distribution for the case "nothing unusual found." This allows the statistic value to be interpreted as the probability of finding that value or less, for the case "nothing unusual found." Then, if improbable statistic values occur repeatedly and systematically, we can infer that something unusual is being found, leading to the rejection of the null hypothesis.

It is also possible to explore different distributions for the same statistic under different conditions. This can provide a way to guess which condition was in force when the data were obtained.

The mathematical science of interpreting probability to extract meaning from diverse results. Also the analysis of a large population based on a limited number of random samples from that population; this is also the ability to state probability bounds for the correctness of certain types of inductive reasoning. See statistic and random variable.

The usual role of statistics is to identify particular systematic events in the context of expected random variations that may conceal such events. This often occurs in a context of difficult and costly experimentation, and there is a premium on results which are so good that they stand above the noise; it may be that not much is lost if a weak positive is ignored.

In contrast, cryptography and randomness generally support vast amounts of testing at low cost, and we seek weak indications. In this context, we may find it more useful to conduct many tests and collect many statistic values, then visually and mathematically compare the experimental distribution to the ideal for that statistic.

A statistical distribution usually represents what we should expect from random data or random sampling. If we have random data, statistic values exceeding 95% of the distribution (often called failure) should occur about 1 time in 20. And since that one time may happen on the very first test, it is only prudent to conduct many tests and accumulate results which are more likely to represent reality than any one result from a single test.

In statistical randomness testing, "failure" should and must occur with the appropriate frequency. Thus, the failure to fail is itself a failure! This means that the very concept of statistical "failure" often may be inappropriate for cryptographic use. Grading a result as "pass" or "fail" discards all but one bit of information. Further, a pass / fail result is a Bernoulli trial, which would take many, many similar tests to properly characterize. So it may be more appropriate to collect 20 or more statistic probability values, and then compare the accumulation to the expected distribution for that statistic. This will provide a substantial basis for asserting that the sampled process either did or did not produce the same statistic distribution as a random process.

Due to random sampling, any statistical result is necessarily a probability, rather than certainty. An "unlucky" sampling can produce statistical results which imply the opposite of reality. In general, statistics simply cannot provide the 100 percent certainty which is traditionally expected of mathematical "proof."

Greek for "sheltered writing." Methods of cryptology which seek to conceal the existence of a message. As opposed to cryptography which seeks to hide the information in the message, even if the message itself is completely exposed.

In statistics, random; involving a random variable.

Stream Cipher
A cipher which directly handles messages of arbitrary size, by ciphering individual elements, such as bits or bytes. This avoids the need to accumulate data into a block before ciphering, as is necessary in a conventional block cipher. But note that a stream cipher can be seen as an operating mode, a "streaming" of a tiny block transformation. Stream ciphers can be called "combiner-style" ciphers.

Stream Cipher Diffusion

In a conventional stream cipher, each element (for example, each byte) of the message is ciphered independently, and does not affect any other element.

In a few stream cipher designs, the value of one message byte may change the enciphering of subsequent message bytes; this is forward data diffusion. But a stream cipher cannot change the enciphering of previous message bytes. In contrast, changing even the last bit in a block cipher block will generally change about half of the earlier bits within that same block. Changing a bit in one block may even affect later blocks if we have some sort of stream meta-cipher composed of block cipher transformations, like CBC.

Note that a stream cipher generally does not need data diffusion for strength, as does a block cipher. In a block cipher, it may be possible to separate individual components of the cipher if their separate effects are not hidden by diffusion. But a stream cipher generally re-uses the same transformation, and has no multiple data components to hide.

Stream Cipher Construction

The classic stream cipher is very simple, consisting of a keyed random number generator which produces a random-like confusion sequence or running key. That sequence is then combined with plaintext data in a simple additive combiner to produce ciphertext.

When an exclusive-OR combiner is used, exactly the same construction will also decipher the ciphertext. But if The Opponents have some known-plaintext and associated ciphertext, they can easily produce the original confusion sequence. This, along with their expected knowledge of the cipher design, may allow them to attack and expose the confusion generator. If this is successful, it will, of course, break the system until the RNG is re-keyed.

The ultimate stream cipher is the one-time pad, in which a really random sequence is never re-used. But if a sequence is re-used, The Opponent can generally combine the two ciphertexts, eliminating the confusion sequence, and producing the combined result of two plaintexts. Such a combination is normally easy to attack and penetrate.

The re-use of confusion sequence is extremely dangerous in a stream cipher design. In general, all stream cipher designs must use a message key to assure that the cipher is keyed with a random value for every new ciphering. This does, of course, expand the ciphertext by the size of the message key.

Another alternative in stream cipher design is to use a stronger combiner, such as Latin square or Dynamic Substitution combining. This can drastically reduce the complexity required in the confusion generator, which normally provides all stream cipher strength. Each of these stronger combiners is nonlinear, with substantial internal state, and the designer may elect to use multiple combinings in sequence, or a selection among different combiners. Neither of these approaches make much sense with an additive combiner.

The ability of a cipher to resist attack and maintain secrecy. The overall "strength" of a cipher is the minimum effort required to break the cipher, by any possible attack. But our knowledge of cipher "strength" is necessarily contextual and subjective, much like unpredictability in random sequences. Although "strength" would seem to be the entire point of using a cipher, cryptography has no way to measure strength.

Cipher "strength" is often taken as an absolute universal negative, the simple non-existence of any attack which could succeed, assuming some level of attack resources. But this means that overall "strength" may be forever impossible to measure, because there is no hope of enumerating and evaluating every possible attack.

Strength and Cryptanalysis

Because we have no tools for the discussion of strength under all possible attacks, cipher "strength" is normally discussed in the context of particular attacks. Each known attack approach can be elaborated for a particular cipher, and a value calculated for the effort required to break the cipher in that way; this may set an "upper bound" on the unknown strength of the cipher (although some "elaborations" are clearly better than others). And while this is certainly better than not knowing the strength with respect to known attacks, such attacks may not represent the actual threat to the cipher in the field. (A cipher may even be said to have different "contextual strengths," depending on the knowledge available to different Opponents.) In general, we never know the "lower bound" or "true" strength of a cipher. So, unless a cipher is shown to be weaker than we can accept, cryptanalysis provides no useful information about cipher strength.

It is sometimes argued that "our guys" are just as good as the Opponents, who thus could not break a cipher with less effort than we know. Or it is said that if a better break were known, that secret necessarily would get out. When viewed in isolation such statements are clearly false reasoning, yet these are the sort of assumptions that are often implicitly used to assert strength after cryptanalysis.

Since we cannot know the true situation, for a proper security analysis we must instead assume that our Opponents have more time, are better trained, are better equipped, and may even be smarter than our guys. Further, the Opponents are quite likely to function as a well-motivated group with a common goal and which can keep secrets; clearly, this is a far different situation than the usual academic cryptanalysis. So, again, cryptanalysis by "our guys" provides no information about the strength of the cipher as seen by our Opponents.

Increasing Probable Strength and Reducing Possible Loss

Technical strength is just one of the many possibilities for weakness in a cipher system, and perhaps even the least likely. It is surprisingly difficult to construct a cipher system without "holes," despite using good ciphers, and The Opponents get to exploit any overlooked problems. Users must be educated in security, and must actively keep secrets or there will be nothing to protect. In contrast, cryptanalysis is very expensive, success is never assured, and even many of the known attacks are essentially impossible in practice.

Nevertheless, it is a disturbing fact that we do not know and cannot guarantee a "true" strength for any cipher. But there are approaches which may reduce the probability of technical weakness and the extent of any loss:

  1. We can extrapolate various attacks beyond weakness levels actually shown, and thus possibly avoid some weak ciphers.
  2. We can use systems that change ciphers periodically. This will reduce the amount of information under any one cipher, and so limit the damage if that cipher is weak.
  3. We can use multiple encryption with different keys and different ciphers as our standard mode. In this way, not just one but multiple ciphers must each be penetrated simultaneously to expose the protected data.
  4. We can use systems that allow us to stop using ciphers when they are shown weak, and switch to others.

Kinds of Cipher Strength

In general, we can consider a cipher to be a large key-selected transformation between plaintext and ciphertext, with two main types of strength:

  • One type of "strength" is an inability to extrapolate from known parts of the transformation (e.g., known plaintext) to model -- or even approximate -- the transformation at new points of interest (message ciphertexts).
  • Another type of "strength" is an inability to develop a particular key, given the known cipher and a large number of known transformation points.

Views of Strength

Strength is the effectiveness of fixed defense in the cryptography war. In real war, a strong defense might be a fortification at the top of a mountain which could only be approached on a single long and narrow path. Unfortunately, in real military action, time after time, making assumptions about what the opponent "could not" do turned out to be deadly mistakes. In cryptography we can at least imagine that someday we might prove that all approaches but one are actually impossible, and then guard that last approach; see mathematical cryptography.

The Future of Strength

It is sometimes convenient to see security as a fence around a restricted compound: We can beef up the front gate, and in some way measure that increase in "strength." But none of that matters if someone cuts through elsewhere, or tunnels under, or jumps over. Until we can produce a cipher design which reduces all the possible avenues of attack to exactly one, it will be very difficult to measure "strength."

One possibility might be to construct ciphers in layers of different puzzles: Now, the obvious point of having multiple puzzles is to require multiple solutions before the cipher is broken. But a perhaps less obvious point is to set up the design so that the solution to one puzzle requires The Opponent to commit (in an information sense) in a way that prevents the solution to the next puzzle.

Also see design strength, perfect secrecy, ideal secrecy, and security.

Strict Avalanche Criterion (SAC)
A term used in S-box analysis to describe the contents of an invertible substitution or, equivalently, a block cipher. If we have some input value, and then change one bit in that value, we expect about half the output bits to change; this is the avalanche effect, and is caused by an avalanche process. The Strict Avalanche Criterion requires that each output bit change with probability one-half (over all possible input starting values). This is stricter than avalanche, since if a particular half of the output bits changed all the time, a strict interpretationist might call that "avalanche." Also see complete.

As introduced in Webster and Tavares:

"If a cryptographic function is to satisfy the strict avalanche criterion, then each output bit should change with a probability of one half whenever a single input bit is complemented." [p.524]

Webster, A. and S. Tavares. 1985. On the Design of S-Boxes. Advances in Cryptology -- CRYPTO '85. 523-534.

Although the SAC has tightened the understanding of "avalanche," even SAC can be taken too literally. Consider the scaled-down block cipher model of a small invertible keyed substitution table: Any input bit-change thus selects a different table element, and so produces a random new value (over all possible keys). But when we compare the new value with the old, we find that typically half the bits change, and sometimes all the bits change, but never is there no change at all. This is a tiny bias toward change.

If we have a 2-bit (4-element) table, there are 4 values, but after we take one as the original, there are only 3 changed values, not 4. We will see changes of 1 bit, 1 bit, and 2 bits. But this is a change expectation of 2/3 for each output bit, instead of exactly 1/2 as one might interpret from SAC. Although this bias is clearly size-related, its source is invertibility and the definition of change. Thus, even a large block cipher must have some bias, though it is unlikely that we could measure enough cases to see it. The point is that one can extend some of these definitions well beyond their intended role.

In the study of logic, a particular interpretation of reality, rather than objective reality itself.

The concept of replacing one symbol with another symbol. This might be as simple as a grade-school lined sheet with the alphabet down the left side, and a substitute listed for each letter. In computer science this might be a simple array of values, any one of which can be selected by indexing from the start of the array. See substitution table.

Cryptography recognizes four types of substitution:

A method of constructing block ciphers in which block elements are substituted, and the resulting bits typically transposed or scrambled into a new arrangement. This would be one round of many.

One of the advantages of S-P construction is that the "permutation" stage can be simply a re-arrangement of wires, taking almost no time. Such a stage is more clearly described as a limited set of "transpositions," rather than the more general "permutation" term. Since substitutions are also permutations (albeit with completely different costs and effects), one might fairly describe such a cipher as a "permutation-permutation cipher," which is not particularly helpful.

A disadvantage of the S-P construction is the need for special substitution patterns which support diffusion. S-P ciphers diffuse bit-changes across the block round-by-round; if one of the substitution table output bits does not change, then no change can be conducted to one of the tables in the next round, which has the effect of reducing the complexity of the cipher. Consequently, special tables are required in S-P designs, but even special tables can only reduce and not eliminate the effect. See Complete.

Substitution Table
(Also S-box.) A linear array of values, indexed by position, which includes any value at most once. In cryptographic service, we normally use binary-power invertible tables with the same input and output range. For example, a byte-substitution table will have 256 elements, and will contain each of the values 0..255 exactly once. Any value 0..255 into that table will select some element for output which will also be in the range 0..255.

For the same range of input and output values, two invertible substitution tables differ only in the order or permutation of the values in the table. There are 256 factorial different byte-substitution tables, which is a keyspace of 1648 bits.

A keyed simple substitution table of sufficient size is the ideal block cipher. Unfortunately, with 128-bit blocks being the modern minimum for strength, there would be 2128 entries in that table, which is completely out of the question.

A keyed substitution table of practical size can only be thought of as a weak block cipher by itself, but it can be part of a combination of components which produce a stronger cipher. And since an invertible substitution table is the ideal tiny block cipher, it can be used for direct experimental comparison to a scalable block cipher of that same tiny size.

Usually the outer-level encryption of a multiple encryption. Often relatively weak, relying upon the text randomization effect of the lower-level encryption.

Onto. A mapping f: X -> Y where f(x) covers all elements in Y. Not necessarily invertible, since multiple elements x in X could produce the same f(x) in Y.

Classically, an electro-mechanical device which physically presses two conductors together at a contact point, thus "making" a circuit, and also pulls the conductors apart, thus allowing air to insulate them and thus "breaking" the circuit. More generally, something which exhibits a significant change in some parameter between "ON" and "OFF."

Switching Function
A logic function.

Symmetric Cipher
A secret key cipher.

Symmetric Group
The symmetric group is the set of all one-to-one mappings from a set into itself. The collection of all permutations of some set.

Suppose we consider a block cipher to be a key-selected permutation of the block values: One question of interest is whether our cipher construction could, if necessary, reach every possible permutation, the symmetric group.

An interconnecting network of components which coordinate to perform a larger function. Also a system of ideas. See system design.

System Design
The design of potentially complex systems.

It is now easy to construct large hardware or software systems which are almost unmanageably complex and never error-free. But a good design and development approach can produce systems with far fewer problems. One such approach is:

  1. Decompose the system into small, testable components.
  2. Construct and then actually test each of the components individually.

This is both easier and harder than it looks: there are many ways to decompose a large system, and finding an effective and efficient decomposition can take both experience and trial-and-error. But many of the possible decompositions define components which are less testable or even untestable, so the testability criterion greatly reduces the search.

Testing is no panacea: we cannot hope to find all possible bugs this way. But in practice we can hope to find 90 percent or more of the bugs simply by actually testing each component. (Component testing means that we are forced to think about what each component does, and about its requirements and limits. Then we have to make the realized component conform to those tests, which were based on our theoretical concepts. This will often expose problems, whether in the implementation, the tests, or the concepts.) By testing all components, when we put the system together, we can hope to avoid having to debug multiple independent problems simultaneously.

Other important system design concepts include:

  • Build in test points and switches to facilitate run-time inspection, control, and analysis.
  • Use repeatable comprehensive tests at all levels, and when a component is "fixed," run those tests again.
  • Start with the most basic system and fewest components, make that "work" (pass appropriate system tests), then "add features" one-by-one. Try not to get too far before making the expanded system work again.

Table Selection Combiner

A combining mechanism in which one input selects a table or substitution alphabet, and another input selects a value from within the selected table, said value becoming the combined result. Also called a Polyalphabetic Combiner.

Supposedly the acronym for "Transient Electromagnetic Pulse Emanation Surveillance Technology." Originally, the potential insecurity due to the electromagnetic radiation which inherently occurs when a current flow changes in a conductor. Thus, pulses from digital circuitry might be picked up by a receiver, and the plaintext data reconstructed. The general concept can be extended to the idea that plaintext data pulses may escape on power lines, or as a faint background signal to encrypted data, or in any other unexpected electronic way.

Some amount of current change seems inevitable when switching occurs, and modern digital computation is based on such switching. But the amount of electromagnetic radiation emitted depends upon the amount of current switched, the length of the conductor, and the speed of the switching (that is, dI/dt, or the rate-of-change in current). In normal processing the amount of radiated energy is very small, but the value can be much larger when fast power drivers are used to send signals across cables of some length. This typically results in broadband noise which can be sensed with a shortwave receiver, a television, or an AM portable radio. Such receivers can be used to monitor attempts at improving the shielding.

Ideally, equipment would be fully enclosed in an electrically unbroken conducting surface. In practice, the conductive enclosure may be sheet metal or screening, with holes for shielded cables. Shielding occurs not primarily from metal per se, but instead from the flow of electrical current in that metal. When an electromagnetic wave passes through a conductive surface, it induces a current, and that current change creates a similar but opposing electromagnetic wave which nearly cancels the original. The metallic surface must conduct in all directions to properly neutralize waves at every location and from every direction.

Stock computer enclosures often have huge unshielded openings which are hidden by a plastic cover. These should be covered with metal plates or screening, making sure that good electrical contact occurs at all places around the edges. Note that assuring good electrical connections can be difficult with aluminum, which naturally forms a thin but hard and non-conductive surface oxide. It is important to actually monitor emission levels with receivers both before and after any change, and extreme success can be very difficult. We can at least make sure that the shielding is tight (that it electrically conducts to all the surrounding metal), that it is as complete as possible, and that external cables are effectively shielded.

Cable shielding extends the conductive envelope around signal wires and into the envelope surrounding the equipment the wire goes to. Any electromagnetic radiation from within a shield will tend to produce an opposing current in the shield conductor which will "cancel" the original radiation. But if a cable shield is not connected at both ends, no opposing current can flow, and no electromagnetic shielding will occur, despite having a metallic "shield" around the cable. It is thus necessary to assure that each external cable has a shield, and that the shield is connected to a conductive enclosure at both ends. (Note that some equipment may have an isolating capacitor between the shield and chassis ground to minimize "ground loop" effects when the equipment at each end of the cable connects to different AC sockets.) When shielding is impossible, it can be useful to place ferrite beads or rings around cables to promote a balanced and therefore essentially non-radiating signal flow.

Perhaps the most worrisome emitter on a personal computer is the display cathode ray tube (CRT). Here we have a bundle of three electron beams, serially modulated, with reasonable current, switching quickly, and repeatedly tracing the exact same picture typically 60 times a second. This produces a recognizable substantial signal, and the repetition allows each display point to be compared across many different receptions, thus removing noise and increasing the effective range of the unintended communication. All things being equal, a liquid-crystal display should radiate a far smaller and also more-complex signal than a desktop CRT.

A passive electrical component composed of magnetically-coupled coils of wire. When AC flows through one coil or "primary," it creates a changing magnetic field which induces power in another coil. A transformer thus isolates power or signal, and also can change the voltage-to-current ratio, for example to "step down" line voltage for low-voltage use, or to "step up" low voltages for high-voltage devices (such as tubes or plasma devices).

An active semiconductor component which performs analog amplification.

Originally, a bipolar version with three terminals: Emitter (e), Collector (c), and Base (b). Current flow through the base-emitter junction (Ibe) is amplified by the current gain or beta (B) of the device in allowing current to flow through the collector-base junction and on through the emitter (Ice).

In a sense, a bipolar transistor consists of two back-to-back diodes: the base-collector junction (operated in reverse bias) and the base-emitter junction (operated in forward bias) which influence each other. Current through the base-emitter junction releases either electrons or "holes" which are then drawn to the collector junction by the higher potential there, thus increasing collector current. The current ratio between the base input and the collector output is amplification.

Field-Effect Transistors (FET's, as in MOSFET, etc.) have an extremely high input impedence, taking essentially no input current, and may be more easily fabricated in integrated circuits than bipolars. In an FET, Drain (d) and Source (s) contacts connect to a "doped" semiconductor channel. Extremely close to that channel, but still insulated from it, is a conductive area connected to a Gate (g) contact. Voltage on the gate creates an electrostatic field which interacts with current flowing in the drain-source channel, and can act to turn that current ON or OFF, depending on channel material (P or N), doping (enhancement or depletion), and gate polarity. Sometimes the drain and source terminals are interchangeable, and sometimes the source is connected to the substrate. Instead of an insulated gate, we can also have a reverse-biased diode junction, as in a JFET.

N-channel FET's generally work better than p-channel devices. JFET's can only have "depletion mode," which means that, with the gate grounded to the source, they are ON. N-channel JFET devices go OFF with a negative voltage on the gate. Normally, MOSFET devices are "enhancement mode" and are OFF with their gate grounded. N-channel MOSFET devices go ON with a positive voltage (0.5 to 5v) on the gate. Depletion mode n-channel MOSFET devices are possible, but not common.

The exchange in position of two elements. The most primitive possible permutation or re-ordering of elements. Any possible permutation can be constructed from a sequence of transpositions.

Trap Door
A cipher design feature, presumably planned, which allows the apparent strength of the design to be easily avoided by those who know the trick. Similar to back door.

Triple DES
The particular block cipher which is the U.S. Data Encryption Standard or DES, performed three times, with two or three different keys.

Truly Random
A random value or sequence derived from a physical source. Also called really random and physically random.

The assumption of a particular outcome in a dependence upon someone else. Trust is the basis for communications secrecy: While secrecy can involve keeping one's own secrets, communications secrecy almost inevitably involves at least a second party. We thus necessarily "trust" that party with the secret itself, to say nothing of cryptographic keys. It makes little sense to talk about secrecy in the absence of trust.

In a true security sense, it is impossible to fully trust anyone: Everyone has their weaknesses, their oversights, their own agendas. But normally "trust" involves some form of commitment by the other party to keep any secrets that occur. Normally the other party is constrained in some way, either by their own self-interest, or by contractual, legal, or other consequences of the failure of trust. The idea that there can be any realistic trust between two people who have never met, are not related, have no close friends in common, are not in the same employ, and are not contractually bound, can be a very dangerous delusion. It is important to recognize that no trust is without limit, and those limits are precisely the commitment of the other party, bolstered by the consequences of betrayal. Trust without consequences is necessarily a very weak trust.

Truth Table
Typically, a Boolean function expressed as the table of the value it will produce for each possible combination of input values.

Type I Error
In statistics, the rejection of a true null hypothesis.

Type II Error
In statistics, the acceptance of a false null hypothesis.


From the Latin for "one kind." Sometimes used to describe functions with a single argument, such as "the unary -" (the minus-sign), as opposed to subtraction, which presumably would be "binary," and that could get very confusing very fast. Thus, monadic may be a better choice. Also see: binary and dyadic.

Unexpected Distance
The values computed by a fast Walsh transform when calculating Boolean function nonlinearity as often used in S-box analysis.

Given any two random Boolean sequences of the same length, we "expect" to find about half of the bits the same, and about half different. This means that the expected Hamming distance between two sequences is half their length.

With respect to Boolean function nonlinearity, the expected distance is not only what we expect, it is also the best we can possibly do, because each affine Boolean function comes in both complemented and uncomplemented versions. So if more than half the bits differ between a random function and one version, then less than half must differ to the other version. This makes the expected distance the ideal reference point for nonlinearity.

Since the FWT automatically produces the difference between the expected distance and the distance to each possible affine Boolean function (of the given length), I call this the unexpected distance. Each term is positive or negative, depending on which version is more correlated to the given sequence, and the absolute value of this is a measure of linearity. But since we generally want nonlinearity, we typically subtract the unexpected value from half the length of the sequence.

Unicity Distance
The amount of ciphertext needed to uniquely identify the correct key and its associated plaintext (assuming a ciphertext-only attack and natural language plaintext). With less ciphertext than the unicity distance, multiple keys may produce decipherings which are each plausible messages, although only one of these would be the correct solution. As we increase the amount of ciphertext, many formerly-plausable keys are eliminated, because the plaintext they produce becomes identifiably different from the structure and redundancy we expect in a natural language.

"If a secrecy system with a finite key is used, and N letters of cryptogram intercepted, there will be, for the enemy, a certain set of messages with certain probabilities, that this cryptogram could represent. As N increases the field usually narrows down until eventually there is a unique 'solution' to the cryptogram; one message with probability essentially unity while all others are practically zero. A quantity H(N) is defined, called the equivocation, which measures in a statistical way how near the average cryptogram of N letters is to a unique solution; that is, how uncertain the enemy is of the original message after intercepting a cryptogram of N letters." [p.659]

"This gives a way of calculating approximately how much intercepted material is required to obtain a solution to the secrecy system. It appears . . . that with ordinary languages and the usual types of ciphers (not codes) this 'unicity distance' is approximately H(K)/D. Here H(K) is a number measuring the 'size' of the key space. If all keys are a priori equally likely, H(K) is the logarithm of the number of possible keys. D is the redundancy of the language . . . ." "In simple substitution with a random key H(K) is log10 26! or about 20 and D (in decimal digits per letter) is about .7 for English. Thus unicity occurs at about 30 letters." [p.660]

Shannon, C. 1949. Communication Theory of Secrecy Systems. Bell System Technical Journal. 28: 656-715.

Uniform Distribution
A probability distribution in which each possible value is equally likely. Also a "flat" or "even" distribution.

A uniform distribution is the most important distribution in cryptography. For example, a cryptographer strives to make every possible plaintext an equally likely interpretation of any ciphertext (see ideal secrecy). A cryptographer also strives to make every possible key equally likely, given any amount of known plaintext.

On the other hand, a uniform distribution with respect to one quality is not necessarily uniform with respect to another. For example, while keyed shuffling can provably produce any possible permutation with equal probability (a uniform distribution of different tables), those tables will have a Boolean function nonlinearity distribution which is decidedly not uniform. And we might well expect a different non-uniform distribution for every different quality we measure.

Variable Size Block Cipher

The ciphering concept described in U.S. Patent 5,727,062 (see the VSBC articles on the Ciphers By Ritter page).

A block cipher which supports ciphering in blocks of dynamically variable size. The block size may vary only in steps of some element size (for example, a byte), but blocks could be arbitrarily large.

Three characteristics distinguish a true variable size block cipher from designs which are merely imprecise about the size of block or element they support or the degree to which they support overall diffusion:

  1. A variable size block cipher is indefinitely extensible and has no theoretical block size limitation;

  2. A variable size block cipher can approach overall diffusion, such that each bit in the output block is a function of every bit in the input block; and

  3. A true variable size block cipher does not require additional steps (rounds) or layers to approach overall diffusion as the block size is expanded.

Also see Dynamic Substitution Combiner and Balanced Block Mixing.

The measure of electron "potential" in volts. Voltage is analogous to water pressure, as opposed to flow or current.

Walsh Functions

Walsh Functions are essentially the affine Boolean functions, although they are often represented with values {+1,-1). There are three different canonical orderings for these functions. The worth of these functions largely rests on their being a complete set of orthogonal functions. This allows any function to be represented as a correlation to each of the Walsh functions. This is a transform into an alternate basis which may be more useful for analysis or construction.

Also see: Fast Walsh-Hadamard Transform.

The weight of Boolean Function f is the number of 1's in the truth table of f.

An overly-cute description of making a signal or data more like white noise, with an equal amount of energy in each frequency. To make data more random-like.

White Noise
A random-like signal with a flat frequency spectrum, in which each frequency has the same magnitude. As opposed to pink noise, in which the frequency spectrum drops off with frequency. White noise is analogous to white light, which contains every possible color.

White noise is normally described as a relative power density in volts squared per hertz. White noise power varies directly with bandwidth, so white noise would have twice as much power in the next higher octave as in the current one. The introduction of a white noise audio signal can destroy high-frequency loudspeakers.

A thin, long conductor, often considered "ideally conductive" compared to other parts of a circuit.


Exclusive-OR. A Boolean logic function which is also mod 2 addition.

TUCoPS is optimized to look best in Firefox® on a widescreen monitor (1440x900 or better).
Site design & layout copyright © 1986-2024 AOH