 Cryptology - Part 1 - Phrost Byte

 - Introduction -

 Cryptology came from the need to hide and conceal information from prying
 eyes, be it war tactics, army commands, secrets, or even directions to hidden
 treasure (see the famous Beale ciphers).

 The name cryptology is a combination of the Greek words kryptos (hidden) and
 logos (study, science). Cryptology comprises both the enciphering (turning
 readable text into text that is unreadable to those who don't know the key)
 and deciphering (turning the unreadable text back into readable text for
 those with the key) of data. Cryptology can be split into two separate areas:
 cryptography - dealing with techniques of concealing data based on a key, and
 cryptanalysis - the recovery of readable text from enciphered data without
 knowing the key.

 When party A wants to send a message to party B without party C knowing,
 they hide the message by means of encryption (also called enciphering or
 encipherment). When party B receives the message they decrypt it (also
 called deciphering or decipherment) to read its contents. The message before
 encryption is known as the plaintext, and the message after encryption is
 known as the ciphertext.

   Plaintext             Ciphertext             Original Plaintext
   ---------> Encryption ----------> Decryption ------------------>

 The method of encryption and decryption is carried out using a cryptographic
 algorithm, which is also known as a cipher. A cipher is a mathematical
 function that both encrypts and decrypts a message with the known (secret)
 key. 


 - Classical Ciphers -
 
 Classical ciphers have been used throughout history, and were most popular
 during the Second World War. With the invention of the computer, their
 effectiveness and usefulness diminished, and they were replaced with far
 superior number-based ciphers. Classical ciphers are character based. They
 involve the substitution of one character with another, or the transposition
 of characters with one another. Even with the advent of computers, classical
 techniques are still useful: they are often incorporated into more modern
 crypto-systems, or applied in succession to the same data.


 - Substitution Ciphers -

 As mentioned before, a substitution cipher is one in which each character in
 the plaintext is replaced with another character in the ciphertext. There are
 four basic types of substitution ciphers: 

  Monoalphabetic Substitution - each character in the plaintext is substituted
  with the corresponding character from a cipher alphabet. The cipher alphabet
  is fixed throughout encryption. eg. Caesar Cipher, ROT13.

  Homophonic Substitution - a single character in the plaintext can be
  represented by any one of several characters in the ciphertext. eg. 'e' in
  the plaintext can be represented by any of six different characters in the
  ciphertext. This is a type of monoalphabetic cipher.

  Polyalphabetic Substitution - one in which the cipher alphabet changes
  during encryption. The alphabet used can depend on the position of each
  character of the plaintext, and the key. eg. Vigenere Cipher.

  Polygram Substitution - blocks of characters are encrypted in groups. eg.
  in the Playfair Cipher, characters are grouped into twos, and then
  encrypted.
 

 - Monoalphabetic Substitution Ciphers -

 Monoalphabetic ciphers are the easiest to implement and to cryptanalyse. One
 of the simplest and most common is the Caesar cipher, which was named after
 Julius Caesar. The whole alphabet is simply shifted a few positions, and in
 the case of the Caesar cipher it is shifted by three places:

   Plain Alphabet:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   Cipher Alphabet: D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

   Sample Plaintext:  may the force be with you
   Sample Ciphertext: PDB WKH IRUFH EH ZLWK BRX
   (The common standard is to have ciphertext in uppercase.)
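
 The shift described above can be sketched in a few lines of Python. This is
 only an illustration, not a reference implementation: the function names
 caesar_encrypt and caesar_decrypt are made up for this example, and anything
 that is not a letter A-Z is assumed to pass through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_encrypt(plaintext, shift=3):
    """Shift each letter forward by `shift` places; ciphertext is uppercase."""
    result = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            # wrap around the end of the alphabet with modulo 26
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return "".join(result)

def caesar_decrypt(ciphertext, shift=3):
    """Undo the shift; plaintext is returned in lowercase."""
    return caesar_encrypt(ciphertext, -shift).lower()

print(caesar_encrypt("may the force be with you"))  # PDB WKH IRUFH EH ZLWK BRX
print(caesar_decrypt("PDB WKH IRUFH EH ZLWK BRX"))  # may the force be with you
```

 Decryption is just encryption with the negative shift, which is why one
 function can serve both directions.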

 ROT13 is similar to the Caesar cipher, but the letters are shifted (ROTated)
 thirteen places. The order of characters in the Caesar and ROT13 cipher
 alphabets does not change, so there are only 25 possible keys (shifts). Due
 to the small number of possible keys, the Caesar cipher is open to a brute
 force attack. It would not take long to cycle through the 25 keys until
 intelligible text is produced. A superior method would be to create a random
 cipher alphabet. This increases the number of possible keys to more than
 400 000 000 000 000 000 000 000 000 (26 factorial). For example:

   Plain Alphabet:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   Cipher Alphabet: S C E J T I Q L P D V B N W Z O X K Y U A F H G M R

   Sample Plaintext:  may the force be with you
   Sample Ciphertext: NSM ULT IZKET CT HPUL MZA
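
 To get a feel for the numbers, here is a rough Python sketch of the brute
 force attack on a shift cipher, next to the key count for a random cipher
 alphabet. All function names are invented for this example, and the random
 alphabet is generated by the script rather than taken from the table above.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase

def shift_decrypt(ciphertext, shift):
    """Shift every letter back by `shift` places."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )

def brute_force(ciphertext):
    """Return all 25 candidate plaintexts, keyed by the shift tried."""
    return {s: shift_decrypt(ciphertext, s).lower() for s in range(1, 26)}

def random_cipher_alphabet(rng):
    """Return a random permutation of the 26 letters as a string."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return "".join(letters)

def substitute(text, source, target):
    """Replace each letter of `source` with the letter at the same spot in `target`."""
    return text.upper().translate(str.maketrans(source, target))

candidates = brute_force("PDB WKH IRUFH EH ZLWK BRX")
print(candidates[3])        # may the force be with you
print(math.factorial(26))   # 403291461126605635584000000 possible alphabets

cipher_alphabet = random_cipher_alphabet(random.Random())
secret = substitute("may the force be with you", ALPHABET, cipher_alphabet)
print(substitute(secret, cipher_alphabet, ALPHABET).lower())
```

 A person (or a dictionary check) only has to scan 25 lines to spot the
 readable one, whereas trying every random alphabet is out of the question.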

 A random cipher alphabet like this is hard to remember, and encryption keys
 should never be written down. An easier method is to use a key-sentence. A
 simple phrase or even a word is used, and each letter is added to the cipher
 alphabet the first time it appears in the key-sentence. Once all letters in
 the phrase or word have been used, the remaining letters of the alphabet are
 added onto the end of the cipher alphabet in order. For example:

   Key-Phrase: I am Queeg. Red Dwarf backup computer.
   Removing Duplicates: IAMQUEGRDWFBCKPOT
   Cipher Alphabet: I A M Q U E G R D W F B C K P O T H J L N S V X Y Z
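
 The key-phrase construction is mechanical enough to sketch in Python. The
 function name keyphrase_alphabet is hypothetical, and the sketch assumes
 that anything other than the letters A-Z in the phrase is simply ignored.

```python
import string

def keyphrase_alphabet(phrase):
    """Build a cipher alphabet from a key phrase.

    Letters are taken in order of first appearance, then the unused
    letters of the alphabet are appended in their normal order.
    """
    seen = []
    for ch in phrase.upper():
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    rest = [ch for ch in string.ascii_uppercase if ch not in seen]
    return "".join(seen + rest)

print(keyphrase_alphabet("I am Queeg. Red Dwarf backup computer."))
# IAMQUEGRDWFBCKPOTHJLNSVXYZ
```

 Note that a short key phrase leaves the tail of the alphabet in its normal
 order, so late letters may map to themselves (here X, Y and Z).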

 Despite the large number of possible keys that can be used, a monoalphabetic
 cipher is relatively easy to break, and even easier if the plaintext
 language is known. A technique known as frequency analysis is used, and is
 the foundation on which most classical ciphers are broken.


 - Frequency Analysis -

 Simply put, the most frequently occurring letter in the ciphertext will
 probably represent one of the most frequently occurring letters in the
 plaintext's language. The second most frequent letter in the ciphertext will
 probably represent another of the most frequent letters in that language,
 and so on.

   Table of relative frequencies for English
   (compiled by H. Beker and F. Piper, using various passages)

   Letter  Percentage          Letter  Percentage
   ------------------          ------------------
   a          8.2              n          6.7
   b          1.5              o          7.5
   c          2.8              p          1.9
   d          4.3              q          0.1
   e         12.7              r          6.0
   f          2.2              s          6.3
   g          2.0              t          9.1
   h          6.1              u          2.8
   i          7.0              v          1.0
   j          0.2              w          2.4
   k          0.8              x          0.2
   l          4.0              y          2.0
   m          2.4              z          0.1
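
 Counting letters by hand gets tedious, so here is a minimal Python sketch
 that produces the same kind of percentage table for a ciphertext. The
 function name letter_frequencies is made up for this example, and only the
 letters A-Z are counted.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, percentage) pairs, most frequent letter first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    counts = Counter(letters)
    return [(ltr, 100.0 * n / total) for ltr, n in counts.most_common()]

freqs = letter_frequencies("NSM ULT IZKET CT HPUL MZA")
print(freqs[0])  # ('T', 15.0)
```

 In the random-alphabet sample, T stands for e, so the top letter happens to
 line up with the table. A 20-letter message is far too short to rely on
 that, though; the statistics only become trustworthy on longer ciphertexts.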

 These rankings should not be applied blindly (ie. by assuming that the most
 frequent letter in the ciphertext IS the most frequent letter in the
 plaintext). The surroundings of the letters in question should be examined.
 This can be done by looking at how letters interact with one another. For
 example, in English the letter Q is pretty much guaranteed to be followed by
 a U, and the letter H frequently comes before the letter E (the, then, they),
 but rarely after it. The most commonly occurring letter combinations are also
 worth looking at, such as repeated letters, digrams (two letter combinations)
 and trigrams (three letter combinations).

   Repeats Order: SS, EE, TT, FF, LL, MM, OO
   Digram Order: TH, HE, AN, IN, ER, RE, ES, ON, EA, TI, AT, ST, EN, ND, OR
   Trigram Order: THE, AND, THA, ENT, ION, TIO, FOR, NDE
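
 The same counting idea extends to letter combinations. The sketch below
 tallies digrams, trigrams and repeated letters; the names ngram_counts and
 repeats are invented here, and it assumes combinations are only counted
 inside words, not across the spaces between them.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive letters inside each word."""
    counts = Counter()
    for word in text.upper().split():
        letters = "".join(ch for ch in word if "A" <= ch <= "Z")
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts

def repeats(text):
    """Count doubled letters such as SS or EE."""
    digrams = ngram_counts(text, 2)
    return Counter({g: c for g, c in digrams.items() if g[0] == g[1]})

sample = "the then they"
print(ngram_counts(sample, 2).most_common(2))  # [('TH', 3), ('HE', 3)]
print(ngram_counts(sample, 3)["THE"])          # 3
print(repeats("too good"))                     # Counter({'OO': 2})
```

 Run against a ciphertext, the top entries can be compared with the orders
 listed above to guess which cipher pairs stand for TH, HE and so on.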

 If the cipher text contains spaces between words, plaintext words can easily
 be obtained using frequency analysis, and knowledge of common words.

   Single Letter Words: A and I (these are the only two in English)
   Double Letter Words: OF, TO, IN, IT, IS, BE, AS, AT, SO, WE, HE, BY, OR...
   Three Letter Words: THE, AND

 Once a few letters have been picked out, and partial words start to form,
 decipherment proceeds rapidly.
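
 One way to watch partial words form is to fill in the letters guessed so far
 and blank out the rest. This is only a sketch: apply_partial_key is a
 hypothetical helper, and the guesses in the example are assumed to have come
 from spotting the common word THE in the random-alphabet sample.

```python
def apply_partial_key(ciphertext, known):
    """Show guessed letters in lowercase and unknown letters as underscores.

    `known` maps ciphertext letters (uppercase) to plaintext guesses.
    """
    return "".join(
        known[ch] if ch in known else ("_" if ch.isalpha() else ch)
        for ch in ciphertext.upper()
    )

guesses = {"U": "t", "L": "h", "T": "e"}
print(apply_partial_key("NSM ULT IZKET CT HPUL MZA", guesses))
# ___ the ____e _e __th ___
```

 From here "_e" and "__th" suggest words like "be" and "with", and each new
 guess fills in more of the remaining blanks.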


 - Conclusion -

 That ends the first part of Cryptology. Next issue I will explain Homophonic
 ciphers and how to crack them, and possibly Polyalphabetic ciphers. Until
 then, try your hand at cracking the ciphers at the end of this ezine, which
 incorporate various tricks to make them progressively harder.


 - References -

 Applied Cryptography - Bruce Schneier 
 Basic Methods of Cryptography - Jan van der Lubbe
 The Code Book - Simon Singh
 Xenos - Lanaki