|
.oO Phrack 49 Oo. Volume Seven, Issue Forty-Nine 10 of 16 A Steganography Implementation Improvement Proposal by: cjm1@concentric.net [ For those of you who do not know, steganography is cryptographic technique that simply hides messages inside of messages. The sender composes an innocuous message and then, using one of many tactics, injects the secret message into it. Some techniques involve: invisible inks, character distortion, handwriting differences, word/letter frequency doping, bit flipping, etc... The method the author discusses hinges upon a well known steganographic implementation, low-order bit flipping in graphic images. -d9 ] Steganography is a technique for hiding data in other data. The general method is to flip bits so that reading the low-order bit of each of 8-bytes gets one a character. This allows one to use a picture or a sound file and hide data, resulting in a small bit of hopefully unnoticeable noise in the data and a safely hidden cache of data that can later be extracted. This paper details a method for making steganographically hidden data more safe, by using pseudo-random dispersion. Ordinarily, if someone suspects that you have data hidden in, say, a GIF file, they can simply run the appropriate extractor and find the data. If the data is not encrypted, it will be plain for anyone to see. This can be ameliorated by using a simple password protection scheme, hiding the password in the GIF as a header, encrypting it first with itself. If someone does not know the password, they cannot extract the data. This is of course reasonably safe, depending on the encryption scheme used, and I recommend it. But, the hidden data can be made even safer. Pseudo-random dispersion works by hiding a password, and a seed for a random-number-generator in the encrypted header. then, a random number of bytes are passed by, before a low-order bit is flipped. To do this, one must first calculate how many bytes a bit can take up for itself. For instance, to hide an 800 character message in a GIF would mean each character needs 8 bytes (8 bits per character, 1 byte per low-order bit), so you need 6,400 bytes of data to hide the message in, 8 bytes per character. Let's say we have a GIF that is 10 times this size: 64,000 bytes. Thus we have 80 bytes per character to hide data in. Since each bit takes a byte, we have 10 bytes per bit to hide data in! Therefore, if we take a pseudo-random number between 1 and 10, and use that byte to hide our low-order bit in, we have achieved a message dispersed through the GIF in a pseudo-random fashion, much harder to extract. A message in which each byte has a bit which is significant to the steganographically hidden message can be extracted with ease relative to a message in which there are 10 possible bytes for each bit of each character. The later is exponentially harder to extract, given no esoteric knowledge. A slight improvement can be made to this algorithm. By re-calculating the number of available bytes left for each bit after each bit is hidden, the data is dispersed more evenly throughout the file, instead of being bunched up at the start, which would be a normal occurrence. If you use pseudo-random number generator, picking numbers from 0-9, over time, the values will smooth to 5. This will cause the hidden message to be clustered at the beginning of the GIF. By re-calculating each time the number of available bytes left we spread the data out throughout the file, with the added bonus that later bits will be further spread apart than earlier ones, resulting in possible search spaces of 20, 30, 100, or even 1,000 possible bytes per bit. This too serves to make the data much harder to extract. I recommend a header large enough for an 8 character ASCII password, an integral random-number seed, an integral version number, and an place holder left for future uses. The version number allows us to tweak the algorithm and still be able to be compatible with past versions of the program. The header should be encrypted and undispersed (ie: 1 byte per bit of data) since we haven't seeded the random-number generator yet for dispersion purposes. It is useful to make the extractor in such a way that it always extracts something, regardless of the password being correct or not. Doing this means that it is impossible to tell if you have guessed a correct password and gotten encrypted data out, or merely gotten out garbage that looks like encrypted data. Use of a password can also be made optional, so that none is necessary for extraction. A simple default password can be used in these cases. When hiding encrypted data, there is no difference to the naked eye between what is extracted and what is garbage, so no password is strictly necessary. This means no password has to be remembered, or transmitted to other parties. A third party cannot tell if a real password has been used or not. It is important for safety purposes to not hide the default password in the header if no password is used. Otherwise, a simple match can be made by anyone who knows the default password.