Newsgroups: sci.crypt
Path: netcom.com!grady
From: grady@netcom.com (Grady Ward)
Subject: Passphrase proto-FAQ
Message-ID: <gradyCBIx4n.6n8@netcom.com>
Organization: Moby lexicons
X-Newsreader: TIN [version 1.1 PL8]
Date: Tue, 10 Aug 1993 03:17:11 GMT
Lines: 224

FAQ: How do I choose a good password or phrase?

ANS: shocking nonsense makes the most sense

With the intrinsic strength of some of the modern encryption, authentication, and message digest algorithms such as RSA, MD5, SHS and IDEA, the user password or phrase is becoming more and more the focus of vulnerability.

Even for the early PGP 1.0 application, for example, a deputy with the Los Angeles County Sheriff's Department admitted in early 1993 that both they and the FBI despaired of breaking the system except through a successful dictionary attack (trying many possible passwords or phrases from lists of probable choices and their variations) rather than by "breaking" the underlying cryptographic algorithm mathematically.

The fundamental reason why attacking or trying to guess the user's password or phrase will increasingly be the focus of cryptanalysis is that the user's choice of password may represent a much simpler cryptographic key than is optimal for the encryption algorithm. This weakness of the user's password choice provides the cryptanalytic wedge.

For example, suppose a user chooses the password 'david.' On the surface the entropy of this key (or the number of different equiprobable key states) appears to be five characters chosen from a set of twenty-six with replacement: 26^5 or 1.188 x 10^7. But since the user is apparently biased toward common given names, most of which appear in lists numbering only 6,000-7,000 entries, the true entropy is undoubtedly much closer to 6.5 x 10^3, or more than three orders of magnitude smaller than the raw length might suggest.
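The arithmetic in the 'david' example can be checked with a few lines of Python. This is only a sketch: the function name bits() is made up for illustration, and the name-list size of 6,500 is simply the midpoint of the 6,000-7,000 range quoted above.

```python
import math

def bits(equiprobable_states: int) -> float:
    """Entropy in bits of a key drawn uniformly from this many states."""
    return math.log2(equiprobable_states)

# Five lowercase letters chosen with replacement from an alphabet of 26.
raw_states = 26 ** 5            # 11,881,376, i.e. about 1.188 x 10^7

# Assumed size of a list of common given names (midpoint of 6,000-7,000).
name_list_states = 6_500

print(f"raw five-letter key: {bits(raw_states):.1f} bits")        # 23.5 bits
print(f"common given name:   {bits(name_list_states):.1f} bits")  # 12.7 bits
```

Expressed in bits, the apparent strength of about 23.5 bits drops to under 13 bits once the attacker assumes the password is a common given name.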
(In fact this password probably possesses a much smaller entropy than even this, for the very common name "david" would be one of the first names to be checked by an optimized dictionary attack program.)

In other words, "entropy" is not a fixed physical quantity: the cryptanalyst can exploit whole meanings and contexts, not just byte frequencies, digraphs, or even whole-word correlations, to reduce the entropy of the key space he or she is trying to explore.

To thwart this avenue of attack we would like to discover a method of selecting passwords or phrases that have at least as many bits of entropy (or "hard-to-guessness") as the entropy of the cryptographic key of the underlying algorithm being used.

To compare, DES (Data Encryption Standard) is believed to have about 54-55 bits (~4 x 10^16) of entropy while the IDEA algorithm is believed to have about 128 bits (~3.4 x 10^38) of entropy. The closer the entropy of the user's password or phrase is to the intrinsic entropy of the cryptographic key of the underlying algorithm being used, the more likely it is that an attacker will need to search a substantial portion of the algorithm's key space in order to discover it.

Unfortunately many documents suggest choosing passwords or phrases that are distinctly inferior to the latest methods. For example, one white paper widely archived on the internet suggests selecting an original password by constructing an acronym from a popular song lyric or from a line of script from, for example, the SF movie "Star Wars". Both of these ideas turn out to be weak because both the entire script to Star Wars and entire sets of song lyrics to thousands of popular songs are available on-line to everyone and, in some cases, are already embedded into "crack" dictionary attack programs.
However, the conflict between choosing an easy-to-remember key and choosing a key with a high level of entropy is not hopeless if we exploit mnemonic devices that have been known for a long time outside the field of cryptography.

With the goal of making up a passphrase not included in any existing corpus yet very easy to remember, an effective technique is the one known as "shocking nonsense."

"Shocking nonsense" means to make up a short phrase or sentence that is both nonsensical and shocking in the culture of the user, that is, it contains a grossly obscene, racist, impossible, or otherwise extreme juxtaposition of ideas. This technique is permissible because the passphrase, by its nature, ought never to be revealed to anyone with sensibilities to be offended.

Further, shocking nonsense is unlikely to be duplicated anywhere because it does not describe a matter-of-fact that could be accidentally rediscovered by anyone else, and the emotional evocation makes it difficult for the creator to forget. A relatively mild example of such shocking nonsense might be: "mollusks peck my galloping genitals." The reader can undoubtedly make up many far more shocking examples for himself or herself...

Even relatively short phrases offer acceptable entropy because the "alphabet" of word symbols from which they are chosen is far larger than the set of characters in the Roman alphabet. Even choosing from a vocabulary of a few thousand words, a five word phrase might have on the order of 58 to 60 bits of entropy -- more than what is needed for the DES algorithm, for example.

If an entire phrase cannot be used because the password is restricted to, say, eight alphanumeric characters, concatenating the first letters of a suitable shocking nonsense passphrase should usually give a better than reasonable starting point if followed by adding numeric and non-alphabetic characters.
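The word-count arithmetic above can be sketched in Python. The function names here are invented for illustration, the vocabulary sizes of 3,000 and 4,000 are assumed stand-ins for "a few thousand words," and the estimate holds only if each word is picked independently and uniformly from that vocabulary, which a human inventing a phrase will only approximate.

```python
import math

def phrase_bits(vocabulary_size: int, words: int) -> float:
    """Entropy in bits of a phrase of independently, uniformly chosen words."""
    return words * math.log2(vocabulary_size)

def words_needed(vocabulary_size: int, target_bits: float) -> int:
    """Smallest number of words whose phrase entropy reaches target_bits."""
    return math.ceil(target_bits / math.log2(vocabulary_size))

def initials(phrase: str) -> str:
    """First letter of each word, as a starting point for a short password."""
    return "".join(word[0] for word in phrase.split())

print(f"{phrase_bits(3000, 5):.1f}")   # 57.8 bits for five words from 3,000
print(f"{phrase_bits(4000, 5):.1f}")   # 59.8 bits for five words from 4,000
print(words_needed(4000, 55))          # 5 words to match the DES figure
print(words_needed(4000, 128))         # 11 words to match the IDEA figure
print(initials("mollusks peck my galloping genitals"))  # mpmgg
```

Under these assumptions a five word phrase already covers the DES figure quoted earlier, while matching the 128 bits of IDEA would take roughly eleven words.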
When you are permitted to use passphrases of arbitrary length (in PGP for example) it is not necessary to further perturb your 'shocking nonsense' passphrase to include numbers or special symbols because the pool of word choices is already very large. Not needing those special symbols or numbers (that are not intrinsically meaningful) makes the shocking nonsense passphrase that much easier to remember.

Appendix A. For software developers

For software developers designing "front-ends" or user interfaces to conventional short-password applications, very good results will come from permitting the user arbitrary length passphrases that are then "crunched" or processed using a strong digest algorithm such as the 160-bit SHS (Secure Hash Standard) or the 128-bit MD5 (Message Digest rev. 5). The interface program then chooses the appropriate number of bits from the digest and supplies them to the engine enforcing a short password. This 'key crunching' technique will assure the developer that even the short password key space will have a far greater opportunity of being fully exploited by the user.

Appendix B. A tool to experimentally investigate entropy

A practical Unix tool for investigating the entropy of typical user keys can be found in Wu and Manber's 'agrep' (approximate grep) similarity pattern matching tool, available in C source from cs.arizona.edu [192.12.69.5].

This tool can determine the "edit distance," that is, the number of insertions, substitutions, or deletions that would be required of an arbitrary pattern in order for it to match any of a large corpus of words or phrases, say the /usr/dict word list, or the set of Star Trek trivia archives.

The user can then adjust the pattern to give an arbitrarily high threshold difference between it and common words and phrases in the corpus, to make crack programs that systematically vary known strings less likely to succeed.
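The kind of check described here can be sketched in Python for readers without agrep at hand. This is not agrep's own implementation (agrep uses a faster bit-parallel method); it is the plain dynamic-programming formulation of approximate substring matching, and the function names and the tiny word list are made up for illustration.

```python
def approx_substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Row 0 is all zeros: a match may begin at any position in text for free.
    prev = [0] * (len(text) + 1)
    for i, p_char in enumerate(pattern, start=1):
        # Matching pattern[:i] against nothing costs i deletions.
        curr = [i]
        for j, t_char in enumerate(text, start=1):
            substitution = prev[j - 1] + (0 if p_char == t_char else 1)
            deletion = prev[j] + 1        # drop a pattern character
            insertion = curr[j - 1] + 1   # absorb an extra text character
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    # The match may also end at any position in text.
    return min(prev)

def near_matches(pattern: str, corpus: list[str], max_distance: int = 2) -> list[str]:
    """Corpus entries containing a substring within max_distance of pattern."""
    return [w for w in corpus if approx_substring_distance(pattern, w) <= max_distance]

# Hypothetical three-word corpus; a real check would read a full word list.
sample_corpus = ["bushfires", "whitest", "zebra"]
print(approx_substring_distance("hxirtes", "bushfires"))  # 2
print(near_matches("hxirtes", sample_corpus))  # ['bushfires', 'whitest']
```

A candidate password for which near_matches() returns anything at the chosen threshold is one that a crack program varying known strings could plausibly reach.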
It is often surprising to discover that a substring pattern like "hxirtes" is only of edit distance two from as many as forty separate words ranging from "bushfires" to "whitest." Certainly no password or phrase ought to be chosen as a working password or phrase if it is within an edit distance of two or less of a known string or substring in any on-line collection.

select references

[selection of passwords in differing threat environments]
Department of Defense Password Management Guideline
CSC-STD-002-85
published by the Computer Security Center of the Department of Defense
Fort George G. Meade, MD 20755

[discovering weak passwords]
The COPS Security Checker System
by D. Farmer, E. Spafford
Purdue University Technical Report CSD-TR-993
West Lafayette, IN 47907

[an example of automated key cracking]
With Microscope and Tweezers: An Analysis of the Internet Virus of 1988
by M. Eichin, J. Rochlis
Massachusetts Institute of Technology
Cambridge, MA 02139

[password vulnerabilities in distributed systems]
Computer Emergency Response - An International Problem
by R. Pethia, K. van Wyk
CERT/Software Engineering Institute
Carnegie Mellon University, Pittsburgh, PA 15213

[key metrics and the MD5 message digest algorithm]
Answers to Frequently Asked Questions About Today's Cryptography
by Paul Fahn
RSA Laboratories, Redwood City, CA 94065
(available through anonymous FTP from rsa.com)

[implementation details of the MD5 message digest algorithm]
RFC-1321 ('request for comments') The MD5 algorithm
by R. Rivest
MIT Laboratory for Computer Science
(available on the internet from gatekeeper.dec.com)

[implementation details of the NIST Secure Hash Standard]
The Secure Hash Standard (SHS) Specification, Jan 1992
DRAFT Federal Information Processing Standards Publication YY
Director, Computer Systems Laboratory
National Institute of Standards and Technology
Gaithersburg, MD 20899
(The SHS was approved as a Federal Standard in May, 1993)

[other possible approaches to password generation]
Automated Password Generator, NIST publication ????
Director, Computer Systems Laboratory
National Institute of Standards and Technology
Gaithersburg, MD 20899
(a pronounceable password algorithm using DES)

v 1.0 alpha

comments on this FAQ are solicited; e-mail grady@netcom.com

--
grady@netcom.com  voice/fax (707) 826-7715
compiler of Moby lexical databases, including
Moby Part-of-Speech, second edition: 230,000 entries, priority marked
finger grady@netcom.com or e-mail for more information.