Theoretical and Practical Password Entropy

Published: 2011-08-10
Last Updated: 2011-08-10 20:08:07 UTC
by Johannes Ullrich (Version: 1)
16 comment(s)

We got a number of submissions pointing to today's XKCD cartoon [1] . I think the cartoon is great, and illustrates a nice dilemma in password security. Yes, I know passwords don't work, but we still all use them and we still have to come up with reasonable passwords.

Even if you are using a password safe tool that comes up with new random passwords for each application and website, you still need to remember the password for the password safe, and there are a few applications (e.g. logging in to your system) that can't be covered by a password safe.

The basic dilemma is that you need to come up with a password that is hard to guess for others but easy enough for you to remember. Most password policies try to enforce a hard to guess password by forcing you to extend the range of characters from which you pick (different case letters, numbers, special characters). However, in real life, this may actually reduce the space of "memorable" passwords, or the total number of possible passwords.

Pass phrases, as suggested by the cartoon, are one solution. But once an attacker knows that you use a pass phrase, the key space is all for sudden limited again. There has been some research showing that a library of 3 word phrases pulled from wikipedia makes a decent dictionary to crack these passwords.

The qualify of a password is usually expressed in "bits of entropy". The "bits of entropy" are calculated by the number of bits it would take to represent all possible passwords. Lets look at some common schemes:

a 4 digit PIN: 10,000 possible passwords, or 13.3 bits (ln2(10,000)=13.3)
12 characters using the full 95 characters ASCII set: 5.4 10^23, or 78.8 bits. (this is the current NIST recommendation)

Pass phrases are harder to evaluate. It depends on the size of the vocabulary of the user, and of course the constraints of grammar. People will likely not choose some random words, but a phrase that makes some sense to them. One model that can be used to obtain a passphrase is called "Diceware", but it assumes random phrases from 6^5 words (7,776).If you consider Diceware's 7,776 words, you would need 6 words to arrive at the same 77.5 bits, close to strength that NIST asks for.

 What it all comes down to: How are people actually selecting passwords? People make pretty bad random number generators, in particular if you ask them to remember the result. A good password cracking algorithm takes this into account and tailors the password list based on password requirements and the targets background. For example, for web application pen testing, the simple ruby script "cewl" will create a custom password list from words it finds on the targets website. In past tests, I was easily able to double my password cracking success using this technique if compared to normal dictionaries.

In order to solve this, we need to figure out what passwords people really use. How about asking them for their password and offering them a candy bar in return :). And then there is always another XKCD cartoon for you [2]



Johannes B. Ullrich, Ph.D.
SANS Technology Institute

Keywords: passwords xkcd
16 comment(s)


but does the diceware list really provide that much entropy - if you have a cracking util that uses the same vocabulary as dicewire and basically combines the same 7776 words into pass phrases and hashes them out you should be able to match hashes pretty quick. A Hydra type attack on a login might be a different story somewhat.
How about using phrases but intentionally misspelling a word? I'm not talking about l33t speaking it, I'm talking about making "invisible astronaut hamburger" into "invasible astronaut hamburger" - that would break dictionary attacks.
I doubt that passphrase scheme (or any other) could work in practice -- because we need to create and recall a unique passphrase for every account on every website, of course ;)
Oh, and of course passphrases don't work with a ridiculous max. password length like 8 or 10 characters (I've even seen restrictions on character set, case-insensitivity, or even numeric-only). Or when a password can be reset just using one or two 'personal questions' that everyone knows the answers to, or a phone call to customer services saying 'I can't log in'
Steven, you're probably running up against a system with a mainframe backend. Despite what the mainframe proponents say about "mainframe security", the reality is that their password controls in general suck. The 2-year old mainframe my company uses can only accept alphanumeric passwords with a maximum length of eight characters. Thanks, Unisys. We get beat up regularly by our customers because your security controls suck.
"But once an attacker knows that you use a pass phrase, the key space is all for sudden limited again."
there are perhaps ~1,000 common words

If you pick 4 random ones, there are
1000 * 1000 * 1000 * 1000 = 1000 ^ 4 = 10^12 possible choices.
That's approximately 2^39 or about 39 bits worth of 'choice'.

The problem is, that amount is only the entropy if your selection actually _IS_ truly random, which means you use a computer to make a truly random selection of 4 words; don't pick words off the top of your head.

As a human, there is likely to be some bias, in the words you would choose, reducing the entropy; if you pick them without a randomizer, there will be less than 39 bits of entropy.

A punctuation mark added, and a random letter capitalized somewhere in the passphrase also helps.

A combination of _both_ approaches is possible, without going overboard and making the password hard to remember.
I'm not sure any of this is of any real world relevance.

1. If you implement account lockouts, even if for ten or twenty attempts, entropy doesn't matter.

2. If they've swiped a password file, they've got all the time in the world without any lockout concerns and a graphics card password cracker can try billions of permutations per second. If it's taking too long, add in a few more graphics cards.

Passwords are passé. The only reason companies use them is because they're free, and you get what you pay for.
I've got a spreadsheet that calculates permutations out to 25 characters. I've just added in entropy calculations, and NIST's recommendations actually do not often pan out to 78.8 bits in a practical environment because there will usually be a requirement of using more than one class of characters.

For using all of the character types, the max entropy is 70.3 bits at 12 characters. There are 1.48E21 possible permutations here. Assuming a random distribution, the random time to brute force at 3.3 billion passwords per second (achievable with current GPUs) is an average of about 24.8 GPU-years. To get that down to a month, you would need about 27,500 cards--not impossible, but not trivial, either. This increases to an average of 17,900 GPU-years at 13 characters.

I don't yet have complete subsets (e.g., 2 of 4 character classes) as I've not worked out the spreadsheet structure, but it still won't reach the 78.8 bits recommended by NIST. A quick calculation suggests 72.2 bits, but I'm not swearing on that number.

I've thought about doing some research into passphrase strengths where spelling is correct, but the calculations get daunting. The Oxford English Dictionary discusses word count just for English[1] and comes up with 250,000 to 750,000 words, depending on context. Then you get into sentence structure, vocabulary statistics, and regional variations, and it becomes clear why, when no other methods are available, passphrases are by far the better solution.
I agree with Jarrod and Jason. You could use passwords like "123456890isQUITEeasyTOremember!" (note the missing 7).

Have you added one word per word? I mean; There are permutations of words, like lowercase, uppercase, first letter uppercase which would at least tripple the amount. Have you added whitespaces? What if someone replaces Whitespace with lets say dots or 'x' or whatever. You see, the complexity is even higher!
The number of words in the Oxford English Dictionary is not relevant here. What is relevant here is the number of words in the average person vocabulary. That is only a few thousand at best.

Next you can decimate the permutations, because most people will choose a grammatically correct phrase, e.g. subject transitive-verb object, or subject intransitive-verb adjective. Also you might need to ban "I", "The" and "and".

Diary Archives