Hashing Passwords

Published: 2011-06-28
Last Updated: 2011-06-28 13:15:45 UTC
by Johannes Ullrich (Version: 1)
9 comment(s)

After talking about SQL Injection, this is the second part of the mini series to help you protect yourself from simple persistent attacks as we have seen them in the last couple months. A common MO employed in these attacks is to steal passwords from a database via sql injection. Later, the attacker will try to use these passwords to break into other sites for which users may choose the same password. Of course, part of the problem is password reuse. But for now, we will focus on the hashing of passwords to make it harder for an attacker to retrieve a users plain text password.

First of all: What is hashing? According to NIST, "A hash algorithm (alternatively, hash "function") takes binary data, called the message, and produces a condensed representation, called the message digest. A cryptographic hash algorithm is a hash algorithm that is designed to achieve certain security properties." [1] A good cryptographic hash will make it hard to find two messages with the same hash, or find clear text for a specific hash value. Common hashing algorithms are MD-5 (old), or the "Secure Hashing Algorithms" (SHA) family of hashing algorithms (SHA-1, SHA-256...). Another common but less popular algorithm is RIPE-MD.

Storing a password as a hash will make it difficult to figure out the actual password a user used. In order to verify the password, it is first hashed, then compared to the stored hash in the database.

A hash isn't fool proof. All hashes are vulnerable to brute forcing. If I can get the hash, I can try various passwords, hash them and check if they match. I may actually end up with a different password then the correct one. Hashes do have collisions (same hash for two different plain text values).  A good hash is slow enough to calculate to make brute forcing difficult. Brute forcing can be improved by using databases with pre-calculated hash values, or so called rainbow tables. These systems reduce the brute forcing to a simple database look-up, but require storage space. Rainbow tables are practical for strings up to 10 characters in length (lower case alpha numeric). Of course, the size of a rainbow table will increase fast as the length of the plain text increases or the complexity of the plain text increases .

Probably the most important defense against rainbow tables is the idea of introducing a "salt". First of all a "salt" will ensure that two users who happen to use the same password, will end up with a different hash. A salt can also be used to increase the length of the plain text beyond the point where rainbow tables become practical.

In order to use a "salt", the salt value and the users password are first concatenated, then the string is hashed.

Another trick to harden a hash is to just apply the same algorithm multiple times. For example, if we take the SHA-1 algorithm, and apply it 100 times, we will slow down a brute force attempt by a factor of 100. However, the delay in validating an individual password will be hard to notice.

When selecting an algorithm to hash passwords, it is important to select carefully as it is difficult to change the algorithm later. You will have to ask users to change their password if you do as you no longer know what password they picked.

Here a proposal to create difficult to reverse hashes with salt:

- as salt, do use a complex string that is different for each user. I like to use the username or email address (of course, this means the user will have to enter their password whenever they change their e-mail address, but that is usually a good idea anyway). You could just create a salt for each user and store it with the hash (similar to what the unix /etc/shadow file does).

- first, hash the salt (e-mail address) and the password by themselves. This way, we end up with simple fixed length strings. We no longer have to worry about odd characters in either.

- concatenate the two hashes, and hash them again.

So the complete formula to create our password hash would look like (using sha1 as an example, you could also mix and match hashing algorithms):

sha1(sha1(password)+sha1(username))

You could also add a secret, in addition to the salt. If the secret is not stored in the database, it would not be easily reachable via a SQL injection exploit (yes, you can use them to read files, but it requires sufficient privileges). The formula would now look like:

sha1(sha1(password)+sha1(username)+secret)

Introducing minimum password strength requirements may also help, but can also lead to annoyed users. The best defense against brute forcing a password hash is a long password using a diverse character set. Even if you do not require strong passwords, you should at least not restrict the length or the character set. Hashing the password as soon as received (for example as part of input validation) will help mitigate any risks due to odd characters a user may use. Passwords should never be echoed back to the user.

As a user, how do you know if your password is hashed? There isn't a bullet proof way to figure it out. But there are some indications that it is not hashed:

- the length or characters you are allowed to use is limited. If the password is hashed, it doesn't matter how long the original password was.
- As part of the password recovery, the site is returning your old password (very bad... and well, proof that the password was not hashed)

For the paranoid, you may want to do the hashing on the client side (javascript) . This way, the server never receives the plain text password. We do this here for the ISC website on our login form [2]. This implementation falls back to server side hashed passwords if the user does not enable javascript.

If you want to read more about this, Heise.de recently published a nice article about password hashing [3]. This is of course also a topic we cover in our secure coding / defending web application classes like DEV522.

NIST is currently finalizing a competition to come up with a new hashing standard, that will be known as SHA-3. The "winner" should be announced in 2012. Until then, SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512 are suggested. I usually recommend to stick to these standards. As programming languages change, it is likely that they will keep supporting these standard algorithms. Other less popular algorithms may on the other hand be dropped.

[1] http://csrc.nist.gov/groups/ST/hash/index.html
[2] https://isc.sans.org/login.html
[3] http://www.h-online.com/security/features/Storing-passwords-in-uncrackable-form-1255576.html

 

------
Johannes B. Ullrich, Ph.D.
SANS Technology Institute
Twitter

Keywords: hash passwords
9 comment(s)

Comments

I would like to point out that using a hashing algorithm multiple times does not actually strengthen the hash but weakens (which often seems counter-intuitive). The problem is that the probability of collisions compounds at each step making collisions far more likely after 100 hashings than just one. The easiest explanation is take two different messages and if they collide on the same step at any point then every follow on step will also be the same because they have in essence become the same message for the next iteration. The best way to guarantee more security is to use a hash with a longer digest.
Hashing is also very useful when passing messages through URLs, the recent banking exploit whereby attackers were able to traverse accounts simply by changing the plaintext account numbers in the URL string is a good example.

Security through obscurity/obfuscation and all that. :)
I'm not sure why recommending using SHA-xxx a few times; it's not really a defense against brute forcing.

I would recommend using a PBKDF2 key derivation function, with AES256 as the cryptographic function, and a bare minimum of 1500 rounds, on all stored password hashes.

Password Based Key Derivation Function 2 (PBKDF2) is an excellent suggestion. This algorithm is typically used to "mix up" a user provided password to create better encryption keys. The function has a parameter specifying how many iterations should be performed. PBKDF2 is also known as PKCS #5 and implmeent in commonly used crypto libraries
Honestly, it does not make a lot of sense to simply hash a password on the client side and then transmit it. The point of hashing is to prevent an attacker from being able to replay stolen authentication data, particularly when stolen from a database of passwords. If you hash on the client side, then an attacker doesn't need to crack the hash to log in - he just needs a custom client-side script that replays a stolen hash.

If, on the other hand, the client hashes the password and then uses it in a one-time password protocol (typically a challenge-response), then you have some good security cooking.

Rick at cryptosmith.com.
In this case, a "digest authentication like" algorithm is used. The server is sending a unique random nonce, and the client includes it in the hash. That way, the hash is different each time. The main point is to protect the password itself as we all know it is likely being reused on other sites.
Yeah. "challenge response" equals "digest authentication like" - but I've also seen implementations where they literally did nothing more than hash the password and transmit it as the credential.
John the Ripper can be run with "john -test" to compare the relative ease of cracking common types of password hash on your particular hardware. On my system, OpenBSD's Blowfish password hashes (with 32 rounds) are the most computationally expensive, owing to the complex key schedule of that cipher. I think NetBSD now uses it with 128 rounds for its password hashes.

"john"'s comment above about the increasing risk of collision after many rounds is interesting, but still, surely a collision must occur at a specific round of hashing in order for the end result to be the same as for some other value.
Ironically, the digest authentication protects against interception attacks - the same as SSL - but the shared secret is vulnerable to database theft. If the attacker retrieves the shared secret from the database, then the attacker can perform digest authentication. If the attacker retrieves a hashed password then the attacker still needs to guess the unhashed password to perform a successful masquerade.

Diary Archives