Monthly Archives: June 2012

Securing Passwords, One Way Hashes, PBKDF2, PHP and You

Plain text passwords and simple one way hashes are not enough to protect your users. You need salt, pepper, and peanut butter. Am I crazy, you ask? Maybe, but read on.

It happens to huge companies (LinkedIn, Last.fm, eHarmony), the little guys, and everyone in between. Databases get breached and passwords get cracked. It always surprises me when I hear how many thousands of users had the password “password”, or that the target’s password hashes were cracked in a matter of hours or days, or worse, that the passwords were stored in plain text. At this point, it is easy to make passwords pretty secure with just basic knowledge of cryptography and hashing. As a matter of fact, as a competent developer, you don’t need to know much at all about the hows and whys of crypto to secure your users’ data.

First, do not think you are safe because you run your passwords through MD5 or SHA-256. MD5’s collision resistance has been broken, and a single fast pass of either function gives an attacker so many guesses per second that it is barely better than storing the passwords in plain text. Cryptographic hash functions are NOT password hash functions!

One Way Hashing

A one way hash performs a bunch of mathematical operations that transform input into a (mostly) unique output, called a digest. Because these operations are one way, you cannot ‘decrypt’ the output: you can’t turn a digest back into the original input. With a good cryptographic hash function it should be infeasible to find two different inputs that produce the same digest. Additionally, when the input is changed, even slightly, the resulting digest should be very different.
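To make that last property concrete, here is a small sketch using PHP’s built-in hash() function. It assumes PHP 7 or later for the type hints, and the helper name digest_of is hypothetical. The two inputs differ by a single character.

```php
<?php
// Hypothetical helper: returns the hex SHA-256 digest of a string.
function digest_of(string $input): string
{
    return hash('sha256', $input);
}

$a = digest_of('password');
$b = digest_of('passwore'); // one character changed

// Count how many of the hex characters differ between the two digests.
$differing = 0;
for ($i = 0; $i < strlen($a); $i++) {
    if ($a[$i] !== $b[$i]) {
        $differing++;
    }
}

echo $a, PHP_EOL, $b, PHP_EOL;
echo "Hex characters that differ: $differing of ", strlen($a), PHP_EOL;
```

Running it should show two digests with no visible relationship to each other, even though the inputs are almost identical.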

A typical use case would be when a user signs up for a website and creates a password. The conscientious developer takes the plain text password, runs it through a hashing function (let’s say, MD5) and stores the result in the database. The next time the user logs in, they enter their password, and the authentication mechanism runs it through MD5 and compares the result against what is stored in the database.

That sounds pretty safe, right? Wrong. It’s akin to locking the door and leaving the window open. If the database were stolen, hashing might make it harder to infer anything about the passwords just by looking at the data, but it doesn’t really make it any harder to guess or “crack” the password.

Password Hash Functions

… are not the same as cryptographic hash functions

Just using a cryptographic function on a plain text password doesn’t defend it very well. There are a number of major problems and threats that are not being addressed. The two biggest are speed and recognizability of hashes.

Hashing Speed

Cryptographic hash functions are used for lots of things, most of which have to do with fingerprinting and verifying data. They are designed to be very fast so that the processes relying on them aren’t slowed down. This presents a big problem for password hashing: speed. The faster a function creates a digest, the more frequently an attacker can guess the password and compare the output. MD5, for instance, is so fast that a single consumer GPU can make billions of guesses per second. Think about it for a second: do you need that speed to allow your users to log in? When it takes 15 seconds to enter your username and password, a few seconds to log in, and a few seconds of perceived page load time, will they notice the difference between 0.000001 seconds and 1 second for the authentication mechanism? The answer is no, not to enough of a degree to degrade their experience. For password hashing, slower is good.
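To get a feel for the numbers on your own machine, a rough benchmark along these lines can help. It is only a sketch: the helper name hashes_per_second and the default round count are assumptions, and it assumes PHP 7 or later.

```php
<?php
// Rough benchmark: how many digests per second can this process compute?
// Hypothetical helper; results vary widely by hardware and PHP build.
function hashes_per_second(string $algo, int $rounds = 200000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        hash($algo, 'guess' . $i);
    }
    $elapsed = microtime(true) - $start;
    // Guard against a zero reading on very coarse clocks.
    return $rounds / max($elapsed, 1e-9);
}

foreach (['md5', 'sha256'] as $algo) {
    printf("%-7s %12.0f hashes/sec\n", $algo, hashes_per_second($algo));
}
```

A single PHP process will report far lower figures than dedicated cracking hardware. The point is only that one pass of a general purpose hash costs next to nothing.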

Recognizability of Hashes

What happens when 10,000 people all use “password” as their password? Their hashes are all the same! If just one of those accounts gets cracked, every other account with the same hash is cracked along with it. If an attacker has a huge, precomputed list of hashes (a lookup table, or its more compact relative, the rainbow table), they can scan your database looking for any hashes that match. They could have a huge percentage of your system’s passwords before ever once making a guess.
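A toy version of that attack can be sketched in a few lines. The user names, passwords, and the three-entry table are all made up for illustration. A real table would hold millions of entries.

```php
<?php
// Three hypothetical accounts stored as unsalted SHA-256 digests.
$users = [
    'alice' => hash('sha256', 'password'),
    'bob'   => hash('sha256', 'hunter2'),
    'carol' => hash('sha256', 'password'),
];

// A tiny precomputed table: digest => plain text.
$table = [];
foreach (['123456', 'password', 'letmein'] as $guess) {
    $table[hash('sha256', $guess)] = $guess;
}

// One pass over the stolen data reveals every account whose digest is in the table.
$cracked = [];
foreach ($users as $name => $digest) {
    if (isset($table[$digest])) {
        $cracked[$name] = $table[$digest];
    }
}

print_r($cracked); // alice and carol both map to "password"
```

Note that alice and carol fall together: identical passwords produce identical digests, so the lookup never has to hash anything at crack time.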

Fortunately though, there are a few relatively easy things you can do to make their life harder. You don’t need to do anything heroic and the code isn’t even that tricky. Heck, most of it already exists and is free to use.

Salting

Talk about low hanging fruit. All you have to do is add some random characters to their password (and keep track of them). A salt is a random sequence of data which is added to the hash function or the password string itself. Say you generated a salt “12345” and had a password “password”: you could put them together as “password12345” and run that through your hash function to produce a digest that wouldn’t be so easily given up. Every password should have its own salt, and the salt should be at least 32 characters long so that precomputing digests for every possible salt is impractical.

This is a basic salt generation algorithm. Do NOT use this function for generating salts where you are trying to protect details like credit card numbers, or even email addresses for that matter. It’s a pretty poor implementation, really.
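For comparison, a sturdier sketch can lean on the operating system’s random number generator. This assumes PHP 7 or later, where random_bytes() is available. The function name generate_salt and the 16-byte default are hypothetical choices.

```php
<?php
// Hypothetical helper: returns a hex-encoded random salt.
// Assumes PHP 7+, where random_bytes() draws from the OS CSPRNG.
function generate_salt(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes)); // 16 bytes -> 32 hex characters
}

$salt = generate_salt();
echo $salt, PHP_EOL;
```

Hex encoding keeps the salt safe to store in an ordinary text column.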

When we create a user password we’ll generate a salt, add it to the password string, hash the password to get a digest, then store the salt and digest in the database. To log the user in subsequently we could use functions like the following:
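The helpers here are a minimal sketch of that flow rather than a definitive implementation. They assume PHP 7 or later, and the names create_password_record and check_password, along with the array layout, are hypothetical.

```php
<?php
// Hypothetical helpers illustrating salt-then-hash.
function create_password_record(string $password): array
{
    $salt = bin2hex(random_bytes(16));             // fresh salt per password
    $digest = hash('sha256', $password . $salt);
    return ['salt' => $salt, 'digest' => $digest]; // store both in the database
}

function check_password(string $attempt, array $record): bool
{
    // Re-create the digest with the stored salt.
    $candidate = hash('sha256', $attempt . $record['salt']);
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($record['digest'], $candidate);
}

$record = create_password_record('password');
var_dump(check_password('password', $record)); // bool(true)
var_dump(check_password('Password', $record)); // bool(false)
```

Because the salt is random, two users with the same password end up with different digests.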

Password Stretching

Stretching is creating a digest of a digest (of a digest of a digest … of a digest … you get it). If you create a digest of a password, then create a digest of that X number of times, an attacker can no longer simply create a single digest (from a rainbow table or otherwise) and compare it directly to the digest that is stored in the database. To compare passwords you’ll have to run the exact same number of hashing iterations. This is useful on multiple fronts: it slows things down and (in conjunction with salted passwords) your hashes no longer look the same as everyone else’s. It stands to reason that if hashing a password once takes X amount of time, hashing it twice will take approximately 2X. You’ve just cut in half the number of guesses an attacker can make in the same amount of time. Congratulations! A good system takes so long to process a single digest that guessing a password using brute force will take more than a lifetime.

Let’s modify our password hashing function:
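What follows is a sketch of one way to do it, not a definitive implementation. It assumes PHP 7 or later. The function name stretch_password and the default of 10,000 iterations are illustrative assumptions.

```php
<?php
// Hypothetical helper: hash the password, then hash the result repeatedly,
// mixing the salt back in on every round ($iterations hash calls in total).
function stretch_password(string $password, string $salt, int $iterations = 10000): string
{
    $digest = hash('sha256', $password . $salt);
    for ($i = 1; $i < $iterations; $i++) {
        $digest = hash('sha256', $digest . $salt);
    }
    return $digest;
}

$salt = bin2hex(random_bytes(16));
$stored = stretch_password('password', $salt);

// Verification must use the same salt and the same iteration count.
var_dump(hash_equals($stored, stretch_password('password', $salt)));       // bool(true)
var_dump(hash_equals($stored, stretch_password('password', $salt, 9999))); // bool(false)
```

Storing the iteration count next to the salt and digest makes it possible to raise the count later without locking out existing users.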

Notice that I have re-salted every hash to add extra randomization to the digest… just another wrinkle to throw at an attacker.

Pepper

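Additionally, you can have an application wide salt, called a pepper. Think of it as a salt for the salt, except that this one is shared by every password and unique only to the application, server, environment, or database. It is usually kept somewhere other than the table that holds the salts and digests, so that a stolen table alone does not reveal it.
You could use it like this: hash('sha256', $pepper . $password . $salt);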
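A slightly fuller sketch might load the pepper from configuration. The environment variable name APP_PEPPER and both function names are hypothetical, and the putenv() call is there only so the demo runs on its own. PHP 7 or later is assumed.

```php
<?php
// Hypothetical: the pepper lives in configuration (here an environment
// variable), not in the database next to the salts and digests.
function load_pepper(): string
{
    $pepper = getenv('APP_PEPPER'); // hypothetical variable name
    if ($pepper === false || $pepper === '') {
        throw new RuntimeException('APP_PEPPER is not configured');
    }
    return $pepper;
}

function peppered_digest(string $password, string $salt, string $pepper): string
{
    // Same construction as the one-liner above.
    return hash('sha256', $pepper . $password . $salt);
}

// Demo only: in practice the variable is set outside the code.
putenv('APP_PEPPER=example-pepper-for-demo-only');

$digest = peppered_digest('password', 'per-user-salt', load_pepper());
echo $digest, PHP_EOL;
```

Failing loudly when the pepper is missing avoids silently hashing with an empty one.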

Adaptive Key Derivation

Adaptive key derivation functions generate digests from passwords while applying salts and stretching. They implement many more wrinkles and, most importantly, are tested against attack vectors you may never think of. Rolling your own cryptographic functions introduces a lot of unnecessary exposure and takes more time than using generally accepted libraries, implementations and functions. I’m going to focus on the one I know best, PBKDF2. There are others, such as bcrypt and scrypt.

Peanut Butter Keeps Dogs Friendly Too

PBKDF2 (Password-Based Key Derivation Function 2) is probably the most widely used derivation function. It is a container for a hash function (typically applied through HMAC), e.g. SHA-1 or RIPEMD. For each input it applies a salt and iterates the hash many times in such a way that not much entropy (length and randomness) is lost. Primarily, it is done in such a way that it is SLOW to generate a single digest. The US government recommends it, through NIST Special Publication 800-132, for deriving encryption keys from passwords.
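PHP ships a built-in hash_pbkdf2() function (since PHP 5.5), so a minimal sketch needs very little code. The helper names, the 100,000 iteration default, and the record layout are assumptions for illustration, and random_bytes() assumes PHP 7 or later.

```php
<?php
// Sketch using PHP's built-in hash_pbkdf2().
// The iteration count and output length are illustrative assumptions.
function pbkdf2_record(string $password, int $iterations = 100000): array
{
    $salt = random_bytes(16); // raw bytes from the OS CSPRNG
    // With the default (non-binary) output, 64 means 64 hex characters.
    $digest = hash_pbkdf2('sha256', $password, $salt, $iterations, 64);
    return [
        'salt'       => bin2hex($salt),
        'iterations' => $iterations, // stored so the count can be raised later
        'digest'     => $digest,
    ];
}

function pbkdf2_verify(string $attempt, array $record): bool
{
    $candidate = hash_pbkdf2(
        'sha256',
        $attempt,
        hex2bin($record['salt']),
        $record['iterations'],
        64
    );
    // Constant-time comparison.
    return hash_equals($record['digest'], $candidate);
}

$record = pbkdf2_record('password');
var_dump(pbkdf2_verify('password', $record)); // bool(true)
var_dump(pbkdf2_verify('letmein', $record));  // bool(false)
```

The right iteration count depends on your hardware. A common approach is to raise it until one verification takes as long as you are willing to make a user wait.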

Adaptive keys are a great first step, but remember, this is one tiny piece of securing user data.

Below is a very basic class I created that can be used for generating salts and digests through a variety of ways. You can download it here. This file will be updated regularly, so stay in touch!