Securing Passwords, One Way Hashes, PBKDF2, PHP and You
Plain text passwords and simple one way hashes are not enough to protect your users. You need salt, pepper, and peanut butter. Am I crazy, you ask? Maybe, but read on.
It happens to huge companies (LinkedIn, Last.fm, eHarmony), the little guys, and everyone in between. Databases get breached and passwords get cracked. It always surprises me when I hear how many thousands of users had the password “password”, or that the target’s password hashes were cracked in a matter of hours or days, or worse, that the passwords were stored in plain text. At this point it is easy to make passwords pretty secure with just a basic knowledge of cryptography and hashing. As a matter of fact, as a competent developer you don’t need to know much at all about the hows and whys of crypto to secure your users’ data.
First, do not think you are safe because you run your passwords through MD5 or SHA-256. MD5’s collision resistance has been broken, and a single unsalted pass of either function is so fast to compute that, against a determined attacker, it is barely better than storing the passwords in plain text. Cryptographic hash functions are NOT password hash functions!
One Way Hashing
A one way hash performs a bunch of mathematical operations that transform input into a (mostly) unique output, called a digest. Because these operations are one way, you cannot ‘decrypt’ the output; you can’t turn a digest back into the original input. With a good cryptographic hash function it should be practically impossible to find two different inputs that produce the same digest. Additionally, when the input changes even slightly, the resulting digest should be very different.
A typical use case would be when a user signs up for a website and creates a password. The conscientious developer takes the plain text password, runs it through a hashing function (let’s say, MD5) and stores the result in the database. The next time the user logs in, they enter their password, and the authentication mechanism runs it through MD5 and compares the result against what is stored in the database.
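Here is a minimal sketch of those two properties, using PHP's built-in hash() function with SHA-256. The two sample inputs are just ones I picked for illustration; they differ by a single character.

```php
<?php
// Two inputs that differ by a single character.
$first  = hash('sha256', 'password');
$second = hash('sha256', 'passwore');

// The same input always yields the same digest.
var_dump($first === hash('sha256', 'password')); // bool(true)

// Count how many of the 64 hex characters differ between the two digests.
$differing = 0;
for ($i = 0; $i < strlen($first); $i++) {
    if ($first[$i] !== $second[$i]) {
        $differing++;
    }
}

echo $first, "\n", $second, "\n";
echo "Hex characters that differ: $differing of ", strlen($first), "\n";
```

Run it and you should see two digests that look unrelated even though the inputs are nearly identical.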
That sounds pretty safe, right? Wrong. It’s akin to locking the door and leaving the window open. If the database was stolen it might make it harder to infer anything about the passwords just by looking at the data, but it doesn’t really make it any harder to guess or “crack” the password.
Password Hash Functions
… are not the same as cryptographic hash functions
Just running a cryptographic hash function on a plain text password doesn’t defend it very well. There are a number of major problems and threats that are not being addressed. The two biggest are speed and recognizability of hashes.
Hashing Speed
Cryptographic hash functions are used for lots of things, most of which have to do with fingerprinting and verifying data. They are designed to be very fast so that whatever process uses them isn’t slowed down. This presents a big problem for password hashing: speed. The faster a function creates a digest, the more frequently an attacker can guess a password and compare the output. MD5, for instance, is so fast that on commodity graphics hardware you could guess over 5 billion times per second. Think about it for a second: do you need that speed to let your users log in? When it takes 15 seconds to enter a username and password, a few seconds to log in, and a few seconds of perceived page load time, will they notice the difference between 0.000001 seconds and 1 second for the authentication mechanism? The answer is no, not to enough of a degree to degrade their experience. For password hashing, slower is good.
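If you want a feel for the speed on your own machine, a rough sketch like the one below will do. The helper name measureHashRate and the loop count are my own choices for illustration, and a single PHP process on one CPU core will be far slower than dedicated cracking hardware, so treat the numbers as a lower bound.

```php
<?php
// Hypothetical helper: time how long it takes to compute $count digests
// of short inputs and return the approximate hashes per second.
function measureHashRate(string $algorithm, int $count = 200000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        hash($algorithm, 'guess' . $i);
    }
    $elapsed = microtime(true) - $start;

    // Guard against a zero reading on very coarse clocks.
    return $count / max($elapsed, 1e-9);
}

foreach (['md5', 'sha256'] as $algorithm) {
    printf(
        "%-7s ~%s hashes/second on one CPU core\n",
        $algorithm,
        number_format(measureHashRate($algorithm))
    );
}
```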
Recognizability of Hashes
What happens when 10,000 people all use “password” as their password? Their hashes are all the same! If you just get one account cracked, you automatically crack everyone else with the same hash. If an attacker has a huge, precomputed list of hashes (a lookup table, or its space-saving cousin the rainbow table), they can scan your database looking for any hashes that match. They’ve cracked accounts without computing a single hash of their own! They could have a huge percentage of your system’s passwords before ever once making a guess.
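To make that concrete, here is a toy version of the attack. The user names, the tiny password list, and the "stolen" rows are all made up for the example; a real table would hold many millions of entries.

```php
<?php
// A tiny precomputed table: digest => plaintext, built once by the attacker.
$commonPasswords = ['password', '123456', 'letmein', 'qwerty'];
$lookup = [];
foreach ($commonPasswords as $candidate) {
    $lookup[md5($candidate)] = $candidate;
}

// Hypothetical stolen rows: username => unsalted MD5 digest.
$stolen = [
    'alice' => md5('password'),
    'bob'   => md5('password'),
    'carol' => md5('correct horse battery staple'),
];

// Cracking is now just an array lookup per row.
$cracked = [];
foreach ($stolen as $user => $digest) {
    if (isset($lookup[$digest])) {
        $cracked[$user] = $lookup[$digest];
    }
}

print_r($cracked); // alice and bob each fall to a single table lookup
```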
Fortunately though, there are a few relatively easy things you can do to make their life harder. You don’t need to do anything heroic and the code isn’t even that tricky. Heck, most of it already exists and is free to use.
Salting
Talk about low hanging fruit. All you have to do is add some random characters to their password (and keep track of them). A salt is a random sequence of data which is added to the hash function or the password string itself. Say you generated a salt “12345” and had a password “password”; you could put them together as “password12345” and run that through your hash function to produce a digest that wouldn’t be so easily given up. Every password should have its own salt, and the salt should be at minimum 32 characters long so that precomputing digests for every possible salt is impractical.
This is a basic salt generation algorithm. Do NOT use this function for generating salts where you are trying to protect details like credit card numbers, or even email addresses for that matter. It’s a pretty poor implementation, really.
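function CreateSalt($length = 128, $validChars = null)
{
    $salt = '';
    // Seed the generator from the clock (predictable, which is why this is a poor implementation)
    list($usec, $sec) = explode(' ', microtime());
    $seed = ((float)$sec + ((float)$usec * 100000)) * ((float)microtime() * 1000000);
    mt_srand((int)$seed);
    // Use the caller's characters if provided, otherwise the default set
    if(is_array($validChars)) $inputs = $validChars;
    else $inputs = array_merge(range('z','a'), range(0,9), range('A','Z'), array('@','!','#','%','&','*','+','_','-','~','?','.'));
    $inputsLength = count($inputs) - 1;
    // Pick $length characters at random
    for($i = 0; $i < $length; $i++)
        $salt .= $inputs[mt_rand(0, $inputsLength)];
    return $salt;
}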
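For anything that matters, I would swap the clock-seeded generator for the operating system's random source. The sketch below uses PHP's random_bytes() (available since PHP 7), which is a different technique from the function above; the helper name createRandomSalt and the 16-byte default are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns a hex-encoded salt built from $bytes random bytes.
function createRandomSalt(int $bytes = 16): string
{
    if ($bytes < 1) {
        throw new InvalidArgumentException('Salt length must be at least 1 byte.');
    }

    // random_bytes() draws from the operating system's secure random source.
    return bin2hex(random_bytes($bytes));
}

$salt = createRandomSalt(16);
echo $salt, "\n"; // 32 hex characters, different on every call
```

Hex encoding doubles the length, so 16 random bytes give you a 32 character salt that is safe to store in a text column.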
When we create a user password we’ll generate a salt, add it to the password string, hash the password to get a digest, then store the salt and digest in the database. To log the user in subsequently we could use functions like the following:
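function HashPassword($password, $salt)
{
    // Hash the password together with its salt
    return hash('sha256', $password . $salt);
}
function IsValidPassword($password, $salt, $digest)
{
    // hash_equals() compares in constant time and avoids the pitfalls of loose == on hex strings
    return hash_equals($digest, HashPassword($password, $salt));
}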
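Putting the sign-up and log-in steps together might look something like this. It is only a sketch: the in-memory $users array stands in for your database table, and registerUser and loginUser are names I made up for the example.

```php
<?php
// Hypothetical in-memory "users table": username => ['salt' => ..., 'digest' => ...].
$users = [];

function registerUser(array &$users, string $username, string $password): void
{
    // Every user gets their own random salt, stored next to the digest.
    $salt = bin2hex(random_bytes(16));
    $users[$username] = [
        'salt'   => $salt,
        'digest' => hash('sha256', $password . $salt),
    ];
}

function loginUser(array $users, string $username, string $password): bool
{
    if (!isset($users[$username])) {
        return false;
    }
    $row = $users[$username];

    // Re-create the digest with the stored salt and compare in constant time.
    $candidate = hash('sha256', $password . $row['salt']);
    return hash_equals($row['digest'], $candidate);
}

registerUser($users, 'alice', 'password');
registerUser($users, 'bob', 'password');

var_dump(loginUser($users, 'alice', 'password')); // bool(true)
var_dump(loginUser($users, 'alice', 'passw0rd')); // bool(false)

// Same password, different salts, so the stored digests no longer match each other.
var_dump($users['alice']['digest'] === $users['bob']['digest']); // bool(false)
```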
Password Stretching
Stretching is creating a digest of a digest (of a digest of a digest … of a digest … you get it.) If you create a digest of a password, then create a digest of that X number of times, an attacker can no longer simply create a digest (from a rainbow table or otherwise) and compare it directly to the digest that is stored in the database. To compare passwords you’ll have to run the exact same number of hashing iterations. This is useful on multiple fronts: it slows things down and (in conjunction with salted passwords) your hashes no longer look the same as everyone else’s. It stands to reason that if hashing a password once takes X amount of time, hashing it twice will take approximately 2X. You’ve just cut in half the number of times an attacker can guess your passwords. Congratulations! A good system takes so long to process a single digest that guessing a password using brute force will take more than a lifetime.
Let’s modify our password hashing function:
function HashPassword($password, $salt, $iterations = 1024)
{
    // Create the first digest
    $output = hash('sha256', $password . $salt);
    for($i = 0; $i < $iterations; $i++)
    {
        // Re-salt every hash for extra randomization
        $output = hash('sha256', $output . $salt);
    }
    // Return the final, stretched digest
    return $output;
}
Notice that I have re-salted every hash to add extra randomization to the digest… just another wrinkle to throw at an attacker.
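How many iterations is enough depends on your hardware, so it is worth measuring. The sketch below repeats the same stretching loop under the name stretchDigest and times it at a few iteration counts; timeStretch, the sample salt, and the counts themselves are assumptions I picked for illustration.

```php
<?php
// Same stretching idea as in the text: hash once, then re-hash $iterations times.
function stretchDigest(string $password, string $salt, int $iterations): string
{
    $output = hash('sha256', $password . $salt);
    for ($i = 0; $i < $iterations; $i++) {
        $output = hash('sha256', $output . $salt);
    }
    return $output;
}

// Hypothetical helper: average seconds per call over a few runs.
function timeStretch(int $iterations, int $runs = 3): float
{
    $start = microtime(true);
    for ($r = 0; $r < $runs; $r++) {
        stretchDigest('password', 'example-salt', $iterations);
    }
    return (microtime(true) - $start) / $runs;
}

foreach ([1024, 10240, 102400] as $iterations) {
    printf("%7d iterations: %.4f s per password\n", $iterations, timeStretch($iterations));
}
```

The time per password should grow roughly in step with the iteration count, which is exactly the cost you are passing on to an attacker.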
Pepper
Additionally, you can have an application wide salt, called a pepper. Think of it as a salt for the salt, except this salt is unique only to the application, server, environment, or database.
You could use it like this: hash('sha256', $pepper . $password . $salt);
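The pepper only helps if it lives somewhere other than the database that holds the salts and digests. Here is one possible sketch that reads it from an environment variable. The variable name APP_PEPPER and both helper functions are hypothetical, and the putenv() call is there only so the example runs on its own.

```php
<?php
// Hypothetical: the pepper lives in an environment variable named APP_PEPPER,
// not in the database next to the salts and digests.
function loadPepper(): string
{
    $pepper = getenv('APP_PEPPER');
    if ($pepper === false || $pepper === '') {
        throw new RuntimeException('APP_PEPPER is not set.');
    }
    return $pepper;
}

function hashWithPepper(string $password, string $salt, string $pepper): string
{
    // Same ordering as in the text: pepper, then password, then salt.
    return hash('sha256', $pepper . $password . $salt);
}

// Demo only: set the variable in-process so the example runs standalone.
putenv('APP_PEPPER=example-pepper-change-me');

echo hashWithPepper('password', 'per-user-salt', loadPepper()), "\n";
```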
Adaptive Key Derivation
Adaptive key derivation functions generate digests from passwords while applying salts and stretching. They implement many more wrinkles and are tested against attack vectors you may never think of, which is the important part. They are tested against attack vectors. Rolling your own cryptographic functions introduces a lot of unnecessary exposure and takes more time than using generally accepted libraries, implementations and functions. I’m going to focus on the one I know best, PBKDF2. There are others, such as bcrypt and scrypt.
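If you would rather not manage salts and iteration counts yourself, PHP (5.5 and later) ships with password_hash() and password_verify(), which use bcrypt here rather than PBKDF2. This is a minimal sketch of that alternative, not the approach the rest of this post builds on, and the cost values are just examples.

```php
<?php
// password_hash() generates its own salt and embeds it, along with the cost, in the result.
$stored = password_hash('password', PASSWORD_BCRYPT, ['cost' => 10]);
echo $stored, "\n";

var_dump(password_verify('password', $stored)); // bool(true)
var_dump(password_verify('passw0rd', $stored)); // bool(false)

// If you later raise the cost, this tells you which stored hashes to re-hash at next login.
var_dump(password_needs_rehash($stored, PASSWORD_BCRYPT, ['cost' => 12])); // bool(true)
```

Because the salt and cost travel inside the stored string, you only need one database column for it.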
Peanut Butter Keeps Dogs Friendly Too
PBKDF2 (Password-Based Key Derivation Function 2) is probably the most widely used derivation function. It is a wrapper around a keyed hash function, typically HMAC over an underlying hash such as SHA-1 or RIPEMD. For each input it applies a salt and iterates the hash many times in such a way that not much entropy (length and randomness) is lost. Primarily, it is done in such a way that it is SLOW to generate a single digest. It is specified in RSA’s PKCS #5 (RFC 2898) and recommended by NIST for deriving keys from passwords.
Adaptive key derivation is a great first step, but remember, this is one tiny piece of securing user data.
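PHP 5.5 and later also include a built-in hash_pbkdf2() function, so you can try PBKDF2 without any extra code. The sketch below first reproduces a published test vector from RFC 6070 and then makes a more realistic call; the 100,000 iteration count is an assumption for illustration, not a recommendation.

```php
<?php
// RFC 6070 test vector: PBKDF2-HMAC-SHA1, password "password", salt "salt",
// 1 iteration, 20 bytes out. The length counts hex characters when the output is not binary.
$vector = hash_pbkdf2('sha1', 'password', 'salt', 1, 40);
echo $vector, "\n"; // 0c60c80f961f0e71f3a9b524af6012062fe037a6

// A more realistic call: random salt, SHA-256, many iterations, 32 raw bytes out.
$salt = random_bytes(16);
$iterations = 100000; // assumption for illustration; tune to your hardware
$key = hash_pbkdf2('sha256', 'password', $salt, $iterations, 32, true);

// The raw output is binary, so encode it before storing it as text.
echo base64_encode($key), "\n";
```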
Below is a very basic class I created that can be used for generating salts and digests through a variety of ways. You can download it here. This file will be updated regularly, so stay in touch!
<?php
/**
 * PasswordUtil.php
 * @package AC
 */

/**
 * AC_PasswordUtil
 *
 * Password hashing and generation utilities
 * @package AC
 * @category Security
 * @version $Id:$
 * @author Mustafa Ashurex <[email protected]>
 */
class AC_PasswordUtil
{
    /**
     * @var int Default hash key length
     */
    const DEFAULT_KEY_LENGTH = 256;

    /**
     * @var int Default number of times to iterate a hash
     */
    const DEFAULT_ITERATIONS = 1024;

    /**
     * @var string Default hash algorithm to use for PBKDF2
     */
    const DEFAULT_PBKDF2_ALGO = 'SHA256';

    /**
     * @var string PBKDF2 algorithm name
     */
    const ALGO_PBKDF2 = 'PBKDF2';

    /**
     * @var string Whirlpool algorithm name
     */
    const ALGO_WHIRLPOOL = 'WHIRLPOOL';

    /**
     * Return the default characters to use for generating salts.
     * @static
     * @return string[] Default characters to use for generating salts.
     */
    public static function DefaultSaltChars()
    {
        return array_merge(range('z','a'), range(0,9), range('A','Z'),
            array('@','!','#','%','&','*','+','_','-','~','?','.'));
    }

    /**
     * Returns the supported text hashing algorithm names
     * @static
     * @return string[] Supported text hashing algorithm names
     */
    public static function PasswordHashAlgorithms()
    {
        return array(
            self::ALGO_PBKDF2,
            self::ALGO_WHIRLPOOL,
        );
    }

    /**
     * Hashes a plaintext password using the parameters defined. If provided, $pepper
     * will be prepended to $password and $salt will be used in every hash
     * iteration in various ways (depending on the hash method used).
     * @static
     * @param string $password Plaintext password to hash.
     * @param string $salt A random sequence of bytes to add to the hash function.
     * @param string $pepper Another random sequence of bytes to add an extra secret to the hash generation.
     * @param string $algorithm Password hashing algorithm to use.
     * @param int $keyLength The number of bytes to return.
     * @param int $iterations The number of times to hash the text before returning the value.
     * @return string Returns $keyLength bytes of hashed $password.
     */
    public static function HashPassword($password, $salt, $pepper = null, $algorithm = self::ALGO_PBKDF2,
        $keyLength = self::DEFAULT_KEY_LENGTH, $iterations = self::DEFAULT_ITERATIONS)
    {
        // Cast so a null pepper does not trip up trim() on newer PHP versions
        if(strlen(trim((string)$pepper)) > 0) $password = $pepper . $password;

        switch($algorithm)
        {
            case self::ALGO_WHIRLPOOL:
                return AC_PasswordUtil::WhirlpoolHash($password, $salt, $keyLength, $iterations);
            case self::ALGO_PBKDF2:
                // Base64 encode the output of PBKDF2 because it's binary
                return base64_encode(AC_PasswordUtil::PBKDF2($password, $salt, $keyLength, $iterations));
            default:
                throw new Exception('Unknown hash algorithm (' . $algorithm . ')!');
        }
    }

    /**
     * Create a random salt string
     * @static
     * @param int $length Number of bytes to return.
     * @param string[] $validChars Array of characters to use for the salt, overrides the default.
     * @return string Randomized salt string of $length bytes.
     */
    public static function CreateSalt($length = 128, $validChars = null)
    {
        $salt = '';
        list($usec, $sec) = explode(' ', microtime());
        $seed = ((float)$sec + ((float)$usec * 100000)) * ((float)microtime() * 1000000);
        mt_srand((int)$seed);

        if(is_array($validChars)) $inputs = $validChars;
        else $inputs = self::DefaultSaltChars();

        $inputsLength = count($inputs) - 1;
        for($i = 0; $i < $length; $i++)
            $salt .= $inputs[mt_rand(0, $inputsLength)];
        return $salt;
    }

    /**
     * Hashes the provided plaintext password using Whirlpool hash and provided parameters.
     * If the Whirlpool algorithm is not present on the system, it will fall back to MD5 if allowed which
     * is not nearly as effective. If not allowed, an exception will be thrown.
     * @static
     * @param string $password
     * @param string $salt
     * @param int $keyLength
     * @param int $iterations
     * @param bool $fallBack
     */
    public static function WhirlpoolHash($password, $salt, $keyLength = self::DEFAULT_KEY_LENGTH,
        $iterations = self::DEFAULT_ITERATIONS, $fallBack = false)
    {
        $hashMethod = 'whirlpool';
        if($iterations <= 0)
            throw new Exception('Iterations must be greater than 0.');
        elseif($keyLength <= 0)
            throw new Exception('Key length must be greater than 0.');
        elseif(!in_array($hashMethod, hash_algos(), true))
            $hashMethod = 'md5';

        if((!$fallBack) && ($hashMethod == 'md5'))
            throw new Exception('Whirlpool hash algorithm not found! Either allow for fallback to MD5 or install Whirlpool.');

        // First thing, stretch the password
        // md5 is used because it is the only hashing function that can be guaranteed to be on a majority of systems
        $output = md5($password . $salt);

        // Hash the output repeatedly
        for($i = 0; $i < $iterations; $i++)
            $output = hash($hashMethod, $output . $salt);

        // If the requested key length is too long, shrink the requested key length
        if(strlen($output) < $keyLength) $keyLength = strlen($output);
        return substr($output, 0, $keyLength);
    }

    /**
     * Password-Based Key Derivation Function using PBKDF2
     * as described by RSA's PKCS #5: https://www.ietf.org/rfc/rfc2898.txt
     * Note: You will want to run base64_encode on the output of this method to use it
     * as text as the output is binary.
     * @static
     * @param string $password The plaintext password to hash.
     * @param string $salt A salt that is unique to the password.
     * @param int $keyLength The length of the derived key in bytes.
     * @param int $iterations The number of times to hash the password before returning.
     * @param string $algorithm The hash algorithm to use.
     * @return string Binary string of $keyLength bytes, derived from the provided $password and $salt.
     */
    public static function PBKDF2($password, $salt, $keyLength, $iterations = self::DEFAULT_ITERATIONS,
        $algorithm = self::DEFAULT_PBKDF2_ALGO)
    {
        $algorithm = strtolower($algorithm);
        if(!in_array($algorithm, hash_algos(), true))
            throw new Exception($algorithm . ' is not found.');
        elseif($iterations <= 0)
            throw new Exception('Iterations must be greater than 0.');
        elseif($keyLength <= 0)
            throw new Exception('Key length must be greater than 0.');

        // Determine the length of the specified hash
        $hashLength = strlen(hash($algorithm, '', true));

        // The number of iterations of the hash necessary to fill $keyLength characters
        // IE: If $keyLength is 256 but $hashLength is only 128, we'd need 2 blocks
        // to fill our $keyLength. If $keyLength was 128 and $hashLength is 256, we'd just
        // take a subset of $output when we're done.
        $blockCount = (int)ceil($keyLength / $hashLength);

        $output = '';
        for($i = 1; $i <= $blockCount; $i++)
        {
            // Beginning hash for this block/iteration
            $iterate = $block = hash_hmac($algorithm, $salt . pack('N', $i), $password, true);

            // Hash each block the specified number of times
            for($j = 1; $j < $iterations; $j++)
            {
                // XOR each iterate
                $iterate ^= ($block = hash_hmac($algorithm, $block, $password, true));
            }

            // Block is completed, append to the output and move on to the next
            $output .= $iterate;
        }

        // Return up to $keyLength characters
        return substr($output, 0, $keyLength);
    }
}
Thanks for this post. It's good to know what options are available out there for hashing.
On a related note, I think it's also important to consider the tradeoffs each option brings. For example, if we're hashing a hash, then whenever we need to authenticate a user it's going to take just a tiny bit longer for that user to log in. For one user it's not a big deal, but multiply that by thousands or millions, then throw in environment and server conditions, and it could be a problem.
Brett,
That's a valid concern. However, before this really became a problem, an architect or developer would already have had to deal with scaling issues around more intensive parts of their application/site. If you created a service layer for the login, then you could handle many more concurrent logins per second.