This week's book giveaway is in the OCAJP 8 forum. We're giving away four copies of OCA Java SE 8 Programmer I Study Guide and have Edward Finegan & Robert Liguori on-line! See this thread for details.

I am using Sun DS as my password store, which stores passwords in SHA1 hashed form. I need to use the retrieved password (SHA1 hashed) to generate an AES-128 key. This is not possible with SHA1 as it is 20 byte long whereas I need a 16 byte hash to generate the key. Is rehashing my SHA1 password with MD5(16 bytes) going to make the generated key less secure? Unfortunately Sun DS doesn't support MD5 hashing of passwords hence my problem.

Hashing your SHA1 hash with MD5 will give you MD5-level security for your SHA1 hash, but your initial password will still have SHA1 protection, so I think it should be ok for you to use this approach.

Henry Wong wrote:Why not just use 16 out of the 20 bytes?

Henry

I believe that would affect the collision resistance (thus the security), since the algorithms are designed to be collision resistant for their full length of output bytes, not just parts of them. Can't provide mathematical proofs though

Henry Wong wrote:Why not just use 16 out of the 20 bytes?

Can't provide mathematical proofs though

The math is described in the "Birthday paradox", google or wikipedia have links.

If you only need a limited range of hash values, just pull them out of the SHA results. You can truncate the return string, or pick a substring out of the center, makes no difference. There are many cases where a long hash value is not necessary

There is zero reason to do a MD5 of the SHA. Its bad form.

Henry Wong wrote:Why not just use 16 out of the 20 bytes?

Can't provide mathematical proofs though

The math is described in the "Birthday paradox", google or wikipedia have links.

If you only need a limited range of hash values, just pull them out of the SHA results. You can truncate the return string, or pick a substring out of the center, makes no difference. There are many cases where a long hash value is not necessary

There is zero reason to do a MD5 of the SHA. Its bad form.

I did not get it, the Birthday Paradox is in fact contrary to what you said about getting substring from the hash. Lets say checksum is a 'birthday' of the input string(or if you think other way around, birthday is log365 bit checksum of a person), so probability of getting collision is higher than it may first seem.

Elchin Asgarli wrote:I did not get it, the Birthday Paradox is in fact contrary to what you said about getting substring from the hash. Lets say checksum is a 'birthday' of the input string(or if you think other way around, birthday is log365 bit checksum of a person), so probability of getting collision is higher than it may first seem.

No, the birthday paradox explains exactly the math.

Then you have an engineering decision, what probability of collision is important to your application? You may not need the low probability of collision that the full SHA provides. If so, you can substring the results. And you don't have to guess, you can calculate the length you need for your requirements.

Doing a MD5 of a SHA1 yields only the MD5 collision strength, and MD5 is considered too weak for serious work. Doing the two step process is a waste of execution and programmer time.

Use a SHA, these days SHA256. Then truncate to suit your needs.

Elchin Asgarli wrote:I did not get it, the Birthday Paradox is in fact contrary to what you said about getting substring from the hash. Lets say checksum is a 'birthday' of the input string(or if you think other way around, birthday is log365 bit checksum of a person), so probability of getting collision is higher than it may first seem.

No, the birthday paradox explains exactly the math.

Then you have an engineering decision, what probability of collision is important to your application? You may not need the low probability of collision that the full SHA provides. If so, you can substring the results. And you don't have to guess, you can calculate the length you need for your requirements.

Doing a MD5 of a SHA1 yields only the MD5 collision strength, and MD5 is considered too weak for serious work. Doing the two step process is a waste of execution and programmer time.

Use a SHA, these days SHA256. Then truncate to suit your needs.

The birthday paradox says that there is roughly an even chance of getting a collision when you have a sample size of about the square root of the number of possible outcomes. If you use N bytes from a hash (obviously N less than or equal to the hash length) whether they are a sub-string from an MD5 or a SHA1 or a SHA2 does not matter - the number of possible outcomes is 256^N so the approximately even chance comes with 256^(N/2) samples.

Even if one takes the MD5(SHA1(SHA256(RIPE16(SHA128(messages))))) or just MD5(message) one still ends up with 16 bytes so one still has an even chance with about 256^8 samples as long as no stage in the chain results in less than 16 bytes. To illustrate this restriction, consider a very limited hashing that produced only one byte of output. This means that after this stage all input messages result in only one byte so only 256 possible values. All the following stages therefore can only produce one value for each of these 256 possible outcomes. One would there for get an even chance of a collision with about 16 input messages even though the resulting output has 16 bytes.

So, doing an MD5 of the SHA1 hash is a waste of time. If one is going to use a hash to generate the key like this (I think a more formalized PKCS12 PBE approach would be better) Pat is correct when he say use a 16 byte sub-string of a sha (sha1, sha256, sha384, sha512 etc) .

Retired horse trader.
Note: double-underline links may be advertisements automatically added by this site and are probably not endorsed by me.

My original response was merely a "common sense" statement. And I didn't intend for it to get into a long discussion about "collision resistence".... but since we are here, let me thrown my two cents in.

Arguing the merits of MD5 vs SHA1 vs MD5 of SHA1 seems silly to me. It is like trying to put an extra deadbolt on the security door, when the window has a simple lock. Keep in mind that, if lucky, an average password has about 7 or 8 ascii letters. At 5 relevant bits per letter, this adds up to about 40 bits. Arguing about keeping the 160 bits vs dropping the security level down to 128 bits, is kinda silly, when the password is only about 40 relevant bits long.

IMHO, the purpose of the SHA1 is more to generate a strong password for the AES, as sha1 output generally don't have long lengths of 0's or 1's. However, if I were to attack the security, I would attack it at the weakest point, and that would be a brute force attack at the password itself -- and not the key used for the AES.

Henry Wong wrote:Arguing the merits of MD5 vs SHA1 vs MD5 of SHA1 seems silly to me.

I don't think anyone is seriously arguing for hash(hash()). And using MD5 for anything modern is just silly.

Henry Wong wrote: Keep in mind that, if lucky, an average password has about 7 or 8 ascii letters. At 5 relevant bits per letter, this adds up to about 40 bits. Arguing about keeping the 160 bits vs dropping the security level down to 128 bits, is kinda silly, when the password is only about 40 relevant bits long.

I will argue that you are being overly generous in claiming that an average password has as many as 40 bits of entropy. A recent study of leaked passwords from the social networking site RockYou's (32 million passwords) showed "one fifth of users had selected passwords which were amongst the 50,000 most common on the web." This is less than 16 bits of entropy. Further, "Nearly 300,000 of the compromised accounts used "123456" as a password, while an additional 79,078 selected "12345" and 76,790 users had "123456789" as their password. Other commonly used passwords were "password", "iloveyou", and " princess"."

In the CERT list of weak passwords, there are only about 200 entries, and these very popular passwords have essentially zero entropy.