The second conundrum is the password value. A string of Unicode characters is specified. So say my password is "abc", this would be "\u0061\u0062\u0063". Would anyone know if this is the correct format to put it in?
Cryptography is done on binary data, so the salt and password identifiers used in the article refer to byte arrays. That means you first need to convert the password "abc" to a byte array using some sort of character encoding.
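To make the conversion concrete, here is a minimal sketch of turning "abc" into bytes. The choice of UTF-16LE is an assumption on my part (the article doesn't say which encoding it means), and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class PasswordBytes {
    // Assumption: the password is encoded as UTF-16LE (two bytes per char, no BOM).
    static byte[] toUtf16Le(String password) {
        return password.getBytes(StandardCharsets.UTF_16LE);
    }

    // Small helper to print a byte array as lowercase hex.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16LE: each character becomes a low byte followed by a zero byte.
        System.out.println(toHex(toUtf16Le("abc")));                        // 610062006300
        // UTF-8 for comparison: one byte per ASCII character.
        System.out.println(toHex("abc".getBytes(StandardCharsets.UTF_8)));  // 616263
    }
}
```

The two encodings give different byte arrays for the same password, so they also give different hashes. That is why the encoding has to be pinned down before anything else.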
Concatenation (The '+' operator) refers to "pasting two byte arrays together" so they form one with the combined size.
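In Java, that "pasting together" could look something like this. The helper name `concat` is hypothetical; it is not taken from the article or from any library.

```java
class ByteConcat {
    // Returns a new array holding all bytes of a followed by all bytes of b.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] salt = {1, 2, 3};
        byte[] password = {4, 5};
        // The combined array has the combined size: 3 + 2 = 5 bytes.
        System.out.println(concat(salt, password).length); // 5
    }
}
```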
The salt looks like gobbledygook when you try to encode it to ASCII, because it's random binary data, and never intended to be read as text.
If the specification says that salts must be 16 bytes, then obviously the file you have does not match the specification.
Honestly, the article is really poor because they specify the password to be an array of Unicode characters, while hashing is done on binary data. They don't specify whether the data is UTF-8, UTF-16 or something else still. They also use the identifier block without specifying what it means. I wouldn't be confident writing an algorithm based on this article alone. You may actually have to dive into the ECMA-376 specification.
Thanks Stephan. The saltValue is in the XML field next to the salt size, so it's definitely correct. The salt size is specified as 16, while the Base64-decoded string is 32 bytes. Like I said, if this were hex it would make sense. I understand what you're saying about the Unicode characters. The documentation specifically mentions that the password be broken down to Unicode before being hashed. My SHA512 algorithm was already getting a byte array from the string, so this threw me off.
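A quick way to double-check the decoded length is to let `java.util.Base64` do the decoding and print the size. This is only a sketch: the input here is a placeholder of 16 zero bytes, not the salt from the file, and the class name is made up.

```java
import java.util.Base64;

class SaltLength {
    // Number of raw bytes that a Base64 string decodes to.
    static int decodedLength(String base64) {
        return Base64.getDecoder().decode(base64).length;
    }

    public static void main(String[] args) {
        // Placeholder value: 16 zero bytes, encoded to Base64.
        String sixteenBytes = Base64.getEncoder().encodeToString(new byte[16]);
        System.out.println(sixteenBytes.length());        // 24 characters of Base64 text
        System.out.println(decodedLength(sixteenBytes));  // 16 bytes after decoding
    }
}
```

A 16-byte value takes 24 Base64 characters (including padding), and a 32-byte value takes 44, so the length of the attribute text alone already hints at the real salt size.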
I formatted your XML a little bit so it's more readable.
There are more inconsistencies with the article. The spin count is 100000 while the article specifies 50000. The cipher mode is CBC while the article specifies ECB (why I'm not sure, ECB is horrible). The hashing algorithm is SHA-512 while the article specifies SHA-1.
This file simply does not seem to conform to what the article prescribes.
The article, although it says it is updated, seems to be based on Office 2007 (I think). There are a few other things, like the SpinCount=50,000 mentioned in the documentation, whereas the XML shows 100,000. There are a few other slight changes made for the latest version (2013): the final iteration of the hash in the key generation previously used SHA(H(max),0), but instead of 0 there is now a specific array of bytes (0xfe, 0xa7, 0xd2, 0x76, 0x3b, 0x4b, 0x9e and 0x79). Strangely enough, these made their way into the article.
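For reference, this is roughly how I understand the spin loop with that block key at the end. Treat it as a sketch under assumptions, not as what the specification says: I'm assuming the password is UTF-16LE, that each round hashes a 4-byte little-endian counter followed by the previous hash, and that the last step hashes the result followed by the block key. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SpinHash {
    // The byte array quoted above, used as the block key in the final step.
    static final byte[] BLOCK_KEY = {
        (byte) 0xfe, (byte) 0xa7, (byte) 0xd2, 0x76, 0x3b, 0x4b, (byte) 0x9e, 0x79
    };

    static byte[] derive(String password, byte[] salt, int spinCount, byte[] blockKey) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // H0 = H(salt + password); the UTF-16LE encoding is an assumption.
        md.update(salt);
        byte[] hash = md.digest(password.getBytes(StandardCharsets.UTF_16LE));
        // Hn = H(counter + Hn-1); the counter layout is an assumption.
        byte[] counter = new byte[4];
        for (int i = 0; i < spinCount; i++) {
            ByteBuffer.wrap(counter).order(ByteOrder.LITTLE_ENDIAN).putInt(i);
            md.update(counter);
            hash = md.digest(hash);
        }
        // Final step: H(Hn + blockKey) instead of H(Hn + 0).
        md.update(hash);
        return md.digest(blockKey);
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16]; // placeholder salt, not taken from a real file
        byte[] result = derive("abc", salt, 100000, BLOCK_KEY);
        System.out.println(result.length); // 64, the size of a SHA-512 digest
    }
}
```

The 64-byte result is longer than an AES key, so it presumably still has to be cut down to the key size before it is used. I haven't confirmed that detail against the specification.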
I'm trying to put together which sections are up to date and which are not. I'm relying on two references
I have the section of my project related to this thread finished. It does everything I think it should but I can't seem to get a match in the output of the two values that confirm the correct password. If I were to post my code here would anyone mind having a look to see if they can spot a glaring problem in the code? There's a little bug in there somewhere I reckon but I can't find it.
Krispin, I deleted the Apache CryptoFunctions class from your post. Its source and documentation are readily available online.
I don't know the specification and don't know exactly what you're trying to prove in your main class. You have a bunch of magical constants that you pulled from somewhere, and it's not clear what you're doing with them or why.
Furthermore, you're creating instances of CryptoFunctions, SHA512 and Hex, while these are utility classes and you can call their methods on the classes directly.
You're also not supposed to pass 0 as the cipher mode, but Cipher.DECRYPT_MODE.
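Here's a small sketch of what I mean, using the standard `javax.crypto` API directly rather than the CryptoFunctions class. The class and method names are made up, and the key, IV and data are placeholders. I've used AES/CBC/NoPadding because the XML says CBC; the padding choice is an assumption.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class CipherModes {
    // Runs AES/CBC/NoPadding in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
    static byte[] run(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];   // placeholder key
        byte[] iv = new byte[16];    // placeholder IV
        byte[] plain = new byte[32]; // must be a multiple of 16 bytes with NoPadding

        byte[] encrypted = run(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] decrypted = run(Cipher.DECRYPT_MODE, key, iv, encrypted);
        System.out.println(java.util.Arrays.equals(plain, decrypted)); // true

        // The named constants are plain ints; 0 is not one of them.
        System.out.println(Cipher.ENCRYPT_MODE + " " + Cipher.DECRYPT_MODE); // 1 2
    }
}
```

Using the named constants keeps the intent readable and means you never have to remember which int is which.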
Sorry, I did a poor job of explaining anything. The constants were taken from the encrypted document that I'm trying to open by verifying the hard-coded password. The salt, hashValue and hashInput are decoded from Base64, then used to create a key with a function from the crypto class. The decoded salt and blockKey are used to create an IV. Both of these byte arrays (IV and key) are used to create a cipher, which in turn is used to decrypt the decoded hashInput and hashValue values. The decrypted input is then hashed and the two are compared. If the password was correct, they should match.
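The final comparison step, as I've described it, could be sketched like this. The names are hypothetical, SHA-512 is taken from the XML, and trimming the decrypted hash value to the digest length is my own assumption in case the decrypted block carries padding.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class VerifierCheck {
    static byte[] sha512(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-512").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes the decrypted input and compares it with the decrypted hash value.
    static boolean passwordMatches(byte[] decryptedInput, byte[] decryptedHashValue) {
        byte[] computed = sha512(decryptedInput);
        // Assumption: only the first 64 bytes of the decrypted value are the hash.
        byte[] expected = Arrays.copyOf(decryptedHashValue, computed.length);
        return MessageDigest.isEqual(computed, expected);
    }

    public static void main(String[] args) {
        byte[] input = {1, 2, 3, 4};        // placeholder for the decrypted hashInput
        byte[] hashValue = sha512(input);   // what a correct password should decrypt to
        System.out.println(passwordMatches(input, hashValue)); // true
    }
}
```

If this returns false for a password known to be right, the mismatch has to come from an earlier step (encoding, key derivation, IV or cipher setup), since the comparison itself is simple.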
The Hex and SHA512 classes were my own, which is why I included them. I instantiated the cryptoClass because I had initially found the code online before discovering the Apache POI library, which it is part of. The class asked for an int in the getCipher parameter, which corresponds to Cipher.DECRYPT_MODE or Cipher.ENCRYPT_MODE. I didn't know which int corresponds to which, so I edited the getCipher method in the cryptoClass to just use Cipher.DECRYPT_MODE regardless of what int was passed in.