Better way to process flat file and encrypt

 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Hello everyone,
Could you please tell me a better way to read data files (around 300 million records) and encrypt them using SHA-512? I am currently reading the file record by record, and each record contains around 20 fields. Each field is encrypted one by one, and finally the record is written to the output file. The job currently runs for more than 40 hours. Is there a better way to process the files? Would threading help? Thanks in advance.

Bala
 
Tony Docherty
Bartender
Posts: 3323
86
I think you may get a better answer in the performance forum so I'll move the thread there for you.

The first thing to do though is to profile the application (record some timings) to find out which area(s) are taking the most time because until you do that you don't know which bit(s) you need to optimize.
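A rough way to collect such timings, before reaching for a full profiler, is to accumulate the elapsed time of each stage (read, hash, write) separately. The sketch below is only an illustration: `StageTimer` and the stage names are hypothetical and are not taken from the original poster's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock time per named stage. */
final class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    /** Runs the task and adds its elapsed time to the given stage. */
    <T> T time(String stage, Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total time recorded for a stage in milliseconds (0 if never timed). */
    long millis(String stage) {
        return TimeUnit.NANOSECONDS.toMillis(nanosByStage.getOrDefault(stage, 0L));
    }

    /** Prints one line per stage, in the order the stages were first seen. */
    void report() {
        for (String stage : nanosByStage.keySet()) {
            System.out.println(stage + ": " + millis(stage) + " ms");
        }
    }
}
```

Wrapping each stage, for example `timer.time("hash", () -> hashFields(record))` where `hashFields` stands in for whatever the existing code does, and calling `report()` after a sample of a few million records should indicate which stage dominates.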
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Thank you. I think applying SHA-512 is what takes most of the time. Thanks.
 
Tony Docherty
Bartender
Posts: 3323
86
I already have moved it there for you, but unless you provide some profiling information people are unlikely to be able to provide definitive answers.
Just noticed you have edited your last post, so ignore the above.

I would imagine you are correct about the encryption being the bottleneck, but without actual timings you don't know that for sure, and I've been wrong many times before about where a bottleneck was occurring. Also, without actual timings you don't know whether any changes you make are improving the situation.
 
Ulf Dittmer
Rancher
Posts: 43081
77
What are you trying to accomplish? SHA-512 is a hash (or digest), not a cipher, so you won't be able to decrypt it. If you're trying to create a checksum for the file then there are existing tools that are much better suited (much faster).
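To make the distinction concrete, here is a minimal sketch using the standard `java.security.MessageDigest` API. The class name `DigestDemo` and the sample value are made up for illustration. Note that there is no key and no inverse operation, and that a SHA-512 digest is always 64 bytes regardless of the input length.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class DigestDemo {
    /** Returns the SHA-512 digest of the UTF-8 bytes of the given field. */
    static byte[] sha512(String field) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            return md.digest(field.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK, which ships a SHA-512 implementation.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] first = sha512("Alice");
        byte[] second = sha512("Alice");
        System.out.println(first.length);                  // prints 64 (512 bits)
        System.out.println(Arrays.equals(first, second));  // prints true: same input, same digest
    }
}
```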
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Thanks much for your replies. As I said earlier, I need to process a file that contains around 450 to 500 million records with 30 to 40 columns. These columns have to be encrypted using SHA-512. At present we loop through the entire flat file, reading it record by record and column by column. Each column is digested using SHA, which returns 32 bytes, and those bytes are then processed further.
Finally the record is written to the output file, and the process continues until the last record.

Now I am trying to process the file using threads rather than one record at a time to improve the performance. What is the best way to process this file? Would threading help? Is there a better way? Please explain; your help will be much appreciated.

Thanks
Bala
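For readers wondering what a threaded version might look like, the following is a sketch under stated assumptions, not the poster's code: records are delimited text lines, only some column indexes are hashed, and the file is processed in batches small enough to hold in memory. `ParallelRecordHasher`, its method names, and the Base64 output encoding are all hypothetical choices. Each worker thread gets its own `MessageDigest`, because instances are not thread-safe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ParallelRecordHasher {
    // One MessageDigest per worker thread; a shared instance would be corrupted by concurrent use.
    private static final ThreadLocal<MessageDigest> SHA512 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    });

    /** Hashes the fields at the given column indexes and leaves the other fields untouched. */
    static String hashRecord(String line, String delimiter, Set<Integer> columnsToHash) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < fields.length; i++) {
            if (columnsToHash.contains(i)) {
                byte[] digest = SHA512.get().digest(fields[i].getBytes(StandardCharsets.UTF_8));
                fields[i] = Base64.getEncoder().encodeToString(digest);
            }
        }
        return String.join(delimiter, fields);
    }

    /** Processes one batch in parallel; the returned list keeps the input order. */
    static List<String> hashBatch(List<String> batch, String delimiter, Set<Integer> columnsToHash) {
        return batch.parallelStream()
                .map(line -> hashRecord(line, delimiter, columnsToHash))
                .collect(Collectors.toList());
    }
}
```

A driver would read, say, 100,000 lines at a time with a `BufferedReader`, pass them to `hashBatch`, and write the results with a `BufferedWriter`. Whether this helps depends on the profile: if reading or writing dominates, more hashing threads will not shorten the run.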
 
fred rosenberger
lowercase baba
Posts: 13089
67
Chrome Java Linux

That does not make sense. "Encryption" implies that you will later want to "decrypt" it. This is not possible.

You can think of it like taking the sine of a number. If I said "The sine of the number is 0.5. Tell me the original.", you can't. Or if I said "I am a traveler who is now in St. Louis. Where did I come from?" - again, you can't. There are lots of ways to get from A to B, so if you only know B, you can't work back to A.
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
One thing here is that we are not going to encrypt all columns; only a few columns will be encrypted, and per my requirements those do not need to be decrypted. I was just thinking about how to improve the processing speed...

Thanks much for your reply...
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
I suggest that the set-up time for encryption of millions of small chunks is what is time consuming.
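One way to test that suspicion is to create the `MessageDigest` once and reuse it for every field instead of calling `MessageDigest.getInstance` per field; `digest()` resets the instance after each call, so reuse is safe within a single thread. This is a sketch only: `FieldHasher` is a hypothetical name, and whether the set-up cost matters in the poster's program is something only timings can show.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: reuses a single SHA-512 instance. Not thread-safe. */
final class FieldHasher {
    private final MessageDigest sha512;

    FieldHasher() {
        try {
            sha512 = MessageDigest.getInstance("SHA-512"); // provider lookup happens once
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    /** Hashes one field; digest() leaves the instance reset and ready for the next call. */
    byte[] hash(String field) {
        return sha512.digest(field.getBytes(StandardCharsets.UTF_8));
    }
}
```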

Is there some reason you can't just encrypt the whole file?

One thing here is that we are not going to encrypt all columns and only few columns will be encrypted and those doesn't need to decrypt as per my requirement.

Encrypting data that will never be decrypted? Why not just throw it away?

Bill
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
These columns will be loaded into a table again for research purposes.
 
Ulf Dittmer
Rancher
Posts: 43081
77
No, they won't. As 3 people have said by now, you can't decrypt the data. Please read my previous post.
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
I think you misunderstood me. I never wanted to decrypt them. I just want to improve the performance of processing the file and encrypting. Thanks again.
 
fred rosenberger
lowercase baba
Posts: 13089
67
Chrome Java Linux
If you are never going to decrypt them, then why spend the time encrypting them? Isn't that just a waste of time/processor power? Why not just null them out? Or make them all literally "XXXXXXXXXXXXXXXXXXXXXXXX"?

What do you think you gain by running them through the hash?
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
Please read this Wikipedia entry on the SHA family of hash functions to understand why people keep telling you this is a bad idea.

Bill
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Thanks much. I will look for some alternatives. Thanks.
 
fred rosenberger
lowercase baba
Posts: 13089
67
Chrome Java Linux
Folks around here will be happy to give you alternatives, but only if you tell us what you are really trying to accomplish.
 
R. Grimes
Ranch Hand
Posts: 42
I think if I had to encrypt that large of a file, I might explore ECC (Elliptic Curve Cryptography).
 
Richard Tookey
Bartender
Posts: 1166
17
Netbeans IDE Java Linux

R. Grimes wrote:I think if I had to encrypt that large of a file, I might explore ECC (Elliptic Curve Cryptography).

I wouldn't! Elliptic Curve Cryptography is slow compared to AES, which is the industry standard. If I were going to encrypt, I might use a hybrid approach using ECC in conjunction with AES, where a random session key is used for AES and ECC is used to encrypt the session key. Even then I would probably use RSA rather than ECC, since RSA is ubiquitous and ECC is not (at this time). Also, the hybrid approach would require a different session key for each cleartext, database row, column, or whatever unit of encryption is required.
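The hybrid idea can be sketched with the standard JCE classes. This example assumes the RSA variant mentioned above (a fresh AES session key per unit of data, wrapped with an RSA public key). The class `HybridSketch`, the `Envelope` holder, the 2048-bit key size, and the choice of AES-GCM and OAEP padding are illustrative assumptions, not something specified in the thread.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class HybridSketch {
    private static final String RSA_MODE = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES_MODE = "AES/GCM/NoPadding";

    /** Everything that must be stored or sent for one unit of encryption. */
    static final class Envelope {
        final byte[] wrappedKey;
        final byte[] iv;
        final byte[] ciphertext;
        Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static Envelope encrypt(byte[] plaintext, PublicKey rsaPublic) {
        try {
            // Fresh random session key for this unit of data.
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey session = keyGen.generateKey();

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance(AES_MODE);
            aes.init(Cipher.ENCRYPT_MODE, session, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(plaintext);

            // Only the small session key goes through the slow public-key operation.
            Cipher rsa = Cipher.getInstance(RSA_MODE);
            rsa.init(Cipher.ENCRYPT_MODE, rsaPublic);
            return new Envelope(rsa.doFinal(session.getEncoded()), iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(Envelope envelope, PrivateKey rsaPrivate) {
        try {
            Cipher rsa = Cipher.getInstance(RSA_MODE);
            rsa.init(Cipher.DECRYPT_MODE, rsaPrivate);
            SecretKey session = new SecretKeySpec(rsa.doFinal(envelope.wrappedKey), "AES");

            Cipher aes = Cipher.getInstance(AES_MODE);
            aes.init(Cipher.DECRYPT_MODE, session, new GCMParameterSpec(128, envelope.iv));
            return aes.doFinal(envelope.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```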

On a more general point: I do know that some medical data is made anonymous before being distributed for research purposes by, in essence, hashing fields that might identify people. This allows researchers to identify common entities but not actual individuals. There has been some bad press over this since, although hashing on its own is sufficient to hide identities, some (but not all) individuals can be identified when the hashed data is combined with other publicly available data and a knowledge of the hash algorithms used. Some further anonymisation can be achieved by using a keyed hash with the key kept very secret, but this is not considered enough to completely protect identities.
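A keyed hash of the kind described can be sketched with `javax.crypto.Mac`. The class name `KeyedHasher`, the method name `pseudonym`, the choice of HMAC-SHA512, and the Base64 encoding are assumptions made for illustration; key generation, storage, and rotation are outside the scope of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: maps identifying values to keyed pseudonyms. Not thread-safe. */
final class KeyedHasher {
    private final Mac mac;

    KeyedHasher(byte[] secretKey) {
        try {
            mac = Mac.getInstance("HmacSHA512");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA512"));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Same key and value always give the same pseudonym, so records can still be joined. */
    String pseudonym(String value) {
        byte[] tag = mac.doFinal(value.getBytes(StandardCharsets.UTF_8)); // doFinal resets the Mac
        return Base64.getEncoder().encodeToString(tag);
    }
}
```

Without the key, an attacker cannot simply hash a list of candidate names and compare, which is the weakness of a plain unkeyed digest on low-entropy fields.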
 
R. Grimes
Ranch Hand
Posts: 42

Richard Tookey wrote:

R. Grimes wrote:I think if I had to encrypt that large of a file, I might explore ECC (Elliptic Curve Cryptography).

I wouldn't! Elliptic Curve Cryptography is slow compared to AES, which is the industry standard.



Well, readers can refer to this document from Oracle, who apparently has a different view, and decide which is best. See link.

A couple of noteworthy quotes:

"The Elliptic Curve Cryptosystem
(ECC), offers the highest strength per bit of any known
public-key cryptosystem today."


"We repeated these experiments using 2048-bit RSA keys
and 193-bit ECC keys. We found ECC to perform better
than RSA without any exceptions, "

For a bit more recent document, if the above is too dated for you, I would refer to this 2010 abstract.

A noteworthy quote from this document is:

"From the above we conclude that, computationally speaking,
cracking 160-bit ECC is at least three orders of magnitude
harder than cracking 1024-bit RSA. "

Or, perhaps this presentation, given by QualComm in Nov 2012. See page 10 for speed comparisons.
 
Richard Tookey
Bartender
Posts: 1166
17
Netbeans IDE Java Linux

R. Grimes wrote:

Richard Tookey wrote:

R. Grimes wrote:I think if I had to encrypt that large of a file, I might explore ECC (Elliptic Curve Cryptography).

I wouldn't! Elliptic Curve Cryptography is slow compared to AES, which is the industry standard.



Well, readers can refer to this document from Oracle, who apparently has a different view, and decide which is best. See link.
A couple of noteworthy quotes:

"The Elliptic Curve Cryptosystem
(ECC), offers the highest strength per bit of any known
public-key cryptosystem today."


"We repeated these experiments using 2048-bit RSA keys
and 193-bit ECC keys. We found ECC to perform better
than RSA without any exceptions, "

For a bit more recent document, if the above is too dated for you, I would refer to this 2010 abstract.

A noteworthy quote from this document is:

"From the above we conclude that, computationally speaking,
cracking 160-bit ECC is at least three orders of magnitude
harder than cracking 1024-bit RSA. "

Or, perhaps this presentation, given by QualComm in Nov 2012. See page 10 for speed comparisons.



None of your references compare secret key encryption using AES with public key encryption using ECC; they compare ECC with other public key cryptosystems such as RSA. I am not arguing for RSA or any other public key algorithm; I'm arguing for using the industry standard for secret key encryption, i.e. AES. Whether the OP should use AES on its own (see Note 1) or in a hybrid system with one of the public key encryption algorithms depends very much on the sort of data he is encrypting. The OP's obvious confusion between encrypting and digesting makes it difficult for me to understand his requirements, but as I hinted in my previous post, I suspect he is trying to 'anonymise' data rather than encrypt it. I am just guessing, though.

Note 1 - using AES will pretty much always require the use of one of the feedback modes, and for most feedback modes one also needs to add padding.
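As an illustration of Note 1, the sketch below uses AES in CBC mode with PKCS#5 padding and a fresh random IV per message, prepending the IV to the ciphertext. `AesCbcSketch` and its method names are hypothetical, and the 128-bit key size is an arbitrary choice. CBC on its own gives confidentiality only, so real designs usually add an integrity check or use an authenticated mode.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class AesCbcSketch {
    private static final String MODE = "AES/CBC/PKCS5Padding";
    private static final int IV_BYTES = 16; // AES block size

    static SecretKey newKey() {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            return keyGen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV followed by ciphertext; padding rounds the data up to whole blocks. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance(MODE);
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plaintext);

            byte[] out = new byte[IV_BYTES + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ciphertext, 0, out, IV_BYTES, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the IV from the first block, then decrypts and strips the padding. */
    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance(MODE);
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(ivAndCiphertext, 0, IV_BYTES));
            return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```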


 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu

Richard Tookey wrote:None of your references compare secret key encryption using AES with public key encryption using ECC; they compare ECC with other public key crypto systems such as RSA.


I can't imagine anyone trying to compare any symmetric key encryption system (I assume that's what you mean by 'secret') with an asymmetric one, either for strength or speed, since they are likely to be orders of magnitude different.

As I recall, all government restrictions on key sizes are based on symmetric lengths; I don't even know if there are any such rules for asymmetric ones (though no doubt the bureaucrats have them tucked up their sleeve somewhere).

Winston
 
Richard Tookey
Bartender
Posts: 1166
17
Netbeans IDE Java Linux
The point I was trying to make, Winston, is that I was advocating the use of AES rather than any public key system, whether ECC, RSA or whatever. R. Grimes presented an argument that ECC was superior to RSA, but since I was not advocating RSA as the primary encryption algorithm, the argument was irrelevant. There may be a strong case for using a hybrid system (see section 13.6 in Practical Cryptography by Ferguson and Schneier), but the OP has not presented a use case, so it is difficult for me to judge. It is unusual for a public key system to be used for bulk encryption (that is what symmetric secret key ciphers are designed and optimized for), and a hybrid system, whether ECC+AES or RSA+AES, is the norm when encrypting files.

Yes, there are US restrictions on the size of the RSA modulus (it used to be 1024 bits, but I'm not sure of the current size), and without the "Unlimited Strength" files installed the JCE enforces them. Presumably the NSA has also placed similar arbitrary and irrelevant restrictions on ECC key sizes.
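Which limits a particular JVM actually enforces can be queried at runtime with `Cipher.getMaxAllowedKeyLength`, which returns `Integer.MAX_VALUE` when the unlimited strength policy is in effect. The wrapper class `KeyLimitCheck` below is a hypothetical convenience; the values printed depend on the JDK version and the policy files installed, so no expected output is shown.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

final class KeyLimitCheck {
    /** Largest key size, in bits, that the installed policy allows for the transformation. */
    static int maxKeyBits(String transformation) {
        try {
            return Cipher.getMaxAllowedKeyLength(transformation);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Bad transformation: " + transformation, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("AES max key bits: " + maxKeyBits("AES"));
        System.out.println("RSA max key bits: " + maxKeyBits("RSA"));
    }
}
```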
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu

Richard Tookey wrote:Presumably NSA has also placed similar arbitrary and irrelevant restrictions on ECC key sizes.


No doubt. I remember back when I was "security administrating" for the first time (more than 10 years ago now) being amazed that encryption laws came under the heading of "Weaponry", and fine limits, even then, were in the hundreds of millions of dollars.

Winston
 
R. Grimes
Ranch Hand
Posts: 42

Richard Tookey wrote:R. Grimes presented an argument that ECC was superior to RSA but since I was not advocating RSA as the primary encryption algorithm the argument was irrelevant.



Oh, I'm sorry. I thought that, in your post I was responding to, you said:

"Even then I would probably use RSA rather than ECC since RSA is ubiquitous and ECC is not (at this time) ."
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu

R. Grimes wrote:Oh, I'm sorry. I thought that, in your post I was responding to, you said:
"Even then I would probably use RSA rather than ECC since RSA is ubiquitous and ECC is not (at this time) ."


I think what we're probably both saying is that most PK encryption systems don't actually use PKE all the time; they use it for the "handshake" (ie, source verification and symmetric key-exchange) and then hand over to a symmetric algorithm; so the efficiency of the PK algorithm is unlikely to make a huge amount of difference to overall throughput.

Winston
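The "handshake, then symmetric" pattern can be sketched with elliptic-curve Diffie-Hellman from the standard JCA: each side combines its own private key with the peer's public key, and both arrive at the same AES key, which then does the bulk work. `HandshakeSketch` is a hypothetical name, and deriving the key by hashing the shared secret with SHA-256 is a simplification chosen for brevity. Real protocols also authenticate the peers and use a proper key derivation function.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.KeyAgreement;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

final class HandshakeSketch {
    static KeyPair newEcKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
            generator.initialize(256);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Derives a 128-bit AES key from our private key and the peer's public key. */
    static SecretKey deriveAesKey(PrivateKey ownPrivate, PublicKey peerPublic) {
        try {
            KeyAgreement agreement = KeyAgreement.getInstance("ECDH");
            agreement.init(ownPrivate);
            agreement.doPhase(peerPublic, true);
            byte[] sharedSecret = agreement.generateSecret();

            // Simplified derivation: hash the shared secret and keep the first 16 bytes.
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(sharedSecret);
            return new SecretKeySpec(digest, 0, 16, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The public-key step runs once per session, so its cost is amortised over however much data the symmetric cipher then processes.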