
CRC for a file

 
Dave Jones
Ranch Hand
Posts: 77
Hello all y' ranchers !

I have to compare a few files to each other. I think the best solution is to run each file through some kind of hash algorithm, so that the result is a hash, or a long, or anything that is 16 bits long...
I thought of using CRC32, but it works with 32 bits (as implied by the name...)

Any suggestion will help. I need something simple and reliable.

Thanks a lot.
Dave
 
Joe Ess
Bartender
Posts: 9280
Originally posted by Dave Jones:
. . .or a long or anything that is 16 bit long...


You do realize that an int in Java is 32 bits and a long is 64?
Any reason for the 16 bit requirement?
 
Henry Wong
author
Marshal
Posts: 21122
Originally posted by Dave Jones:
I have to compare a few files to each other. I think the best solution is to run each file through some kind of hash algorithm, so that the result is a hash, or a long, or anything that is 16 bits long...
I thought of using CRC32, but it works with 32 bits (as implied by the name...)

Any suggestion will help. I need something simple and reliable.


Well, quite frankly, CRC32 is not reliable either: it is entirely possible for two completely different files to match because their CRCs are the same.

This is true for any hash that you use. To represent what is potentially an unlimited amount of data in 32 bits, and expect the result to be unique, is ridiculous. The purpose of a CRC is to change drastically when only small changes are encountered, in effect to detect small corruptions in the file.

Anyway, if you only want 16 bits, then use the first 16 bits, or the last 16 bits, or every other bit, of the CRC32 value. You'll get more different files that match, but you'll get that with any 16-bit hash algorithm that you use.
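
To make the "use 16 bits of the CRC32" suggestion concrete, here is a minimal sketch using the standard java.util.zip.CRC32 class. The class name TruncatedCrcSketch and the helper names crc32 and low16 are made up for illustration, and keeping the low 16 bits is just one of the choices mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical helper class: names are illustrative, not from the thread.
class TruncatedCrcSketch {

    /** CRC-32 of an in-memory byte array. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Streams a file through CRC32 so large files need not fit in memory. */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Keeps only the low 16 bits of a CRC-32 value (result is 0..65535). */
    static int low16(long crc32) {
        return (int) (crc32 & 0xFFFF);
    }

    public static void main(String[] args) throws IOException {
        // Prints one 16-bit value per file named on the command line.
        for (String name : args) {
            System.out.printf("%04x  %s%n", low16(crc32(Paths.get(name))), name);
        }
    }
}
```

Two files with different 16-bit values are certainly different. Two files with the same value are only probably the same, so a byte-by-byte comparison is still needed when that matters.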

Henry
 
Dave Jones
Ranch Hand
Posts: 77
Hello, and thank you for your answer.
I can't just take the first/last 64 bits, since they will probably be identical although the files are different (the files are similar but not equal), so I need some kind of CRC.
 
Henry Wong
author
Marshal
Posts: 21122
Originally posted by Dave Jones:
I can't just take the first/last 64 bits, since they will probably be identical although the files are different (the files are similar but not equal), so I need some kind of CRC.


Well, I was trying to save you some time, by suggesting that half of a CRC-32 is about as good as a CRC-16.

But if you disagree, then you already answered your question -- use a CRC-16 instead. It's not too hard to implement a CRC-16. I recall I implemented two different CRC-16 routines (many many years ago), in only a couple of hours.
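
For anyone who wants to see what a small CRC-16 routine might look like, here is a sketch of one common variant, CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no bit reflection, no final XOR). The choice of variant is an assumption; the thread does not say which CRC-16 was meant, and the class name Crc16Sketch is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class name; the CRC variant is an assumption, see the text above.
class Crc16Sketch {

    /** Bitwise CRC-16/CCITT-FALSE over a byte array; result is 0..65535. */
    static int crc16(byte[] data) {
        int crc = 0xFFFF;                        // initial register value
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;              // feed next byte into the top 8 bits
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;   // top bit set: shift and apply polynomial
                } else {
                    crc <<= 1;                   // top bit clear: shift only
                }
                crc &= 0xFFFF;                   // keep the register at 16 bits
            }
        }
        return crc;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The standard check value for this variant is 0x29B1.
        System.out.printf("%04X%n", crc16(check));
    }
}
```

A table-driven version is faster for large files, but the bitwise loop is easier to check against the definition.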

Henry
 
Dave Jones
Ranch Hand
Posts: 77
Thank you, Henry!

I already used the CRC32 class; it returns a long, and that is good enough for my uses.
But now a question has arisen:
Why is it called CRC32 if the 'getValue' method returns a long value?
Common logic says it should be called CRC64. Or did I miss something here?

Thanks again,
Dave
 
Henry Wong
author
Marshal
Posts: 21122
Originally posted by Dave Jones:
I already used the CRC32 class; it returns a long, and that is good enough for my uses.
But now a question has arisen:
Why is it called CRC32 if the 'getValue' method returns a long value?
Common logic says it should be called CRC64. Or did I miss something here?


A CRC-32 is a 32-bit *unsigned* value. A Java int holds a 32-bit *signed* value. I would venture a guess that only the lower 32 bits of the long are used.
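
The guess is easy to check with a small experiment. The sketch below (class name Crc32RangeSketch is made up for illustration) computes the CRC-32 of the ASCII string "123456789", whose well-known check value is 0xCBF43926, and shows that the upper 32 bits of the returned long are zero, while the same bits stored in an int come out negative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical class name, for illustration only.
class Crc32RangeSketch {

    /** CRC-32 of a byte array via java.util.zip.CRC32. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        long value = crc32("123456789".getBytes(StandardCharsets.US_ASCII));

        System.out.println(Long.toHexString(value)); // cbf43926
        System.out.println(value >>> 32);            // 0: the upper 32 bits are unused

        int asInt = (int) value;                     // same 32 bits, signed interpretation
        System.out.println(asInt < 0);               // true: the top bit is set

        // Converting back as unsigned recovers the original long value.
        System.out.println(Integer.toUnsignedLong(asInt) == value); // true
    }
}
```

So the long is only there to carry an unsigned 32-bit number; no extra checksum bits are hiding in it.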

Henry
 
Dave Jones
Ranch Hand
Posts: 77
I agree, it does seem logical.
Thanks again, Henry.
 