CRC for a file

Dave Jones
Ranch Hand

Joined: Feb 20, 2005
Posts: 77
Hello all you ranchers!

I have to compare a few files to each other. The best solution seems to be running each file through some kind of hash algorithm, so the result will be either a hash, or a long, or anything that is 16 bits long...
I thought of using CRC32, but it works with 32 bits (as implied by the name...)

Any suggestion will help. I need something simple and reliable.

Thanks a lot.
Dave
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8718

Originally posted by Dave Jones:
. . .or a long or anything that is 16 bit long...


You do realize that an int in Java is 32 bits and a long is 64?
Any reason for the 16 bit requirement?


"blabbing like a narcissistic fool with a superiority complex" ~ N.A.
[How To Ask Questions On JavaRanch]
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18141

Originally posted by Dave Jones:
I have to compare a few files to each other. The best solution seems to be running each file through some kind of hash algorithm, so the result will be either a hash, or a long, or anything that is 16 bits long...
I thought of using CRC32, but it works with 32 bits (as implied by the name...)

Any suggestion will help. I need something simple and reliable.


Well, quite frankly, CRC32 is not reliable either, as it is entirely possible for two completely different files to match because their CRCs are the same.

This is true for any hash that you use. To represent what is potentially an unlimited amount of data in 32 bits, and expect the result to be completely unique, is ridiculous. The purpose of a checksum like this is to change drastically when the data changes even slightly; in effect, to detect small corruptions in the file.

Anyway, if you only want 16 bits, then use the first 16 bits of the CRC-32, or the last 16 bits, or every other bit. More distinct files will end up with matching values, but that is true of any 16-bit hash algorithm that you use.
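As a rough sketch of the idea above (not code from this thread), the file can be streamed through java.util.zip.CRC32 and the result cut down to 16 bits. The class name FileCrc and the helpers crc32, low16 and high16 are made up for illustration; only CRC32, Files and InputStream come from the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper class, named here only for illustration.
class FileCrc {

    // Streams the file through java.util.zip.CRC32 and returns the 32-bit
    // checksum as a non-negative long.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    // Keeps only the low 16 bits of a CRC-32 value.
    static int low16(long crc32) {
        return (int) (crc32 & 0xFFFF);
    }

    // Keeps only the high 16 bits of a CRC-32 value.
    static int high16(long crc32) {
        return (int) ((crc32 >>> 16) & 0xFFFF);
    }
}
```

Two files whose 16-bit values differ are certainly different. Two files whose values match are only probably equal, so a byte-by-byte comparison is still needed if a false match matters.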

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Dave Jones
Ranch Hand

Joined: Feb 20, 2005
Posts: 77
Hello, and thank you for your answer.
I can't just take the first/last 64 bits, since they will probably be identical although the files are different (the files are similar but not equal), so I need some kind of CRC.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18141

Originally posted by Dave Jones:
I can't just take the first/last 64 bits, since they will probably be identical although the files are different (the files are similar but not equal), so I need some kind of CRC.


Well, I was trying to save you some time, by suggesting that half of a CRC-32 is about as good as a CRC-16.

But if you disagree, then you already answered your question -- use a CRC-16 instead. It's not too hard to implement a CRC-16. I recall I implemented two different CRC-16 routines (many many years ago), in only a couple of hours.
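For anyone who wants to try it, here is a minimal bit-by-bit sketch of one common CRC-16 variant. The post does not say which variants were implemented, so the choice of CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no bit reflection, no final XOR) is an assumption, and the class name Crc16 is made up for illustration.

```java
// Hypothetical class name; the variant (CRC-16/CCITT-FALSE) is an assumption,
// chosen only because it is widely documented.
class Crc16 {

    private static final int POLY = 0x1021; // x^16 + x^12 + x^5 + 1
    private static final int INIT = 0xFFFF;

    // Computes the CRC over the whole array, one bit at a time (slow but simple).
    static int compute(byte[] data) {
        int crc = INIT;
        for (byte b : data) {
            // Fold the next byte into the top 8 bits of the 16-bit register.
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    // Top bit set: shift left and apply the polynomial.
                    crc = ((crc << 1) ^ POLY) & 0xFFFF;
                } else {
                    crc = (crc << 1) & 0xFFFF;
                }
            }
        }
        return crc;
    }
}
```

The usual check input for CRC catalogues is the ASCII string "123456789", for which this variant yields 0x29B1. A table-driven version would be faster for large files, at the cost of a 256-entry lookup table.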

Henry
Dave Jones
Ranch Hand

Joined: Feb 20, 2005
Posts: 77
Thank you, Henry!

I already used the CRC32 class; it returns a long, and that is good for my uses.
But now a question has arisen:
Why is it called CRC32 if the 'getValue' method returns a long value?
Common logic says it should be called CRC64. Or did I miss something here?

Thanks again,
Dave
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18141

Originally posted by Dave Jones:
I already used the CRC32 class; it returns a long, and that is good for my uses.
But now a question has arisen:
Why is it called CRC32 if the 'getValue' method returns a long value?
Common logic says it should be called CRC64. Or did I miss something here?


A CRC-32 is a 32-bit *unsigned* value. A Java int holds a 32-bit *signed* value. I would venture a guess that only the lower 32 bits of the long are used.
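That guess is easy to check. The sketch below (the class name Crc32Width and the helper crcOf are made up for illustration) computes the CRC-32 of the standard check string "123456789", whose known value is 0xCBF43926, and shows that the upper 32 bits of the returned long are zero while the same bits cast to an int come out negative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class, named here only for illustration.
class Crc32Width {

    // Returns the CRC-32 of the ASCII bytes of the given text.
    static long crcOf(String text) {
        CRC32 crc = new CRC32();
        crc.update(text.getBytes(StandardCharsets.US_ASCII));
        return crc.getValue();
    }

    public static void main(String[] args) {
        long value = crcOf("123456789");

        // The checksum fits in 32 bits: prints "cbf43926".
        System.out.println(Long.toHexString(value));

        // The upper 32 bits of the long are unused: prints "true".
        System.out.println((value >>> 32) == 0);

        // The same bit pattern as a signed int has its top bit set,
        // so it would be negative: prints "true".
        System.out.println((int) value < 0);
    }
}
```

So the long is there only to hold an unsigned 32-bit number without a sign problem; the checksum itself is still 32 bits wide.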

Henry
Dave Jones
Ranch Hand

Joined: Feb 20, 2005
Posts: 77
I agree, it does seem logical
Thanks again Henry
 