
Question about file comparison

jonathan ford
Greenhorn

Joined: Nov 07, 2007
Posts: 9
Hello there,

I have 500k files in GFS, and every time I add a new file to the system, I need to compare the new one with the other 500k files to see whether it already exists. If it does not, I add it to the system.

Here is my question: how can I design an efficient method for this comparison? Using a database, or some other way? Please help me out; any suggestions will be much appreciated.
jonathan ford
Greenhorn

Joined: Nov 07, 2007
Posts: 9
PS: I only need to compare the names of the files...
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42268
    
What is a "GFS"?


Ping & DNS - my free Android networking tools app
jonathan ford
Greenhorn

Joined: Nov 07, 2007
Posts: 9
The files are stored in Global File System.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3646
    
Will File.exists() do what you want?


Joanne
jonathan ford
Greenhorn

Joined: Nov 07, 2007
Posts: 9
Sure it can, but how much time does it cost? I need to handle a huge number of articles per day, so I'm looking for the most efficient way to achieve the goal.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3646
    
Originally posted by jonathan:
Sure it can, but how much time does it cost? I need to handle a huge number of articles per day, so I'm looking for the most efficient way to achieve the goal.


I would imagine that will depend on your file system. Try it with File.exists() and see if it meets your performance requirements. If it doesn't then try something else. As long as you design your system correctly it should be straightforward to change the code that does the check without having to modify the rest of your system.
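
To illustrate the point about keeping the check easy to swap, here is a minimal sketch. The names ExistenceCheck, FileSystemCheck and InMemoryCheck are made up for this example and are not from any library; the only real API calls are File.exists() and the standard collections. The rest of the system would depend only on the interface, so either implementation can be plugged in and timed.

```java
import java.io.File;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface: callers only ask "is this name already stored?"
interface ExistenceCheck {
    boolean exists(String fileName);
}

// Variant 1: ask the file system every time via File.exists().
class FileSystemCheck implements ExistenceCheck {
    private final File directory;

    FileSystemCheck(File directory) {
        this.directory = directory;
    }

    @Override
    public boolean exists(String fileName) {
        return new File(directory, fileName).exists();
    }
}

// Variant 2: keep the known names in memory and look them up there.
class InMemoryCheck implements ExistenceCheck {
    private final Set<String> names;

    InMemoryCheck(Collection<String> knownNames) {
        this.names = new HashSet<>(knownNames);
    }

    @Override
    public boolean exists(String fileName) {
        return names.contains(fileName);
    }
}
```

Switching from one approach to the other is then a one-line change where the ExistenceCheck is constructed.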
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42268
    
Could you spare 10 or 20MB to cache the file names in memory, say, in a List?
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14268
    

"jonathan", please check your private messages. You can see them by clicking My Profile.


Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 8 API documentation
jonathan ford
Greenhorn

Joined: Nov 07, 2007
Posts: 9
Originally posted by Ulf Dittmer:
Could you spare 10 or 20MB to cache the file names in memory, say, in a List?

I know that's the way to solve the problem. The key point here is which way is more efficient: using File.exists() or a hash table.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I would expect that a HashSet would be fastest. (I would never use a List for this.) But any Collection or Map will take some memory, so as Ulf indicated, you need to determine if the amount of memory required is acceptable, and if it's worth the increased speed. In general I would think that File.exists() is pretty fast in the first place, but there's really no way for us to determine if it will be fast enough for you. Seems like it would be pretty easy to just write the code yourself and see how fast it is. Using File.exists() is very simple, and using a HashSet is only a little more complex. It should be easy to change your code from using one to using the other, if you need to. Asking people here won't really answer your question, I think. Try it and see.
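
A rough sketch of the HashSet approach, for anyone who wants something to start timing. FileNameCache and its methods are hypothetical names invented for this example; the real calls are File.list(), HashSet.add() and HashSet.contains(). It assumes all files sit in one directory and that this process is the only one adding files. If other machines write to the same shared file system, the in-memory set can go stale.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory cache of the file names already in the system.
class FileNameCache {
    private final Set<String> names = new HashSet<>();

    // Load the existing names once, e.g. at startup.
    // File.list() returns null if dir is not a directory or cannot be read.
    static FileNameCache fromDirectory(File dir) {
        FileNameCache cache = new FileNameCache();
        String[] existing = dir.list();
        if (existing != null) {
            for (String name : existing) {
                cache.names.add(name);
            }
        }
        return cache;
    }

    boolean contains(String name) {
        return names.contains(name);
    }

    // Set.add() returns false when the element was already present,
    // so the check and the insert happen in a single call.
    boolean addIfAbsent(String name) {
        return names.add(name);
    }

    int size() {
        return names.size();
    }
}

class FileNameCacheDemo {
    public static void main(String[] args) {
        // Synthetic names standing in for the 500k real files.
        FileNameCache cache = new FileNameCache();
        for (int i = 0; i < 500_000; i++) {
            cache.addIfAbsent("article-" + i + ".txt");
        }

        long start = System.nanoTime();
        boolean isNew = cache.addIfAbsent("article-123456.txt");
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("new file? " + isNew + ", lookup took " + elapsedNanos + " ns");
    }
}
```

Timing the same lookup with new File(dir, name).exists() against the real file system would show whether the extra memory is worth it. A single nanoTime measurement like this is only a rough indication, so repeat it many times before drawing conclusions.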


"I'm not back." - Bill Harding, Twister
jonathan ford
Greenhorn

Joined: Nov 07, 2007
Posts: 9
Originally posted by Jim Yingst:
I would expect that a HashSet would be fastest. (I would never use a List for this.) But any Collection or Map will take some memory, so as Ulf indicated, you need to determine if the amount of memory required is acceptable, and if it's worth the increased speed. In general I would think that File.exists() is pretty fast in the first place, but there's really no way for us to determine if it will be fast enough for you. Seems like it would be pretty easy to just write the code yourself and see how fast it is. Using File.exists() is very simple, and using a HashSet is only a little more complex. It should be easy to change your code from using one to using the other, if you need to. Asking people here won't really answer your question, I think. Try it and see.


It helps a lot, I'll try it.
 