
Duplicate Files Remover

Imad Ali
Greenhorn

Joined: Jan 04, 2009
Posts: 21
First of all, LOL at this forum.


Anyway, I'm interested in GUI tweaking, mainly user-friendliness and related principles.


I need to demonstrate this on a duplicate files remover/scanner program.


I've done some work on it. Some information you need to know if you want to help me:
The program needs to be a Java application
It will run in a JRE
It's a standalone program

OK, let me go through the intended or expected end-user phases:
User loads the program and it comes up on screen
User clicks scan, and the application then scans the local disk
Duplicate files are deleted automatically (and hopefully displayed)

Please let me in on some information about how to better incorporate an MD5 checksum into a GUI.

Ideally I want to learn by example or snippet. Nothing too complicated please, because I want to know exactly what each chunk of code does.

thanks

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19697
    

First thing is determining how you can figure out file equality. A basic approach is checking the file length first; if they are equal compare the full contents.
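As a rough illustration of "length first, then full contents", here is a minimal sketch. The class name FileEquality and the method sameContent are made up for this example; it compares byte by byte through buffered streams, which is simple to follow but not tuned for speed.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, named for this example only.
class FileEquality {

    /** Returns true if both files have the same length and identical bytes. */
    static boolean sameContent(File a, File b) throws IOException {
        // Cheap check first: files of different length cannot be equal.
        if (a.length() != b.length()) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(new FileInputStream(a));
             InputStream inB = new BufferedInputStream(new FileInputStream(b))) {
            int byteA;
            // Read both files in step and stop at the first differing byte.
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // The first file ended; the second must end here as well.
            return inB.read() == -1;
        }
    }
}
```

Calling FileEquality.sameContent(fileA, fileB) returns false straight away when the lengths differ, so the expensive read only happens for files of equal size.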

You can use File.listFiles for browsing through your hard disk.
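A minimal sketch of walking a directory tree with File.listFiles might look like this. The class name FileWalker and its method names are invented for the example, and it does not guard against symbolic-link loops.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
class FileWalker {

    /** Collects all regular files below root, descending into subdirectories. */
    static List<File> collectFiles(File root) {
        List<File> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(File dir, List<File> result) {
        File[] children = dir.listFiles();
        // listFiles returns null if dir is not a directory or cannot be read.
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, result);   // recurse into the subdirectory
            } else if (child.isFile()) {
                result.add(child);        // plain file: remember it
            }
        }
    }
}
```

The null check matters in practice: unreadable system folders make listFiles return null rather than an empty array.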


Now I've done something similar, and here's my approach:
- use a Map<Long,List<File>> that stores the unique files per file size.
- when you encounter a file, get the List<File> for its length.
- compare the file with all elements of the List<File> (if not null)
- if it is equal to any of the files, process it as a duplicate; otherwise add it to the list (if the list was null, create it and put it in the map first)

The latter part in pseudo code:
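A minimal Java sketch of those steps, offered as an illustration and not as the code from the original post. The class name SizeBuckets and the method addOrFindDuplicate are made up for this example, and the content comparison simply reads both files into memory, which only suits small files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, named for this example only.
class SizeBuckets {

    // Unique files seen so far, grouped by file length.
    private final Map<Long, List<File>> uniqueBySize = new HashMap<>();

    /** Returns the earlier file that this one duplicates, or null if it is new. */
    File addOrFindDuplicate(File file) throws IOException {
        List<File> sameSize = uniqueBySize.get(file.length());
        if (sameSize == null) {
            // First file of this length: create the list and put it in the map.
            sameSize = new ArrayList<>();
            uniqueBySize.put(file.length(), sameSize);
        }
        for (File candidate : sameSize) {
            if (sameContent(candidate, file)) {
                return candidate;   // duplicate found; the caller decides what to do
            }
        }
        sameSize.add(file);         // no match: this file is unique so far
        return null;
    }

    // Simplistic comparison (an assumption of this sketch): loads both files fully.
    private static boolean sameContent(File a, File b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a.toPath()),
                             Files.readAllBytes(b.toPath()));
    }
}
```

Only files that share a length are ever compared, so most files are handled with a single map lookup.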


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12789
    
Please let me in on some information about how to better incorporate a Md5 checksum to a GUI


I don't see what the use of MD5 or any other checksum to determine file equality has to do with a GUI. Perhaps you could have a dialog which gives a choice of equality checking methods, but that's about it. Surely nobody needs to know the numeric value.

Bill
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1059
    

Many years ago I wrote a program to deal with duplicate files on a disk, and I still use it. My basic approach is to process a collection of 'roots' that will be scanned for duplicates. Starting at the roots, I recursively visit the file tree, creating a map that uses the SHA-1 digest of the file content as key (MD5 will do just as well) with a Set of file names as values. Files with the same content will produce the same digest. Of course it is possible that two files with different content will also have the same digest, BUT in the 10 years or so I have used the program it has never found two different files with the same digest.
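To make the digest-as-key idea concrete, here is a minimal sketch using the standard MessageDigest class. The class name DigestIndex and its method names are invented for this example, and it takes an already collected list of files instead of walking the roots itself. Passing "MD5" instead of "SHA-1" to MessageDigest.getInstance would switch the algorithm.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, named for this example only.
class DigestIndex {

    /** Returns the SHA-1 digest of the whole file as a lowercase hex string. */
    static String sha1Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            // Feed the file to the digest one buffer at a time.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Groups file names by digest; any Set with more than one name is a duplicate group. */
    static Map<String, Set<String>> index(Iterable<File> files)
            throws IOException, NoSuchAlgorithmException {
        Map<String, Set<String>> byDigest = new HashMap<>();
        for (File file : files) {
            String key = sha1Hex(file);
            byDigest.computeIfAbsent(key, k -> new TreeSet<>()).add(file.getPath());
        }
        return byDigest;
    }
}
```

After indexing, every map entry whose Set holds two or more names is a group of candidate duplicates to show to the user.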

In my first version I too wanted to automatically delete duplicates but I soon found that there are difficulties with doing this. First, given two or more files that have the same content which one(s) do you delete? Second, especially in HTML, it is frequently better to have duplicate files than to try to cross link the HTML sources. To get round these problems I allow the user to specify the minimum file length to consider and to be presented with a list of duplicate files and the user selects which one(s), if any, to delete.

Obviously, taking the SHA-1 digest means one has, in principle, to read the whole file. I found this to be unnecessary and ended up taking the digest of only the first 1,000,000 bytes. I do have a paranoid check: if I find two files with the same digest, I then check for absolute equality of the whole of the file content. This content comparison normally takes very little time, since one first checks for the same length, and only if the two files are the same length does one go further.
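Digesting only the start of a file can be sketched as below. The class name PartialDigest and the method digestOfPrefix are made up for this example; the limit is a parameter, so 1,000,000 would be passed to match the description above. Two files with equal prefix digests still need the full comparison described in the post.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class, named for this example only.
class PartialDigest {

    /** Returns the SHA-1 digest of at most the first maxBytes bytes of the file. */
    static byte[] digestOfPrefix(File file, long maxBytes)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            long remaining = maxBytes;
            while (remaining > 0) {
                // Never ask for more than the bytes still allowed by the limit.
                int wanted = (int) Math.min(buffer.length, remaining);
                int read = in.read(buffer, 0, wanted);
                if (read == -1) {
                    break;   // file is shorter than the limit
                }
                digest.update(buffer, 0, read);
                remaining -= read;
            }
        }
        return digest.digest();
    }
}
```

Files shorter than the limit are simply digested in full, so small files behave exactly as with a whole-file digest.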

The GUI is not complicated. Just a file selection system to select the roots, a JComboBox size selector and a JTable to present the results. On selecting a duplicate, one is given the option to delete it or move it to a backup area.
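As a small illustration of feeding results into a JTable, the sketch below turns a digest-to-file-names map into a table model. The class name DuplicateTableModel, the column names and the map layout are assumptions made for this example, not details from the program described above.

```java
import java.util.Map;
import java.util.Set;
import javax.swing.table.DefaultTableModel;

// Hypothetical class, named for this example only.
class DuplicateTableModel {

    /** Builds one row per duplicate file; groups with a single file are skipped. */
    static DefaultTableModel build(Map<String, Set<String>> filesByDigest) {
        DefaultTableModel model = new DefaultTableModel(new Object[] {"Group", "File"}, 0);
        int group = 1;
        for (Set<String> names : filesByDigest.values()) {
            if (names.size() < 2) {
                continue;   // a single file is not a duplicate
            }
            for (String name : names) {
                model.addRow(new Object[] {group, name});
            }
            group++;
        }
        return model;
    }
}
```

The model can then be shown with new JTable(DuplicateTableModel.build(filesByDigest)) inside a JScrollPane, leaving the delete or move decision to the user.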

 