
comparing two comma delimited files

Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
Hello All!

I have been trying to come up with the best possible way to compare the contents of two comma-delimited text files and possibly create a third file in which each line from the first file is joined with the line from the second file that has the same value in the first column. I worked on this with the help of Jeff Albrechtsen (thanks, Jeff!).
I have used HashMap, ArrayList, Set, Iterator, the regex utilities and other APIs suggested for JDK 1.4.2. What I have so far works, but to me the code looks ugly. I'm wondering if you gurus can suggest a better solution for what I want to accomplish. Eventually I will be manipulating certain columns of the final list built from the two files (e.g. totaling each column's values) to create the third file.
Here's the code, with the usage, input files and output:

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class CompareFiles {

    public static void main(String[] argv) {

        // Both file names are required; check up front instead of
        // catching ArrayIndexOutOfBoundsException
        if (argv.length < 2) {
            System.out.println("\nYou must specify the base file and the input file as arguments\n");
            System.out.println("USAGE: java CompareFiles <BaseFileName> <InputFileName>\n");
            return;
        }

        String basefile = argv[0];
        String inputfile = argv[1];
        List final_list = new ArrayList();

        // Calling the method to read files and create a HashMap object
        HashMap base_hash = readFile(basefile);
        HashMap input_hash = readFile(inputfile);

        System.out.println("Base Hash: " + base_hash);
        System.out.println("Input Hash: " + input_hash);

        // Iterating through the first input (base) file
        Set entries = base_hash.entrySet();
        for (Iterator it = entries.iterator(); it.hasNext(); ) {
            StringBuffer sb = new StringBuffer();
            Map.Entry entry = (Map.Entry) it.next();
            List val = (List) entry.getValue();
            String k = (String) entry.getKey();
            // Appending the key value to a StringBuffer object
            sb.append(k);

            Iterator iter = val.iterator();
            while (iter.hasNext()) {
                String str = (String) iter.next();
                // Appending the value to the same StringBuffer, prefixed with a comma
                sb.append(',').append(str);
            }

            // Looking up the matching line from the second input file;
            // a key missing from that file is skipped instead of causing
            // a NullPointerException
            List l = (List) input_hash.get(k);
            if (l != null) {
                Iterator i = l.iterator();
                while (i.hasNext()) {
                    String s = (String) i.next();
                    sb.append(',').append(s);
                }
            }

            // Add the joined line (as a String) to the result List
            final_list.add(sb.toString());
        }

        // Iterate through the list
        Iterator a = final_list.iterator();
        while (a.hasNext()) {
            System.out.println("Array List " + a.next());
        }
    }

    public static HashMap readFile(String file_name) {

        HashMap hm = new HashMap();
        BufferedReader reader = null;
        try {
            // Reading the file one line at a time
            reader = new BufferedReader(new FileReader(file_name));

            // Setting delimiter
            Pattern p = Pattern.compile(",");

            String line = reader.readLine();
            while (line != null) {
                // Parsing each line by delimiter
                String[] result = p.split(line);
                // Storing the first value from the String array as the key
                String key = result[0];
                // Rest of the String array will be the value
                List value = new ArrayList();
                for (int i = 1; i < result.length; i++) {
                    value.add(result[i]);
                }
                hm.put(key, value);
                line = reader.readLine();
            }
        } catch (IOException ex) {
            System.out.println(ex);
        } finally {
            // Always release the file handle, even if reading failed
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                }
            }
        }
        return hm;
    }
}

Input data files -

input1 :

1,abc,cde,efg
2,ghi,jkl, lmn
3,nop,pqr,stw

input2 :

1,111,cde,efg
2,222,jkl, lmn
3,333,stv,lmn

% java CompareFiles input1 input2
Base Hash: {3=[nop, pqr, stw], 2=[ghi, jkl, lmn], 1=[abc, cde, efg]}
Input Hash: {3=[333, stv, lmn], 2=[222, jkl, lmn], 1=[111, cde, efg]}
Array List 3,nop,pqr,stw,333,stv,lmn
Array List 2,ghi,jkl, lmn,222,jkl, lmn
Array List 1,abc,cde,efg,111,cde,efg
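For the column totals mentioned at the start of this post, here is a minimal sketch of one approach. The `ColumnTotals` class and its `sumColumn` method are hypothetical names; the sketch assumes Java 5 or later (generics, for-each), 0-based column indexes, and that fields which are not integers (such as `abc`) should simply be skipped rather than stop the whole total.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: totals one numeric column of the joined lines.
class ColumnTotals {

    // Sums the integer values found at columnIndex (0-based) in each
    // comma-delimited line. Lines that are too short, or whose field is
    // not an integer, are skipped.
    static long sumColumn(List<String> lines, int columnIndex) {
        long total = 0;
        for (String line : lines) {
            String[] fields = line.split(",");
            if (columnIndex >= fields.length) {
                continue;
            }
            try {
                // trim() copes with fields written as ", 123"
                total += Long.parseLong(fields[columnIndex].trim());
            } catch (NumberFormatException e) {
                // not a number (e.g. "abc"): ignore this field
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The three joined lines printed by CompareFiles above
        List<String> joined = Arrays.asList(
                "3,nop,pqr,stw,333,stv,lmn",
                "2,ghi,jkl, lmn,222,jkl, lmn",
                "1,abc,cde,efg,111,cde,efg");
        // Column 4 holds the second file's numeric field: 333 + 222 + 111
        System.out.println(sumColumn(joined, 4)); // prints 666
    }
}
```

To total every column, call `sumColumn` once per column index.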
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
If the files are sorted like your examples, you can use this method:

You can translate this into almost exactly that many lines of Java (plus a few curly braces) and use it for a million kinds of file compare or merge programs.
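The following is a sketch of a generic sorted merge ("match/merge") on the first field, not Stan's exact outline. It assumes both inputs are already sorted by key in `String.compareTo` order (so `10` sorts before `2` unless keys are zero-padded) and that each key appears at most once per file. `SortedMerge`, `keyOf` and `restOf` are hypothetical names; Java 7 or later is assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch of a sorted merge of two comma-delimited inputs on their first field.
class SortedMerge {

    // Returns the first comma-delimited field of a line.
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // Returns the part of a line after the first comma ("" if there is none).
    static String restOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? "" : line.substring(comma + 1);
    }

    static List<String> merge(Reader base, Reader input) throws IOException {
        BufferedReader a = new BufferedReader(base);
        BufferedReader b = new BufferedReader(input);
        List<String> out = new ArrayList<>();

        String lineA = a.readLine();
        String lineB = b.readLine();
        // Once either input runs out, no further matches are possible
        while (lineA != null && lineB != null) {
            int cmp = keyOf(lineA).compareTo(keyOf(lineB));
            if (cmp == 0) {
                // Same key: join the base line with the rest of the input line
                out.add(lineA + "," + restOf(lineB));
                lineA = a.readLine();
                lineB = b.readLine();
            } else if (cmp < 0) {
                // Base key is smaller, so it has no partner: advance the base
                lineA = a.readLine();
            } else {
                // Input key is smaller, so it has no partner: advance the input
                lineB = b.readLine();
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String input1 = "1,abc,cde,efg\n2,ghi,jkl, lmn\n3,nop,pqr,stw\n";
        String input2 = "1,111,cde,efg\n2,222,jkl, lmn\n3,333,stv,lmn\n";
        for (String line : merge(new StringReader(input1), new StringReader(input2))) {
            System.out.println(line);
        }
    }
}
```

Unlike the HashMap version, this keeps only one line from each file in memory at a time. With real files, pass `FileReader` objects instead of `StringReader` and close them afterwards.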


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
Thanks, Stan, for your reply. I am still learning Java. Would it be possible to show some code using your algorithm?

Thanks again!
Stuart Ash
Ranch Hand

Joined: Oct 07, 2005
Posts: 637
Originally posted by Tariq Ahsan:
Hello All!

I have been trying to come up with the best possible way to compare the contents of two comma delimited text files and create possibly a third file with the outputs appending each line from both the files based on the first column value match. With the help of Jeff Albrechtsen (Thanks Jeff!)



Can you post the contents of the two text files and show how the resulting file should look? I might have an answer then.


ASCII silly question, Get a silly ANSI.
Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
Stuart,

As I mentioned in my initial posting, the two input files and the output would look something like this:


Input data files -

input1 :

1,abc,cde,efg
2,ghi,jkl, lmn
3,nop,pqr,stw

input2 :

1,111,cde,efg
2,222,jkl, lmn
3,333,stv,lmn

% java CompareFiles input1 input2
Base Hash: {3=[nop, pqr, stw], 2=[ghi, jkl, lmn], 1=[abc, cde, efg]}
Input Hash: {3=[333, stv, lmn], 2=[222, jkl, lmn], 1=[111, cde, efg]}
Array List 3,nop,pqr,stw,333,stv,lmn
Array List 2,ghi,jkl, lmn,222,jkl, lmn
Array List 1,abc,cde,efg,111,cde,efg

output :

1,abc,cde,efg,111,cde,efg
2,ghi,jkl, lmn,222,jkl, lmn
3,nop,pqr,stw,333,stv,lmn
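The program above prints the lines in HashMap order (3, 2, 1), while the desired output is sorted by key. A minimal sketch of one way to get that order and write the third file, assuming Java 7 or later (on JDK 1.4.2, drop the generics, diamond and for-each): a `TreeMap` keeps its keys sorted, and only keys present in both inputs are written. `JoinToFile`, `load` and `join` are hypothetical names. Note that `TreeMap` sorts `String` keys lexically, and a repeated key keeps only its last line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Sketch: join two comma-delimited inputs on the first field, output in key order.
class JoinToFile {

    // Maps each line's first field to the rest of the line (after the first comma).
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> map = new TreeMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // no delimiter: skip blank or malformed lines
            }
            map.put(line.substring(0, comma), line.substring(comma + 1));
        }
        return map;
    }

    // Writes "key,baseFields,inputFields" for every key present in both inputs.
    static void join(Reader base, Reader input, Writer target) throws IOException {
        Map<String, String> baseMap = load(base);
        Map<String, String> inputMap = load(input);
        PrintWriter out = new PrintWriter(target);
        for (Map.Entry<String, String> e : baseMap.entrySet()) {
            String match = inputMap.get(e.getKey());
            if (match != null) {
                out.println(e.getKey() + "," + e.getValue() + "," + match);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        // input2 deliberately out of order: the TreeMap sorts it anyway
        join(new StringReader("1,abc,cde,efg\n2,ghi,jkl, lmn\n3,nop,pqr,stw\n"),
             new StringReader("3,333,stv,lmn\n1,111,cde,efg\n2,222,jkl, lmn\n"),
             result);
        System.out.print(result);
    }
}
```

To produce the actual output file, pass `new FileReader("input1")`, `new FileReader("input2")` and a `FileWriter`, and close the writer afterwards (for example with try-with-resources).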

Thanks

Tariq
Michael Swierczek
Ranch Hand

Joined: Oct 07, 2005
Posts: 107
Tariq Ahsan,

I'm not that experienced with Java myself, but I do have a few minor suggestions.

I find myself coding the following things so often that I put together a separate Java package with static functions to handle them:
public static HashMap textFileIntoHashMap (String fileName);
public static ArrayList textFileIntoArrayList (String fileName);
public static String textFileIntoString (String fileName);
public static boolean arrayListIntoTextFile(ArrayList al, String fileName);
public static boolean hashMapIntoTextFile(HashMap hm, String fileName);
public static boolean stringIntoTextFile(String fileContents, String fileName);

If you anticipate tackling many projects like this one, you might want to do something similar with your file reading and writing code. You may also want to do the same with the code that splits a String into a String[] based on a delimiter (like ",").
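As an illustration of that idea, here is one possible implementation of two of the helpers listed above plus a split helper. The method names `textFileIntoArrayList` and `arrayListIntoTextFile` follow the list; their bodies, the generic types, the `throws IOException` on the reader, and the `splitOnDelimiter` helper are hypothetical, assuming Java 7 or later (try-with-resources).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.regex.Pattern;

// Sketch of reusable text-file helpers; bodies are illustrative only.
class TextFileUtils {

    // Reads every line of a text file into a list, one element per line.
    // Declares IOException rather than hiding read errors.
    public static ArrayList<String> textFileIntoArrayList(String fileName) throws IOException {
        ArrayList<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes each element on its own line; returns false on any I/O error.
    public static boolean arrayListIntoTextFile(ArrayList<String> al, String fileName) {
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName))) {
            for (String line : al) {
                out.println(line);
            }
            // checkError() flushes and reports whether any write failed
            return !out.checkError();
        } catch (IOException e) {
            return false;
        }
    }

    // Hypothetical split helper: splits on a literal delimiter and trims each field.
    // Pattern.quote stops delimiters like "|" or "." being treated as regex syntax;
    // the -1 limit keeps trailing empty fields.
    public static String[] splitOnDelimiter(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }
}
```

For example, `splitOnDelimiter("2,ghi,jkl, lmn", ",")` yields the four fields `2`, `ghi`, `jkl` and `lmn`, with the stray space before `lmn` removed.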
Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
Thanks, Michael, for your suggestions. I will try to incorporate them into the final version of the code once I get to that stage.
Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
Wondering if anyone has a better solution than what I have now.
Some code snippets would be much appreciated.

Thanks in advance.

Tariq