Compare Two Text Files

Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Hello Friends,

In my earlier post I got help with how to process some logic and write the output to text files; thanks a lot for the help. Now I need to iterate through these two text files and find what is common to both of them. Is there an API I can use, or would you suggest calling the Unix compare command from Java code?
I have come up with an algorithm too; could you tell me whether it would work?


Comments are appreciated. Thanks a lot.

-Aditya
Siddhesh Deodhar
Ranch Hand

Joined: Mar 05, 2009
Posts: 117
A few corrections:

FileInputStream fstream2 = new FileInputStream("textfile1.txt"); -> you are reading the same file again; the second stream should open the second file.

if(strLine1 = strLine2) -> "=" is assignment, not comparison; use the .equals() method to compare the values of two objects.
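To make the point concrete: for Strings, "if (strLine1 = strLine2)" does not compile at all, and "==" compiles but compares object references rather than text. A minimal illustration; the class and method names are made up for the example.

```java
public class StringCompareDemo {
    // True only when both lines contain the same characters.
    static boolean sameText(String line1, String line2) {
        // equals() compares contents; the null check avoids a NullPointerException.
        return line1 != null && line1.equals(line2);
    }

    public static void main(String[] args) {
        String strLine1 = "Java";
        String strLine2 = new String("Java"); // same text, but a distinct object

        System.out.println(strLine1 == strLine2);         // false: different objects
        System.out.println(sameText(strLine1, strLine2)); // true: same contents
    }
}
```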

I don't know of any API that can be used to compare files directly in Java. If you want to find common lines, your code above is fine.
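For reference, here is a minimal sketch of the position-by-position comparison being discussed: line i of one file is compared only with line i of the other. The file names are placeholders, and as later replies point out, this approach misses common lines that sit at different positions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LockstepCompare {
    // Collects lines that are equal at the same position (line i vs line i).
    static List<String> matchingAtSamePosition(List<String> a, List<String> b) {
        List<String> matches = new ArrayList<>();
        int n = Math.min(a.size(), b.size()); // stop at the end of the shorter file
        for (int i = 0; i < n; i++) {
            if (a.get(i).equals(b.get(i))) {
                matches.add(a.get(i));
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder names; note these are two *different* files.
        List<String> a = Files.readAllLines(Paths.get("textfile1.txt"));
        List<String> b = Files.readAllLines(Paths.get("textfile2.txt"));
        matchingAtSamePosition(a, b).forEach(System.out::println);
    }
}
```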

Calling the Unix compare command from Java code is always the best option.


Good, Better, Best, Don't take rest until, Good becomes Better, and Better becomes Best.
Sidd : (SCJP 6 [90%] )
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Thanks Siddhesh,

I am in the process of implementing it. I will keep the thread updated.

-Aditya
Hardik Trivedi
Ranch Hand

Joined: Jan 30, 2010
Posts: 252
Hi,
I think there is a big mistake here, either on your side or on the repliers' side...
I think you want to find the list of words that are common to both files... right?

Then let me tell you there is no specific method or API for that.
Use your own algorithm:
fetch each word and compare it with all the words in the second file; if it matches anywhere, put it in an array of strings,
and finally return that array.
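A minimal sketch of that word-level interpretation, with one swap: it uses a LinkedHashSet and retainAll() instead of comparing every word against every other word and filling an array. It assumes a "word" is any run of non-whitespace characters and works on in-memory strings rather than files; the names are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CommonWords {
    // Splits text on runs of whitespace, dropping empty tokens.
    static Set<String> words(String text) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Words that occur in both texts, in first-text order, without duplicates.
    static Set<String> commonWords(String first, String second) {
        Set<String> common = words(first);
        common.retainAll(words(second)); // keep only words also found in the second text
        return common;
    }

    public static void main(String[] args) {
        System.out.println(commonWords("a friendly place for Java", "Java is a place"));
    }
}
```

With the sample strings above this prints [a, place, Java]. Matching is case-sensitive, so "java" and "Java" count as different words.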
Hardik Trivedi
Ranch Hand

Joined: Jan 30, 2010
Posts: 252
Hi, I found a very good program for you; please refer to this link:
http://www.sourcecodesworld.com/source/show.asp?ScriptID=836
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41599
Hardik Trivedi wrote:http://www.sourcecodesworld.com/source/show.asp?ScriptID=836

Taking a quick look at this, it seems rather unsophisticated. For example, it makes no allowance for missing or extra lines in one of the files. So if the first line of one file is missing, then *all* subsequent lines will be reported as different, even though they may be identical.

This is generally the realm of the "diff" command, which is available on all Unix/Linux boxes (as opposed to "compare", which is not).
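If shelling out is acceptable, Java's ProcessBuilder can run diff directly. This is a minimal sketch that assumes a Unix-like system with diff on the PATH; the file names are placeholders. It relies on the standard diff exit status: 0 means no differences, 1 means differences were found, and anything higher means an error.

```java
import java.io.IOException;

public class RunDiff {
    // Runs the external "diff" command on two files and returns its exit status.
    static int diff(String file1, String file2) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("diff", file1, file2);
        pb.redirectErrorStream(true);                        // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);  // show diff's output on our console
        Process p = pb.start();
        return p.waitFor();                                  // wait for diff to finish
    }

    public static void main(String[] args) throws Exception {
        int status = diff("textfile1.txt", "textfile2.txt"); // placeholder names
        System.out.println("diff exit status: " + status);
    }
}
```

Inheriting the output stream means the program never has to drain diff's output itself, which avoids blocking when the output is large.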


Ping & DNS - my free Android networking tools app
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

There are multiple Java implementations of diff, if that's what you actually need. A good diff algorithm goes *way* beyond the code you've posted.
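To show what "way beyond" means in practice, here is a small line diff based on the textbook longest-common-subsequence (LCS) technique. It is not necessarily what any particular Java diff library does. With this approach a missing first line costs one "-" entry instead of making every later line look different. The table uses O(n x m) memory, so it only suits small files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LcsDiff {
    // Line diff via an LCS table. Prefixes: "  " in both, "- " only in a, "+ " only in b.
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start of both files to emit the edit script.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i++)); // dropping a's line keeps the longer LCS
            } else {
                out.add("+ " + b.get(j++)); // b's line is extra
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("header", "one", "two", "three");
        List<String> b = Arrays.asList("one", "two", "three");
        diff(a, b).forEach(System.out::println);
    }
}
```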
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Hello Everyone,

I wanted to post an update on my question. The earlier code, shown below, does not work in all cases: when the lines at the same position in both files are the same, it displays that string, which is not the output I actually want.
For example, if the contents of the two text files are as below, the output should be

A
friendly
place
for
Java
greenhorns

whereas the output I get is "place". What changes should I make to the existing code? Comments are appreciated.






David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Why should the output be what you show above? What are you trying to accomplish?

In any case: you're printing the line if the two lines, one from each file, are the same. I'm not even sure why you're getting the line that says "place", since text2.txt has a lot of leading spaces.
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Hello David,

I want to print the lines which are common in both the text files.

-Aditya
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Then if the formatting above is correct, *none* of the lines should print, since they all differ in spacing.
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

I am sorry, that is true: it does not print anything, whereas it should print the lines common in both the text files.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

But it is--there *aren't* any common lines. Unless you're trying to say that you want to ignore whitespace.

You really need to be specific about your requirements, otherwise we're all just guessing at what you want, and that's not an efficient use of time.
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Hi David,

Sorry for the inconvenience, i want to consider the white space too and grep all the lines that are present in both the file irrespective of what line number they appear.

-Aditya
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

You're not grepping, you're diffing.

So really what you need to do is to keep track of each line in each file and compare them once you've read them in, right?

(That's a hint on how to proceed.)
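One way to follow that hint is to record, for every distinct line, the line numbers where it occurs in each file, and then report the lines that appear in both, regardless of position. This is a sketch that assumes exact, whitespace-sensitive matching. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LinePositions {
    // Maps each distinct line to the 1-based line numbers where it occurs.
    static Map<String, List<Integer>> index(List<String> lines) {
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            positions.computeIfAbsent(lines.get(i), k -> new ArrayList<>()).add(i + 1);
        }
        return positions;
    }

    // Describes every line found in both files, wherever it appears in each.
    static List<String> commonWithPositions(List<String> a, List<String> b) {
        Map<String, List<Integer>> inA = index(a);
        Map<String, List<Integer>> inB = index(b);
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : inA.entrySet()) {
            List<Integer> linesInB = inB.get(e.getKey());
            if (linesInB != null) {
                report.add(e.getKey() + " (a: " + e.getValue() + ", b: " + linesInB + ")");
            }
        }
        return report;
    }
}
```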
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11256

Aditya Sirohi wrote:i want to consider the white space too and grep all the lines that are present in both the file irrespective of what line number they appear.

This is a perfect example of why specs are so important. We have gone from

"get what is common to both files"

to an example (which by itself is fine, but is incomplete)

to "print the lines which are common in both the text files"

to " it should print the lines common in both the text files"

to " i want to consider the white space too and grep all the lines that are present in both the file irrespective of what line number they appear."

All these statements could mean slightly different things to different people. What does "consider the white space too" mean exactly? If file 'a' has "fred " and file 'b' has "fred", is that a match or not?
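To make that question concrete, here are three plausible meanings of "consider the white space", written as hypothetical matching policies. Which one applies is exactly the spec decision being asked about; the enum and its names are invented for illustration.

```java
public class LineMatch {
    // Hypothetical policies for how white space affects a match.
    enum Whitespace { EXACT, TRIM, COLLAPSE }

    static String normalize(String line, Whitespace policy) {
        switch (policy) {
            case TRIM:
                return line.trim();                          // ignore leading/trailing white space
            case COLLAPSE:
                return line.trim().replaceAll("\\s+", " ");  // also treat any run of white space as one space
            default:
                return line;                                 // EXACT: every character counts
        }
    }

    static boolean matches(String a, String b, Whitespace policy) {
        return normalize(a, policy).equals(normalize(b, policy));
    }

    public static void main(String[] args) {
        for (Whitespace p : Whitespace.values()) {
            System.out.println(p + ": " + matches("fred ", "fred", p));
        }
    }
}
```

For "fred " versus "fred" this prints EXACT: false, TRIM: true, COLLAPSE: true.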

If I am interpreting what you want correctly, and I am not sure I am, I think what you need to do is read a single line from file 'a', and see if it's in file 'b', using whatever restriction you need regarding white space.

Then, read the next line of file 'a' and compare it against every line again.

You can possibly make your program smarter by checking to make sure that you read from the shorter file, that you don't re-test a line if you've already looked for it (unless you need to know for some reason), and perhaps by using the right data structures to store some info.
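Here is one sketch of the "right data structures" idea, assuming exact matching. The lines of file 'b' go into a HashSet, so each lookup takes constant time on average instead of scanning the whole file. A second set remembers lines that were already tested, so duplicate lines in 'a' are not tested again. The file and method names are placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonLines {
    // Lines of a that also occur anywhere in b, in a's order, each reported once.
    static List<String> commonLines(List<String> a, List<String> b) {
        Set<String> linesOfB = new HashSet<>(b);   // average O(1) lookups
        Set<String> alreadySeen = new HashSet<>(); // skip lines of a already tested
        List<String> common = new ArrayList<>();
        for (String line : a) {
            // add() returns false if the line was seen before
            if (alreadySeen.add(line) && linesOfB.contains(line)) {
                common.add(line);
            }
        }
        return common;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names; each file is read exactly once.
        List<String> a = Files.readAllLines(Paths.get("a.txt"));
        List<String> b = Files.readAllLines(Paths.get("b.txt"));
        commonLines(a, b).forEach(System.out::println);
    }
}
```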

But the first thing I would do is nail down EXACTLY what you want in unambiguous terms.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Hello Fred,

I want to apologize for not being clear with my question.

All these statements could mean slightly different things to different people. What does "consider the white space too" mean exactly? If file 'a' has "fred " and file 'b' has "fred", is that a match or not?


Yes, if file 'a' has the word fred and file 'b' has the word fred, then it's a match.

I tried to write a piece of code, but it did not work. Comments are appreciated.

David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

As both Fred and I hinted, if order is not important, then simply looping over the lines isn't going to work--you need to be able to check all previous lines of the first file for each line in the second file. Can you think of some ways you might approach that?
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

I know what the code should look like, but I am finding it hard to implement.

The pseudo code i have in mind is:

1. Read file 'a' line by line.
2. for each line in file 'a', check whether is present in file 'b', if its there then print the line.
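Taken literally, those two steps map onto BufferedReader, FileReader and readLine(). This minimal sketch rescans file 'b' for every line of 'a'. That keeps it simple, but 'b' is read once per line of 'a', so it becomes slow for large files. The file names are placeholders.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LineByLine {
    // Step 2: true if the exact line occurs anywhere in the given file.
    static boolean fileContainsLine(String fileName, String wanted) throws IOException {
        try (BufferedReader b = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = b.readLine()) != null) {
                if (line.equals(wanted)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Step 1: read file a line by line and print each line that is also in file b.
    static void printCommonLines(String fileA, String fileB) throws IOException {
        try (BufferedReader a = new BufferedReader(new FileReader(fileA))) {
            String line;
            while ((line = a.readLine()) != null) {
                if (fileContainsLine(fileB, line)) {
                    System.out.println(line);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        printCommonLines("a.txt", "b.txt"); // placeholder file names
    }
}
```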


I think that should solve my main problem. If I could find out which constructors and methods I can use, or get a skeleton solution to the problem, I can work from there.

Thanks
Aditya
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Constructor for what?

In any case, I don't need to give you a skeleton--you just defined the skeleton by writing out the steps you need to take. So what's next? What's the easiest way you can think of to implement what you just described?
salvin francis
Ranch Hand

Joined: Jan 12, 2009
Posts: 928

Aditya Sirohi wrote:
2. for each line in file 'a', check whether is present in file 'b', if its there then print the line.


Let me rephrase that more precisely:

for each line in file 'a', iterate through ALL the lines in file 'b' and check its existence there

If you want a simple optimization,
load all lines of files 'a' and 'b' into two ArrayLists, A and B.

For each element in A, check its existence in B using contains().

CAUTION: The above optimization is not suitable if the files are large.


My Website: [Salvin.in] Cool your mind:[Salvin.in/painting] My Sally:[Salvin.in/sally]
salvin francis
Ranch Hand

Joined: Jan 12, 2009
Posts: 928

My approach would be to compute a simple hash of every line in A and B, store the hashes in an ArrayList as strings,
and then use the contains method to check existence.

However, then the complexity of hits and misses comes into the picture, and so optimizations (as usual) complicate a simple issue...
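The hit-and-miss problem comes from collisions: different strings can share a hash code. For example, "Aa" and "BB" both give 2112 with String.hashCode(). This sketch uses the hash only as a fast filter and confirms each candidate with equals(); the names are illustrative. HashSet<String> already does exactly this internally, so in practice storing the lines themselves in a HashSet is simpler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashedLines {
    // Groups the lines of b by hash code so a lookup compares only a few candidates.
    static Map<Integer, List<String>> buildIndex(List<String> b) {
        Map<Integer, List<String>> byHash = new HashMap<>();
        for (String line : b) {
            byHash.computeIfAbsent(line.hashCode(), h -> new ArrayList<>()).add(line);
        }
        return byHash;
    }

    // A hash match is only a candidate; equals() (via contains) decides a real hit.
    static boolean contains(Map<Integer, List<String>> index, String line) {
        List<String> candidates = index.get(line.hashCode());
        return candidates != null && candidates.contains(line);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        Map<Integer, List<String>> index = buildIndex(Arrays.asList("BB"));
        System.out.println(contains(index, "Aa")); // false: same hash, different text
    }
}
```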
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Hello All,

I have been working on this all day today and made some progress: I can now store all the lines of file 'a' in an array. Now I am trying to iterate over each element in the array and check whether it is present in file 'b'. I wanted to share the code I have so far. My code will look like a novice's; expert comments are appreciated.

Thanks
Aditya











Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Hello,
I have stored the contents of the two files in arrays, but when I try to compare them, I get a NullPointerException on this line: if(arrayLines1[i].contains(arrayLines2[j]))

Here is the code I have so far:












David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

You have the *capability* of reading in a thousand lines, but the files don't necessarily *contain* a thousand lines. So you don't want to check the length of the array--you want to check against how many lines the file actually has.
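Here is a sketch of that fix that keeps the fixed-size array. It counts the lines actually read and loops only up to that count, so the unused null slots are never touched. The 1000-line capacity mirrors the original code, and lines past the capacity are silently ignored. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class FixedArrayLines {
    static final int CAPACITY = 1000; // upper bound, as in the original code

    // Fills the array and returns how many slots actually hold lines.
    static int readInto(Reader source, String[] lines) throws IOException {
        int count = 0;
        try (BufferedReader r = new BufferedReader(source)) {
            String line;
            while (count < lines.length && (line = r.readLine()) != null) {
                lines[count++] = line;
            }
        }
        return count;
    }

    // Counts lines of a that also appear in b, looking only at filled slots.
    static int countCommon(String[] a, int countA, String[] b, int countB) {
        int common = 0;
        for (int i = 0; i < countA; i++) {       // not a.length: slots past countA are null
            for (int j = 0; j < countB; j++) {
                if (a[i].equals(b[j])) {
                    common++;
                    break;                        // found it; move on to the next line of a
                }
            }
        }
        return common;
    }
}
```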
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

I cannot get the common strings in the two arrays I created in the code above. This is what I have tried so far, but I don't get any output; I get an IOException. Am I doing anything wrong?

salvin francis
Ranch Hand

Joined: Jan 12, 2009
Posts: 928

David Newton pointed out a very serious problem in your solution:

1. you do not know the number of lines in the file
2. you have hard-coded it to 1000

What if a file contains only 10 lines?

In these situations it is best to use a collection, since collections can grow as new elements are added to them,
e.g. an ArrayList.
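Here is the same reading step with an ArrayList, which grows as lines are added, so there is no capacity to guess and no separate counter to maintain. This is a sketch, and the method name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class GrowingLines {
    // Reads every line; size() is always the real number of lines read.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(source)) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

For a file you would pass new FileReader("a.txt") (placeholder name). Iterating with for (String line : lines) also avoids the i <= length off-by-one mistake.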


Secondly, in your code:
i <= arrayLines1.length

should have been:
i < arrayLines1.length

I don't see any reason why those lines of code should throw an IOException;
perhaps you could paste the first 10 lines of the stack trace?

David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

I told you exactly what the problem was.
Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

Thanks to everyone; JavaRanch is an awesome place to learn. I would say I am a novice in programming, but when I get some feedback I get motivated to solve the problem. So I am posting the code below, which gives all the lines common to the two files. I still get the exception for lines 13 and 63. Comments are appreciated.












Aditya Sirohi
Ranch Hand

Joined: Jan 05, 2010
Posts: 93

I got it: I had to use for (int i = 0 ; i < arrayLines1.length ; i++) instead of for (int i = 0 ; i <= arrayLines1.length ; i++) in displayRecords().

Thanks, everyone. Marking the thread as resolved.

-Aditya
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

There's no need to read each file twice, but it's a good starting point. Congrats!
salvin francis
Ranch Hand

Joined: Jan 12, 2009
Posts: 928

Glad to be of assistance