File content comparison

 
Greenhorn
Posts: 26
Hi,
This is my current predicament: I need to compare the contents of two files and store the entries that are missing from fileA but present in fileB in fileC. How would I go about comparing the file entries? Let's say the files are in comma-delimited CSV format.
Thanks in advance
Dmitry
 
Wanderer
Posts: 18671
Well, there are a number of possible strategies. I think we first need some more info about your problem.
Does each line have something that serves as a unique key? So that you can look at line 15 of fileA and line 22 of fileB, and tell that they refer to the same record, even though some of the other data on the line has changed?
Are the files sorted according to this key? Or more generally, are they sorted in any way?
Are these files particularly large? Is it feasible to store all the data from at least one of the files in memory somehow (e.g. in a HashMap or TreeMap) while performing the comparison?
You mention the possibility of lines which are in fileB but not fileA. Will you also need to detect lines which are in fileA but not fileB? Or lines which are in both, but some of the data in the line has changed?
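To make the in-memory option concrete, here is a rough sketch, not a definitive answer, since it depends on how you answer the questions above. It assumes the first comma-separated field of each line is a unique key, that the files are small enough to read into memory, and that only lines present in fileB but absent from fileA are wanted. The class name CsvDiff and the methods keyOf and missingFromA are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CsvDiff {

    // Assumption: the first comma-separated field is the record's unique key.
    // No quoting or escaping is handled; a real CSV parser would be needed for that.
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return (comma < 0 ? line : line.substring(0, comma)).trim();
    }

    // Returns the lines of B whose key does not occur in A, in B's original order.
    static List<String> missingFromA(Iterable<String> linesA, Iterable<String> linesB) {
        Set<String> keysInA = new HashSet<>();
        for (String line : linesA) {
            if (!line.trim().isEmpty()) {
                keysInA.add(keyOf(line));
            }
        }
        List<String> missing = new ArrayList<>();
        for (String line : linesB) {
            if (!line.trim().isEmpty() && !keysInA.contains(keyOf(line))) {
                missing.add(line);
            }
        }
        return missing;
    }

    // Usage: java CsvDiff fileA fileB fileC
    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java CsvDiff fileA fileB fileC");
            return;
        }
        List<String> a = Files.readAllLines(Paths.get(args[0]));
        List<String> b = Files.readAllLines(Paths.get(args[1]));
        Files.write(Paths.get(args[2]), missingFromA(a, b));
    }
}
```

Only the keys of fileA need to stay in memory; the sketch reads both files whole just to keep it short. If the files are large but already sorted by key, walking through both in parallel would avoid holding either one in memory.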
 
Don't get me started about those stupid light bulbs.