aspose file tools*
The moose likes JDBC and the fly likes Removing duplicate records Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Removing duplicate records " Watch "Removing duplicate records " New topic
Author

Removing duplicate records

Rashmi Trivedi
Ranch Hand

Joined: Aug 13, 2001
Posts: 60
Hi All
I am trying to write an java application which has DB2400 database. The aim of this application is following

Find the duplicate records in table
Write them into CSV file
Keep the first records and delete the rest.

eg if you find, 10 duplicate records for given criteria, keep the 1st one from the list and delete 9 records.

I need some help "Keep the first records and delete the rest"..Could you please give me some help with coding..

Thank you in advance..

regards,
Rashmi Trivedi
Maximilian Xavier Stocker
Ranch Hand

Joined: Sep 20, 2005
Posts: 381
Originally posted by Rashmi Trivedi:
Hi All
I am trying to write an java application which has DB2400 database. The aim of this application is following

Find the duplicate records in table
Write them into CSV file
Keep the first records and delete the rest.

eg if you find, 10 duplicate records for given criteria, keep the 1st one from the list and delete 9 records.

I need some help "Keep the first records and delete the rest"..Could you please give me some help with coding..

Thank you in advance..

regards,
Rashmi Trivedi


Simplest way would be like this I think.

Create a new table (copy of old one)

Populate new table with all records of old table.

Truncate old table.

Optional - add primary key to old table. I would highly recommend this as to prevent duplicate records from coming back.

Now add records from new table back to old table but use a DISTINCT on the SELECT this time.

Then you have the old table with the duplicates removed and the new table has all the records and you can do as you like (dump to CSV and drop the table I suppose)
Rashmi Trivedi
Ranch Hand

Joined: Aug 13, 2001
Posts: 60
Hi Maxi
Thank you for the quick response..much appreciated..

in order to follow your suggestion, i have littl issue as table is quiet big and it is virtually impossible to copy the table and use distinct insert in this situation..

I will have to go by record set and determine form the recordset what stays the live table and what goes into CSV file.

once again..thank you so much for your response..

regards,
Rashmi Trivedi
Maximilian Xavier Stocker
Ranch Hand

Joined: Sep 20, 2005
Posts: 381
Well maybe but I would bet heavily that doing it all in SQL will be MUCH faster than processing it yourself in a recordset. Basically the more you can do directly in SQL the better off you will be.
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

I'd do this with a stored procedure. Open a cursor (use a select statement which orders the results), read through these results, insert each record into a new temp table unless the values have not changed from the last insert, in which case ignore them. This will be more performant than using a distinct, if the results are ordered the only comparison that is required is the last record, not every record in the table. I can't help you with the syntax of the procedure I'm afraid (I have almost no experience with DB2) but that's the logic I'd use.

Maximilian ofers some very good advice:

Optional - add primary key to old table. I would highly recommend this as to prevent duplicate records from coming back.

I'd add to this by saying its not optional. An entity without a primary key is not valid relational data, so has no place in your database.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Removing duplicate records