This week's book giveaways are in the Refactoring and Agile forums.
We're giving away four copies each of Re-engineering Legacy Software and Docker in Action and have the authors on-line!
See this thread and this one for details.
Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Agile forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Removing duplicate records

 
Rashmi Trivedi
Ranch Hand
Posts: 60
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi All
I am trying to write an java application which has DB2400 database. The aim of this application is following

Find the duplicate records in table
Write them into CSV file
Keep the first records and delete the rest.

eg if you find, 10 duplicate records for given criteria, keep the 1st one from the list and delete 9 records.

I need some help "Keep the first records and delete the rest"..Could you please give me some help with coding..

Thank you in advance..

regards,
Rashmi Trivedi
 
Maximilian Xavier Stocker
Ranch Hand
Posts: 381
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Rashmi Trivedi:
Hi All
I am trying to write an java application which has DB2400 database. The aim of this application is following

Find the duplicate records in table
Write them into CSV file
Keep the first records and delete the rest.

eg if you find, 10 duplicate records for given criteria, keep the 1st one from the list and delete 9 records.

I need some help "Keep the first records and delete the rest"..Could you please give me some help with coding..

Thank you in advance..

regards,
Rashmi Trivedi


Simplest way would be like this I think.

Create a new table (copy of old one)

Populate new table with all records of old table.

Truncate old table.

Optional - add primary key to old table. I would highly recommend this as to prevent duplicate records from coming back.

Now add records from new table back to old table but use a DISTINCT on the SELECT this time.

Then you have the old table with the duplicates removed and the new table has all the records and you can do as you like (dump to CSV and drop the table I suppose)
 
Rashmi Trivedi
Ranch Hand
Posts: 60
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Maxi
Thank you for the quick response..much appreciated..

in order to follow your suggestion, i have littl issue as table is quiet big and it is virtually impossible to copy the table and use distinct insert in this situation..

I will have to go by record set and determine form the recordset what stays the live table and what goes into CSV file.

once again..thank you so much for your response..

regards,
Rashmi Trivedi
 
Maximilian Xavier Stocker
Ranch Hand
Posts: 381
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well maybe but I would bet heavily that doing it all in SQL will be MUCH faster than processing it yourself in a recordset. Basically the more you can do directly in SQL the better off you will be.
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'd do this with a stored procedure. Open a cursor (use a select statement which orders the results), read through these results, insert each record into a new temp table unless the values have not changed from the last insert, in which case ignore them. This will be more performant than using a distinct, if the results are ordered the only comparison that is required is the last record, not every record in the table. I can't help you with the syntax of the procedure I'm afraid (I have almost no experience with DB2) but that's the logic I'd use.

Maximilian ofers some very good advice:

Optional - add primary key to old table. I would highly recommend this as to prevent duplicate records from coming back.

I'd add to this by saying its not optional. An entity without a primary key is not valid relational data, so has no place in your database.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic