Finding Duplicate Records

 
Corey McGlone
Ranch Hand
Posts: 3271
I'm trying to clean up a database table that contains duplicate records. I want to put in a constraint so that we won't have duplicates in the future, but before I can put the constraint in place, I need to remove all the existing duplicates. I have a table that looks like this:



In this case, I'd like to locate the records that look like this:



How can I write a query to do this? I'm a little stumped on the SQL. Also, is there an easy way to remove the duplicates so I'm left with only one of each, or is that best done manually?

Thanks.
 
Jeanne Boyarsky
author & internet detective
Posts: 41860
Corey,
The easiest way I can think of is to copy all the records to a temp table and then re-insert them using select distinct.
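
Roughly, that idea might look like this (just a sketch; "mytable" is a placeholder name, and the exact syntax for creating a temp table varies from database to database):

-- copy the distinct rows aside, empty the original table, then reload it
create table mytable_temp as select distinct * from mytable;
delete from mytable;
insert into mytable select * from mytable_temp;
drop table mytable_temp;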

While I can imagine a way of identifying duplicates (using group by and having), I can't think of a way to delete them in a non-database-specific way. Of course, you probably don't care about that as you are dealing with a specific database. Which one?
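
The identification part might look something like this (again only a sketch, with placeholder names; SomeID stands in for whichever column is duplicated):

-- list each value that appears more than once, with its count
select SomeID, count(*) as occurrences
from mytable
group by SomeID
having count(*) > 1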

I'm curious if anyone has a reply that doesn't involve a temp table.
 
Paul Campbell
Ranch Hand
Posts: 338

Originally posted by Jeanne Boyarsky:
Corey,
The easiest way I can think of is to copy all the records to a temp table and then re-insert them using select distinct.

While I can imagine a way of identifying duplicates (using group by and having), I can't think of a way to delete them in a non-database-specific way. Of course, you probably don't care about that as you are dealing with a specific database. Which one?

I'm curious if anyone has a reply that doesn't involve a temp table.



ahh but Jeanne... where's the challenge in that? :p

Use a subquery with an aggregate function such as MIN to arbitrarily choose the ID to retain:

delete from mytable
where myPK not in ( select min(myPK)
                    from mytable
                    group by SomeID )
[ November 27, 2007: Message edited by: Paul Campbell ]
 
Corey McGlone
Ranch Hand
Posts: 3271

Originally posted by Paul Campbell:
delete from mytable
where myPK not in ( select min(myPK)
                    from mytable
                    group by SomeID )



That worked great, Paul, and after seeing what you presented, I'm surprised I hadn't thought of it myself - I had done something very similar in another query I had written. Thanks for the help.
 
Paul Campbell
Ranch Hand
Posts: 338
you're welcome... it is one of those solutions that a lot of very, very smart people find hard to visualize when they first encounter it, since it requires you to think counter to how most of us solve problems.
 
steve souza
Ranch Hand
Posts: 862
Just wanted to make a further point for anyone who may read this post. It is usually easy enough to write a query to delete the duplicates; however, in most cases you don't simply want to delete a record based on an aggregate function.

It often requires analysis to know which of the dupes should be removed. For example, you may have the same product several times, each with a different price. The one to keep is the one with the right price, and figuring that out takes analysis. God help you if you have many dupes in a large table.
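
For instance, if the "right" price lives in some reference table, the delete has to key off that instead of an arbitrary min(). A sketch of what that might look like, where product, price_list, and the column names are all hypothetical, and every product is assumed to have at least one row with the correct price:

-- keep, for each product_code, one row whose price matches the trusted price list,
-- and delete every other copy (hypothetical table and column names)
delete from product
where id not in ( select min(p.id)
                  from product p
                  join price_list pl
                    on pl.product_code = p.product_code
                   and pl.price = p.price
                  group by p.product_code )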

That is why it is so important to not allow duplicates in the first place.
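
Once the existing dupes are cleaned out, the constraint Corey mentioned could be as simple as something like this (a sketch, reusing the placeholder names from the earlier queries):

-- prevent new duplicates on SomeID from being inserted
alter table mytable
  add constraint uq_mytable_someid unique (SomeID)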
 
Paul Campbell
Ranch Hand
Posts: 338

Originally posted by steve souza:
Just wanted to make a further point for anyone who may read this post. It is usually easy enough to write a query to delete the duplicates; however, in most cases you don't simply want to delete a record based on an aggregate function.

It often requires analysis to know which of the dupes should be removed. For example, you may have the same product several times, each with a different price. The one to keep is the one with the right price, and figuring that out takes analysis. God help you if you have many dupes in a large table.

That is why it is so important to not allow duplicates in the first place.



Steve, the example is only intended to solve the problem of duplicate rows. It isn't intended to solve the problem of duplicated identifiers or all possible instances of data duplication... though the logic would likely be the same in your example... you identify what you're keeping and delete the rest.
 
Jeanne Boyarsky
author & internet detective
Posts: 41860

Originally posted by Paul Campbell:
you're welcome... it is one of those solutions that a lot of very, very smart people find hard to visualize when they first encounter it, since it requires you to think counter to how most of us solve problems.


But I bet they then remember it forever. That's a great solution. I look forward to citing it in the future.
 