
Is there any query/code that can find common values in records?

Posts: 2
Could you please point me to a simple, fast query (if one exists) or some fast code to convert my comma-separated CSV data file:


I have a table with ~1,000,000 records like the 7 records shown (in the real database, A, B, C, ... are words stored as strings). Records 1 and 2 have the values G and C in common, and the same kind of overlap holds for records 2 and 3, 1 and 3, and so on.

I want to sync records that share at least two values, like records 1, 2, 3 and 4 (records 5, 6 and 7 do not share at least two values with any other record), and generate a list like this:

1 A C Z F G Q R
2 G Q R C A Z F
3 Z G Q A C F R
4 C F A Z G Q R
5 O P
6 O X Y J
7 A P X

In the end, if we sort the values in each record, we must have 4 identical records, with the others left unsynced:

1 A C F G Q R Z
2 A C F G Q R Z
3 A C F G Q R Z
4 A C F G Q R Z
5 O P
6 J O X Y
7 A P X

Maybe I am not using the right term for what I mean, so please see this example:

1 A C Z F G
2 G Q R C

Record 1 has C and G in common with record 2. Record 1 lacks Q and R, so it must become 1 A C Z F G + Q and R. Record 2 lacks A, Z and F, so it must become 2 G Q R C + A, Z and F. In the end we have:

1 A C Z F G Q R
2 G Q R C A Z F

I need all records processed in order, from top to bottom. I wrote Delphi code, but it is very slow. Someone suggested this Groovy code:

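def f = [:]   // maps each value to the names of the records that contain it
new File('Data.csv').readLines().each {
    def items = it.split(',')
    def name
    items.eachWithIndex { String entry, int i ->
        if (i == 0) {
            name = entry              // assumed: the first field is the record name
        } else if (entry) {
            f.get(entry, []) << name  // skip empty fields; note where the value occurs
        }
    }
}

f.findAll { it.value.size() > 1 }     // keep values found in more than one record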

It is very fast (because it uses a map, I think), but it only finds the common values.
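One possible way to go from "finding common values" to the synced list is sketched below in Java. It reuses the value-to-records map idea, counts how many values each pair of records shares, and groups records that share at least two values with a union-find structure; every record in a group then gets the union of the group's values. The class name `RecordSync` and the method `sync` are hypothetical, and the sketch assumes that syncing is transitive and is decided on the original values only (the post does not say either way). The sample uses records 1 and 2 from the example above plus records 5, 6 and 7, which stay unchanged in the expected output.

```java
import java.util.*;

public class RecordSync {
    // Union-find lookup with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // records.get(i) holds the values of record i.
    // Returns the synced (and sorted) values for each record, in input order.
    static List<SortedSet<String>> sync(List<Set<String>> records) {
        int n = records.size();

        // Inverted index: value -> indices of the records that contain it.
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (String v : records.get(i)) {
                index.computeIfAbsent(v, k -> new ArrayList<>()).add(i);
            }
        }

        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        // For each record, count the values it shares with every later record
        // and join the two records when the count reaches two.
        for (int i = 0; i < n; i++) {
            Map<Integer, Integer> shared = new HashMap<>();
            for (String v : records.get(i)) {
                for (int j : index.get(v)) {
                    if (j > i) shared.merge(j, 1, Integer::sum);
                }
            }
            for (Map.Entry<Integer, Integer> e : shared.entrySet()) {
                if (e.getValue() >= 2) {
                    parent[find(parent, e.getKey())] = find(parent, i);
                }
            }
        }

        // Every record in a group receives the union of the group's values.
        Map<Integer, SortedSet<String>> merged = new HashMap<>();
        for (int i = 0; i < n; i++) {
            merged.computeIfAbsent(find(parent, i), k -> new TreeSet<>())
                  .addAll(records.get(i));
        }
        List<SortedSet<String>> result = new ArrayList<>();
        for (int i = 0; i < n; i++) result.add(merged.get(find(parent, i)));
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> records = List.of(
            Set.of("A", "C", "Z", "F", "G"),  // record 1
            Set.of("G", "Q", "R", "C"),       // record 2
            Set.of("O", "P"),                 // record 5
            Set.of("O", "X", "Y", "J"),       // record 6
            Set.of("A", "P", "X"));           // record 7
        sync(records).forEach(System.out::println);
    }
}
```

With this sample, records 1 and 2 both come out as [A, C, F, G, Q, R, Z] and the other three are unchanged. The cost depends on how often each value repeats: a value that occurs in a very large share of the 1,000,000 records makes its index list long, so the counting step can still become slow in that case.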
Posts: 68862
Please explain how many duplicates you will have. One way you can do it is to put each String into a Set<String> and work out the intersections of those Sets. But I am a bit worried that you will end up with 1,000,000 Sets in memory simultaneously, which will be expensive in terms of memory consumption and performance. So it is quite likely there will be better ways to do it.
Are those Strings common words? If so, you can probably save space by interning every single String as soon as you read it from the database.
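The Set and interning suggestions above can be sketched for a single pair of records as follows. This is only an illustration: the class `SharedValues` and its methods `parse` and `intersection` are hypothetical names, and it assumes each line has already had its record number stripped. `String.intern()` only saves memory when the same word occurs in many records.

```java
import java.util.*;

public class SharedValues {
    // Parses one comma-separated line into a set of interned strings,
    // so that equal words in different records share one String object.
    static Set<String> parse(String line) {
        Set<String> values = new HashSet<>();
        for (String field : line.split(",")) {
            String v = field.trim();
            if (!v.isEmpty()) values.add(v.intern());
        }
        return values;
    }

    // Returns the values present in both sets without modifying either.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common;
    }

    public static void main(String[] args) {
        Set<String> r1 = parse("A,C,Z,F,G");  // record 1 from the question
        Set<String> r2 = parse("G,Q,R,C");    // record 2 from the question

        // Sync the pair only when at least two values are shared.
        if (intersection(r1, r2).size() >= 2) {
            r1.addAll(r2);
            r2.addAll(r1);
        }
        System.out.println(new TreeSet<>(r1));
        System.out.println(new TreeSet<>(r2));
    }
}
```

Comparing every pair of records this way is quadratic in the number of records, so for 1,000,000 records it is more useful as a building block or a correctness check than as the whole solution.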
Are you reading from a CSV file or a database? It is probably better to create an SQL query to look for duplicates, but I can't think how at the moment.

And welcome to the Ranch
sam Saam
Posts: 2
Thank you for your time.
I really don't know how many duplicates there may be in it. Right now I read the CSV file, but I can build a MySQL database from it.
But how can I write a query for this when the values are not in the same column and I do not know where they are?
Those are not common words.
I thought maybe I could manage it with a NoSQL database like MongoDB.