Is there any query/code that can find common values in records?

Greenhorn
Posts: 2
Could you please point me to a simple, fast query (if one exists) or some fast code to convert my comma-separated CSV data file:

1,A,C,Z,F,G
2,G,Q,R,C,
3,Z,G,Q,
4,C,F,
5,O,P,
6,O,X,Y,J,
7,A,P,X,

My table has about 1,000,000 records like these 7 (in the real database, A, B, C, ... are words stored as strings). Records 1 and 2 share the values G and C, records 2 and 3 share G and Q, records 1 and 3 share Z and G, and so on.

I want to merge ("sync") records that share at least two values, as records 1, 2, 3 and 4 do (records 5, 6 and 7 share at most one value with any other record), and generate a list like this:

1 A C Z F G Q R
2 G Q R C A Z F
3 Z G Q A C F R
4 C F A Z G Q R
5 O P
6 O X Y J
7 A P X

In the end, after sorting each record's values, records 1 to 4 must be identical and the other three records stay unmerged:

1 A C F G Q R Z
2 A C F G Q R Z
3 A C F G Q R Z
4 A C F G Q R Z
5 O P
6 J O X Y
7 A P X

Maybe "sync" is not the right term for what I mean, so here is an example:

1 A C Z F G
2 G Q R C

Record 1 shares C and G with record 2. Record 1 lacks Q and R, so it becomes 1 A C Z F G + Q, R; record 2 lacks A, Z and F, so it becomes 2 G Q R C + A, Z, F. In the end we have:

1 A C Z F G Q R
2 G Q R C A Z F
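The merge step in this two-record example is just a set union. A minimal Java sketch, assuming each record is held as a list of strings; the class and method names are hypothetical, for illustration only. A LinkedHashSet keeps the original order, so the missing values are appended at the end exactly as above (a TreeSet would give the sorted form instead).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TwoRecordMerge {
    // Hypothetical helper: returns the values of 'target' followed by any
    // values of 'other' that 'target' does not already contain.
    static List<String> mergeInto(List<String> target, List<String> other) {
        Set<String> merged = new LinkedHashSet<>(target); // keeps insertion order, drops duplicates
        merged.addAll(other);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> r1 = Arrays.asList("A", "C", "Z", "F", "G");
        List<String> r2 = Arrays.asList("G", "Q", "R", "C");
        System.out.println("1 " + String.join(" ", mergeInto(r1, r2))); // 1 A C Z F G Q R
        System.out.println("2 " + String.join(" ", mergeInto(r2, r1))); // 2 G Q R C A Z F
    }
}
```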

I need all the records in their original order, from top to bottom. I wrote some Delphi code, but it is very slow. Someone suggested this Groovy code to me:

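// Map each value to the list of record IDs that contain it
def f = [:]
new File('Data.csv').readLines().each { line ->
    def items = line.split(',')
    def name
    items.eachWithIndex { String entry, int i ->
        if (i == 0) {
            name = entry            // first column is the record ID
        } else if (entry) {         // skip empty fields left by trailing commas
            if (!f[entry])
                f[entry] = []
            f[entry] << name
        }
    }
}
// Print only the values that appear in more than one record
println f.findAll { it.value.size() > 1 }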

It is very fast (I think because it uses a map), but it only finds the shared values; it does not merge the records.
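The value-to-records map built by that Groovy script is a good starting point. What it lacks is a count, for each pair of records, of how many values they share, plus a transitive merge: judging from the expected output, record 4 (C F) gains Q and R only through record 1, since it shares just C with record 2. A minimal Java sketch of that idea using a union-find (disjoint-set) structure, assuming the whole file fits in memory; the class and method names are hypothetical. One caveat: a value that appears in very many records makes the pair counting quadratic in the length of that value's list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MergeRecords {
    // Union-find root lookup over record indices, with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Hypothetical helper: takes CSV lines "id,v1,v2,..." and returns "id v..."
    // lines in input order, where each record carries the sorted union of all
    // records linked to it by a chain of pairs sharing >= minShared values.
    static List<String> merge(List<String> lines, int minShared) {
        int n = lines.size();
        String[] ids = new String[n];
        List<Set<String>> values = new ArrayList<>();
        Map<String, List<Integer>> index = new HashMap<>(); // value -> records containing it
        for (int i = 0; i < n; i++) {
            String[] items = lines.get(i).split(",");
            ids[i] = items[0];
            Set<String> vals = new LinkedHashSet<>();
            for (int k = 1; k < items.length; k++) {
                if (!items[k].isEmpty()) vals.add(items[k]); // skip empty fields
            }
            values.add(vals);
            for (String v : vals) index.computeIfAbsent(v, key -> new ArrayList<>()).add(i);
        }

        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        for (int i = 0; i < n; i++) {
            // Count shared values between record i and every later record j.
            Map<Integer, Integer> shared = new HashMap<>();
            for (String v : values.get(i)) {
                for (int j : index.get(v)) {
                    if (j > i) shared.merge(j, 1, Integer::sum);
                }
            }
            for (Map.Entry<Integer, Integer> e : shared.entrySet()) {
                if (e.getValue() >= minShared) {
                    parent[find(parent, i)] = find(parent, e.getKey()); // union
                }
            }
        }

        // Sorted union of values for each group, keyed by the group's root record.
        Map<Integer, TreeSet<String>> groups = new HashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), key -> new TreeSet<>()).addAll(values.get(i));
        }

        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(ids[i] + " " + String.join(" ", groups.get(find(parent, i))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("1,A,C,Z,F,G", "2,G,Q,R,C,", "3,Z,G,Q,",
                "4,C,F,", "5,O,P,", "6,O,X,Y,J,", "7,A,P,X,");
        merge(csv, 2).forEach(System.out::println);
    }
}
```

On the seven sample records this prints the sorted list from the question: records 1 to 4 become "A C F G Q R Z", and records 5, 6 and 7 keep their own (sorted) values.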
Marshal
Posts: 78651
Please explain how many duplicates you expect. One way to do it is to put each record's Strings into a Set<String> and work out the intersections of those Sets. But I am a bit worried that you will end up with 1,000,000 Sets in memory simultaneously, which will be expensive in terms of both memory consumption and performance. So it is quite likely there are better ways to do it.
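The Set intersection idea can be sketched like this, assuming each record's values are already loaded into a Set<String>; `sharedValues` is a hypothetical helper name. `retainAll` modifies the set it is called on, so the sketch copies one set first.

```java
import java.util.Set;
import java.util.TreeSet;

class SetIntersection {
    // Hypothetical helper: returns the values two records have in common,
    // without modifying either input set.
    static Set<String> sharedValues(Set<String> a, Set<String> b) {
        Set<String> common = new TreeSet<>(a); // copy, sorted for readable output
        common.retainAll(b);                   // keep only values also present in b
        return common;
    }

    public static void main(String[] args) {
        Set<String> r1 = Set.of("A", "C", "Z", "F", "G");
        Set<String> r2 = Set.of("G", "Q", "R", "C");
        Set<String> common = sharedValues(r1, r2);
        System.out.println(common);             // [C, G]
        System.out.println(common.size() >= 2); // true: records 1 and 2 should be merged
    }
}
```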
Are those Strings common words? If so, you can probably save space by interning every single String as soon as you read it from the database.
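To illustrate the interning suggestion: `String.intern()` returns a canonical instance from the JVM's string pool, so equal words read from different records end up as one shared object rather than one copy per record. A small sketch; `readValues` is a hypothetical helper, and in a real program the lines would come from the file or database rather than literals.

```java
class InternDemo {
    // Hypothetical helper: splits a CSV line and interns each value, so a word
    // repeated across many records is stored once instead of once per record.
    static String[] readValues(String line) {
        String[] items = line.split(",");
        for (int i = 0; i < items.length; i++) {
            items[i] = items[i].intern();
        }
        return items;
    }

    public static void main(String[] args) {
        String[] r1 = readValues("1,A,C,Z,F,G");
        String[] r2 = readValues("2,G,Q,R,C,");
        System.out.println(r1[5] == r2[1]); // true: both "G" values are the same object
    }
}
```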
Are you reading from a CSV file or a database? It is probably better to create an SQL query to look for duplicates, but I can't think how at the moment.

And welcome to the Ranch
sam Saam
Greenhorn
Posts: 2
Thank you for your time.
I really don't know how many duplicates it may contain. For now I read the CSV file, but I can load it into a MySQL database.
But how can I write a query for this when the values are not in the same column and I do not know where they are?
Those are not common words.
I thought maybe I could manage it with a NoSQL database like MongoDB.
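One common way to make such a query possible is to normalize the data first: store one row per (record, value) pair, so that every value sits in the same column. A minimal Java sketch of that conversion, assuming a hypothetical two-column table rec_value(rec_id, val); the self-join in the comment is standard SQL meant to list the pairs of records sharing at least two values, offered as an untested sketch rather than a tuned MySQL query.

```java
import java.util.ArrayList;
import java.util.List;

class Normalize {
    // Hypothetical helper: turns "id,v1,v2,..." into one "id,value" row per
    // non-empty value, e.g. for loading into a hypothetical table
    // rec_value(rec_id, val). Pairs of records sharing >= 2 values could then
    // be found with a self-join:
    //
    //   SELECT a.rec_id, b.rec_id, COUNT(*) AS shared_count
    //   FROM rec_value a
    //   JOIN rec_value b ON a.val = b.val AND a.rec_id < b.rec_id
    //   GROUP BY a.rec_id, b.rec_id
    //   HAVING COUNT(*) >= 2;
    static List<String> toRows(List<String> lines) {
        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            String[] items = line.split(",");
            for (int k = 1; k < items.length; k++) {
                if (!items[k].isEmpty()) rows.add(items[0] + "," + items[k]);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        toRows(List.of("1,A,C,Z,F,G", "2,G,Q,R,C,")).forEach(System.out::println);
    }
}
```

The query only returns directly linked pairs; because the expected output merges records transitively (record 4 through record 1), those pairs would still need a union-find step or repeated passes. An index on val should help the join on a table of this size.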