comparing records

 
S Shehab
Ranch Hand
Posts: 493
Hi guys,

I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records. What is the fastest way, from a performance perspective, to do this in Java?
 
Jim Hoglund
Ranch Hand
Posts: 525
I would sort both sets of records independently, and then step
through them in parallel to answer your question.
Jim ...
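A minimal sketch of the sort-then-step idea, assuming each record can be reduced to a single long key (the class name, method name and sample keys are made up for illustration). After sorting, one forward pass over both arrays is enough, because the position in the larger array never has to move backwards.

```java
import java.util.Arrays;

public class SortedContainment {
    // Returns true if every key in `small` also occurs in `large`.
    // Both arrays are sorted in place, then walked in parallel.
    static boolean containsAll(long[] large, long[] small) {
        Arrays.sort(large);
        Arrays.sort(small);
        int i = 0; // current position in `large`; only ever moves forward
        for (long key : small) {
            // Skip keys in `large` that are smaller than the one we need.
            while (i < large.length && large[i] < key) {
                i++;
            }
            // Either `large` is exhausted or the key is not there.
            if (i == large.length || large[i] != key) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] six = {42, 7, 19, 3, 88, 7}; // stand-in for the 6 million keys
        long[] one = {19, 3, 7};            // stand-in for the 1 million keys
        System.out.println(containsAll(six, one));                // true
        System.out.println(containsAll(six, new long[] {3, 4}));  // false
    }
}
```

If the records arrive already sorted, the two Arrays.sort calls can be dropped and only the linear pass remains.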
 
Rob Spoor
Sheriff
Posts: 22649

Sherif Shehab wrote: I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records. What is the fastest way, from a performance perspective, to do this in Java?


If you're talking about database records, then letting the database handle it is probably the best solution. Use an INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN to combine the two tables.

For instance, to get the number of records that are in the 1 million but not in the 6 million (let's call them tables "one" and "six"):
This will link the two tables on the two fields specified in the JOIN clause. Every record of table "one" without a matching record in table "six" will have NULL values for all columns of table "six". The WHERE clause then selects only these unmatched records, and COUNT(*) returns how many there are.


If they're not database records, use Maps.
 
S Shehab
Ranch Hand
Posts: 493

if they're not database records, use Maps.



What about Sets vs. Maps from a performance perspective? What I'm thinking of is using a HashSet for each of the two groups of data ("one" and "six", as you named them) to make sure there is no duplication, and then checking whether the "six" HashSet contains everything in the "one" HashSet. What do you think?
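The idea described above could look roughly like this. It is only a sketch: it assumes each record boils down to a String key, and the class name, method name and sample keys are invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HashSetContainment {
    // True if every key in `one` is also present in `six`.
    // containsAll probes `six` once per element of `one`.
    static boolean sixContainsOne(Set<String> six, Set<String> one) {
        return six.containsAll(one);
    }

    public static void main(String[] args) {
        Set<String> six = new HashSet<>(Arrays.asList("A1", "B2", "C3", "D4"));
        // The duplicate "B2" is dropped when the set is built.
        Set<String> one = new HashSet<>(Arrays.asList("B2", "D4", "B2"));
        System.out.println(one.size());               // 2
        System.out.println(sixContainsOne(six, one)); // true
    }
}
```

Only "six" strictly needs to be a HashSet for the lookup; "one" could be any Collection, or be read record by record, if removing its duplicates is not required.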
 
Rob Spoor
Sheriff
Posts: 22649
It depends on how you wrote your equals() and hashCode() methods. If they are not compatible, a HashMap is better, with the common keys as the map's keys and the records themselves as the values.

If you use a Set or Map, check out the bulk methods containsAll, removeAll and retainAll defined in java.util.Collection. Map has methods keySet(), values() and entrySet() you can use to get a Collection (Set extends Collection).
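As a rough illustration of the Map approach with the bulk methods, here is a sketch that indexes records by a common key and then works on the key sets. The Rec class, its id field, and the helper names are hypothetical; the real records and their key will differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class KeyedComparison {
    // Hypothetical record type that does not override equals() or hashCode().
    static final class Rec {
        final String id;      // the common key the two groups are compared on
        final String payload;
        Rec(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Index records by their key so lookups do not depend on Rec.equals().
    static Map<String, Rec> index(Rec... recs) {
        Map<String, Rec> byId = new HashMap<>();
        for (Rec r : recs) {
            byId.put(r.id, r);
        }
        return byId;
    }

    // Keys present in `one` but absent from `six`.
    // The key set is copied first, because removeAll on keySet() itself
    // would remove the entries from the underlying map.
    static Set<String> missingKeys(Map<String, Rec> six, Map<String, Rec> one) {
        Set<String> missing = new HashSet<>(one.keySet());
        missing.removeAll(six.keySet());
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Rec> six = index(new Rec("a", "x"), new Rec("b", "y"), new Rec("c", "z"));
        Map<String, Rec> one = index(new Rec("b", "y"), new Rec("d", "w"));
        System.out.println(six.keySet().containsAll(one.keySet())); // false
        System.out.println(missingKeys(six, one));                  // [d]
    }
}
```

containsAll answers the yes/no question, while the removeAll variant also tells you which keys are missing.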
 
Christophe Verré
Sheriff
Posts: 14691
You don't plan to hold these millions of keys in a Map, do you?
 
S Shehab
Ranch Hand
Posts: 493
I think I'll go for Sets. But which is faster for iterating over a Set: a for loop or an iterator? Or are both the same?
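For reference, an enhanced for loop over a Set is compiled into calls on the Set's Iterator, so the two forms should do essentially the same work. A Set has no index-based get(), so a counting for loop is not an option in any case. The sketch below writes both forms side by side; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class SetIteration {
    // Enhanced for loop: the compiler expands this into Iterator calls.
    static long sumForEach(Set<Long> keys) {
        long sum = 0;
        for (long k : keys) {
            sum += k;
        }
        return sum;
    }

    // Explicit Iterator: the same traversal written out by hand.
    static long sumIterator(Set<Long> keys) {
        long sum = 0;
        for (Iterator<Long> it = keys.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        Set<Long> keys = new HashSet<>(Arrays.asList(3L, 7L, 19L));
        System.out.println(sumForEach(keys));  // 29
        System.out.println(sumIterator(keys)); // 29
    }
}
```

The explicit Iterator is mainly worth writing out when elements have to be removed during the traversal, via it.remove().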
 
Christophe Verré
Sheriff
Posts: 14691
If possible, I'd do what Jim said in his post. Where are your records? In a database? In a file?
 
S Shehab
Ranch Hand
Posts: 493

Christophe Verré wrote: If possible, I'd do what Jim said in his post. Where are your records? In a database? In a file?


Actually, the records are in a DB, but they don't want me to do anything on the DB because of major performance issues. This is why I need to put them in some Collections to do the comparison on them.
 
Rob Spoor
Sheriff
Posts: 22649
So they are moving the performance problem from the database server to the application? Great...
 
S Shehab
Ranch Hand
Posts: 493

Rob Prime wrote:So they are moving the performance problem from the database server to the application? Great...



Ya Rob
 
Jim Hoglund
Ranch Hand
Posts: 525
Maybe you can creep up on those pesky DBAs. If they are
nervous about the joins, maybe you can at least get each
set of output records sorted before you receive it.
Jim ...
 
Marshal
Posts: 75698
Rob is right (he usually is), and you will probably get better performance asking the database to do that; database management programs are specially optimised for that sort of query.
 