
comparing records

Sherif Shehab
Ranch Hand

Joined: Mar 05, 2007
Posts: 483

Hi guys,

I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records. What is the fastest way, performance-wise, to do this in Java?


Thanks,
Sherif
Jim Hoglund
Ranch Hand

Joined: Jan 09, 2008
Posts: 525
I would sort both sets of records independently, and then step
through them in parallel to answer your question.
Jim ...
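
A minimal sketch of the sort-and-step idea above, assuming each record can be reduced to a Comparable key and that both groups fit in memory. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedMergeCheck {
    /**
     * Counts the keys in 'small' that have no equal key in 'large'.
     * Works on sorted copies, so the caller's lists are left untouched.
     */
    static <T extends Comparable<? super T>> int countMissing(List<T> large, List<T> small) {
        List<T> a = new ArrayList<>(large);
        List<T> b = new ArrayList<>(small);
        Collections.sort(a);
        Collections.sort(b);

        int i = 0;        // current position in the sorted large list
        int missing = 0;
        for (T key : b) {
            // Skip every entry of the large list that sorts before the current key.
            while (i < a.size() && a.get(i).compareTo(key) < 0) {
                i++;
            }
            // Either the large list is exhausted or its current entry is >= key.
            if (i == a.size() || a.get(i).compareTo(key) != 0) {
                missing++;
            }
        }
        return missing;
    }
}
```

A result of 0 means every key of the smaller group was found in the larger one. Sorting dominates the cost; the walk afterwards is a single pass over each list.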


BEE MBA PMP SCJP-6
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19693

Sherif Shehab wrote: I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records. What is the fastest way, performance-wise, to do this in Java?

If you're talking about database records, then letting the database handle it is probably the best solution. Use an INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN to combine the two tables.

For instance, to get the number of records that are in the 1 million but not in the 6 million (let's call them tables "one" and "six"):
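
One way such a query could look, held in a Java string constant as it might be handed to JDBC. The join column name id is an assumption; the thread names only the tables "one" and "six".

```java
class MissingRecordsQuery {
    // Hypothetical key column "id" on both tables; substitute the real common field.
    static final String COUNT_MISSING =
          "SELECT COUNT(*) "
        + "FROM one "
        + "LEFT OUTER JOIN six ON one.id = six.id "
        + "WHERE six.id IS NULL";
}
```
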
This will link the two tables on the two fields specified in the JOIN clause. Every record of table "one" without a matching record in table "six" will have NULL values for all columns of table "six". The WHERE clause then selects only these unmatched records, and COUNT(*) returns how many there are.


If they're not database records, use Maps.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Sherif Shehab
Ranch Hand

Joined: Mar 05, 2007
Posts: 483

Rob Spoor wrote: If they're not database records, use Maps.


What about Sets vs. Maps from a performance perspective? What I'm thinking of is using a HashSet for each of the two groups of data ("one" and "six", as you named them) to make sure there is no duplication, and then checking whether the "six" HashSet contains what is in the "one" HashSet. What do you think?
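
A rough sketch of that HashSet idea, assuming the records (or their keys) have consistent equals() and hashCode() implementations and that everything fits in memory. The names here are illustrative only.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class SetContainment {
    /** Returns true if every element of 'one' also occurs in 'six'. */
    static <T> boolean containsEvery(Collection<T> six, Collection<T> one) {
        // Hash the larger group once; duplicates collapse automatically.
        Set<T> sixSet = new HashSet<>(six);
        for (T record : one) {
            if (!sixSet.contains(record)) {
                return false; // stop at the first record that is not found
            }
        }
        return true;
    }
}
```

For a pure containment check only the larger group has to be hashed; the smaller one can simply be iterated.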
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19693

It depends on how you wrote your equals() and hashCode() methods. If they are not compatible, a HashMap is better, with the common keys as the map's keys and the records themselves as the values.

If you use a Set or Map, check out the bulk methods containsAll, removeAll and retainAll defined in java.util.Collection. Map has methods keySet(), values() and entrySet() you can use to get a Collection (Set extends Collection).
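
A small sketch of those bulk methods applied to two maps keyed on the common key. The class and method names are hypothetical, and the key sets are copied before removeAll or retainAll because those methods modify the collection they are called on (and a map's keySet() is a live view of the map).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class KeyedRecords {
    /** True if every key of 'one' is also a key of 'six'. */
    static <K> boolean containsAllKeys(Map<K, ?> six, Map<K, ?> one) {
        return six.keySet().containsAll(one.keySet());
    }

    /** Keys that are in 'one' but not in 'six'. */
    static <K> Set<K> missingKeys(Map<K, ?> six, Map<K, ?> one) {
        Set<K> missing = new HashSet<>(one.keySet()); // copy, so 'one' is not changed
        missing.removeAll(six.keySet());
        return missing;
    }

    /** Keys that are in both maps. */
    static <K> Set<K> commonKeys(Map<K, ?> six, Map<K, ?> one) {
        Set<K> common = new HashSet<>(one.keySet()); // copy, so 'one' is not changed
        common.retainAll(six.keySet());
        return common;
    }
}
```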
Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687

You don't plan to hold these millions of keys in a Map, do you?


All roads lead to JavaRanch
Sherif Shehab
Ranch Hand

Joined: Mar 05, 2007
Posts: 483

I think I'll go for Sets, but which is faster for iterating over a Set: a for loop or an iterator? Or are both the same?
Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687

If possible, I'd do what Jim said in his post. Where are your records? In a database? In a file?
Sherif Shehab
Ranch Hand

Joined: Mar 05, 2007
Posts: 483

Christophe Verré wrote: If possible, I'd do what Jim said in his post. Where are your records? In a database? In a file?

Actually the records are in a DB, but they don't want me to do anything on the DB because of major performance issues. This is why I need to put them in some Collections to do the comparison on them.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19693

So they are moving the performance problem from the database server to the application? Great...
Sherif Shehab
Ranch Hand

Joined: Mar 05, 2007
Posts: 483

Rob Prime wrote: So they are moving the performance problem from the database server to the application? Great...


Ya Rob
Jim Hoglund
Ranch Hand

Joined: Jan 09, 2008
Posts: 525
Maybe you can creep up on those pesky DBAs. If they are
nervous about the joins, maybe you can at least get each
set of output records sorted before you receive it.
Jim ...
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38765
Rob is right (he usually is), and you will probably get better performance asking the database to do that; database management programs are specially optimised for that sort of query.
 