I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records. What is the fastest way to do this in Java, from a performance perspective?
Sherif Shehab wrote: I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records. What is the fastest way to do this in Java, from a performance perspective?
If you're talking about database records, then letting the database handle it is probably the best solution. Use an INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN to combine the two tables.
For instance, to get the number of records that are in the 1 million but not in the 6 million (let's call them tables "one" and "six"):
This will link the two tables on the two fields specified in the JOIN clause. Every record of table "one" without a matching record in table "six" will have NULL values for all columns of table "six". The WHERE clause then keeps only those unmatched records, and COUNT(*) returns how many there are.
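If you run this from Java, a rough JDBC sketch of that kind of query could look like the following. The table names "one" and "six" are the ones used above; the join columns field1 and field2 and the class name MissingRecordCounter are made up for illustration, and the IS NULL test assumes six.field1 is never NULL in real rows of table "six".

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class MissingRecordCounter {
    // "one" and "six" are the table names from the post;
    // field1 and field2 are hypothetical join columns.
    static final String COUNT_MISSING_SQL =
        "SELECT COUNT(*) "
      + "FROM one LEFT OUTER JOIN six "
      + "ON one.field1 = six.field1 AND one.field2 = six.field2 "
      // Assumes six.field1 is NOT NULL, so NULL here means "no match found".
      + "WHERE six.field1 IS NULL";

    // Returns how many rows of "one" have no matching row in "six".
    static long countMissing(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(COUNT_MISSING_SQL)) {
            resultSet.next(); // COUNT(*) always yields exactly one row
            return resultSet.getLong(1);
        }
    }
}
```

A result of 0 would mean every record of "one" has a match in "six". An index on the join columns of "six" will usually matter more for speed than anything on the Java side.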
What about Sets vs. Maps from a performance perspective? What I'm thinking of is using a HashSet for each of the two groups of data ("one" and "six", as you named them) to make sure there are no duplicates, and then checking whether the "six" HashSet contains everything in the "one" HashSet. What do you think?
It depends on how you wrote your equals() and hashCode() methods. If they are not compatible with this comparison (that is, they do not compare records by the common keys alone), a HashMap is better, with the common keys as the map's keys and the records themselves as the values.
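A minimal sketch of the HashMap approach, assuming a hypothetical Row class where a single String key identifies a record (your real records and key type will differ, and the class and method names here are invented for the example):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyedLookup {
    // Hypothetical record type: only "key" decides whether two records match.
    static final class Row {
        final String key;
        final String payload;

        Row(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Index the large group by key. A later duplicate key overwrites an earlier one.
    static Map<String, Row> indexByKey(List<Row> rows) {
        // Pre-size the map so it does not have to rehash while filling up.
        int capacity = Math.max(16, (int) (rows.size() / 0.75f) + 1);
        Map<String, Row> index = new HashMap<>(capacity);
        for (Row row : rows) {
            index.put(row.key, row);
        }
        return index;
    }

    // Count rows of the small group whose key is absent from the large group's index.
    static int countMissing(List<Row> small, Map<String, Row> largeIndex) {
        int missing = 0;
        for (Row row : small) {
            if (!largeIndex.containsKey(row.key)) {
                missing++;
            }
        }
        return missing;
    }
}
```

This way Row does not need equals() or hashCode() at all, because only the String keys are hashed and compared.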
If you use a Set or Map, check out the bulk methods containsAll, removeAll and retainAll defined in java.util.Collection. Map has methods keySet(), values() and entrySet() you can use to get a Collection (Set extends Collection).
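As a rough illustration of those bulk methods on two sets (the class and method names are invented for the example, and the element type must have consistent equals() and hashCode()):

```java
import java.util.HashSet;
import java.util.Set;

class BulkSetCheck {
    // True when every element of "one" is also in "six".
    static <T> boolean containsEverything(Set<T> six, Set<T> one) {
        return six.containsAll(one);
    }

    // Elements of "one" that are absent from "six". Neither input is modified,
    // because removeAll works on a copy.
    static <T> Set<T> missingFrom(Set<T> six, Set<T> one) {
        Set<T> missing = new HashSet<>(one);
        missing.removeAll(six);
        return missing;
    }

    // Elements present in both groups.
    static <T> Set<T> common(Set<T> six, Set<T> one) {
        Set<T> shared = new HashSet<>(one);
        shared.retainAll(six);
        return shared;
    }
}
```

Keep "six" in a HashSet rather than a List when you pass it to these methods: they call contains() on it repeatedly, and that is only cheap on a hash-based collection.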