aspose file tools*
The moose likes Beginning Java and the fly likes Comparing two collections... Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Java » Beginning Java
Bookmark "Comparing two collections..." Watch "Comparing two collections..." New topic
Author

Comparing two collections...

Andrew Hartman
Greenhorn

Joined: Mar 18, 2004
Posts: 16
Hey
I have a small problem that I'm trying to solve.
I have 2 HashSet's (called set1 and set2) containing 5 values each.
SET112345
SET2246810
I'm trying to compare the sets in 3 different ways.
1). Create a new HashSet that contains the elements from both SET1 and SET2
example1: Set contains: 2,4
2). Create a new HashSet that contains the elements from SET1 or SET2 or both.
example2: Set contains: 1,2,3,4,5,6,8,10
3). Create a new HashSet that contains the elements from SET1 and NOT SET2
example3: Set contains: 1,3,5

I think Question 2). is not a problem for me. As HashSets don't contain duplicate values I can add the contents of SET1 and SET2 to a new HashSet.
However, I can't yet think of a solution for solving 1). and 3).

Thanks
Jeffrey Hunter
Ranch Hand

Joined: Apr 16, 2004
Posts: 305
Assume the two sets are S1 and S2.
For 1.) and 3.) you can use a brute-force algorithm which basically iterates through the smaller set S1, checking if the current element is in S2, and if so, either add it to the new HashSet, or ignore it. Because it's a HashSet, the brute force method is not that bad in terms of efficiency.
You can find more info about iterating through a Collection at Sun's website.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24187
    
  34

Look at the removeAll() and retainAll() methods of the Set interface.
If you make a copy of one set, and then call retainAll() on the copy passing the other set as an argument, you have the intersection. If you make a copy of one, then call removeAll() on the copy passing the other set as an argument, you have the answer to your number 3.


[Jess in Action][AskingGoodQuestions]
Andrew Hartman
Greenhorn

Joined: Mar 18, 2004
Posts: 16
Thanks, you have been most helpful.
Here is my code:
set1 = [URL1,URL2,URL3,URL4,URL5]
set2 = [URL2,URL4,URL6,URL8,URL10]

This is only part of a program I'm making, where I have a 'word' matched to a list or url names:
e.g.
the word 'hello' is in urls: [URL1,URL4]
'goodbye' : [URL1,URL4,URL5]
'another_word' : [URL1]
etc.
My program has many of these (sets of URLs that correspond with a word)
I'm performing queries on the lists/sets such that with the input :
and(hello,goodbye) would return : [URL1,URL4]
I've done this by parsing the input query string 'and(word,word,...,)'
Then tokenizing the words within the brackets. As each word is related to a set of URLs, I can then use their sets to perform union, intersection etc. on them.
I now want my program to be able to handle any number of words in the input string.
e.g. and(word,word,...,N words)
Will I be able to modify my above code to handle this? My intial ideas were, when tokenzing the input string do a token count. From here I'll then know how many 'sets of URLs' i'll be using. Then I can perform my union, intersection, set difference actions on them like in my above code.
Any help with this I will greatly appreciate, thanks
Tim West
Ranch Hand

Joined: Mar 15, 2004
Posts: 539
You'll find the Collections package in the Jakarta Commons project makes this trivial. Check out the CollectionUtils class (org.apache.commons.collections.CollectionUtils)
Both Set 1 and Set 2 - use CollectionUtils.intersection()
Either Set 1 or Set 2 - use CollectionUtils.union()
Set 1 and not set 2 - use CollectionUtils.subtract()
One advantage here is that the algorithms used are *probably* efficient ones, given the usual quality of Jakarta libraries. That said, I haven't checked the source to see.
Cheers,

--Tim
[ May 03, 2004: Message edited by: Tim West ]
Maulin Vasavada
Ranch Hand

Joined: Nov 04, 2001
Posts: 1871
hi,
just my 2 cents.
I would first,
1. sort SET1 in some algo with log(n) order
2. sort SET2 in some algo with log(n) order
3. consider bigger set SET1
4. loop thru SET1 indexes and same index in SET2
- if the element is equal --> put in "AND SET" set
- if the elements differ put both elements in "OR SET" set
- if the elements differ put the SET1's or SET2's element (depending upon where i want to put NOT) in "NOT SET" set
This would give me total order,
log(n)+log(m)+l where,
n = num of elements in first set
m = num of elements in second set
l = n or m; whatever is greater
i guess that would be good.
regards
maulin
Tim West
Ranch Hand

Joined: Mar 15, 2004
Posts: 539

1. sort SET1 in some algo with log(n) order

I assume you mean n log(n) right?
Your approach sounds efficient to me though. Your final step I think would be order l = (n + m), since at each iteration you may only increment your index through set 1 or set 2 respectively. (I'm imagining that you have an index into each "set", and that as you step through you increment the counter that points to the smaller item).
Alternatively, you could also use a priority queue for each set, but given the availability of efficient, well-tested sorting algorithms, your approach is better.
--Tim
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Comparing two collections...