
Intersection of two arrays

 
Bala Krishna
Ranch Hand
Posts: 95
I was asked this question recently in an interview. The interviewer wanted to know how to get the intersection of two unsorted arrays. My answer was to use the brute-force approach - for each element in one array, check if the element exists in the second array by looping thru it. He asked for a better approach, and my answer was to construct a binary search tree using the first array and do a look up with the elements in the second array. He didn't seem satisfied. Are there any better ways of doing this? TIA.
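The brute-force approach described above can be sketched in Java roughly like this (the class and method names are illustrative, not code from the interview):

```java
import java.util.ArrayList;
import java.util.List;

public class BruteForceIntersection {
    // For each element of a, scan all of b: up to m*n comparisons.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> result = new ArrayList<>();
        for (int x : a) {
            for (int y : b) {
                if (x == y) {
                    result.add(x);
                    break;          // stop scanning b once a match is found
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[]{3, 1, 2, 5}, new int[]{4, 2, 5, 6}));
    }
}
```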
 
Ryan McGuire
Bartender
Posts: 1205
22

Originally posted by Bala Krishna:
I was asked this question recently in an interview. The interviewer wanted to know how to get the intersection of two unsorted arrays. My answer was to use the brute-force approach - for each element in one array, check if the element exists in the second array by looping thru it. He asked for a better approach, and my answer was to construct a binary search tree using the first array and do a look up with the elements in the second array. He didn't seem satisfied. Are there any better ways of doing this? TIA.



The brute-force method takes O(m*n) time (an O(m) linear search n times).

Your second method takes O(n * log m) time (an O(log m) search n times).

You could sort both arrays, then step through them both simultaneously. This would take O(log m + log n) time for the sorts and O(n+m) time for the search.
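A rough Java sketch of that sort-then-step-through idea (illustrative names, not code from this post):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedIntersection {
    // Sort copies of both arrays, then walk two indices forward
    // together; equal values are part of the intersection.
    static List<Integer> intersect(int[] a, int[] b) {
        int[] x = a.clone(), y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i] < y[j]) {
                i++;
            } else if (x[i] > y[j]) {
                j++;
            } else {
                result.add(x[i]);
                i++;
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[]{3, 1, 2, 5}, new int[]{4, 2, 5, 6}));
    }
}
```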

 
Jason Liao
Ranch Hand
Posts: 59
The complexity of the sorting algorithm is O(m log m), not O(log m)!!!
 
Jim Yingst
Wanderer
Posts: 18671
Well, the second way is O(m log m) to sort, plus O(n log m) to search. Ryan's third way is O(m log m + n log n) for the sorts, and O(m + n) to step through afterwards. Regardless, there's no requirement here to sort at all, and so the factor of log m can be avoided by forgetting about sorts or binary trees, and just using a hashtable of some sort (e.g. HashSet), such that the whole process is O(m) to store the first array, and O(n) to check the contents of the second. The only disadvantage I see is that this takes more memory as the hashtable would be a separate structure, and sorting would allow you to use the existing array. As long as memory isn't a major issue, I'd use a hashtable here.
[ June 08, 2006: Message edited by: Jim Yingst ]
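A minimal sketch of the hashtable approach described above (illustrative code, not Jim's actual implementation; removing each value on its first hit keeps the result free of duplicates):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashIntersection {
    // O(m) to load the first array into a HashSet, O(n) to probe
    // with the second array's elements.
    static List<Integer> intersect(int[] a, int[] b) {
        Set<Integer> seen = new HashSet<>();
        for (int x : a) {
            seen.add(x);
        }
        List<Integer> result = new ArrayList<>();
        for (int y : b) {
            if (seen.remove(y)) {   // true only on the first hit
                result.add(y);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[]{3, 1, 2, 5}, new int[]{4, 2, 5, 6}));
    }
}
```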
 
Jim Yingst
Wanderer
Posts: 18671
Also, it may be worthwhile (in the context of an interview) to verify some assumptions. Are the values in each array unique? If not, then if one array contains value 0 twice, and the other contains it three times - does the result need to contain 0 twice, or is once enough? If it needs to be twice, my solution above can be modified slightly to handle it - it's just a bit more work. The main thing is to find out just what the requirements really are in this case.
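One way the duplicate-values case could be handled is to swap the HashSet for a HashMap of occurrence counts; a sketch, again not code from the thread:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultisetIntersection {
    // Count occurrences in a (O(m)); each match in b decrements the
    // count (O(n)), so a value appears min(count in a, count in b)
    // times in the result.
    static List<Integer> intersect(int[] a, int[] b) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : a) {
            counts.merge(x, 1, Integer::sum);
        }
        List<Integer> result = new ArrayList<>();
        for (int y : b) {
            Integer c = counts.get(y);
            if (c != null && c > 0) {
                counts.put(y, c - 1);
                result.add(y);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 0 appears twice in one array and three times in the other,
        // so it appears twice in the intersection.
        System.out.println(intersect(new int[]{0, 0, 1}, new int[]{0, 0, 0, 2}));
    }
}
```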
 
Ryan McGuire
Bartender
Posts: 1205
22

Originally posted by Jason Liao:
The complexity of the sorting algorithm is O(m log m), not O(log m)!!!



OOOooooppps! Thank you for reminding me of that.

Along those lines...
Big O complexity (for time or memory) is more meaningful when array sizes are fairly large and/or the code will have to run often. If m and n are both less than 100, say, it would probably be "better" to spend 15 minutes to program the brute force method that runs in 500 ms than to double the programming time in order to cut run time to 100ms. On the other hand, if the arrays each have a million elements, then some engineering might make sense.

Another option: you could write each array out to a file, call 'sort' on the command line for each and then call 'comm -12' on the files to find the lines they have in common.
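That command-line option might look like this (assuming a Unix-like shell with `sort` and `comm` available; file names are illustrative):

```shell
# One element per line; comm requires both of its inputs to be sorted.
printf '%s\n' 3 1 2 5 | sort > a.txt
printf '%s\n' 4 2 5 6 | sort > b.txt

# -12 suppresses lines unique to each file, leaving only common lines.
comm -12 a.txt b.txt
```

This prints 2 and 5, each on its own line. Note that plain `sort` orders lexicographically, which is what `comm` expects.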
 
Bala Krishna
Ranch Hand
Posts: 95
Thanks for your responses. Jim, I tried to paste here some thing you wrote.

O(n) to check the contents of the second



My understanding is that with a HashSet/Hashtable, lookup operation is O(1) (for most cases). Can you please elaborate on why you think it's O(n) for the look up on the second array?
To be more specific, if we do an O(1) operation n times, will the complexity be the same as O(n)?
 
Jim Yingst
Wanderer
Posts: 18671
Each individual lookup is O(1), yes, and each individual insertion is also O(1). However m insertions are O(m), and n lookups are O(n).

[Ryan]: If m and n are both less than 100, say, it would probably be "better" to spend 15 minutes to program the brute force method that runs in 500 ms than to double the programming time in order to cut run time to 100ms.

If we're programming in, say, C/C++, and with no useful (relevant) libraries available, I agree. If it's Java (and if we're not banned from using standard libraries), the existence of HashSet and HashMap makes my solution easier to program, I think.
[ June 09, 2006: Message edited by: Jim Yingst ]
 
Jason Liao
Ranch Hand
Posts: 59
Jim,

I am afraid your method is the best one to solve this problem: using a Hashtable/HashSet, in case performance is the priority.

good job,

Jason
 
Jim Yingst
Wanderer
Posts: 18671
[Jason]: I am afraid your method is the best one to solve this problem

What? You make it sound like that's a bad thing.
 
Alex Walker
Greenhorn
Posts: 2
I realize that I'm kind of bringing this thread back from the dead, but I ran across this problem myself and I'm really trying to understand how best to implement the HashTable solution and why it's more efficient than the brute force approach. I came up with the class below, but I'm still a bit unclear on how this is more efficient.

Thanks,
Alex

 
lowercase baba
Posts: 13091
67
Chrome Java Linux
It's more efficient because you have to do less work. Let's say you have 5 elements in the first array and 6 in the second. You may have to do up to 30 comparisons (comparing each element in the first with each element in the second). As the sizes of the two arrays go up, the total number of comparisons grows quadratically. 10 and 10 would require 100 comparisons. 20 by 20 would require 400 comparisons. By the time you get to 1,000,000 by 1,000,000, the number of comparisons becomes HUGE - 10^12, a trillion comparisons.

However, if you build a hash out of one array, that is linear. 10 elements take 10 insertions. 20 elements take 20 insertions. 1,000,000 elements take 1,000,000 insertions. You then do lookups with the second array. Again, 10 elements take 10 lookups. 20 take 20. 1,000,000 take 1,000,000.

The point is not to determine which is faster for arrays of a specific size. The point is to determine which will slow to a standstill as the size of the arrays increases.

Doing brute force on arrays of size 2 and 3 will possibly be faster than building the hash table. However, if you coded the brute force and started it running, you could probably then design, code, test AND run the hashtable version on 10^6 by 10^6 before the brute force finishes.
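The growth rates described above can be tabulated with a few lines of arithmetic (illustrative only):

```java
public class GrowthDemo {
    public static void main(String[] args) {
        // Brute force does roughly size*size comparisons; the hash
        // approach does roughly size insertions plus size lookups.
        for (int size : new int[]{10, 100, 1_000_000}) {
            long bruteForce = (long) size * size;
            long hashBased = 2L * size;
            System.out.println(size + " elements each: " + bruteForce
                    + " comparisons vs " + hashBased + " hash operations");
        }
    }
}
```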
 
Alex Walker
Greenhorn
Posts: 2
Of course, I see it now. Thanks for clearing that up.

Alex
 