Can anybody suggest an optimum algorithm for determining if two arrays are identical eg.,

I know that this can be solved using Arrays.sort but in that case the quicksort will be O(n logn). Again if we store the first list in a Hashtable then we will be using extra space as big as the array. Is there any solution that is both space and time efficient?

You mean if two arrays contain the same elements? That's a big difference.

I think first doing an Arrays.sort() and then an Arrays.equals() is your best option. It's simple, should be fast, and they are tried and tested methods.

It's impossible to say what space complexity they give, but I don't think they're in-place. Regardless, I think using these methods will be better than anything you or I can come up with without knowing exact speed and memory requirements.

Use Hashtable . first insert elements that are in first array . and now loop through 2nd array and see this current element is there in Hashtable or not .

*Assume both arrays size are same. and indentical what you mean is two arrays contain the same elements.

Seetharaman Venkatasamy wrote:Use Hashtable . first insert elements that are in first array . and now loop through 2nd array and see this current element is there in Hashtable or not .

It's not quite that simple if there can be repeated values. If you can guarantee there aren't repeated values I'd throw both arrays in a HashSet and call Set.equals() to compare them.

If you can't, then I'd go with Stephan's approach (probably adding a check that the arrays are the same size first - if they aren't they can't be equal, and so there's no point wasting time) until it was proven that wasn't good enough for some specific reason.

Matthew Brown wrote:
It's not quite that simple if there can be repeated values.

Yeah.. you pointed out well . in that case what we can do is put the counter to 1 [key - element, value-count] . if you find the element again increase the count.

when you loop the second array if the current element is there in hashtable decrement it .. upon the job completion, iterate the hashtable and see if the element value has 0 counter else say not identical.

Isn't it funny how there's always time and money enough to do it WRONG?
Artlicles by Winston can be found here

Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3379

9

posted

0

I did some very simple tests, and the choice between the hash and the sorting based approaches seems to strongly depend on whether you want to sort an int[] or an Integer[] (of course assuming all the elements are unique, otherwise the sort based approach is favored).

I added 10.000.000 unique integers to a list, shuffled it and then assigned all the integers to two different int[] objects in the same order. Then I started the test.

The sorting algorithm took around 2200 milliseconds to complete. The hashing algorithm took around 7000 milliseconds. The sorting algorithm was also consistently faster for smaller sets of data.

When I started with two Integer[] arrays, the tables were completely turned. The sorting algorithm seems to have a much harder time sorting objects than primitives. On my system the sorting algorithm took around 11.000 milliseconds, whereas the hash based approach took roughly 3500 milliseconds.

This is of course a worst-case scenario. I haven't tested the situation where the two sets of data are unequal. Here's the quick and dirty int[] test so you can get an idea of what I did:

Stephan van Hulst wrote:I added 1.000.000 unique integers to a list, shuffled it and then assigned all the integers to two different int[] objects in the same order. Then I started the test.

Interesting. Your results would tend to suggest that there's quite an overhead for boxing/unboxing.

@Dorothy: In answer to your initial question "Is there any solution that is both space and time efficient?", I would say that Arrays.sort() is likely to be the most space-efficient, since it sorts in place; however, in the process it also destroys your original data.

If the arrays are specifically integer arrays, and you're not worried about repeated values, another alternative might be to use a BitSet. It will be horribly space-hungry for small arrays; particularly if they contain large values, but for very large arrays, it may be quite efficient.

Winston

Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3379

9

posted

0

Yes, auto-boxing is fairly expensive. It should be noted though that the (quick-)sort on int primitives is still faster than the hashing on Integers. Merge-sort on Integer is significantly slower though, much more than the auto-boxing can account for. I don't know if this is a result of the difference between the quick- and merge-sort algorithms, or that the merge-sort algorithm has to make calls to the compareTo() method.

I also corrected my last post a bit since you replied. I tested on 10.000.000 elements.

Stephan, on line 38 of your code, you are creating a2 which is identical to a1(same value at all indices of a1). Why would you need to sort the array then?

Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3379

9

posted

0

Because it is a test. It tests the methods testArrays() and testTables(). These two methods aren't supposed to know anything about the arrays passed to them (except that every element is unique), so they also don't know that the two arrays are already exactly the same.

The point is that the two arrays have the same elements, because this would constitute a worst-case-scenario: every element in the two arrays have to be inspected in order to find out whether the two arrays are really the same. This is to eliminate the possibility that either of the two proposed approaches terminates ahead of time. The downside is that this test says very little about the average-case-scenario.

The reason why I shuffled the first array was to make the difference between Arrays.sort(int[]) and Arrays.sort(T[]) more realistic. The merge-sort used for the generic case handles arrays that are already (almost) completely sorted much faster* than the quick-sort does for the primitive case. Shuffling the array gives merge-sort a realistic handicap. The reason why I *didn't* shuffle the second array, is because it in fact doesn't matter that the two arrays have the same order, as long as they're shuffled in the first place.

*: Faster in terms of time complexity. Not necessarily absolute speed.