aspose file tools*
The moose likes Beginning Java and the fly likes Removing duplicates from an ordered array Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Beginning Java
Bookmark "Removing duplicates from an ordered array" Watch "Removing duplicates from an ordered array" New topic
Author

Removing duplicates from an ordered array

Johan Mena
Greenhorn

Joined: Nov 10, 2011
Posts: 11
Hello again guys.

So I have the assignment of adding a method to remove duplicates from an ordered array. My class consists of wrapped long array that uses insertion sort to sort its elements.

One can imagine schemes in which all the items from the place where a duplicate was discovered to the end of the array would be shifted down one space every time a duplicate was discovered (I wrote the methods noDups() and its helper, delete(), to do exactly this), but this leads to O(N2) time, at least when there were a lot of duplicates.

My goal is to make sure no item is moved more than once, no matter how many duplicates there are (this will give O(N) time).

My naive way of doing this is to first locate all the duplicates in the array and replace said duplicates with null values, and then shift the elements to the left according to the number of nulls I get. However, there must be a simpler/less-confusing-to-code way to do it. Ideas?

And obviously, no, I can not use hashSets or bitsets or or any other built-in class or data structure different than an array.

This is my class:



And here's main:

Praveen Kumar M K
Ranch Hand

Joined: Jul 03, 2011
Posts: 256
Can you cheat with a temporary array?

If yes, you can do a single traverse of your sorted array and fill in the temp array each time you encounter a newer element. Your newly created array will be duplicate free.
Ralph Cook
Ranch Hand

Joined: May 29, 2005
Posts: 479
I see three indices: one points to the value you "compare to", i.e., the first of the two numbers being compared. the second, "move to", points to the next location where you want the next unique value to go, and the third points to the next value to compare to the "compare to" value, which is also the value that may be moved. As a matter of fact, after you find your first duplicate, it is *always* going to be moved, and I call it the "copy from" index.

The "move to" index is not defined until you actually find duplicates.

If your array contained a b c c c d e f f f g, then you compare a b, b c, and finally c c (in 3rd and 4th positions) and discover you have a duplicate. "move to" gets the index to the 4th position, since that's the location to which you're going to copy the next unique element. then you compare c c (you can compare either 4th and 5th or 3rd and 5th; I find it less confusing if my compare to sits at one index until I find a non-match) and increment "copy from", and then c d; the 6th position (for d) is now your copy from.

From that point on, every time you compare and find no duplicate, you move the unique element from "copy from" to "copied to", then increment both those indices. If you do find a duplicate, you just advance the "copy from" without moving the "copied to" index.

To continue our example, after comparing c and d, you move the d to the 4th position, the 4th also becomes "compare to", then increment "copied to" the 5th position, and the "copy from" to 7th. Now you compare d to e, they do not match so you move e to 5th, make compare to 5th, increment "copied to" to 6 and "copy from" to 8. Compare e to f, repeat. compare f to f, you have a match, so increment "copy from" without moving "compare to" or "copy to".

"copy from" will get further and further away from "copy to" as you find more duplicates. But you only move each element that is going to be moved once. Be sure to null or otherwise eliminate all "leftover" elements; i.e., if you start with 100 and have 90 after removing duplicates, you need to do something to eliminate the 10 elements at the top that are no longer part of the sequence.

How's that?
Johan Mena
Greenhorn

Joined: Nov 10, 2011
Posts: 11
Praveen Kumar M K wrote:Can you cheat with a temporary array?

If yes, you can do a single traverse of your sorted array and fill in the temp array each time you encounter a newer element. Your newly created array will be duplicate free.


I could, but that would defeat my goal of O(N).
Ralph Cook
Ranch Hand

Joined: May 29, 2005
Posts: 479
Johan Mena wrote:
Praveen Kumar M K wrote:Can you cheat with a temporary array?

If yes, you can do a single traverse of your sorted array and fill in the temp array each time you encounter a newer element. Your newly created array will be duplicate free.


I could, but that would defeat my goal of O(N).


No, you would compare each element to one other element, and you would move all of the elements once. Both my scheme (done in place) and the separate array idea are O(N).
Praveen Kumar M K
Ranch Hand

Joined: Jul 03, 2011
Posts: 256
Ralph Cook wrote:
How's that?


Awesome!

I feel we can do away with the indices other than "move to". In "a b c c c d e f f f g", when we are at position 4, "move to" becomes 4. Next the array traversal can continue as normal upto d. At "d" we can copy d to 4th position and increment "move to" and continue to e, which can then be copied to "move to" position which is now 5 and so on.

Once we have reached the end of the array, everything starting from "move to" to the end of the array can be nullified or erased.
Johan Mena
Greenhorn

Joined: Nov 10, 2011
Posts: 11
Ralph Cook wrote:
How's that?


That was great. Somewhat confusing but definitely gonna try to work with it.

Doesn't creating another array and copying the elements to a temp array and then back over to the original array take a significant amount of time?

I'll try to code both ideas up and let you guys know how it goes.
Praveen Kumar M K
Ranch Hand

Joined: Jul 03, 2011
Posts: 256
Johan Mena wrote:
Doesn't creating another array and copying the elements to a temp array and then back over to the original array take a significant amount of time?


Well that would depend on the constraints of the question. If the question plainly says, take a sorted array and display an array without duplicates, then you don't have to go back and copy temp array to original array. You can directly display the temp array. But I suggest you go with implementing Ralph's idea, most data structure questions expect you to do without creating further structures.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7816
    
  21

Johan Mena wrote:Doesn't creating another array and copying the elements to a temp array and then back over to the original array take a significant amount of time?

Actually, the traversal time is likely to be much less than the time taken to actually do the moves, so if this is an exercise in doing things in the most efficient way, then you might want to sacrifice traversals for moves. Don't forget that O(2n) is still a form of O(n).

Here's an alternative method for you, that hopefully illustrates the point.
1. Create an int array the same size as the sorted array you want to convert.
2. Traverse your sorted array, counting the occurrences of each value, and put them in your int array; also, keep track of the number of distinct values found (pretty much required).
3. Starting at index 0, use the values in your "counts" array to skip to the next distinct value in your sorted array and move it to the next position at the front. Alternatively, create a new array of the same type as your sorted array but with only enough room for "distinct" values, and do the same thing.

As you can see, it involves two traversals, but you'll probably find that there's actually very little difference in speed between it and Ralph's method.

That said, Ralph's is definitely "purer" (and probably quicker too).

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
 
It is sorta covered in the JavaRanch Style Guide.
 
subject: Removing duplicates from an ordered array