How do I remove duplicates from an array?

james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Hey all. I have a pretty limited understanding of Java methods, and I am trying to keep my code fairly simple. That being said, I am trying to remove duplicate strings from an array, and have gotten this far:


I am also counting, via the variable uniqWords, every time I add a word to the tempArray that is housing the non duplicate words. Needless to say, this isn't what has been happening, and I am getting a lot of duplicates in my array. I am sure it's something simple, but I can't seem to puzzle it out. Help!
Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

You may want to look into ArrayLists, as they give you the capability of removing items. It's much more tedious to use two arrays.
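For readers who do want to try the ArrayList route, here is a minimal sketch of what it could look like. The class name ListDedup and the method name unique are made up for illustration and are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;

public class ListDedup {
    // Returns the distinct strings of words, in order of first appearance.
    // (Hypothetical helper, not code from this thread.)
    static List<String> unique(String[] words) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            // contains() compares elements with equals(), so equal text counts as a duplicate
            if (!result.contains(w)) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"hi", "hello", "hi", "joel", "hello"};
        System.out.println(unique(words)); // prints [hi, hello, joel]
    }
}
```

The list grows only when a new word shows up, so there is no second array to size in advance and no null padding at the end.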
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Unfortunately, that's a bit over my head, and I am trying to use only arrays, as that is what we have been covering in class.
Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

Look at line 11. You're trying to compare tempString to tempArray, but tempArray is still empty. What you want to be doing is checking each index of tempString against the other indexes of tempString to see when it occurs more than once. After you've done that (if and only if the index is unique), that's when you want to set the next index of tempArray to the current tempString index.

EDIT:
I got this to work:


Now since you set the length of tempArray to the length of tempString, you will have some empty (null) indexes at the end of tempArray. If you first find which indexes of tempString are unique and then declare stringArray[] and make it the length of the unique indexes, you won't end up with null indexes.
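The code that was posted here did not survive, so the following is only a sketch of the idea described above using plain arrays: check each index of tempString against the earlier indexes, count the unique ones first, and then allocate the result at exactly that length. The names ArrayDedup, seenBefore and unique are hypothetical, and the sketch assumes the input array contains no null elements.

```java
import java.util.Arrays;

public class ArrayDedup {
    // True if words[index] already appeared at an earlier index.
    static boolean seenBefore(String[] words, int index) {
        for (int j = 0; j < index; j++) {
            // Strings are compared with equals(), not ==
            if (words[j].equals(words[index])) {
                return true;
            }
        }
        return false;
    }

    // Two passes: count the first occurrences, then copy them into an array of that size.
    static String[] unique(String[] words) {
        int count = 0;
        for (int i = 0; i < words.length; i++) {
            if (!seenBefore(words, i)) {
                count++;
            }
        }
        String[] result = new String[count];
        int next = 0;
        for (int i = 0; i < words.length; i++) {
            if (!seenBefore(words, i)) {
                result[next++] = words[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tempString = {"hi", "hello", "hi", "joel", "hello"};
        System.out.println(Arrays.toString(unique(tempString))); // prints [hi, hello, joel]
    }
}
```

Counting first means the result has no null indexes at the end, at the cost of scanning the input twice.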
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
I'm not sure I understand. You said tempArray is empty, but I assigned the value of tempString[0] to tempArray[0] so that there is a baseline for comparison. Then I iterate through the tempArray indexes to see if the tempString at some index i is the same as a value of tempArray. If not, put that value into tempArray at i. I tried modifying the code like this:

But that never returns a false value.
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
I think I understand your code, but it doesn't exactly solve the problem. I need to have an array that has strings in it, even if those strings are duplicates. It just can't contain the actual duplications. It looks like your code will only put those values into tempArray that are not duplicates.

edit: thanks for the help btw!
Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

james aggeles wrote:I need to have an array that has strings in it, even if those strings are duplicates. It just can't contain the actual duplications.

Can you rephrase this part? I thought I understood what you were trying to do, but apparently I do not.
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Ok. I have an array of strings, some of which are duplicates. I need to remove the duplications and end up with an array that has one instance of each unique string.

In other words, if I start with an array that has "abc" twice, "acb" twice, and "bca" once, I would end up with an array that has "abc" once, "acb" once, and "bca" once.

edit: also, having null counters at certain points in the resulting array is not a problem, I can sort those out later.
Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

james aggeles wrote:Ok. I have an array of strings, some of which are duplicates. I need to remove the duplications and end up with an array that has one instance of each unique string.

In other words, if I start with an array that has "abc" twice, "acb" twice, and "bca" once, I would end up with an array that has "abc" once, "acb" once, and "bca" once.

edit: also, having null counters at certain points in the resulting array is not a problem, I can sort those out later.


Gotcha.

EDIT:
This works:

james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
That's pretty brilliant. I will work on it and get back.
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
I tried that, but when I ran a loop to print out tempArray, I got this huge array with lots of duplicates and null entries.


This is my output:

I
wonder
why
null
null
null
why
I
wonder
null
null
why
I
wonder
null
null
null
null
null
null
null
null

Richard
Feynman
null

The space at the very end I am not too concerned with, but the rest isn't right. The first three spots are right, but then it has all that other stuff.

Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

What's the rest of your code? When I run the below code, the output is:
hi
hello
joel
christophel
pizza
h
null
null
null
null



What does your array tempString look like when it starts?
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
edit: tempString gets initialized when I use the split method. This is the first time I have ever used this method so if it's not right, I apologize.
edit: also, when I check the tempString array after I strip the punctuation, it's all right.

Here's the code for the entire class (not to get too bogged down in code, but I thought I might be overlooking something obvious):


Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

It seems to me that from the output you've posted, you made multiple calls to your method processDocument(). Basically, it's printing out the correct information, but just multiple times due to the multiple method calls you've made elsewhere in your code.
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Only one call. It's called from the constructor which is invoked only once.

Here's the main method that calls the constructor:



The constructor in question is for the Document object "d", and its code looks like this:

See, just one call.


james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
When I print out tempArray and tempString I get this:

tempArray[i]: i tempString[i]: 30
tempArray[i]: wonder tempString[i]: 41
tempArray[i]: why tempString[i]: 52
tempArray[i]: null tempString[i]: i
tempArray[i]: null tempString[i]: wonder
tempArray[i]: null tempString[i]: why
tempArray[i]: why tempString[i]: 30
tempArray[i]: i tempString[i]: 41
tempArray[i]: wonder tempString[i]: why
tempArray[i]: null tempString[i]: i
tempArray[i]: null tempString[i]: wonder
tempArray[i]: why tempString[i]: 30
tempArray[i]: i tempString[i]: 41
tempArray[i]: wonder tempString[i]: 52
tempArray[i]: null tempString[i]: 63
tempArray[i]: null tempString[i]: 74
tempArray[i]: null tempString[i]: 85
tempArray[i]: null tempString[i]: 96
tempArray[i]: null tempString[i]: 107
tempArray[i]: null tempString[i]: why
tempArray[i]: null tempString[i]: i
tempArray[i]: null tempString[i]: wonder
tempArray[i]: tempString[i]: 10
tempArray[i]: richard tempString[i]:
tempArray[i]: feynman tempString[i]: richard
tempArray[i]: null tempString[i]: feynman


Which doesn't look exactly right to me, though how not escapes me. Am I wrong?
Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

The URL in your code points to the text file containing the following (as I'm sure you know):

I wonder why I wonder why.
I wonder why I wonder.
I wonder why I wonder why I wonder why I wonder!
--Richard Feynman

The output you gave me clearly corresponds to each line of text:
Line 1 (I wonder why I wonder why.) yields:
I
wonder
why
null
null
null

Line 2 (I wonder why I wonder.) yields:
why
I
wonder
null
null

Line 3 (I wonder why I wonder why I wonder why I wonder!) yields:
why
I
wonder
null
null
null
null
null
null
null
null

Line 4: (--Richard Feynman) yields:
Richard
Feynman

So it seems as if the method is run for every line from the text file, else why would this be happening? Cool poem, by the way.

james aggeles wrote:
Which doesn't look exactly right to me, though how not escapes me. Am I wrong?

So yes, that seems to be good. Once your null remover thingy is applied, it should be good to go. What exactly doesn't seem right?
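The "null remover" itself never appears in the thread, so this is just one possible sketch of such a step: count the non-null entries, then copy them into a new array of that size. NullCompactor and withoutNulls are hypothetical names.

```java
import java.util.Arrays;

public class NullCompactor {
    // Returns a copy of padded with every null entry dropped, order preserved.
    // (Hypothetical helper, not the poster's actual code.)
    static String[] withoutNulls(String[] padded) {
        int count = 0;
        for (String s : padded) {
            if (s != null) {
                count++;
            }
        }
        String[] result = new String[count];
        int next = 0;
        for (String s : padded) {
            if (s != null) {
                result[next++] = s;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tempArray = {"I", "wonder", "why", null, null, null};
        System.out.println(Arrays.toString(withoutNulls(tempArray))); // prints [I, wonder, why]
    }
}
```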
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Hm. Ok. That is close to what I need, but somehow I have to compare the different lines to see if there are any duplicates between them. I will think about it, but if you have any thoughts, I'd be grateful to hear them. And yeah, the poem is cool. My prof has a thing for interesting cultural references.
Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

What you need to do, then, is change the line where you have s = docRead.getLine(); to something like:



That way, each word of the entire text will be an index of tempString.
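As a rough illustration of why this matters, the sketch below joins all the lines into one string before splitting, so words from different lines end up in the same array and can be compared with each other. The docRead class is not shown in the thread, so a plain String[] of lines stands in for it here, and String.join is swapped in for the accumulating loop; WholeTextWords and allWords are made-up names.

```java
public class WholeTextWords {
    // Joins every line with a single space, then splits on runs of whitespace.
    // lines is a stand-in for whatever the document reader would return.
    static String[] allWords(String[] lines) {
        String s = String.join(" ", lines);
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] lines = {
            "I wonder why I wonder why.",
            "I wonder why I wonder."
        };
        String[] tempString = allWords(lines);
        System.out.println(tempString.length); // prints 11
    }
}
```

Splitting line by line would instead give one array of 6 words and another of 5, and a duplicate check run on each array separately could never see that "wonder" appears in both.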
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
I came up with this:

james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Your way is much better...lol.
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
I can't keep track of the number of lines before I get the document, but I think this will work:

//etc. etc. etc...
Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

Yup that looks like it should do! Just a few small things:

james aggeles wrote:
while (docRead.hasLines() == true)


You can make this while (docRead.hasLines()) since it's already a boolean value.

james aggeles wrote:
s = s + docRead.getLine();


I'm not sure, but you may have to concatenate one space at the end so that the last word of the line and the first word of the next line don't get smushed together into one word.

Does it work?
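To make both points concrete, here is a small sketch. Since the docRead class is not shown, java.util.Scanner reading from a String stands in for it, with hasNextLine() and nextLine() playing the roles of hasLines() and getLine(); the class and method names are made up.

```java
import java.util.Scanner;

public class JoinLines {
    // Appends each line directly: the last word of one line runs into the first word of the next.
    static String joinWithoutSpace(String text) {
        Scanner in = new Scanner(text);
        String s = "";
        while (in.hasNextLine()) {   // already a boolean, so no "== true" is needed
            s = s + in.nextLine();
        }
        return s;
    }

    // Appends a space after each line so the words stay separate.
    static String joinWithSpace(String text) {
        Scanner in = new Scanner(text);
        String s = "";
        while (in.hasNextLine()) {
            s = s + in.nextLine() + " ";
        }
        return s;
    }

    public static void main(String[] args) {
        String text = "I wonder why\nI wonder";
        System.out.println(joinWithoutSpace(text));     // prints I wonder whyI wonder
        System.out.println(joinWithSpace(text).trim()); // prints I wonder why I wonder
    }
}
```

The second version leaves one trailing space at the very end, which trim() removes before splitting.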
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
I am having trouble removing the spaces. Here's what I am doing:



When I run this, I get an ArrayIndexOutOfBoundsException at the marked line. I think it's because count is not being incremented, but I am not sure how to tell the program to look for a space, or a line of spaces. Is there some kind of regex thing I could use or something? I don't want to get too complicated, but I didn't know if that might be the only way.

edit: by the way, the string s has the whole document in it, so it does work better than it did (to answer your question). Further testing proved I was right about the space not being in the array, but if it's a whole line of spaces, what do I tell the computer to look out for: "\n" ?
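On the regex question, one option (a sketch only, not what was actually used in this thread) is to split on the pattern "\\s+", which matches any run of spaces, tabs, or newlines, so blank lines and repeated spaces never produce empty array slots. The sketch also lowercases the text and replaces anything that is not a letter or whitespace with a space; that handling of punctuation is an assumption, and it would break a word like "don't" in two. SplitWords and words are hypothetical names.

```java
import java.util.Arrays;

public class SplitWords {
    // Lowercases s, turns every non-letter, non-whitespace character into a space,
    // then splits on runs of whitespace (spaces, tabs, newlines).
    static String[] words(String s) {
        String cleaned = s.toLowerCase().replaceAll("[^a-z\\s]", " ").trim();
        if (cleaned.isEmpty()) {
            return new String[0];    // avoid the single empty string that split() would return
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String s = "I wonder why.\n\n   --Richard Feynman";
        System.out.println(Arrays.toString(words(s))); // prints [i, wonder, why, richard, feynman]
    }
}
```

Because the array never contains empty strings, a counter that is incremented once per word stays in step with the array length.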
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Ended up using this:
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Now tempArray outputs this:

i
wonder
why
i
wonder
why
i
wonder
why
i
wonder
i
wonder
why
i
wonder
why
i
wonder
why
i
wonder
richard
feynman
why
i
wonder
richard
feynman
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null

which is headed in the right direction, but it looks like it's not comparing them to each other at some point. Here's my updated code:



This is as far as I have gotten....
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Progress! Now the program runs all the way through, but doesn't count the words properly. I am so close to this I can taste it. I will work on it and keep you posted.
james falk
Ranch Hand

Joined: Nov 02, 2012
Posts: 55
Success! The finished program looks like this:


You were such a big help, Joel. It can be overwhelming working on some of these problems alone sometimes, so thanks for lending a helping hand.
Again, Happy Thanksgiving!

Joel Christophel
Ranch Hand

Joined: Apr 20, 2011
Posts: 241

I'm sorry I had to leave last night, but I'm glad I could be of help, and I'm glad you got it to work. Happy Thanksgiving to you too!
 