ArrayList

francis joby
Greenhorn

Joined: Aug 30, 2005
Posts: 6
Dear Sir,

How do I count duplicate entries in an ArrayList or Vector?

Regards joby
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
Use the java.util.Collections.frequency() method.
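
For example (a minimal sketch; Collections.frequency() requires JDK 1.5, and the sample values are made up):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FrequencyDemo {
    public static void main(String[] args) {
        List<String> items = new ArrayList<String>();
        Collections.addAll(items, "String1", "String2", "String1", "String1");

        // Collections.frequency() counts the elements equal to the given object.
        int count = Collections.frequency(items, "String1");
        System.out.println("String1 occurs " + count + " times"); // prints 3
    }
}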
Juhi Gupta
Greenhorn

Joined: Oct 04, 2006
Posts: 12
Originally posted by Edwin Dalorzo:
Use the java.util.Collections.frequency() method.


Edwin,

I haven't found this method in that class.

Is this not in JDK 1.4?

You must be talking about JDK 1.5.
Juhi Gupta
Greenhorn

Joined: Oct 04, 2006
Posts: 12
Originally posted by francis joby:
Dear Sir,

How do I count duplicate entries in an ArrayList or Vector?

Regards joby


In fact, you can check whether a Vector (or an ArrayList) contains any duplicate entries with a simple program.
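
Something along these lines (only a sketch, using indexOf() and lastIndexOf(); the sample values are made up):

import java.util.Vector;

public class DuplicateCheck {
    public static void main(String[] args) {
        Vector v = new Vector();
        v.add("String1");
        v.add("String2");
        v.add("String1");

        boolean hasDuplicates = false;
        for (int i = 0; i < v.size(); i++) {
            // If the first and last positions of an element differ,
            // that element occurs more than once in the Vector.
            if (v.indexOf(v.get(i)) != v.lastIndexOf(v.get(i))) {
                hasDuplicates = true;
                break;
            }
        }
        System.out.println("Contains duplicates? " + hasDuplicates);
    }
}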



Hope it helps.....

Or, if you want to count the duplicate entries, you can use an Iterator.
Shaan Shar
Ranch Hand

Joined: Dec 27, 2005
Posts: 1249

Good method, Juhi,

but it's not quite up to expectations.

Dear francis joby,

You may use the following code.
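
Something roughly like this (just a sketch; it counts occurrences of one particular value, String1, and assumes the list contains no nulls):

import java.util.ArrayList;
import java.util.List;

public class CountOneValue {
    public static void main(String[] args) {
        List list = new ArrayList();
        list.add("String1");
        list.add("String2");
        list.add("String1");
        list.add("String1");

        String target = "String1";
        int count = 0;
        for (int i = 0; i < list.size(); i++) {
            // Note: throws a NullPointerException if the list contains nulls.
            if (list.get(i).equals(target)) {
                count++;
            }
        }
        System.out.println(target + " appears " + count + " times"); // prints 3
    }
}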



Hope it helps.
[ October 04, 2006: Message edited by: Ankur Sharma ]

The Best way to predict your future is to create it - Every great individual common man
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
You overlooked the possibility of null items.

A JDK 1.4 implementation of the frequency() method that Sun added in JDK 1.5 would be:
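Something along these lines (just a sketch that mirrors the JDK 1.5 behaviour, written with 1.4-era raw types and an Iterator):

import java.util.Collection;
import java.util.Iterator;

public class Collections14 {
    // A 1.4-friendly equivalent of Collections.frequency(): counts the
    // elements of c that are equal to o, treating null the way the 1.5
    // version does.
    public static int frequency(Collection c, Object o) {
        int result = 0;
        for (Iterator it = c.iterator(); it.hasNext(); ) {
            Object e = it.next();
            if (o == null ? e == null : o.equals(e)) {
                result++;
            }
        }
        return result;
    }
}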

Shaan Shar
Ranch Hand

Joined: Dec 27, 2005
Posts: 1249

Well, good point; I just forgot that.

But I just want to bring one more thing to your notice:

We may also need to override the equals() method for a custom object. Otherwise it may lead to a wrong result.

Am I correct?
Shaan Shar
Ranch Hand

Joined: Dec 27, 2005
Posts: 1249

Let me show an example to back up my earlier point.
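
For instance (a made-up example; the Employee class and its values are only illustrative):

import java.util.ArrayList;
import java.util.List;

public class EqualsDemo {
    // A custom class that does NOT override equals(), so two Employee
    // objects compare equal only if they are the very same instance.
    static class Employee {
        String name;
        Employee(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        List list = new ArrayList();
        list.add(new Employee("John"));
        list.add(new Employee("John"));

        Employee target = new Employee("John");
        int count = 0;
        for (int i = 0; i < list.size(); i++) {
            if (target.equals(list.get(i))) {
                count++;
            }
        }
        System.out.println(count); // prints 0, not 2
    }
}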



Now this will give the output 0.

So that means you also need to override the equals() method.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Well, Juhi and Ankur,

Your code looks good, but why are you concentrating only on String1? If the list has duplicate values other than String1, your program will not count those.

Hope that is clear.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Try this program:
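
A condensed sketch of that kind of program (the sample data is hypothetical; the idea is to count how many distinct values occur more than once):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CountDuplicates {
    public static void main(String[] args) {
        // Hypothetical sample data: String1..String4 each occur more
        // than once, String5 occurs only once.
        List list = new ArrayList(Arrays.asList(new String[] {
            "String1", "String1", "String2", "String2", "String3",
            "String3", "String3", "String4", "String4", "String4", "String5"
        }));
        System.out.println(countDuplicates(list)); // prints 4
    }

    // Counts the distinct elements that appear more than once in the list.
    static int countDuplicates(List list) {
        List seen = new ArrayList();
        int duplicates = 0;
        for (int i = 0; i < list.size(); i++) {
            Object item = list.get(i);
            if (!seen.contains(item)) {
                seen.add(item);
                if (list.indexOf(item) != list.lastIndexOf(item)) {
                    duplicates++;
                }
            }
        }
        return duplicates;
    }
}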

Shaan Shar
Ranch Hand

Joined: Dec 27, 2005
Posts: 1249

Well, dear Rathi,

What output do you expect from your program? Could you please explain?

I think the output should be 6, but this program is giving 4 as output.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Ankur Sharma:
Well, dear Rathi,

What output do you expect from your program? Could you please explain?

I think the output should be 6, but this program is giving 4 as output.


Yup. It will count the number of duplicated elements (as the OP specified).

Why should it be 6? There are only 4 duplicated elements: String1, String2, String3, and String4.
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
Do you realize that you just wrote 3 methods and around 60 lines of code just to determine if there is a duplicate item in a List?

I would still stick to the Sun implementation of the frequency() method. It is just 18 lines of code and it does the job.

I also liked the use of indexOf() and lastIndexOf() of a previous post. It may even be faster to determine duplicates on large lists.

It is well known that items stored in a Collection should implement equals() and hashCode(), so this should not be an issue. Once that requirement is fulfilled, the methods should work on any object, not just String, as a previous post seemed to suggest.
[ October 05, 2006: Message edited by: Edwin Dalorzo ]
Robert Hill
Ranch Hand

Joined: Feb 24, 2006
Posts: 94
Originally posted by Edwin Dalorzo:
Do you realize that you just wrote 3 methods and around 60 lines of code just to determine if there is a duplicate item in a List?

I would still stick to the Sun implementation of the frequency() method. It is just 18 lines of code and it does the job.


It is called learning how to program.
Shaan Shar
Ranch Hand

Joined: Dec 27, 2005
Posts: 1249

Originally posted by Robert Hill:


It is called learning how to program.


Originally posted by Edwin Dalorzo:
Do you realize that you just wrote 3 methods and around 60 lines of code just to determine if there is a duplicate item in a List?

I would still stick to the Sun implementation of the frequency() method. It is just 18 lines of code and it does the job.


Well, Edwin,

As we already mentioned, we are not familiar with JDK 1.5, so I wrote my own method, and Rathi did the same. I would also like to use Sun's frequency() method, but what if we don't have it?
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961

As we already mentioned, we are not familiar with JDK 1.5, so I wrote my own method, and Rathi did the same. I would also like to use Sun's frequency() method, but what if we don't have it?


Go up in the posts and you will see that there is already a post offering a JDK 1.4 implementation of the frequency algorithm. After all, it is not a state secret: it is in the src.zip file that comes with the JDK.

Look, I know we programmers take a lot of pride in our code, and I apologize to all of you if I somehow offended anybody when I criticized one of the previously posted algorithms.

I am fairly new to Java, although I have accumulated a good amount of time as a programmer. Recently I read Effective Java (by Joshua Bloch) for the first time and realized that all those algorithms I used to take pride in were not so good at all compared to the nifty ideas printed in that book, and I felt happy that somebody wrote it to teach me how to do things better.

I probably couldn't tie Mr. Bloch's shoes, and I may know less Java than many of you, but I do know that we programmers like to complicate things more than necessary and that we love reinventing the wheel. We tend to be creative people, after all. That's good. But we also have to develop the skill of simplicity and the ability to recognize that simple is better: it means lower cost, fewer bugs, less maintenance, and code that is easier to read and understand.

It simply appears, to my naked eye, that the posted algorithm was much more complicated than the ones previously posted. That's all. My comments were never intended to hurt anybody's pride. I just want to say: why don't we try to make this simpler? What's the point in posting an algorithm that does the same thing as several others previously posted, but in a far more complicated way? Are we not misleading the person who posted the question?

That's the kind of question our bosses ask us as technologists every day: why do you want to migrate to a more complicated, more expensive database? Why do you want to remake a legacy system that is already working pretty well after years of IT evolution? And when they ask, we had better have a good answer, or we will get none of their money to buy those nice 21st-century toys that everybody is deploying in their companies.

Again, I apologize if you found my words offensive.

[ October 05, 2006: Message edited by: Edwin Dalorzo ]
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24184

Also, note that all the algorithms so far discussed have O(n^2) performance, which is really unacceptable for large lists. Here's a vastly faster (and vastly simpler and vastly shorter) O(N) algorithm:
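
In outline (just a sketch, with arbitrary names; a HashSet makes each membership check effectively constant time, so one pass over the list is enough):

import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class FastScan {
    static void scanForDuplicates(List list) {
        Set seen = new HashSet();
        for (Iterator it = list.iterator(); it.hasNext(); ) {
            Object item = it.next();
            // add() returns false when the element is already in the set,
            // i.e. this entry is a duplicate of something seen earlier.
            if (!seen.add(item)) {
                // ... count it, collect it, report it -- whatever you need
            }
        }
    }
}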



The folks who provided longer, slower versions might want to read this.


[Jess in Action][AskingGoodQuestions]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I very much like Ernest's use of the features already in the Collections. The OP wanted a count of duplicates ... I wondered if Set.addAll(someCollection) and checking the size difference wouldn't do it.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
I did the following implementation of Ernest's algorithm:
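
Something like this (just a sketch; it returns how many entries are repeats of an element seen earlier in the list):

import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class HashSetCounter {
    // One O(n) pass: every entry that was already in the set is a duplicate.
    static int countDuplicateEntries(List list) {
        Set seen = new HashSet();
        int duplicates = 0;
        for (Iterator it = list.iterator(); it.hasNext(); ) {
            if (!seen.add(it.next())) {
                duplicates++;
            }
        }
        return duplicates;
    }
}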



I ran it in BeanShell with a List of 10,000 items, all but two of them containing new String("Obi-wan") and the other two, randomly located, containing new String("Anakin"). It took an average of:

120 ms with the duplicates check
170 ms without the duplicates check.

The same algorithm using the JDK frequency() method, run in BeanShell several times, takes an average of:

341 ms with the duplicates check.

Like this:
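
Roughly like this (again just a sketch; it calls Collections.frequency() once per distinct element):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class FrequencyCounter {
    // Counts the entries that are repeats, using Collections.frequency()
    // (JDK 1.5) once for each distinct element of the list.
    static int countDuplicateEntries(List list) {
        List checked = new ArrayList();
        int duplicates = 0;
        for (Iterator it = list.iterator(); it.hasNext(); ) {
            Object item = it.next();
            if (!checked.contains(item)) {
                checked.add(item);
                int freq = Collections.frequency(list, item);
                if (freq > 1) {
                    duplicates += freq - 1; // entries beyond the first one
                }
            }
        }
        return duplicates;
    }
}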



I think Ernest has a good point.
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
Originally posted by Stan James:
I very much like Ernest's use of the features already in the Collections. The OP wanted a count of duplicates ... I wondered if Set.addAll(someCollection) and checking the size difference wouldn't do it.


Unfortunately what "count" means seems ambiguous. Is it the number of duplicate entries for one item? The number of items that occur one or more times? The sum of all the entries that have a duplicate? Am I missing something here or are we tossing around solutions without any concrete requirements?
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24184

Originally posted by Ken Blair:


Unfortunately what "count" means seems ambiguous. Is it the number of duplicate entries for one item? The number of items that occur one or more times? The sum of all the entries that have a duplicate? Am I missing something here or are we tossing around solutions without any concrete requirements?


Yeah, since there was already some argument going on, I didn't want to muddy the waters even more; but that's why the body of my loop is a comment. Depending on what you were looking for, you could do different things.

As Stan alludes to, if the requirement is actually "if you discarded duplicates, how many items would you discard?" you could just write

int duplicateCount = theList.size() - new HashSet(theList).size();
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Requirements are a PITA, no? Life would be so much easier if our customers would just accept what we give them. Or learn to disambiguate. And never change their minds.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Edwin Dalorzo:


Go up in the posts and you will see that there is already a post offering a JDK 1.4 implementation of the frequency algorithm. After all, it is not a state secret: it is in the src.zip file that comes with the JDK.

Look, I know we programmers take a lot of pride in our code, and I apologize to all of you if I somehow offended anybody when I criticized one of the previously posted algorithms.

I am fairly new to Java, although I have accumulated a good amount of time as a programmer. Recently I read Effective Java (by Joshua Bloch) for the first time and realized that all those algorithms I used to take pride in were not so good at all compared to the nifty ideas printed in that book, and I felt happy that somebody wrote it to teach me how to do things better.

I probably couldn't tie Mr. Bloch's shoes, and I may know less Java than many of you, but I do know that we programmers like to complicate things more than necessary and that we love reinventing the wheel. We tend to be creative people, after all. That's good. But we also have to develop the skill of simplicity and the ability to recognize that simple is better: it means lower cost, fewer bugs, less maintenance, and code that is easier to read and understand.

It simply appears, to my naked eye, that the posted algorithm was much more complicated than the ones previously posted. That's all. My comments were never intended to hurt anybody's pride. I just want to say: why don't we try to make this simpler? What's the point in posting an algorithm that does the same thing as several others previously posted, but in a far more complicated way? Are we not misleading the person who posted the question?

That's the kind of question our bosses ask us as technologists every day: why do you want to migrate to a more complicated, more expensive database? Why do you want to remake a legacy system that is already working pretty well after years of IT evolution? And when they ask, we had better have a good answer, or we will get none of their money to buy those nice 21st-century toys that everybody is deploying in their companies.

Again, I apologize if you found my words offensive.

[ October 05, 2006: Message edited by: Edwin Dalorzo ]



No hurt feelings at all, Edwin. In fact, I am thankful to you for pointing this out.

I knew that my code was not optimized, as I was not using (because I did not know about) the efficient APIs already available, but let me check the output and efficiency (time) of the others' code.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Edwin Dalorzo:
Do you realize that you just wrote 3 methods and around 60 lines of code just to determine if there is a duplicate item in a List?


Well, not exactly. My code does not just determine whether a duplicate item is present; it actually counts the number of items that appear more than once.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Edwin Dalorzo:
I did the following implementation of Ernest's algorithm:



I ran it in BeanShell with a List of 10,000 items, all but two of them containing new String("Obi-wan") and the other two, randomly located, containing new String("Anakin"). It took an average of:

120 ms with the duplicates check
170 ms without the duplicates check.

The same algorithm using the JDK frequency() method, run in BeanShell several times, takes an average of:

341 ms with the duplicates check.

Like this:



I think Ernest has a good point.


Hi Edwin,

Did you run my code? How much time does it take?

Anyway, the following program measures the time taken by all three algorithms on the same list of 100,000 items.
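
In outline, something like this (just a sketch of the timing harness; countDuplicates1/2/3 are placeholders for the three implementations discussed above, not the original method names, and the list contents are made up):

import java.util.ArrayList;
import java.util.List;

public class DuplicateBenchmark {
    public static void main(String[] args) {
        // Build a list of 100,000 items (contents are illustrative).
        List list = new ArrayList();
        for (int i = 0; i < 100000; i++) {
            list.add("String" + (i % 1000));
        }

        long start = System.currentTimeMillis();
        countDuplicates1(list);
        System.out.println("Algorithm 1: " + (System.currentTimeMillis() - start) + " ms");

        start = System.currentTimeMillis();
        countDuplicates2(list);
        System.out.println("Algorithm 2: " + (System.currentTimeMillis() - start) + " ms");

        start = System.currentTimeMillis();
        countDuplicates3(list);
        System.out.println("Algorithm 3: " + (System.currentTimeMillis() - start) + " ms");
    }

    // Placeholders: each would delegate to one of the implementations
    // discussed earlier in the thread.
    static int countDuplicates1(List list) { return 0; }
    static int countDuplicates2(List list) { return 0; }
    static int countDuplicates3(List list) { return 0; }
}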



And the sample outputs are:



Please let me know if I am doing something wrong.

It shows that my algorithm takes far less time than the other two.

Please comment.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
One more thing: the other two algorithms are not giving the correct (?) output.

I think the OP wants to count the number of duplicated items, that is, the number of items that appear more than once.

For example, suppose the list has the following items: a, b, c, a, a, b. Then the output should be 2, as only two items are duplicated (a and b). But with the other two algorithms, the output comes out as 3 (I don't know by what logic).

But I might be wrong in understanding the requirement.
ankur rathi
Ranch Hand

Joined: Oct 11, 2004
Posts: 3830
Originally posted by Ernest Friedman-Hill:
Also, note that all the algorithms so far discussed have O(n^2) performance, which is really unacceptable for large lists. Here's a vastly faster (and vastly simpler and vastly shorter) O(N) algorithm:


EFH, can you please post a good link that explains, in simple language, how to work out the performance (O(n^2), O(n), etc.) of an algorithm?

Shaan Shar
Ranch Hand

Joined: Dec 27, 2005
Posts: 1249

Originally posted by rathi ji:


EFH, can you please post a good link that explains, in simple language, how to work out the performance (O(n^2), O(n), etc.) of an algorithm?



Rathi,

You should look into this: Algorithm Complexity.

Hope you find it useful.
Shaan Shar
Ranch Hand

Joined: Dec 27, 2005
Posts: 1249

Originally posted by Ernest Friedman-Hill:
Also, note that all the algorithms so far discussed have O(n^2) performance, which is really unacceptable for large lists. Here's a vastly faster (and vastly simpler and vastly shorter) O(N) algorithm:



The folks who provided longer, slower versions might want to read this.


Well, I like this code. Thanks, EFH!

Now I can implement this in JDK 1.4 very easily.
 