
count instances of each word in a String

Rob McBryde
Greenhorn

Joined: Dec 18, 2010
Posts: 16
Hi all,

I've set myself the challenge of writing a small class which will contain a method that will receive a String argument as a parameter and then count the number of times each word in the String occurs, printing these to the console.

For example, passing in the sentence "A cat sat on a cat mat" should yield the results:

Word count
----------------
A 2
cat 2
sat 1
on 1
mat 1


My initial thoughts were to create two arrays. The first, of type String, would hold each word once I had split the sentence up via String.split(" "). I would then initialise my second array, of type int, to be the same length as my String[]. By creating two nested for loops, I could then loop through the String[] word by word, incrementing each relevant cell of my int[].

This would successfully leave me with two arrays, but it seems a bit messy when I come to print out my results. I obviously don't want to print duplicate words in my output. I could loop through these arrays again and find a way to remove duplicates, but at this stage I begin to think there must be a better way to approach this from the start, using a Set or maybe a Map.
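A minimal sketch of the two-array idea described above, for comparison with the Map-based versions suggested later in the thread. It assumes words are compared ignoring case (the expected output counts "A" and "a" together) and, when printing, skips any word that already appeared earlier. The class and method names are hypothetical. The nested loops make this quadratic in the number of words.

```java
class TwoArrayCount {
    // Parallel arrays: counts[i] holds how many times words[i] occurs in the whole array.
    static int[] countWith(String[] words) {
        int[] counts = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            for (int j = 0; j < words.length; j++) {
                if (words[i].equalsIgnoreCase(words[j])) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    // True if words[i] (ignoring case) already appeared at an earlier index.
    static boolean seenBefore(String[] words, int i) {
        for (int j = 0; j < i; j++) {
            if (words[j].equalsIgnoreCase(words[i])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = "A cat sat on a cat mat".split(" ");
        int[] counts = countWith(words);
        for (int i = 0; i < words.length; i++) {
            if (!seenBefore(words, i)) { // print each distinct word only once
                System.out.println(words[i] + " " + counts[i]);
            }
        }
    }
}
```

The duplicate check is exactly the extra bookkeeping that makes this approach feel messy; a Map removes it because each key can only appear once.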

Is anyone able to offer advice on how I can neatly achieve this using a sensible Collection? In the meantime, I'll start working on my arrays and see how far I get.

Thanks in advance!
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1753

A java.util.Map implementation would seem like a good fit if you're looking to map each distinct word to its number of occurrences in a sentence. Also, java.util.Scanner might be useful here.
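A minimal sketch of that combination, assuming Java 8 or later for Map.merge. Scanner's default delimiter is whitespace, so each call to next() returns one word. This version is case-sensitive, so "A" and "a" end up as separate keys. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class ScannerWordCount {
    // Reads whitespace-separated tokens with Scanner and tallies them in a HashMap.
    static Map<String, Integer> count(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(sentence)) {
            while (scanner.hasNext()) {
                String word = scanner.next();
                // merge() stores 1 for a new word, or adds 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("A cat sat on a cat mat"));
    }
}
```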


Build a man a fire, and he'll be warm for a day. Set a man on fire, and he'll be warm for the rest of his life.
Rob McBryde
Greenhorn

Joined: Dec 18, 2010
Posts: 16
Thanks Jelle,

I'll look into a HashMap and see if I can achieve this more neatly using that. I just need to look at the API and figure out how to print both the key and the value from my map. I'd never even heard of the Scanner class, so I'll try that out too. I'm trying to learn by setting myself little coding challenges.
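One common way to print both the key and the value is to loop over entrySet(), where each Map.Entry exposes getKey() and getValue(). A minimal sketch; the class and method names are hypothetical. Note that a HashMap does not guarantee any particular iteration order, so the lines may come out in any order.

```java
import java.util.HashMap;
import java.util.Map;

class PrintEntries {
    // Builds one "word count" line per map entry using entrySet().
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append(' ').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("cat", 2);
        counts.put("mat", 1);
        System.out.print(format(counts));
    }
}
```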

Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1753

Learning by doing, that's really the only way to go!
You might want to check out the Cattle Drive if you find you've run out of ideas, or if you're looking for a challenge.
By the way, welcome to JavaRanch!
Rob McBryde
Greenhorn

Joined: Dec 18, 2010
Posts: 16
Thanks for the welcome and the advice.

I just got the HashMap working as you recommended, and I'm now experimenting with my printResults method to get consistent spacing between my results. Still got so much to learn, but I'm really enjoying Java.
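For consistent spacing, a format string with a fixed-width, left-justified field is one option. A minimal sketch, assuming a column width of 10 characters (an arbitrary choice; a longer word would push its count further right). The class and method names are hypothetical, and a LinkedHashMap is used only so the sample prints in insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AlignedOutput {
    // "%-10s" left-justifies the word in a 10-character field,
    // so the counts start in the same column on every line.
    static String row(String word, Object count) {
        return String.format("%-10s%s", word, count);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("A", 2);
        counts.put("cat", 2);
        counts.put("sat", 1);
        System.out.println(row("Word", "count"));
        System.out.println("----------------");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(row(e.getKey(), e.getValue()));
        }
    }
}
```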
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37940
Rob McBryde wrote: . . . I just got the HashMap working . . .
Well done. Please show us what you have got. Did you find the Java™ Tutorials section about collections? In the "Map" section there is an example of something similar to your problem.
Aditya Jha
Ranch Hand

Joined: Aug 25, 2003
Posts: 227

Also, you can explore related Map implementations like TreeMap and LinkedHashMap in place of HashMap. It would be good fun to experiment with these.
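A small experiment along these lines, assuming Java 8 or later for merge: the same words go into each of the three implementations, and only the iteration order differs. HashMap promises no order, LinkedHashMap keeps first-insertion order, and TreeMap sorts by key. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class MapOrdering {
    // Fills the given map with word counts and returns it,
    // so the same input can be compared across Map implementations.
    static Map<String, Integer> fill(Map<String, Integer> map, String sentence) {
        for (String word : sentence.split(" ")) {
            map.merge(word, 1, Integer::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        String s = "the cat sat on the mat";
        System.out.println("HashMap:       " + fill(new HashMap<>(), s));       // no guaranteed order
        System.out.println("LinkedHashMap: " + fill(new LinkedHashMap<>(), s)); // first-insertion order
        System.out.println("TreeMap:       " + fill(new TreeMap<>(), s));       // sorted by key
    }
}
```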
Rob McBryde
Greenhorn

Joined: Dec 18, 2010
Posts: 16
Thanks for all the advice and help, I've got plenty of research to do with maps. I'm going to think up a few more challenges in the coming months to help develop my skills. In case anyone is interested, I have included my code thus far below. I'm sure it's not the best way to solve this, so I would welcome any constructive criticism to help me learn. Just let me know if anything in my code isn't clear.



Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37940
Pass the sentence to the constructor of the WordCount class, and have it as a field of the class.
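A minimal sketch of that refactoring, where the sentence arrives through the constructor and is kept in a field. The WordCount name comes from the thread; the countWords method, the use of TreeMap (keys sorted, uppercase before lowercase) and the single-space split are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class WordCount {
    // The sentence is supplied once, when the object is created.
    private final String sentence;

    WordCount(String sentence) {
        this.sentence = sentence;
    }

    // Hypothetical method name; builds a fresh map from the stored sentence.
    Map<String, Integer> countWords() {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : sentence.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        WordCount wc = new WordCount("A cat sat on a cat mat");
        System.out.println(wc.countWords());
    }
}
```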
The package name should be lower-case throughout.
You probably don't need to replace any characters in your sentence; simply pass a regular expression which matches whitespace.
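A minimal sketch of splitting on a whitespace regular expression. The pattern "\\s+" matches one or more whitespace characters (spaces, tabs, newlines), so runs of spaces do not produce empty "words" the way split(" ") does. Trimming first is an added assumption: a leading space would otherwise yield an empty first element. The class and method names are hypothetical.

```java
import java.util.Arrays;

class WhitespaceSplit {
    // Splits on any run of whitespace after removing leading/trailing whitespace.
    static String[] words(String sentence) {
        return sentence.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("  A cat\tsat   on a cat mat ")));
        // For contrast: a single-space split keeps an empty token between two spaces.
        System.out.println(Arrays.toString("a  b".split(" ")));
    }
}
```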
It is unnecessary to have the two arrays. You can simply pass the word as a "key" to the Map and increment the number which is the "value". Did you find the Java™ Tutorials section I mentioned earlier? You can see how it is done there.
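A minimal sketch of that idea, written in the same get-then-put style as the word-frequency example in the Java Tutorials' Map section (written from scratch here, not copied). Each word is the key; its current count is looked up, incremented and stored back, so no second array is needed. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IncrementInPlace {
    // One pass over the words: read the current count, add one, write it back.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            Integer freq = counts.get(word); // null if the word has not been seen yet
            counts.put(word, (freq == null) ? 1 : freq + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("A cat sat on a cat mat".split(" ")));
    }
}
```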
Aditya Jha
Ranch Hand

Joined: Aug 25, 2003
Posts: 227

One word of advice: it will be a little tricky to have words as keys in the map and still compare them ignoring case when counting. Try the design Campbell has suggested and compare the output with your current program's output.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37940
Good point, Aditya Jha. Maybe using the word toLowerCase() as a key would solve that problem.
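A minimal sketch of the toLowerCase() suggestion, assuming Java 8 or later for merge. Every word is normalized before it becomes a key, so "A" and "a" share one count. Locale.ROOT is an added precaution against locale-specific case rules, and LinkedHashMap is chosen only to keep first-seen order; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveCount {
    // Lower-cases each word before using it as the key.
    static Map<String, Integer> count(String sentence) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String word : sentence.trim().split("\\s+")) {
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("A cat sat on a cat mat"));
    }
}
```

One trade-off: the keys print in lower case ("a 2" rather than "A 2"). A TreeMap built with String.CASE_INSENSITIVE_ORDER would keep the first spelling seen as the key, at the cost of sorting the words alphabetically.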
Rob McBryde
Greenhorn

Joined: Dec 18, 2010
Posts: 16
Thanks again for the helpful guidance. I had a quick look through the Java Map tutorials but couldn't find the example you were referring to, Campbell. I haven't really had much spare time over the past few days, but I'm determined to look over the tutorial as you recommended and try implementing the improvements you both kindly suggested.

The only reason I used the two arrays is that I thought putting the words straight into the map wouldn't work, as a map doesn't allow duplicate keys. I didn't think of incrementing the value at that same point, though.
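A small demonstration of why duplicate keys are not a problem: calling put() with a key that is already present does not add a second entry; it replaces the value and returns the previous one. A minimal sketch assuming Java 8 or later for getOrDefault; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class DuplicateKeys {
    // Relies on the map holding one entry per key: re-putting an
    // existing word overwrites its count instead of duplicating it.
    static Map<String, Integer> tally(String... words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // getOrDefault returns 0 for a word not yet in the map.
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        System.out.println(counts.put("cat", 1)); // null: there was no previous value
        System.out.println(counts.put("cat", 2)); // 1: the old value is replaced
        System.out.println(counts);               // {cat=2}: still a single entry
        System.out.println(tally("cat", "mat", "cat").get("cat")); // 2
    }
}
```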
John Laker
Greenhorn

Joined: Oct 01, 2010
Posts: 22
Rob McBryde wrote: Thanks for all the advice and help, I've got plenty of research to do with maps. I'm going to think up a few more challenges in the coming months to help develop my skills. In case anyone is interested, I have included my code thus far below. I'm sure it's not the best way to solve this, so I would welcome any constructive criticism to help me learn. Just let me know if anything in my code isn't clear.





In the code above, since the array size is not initialized, do the words split from the string get added one by one to the array splitter, and does its size increase automatically?

Also, where you used the Iterator, you could have used a for-each loop, eliminating the need to declare an iterator.
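A minimal sketch comparing the two styles over a map's entry set; the class and method names are hypothetical. The enhanced for loop over any Iterable is compiled to use an Iterator behind the scenes, so the difference is mainly readability. An explicit Iterator is still needed if you want to call remove() while looping.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class IteratorVsForEach {
    // Explicit Iterator over the entry set.
    static String withIterator(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        Iterator<Map.Entry<String, Integer>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            out.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    // The same traversal with a for-each loop; no Iterator variable is declared.
    static String withForEach(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("cat", 2);
        counts.put("mat", 1);
        System.out.print(withIterator(counts));
        System.out.print(withForEach(counts));
    }
}
```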
Rob McBryde
Greenhorn

Joined: Dec 18, 2010
Posts: 16
Hi John,

From what I can see in the API for String, the split(String regex) method returns a reference to a String array. The split method builds a new array containing the pieces of the sentence (delimited by spaces in this instance, as that's what I passed in as my regular expression), and my splitter variable is then simply assigned the reference to that array.
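A minimal sketch of that behaviour: the caller never states a length, because split() creates the array itself, sized to the number of pieces it found, and returns a reference to it. The splitter name comes from the thread; the class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitReturnsArray {
    // splitter just receives a reference to the array that split() created.
    static String[] splitWords(String sentence) {
        String[] splitter = sentence.split(" ");
        return splitter;
    }

    public static void main(String[] args) {
        String[] splitter = splitWords("A cat sat on a cat mat");
        System.out.println(splitter.length);           // 7
        System.out.println(Arrays.toString(splitter)); // [A, cat, sat, on, a, cat, mat]
    }
}
```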

When you create an array you must specify its size, and once it is created you cannot alter its length. If you need a collection whose size can change, you would use something like an ArrayList, which grows and shrinks as you call its add() and remove() methods. So, in answer to your question, the split words are not added one by one to the array splitter, and its size cannot increase automatically.
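A minimal sketch of the difference: "growing" an array really means copying it into a new, larger array (Arrays.copyOf does this), while an ArrayList resizes itself as elements are added. Arrays.asList on its own returns a fixed-size list, so it is wrapped in a new ArrayList here. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedVsGrowable {
    // Returns a NEW array one element longer; the original keeps its length.
    static String[] appendToArray(String[] words, String extra) {
        String[] bigger = Arrays.copyOf(words, words.length + 1);
        bigger[words.length] = extra;
        return bigger;
    }

    // A resizable list holding the same words.
    static List<String> toGrowableList(String[] words) {
        return new ArrayList<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        String[] words = "A cat sat".split(" ");
        String[] longer = appendToArray(words, "on");
        System.out.println(words.length + " " + longer.length); // 3 4

        List<String> list = toGrowableList(words);
        list.add("on"); // the list grows in place
        System.out.println(list.size()); // 4
    }
}
```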

As you have probably read, both Campbell and Aditya have advised that my arrays are surplus to requirements, so I will be refactoring them out of my solution to this particular problem soon.

Thanks for your comment about replacing my Iterator with a for-each loop. I realise that this would be neater, but in all honesty I chose to use an Iterator purely because I don't have much experience with them and wanted to try one out for myself.


