Hi all. I want to create a small application that takes a file as a parameter and performs the following tasks:
counts the number of characters
counts the number of whitespace characters
counts the number of lines
counts the number of words
searches for a specific word
The problem is that I'm not sure about these algorithms. Would you mind giving me some tips for writing these methods (for example, how do I know that a line has ended, or that a word has ended)? Here is some code:
Any corrections to the code above? I think there is something wrong with how it counts the spaces and chars. What do you think? Thanks a lot.
Thanks. I want to count the English words, but I am confused: in order to count spaces, should I use isWhitespace() or isSpaceChar()? And which method in the Character class counts words? Thanks again.
I'm dying to write some hints on word counting, but I'm making myself wait until you try it first. You get all the fun!
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Joined: Aug 07, 2003
Originally posted by John Todd: in order to count spaces, should I use isWhitespace() or isSpaceChar()?
What does the program specification say you should count: spaces or whitespace? Read the JavaDocs for both methods and see which matches the spec and use it.
Which method in the Character class counts words?
How could a Character be able to tell you anything about words (other than single-letter words like "I" and "a")? Think about how you detect English words (try defining one for starters) in the context of a text file. Then translate that into a sequence of logic steps and finally code.
May I ask what the difference is between the space char and whitespace? I'm confused about them.
Originally posted by John Todd: May I ask what the difference is between the space char and whitespace?
Definitions vary from system to system. Typically, whitespace is considered to include space " ", horizontal tab "\t" and newline "\n". However, the JavaDoc for Character.isWhitespace(char) says:
A character is considered to be a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space separator (category "Zs"), but is not a no-break space (\u00A0 or \uFEFF).
It is a Unicode line separator (category "Zl").
It is a Unicode paragraph separator (category "Zp").
It is \u0009, HORIZONTAL TABULATION.
It is \u000A, LINE FEED.
It is \u000B, VERTICAL TABULATION.
It is \u000C, FORM FEED.
It is \u000D, CARRIAGE RETURN.
It is \u001C, FILE SEPARATOR.
It is \u001D, GROUP SEPARATOR.
It is \u001E, RECORD SEPARATOR.
It is \u001F, UNIT SEPARATOR.
What that tells me is that you should never read JavaDocs before coffee. No wait, what that tells me is that the three I mentioned above are included in that much wider definition. But I would bet you that when your program is tested, only the three I mentioned will be considered (maybe carriage return if tested on a Mac). Regardless, using Character.isWhitespace(char) will count them all correctly.
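To see the difference between the two methods concretely, here is a small sketch. The class and method names are made up for illustration; only Character.isWhitespace(char) and Character.isSpaceChar(char) come from the JDK. The sample string contains a space, a tab, a newline and a no-break space (U+00A0).

```java
// Illustrative sketch: WhitespaceVsSpaceChar and its methods are hypothetical names.
class WhitespaceVsSpaceChar {

    // Counts characters for which Character.isWhitespace returns true.
    static int countWhitespace(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Counts characters for which Character.isSpaceChar returns true.
    static int countSpaceChars(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isSpaceChar(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Contains: space, tab, newline, no-break space (U+00A0).
        String sample = "a b\tc\nd\u00A0e";
        System.out.println("isWhitespace: " + countWhitespace(sample)); // prints 3
        System.out.println("isSpaceChar:  " + countSpaceChars(sample)); // prints 2
    }
}
```

isWhitespace accepts the space, tab and newline but rejects the no-break space. isSpaceChar accepts only the Unicode separators, so it counts the space and the no-break space and ignores the tab and newline. For a "count the whitespace in a text file" spec, isWhitespace is probably the closer match.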
The real trick is how to define and detect an "English word." How many words do the following sentences have?
This sentence has 5 or 7 words
Is punctuation part of the words preceding it or a separate word?
Perhaps you will only get letters and whitespace so it is easy
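One possible way to pin the definition down, purely as a sketch: assume a "word" is any maximal run of non-whitespace characters, so digits count as words and punctuation sticks to the word before it. That is only one answer to the questions above, not necessarily what your spec wants. The class and method names here are hypothetical.

```java
// Illustrative sketch: WordCounter is a hypothetical name.
// Assumption: a word is a maximal run of non-whitespace characters.
class WordCounter {

    static int countWords(String text) {
        int words = 0;
        boolean inWord = false; // true while we are inside a run of non-whitespace
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                // Whitespace ends the current word, if there was one.
                inWord = false;
            } else if (!inWord) {
                // First non-whitespace character after whitespace starts a new word.
                inWord = true;
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Under the assumption above, "5" and "7" count as words.
        System.out.println(countWords("This sentence has 5 or 7 words")); // prints 7
    }
}
```

The idea is to count transitions from "not in a word" to "in a word" rather than counting spaces, so leading, trailing and repeated whitespace do not throw the total off. If only letters should count, the test inside the loop would need to change (Character.isLetter is one candidate).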
Yahoo, I found it! I found how to count the words. But I have a question. Please look at this code:
I am incrementing the character count every time I encounter a Unicode space or a Java whitespace character, which of course makes the total wrong. If I want to count the number of characters in a file, which method should I use: isWhitespace or isSpaceChar? Also, what is the difference between a char and a letter? Thanks, ranchers.
Think of it like this. For every character in the file, you want to take a set of actions.
Increment the total character counter.
If it's a whitespace character, increment the whitespace counter.
If it's a letter character, increment the letter counter.
Note where the various counter increment steps take place. This will solve your problem with over-counting total characters. In fact, Stan pointed this out in an earlier reply. [ November 08, 2004: Message edited by: David Harkness ]
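The steps above can be sketched roughly as follows. This is a minimal illustration, not your program: CharStats and its fields are made-up names, and it reads from any java.io.Reader, so a FileReader would work for a real file while the demo uses a string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative sketch: CharStats and its members are hypothetical names.
class CharStats {
    int total;      // every character read
    int whitespace; // characters accepted by Character.isWhitespace
    int letters;    // characters accepted by Character.isLetter

    // Reads the input one character at a time and updates all three counters.
    static CharStats count(Reader in) throws IOException {
        CharStats stats = new CharStats();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            // The total is incremented exactly once per character, unconditionally.
            stats.total++;
            if (Character.isWhitespace(ch)) {
                stats.whitespace++;
            }
            if (Character.isLetter(ch)) {
                stats.letters++;
            }
        }
        return stats;
    }

    // Convenience wrapper for in-memory text, so callers need no checked exception.
    static CharStats count(String text) {
        try {
            return count(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CharStats s = count("Hi all 42\n");
        System.out.println(s.total + " " + s.whitespace + " " + s.letters); // prints 10 3 5
    }
}
```

Because the total is bumped outside both if statements, a whitespace character adds one to the total and one to the whitespace counter, never two to the total. Digits such as "4" and "2" are characters but not letters, which is the char-versus-letter difference asked about earlier.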
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com