
StreamTokenizer problem again...

Flora Ng
Greenhorn

Joined: Jul 05, 2001
Posts: 11
I'm stuck on this particular problem again...
I want to extract numbers from a text file or HTML. How can I use StreamTokenizer to filter out non-digit elements?
For example, given the sentence below:
I wish I can earn $1,000,000 a year!
the program should return the integer 1000000.
I really don't know how to filter out those commas using the StreamTokenizer class...
Thanks a lot
Cindy Glass
"The Hood"
Sheriff

Joined: Sep 29, 2000
Posts: 8521
And why must you use StreamTokenizer?


"JavaRanch, where the deer and the Certified play" - David O'Meara
Flora Ng
Greenhorn

Joined: Jul 05, 2001
Posts: 11
Hello Cindy!
So do you think I should read the text in character by character and test whether each one is a digit? Would that be slower if the file is huge? I also want the program to be able to extract decimal numbers...
Thanks
Cindy Glass
"The Hood"
Sheriff

Joined: Sep 29, 2000
Posts: 8521
Well, that is what StreamTokenizer does in the background anyway. The problem is that you want to parse out formatting characters that are in the middle of what would normally be considered a "token". There is no good delimiter that you can define.
There are several approaches that you could consider.
1 - walk through the input character by character and pull out the good stuff. You would probably want to use StreamTokenizer first to break the input into String chunks and then check the individual chars one token at a time.
2 - try to use StreamTokenizer and its whitespaceChars() method to "blank out" all the non-numeric characters. Might work.
3 - use StringTokenizer and define the set of delimiters as all the characters that you do NOT want to allow. You would probably want to use StreamTokenizer first to break the input into String chunks and then use StringTokenizer on those. If there is a possibility of other character sets (like Greek or whatever) this could get messy.
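
For what it's worth, option 2 might look roughly like the sketch below. The class and method names (TokenizerNumberSketch, numbersIn) are made up for illustration. It also assumes that a comma sitting between two digits is always a thousands separator, and it strips those commas before tokenizing, because StreamTokenizer on its own would split 1,000,000 into three separate numbers. Note that parseNumbers() also treats '.' and '-' as numeric characters, so stray punctuation in real input may need extra filtering.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenizerNumberSketch {

    // Hypothetical helper, not from the thread: returns every number found in text.
    static List<Double> numbersIn(String text) {
        // Assumption: a comma between two digits is a grouping separator, so drop it.
        String cleaned = text.replaceAll("(?<=[0-9]),(?=[0-9])", "");

        StreamTokenizer st = new StreamTokenizer(new StringReader(cleaned));
        st.resetSyntax();
        // "Blank out" everything except '-', '.' and the digits '0'..'9'.
        // The numeric characters must NOT be marked as whitespace, or they get skipped.
        st.whitespaceChars(0, ',');
        st.whitespaceChars('/', '/');
        st.whitespaceChars(':', 255);
        st.parseNumbers();

        List<Double> numbers = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    numbers.add(st.nval); // nval is always a double
                }
            }
        } catch (IOException e) {
            // Not expected when reading from a StringReader.
            throw new UncheckedIOException(e);
        }
        return numbers;
    }
}
```

Because StreamTokenizer reports every number as a double, the caller would need to cast (for example to long) to get the integer 1000000 from the example sentence.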

I would probably try number 1 first.
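
A minimal sketch of option 1, walking the text one character at a time, could look like this. The names NumberExtractor and extractNumbers are invented for the example, and it assumes plain ASCII digits, commas used only as grouping separators, and '.' as the decimal point. It works on a String for brevity; for a large file the same loop could read characters from a BufferedReader instead.

```java
import java.util.ArrayList;
import java.util.List;

class NumberExtractor {

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Hypothetical helper, not from the thread: collects every number in text.
    static List<Double> extractNumbers(String text) {
        List<Double> numbers = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int n = text.length();

        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            boolean inNumber = current.length() > 0;
            boolean nextIsDigit = i + 1 < n && isAsciiDigit(text.charAt(i + 1));

            if (isAsciiDigit(c)) {
                current.append(c);
            } else if (c == ',' && inNumber && nextIsDigit) {
                // Assumed grouping comma inside a number: skip it.
            } else if (c == '.' && inNumber && nextIsDigit && current.indexOf(".") < 0) {
                // First decimal point inside a number: keep it.
                current.append(c);
            } else if (inNumber) {
                // Any other character ends the number collected so far.
                numbers.add(Double.parseDouble(current.toString()));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            numbers.add(Double.parseDouble(current.toString()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("I wish I can earn $1,000,000 a year!"));
    }
}
```

For the example sentence this yields a single value, 1000000.0, which can be cast to long if an integer is wanted. The comma rule is only a guess at the intent: a comma-separated list such as "1,2,3" would be read as 123, so the rule would have to be tightened if the input can contain lists like that.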