I'm stuck on this particular problem again... I want to extract numbers from a text file or HTML. How can I use StreamTokenizer to filter out non-digit elements? For example, given the sentence below:
I wish I can earn $1,000,000 a year!
the program should return the integer 1000000. I really don't know how to filter out those commas using the StreamTokenizer class... Thanks a lot
"JavaRanch, where the deer and the Certified play" - David O'Meara
Joined: Jul 05, 2001
Hello Cindy! So do you think I should read in the text character by character and test whether each one is a digit or not? Would that be slower if the file size is huge? I want the program to be able to extract decimal numbers as well... Thanks
Joined: Sep 29, 2000
Well, that is what StreamTokenizer does in the background anyway. The problem is that you want to strip formatting characters that sit in the middle of what would normally be considered a single "token", so there is no good delimiter that you can define. There are several approaches you could consider:
1 - Walk through the input character by character and pull out the good stuff. You would probably want to use StreamTokenizer first to break the input into String chunks, and then check the individual chars one token at a time.
2 - Use StreamTokenizer and call its whitespaceChars() method to "blank out" all the non-numeric characters. Might work.
3 - Use StringTokenizer and define the set of delimiters as all the characters that you do NOT want to allow. Again, you would probably want to use StreamTokenizer first to break the input into String chunks, and then run StringTokenizer on those. If there is a possibility of other character sets (like Greek or whatever), this could get messy.
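To make the first approach a bit more concrete, here is a rough sketch of a character-by-character scan. It skips the pre-chunking step and works directly on a String. The class name NumberExtractor and the method extractNumbers are made up for illustration, and the rules are assumptions rather than anything StreamTokenizer does for you: a comma counts as a grouping separator only when it sits directly between two digits, a '.' counts as a decimal point only when a digit follows it, and only ASCII digits are recognized. One consequence is that a list written without spaces, such as "1,2", would be read as 12.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans text once and collects the numbers it finds.
final class NumberExtractor {

    // Only ASCII digits are treated as digits (an assumption of this sketch).
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<BigDecimal> extractNumbers(String text) {
        List<BigDecimal> numbers = new ArrayList<>();
        StringBuilder current = new StringBuilder(); // digits of the number in progress
        boolean seenPoint = false;                   // true once a decimal point was accepted

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean inNumber = current.length() > 0;
            boolean nextIsDigit = i + 1 < text.length() && isAsciiDigit(text.charAt(i + 1));

            if (isAsciiDigit(c)) {
                current.append(c);
            } else if (inNumber && c == ',' && nextIsDigit && !seenPoint) {
                // Grouping comma between digits: drop it and keep building the number.
            } else if (inNumber && c == '.' && nextIsDigit && !seenPoint) {
                current.append(c);
                seenPoint = true;
            } else if (inNumber) {
                // Any other character ends the current number.
                numbers.add(new BigDecimal(current.toString()));
                current.setLength(0);
                seenPoint = false;
            }
        }
        if (current.length() > 0) {
            numbers.add(new BigDecimal(current.toString())); // number at end of input
        }
        return numbers;
    }

    public static void main(String[] args) {
        // prints [1000000]
        System.out.println(extractNumbers("I wish I can earn $1,000,000 a year!"));
    }
}
```

Each character is looked at once, so the cost grows linearly with the file size. For a large file you would feed this line by line from a BufferedReader rather than loading everything into one String. BigDecimal is used here so that whole numbers and decimals go through the same code path.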