Tokenize numbers only, skip words

 
Jerri Loh
Ranch Hand
Posts: 31
Hi there, does anybody know how I should tokenize only the numbers from a text file and skip the words?

I have two approaches in mind.

The first is a switch on the token type, something like:

while ((ttype = s.nextToken()) != StreamTokenizer.TT_EOF)
{
    switch (ttype) {
    case StreamTokenizer.TT_WORD:
        System.out.println("Header and Title ignored");
        // do not want it to be returned; don't know how to skip it here
        break;
    case StreamTokenizer.TT_NUMBER:
        return; // not sure about this one
    }
}

The second is the whitespaceChars method in the StreamTokenizer class, something like:

public void whitespaceChars(int low, int hi)


I am not sure how to go about it. Can anybody help me out here?
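For reference, the first approach can be sketched roughly as follows. This is only a minimal illustration, not the code the thread eventually settled on: the class name NumberTokens and the method numbersIn are made up for the example, and it reads from a String rather than a file. It relies on StreamTokenizer's default settings, where numbers are parsed into the nval field and words need no explicit skipping because the loop simply does nothing with them.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this sketch.
class NumberTokens {

    // Collects every numeric token in the text and ignores everything else.
    static List<Double> numbersIn(String text) {
        List<Double> numbers = new ArrayList<>();
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        try {
            int ttype;
            while ((ttype = st.nextToken()) != StreamTokenizer.TT_EOF) {
                switch (ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        numbers.add(st.nval); // numeric value of the current token
                        break;
                    default:
                        // TT_WORD and ordinary characters: nothing to do, just move on
                        break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(numbersIn("Header Title 12 words 3.5 more -7"));
    }
}
```

Note that StreamTokenizer reports every number as a double, and that by default it treats '/' as a comment character and quotes as string delimiters, which may or may not suit a given input file.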
 
Jim Size
Greenhorn
Posts: 29
Hello there, I hope I understand your problem: you need to pull only the numbers out of a sentence using a tokenizer.

Here is a piece of code that I believe is correct:


StringTokenizer st = new StringTokenizer("This is 12 an example"); // a sentence with a number added to it

while (st.hasMoreTokens()) {
    String s = st.nextToken();
    try {
        int i = Integer.parseInt(s); // throws NumberFormatException for words
        System.out.println(i);
    } catch (NumberFormatException e) {
        // not an integer, so skip this token
    }
} // end while, end of example code

// This is my first time replying to a problem and I don't know how to use the "code" thing, sorry guys.

So if s is an integer, i takes its value and gets printed (you can do whatever you want with it). If s is a word, Integer.parseInt throws a NumberFormatException, which the catch block ignores.
I didn't compile it in Eclipse or BlueJ, so treat it as untested.
You have to import java.util.StringTokenizer as well.
Sorry for my bad English.
 
Jerri Loh
Ranch Hand
Posts: 31
Hi. I actually solved it.



 
Jim Size
Greenhorn
Posts: 29
Well done for solving your problem!
Maybe I didn't get it, though.
Your program tokenizes everything, but as far as I can see from the code you take the values at positions 7 and 8, that is, the 7th and 8th words in the text file,
and you add them to another document.
I thought you needed only the numbers from a text, not specifically its 7th and 8th words.


 
Jerri Loh
Ranch Hand
Posts: 31
But it wasn't entirely me. Someone from Daniweb, Tong1, helped me with a fragment. Thank you, J Size. I really appreciate your help. See you around.
 
Satya Maheshwari
Ranch Hand
Posts: 368
You could also use pattern matching to find this pattern: \\s\\d\\d*\\s, i.e. a digit followed by any number of digits, with whitespace on either side. You can modify it a bit to fit your requirements.
 
Rob Spoor
Sheriff
Posts: 20376
Of course \\d\\d* is equivalent to \\d+. Both mean 1 or more digits. And \\s may be too restrictive. How about dots, commas, other characters?
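A minimal sketch of the regex idea, using \\d+ with no whitespace requirement so that a number followed by a dot or comma is still found. The class name RegexNumbers and the method integersIn are invented for this example. It assumes every run of digits fits in an int, and it will also pick up digits embedded in words such as "abc123", which may not be what you want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class RegexNumbers {

    // One or more digits; equivalent to \d\d*.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every run of digits in the text as an int, in order of appearance.
    static List<Integer> integersIn(String text) {
        List<Integer> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            // Assumption: the run is short enough to fit in an int.
            found.add(Integer.parseInt(m.group()));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(integersIn("This is 12 an example, with 7."));
    }
}
```

To accept only standalone numbers, one option is to wrap the pattern in word boundaries (\\b\\d+\\b) rather than requiring whitespace on both sides.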
 
Jerri Loh
Ranch Hand
Posts: 31
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I could not have used pattern matching because each line of the pdb text file was in a different format and contained unnecessary information as well.
 
Shanky Sohar
Ranch Hand
Posts: 1051
Using a regex or a Scanner may also be a solution when you want to do some tokenizing.
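The Scanner variant might look something like the sketch below. Again, the class name ScannerNumbers and the method integersIn are hypothetical, and the input is a String for brevity; a Scanner can wrap a File in the same way. It splits on whitespace, keeps tokens that parse as an int, and discards the rest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper class, named only for this sketch.
class ScannerNumbers {

    // Returns the whitespace-separated tokens that are valid ints, in order.
    static List<Integer> integersIn(String text) {
        List<Integer> found = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useLocale(Locale.ROOT); // avoid locale-specific number formats
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    found.add(sc.nextInt());
                } else {
                    sc.next(); // not an int: consume the token and skip it
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(integersIn("This is 12 an example 7"));
    }
}
```

Because the Scanner works on whole whitespace-delimited tokens, something like "7." or "x12" is treated as a word and skipped, unlike the plain \\d+ regex.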
 