
StringTokenizer / Speed

 
Greg Werner
Ranch Hand
Posts: 54
Hi all, I hope I don't cause too much trouble with this one, but here goes:

I am reading in a large collection of files (say, ASCII .txt files for the sake of this example). For my I/O, I am using JNI to do buffered reading in C land and get back a collection of Strings representing the lines of the files. This is sufficiently fast for my purposes, so I am satisfied with this part of my solution for now.

The question, then, is how to parse/process these lines in Java. Currently, too much time is being spent processing the files. I need to split/tokenize each line on good old ASCII 0x20 (space). The number of tokens on a line is the main factor in determining how long that line takes to process (duh!)

I have tried Pattern.split, String.split, an old class on the web called SimpleTokenizer, and the "legacy" StringTokenizer class. The StringTokenizer beats all the others hands down as far as speed goes for the task I am doing. With the number of files I have to process, there is no way I am going to use split even if it is considered proper Java.
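For reference, here is a minimal sketch of the two library approaches applied to a space delimiter. It is not the original poster's code (that snippet did not survive), and the class and method names are made up for illustration. One behavioral difference matters when comparing them: StringTokenizer collapses runs of spaces, while String.split(" ") returns an empty string between consecutive spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SpaceTokenize {
    // Legacy StringTokenizer with " " as the only delimiter.
    // Consecutive spaces count as one separator, so no empty tokens are produced.
    static List<String> withStringTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " ");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // String.split interprets its argument as a regex; with " " it yields an
    // empty string for every extra space between two tokens.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split(" "));
    }

    public static void main(String[] args) {
        String line = "alpha  beta gamma"; // two spaces after "alpha"
        System.out.println(withStringTokenizer(line)); // [alpha, beta, gamma]
        System.out.println(withSplit(line));           // [alpha, , beta, gamma]
    }
}
```

So a timing comparison between the two is only apples-to-apples if the input never contains repeated spaces, or if the split results are filtered for empties afterwards.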

I suppose my question is: does anything faster than StringTokenizer exist? Way back when (2003), SimpleTokenizer supposedly beat StringTokenizer, but now, in 2010, I am not finding that to be the case.
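One option sometimes tried when the delimiter is a single fixed character is a hand-rolled scan with String.indexOf(' '), which skips the regex machinery behind split and the general delimiter-set handling of StringTokenizer. The sketch below is hypothetical (the class and method names are invented) and has not been benchmarked here; whether it actually beats StringTokenizer on a given JVM would have to be measured on real data. It mimics StringTokenizer's behavior of ignoring empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSpaceTokenizer {
    // Hypothetical helper: splits on ' ' (0x20) by jumping from space to space
    // with indexOf, and drops empty tokens just as StringTokenizer does.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int len = line.length();
        while (start < len) {
            int end = line.indexOf(' ', start);
            if (end < 0) {
                end = len;               // last token runs to the end of the line
            }
            if (end > start) {           // skip runs of consecutive spaces
                tokens.add(line.substring(start, end));
            }
            start = end + 1;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("  alpha  beta gamma ")); // [alpha, beta, gamma]
    }
}
```

If allocation turns out to dominate, a variant could pass start/end offsets to a callback instead of building substrings, but that is a further assumption about how the tokens are consumed.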

Just to throw out a code snippet, here is what I am doing:

 
Karthik Shiraly
Bartender
Posts: 1203
Hi Greg,

Since your app involves both file I/O and tokenizing the file contents, using java.io.StreamTokenizer directly on a BufferedReader may be faster than creating lots of Strings via JNI and then tokenizing them. StreamTokenizer by itself would be slower than a simple StringTokenizer, but I suggest it because file I/O is involved and it tokenizes as it reads.

Cheers
Karthik
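The StreamTokenizer suggestion could look roughly like the sketch below, assuming lines end in \n or \r\n and tokens are separated only by 0x20. StreamTokenizer's default syntax parses numbers, handles quote characters, and treats '/' as a comment character, so the sketch calls resetSyntax() first, marks every character as a word character, and then makes space, CR, and LF whitespace. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizeLines {
    // Reads space-separated words straight from a Reader and groups them per line,
    // without first building a String for each whole line.
    static List<List<String>> tokenize(Reader source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new BufferedReader(source));
        st.resetSyntax();               // drop default number, quote and '/' comment handling
        st.wordChars(0, 0xFF);          // every character is part of a word by default...
        st.whitespaceChars(' ', ' ');   // ...except 0x20, which separates tokens,
        st.whitespaceChars('\n', '\n'); // and CR/LF, which must be whitespace
        st.whitespaceChars('\r', '\r'); // for eolIsSignificant to report TT_EOL
        st.eolIsSignificant(true);

        List<List<String>> lines = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int type;
        while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == StreamTokenizer.TT_WORD) {
                current.add(st.sval);
            } else if (type == StreamTokenizer.TT_EOL) {
                lines.add(current);     // end of a line: start collecting the next one
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            lines.add(current);         // last line had no trailing newline
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "alpha  beta\r\ngamma\n";
        System.out.println(tokenize(new StringReader(text))); // [[alpha, beta], [gamma]]
    }
}
```

For real files, a Reader such as `Files.newBufferedReader(path, StandardCharsets.US_ASCII)` could be passed in instead of the StringReader. Note that StreamTokenizer still creates one String per word (`sval`), so the saving, if any, comes from skipping the per-line Strings and the JNI crossing rather than from avoiding allocation altogether.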
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic