StringTokenizer / Speed

Greg Werner
Ranch Hand

Joined: May 07, 2009
Posts: 54
Hi all, I hope I don't cause too much trouble with this one, but here goes:

I am reading in a large collection of files (let us say ASCII .txt files for the sake of this example). For my I/O, I am using JNI to do buffered reading in C and get back a collection of Strings representing the lines of the files. This is sufficiently fast for my purposes, so I am satisfied with this part of my solution for now.

The question then is how to parse/process these lines in Java. Currently, too much time is being spent processing the files. I need to split/tokenize by good old ASCII 0x20 (space). The number of tokens on a line is the driving force in determining how long a line takes to process (duh!)

I have tried Pattern.split, String.split, an old class on the web called SimpleTokenizer, and the "legacy" StringTokenizer class. The StringTokenizer beats all the others hands down as far as speed goes for the task I am doing. With the number of files I have to process, there is no way I am going to use split even if it is considered proper Java.

I suppose my question is: does anything faster than StringTokenizer exist? Way back when (2003), SimpleTokenizer supposedly beat StringTokenizer, but now in 2010 I am not finding that to be the case.

Just to throw out a code snippet, here is what I am doing:
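
The snippet itself is missing from this copy of the thread. The following is not the poster's code, only a minimal sketch of the kind of space-delimited tokenizing described above, assuming each line is already in memory as a String. The class name SpaceSplit and both method names are hypothetical. It places a StringTokenizer loop next to a hand-rolled indexOf scan, which is one candidate worth timing against StringTokenizer on real data; no speed claim is made here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, for illustration only.
class SpaceSplit {

    // Baseline: StringTokenizer with a single space (0x20) as the delimiter.
    static List<String> withTokenizer(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " ");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Candidate alternative: scan with indexOf and skip empty tokens,
    // so runs of spaces are treated the same way StringTokenizer treats them.
    static List<String> withIndexOf(String line) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = line.indexOf(' ', start)) >= 0) {
            if (end > start) {
                out.add(line.substring(start, end));
            }
            start = end + 1;
        }
        if (start < line.length()) {
            out.add(line.substring(start)); // trailing token after the last space
        }
        return out;
    }
}
```

Both methods return the same tokens for the same line, so they can be swapped in a timing loop over a representative sample of files.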

Karthik Shiraly
Ranch Hand

Joined: Apr 04, 2009
Posts: 364
Hi Greg,

Since your app involves both file I/O and tokenizing their contents, perhaps using only java.io.StreamTokenizer on a BufferedReader may be faster than creating lots of Strings from JNI and tokenizing them. StreamTokenizer by itself would be slower than a simple StringTokenizer, but I suggest it because file I/O is involved.

Cheers
Karthik
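
A minimal sketch of the StreamTokenizer suggestion above, assuming plain ASCII input where only whitespace separates tokens. The class name StreamTokens and the method readTokens are hypothetical. The default syntax table is reset first because StreamTokenizer otherwise parses numbers, quotes, and comments specially, which is probably not wanted for a plain split on spaces.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class StreamTokens {

    // Reads every whitespace-separated token from the given source.
    static List<String> readTokens(Reader source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new BufferedReader(source));
        st.resetSyntax();               // drop default number/quote/comment handling
        st.wordChars(0x21, 0xFF);       // everything above space belongs to a token
        st.whitespaceChars(0x00, 0x20); // space, tab, CR, and LF separate tokens

        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            }
        }
        return out;
    }
}
```

In practice the source would be something like new FileReader(path). This version discards line boundaries; if per-line processing matters, calling eolIsSignificant(true) makes the tokenizer report line ends as TT_EOL. Whether this beats the JNI-plus-StringTokenizer approach would have to be measured.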
 
 