
StringTokenizer / Speed

Greg Werner
Ranch Hand

Joined: May 07, 2009
Posts: 54
Hi all, I hope I don't cause too much trouble with this one, but here goes:

I am reading in a large collection of files (let us say ASCII .txt files for the sake of this example). For my I/O, I am using JNI to do buffered reading in C land and get back a collection of Strings representing the lines of the files. This is sufficiently fast for my purposes, so I am satisfied with this part of my solution for now.

The question then is how to parse and process these lines in Java. Currently, too much time is being spent processing the files. I need to split/tokenize on good old ASCII 0x20 (space). The number of tokens on a line is the driving force in determining how long a line takes to process (duh!).

I have tried Pattern.split, String.split, an old class on the web called SimpleTokenizer, and the "legacy" StringTokenizer class. The StringTokenizer beats all the others hands down as far as speed goes for the task I am doing. With the number of files I have to process, there is no way I am going to use split even if it is considered proper Java.
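Greg's own snippet did not survive, so the following is not his code. It is a minimal sketch of the StringTokenizer baseline next to String.split(" "), with hypothetical class and method names. One thing worth checking in any speed comparison is that the candidates do the same work: StringTokenizer treats a run of spaces as a single separator, while split(" ") takes a regular expression and returns an empty string between adjacent spaces (only trailing empty strings are dropped).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SpaceTokens {
    // Collects the space-separated tokens of one line using the legacy StringTokenizer.
    // Runs of spaces count as one separator, so no empty tokens are returned.
    static List<String> withStringTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " ");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // String.split takes a regex; with " " a run of spaces yields empty strings
    // between them (trailing empty strings are dropped).
    static String[] withSplit(String line) {
        return line.split(" ");
    }

    public static void main(String[] args) {
        String line = "alpha  beta gamma"; // note the double space
        System.out.println(withStringTokenizer(line)); // [alpha, beta, gamma]
        System.out.println(withSplit(line).length);    // 4: "alpha", "", "beta", "gamma"
    }
}
```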

I suppose my question is: does anything faster than StringTokenizer exist? Way back when (2003), SimpleTokenizer supposedly beat StringTokenizer, but now, in 2010, I am not finding that to be the case.
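On the "anything faster" question, one option that is often suggested is to skip the general-purpose classes and scan for the single delimiter character yourself with String.indexOf(int). The sketch below is a hypothetical helper, not part of any library, that returns the same tokens as StringTokenizer(line, " "). Whether it actually beats StringTokenizer depends on the JVM and the data, so treat it as something to benchmark rather than a guaranteed win.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfTokenizer {
    // Splits a line on the space character (0x20) by scanning with String.indexOf(int).
    // Empty tokens are skipped so the result matches StringTokenizer(line, " ").
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int len = line.length();
        while (start < len) {
            int end = line.indexOf(' ', start);
            if (end < 0) {
                end = len; // last token runs to the end of the line
            }
            if (end > start) {
                tokens.add(line.substring(start, end)); // non-empty token
            }
            start = end + 1; // continue just after the separator
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("alpha  beta gamma")); // [alpha, beta, gamma]
    }
}
```

If the tokens are consumed immediately, a variant that hands each start/end pair to the caller instead of building a List would avoid one allocation per line.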

Just to throw out a code snippet, here is what I am doing:

Karthik Shiraly

Joined: Apr 04, 2009
Posts: 874

Hi Greg,

Since your app involves both file I/O and tokenizing the file contents, perhaps using only a StreamTokenizer on a BufferedReader may be faster than creating lots of Strings from JNI and then tokenizing them. StreamTokenizer by itself would be slower than a simple StringTokenizer, but I suggest it because file I/O is involved.
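A minimal sketch of the StreamTokenizer approach, assuming the files can be read from Java directly instead of through the JNI layer; the class name is hypothetical. The syntax-table setup is an assumption, not something from the thread: resetSyntax() turns off the default number, comment and quote parsing, so a token such as 42 comes back as a word rather than as TT_NUMBER. whitespaceChars(0, ' ') makes tabs and other control characters separators too, which is slightly broader than splitting on 0x20 alone, but the line-end characters have to be whitespace for eolIsSignificant(true) to report them as TT_EOL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerSketch {
    // Reads space-separated tokens straight from a Reader, returning one List per line.
    static List<List<String>> readLines(Reader source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new BufferedReader(source));
        st.resetSyntax();             // drop default number/comment/quote handling
        st.wordChars(0x21, 0xFF);     // every non-space printable char belongs to a word
        st.whitespaceChars(0, ' ');   // space, tab, CR, LF and other controls separate tokens
        st.eolIsSignificant(true);    // report line ends as TT_EOL

        List<List<String>> lines = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int type;
        while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == StreamTokenizer.TT_EOL) {
                lines.add(current);   // finish the current line
                current = new ArrayList<>();
            } else if (type == StreamTokenizer.TT_WORD) {
                current.add(st.sval);
            }
        }
        if (!current.isEmpty()) {
            lines.add(current);       // last line had no trailing newline
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "alpha beta\ngamma  delta\n";
        System.out.println(readLines(new StringReader(text))); // [[alpha, beta], [gamma, delta]]
    }
}
```

In real use the StringReader would be replaced by something like a FileReader for each input file.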

I agree. Here's the link: