Suggestions on fastest way to parse a String?

Ron Ditch
Ranch Hand

Joined: May 16, 2002
Posts: 33
Hello...
Does anyone have any suggestions on the fastest (and hopefully most efficient) way to parse a string?
Let's say I have a comma-delimited string and I want to convert it to a Collection. The elements in the string are of unequal length.
For example - item1,items22,item333,item55555
I was thinking of using an array of characters, but I don't know the speed implication of for loops versus creating sub-strings using String.substring(int,int).
Any suggestions?
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Take a look at java.util.StringTokenizer; it is designed for this kind of delimiter-based parsing.
[ September 26, 2002: Message edited by: Ilja Preuss ]

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Blake Minghelli
Ranch Hand

Joined: Sep 13, 2002
Posts: 331
Just a warning about StringTokenizer if you have never used it before...
The default behavior ignores empty "tokens".
For example: "token1,token2,,token3"
A StringTokenizer created on that string with "," as the delimiter will return 3 tokens, not 4; the empty token between the two consecutive commas is skipped.
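To see the behavior described above, here is a minimal sketch. The class and method names (TokenizerEmptyTokens, tokenize) are made up for illustration, and it uses generics, so it needs a newer JDK than the 1.4 mentioned elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; only StringTokenizer itself comes from the JDK.
class TokenizerEmptyTokens {

    // Collects every token StringTokenizer yields for a comma-delimited input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The empty field between the two commas is silently dropped.
        System.out.println(tokenize("token1,token2,,token3"));
        // prints [token1, token2, token3]
    }
}
```

If your data can contain empty fields and their position matters, this is the case to watch out for.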


Blake Minghelli, SCWCD
"I'd put a quote here but I'm a non-conformist"
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
If you really want the fastest parsing possible, you can probably improve on StringTokenizer a little, because it spends some time checking for multiple delimiters, and even checking whether the set of delimiters has changed since the last call to nextToken(). You can omit this for your situation and speed things up a bit, I imagine. But I doubt you'll see a big difference, so don't spend too much time on it unless you're sure performance is a real problem. I'd probably just store the input as a String, use indexOf(',', startPos) to find delimiters, and use substring(int, int) to create a String for each token. You could also store the input as a char[]; I'm not sure whether that will end up any faster. You'd have to try both ways and measure, I suppose.
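The indexOf/substring approach described above could look roughly like the sketch below. The class and method names (IndexOfSplitter, split) are invented for illustration, and it uses generics for brevity, so it needs a newer JDK than 1.4. Whether it actually beats StringTokenizer on your data is something you would still have to measure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a sketch of hand-rolled single-delimiter parsing.
class IndexOfSplitter {

    // Splits input on one delimiter character using indexOf and substring.
    // Empty tokens are kept, so "a,,b" yields three tokens.
    static List<String> split(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) >= 0) {
            tokens.add(input.substring(start, end));
            start = end + 1; // continue just past the delimiter
        }
        tokens.add(input.substring(start)); // text after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("item1,items22,item333,item55555", ','));
        // prints [item1, items22, item333, item55555]
    }
}
```

Note that, unlike StringTokenizer, this version returns an empty string for every empty field, including one after a trailing delimiter.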
Now in terms of development speed (rather than execution speed), the easiest solution is probably
String[] tokens = inputStr.split(",");
Try it; you may well find it's already fast enough for you. (You need to be using SDK 1.4 or later, though.) It also fixes the annoying "feature" of StringTokenizer which Blake mentioned: split() keeps the empty tokens between consecutive delimiters.
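Here is a small sketch of how split() treats empty fields. The class and method names (SplitEmptyTokens, splitDefault, splitKeepTrailing) are made up for illustration. Two things worth knowing: the argument to split() is a regular expression, not a plain delimiter list, and trailing empty strings are dropped unless you pass a negative limit.

```java
import java.util.Arrays;

// Hypothetical demo class; String.split is the only API being shown.
class SplitEmptyTokens {

    // Default split: interior empty fields are kept, trailing ones are dropped.
    static String[] splitDefault(String input) {
        return input.split(",");
    }

    // A negative limit keeps trailing empty fields as well.
    static String[] splitKeepTrailing(String input) {
        return input.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("token1,token2,,token3")));
        // prints [token1, token2, , token3]
        System.out.println(splitDefault("a,b,,").length);      // prints 2
        System.out.println(splitKeepTrailing("a,b,,").length); // prints 4
    }
}
```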


"I'm not back." - Bill Harding, Twister
Ron Ditch
Ranch Hand

Joined: May 16, 2002
Posts: 33
Thanks Jim, that's what I was looking for.
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
You should keep in mind that StringTokenizer was designed to parse Java programs. The token to split on was assumed to be a space. The reason we have the default behavior of StringTokenizer is that multiple spaces don't mean anything special in Java source.


Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog
Yarik Chinskiy
Greenhorn

Joined: Oct 10, 2002
Posts: 11
Hi,
What if I want to parse the records of a file?
Wouldn't StringTokenizer be a performance killer?
I want to monitor a log file and reformat the records for output based on a pattern submitted by a user.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Tom's comment may be a bit misleading - it's possible to use StringTokenizer to parse a lot of things other than Java code. But it has a number of limitations - nowadays it's probably more powerful and flexible to learn how to parse using the classes in java.util.regex (at least, for anything more complicated than the split() method I showed above).
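As a rough idea of what parsing with java.util.regex can look like for log records, here is a minimal sketch. The record layout ("<date> <time> <LEVEL> <message>"), the output format, and the names LogLineReformatter and reformat are all assumptions made up for this example; nobody in this thread specified a log format. Only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reformatter for an assumed "<date> <time> <LEVEL> <message>" layout.
class LogLineReformatter {

    // Compile once and reuse; compiling a pattern per line would waste time.
    private static final Pattern RECORD =
            Pattern.compile("^(\\S+) (\\S+) (\\w+) (.*)$");

    // Rewrites a matching record as "[LEVEL] <date>T<time> <message>".
    // Lines that do not match the assumed layout are returned unchanged.
    static String reformat(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.matches()) {
            return line;
        }
        return "[" + m.group(3) + "] " + m.group(1) + "T" + m.group(2) + " " + m.group(4);
    }

    public static void main(String[] args) {
        System.out.println(reformat("2002-10-10 12:00:01 ERROR Disk full"));
        // prints [ERROR] 2002-10-10T12:00:01 Disk full
    }
}
```

For a user-submitted pattern you would compile it once up front and reuse the Pattern for every record, the same way the constant is reused here.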
John Coffey
Greenhorn

Joined: Nov 11, 2002
Posts: 2
I have some sample code to test out "log" parsing. It looks like StringTokenizer isn't too good as far as performance is concerned. Using JDK 1.4.1, I got the following results:

Can anyone come up with a faster version? Is there a better IO class?
First a utility to create a big log file:

Now the Split code:

Now the StringTokenizer code:

Now the Pos code:
 