Hello. Does anyone have suggestions on the fastest (and hopefully most efficient) way to parse a string? Let's say I have a comma-delimited string that I want to convert to a Collection. The comma-delimited elements are of unequal length, for example: item1,items22,item333,item55555 I was thinking of using an array of characters, but I don't know how a for loop over the characters compares in speed with creating substrings using String.substring(int,int). Any suggestions?
Use java.util.StringTokenizer - it's optimized for exactly this type of parsing. [ September 26, 2002: Message edited by: Ilja Preuss ]
Just a warning about StringTokenizer if you have never used it before... The default behavior ignores empty "tokens". For example: "token1,token2,,token3" A StringTokenizer created on that string with "," as the delimiter will return 3 tokens, not 4: the empty token between the two consecutive commas is skipped.
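To make the warning concrete, here is a minimal sketch that collects whatever StringTokenizer yields for that input. The class and method names (TokenizerDemo, tokenize) are made up for illustration, and the code is written for a current JDK (generics), not the SDK of this thread's era.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collects every token StringTokenizer yields for a comma-delimited input.
    // Hypothetical helper, not part of any library.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [token1, token2, token3]: the empty field is silently dropped.
        System.out.println(tokenize("token1,token2,,token3"));
    }
}
```

If the position of each field matters (say, columns in a record), this skipping is usually what you don't want.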
Blake Minghelli, SCWCD
If you really want the fastest parsing possible, you can probably improve on StringTokenizer a little, because StringTokenizer spends some time checking for multiple delimiters, and even checking whether the set of delimiters has changed since the last time nextToken() was called. You can omit this for your situation and thereby speed things up a bit, I imagine. But I doubt you'll see a big difference, so don't spend too much time on it unless you're sure performance is a real problem. I'd probably just store the input as a String, use indexOf(',', startPos) to find delimiters, and use substring(int, int) to create a String for each token. You could also store the input as a char array; I'm not sure whether that will end up any faster. You'd have to try both ways and measure, I suppose. Now in terms of development speed (rather than execution speed), the easiest solution is probably String[] tokens = inputStr.split(","); Try it; you may well find it's already fast enough for you. (You need to be using SDK 1.4 though.) It also fixes the annoying "feature" of StringTokenizer which Blake mentioned.
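The indexOf/substring approach described above might look roughly like this. It is only a sketch: CommaSplitter and parse are hypothetical names, it assumes a single-character ',' delimiter, and it is written for a current JDK. No claim is made here about how its speed compares to StringTokenizer or split(); that would need measuring.

```java
import java.util.ArrayList;
import java.util.List;

class CommaSplitter {
    // Splits on single commas using indexOf/substring. Empty fields are kept.
    // Hypothetical helper, not part of any library.
    static List<String> parse(String input) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int comma;
        while ((comma = input.indexOf(',', start)) >= 0) {
            tokens.add(input.substring(start, comma));
            start = comma + 1;
        }
        // Whatever follows the last comma is the final field (possibly empty).
        tokens.add(input.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("item1,items22,item333,item55555"));
        // Four fields here; the third is an empty string.
        System.out.println(parse("token1,token2,,token3"));
    }
}
```

One difference worth knowing: split(",") keeps empty strings in the middle but discards trailing ones, so "a,b,,".split(",") has length 2, while this sketch returns four fields for the same input. Passing a negative limit, as in split(",", -1), keeps the trailing empty strings too.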
You should keep in mind that StringTokenizer was designed to parse Java programs. The delimiter was assumed to be a space. The reason StringTokenizer has this default behavior is that multiple spaces don't mean anything special in Java source.
Hi, what if I want to parse the records of a file? Wouldn't StringTokenizer be a killer? I want to monitor a log file and reformat the records for output based on a pattern submitted by a user.
Tom's comment may be a bit misleading - it's possible to use StringTokenizer to parse a lot of things other than Java code. But it has a number of limitations - nowadays it's probably more powerful and flexible to learn how to parse using the classes in java.util.regex (at least, for anything more complicated than the split() method I showed above).
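As a rough illustration of the java.util.regex route for the log-file question, here is a sketch that matches one record and reorders its fields. The record layout ("date time LEVEL message"), the output format, and the names LogReformatter and reformat are all assumptions made up for this example; a user-submitted pattern would replace the hard-coded one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogReformatter {
    // Assumed (hypothetical) record layout: "<date> <time> <LEVEL> <message>"
    private static final Pattern RECORD =
            Pattern.compile("(\\S+) (\\S+) (\\w+) (.*)");

    // Returns the reformatted record, or null if the line does not match.
    static String reformat(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Reorder the captured groups: level first, then timestamp, then message.
        return "[" + m.group(3) + "] " + m.group(1) + " " + m.group(2)
                + " - " + m.group(4);
    }

    public static void main(String[] args) {
        // Prints: [ERROR] 2002-09-26 12:00:01 - Disk full
        System.out.println(reformat("2002-09-26 12:00:01 ERROR Disk full"));
    }
}
```

Compiling the Pattern once and reusing it for every line is the usual choice when processing a whole file, since compilation is the comparatively expensive step.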