Most speedy String Split method possible

Paul Duer
Ranch Hand

Joined: Oct 10, 2002
Posts: 98
Hello all,
I have a situation in which I need to split a String inside a JSP. Obviously I want the quickest code possible since it is happening in the JSP, and I don't want a lot of objects or methods since it's JSP stuff.
My problem is this. I get back a long string of sentences. Each is divided by the string "</li><br/><li>"; I have already substringed off the first <li>. Anyway, I want to separate these sentences into a collection of strings that I can immediately iterate over, throwing each new string into a section of HTML code. So basically, I take one big String in, and I want a collection of Strings out that I can iterate over, with the HTML element dividers from before taken out.
So what I am asking is: what is the fastest, most basic way to do this, one that doesn't waste resources on any special objects or such?
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8867

String.split() (java 1.4.0+)
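A minimal sketch of what that call could look like, assuming the delimiter from the original post ("</li><br/><li>") and made-up sample data. The argument to split() is a regular expression; this particular delimiter contains no regex metacharacters, so it can be passed as-is. The class and method names are only illustrative.

```java
class SplitDemo {
    // Splits the feature string on the literal delimiter between list items.
    static String[] splitItems(String features) {
        return features.split("</li><br/><li>");
    }

    public static void main(String[] args) {
        String[] items = splitItems("Fast</li><br/><li>Light</li><br/><li>Cheap");
        for (int i = 0; i < items.length; i++) {
            System.out.println(items[i]);
        }
    }
}
```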


"blabbing like a narcissistic fool with a superiority complex" ~ N.A.
[How To Ask Questions On JavaRanch]
Paul Duer
Ranch Hand

Joined: Oct 10, 2002
Posts: 98
Sorry forgot to mention, JDK 1.3
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8867

java.util.StringTokenizer it is.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
StringTokenizer wouldn't work here, because it only uses single characters as boundaries.
What is so special about your JSP environment? I would probably just implement my own algorithm, basically using String::indexOf and String::substring.
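To illustrate the difference, here is a small sketch under assumed sample data. The names DelimiterDemo, splitOn and tokenize are hypothetical. StringTokenizer treats every character of its delimiter argument as a separate boundary, so letters such as 'l', 'i', 'b' and 'r' inside the sentences get eaten too, while an indexOf/substring loop matches the delimiter as a whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterDemo {
    // Splits text on the whole multi-character delimiter using indexOf/substring.
    static List<String> splitOn(String text, String delimiter) {
        List<String> parts = new ArrayList<String>();
        int begin = 0;
        int end;
        while ((end = text.indexOf(delimiter, begin)) != -1) {
            parts.add(text.substring(begin, end));
            begin = end + delimiter.length();
        }
        parts.add(text.substring(begin)); // remainder after the last delimiter
        return parts;
    }

    // Collects the tokens StringTokenizer produces for the same input.
    static List<String> tokenize(String text, String delimiterChars) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(text, delimiterChars);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "big wheels</li><br><li>red brakes";
        System.out.println(splitOn(text, "</li><br><li>"));  // [big wheels, red brakes]
        System.out.println(tokenize(text, "</li><br><li>")); // [g whee, s, ed , akes]
    }
}
```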


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Paul Duer
Ranch Hand

Joined: Oct 10, 2002
Posts: 98
Well, nothing is really special about it. I just meant that I don't have the luxury of doing it in a class somewhere in the controller or something. So I wanted to minimize the amount of code necessary to insert into the JSP.
The two ways I came up with were, first, to use .indexOf and .substring like you mentioned and, second, to make it a char array and look for the first "<" character of the section I wanted to tokenize on.
I just want to make sure it's quick. Here's my code as it is now:
Vector tokenList = new Vector(); // Holds the parsed features
String features = item.getDescription().getAuxDescription1().substring(4); // Remove first li tag
String token = "</li><br><li>"; // represents the token we wish to remove
int begin = 0;
int end = 0;
while (begin < features.length())
{
    end = features.indexOf(token, begin);
    if (end == -1)
        end = features.length();
    tokenList.add(features.substring(begin, end));
    begin = end + 13; // 13 == token.length()
}
Iterator tokenIter = tokenList.iterator();
while (tokenIter.hasNext()) {
Pankaj Kr
Author
Ranch Hand

Joined: Sep 09, 2003
Posts: 80
You could use getChars() to get the characters into a char[] and then have your favorite algorithm demarcate the required character sequences and convert them back into String objects.


Pankaj Kumar
Home - WebLog - J2EE Security
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I'm not at all convinced that getChars() is a good idea here. The String class is already manipulating a char[] internally, and for searching for a simple sequence of chars you're not going to do much better than indexOf(). (Well, there may be some ways to optimize this with a more complex search algorithm, but for a relatively short target string you're not going to see much benefit IMO.) More importantly, because a char[] is a mutable object, each time you create a String from a portion of this array, the String constructor will have to make a new private char[] array of the contents. In contrast, by keeping all the data in the original String, you can call substring(), which returns a new String object that actually shares the same char[] buffer as the parent string - it just looks at a subset of that char[]. The String class is able to create a substring faster with its own methods than you can with a separate method, because String has access to its own private data and can hand it off to another String, knowing that the other String is also immutable and therefore won't corrupt the shared data.
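The copying behaviour of the char[] constructor can be seen in a small sketch like the following (the class name and sample data are made up for illustration). Changing the array after the String is built does not change the String, which is why the constructor has to copy. The buffer sharing done by substring() is internal to String and is not visible from code like this.

```java
class CharArrayCopyDemo {
    // Builds a String from part of a char[], then mutates the array afterwards.
    static String fromCharArray() {
        char[] buffer = "one</li>two".toCharArray();
        String first = new String(buffer, 0, 3); // constructor copies these 3 chars
        buffer[0] = 'X';                         // does not affect 'first'
        return first;
    }

    // Same result taken directly from the String, with no intermediate char[].
    static String fromSubstring() {
        String source = "one</li>two";
        return source.substring(0, 3);
    }

    public static void main(String[] args) {
        System.out.println(fromCharArray()); // one
        System.out.println(fromSubstring()); // one
    }
}
```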
Paul, your current method looks pretty good; I'd stick with this basic algorithm unless you want to get a lot more complex for minimal benefit. A few comments though. You'll probably need to add something after the first while() loop to catch the final element of the list, since it probably terminates with </li> rather than </li><br><li>. You could also modify the loop structure to something like
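A minimal sketch of one such loop structure, not a definitive version. It assumes the last item ends with a bare </li>, and doSomethingWith() is a hypothetical placeholder for whatever per-item processing the JSP does; here it just wraps each item in a <p> element. The class name and sample data are also made up.

```java
class FeatureLoop {
    static final String TOKEN = "</li><br><li>"; // delimiter between items
    static final String TAIL = "</li>";          // assumed closing tag on the last item

    // Hypothetical stand-in for the per-item processing: emits one HTML fragment.
    static void doSomethingWith(String item, StringBuilder out) {
        out.append("<p>").append(item).append("</p>");
    }

    static String render(String features) {
        StringBuilder out = new StringBuilder();
        int begin = 0;
        int end;
        // Each pass handles one item that is followed by a full delimiter.
        while ((end = features.indexOf(TOKEN, begin)) != -1) {
            doSomethingWith(features.substring(begin, end), out);
            begin = end + TOKEN.length();
        }
        // Final element: no full delimiter follows it, so strip a trailing </li> if present.
        String last = features.substring(begin);
        if (last.endsWith(TAIL)) {
            last = last.substring(0, last.length() - TAIL.length());
        }
        if (last.length() > 0) {
            doSomethingWith(last, out);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Fast</li><br><li>Light</li><br><li>Cheap</li>"));
    }
}
```

Using token.length() instead of a literal 13 keeps the loop correct if the delimiter ever changes.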

Consider also - is there a reason you need a List (Vector in your case, ugh) if you're just going to iterate through it immediately after creating it, and then discard it? You could just omit creating and loading the List and move whatever processing you're doing directly into the loop. (The doSomethingWith() method above.) Or maybe you really do need to save the List for some reason, that's entirely possible. Make it an ArrayList though; Vector is a dinosaur that's best ignored nowadays.
Note that if your list ever uses <LI> rather than <li>, you'll need to do something more complex - a single indexOf() can't check for both upper and lower case. Though you can make use of toLowerCase() to remove this possibility before the indexOf().
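One way the toLowerCase() idea could look, as a sketch with a made-up class name and sample data: search in a lowercased copy, but cut the substrings out of the original so the sentences keep their own capitalisation. This assumes lowercasing does not change the length of the text, which holds for plain ASCII HTML like this but not for every Unicode character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CaseInsensitiveSplit {
    // Splits text on delimiter, ignoring the case of the delimiter's letters.
    static List<String> split(String text, String delimiter) {
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        String lowerText = text.toLowerCase(Locale.ROOT);
        String lowerDelim = delimiter.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<String>();
        int begin = 0;
        int end;
        while ((end = lowerText.indexOf(lowerDelim, begin)) != -1) {
            parts.add(text.substring(begin, end)); // indices found in the copy, text cut from the original
            begin = end + lowerDelim.length();
        }
        parts.add(text.substring(begin));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Fast</LI><BR><LI>Light</li><br><li>Cheap", "</li><br><li>"));
        // [Fast, Light, Cheap]
    }
}
```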


"I'm not back." - Bill Harding, Twister
 