Most speedy String Split method possible

 
Paul Duer
Ranch Hand
Posts: 98
Hello all,
I have a situation in which I need to split a String in a JSP. Obviously I want the quickest code possible since it is happening in the JSP, and I don't want a lot of objects or methods since it's JSP stuff.
My problem is this. I get back a long string of sentences. Each is divided by the string "</li><br/><li>"; I have already substringed off the first <li>. Anyway, I want to separate these sentences into a collection of strings that I can immediately iterate over, throwing each new string into a section of HTML code. So basically, I take one big String in, and I want a collection of Strings out that I can iterate over, with the HTML element dividers from before removed.
So what I am asking is: what is the fastest, most basic way to do this, one that doesn't waste resources on any special objects or such?
 
Joe Ess
Bartender
Posts: 9298
10
Linux Mac OS X Windows
String.split() (java 1.4.0+)
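For reference, a minimal sketch of what that looks like on Java 1.4 or later. The class name, method name, and sample data are made up for illustration. Note that split() treats its argument as a regular expression; this separator happens to contain no regex metacharacters, so it matches literally.

```java
import java.util.Arrays;

class SplitExample {
    // Splits the feature string on the literal separator between list items.
    static String[] splitItems(String features) {
        return features.split("</li><br><li>");
    }

    public static void main(String[] args) {
        // Sample input, assuming the leading <li> was already stripped.
        String features = "First point</li><br><li>Second point</li><br><li>Third point";
        // prints [First point, Second point, Third point]
        System.out.println(Arrays.asList(splitItems(features)));
    }
}
```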
 
Paul Duer
Ranch Hand
Posts: 98
Sorry forgot to mention, JDK 1.3
 
Joe Ess
Bartender
Posts: 9298
10
Linux Mac OS X Windows
 
Ilja Preuss
author
Sheriff
Posts: 14112
StringTokenizer wouldn't work here, because it only uses single characters as boundaries.
What is so special about your JSP environment? I would probably just implement my own algorithm, basically using String::indexOf and String::substring.
 
Paul Duer
Ranch Hand
Posts: 98
Well, nothing is really special about it. I just meant that I don't have the luxury of doing it in a class somewhere in the controller or something, so I wanted to minimize the amount of code necessary to insert into the JSP.
The two ways I came up with were, first, to use .indexOf and .substring like you mentioned and, second, to make it a char array and look for the first "<" character of the section I wanted to tokenize on.
I just want to make sure it's quick. Here's my code as it is now:
Vector tokenList = new Vector(); // Holds the parsed features
String features = item.getDescription().getAuxDescription1().substring(4); // Remove first li tag
String token = "</li><br><li>"; // represents the token we wish to remove
int begin = 0;
int end = 0;
while (begin < features.length())
{
    end = features.indexOf(token, begin);
    if (end == -1)
        end = features.length();
    tokenList.add(features.substring(begin, end));
    begin = end + 13;
}
Iterator tokenIter = tokenList.iterator();
while (tokenIter.hasNext()) {
 
Pankaj Kr
Author
Ranch Hand
Posts: 80
You could use String.getChars() to get the characters into a char[] and then have your favorite algorithm demarcate the required character sequences and convert them back into String objects.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I'm not at all convinced that getChars() is a good idea here. The String class is already manipulating a char[] internally, and for searching for a simple sequence of chars you're not going to do much better than indexOf(). (Well, there may be some ways to optimize this with a more complex search algorithm, but for a relatively short target string you're not going to see much benefit IMO.) More importantly, because a char[] is a mutable object, each time you create a String from a portion of this array, the String constructor will have to make a new private char[] copy of the contents. In contrast, by keeping all the data in the original String, you can call substring(), which returns a new String object that actually shares the same char[] buffer as the parent string - it just looks at a subset of that char[]. The String class is able to create a substring faster with its own methods than you can with a separate method, because String has access to its own private data and can hand it off to another String, knowing that the other String is also immutable and therefore won't corrupt the shared data.
Paul, your current method looks pretty good; I'd stick with this basic algorithm unless you want to get a lot more complex for minimal benefit. A few comments though. You'll probably need to add something after the first while() loop to catch the final element of the list, since it probably terminates with </li> rather than </li><br><li>. You could also modify the loop structure to something like:
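A rough sketch of such a loop, not a definitive version. The class name, the render() wrapper, the StringBuffer parameter, and the body of doSomethingWith() are placeholders invented for illustration; in the JSP the per-item work would be whatever HTML you emit for each feature. It assumes the leading <li> has already been stripped and that the string may end with a closing </li>.

```java
class FeatureLoop {
    static final String TOKEN = "</li><br><li>";
    static final String CLOSING = "</li>";

    // Hypothetical placeholder for the per-item work done in the JSP.
    static void doSomethingWith(String item, StringBuffer out) {
        out.append("<p>").append(item).append("</p>\n");
    }

    static String render(String features) {
        StringBuffer out = new StringBuffer();
        int begin = 0;
        int end;
        // Each pass handles one item that is followed by the separator.
        while ((end = features.indexOf(TOKEN, begin)) != -1) {
            doSomethingWith(features.substring(begin, end), out);
            begin = end + TOKEN.length();
        }
        // The last item has no separator after it; drop a trailing </li> if present.
        String last = features.substring(begin);
        if (last.endsWith(CLOSING)) {
            last = last.substring(0, last.length() - CLOSING.length());
        }
        if (last.length() > 0) {
            doSomethingWith(last, out);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("Fast</li><br><li>Small</li><br><li>Cheap</li>"));
    }
}
```

Using TOKEN.length() instead of a hard-coded 13 keeps the offset correct if the separator ever changes.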

Consider also - is there a reason you need a List (Vector in your case, ugh) if you're just going to iterate through it immediately after creating it, and then discard it? You could just omit creating and loading the List and move whatever processing you're doing directly into the loop. (The doSomethingWith() method above.) Or maybe you really do need to save the List for some reason; that's entirely possible. Make it an ArrayList though; Vector is a dinosaur that's best ignored nowadays.
Note that if your list ever uses <LI> rather than <li>, you'll need to do something more complex - a single indexOf() can't check for both upper and lower case. Though you can make use of toLowerCase() to remove this possibility before the indexOf().
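One way the toLowerCase() idea might look, as a sketch only: search a lower-cased copy of the string, but cut the pieces out of the original so the item text keeps its case. The class and method names are invented for illustration, and the generics and Locale.ROOT need a newer JDK than 1.3. It also assumes lower-casing does not change the string's length, which holds for ASCII markup and ordinary text but not for every Unicode character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CaseInsensitiveSplit {
    static List<String> split(String features, String token) {
        // Search in lower-cased copies so <LI>, <Li> and <li> all match.
        String lower = features.toLowerCase(Locale.ROOT);
        String lowerToken = token.toLowerCase(Locale.ROOT);
        List<String> items = new ArrayList<String>();
        int begin = 0;
        int end;
        while ((end = lower.indexOf(lowerToken, begin)) != -1) {
            // Indexes found in the copy are applied to the original string.
            items.add(features.substring(begin, end));
            begin = end + token.length();
        }
        // Whatever follows the last separator is the final item.
        items.add(features.substring(begin));
        return items;
    }

    public static void main(String[] args) {
        // prints [Fast, Small, Cheap]
        System.out.println(split("Fast</LI><BR><LI>Small</li><br><li>Cheap", "</li><br><li>"));
    }
}
```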
 