Writing a faster String split.

David Phluphy
Greenhorn

Joined: Sep 09, 2010
Posts: 25
I wrote a simple split method to save time.
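The listing from this post is not preserved in this copy of the thread. As a rough sketch of the kind of method described later in the thread (a single-character delimiter, a temporary array sized from the line length, and a final copy into a right-sized result), something like the following would fit. The names `SimpleSplit` and `split`, the decision to skip empty tokens, and the `(length + 1) / 2` capacity are assumptions for illustration, not the poster's actual code.

```java
final class SimpleSplit {
    // Hypothetical reconstruction: splits on one character and skips empty tokens.
    static String[] split(String line, char delimiter) {
        // Worst case when empty tokens are skipped: every other character is a token.
        String[] temp = new String[(line.length() + 1) / 2];
        int count = 0;
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            // A token ends at each delimiter and at the end of the line.
            if (i == line.length() || line.charAt(i) == delimiter) {
                if (i > start) {
                    temp[count++] = line.substring(start, i);
                }
                start = i + 1;
            }
        }
        // Copy the filled cells into an array of exactly the right size.
        String[] result = new String[count];
        System.arraycopy(temp, 0, result, 0, count);
        return result;
    }
}
```

Calling `SimpleSplit.split("a,b,c", ',')` yields an array holding "a", "b" and "c". The final `System.arraycopy` is the copy the poster wants to avoid.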





It's much faster than .split and Tokenizer, but I doubt it's optimal. Either way, I had what I thought was a good idea.
Instead of using a temp array and copying every String into a result array, I wanted to store the size of the array in the last cell, like this:



and just return the temp array. When iterating over the array later, I thought I could use



and just ignore the empty cells.
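The two snippets from this post are also missing. A minimal sketch of the idea as described, assuming the count is stored as a decimal string in the last cell and read back with `Integer.parseInt`, might look like this. The class name `CountInLastCell`, the method names, and the capacity formula are hypothetical.

```java
final class CountInLastCell {
    // Hypothetical variant: returns the oversized array with the token count in its last cell.
    static String[] split(String line, char delimiter) {
        // (length + 1) / 2 cells cover the worst case for tokens; one more cell holds the count.
        String[] temp = new String[(line.length() + 1) / 2 + 1];
        int count = 0;
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == delimiter) {
                if (i > start) {
                    temp[count++] = line.substring(start, i);
                }
                start = i + 1;
            }
        }
        temp[temp.length - 1] = String.valueOf(count);
        return temp;
    }

    // Reads the count back from the last cell.
    static int tokenCount(String[] cells) {
        return Integer.parseInt(cells[cells.length - 1]);
    }

    public static void main(String[] args) {
        String[] cells = split("a,b,c", ',');
        // Visit only the filled cells; the cells between the tokens and the count stay null.
        for (int i = 0; i < tokenCount(cells); i++) {
            System.out.println(cells[i]);
        }
    }
}
```

The caller has to know that the last cell is not a token, which is the point the replies pick up on.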

However, this doesn't work for some reason... Am I missing something really obvious here?
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
    
Hmmm... I don't see the problem offhand, but you haven't shown the complete current code, right? You've shown an earlier version of the code and then described some changes. Let's see what you've actually got now.

Also, in what way does it "not work"? Throw an exception? Do nothing? Give results that are not what you expect? Can you show us a sample of how you call it, what result you expect, and what result you get?

This smells strongly of premature optimization. It's possible you might really, really need the extra speed. But I think it's pretty unlikely in most cases. It may be a fun programming challenge to get this as fast as you can, as a learning experience. But if it's for work, there are probably more productive ways to spend your time.

I think the biggest concern, however, is that the idea of storing the length as a string at the end of the array just seems very error-prone. Users (other programmers) will not in general expect that sort of thing unless they've carefully read your documentation - and frankly many people won't read it until *after* it's blown up in their faces. If you really want to do something like this, just to avoid a single array copy, I might suggest creating a new class to wrap the results for the user, and make it easier for them to use the results. Something like this:
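The listing that followed here is not preserved. One way such a wrapper might look, as a sketch only: it keeps the oversized array private and exposes just the filled cells. The class name `SplitResult` and its methods are hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper: exposes only the first `size` cells of an oversized array.
final class SplitResult implements Iterable<String> {
    private final String[] cells;
    private final int size;

    SplitResult(String[] cells, int size) {
        if (size < 0 || size > cells.length) {
            throw new IllegalArgumentException("size out of range: " + size);
        }
        this.cells = cells; // no copy is made; the wrapper shares the array
        this.size = size;
    }

    int size() {
        return size;
    }

    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return cells[index];
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < size;
            }

            @Override
            public String next() {
                if (next >= size) {
                    throw new NoSuchElementException();
                }
                return cells[next++];
            }
        };
    }
}
```

Because it implements `Iterable<String>`, callers can write `for (String token : result)` and never see the unused cells.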

Actually, as I think about it, you can just use existing utilities to create a List like this:
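The listing is missing here too. A likely candidate, offered as an assumption rather than as what was originally posted, is `Arrays.asList` combined with `List.subList`. Both return views, so no elements are copied. The class name `SplitViews` and the method `firstCells` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class SplitViews {
    // Returns a fixed-size view of the first `count` cells; the array itself is not copied.
    static List<String> firstCells(String[] temp, int count) {
        return Arrays.asList(temp).subList(0, count);
    }
}
```

The returned list is backed by the array, so a later write to `temp[0]` is visible through the list, and the list cannot grow or shrink.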
David Phluphy
Greenhorn

Joined: Sep 09, 2010
Posts: 25
Thanks for your reply. It is a fun challenge, nothing more.
I found the error. If I wanted to split a string with single letters, I was overwriting the last letter. I changed

String[] temp = new String[line.length()/2]; to String[] temp = new String[(line.length()/2)+1]; and now it works.

It was actually a little slower than my original code, though.
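To make the overwrite concrete, here is a small sketch that takes the capacity as a parameter and then stores the count in the last cell. It assumes the same hypothetical scheme as above (single-character delimiter, empty tokens skipped); whether the poster's code behaved exactly this way cannot be confirmed from the thread. The name `CapacityCheck` is made up for the example.

```java
final class CapacityCheck {
    // Fills an array of the given capacity with tokens, then writes the count into the last cell.
    static String[] splitWithCapacity(String line, char delimiter, int capacity) {
        String[] temp = new String[capacity];
        int count = 0;
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == delimiter) {
                if (i > start) {
                    temp[count++] = line.substring(start, i);
                }
                start = i + 1;
            }
        }
        // If the tokens already reached the last cell, this write replaces a token.
        temp[capacity - 1] = String.valueOf(count);
        return temp;
    }
}
```

For the line "a,b," (length 4), a capacity of `line.length()/2` gives two cells, so the count replaces "b"; with `(line.length()/2)+1` there are three cells and both tokens survive. Under these assumptions an odd-length line such as "a,b,c" holds three tokens, so it would need `(line.length()+1)/2 + 1` cells to leave room for the count.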
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36478
    
That is completely different from the java.lang.String.split(java.lang.String) method. You are splitting on a single character, whereas the built-in method splits on a regular expression. Matching a regular expression generally takes longer than finding a single character.
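A short example of the difference, using only standard library calls: because `String.split` treats its argument as a regular expression, a delimiter such as "." has to be quoted before it behaves like a literal character. The class name `RegexSplitDemo` is just for the example.

```java
import java.util.regex.Pattern;

final class RegexSplitDemo {
    // "." is a regex that matches any character, so every character is treated as a delimiter.
    static String[] splitAsRegex(String line) {
        return line.split(".");
    }

    // Pattern.quote makes the dot literal, so only real dots separate tokens.
    static String[] splitOnLiteralDot(String line) {
        return line.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        System.out.println(splitAsRegex("a.b.c").length);      // prints 0
        System.out.println(splitOnLiteralDot("a.b.c").length); // prints 3
    }
}
```

The unquoted call returns an empty array because every piece is an empty string and `split` drops trailing empty strings.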

I think this is a more difficult question than we usually get for "beginning", so I shall move this thread.
David Phluphy
Greenhorn

Joined: Sep 09, 2010
Posts: 25
Sorry about that. As you know, I'm fairly new here, so I don't really know what goes where yet.

I know it's different from .split, which is why I wrote it. Most problems I've encountered require that I split by single characters, so this method is usually all I need.

I tried skipping the split altogether and parsing the string directly, but I couldn't notice any difference in performance. I guess I can't do it much faster than this? ^^
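What "parsing the string directly" looked like in the poster's code is not shown. As one possible illustration, assuming the tokens are integers that only need to be summed, the line can be walked with `indexOf` without ever building an array. The class `DirectParse` and the summing task are hypothetical.

```java
final class DirectParse {
    // Hypothetical example: sums delimiter-separated integers without building a token array.
    static long sum(String line, char delimiter) {
        long total = 0;
        int start = 0;
        while (start <= line.length()) {
            int end = line.indexOf(delimiter, start);
            if (end < 0) {
                end = line.length(); // last token runs to the end of the line
            }
            if (end > start) {       // skip empty tokens
                total += Integer.parseInt(line.substring(start, end));
            }
            start = end + 1;
        }
        return total;
    }
}
```

This still creates one substring per token, which may be part of why such a change shows little measurable difference.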
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24166
    

The storing-the-length-at-the-end bit seems unnecessary and, as Mike said, error-prone. I would think the most performant thing to do would be to return the String array with possibly some extra nulls at the end, and the user is just supposed to check for them; i.e.
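The snippet that followed is not preserved. A minimal sketch of the caller-side check being suggested, assuming the tokens are packed at the front of the array and the unused cells are null, could be as follows. The class name `NullTerminated` and the method `countTokens` are hypothetical.

```java
final class NullTerminated {
    // Counts the tokens in an array that may end in nulls, stopping at the first null cell.
    static int countTokens(String[] tokens) {
        int n = 0;
        for (String token : tokens) {
            if (token == null) {
                break; // everything from here on is unused padding
            }
            n++;
        }
        return n;
    }
}
```

The same `if (token == null) break;` test can go at the top of any loop that processes the tokens, so no count needs to be stored in the array at all.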


