Appending a substring (a part of a String) into a StringBuilder.

Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Hello:

Until now, when I've wanted to append a substring (a part of an existing String) into a StringBuilder, I've used the method stringBuilder.append (CharSequence text, int start, int end), believing that this way was faster than creating a substring first, and appending it into the StringBuilder later, using stringBuilder.append (string.substring (int start, int end)) .
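For reference, here is a minimal sketch of the two approaches being compared. The class name, method names and sample text are made up for illustration; only StringBuilder.append (CharSequence, int, int) and String.substring (int, int) come from the question. Both methods should build the same result, so the difference is only in how the characters get copied.

```java
public class AppendSubstringDemo {

    // Approach 1: let StringBuilder copy the requested range directly.
    static String appendRange(String source, int start, int end) {
        StringBuilder sb = new StringBuilder();
        sb.append(source, start, end); // resolves to append(CharSequence, int, int)
        return sb.toString();
    }

    // Approach 2: create the substring first, then append it.
    static String appendSubstring(String source, int start, int end) {
        StringBuilder sb = new StringBuilder();
        sb.append(source.substring(start, end));
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hello, Big Moose Saloon"; // hypothetical sample text
        System.out.println(appendRange(text, 7, 16));
        System.out.println(appendSubstring(text, 7, 16));
    }
}
```

Which of the two is faster depends on the Java version, so it is probably worth measuring on the JVM you actually run before changing existing code.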

Taking a look at the source code of String and StringBuilder, to my surprise, I've discovered that my approach may in fact be slower. First of all, because it copies char by char with a for loop, whereas the other method uses System.arraycopy (), which is supposed to be faster. And second, because String.substring () does not need to copy any array internally, but re-uses the one from the original String and applies limits. I had no idea about this, but it makes sense taking into account that a String is immutable.

So, can you confirm that the second method is better and that I should use it from now on?

Thank you.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19720
    

Avor Nadal wrote:And second, because String.substring () does not need to copy any array internally, but re-uses the one from the original String and applies limits. I had no idea about this, but it makes sense taking into account that a String is immutable.

Actually, at least in Java 7u7, String.substring() no longer reuses the original String's char[] but instead copies it. I was surprised to find out during a debugging session that String no longer had the count and offset fields, so I looked a bit closer and saw that a lot of copying occurs now.


Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Rob Spoor: Wow! You've put an end to "my theory" in a few minutes, he he he. That's very useful information. Do you know the reason behind that surprising change?
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24187
    

Avor Nadal wrote:Do you know the reason behind that surprising change?


Imagine you read the entire contents of a one-megabyte file into a string. Then you take a 10-character substring out of the middle of the string, and discard the original string object. If the substring() method shares the original string's character array, that 10-character substring is preventing one million characters (two megabytes!) from being garbage collected. Hopefully this shows why sharing the array is not always -- or even usually -- such a good idea. If memory allocation is very fast -- which in Java, it is! -- then creating a new array and copying data isn't necessarily a very time consuming operation, and it leads to more efficient memory usage.
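The scenario above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, an in-memory array stands in for the one-megabyte file, and the byte count assumes 2 bytes per char, as with the char[]-backed String being discussed here. The code cannot show the retained memory itself; it only reproduces the situation and the arithmetic.

```java
import java.util.Arrays;

public class SubstringRetentionDemo {

    // Bytes occupied by the characters alone, assuming 2 bytes per char.
    static long charBytes(int length) {
        return 2L * length;
    }

    // Builds a large string, keeps only a 10-character piece of it.
    static String smallPiece() {
        char[] chars = new char[1000000]; // stand-in for the file contents
        Arrays.fill(chars, 'x');
        String big = new String(chars);
        // Only this piece outlives the method. With a substring() that shares
        // the original array, the piece would keep all one million chars
        // reachable; with a copying substring(), only 10 chars stay alive.
        return big.substring(500000, 500010);
    }

    public static void main(String[] args) {
        String piece = smallPiece();
        System.out.println("piece length: " + piece.length());
        System.out.println("bytes if the array is shared: " + charBytes(1000000));
        System.out.println("bytes if the array is copied: " + charBytes(piece.length()));
    }
}
```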


Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39409
    
It also shows that delving too deeply into the innards of a class and trying micro‑optimisation can be counter‑productive. It works well until somebody at the other end realises, and alters the code, so your performance improvement vanishes!
Ivan Jozsef Balazs
Rancher

Joined: May 22, 2012
Posts: 867
    
Campbell Ritchie wrote:... so your performance improvement vanishes!


Agreed, and I also doubt this makes much difference.

Of course resources should not be intentionally wasted, but the source code should clearly express the programmer's intention, and micro-optimisation concerns should not muddy it.
Ivan Jozsef Balazs
Rancher

Joined: May 22, 2012
Posts: 867
    
Ernest Friedman-Hill wrote:Then you take a 10-character substring out of the middle of the string, and discard the original string object. If the substring() method shares the original string's character array, that 10-character substring is preventing one million characters (two megabytes!) from being garbage collected.



It was the rare case when this made sense: new String(orig.substring(...))
Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Ernest Friedman-Hill: Thank you for the explanation. It's a good reason, of course.

Campbell Ritchie: He he he. This is not the first time that you've told me that... I have to agree, obviously ;) .
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39409
    
As long as I wasn’t mistaken both times
Avor Nadal
Ranch Hand

Joined: Sep 15, 2010
Posts: 105

Campbell Ritchie: No, you were right then and now again, he he. It may sound ridiculous, but comments like that one, which bring me back to reality when needed, do me good. I seriously believe that I suffer from some kind of OCD, because I tend to re-check my code many times, and try to re-invent the wheel up to an absurd level, as if I needed to prove to myself that I can do "everything" on my own (something absurd and irrational, I know).

Fortunately (but slowly), thanks to forums like this one, I'm realizing that every level of programming corresponds to a different "size" of application and also to different matters to worry about. I'll let Java help me more. After all, even C compilers do lots of optimizations that would be illegible if they were done by hand in the programmer's code.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    

Campbell Ritchie wrote:It also shows that delving too deeply into the innards of a class and trying micro‑optimisation can be counter‑productive. It works well until somebody at the other end realises, and alters the code, so your performance improvement vanishes!


This one was a bit of an oddball though. Although the use cases were probably fairly rare, if you didn't know about this implementation detail and code for it, you could end up with a bunch of wasted memory and no clear reason why.

And, just to take the devil's advocate stance a step further: the new implementation is effectively equivalent to the previously required (in those odd cases) new String(orig.substring(...)). If memory allocation in Java is fast enough to make this a good trade-off, it's probably fast enough to make doing it twice a good trade-off as well. So if we continue to do new String(orig.substring(...)), we'll avoid wasting memory on old JVMs and incur two "cheap" allocations instead of one on newer JVMs, which should still give good performance.
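A minimal sketch of that new String(orig.substring(...)) idiom, wrapped in a helper so the intent is visible at the call site. The class name, the method name detachedSubstring and the sample text are hypothetical, not anything from the JDK.

```java
public class DefensiveSubstring {

    // Returns the requested range as a String that does not depend on the
    // original's backing array, whichever way substring() is implemented.
    // Where substring() shares the array, new String(...) makes the copy;
    // where substring() already copies, this is one extra small allocation.
    static String detachedSubstring(String orig, int start, int end) {
        return new String(orig.substring(start, end));
    }

    public static void main(String[] args) {
        String orig = "Hello, Big Moose Saloon"; // hypothetical sample text
        System.out.println(detachedSubstring(orig, 7, 16));
    }
}
```

Keeping the copy in one small method also means it can be removed in a single place if it ever stops being useful.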


Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
    
Yeah, this one is kind of perverse. To optimize properly (in the rare cases where it makes a difference, but it can be a big difference) you need to know what version of Java you're dealing with. Maybe future JIT optimizations will silently remove unnecessary new String() calls.
 