Create padding strings with minimum memory use

Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
The following code is my attempt to make an array of Strings, where each element PADDING[N] is a String comprising N space characters.

Does my code result in the minimum amount of memory use? If not, can someone suggest a better way to do it? Within reason, I don't care about speed.


Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
You will get a better result if you instantiate the StringBuffer at the correct size from the beginning. This way it doesn't need to grow in the loop, and all your Strings are able to share the same char array.
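A minimal sketch of the presized-StringBuffer idea, assuming a hypothetical helper class BufferPadding and a caller-supplied maximum length (neither name comes from the original post). Whether the resulting Strings really share one char array depends on the JDK: the sharing discussed in this thread applies to the StringBuffer of that era, and later releases may copy the characters in toString().

```java
final class BufferPadding {
    /** Returns an array in which element n is a String of n spaces. */
    static String[] build(int maxLength) {
        // Presize the buffer so that append() never has to grow it.
        StringBuffer buffer = new StringBuffer(maxLength);
        String[] padding = new String[maxLength + 1];
        padding[0] = buffer.toString(); // the empty String
        for (int n = 1; n <= maxLength; n++) {
            buffer.append(' ');
            padding[n] = buffer.toString(); // n spaces so far
        }
        return padding;
    }
}
```

Usage would be along the lines of `String[] PADDING = BufferPadding.build(8);`.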
Something similar can be done without using StringBuffer:
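A minimal sketch of the substring() approach, assuming a literal of eight spaces and a hypothetical constant MAX_PADDING; the class and field names are illustrative and not taken from the original post.

```java
final class SubstringPadding {
    static final int MAX_PADDING = 8;

    // Must contain at least MAX_PADDING spaces.
    private static final String SPACES = "        ";

    /** PADDING[n] is a String of n spaces. */
    static final String[] PADDING = new String[MAX_PADDING + 1];

    static {
        for (int n = 0; n <= MAX_PADDING; n++) {
            PADDING[n] = SPACES.substring(0, n);
        }
    }
}
```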


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
I did think of the version with the literal string of several spaces. I found it a little unappealing because a maintainer who changes the integer constant must also remember to change the length of the literal string.
However, perhaps a good solution would be to link the two together:
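One possible way to link the two, sketched under the assumption that the constant is derived from the literal rather than the other way round. The names LinkedPadding, SPACES and MAX_PADDING are hypothetical.

```java
final class LinkedPadding {
    // The only place a maintainer has to edit: add or remove spaces here.
    private static final String SPACES = "        ";

    // Derived from the literal, so the two can never disagree.
    static final int MAX_PADDING = SPACES.length();

    /** PADDING[n] is a String of n spaces. */
    static final String[] PADDING = new String[MAX_PADDING + 1];

    static {
        for (int n = 0; n <= MAX_PADDING; n++) {
            PADDING[n] = SPACES.substring(0, n);
        }
    }
}
```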
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
The second method definitely uses less memory, since all the padding strings share the same backing array.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by David Weitzman:
The second method definitely uses less memory, since all the padding strings share the same backing array.

In the first they do, too - as long as the initial size of the StringBuffer is high enough so that it doesn't need to grow. Take a look at the source of StringBuffer - it's really interesting!
Vinod John
Ranch Hand

Joined: Jun 23, 2003
Posts: 162
Do you guys think it is a good idea to use substring() for this problem? substring() actually creates a "new" String object and does not use an existing String from the String pool.
Why not use the first solution, replacing StringBuffer with String, like this:

The number of String objects created is the same as in the second solution, but we may be using existing String objects from the pool?
[ July 30, 2003: Message edited by: Vinod John ]
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Vinod John:
Do you guys think it is a good idea to use substring() for this problem? substring() actually creates a "new" String object and does not use an existing String from the String pool.

I don't follow you. If we want to have 9 Strings of different lengths, we do need exactly that - 9 different Strings. If you are referring to reusing the underlying char[] (which is different from the String pool), that is what actually *is* happening when using substring().
Why not use the first solution, replacing StringBuffer with String, like this:


With most compilers
passing[i-1] + EMPTY_SPACE
is identical to
new StringBuffer().append(passing[i-1]).append(EMPTY_SPACE).toString()
The call to append(String) will actually *copy* the content of a String, so that there is no reuse at all in this solution.
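A small sketch of the concatenation variant under discussion, reusing the names passing and EMPTY_SPACE quoted above; the class ConcatPadding and its build method are hypothetical. Each pass through the loop produces a newly created String at run time, so nothing is shared between the elements.

```java
final class ConcatPadding {
    static final String EMPTY_SPACE = " ";

    /** Returns an array in which element i is a String of i spaces. */
    static String[] build(int maxLength) {
        String[] passing = new String[maxLength + 1];
        passing[0] = "";
        for (int i = 1; i <= maxLength; i++) {
            // passing[i - 1] is not a compile-time constant, so this is
            // evaluated at run time and the characters of both operands
            // are copied into a newly created String.
            passing[i] = passing[i - 1] + EMPTY_SPACE;
        }
        return passing;
    }
}
```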
This is one of the cases where trying to be smart can actually make things worse...

The number of String objects created is the same as in the second solution, but we may be using existing String objects from the pool?

Again, this doesn't compute for me. Either we are creating new Strings, or we are reusing existing ones from the pool. I don't know how we could do the latter if the Strings actually need to be different.
Vinod John
Ranch Hand

Joined: Jun 23, 2003
Posts: 162
Originally posted by Ilja Preuss:

If we want to have 9 Strings of different lengths, we do need exactly that - 9 different Strings. If you are referring to reusing the underlying char[] (which is different from the String pool), that is what actually *is* happening when using substring().

Correct, your solution actually creates 9 "different" Strings. It doesn't reuse the underlying character array. Each of the Strings created has a different character array, as the substring method returns a new String(....).


With most compilers
passing[i-1] + EMPTY_SPACE
is identical to
new StringBuffer().append(passing[i-1]).append(EMPTY_SPACE).toString()
The call to append(String) will actually *copy* the content of a String, so that there is no reuse at all in this solution. This is one of the cases where trying to be smart can actually make things worse...

Actually, you may be correct in the case of StringBuffer. But in the case of String, if the compiler knows the String at compile time, it tries to reuse it. So if the String created here might also be created (without using the new operator) somewhere else in the same application, the compiler will try to reuse it. So I thought it was better to reuse the String rather than recreate it.
Having said that, I did some testing. I invoked each piece of code (the second solution and mine) 10 times inside the same function and assigned the results to different arrays. As expected, the second solution created 10 * n Strings (n is 9 here), while the + operator created only 9 Strings (as it was reusing the String objects), but ..... the memory consumed by the second solution was less than that consumed by my solution. So I called System.gc() after each iteration, and it confirmed that using the + operator creates a lot of intermediate unreferenced objects (though they can be garbage collected), but the referenced memory is more or less constant and very low (compared to the second solution). Am I making any sense???
BTW, this is how my code looked:
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112

Correct, your solution actually creates 9 "different" Strings. It doesn't reuse the underlying character array. Each of the Strings created has a different character array, as the substring method returns a new String(....).

The following is the core of the implementation of String#substring in JDK 1.4.2 - the value variable is the character array. As you can see, it is actually shared between the String instances. They just use different indices into the array.
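A toy model of the mechanism described above, not the JDK source: a hypothetical class SharedString that keeps a backing array named value (as in the post) plus an offset and a count, and whose substring() hands the same array to the new instance. Later JDK releases changed substring() to copy the characters, so this sharing describes the JDK discussed in this thread.

```java
final class SharedString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index in value of the first character
    final int count;    // number of characters

    SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Returns a view on the same array; no characters are copied. */
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```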
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Actually, you may be correct in the case of StringBuffer. But in the case of String, if the compiler knows the String at compile time, it tries to reuse it. So if the String created here might also be created (without using the new operator) somewhere else in the same application, the compiler will try to reuse it. So I thought it was better to reuse the String rather than recreate it.

The value of passing[i-1] is not known at compile time. Therefore the concatenation needs to be done at runtime. Most compilers implement the concatenation operator using StringBuffer#append, as far as I know. That's even the proposed solution in the Java Language Specification (though it permits other implementations).
Having said that, I did some testing. I invoked each piece of code (the second solution and mine) 10 times inside the same function and assigned the results to different arrays. As expected, the second solution created 10 * n Strings (n is 9 here), while the + operator created only 9 Strings (as it was reusing the String objects), but ..... the memory consumed by the second solution was less than that consumed by my solution. So I called System.gc() after each iteration, and it confirmed that using the + operator creates a lot of intermediate unreferenced objects (though they can be garbage collected), but the referenced memory is more or less constant and very low (compared to the second solution). Am I making any sense???

You did fudge by using String#intern in your second solution!
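The difference between compile-time constants, run-time concatenation and String#intern can be checked with a few identity comparisons. This is an illustrative sketch; the class InternDemo and its variable names are hypothetical.

```java
final class InternDemo {
    /** Returns the results of three identity comparisons, in order. */
    static boolean[] check() {
        String space = " ";             // not final, so not a constant variable
        String literal = "  ";          // lives in the String pool
        String folded = " " + " ";      // constant expression, folded by the compiler
        String runtime = space + space; // concatenated at run time

        return new boolean[] {
            folded == literal,          // same pooled object
            runtime == literal,         // a newly created object
            runtime.intern() == literal // intern() returns the pooled object
        };
    }
}
```

With intern() in the test, Strings built at run time compare as == to pooled ones, which is why the count of distinct String objects looked so low.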
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[David W]: The second method definately uses less memory, since all the padding strings share the same backing array.
[Ilja]: In the first they do, too - as long as the initial size of the StringBuffer is high enough so that it doesn't need to grow. Take a look at the source of StringBuffer - it's really interesting!

Yes, it is. If the StringBuffer had used any methods capable of changing parts of the char buffer which had previously been shared with String instances (e.g. insert(), delete(), setCharAt()), then the StringBuffer would have made a copy of its char buffer and discarded the shared buffer before proceeding (ensuring that the Strings using that buffer would not be changed). However, the append() method cannot alter any shared portion of the char buffer, so the StringBuffer does not bother making a copy. Which is cool. I (and presumably David) knew about the general copy-on-change behavior, but I didn't realize they made an exception for append(). Makes sense though - thanks for pointing it out, Ilja.
Vinod John - Ilja has already gone into the details, so I'll just say I agree with Ilja. The substring() method is optimized to be pretty efficient by re-using the internal char array. Occasionally this can even be a problem. I once was reading every line of a file using BufferedReader's readLine() method, and I had to parse a small portion of that line (the product ID) and store it in a HashSet, which grew to be pretty big. Initially I was using much more memory than expected, because each product ID had been generated as a substring() of the original line, and it was still retaining the original backing char array, with the complete contents of the line (not just the part I needed). The solution was to use new String(String) to get a new String which did not use the substring's original backing array. Which I believe is the only legitimate use for the new String(String) constructor. Other than for crafting garbage-collection examples, that is.
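The copy-on-change rule and the append() exception can be modelled in a few lines. This is a toy class, not the StringBuffer source; the name SharingBuffer and the snapshot() method (standing in for toString()) are assumptions of the sketch.

```java
final class SharingBuffer {
    private char[] value;
    private int count;
    private boolean shared; // true once a snapshot refers to value

    SharingBuffer(int capacity) {
        value = new char[capacity];
    }

    /** Hands out the current backing array without copying it. */
    char[] snapshot() {
        shared = true;
        return value;
    }

    SharingBuffer append(char c) {
        if (count == value.length) {
            // Growing allocates a new array; earlier snapshots keep the old one.
            char[] bigger = new char[Math.max(1, value.length * 2)];
            System.arraycopy(value, 0, bigger, 0, count);
            value = bigger;
            shared = false;
        }
        // Writes only past the characters covered by earlier snapshots,
        // so no copy is needed even when the array is shared.
        value[count++] = c;
        return this;
    }

    void setCharAt(int index, char c) {
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (shared) {
            // Would alter characters a snapshot can see: copy first.
            value = value.clone();
            shared = false;
        }
        value[index] = c;
    }
}
```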
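A rough sketch of that fix, assuming for illustration that the product ID is the first idLength characters of each line; the class ProductIds and its parameters are hypothetical, and the lines are passed in as an Iterable rather than read with readLine() to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;

final class ProductIds {
    /** Collects the leading idLength characters of every line. */
    static Set<String> collect(Iterable<String> lines, int idLength) {
        Set<String> ids = new HashSet<String>();
        for (String line : lines) {
            if (line.length() < idLength) {
                continue; // too short to contain an ID
            }
            // new String(String) gives the ID its own right-sized storage,
            // so a substring that shares the line's array cannot keep the
            // whole line alive inside the set.
            ids.add(new String(line.substring(0, idLength)));
        }
        return ids;
    }
}
```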


"I'm not back." - Bill Harding, Twister
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
In the first they do, too - as long as the initial size of the StringBuffer is high enough so that it doesn't need to grow. Take a look at the source of StringBuffer - it's really interesting!
Good catch, Ilja! The clever folks at Sun took into account the fact that append() won't change any pre-existing characters in the StringBuffer.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Jim Yingst:
I (and presumably David) knew about the general copy-on-change behavior, but I didn't realize they made an exception for append(). Makes sense though - thanks for pointing it out, Ilja.

Well, in fact I had already started writing something along the lines of the copy-on-change behaviour for this thread when I realized that it didn't need to be the case for append. Only then did I look up whether the programmers at Sun had had the same insight...
Vinod John
Ranch Hand

Joined: Jun 23, 2003
Posts: 162
Originally posted by Ilja Preuss:
The following is the core of the implementation of String#substring in JDK 1.4.2 - the value variable is the character array. As you can see, it is actually shared between the String instances. They just use different indices into the array.

I overlooked the constructor that substring uses. I was assuming it made an array copy every time. Thanks, now I can use substring with confidence.
Originally posted by Ilja Preuss:

You did fudge by using String#intern in your second solution!

I added it when testing but continued without removing it (now I know why the strings were ==) ... I didn't mean to fake .....
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Vinod John:

I added it when testing but continued without removing it (now I know why the strings were ==) ... I didn't mean to fake .....

Oh, ok. Could have happened to me, too.
 