HashMap value stores complete string

Philip Grove
Ranch Hand

Joined: Aug 18, 2009
Posts: 68

During analysis of a heap dump I found that the value in my main HashMap is not the substring I put there, but the complete original string plus an index pointing at the substring. Is Java smart enough to reference the same string, or does it clone the string, which would result in excess copies of the same string in memory?

I know from my analysis that the HashMap in question takes up over 15 MB of the heap. A similar thing happens with the keys, and each key comes from a different string for every distinct value (approx. 85 distinct values). By my calculations the map should hold less than 5 MB of data in keys and values, so where does the remaining 10 MB come from?
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8404

Philip Grove wrote: During analysis of a heap dump I found that the value in my main HashMap is not a substring as put there, but the complete string and an index pointing at the substring. Is it smart enough to reference the same string or does it clone the string, which would result in excess copies of the same string in memory?

Actually, your question has little to do with HashMap and more to do with whether the result of substring() is a reference to the same String, and the answer is: not quite.

A substring is a separate String object, but (and I'm almost certain of this, though I'm happy to be corrected if anyone knows better) it shares the character array of the original String. Thus, it will take whatever space overhead is associated with an object (≈16 bytes, I think), plus internal indexes (2 or 3 ints; I forget), plus the reference to the array itself (4 or 8 bytes).
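A minimal sketch of the sharing described above, assuming the pre-Java 7u6 layout (a char[] plus offset and count fields). The class SharedSliceString and its methods are hypothetical stand-ins, not the real java.lang.String; they only model how a small substring can keep the whole original array reachable.

```java
/**
 * Hypothetical, simplified model of the pre-7u6 String layout:
 * a possibly shared char[] plus an offset and a count. Not java.lang.String.
 */
final class SharedSliceString {
    private final char[] value;  // backing array, possibly shared with other instances
    private final int offset;    // index of this string's first character in value
    private final int count;     // number of characters in this string

    SharedSliceString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSliceString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Returns a new object pointing into the SAME array; no characters are copied. */
    SharedSliceString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException("begin " + beginIndex + ", end " + endIndex);
        }
        return new SharedSliceString(value, offset + beginIndex, endIndex - beginIndex);
    }

    /** Length of the backing array this object keeps reachable. */
    int retainedArrayLength() {
        return value.length;
    }

    /** True if both objects point at the identical char[] instance. */
    boolean sharesArrayWith(SharedSliceString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSliceString big = new SharedSliceString("key=value;" + "x".repeat(1_000));
        SharedSliceString small = big.substring(4, 9);
        System.out.println(small);                        // value
        System.out.println(small.sharesArrayWith(big));   // true
        System.out.println(small.retainedArrayLength());  // 1010
    }
}
```

The five-character "value" keeps all 1010 characters alive for as long as it is reachable. Since JDK 7u6 the real String.substring() copies the requested characters into a new array, so this particular behaviour applies to Java 6 and early Java 7.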

HIH

Winston

PS: It's also worth noting that Java characters take two bytes, not one. Not sure if you took that into account in your calculations.
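To put the two-byte point into numbers, here is a rough sketch that counts only the character payload of some strings, assuming the classic UTF-16 char[] representation (Java 6 to 8). The CharPayload class and its charBytes method are hypothetical; object headers, the int fields, array headers and padding all come on top, and Java 9+ compact strings can store Latin-1 text at one byte per character.

```java
import java.util.List;

/** Rough estimate of character payload only; headers, fields and padding are NOT included. */
final class CharPayload {
    // Assumes each char is one 16-bit UTF-16 code unit, i.e. Character.BYTES == 2.
    static long charBytes(List<String> strings) {
        long total = 0;
        for (String s : strings) {
            total += (long) s.length() * Character.BYTES;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("alpha", "beta", "gamma");
        System.out.println(charBytes(keys)); // 28 (14 chars * 2 bytes)
    }
}
```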


Isn't it funny how there's always time and money enough to do it WRONG?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
I think Winston got it - now I recall hitting the same problem. Here is the Java 6 substring code:



Note that "value" here is the existing array.



so the new String object keeps a reference to the big array it was derived from!
However, note that the following constructor checks for this situation and makes a new copy of the substring characters:
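As a hedged sketch, the trimming check can be modelled like this. TrimmingSlice is a hypothetical class, not JDK code: its copy constructor copies only when the backing array is longer than the slice it represents, and otherwise shares the array because there is no baggage to drop.

```java
import java.util.Arrays;

/** Hypothetical offset/count view over a char[], with a trimming copy constructor. */
final class TrimmingSlice {
    final char[] value;
    final int offset;
    final int count;

    TrimmingSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Copy constructor: copies only when the backing array is larger than needed. */
    TrimmingSlice(TrimmingSlice original) {
        if (original.value.length > original.count) {
            // The array carries extra characters: keep just this slice's range.
            this.value = Arrays.copyOfRange(original.value, original.offset,
                                            original.offset + original.count);
        } else {
            // The array is exactly this string, so sharing it costs nothing extra.
            this.value = original.value;
        }
        this.offset = 0;
        this.count = original.count;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = "header:payload:trailer".toCharArray();
        TrimmingSlice view = new TrimmingSlice(big, 7, 7); // "payload", shares big
        TrimmingSlice copy = new TrimmingSlice(view);
        System.out.println(copy);              // payload
        System.out.println(copy.value.length); // 7
        System.out.println(view.value == big); // true
    }
}
```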



So, to get rid of the reference to the big String, it looks like

String s = new String( bigstring.substring(......) );

should drop the old big array reference.
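A sketch of that idiom applied to the original problem, filling a HashMap from parsed lines. The "key=value" line format and the DetachedMapLoader.load method are assumptions for illustration only. On JDKs before 7u6 the new String(...) wrapper copies just the needed characters; from 7u6 on, substring() already copies, so the wrapper is redundant but harmless.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DetachedMapLoader {
    /** Parses hypothetical "key=value" lines and stores detached copies of the pieces. */
    static Map<String, String> load(Iterable<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip malformed lines
            }
            // new String(...) ensures neither key nor value pins the whole line's array
            String key = new String(line.substring(0, eq));
            String value = new String(line.substring(eq + 1));
            map.put(key, value);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> m = load(List.of("colour=red", "size=XL", "junk"));
        System.out.println(m.get("colour")); // red
        System.out.println(m.size());        // 2
    }
}
```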

Bill

Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 420

Winston Gutkowski wrote: It's also worth noting that Java characters take two bytes, not one.


Unless you use -XX:+UseCompressedStrings.


"Eagles may soar but weasels don't get sucked into jet engines" SCJP 1.6, SCWCD 1.4, SCJD 1.5,SCBCD 5
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4659

Unless the characters are from a language that uses 3 or 4 byte code points.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8404

Pat Farrell wrote: Unless the characters are from a language that uses 3 or 4 byte code points.

Which doesn't alter the fact that a Java character is a 16-bit unsigned number. I notice that UseCompressedStrings (which I've never tried) is defined as a 'performance' option, but I wonder if it actually saves anything except space (one article I read suggested that it's 5-10% slower). It's also likely to make space estimation more complex for anything but pure ASCII text.
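A quick sketch of both points using only standard Character and String methods: a Java char is 16 bits; a BMP character such as U+4E2D takes 3 bytes in UTF-8 but only one char; and a code point outside the BMP (U+1F600 is used here as an arbitrary example) needs a surrogate pair of two chars.

```java
import java.nio.charset.StandardCharsets;

final class CharWidth {
    public static void main(String[] args) {
        // A Java char is an unsigned 16-bit UTF-16 code unit.
        System.out.println(Character.SIZE);            // 16
        System.out.println((int) Character.MAX_VALUE); // 65535

        // U+4E2D is in the Basic Multilingual Plane: 3 bytes in UTF-8, but one Java char.
        String zhong = "\u4E2D";
        System.out.println(zhong.getBytes(StandardCharsets.UTF_8).length); // 3
        System.out.println(zhong.length());                                // 1

        // U+1F600 is outside the BMP, so Java stores it as a surrogate pair:
        // two chars (4 bytes) for a single code point.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}
```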

Winston
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 420

Well, I've been doing quite a lot of profiling with compressed strings and haven't noticed any difference in latency from the "compression", though I could well believe there is some. My main reason for using it is that we are very sensitive to GC and memory usage, and that's the key performance issue according to the stats. So I think it is a performance option, but you need to know your application's string usage profile and the configuration of your garbage collector.

I think with all performance work it's profile first, then optimize.
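In that spirit, a crude first look at memory can come from the JVM's own MemoryMXBean before reaching for a profiler or heap dump. This sketch (the HeapDelta class and the map contents are hypothetical) measures used heap before and after filling a HashMap; System.gc() is only a hint, so treat the figure as a rough indication rather than a measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.HashMap;
import java.util.Map;

final class HeapDelta {
    /** Currently used heap in bytes, as reported by the JVM (may include uncollected garbage). */
    static long usedHeap() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedHeap();

        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put("key" + i, "value" + i);
        }

        System.gc();
        long after = usedHeap();
        // Noisy, approximate figure; a heap dump or profiler gives per-object detail.
        System.out.println("map entries: " + map.size());
        System.out.println("approx. heap growth (bytes): " + (after - before));
    }
}
```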
 