
HashMap value stores complete string

 
Philip Grove
Ranch Hand
Posts: 68
During analysis of a heap dump I found that the value in my main HashMap is not a substring as put there, but the complete string and an index pointing at the substring. Is it smart enough to reference the same string or does it clone the string, which would result in excess copies of the same string in memory?

I know from my analysis that the HashMap in question takes up over 15 MB of the heap. A similar thing happens with the key, and it comes from a different string for every distinct value (approx. 85 distinct values). By my calculations the map should contain less than 5 MB of data in keys and values, so where does the remaining 10 MB come from?
 
Winston Gutkowski
Bartender
Posts: 10111
Philip Grove wrote:During analysis of a heap dump I found that the value in my main HashMap is not a substring as put there, but the complete string and an index pointing at the substring. Is it smart enough to reference the same string or does it clone the string, which would result in excess copies of the same string in memory?

Actually, your question has little to do with HashMap and more to do with: Is the result of a substring() a reference to the same String; and the answer is: not quite.

A substring is a separate String object, but (and I'm almost certain of this, though I'm happy to be corrected if anyone knows better) it shares the character array of the original String. Thus, it takes whatever space overhead is associated with an object (≈16 bytes, I think), plus internal indexes (2 or 3 ints; I forget), plus the reference to the array itself (4 or 8 bytes). If so, the whole original array stays in memory for as long as any substring still refers to it.

HIH

Winston

PS: It's also worth noting that a Java character takes two bytes, not one. I'm not sure whether you took that into account in your calculations.
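
For what it's worth, the arithmetic above can be sketched roughly as follows. Every constant here is an assumption taken from the ballpark figures in this post (16-byte header, three int fields, a 4-byte reference, 2 bytes per char), and the class name and the string lengths in `main` are made up for illustration; real sizes depend on the JVM and its flags.

```java
final class SubstringFootprint {
    // All four figures are assumptions based on the rough numbers quoted above;
    // actual sizes depend on the JVM, its version, and its flags.
    static final long OBJECT_HEADER_BYTES = 16; // "≈16 bytes" object overhead
    static final long INT_FIELD_COUNT = 3;      // e.g. offset, count, cached hash
    static final long REFERENCE_BYTES = 4;      // 4 or 8, depending on the JVM
    static final long BYTES_PER_CHAR = 2;       // a Java char is 16 bits

    /** Estimated size of one String object, excluding its character array. */
    static long stringShellBytes() {
        return OBJECT_HEADER_BYTES + INT_FIELD_COUNT * 4 + REFERENCE_BYTES;
    }

    /** Estimated size of a char[] of the given length (header assumed equal to an object's). */
    static long charArrayBytes(long length) {
        return OBJECT_HEADER_BYTES + BYTES_PER_CHAR * length;
    }

    public static void main(String[] args) {
        long parentLength = 100_000; // hypothetical source string length
        long substringLength = 20;   // hypothetical length of the value stored in the map

        // If the substring keeps the parent's array alive, the whole array counts.
        System.out.println("shared array retained: "
                + (stringShellBytes() + charArrayBytes(parentLength)) + " bytes");
        // If the substring owned a trimmed copy, only its own characters count.
        System.out.println("trimmed copy:          "
                + (stringShellBytes() + charArrayBytes(substringLength)) + " bytes");
    }
}
```

Under these assumptions, a short substring that pins a long parent array costs thousands of times more than its own characters would, which may be where the unexplained megabytes are going.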
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
I think Winston got it - now I recall hitting the same problem. Here is the Java 6 substring code:



Note that the "value" here is the existing array



so the new String object keeps a reference to the big array it was derived from!
However, note that the following constructor checks for this situation and makes a new copy of the substring characters:



So, to get rid of the reference to the big String, it looks like

String s = new String(bigstring.substring(......));

should drop the old big array reference.
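
To make the sharing easier to see, here is a minimal sketch of the layout described above. `SharedString` and `trimmedCopy` are hypothetical names and this is not the JDK source; it only imitates the value/offset/count fields the thread attributes to the Java 6 `String`. As far as I know, newer JDKs (from 7u6 on) copy the characters in `substring()` itself, so the real class no longer behaves this way.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the Java 6 String layout; not the JDK source. */
final class SharedString {
    final char[] value; // backing array, possibly shared with a parent
    final int offset;   // index of this string's first character in value
    final int count;    // number of characters in this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Reuses the parent's array, so the parent's characters stay reachable. */
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    /** Copies only the characters in use, dropping the reference to the big array. */
    SharedString trimmedCopy() {
        return new SharedString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString big = new SharedString("key=value;padding-padding-padding");
        SharedString sub = big.substring(4, 9);
        SharedString copy = sub.trimmedCopy();
        System.out.println(sub + " shares the big array: " + (sub.value == big.value));
        System.out.println(copy + " backing array length: " + copy.value.length);
    }
}
```

Here `trimmedCopy()` plays the role that the `new String(...)` wrapper plays in the line above: once only the copy is stored in the map, nothing keeps the large array alive.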

Bill

 
Chris Hurst
Ranch Hand
Posts: 443
Winston Gutkowski wrote:It's also worth noting that a Java character takes two bytes, not one.


Unless you use -XX:+UseCompressedStrings.
 
Pat Farrell
Rancher
Posts: 4678
Unless the characters are from a language that uses 3 or 4 byte code points.
 
Winston Gutkowski
Bartender
Posts: 10111
Pat Farrell wrote:Unless the characters are from a language that uses 3 or 4 byte code points.

Which doesn't alter the fact that a Java character is a 16-bit unsigned number. I notice that UseCompressedStrings (which I've never tried) is defined as a 'performance' option, but I wonder if it actually saves anything except space (one article I read suggested that it's 5-10% slower). It's also likely to make space estimation more complex for anything but pure ASCII text.
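
A quick way to check both points is a small sketch like the one below. The class and helper names are made up for illustration; the calls themselves (`Character.SIZE`, `codePointCount`, `getBytes`) are standard library methods. It shows that a char is always 16 bits, while a code point outside the Basic Multilingual Plane occupies two chars in a String and four bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class CharWidthDemo {
    /** Number of 16-bit char units the string occupies. */
    static int charUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Size of the string when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        // U+1F600 lies outside the BMP, so it needs a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println("bits per char: " + Character.SIZE);
        System.out.println("ascii: chars=" + charUnits(ascii)
                + " codePoints=" + codePoints(ascii) + " utf8=" + utf8Bytes(ascii));
        System.out.println("supplementary: chars=" + charUnits(supplementary)
                + " codePoints=" + codePoints(supplementary) + " utf8=" + utf8Bytes(supplementary));
    }
}
```

So for heap estimates, counting `length() * 2` bytes still holds for an uncompressed String, whatever language the text is in.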

Winston
 
Chris Hurst
Ranch Hand
Posts: 443
Well, I've been doing quite a lot of profiling with compressed strings and haven't noticed any difference in latency from the "compression", though I could well believe there is some. My main reason for using it is that we are very sensitive to GC and memory usage, and according to the stats that's the key performance issue. So I think it is a performance option, but you need to know your application's string usage profile and the configuration of your garbage collector.

I think with all performance work, it's profile first, then optimize.
 