JavaRanch » Java Forums » Certification » Programmer Certification (SCJP/OCPJP)

Does String pool really exist?

Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961

The other day I had a discussion with one of our most respected forum members about whether the String pool exists and whether String literals are garbage collected. Since then I have seen this question posted over and over again in the forum.

Although this experienced member holds a different opinion, I am going to post my documented research here, which I hope clarifies these matters.

What is the constant pool?

First of all, the Java 2 SDK API documentation mentions the String pool as something real, not just as an explanatory metaphor.

This quote is taken from the documentation of the String.intern() method:

A pool of strings, initially empty, is maintained privately by the class String.

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
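To see the contract quoted above in action, here is a minimal sketch (the class and method names are made up for illustration). It compares with == on purpose, because the question is object identity, not content.

```java
public class InternDemo {
    /** A literal and a String built with the constructor are different objects with equal content. */
    static boolean literalAndConstructedAreDistinct() {
        String literal = "Code Master";
        String constructed = new String("Code Master"); // a class instance creation always yields a new object
        return literal != constructed && literal.equals(constructed);
    }

    /** intern() hands back the pooled instance, which is the same object the literal refers to. */
    static boolean internReturnsPooledInstance() {
        String constructed = new String("Code Master");
        return constructed.intern() == "Code Master";
    }

    public static void main(String[] args) {
        System.out.println(literalAndConstructedAreDistinct()); // true
        System.out.println(internReturnsPooledInstance());      // true
    }
}
```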


Now, The Java Virtual Machine Specification, 2nd Edition (JVMS 3.5.5) says:


A runtime constant pool is a per-class or per-interface runtime representation of the constant_pool table in a class file (4.4). It contains several kinds of constants, ranging from numeric literals known at compile time to method and field references that must be resolved at runtime [...] The runtime constant pool for a class or interface is constructed when the class or interface is created by the Java Virtual Machine...



Regarding the structure of the constant pool, The Java Virtual Machine Specification, 2nd Edition (JVMS 5.1) says:


  • A string literal (2.3) is derived from a CONSTANT_String_info structure (4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode characters constituting the string literal.
  • The Java programming language requires that identical string literals (that is, literals that contain the same sequence of characters) must refer to the same instance of class String. In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, ("a" + "b" + "c").intern() == "abc" must have the value true.
  • To derive a string literal, the Java Virtual Machine examines the sequence of characters given by the CONSTANT_String_info structure. If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode characters identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.
  • Otherwise, a new instance of class String is created containing the sequence of Unicode characters given by the CONSTANT_String_info structure; that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
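A quick way to check these requirements from Java code is the sketch below; LiteralIdentityDemo and Other are hypothetical names. Note that ("a" + "b" + "c") is a compile-time constant, so javac already folds it into the literal "abc"; the version with a non-final local variable is the case where intern() actually has to do some work.

```java
public class LiteralIdentityDemo {
    /** A nested class compiles to its own class file, with its own constant pool. */
    static class Other {
        static String literal() {
            return "abc";
        }
    }

    /** Identical literals declared in different classes resolve to one String instance. */
    static boolean literalsShareInstance() {
        return "abc" == Other.literal();
    }

    /** The JVMS 5.1 example; "a" + "b" + "c" is a constant expression folded by the compiler. */
    static boolean jvmsExampleHolds() {
        return ("a" + "b" + "c").intern() == "abc";
    }

    /** Concatenation involving a non-constant variable creates a new String at run time. */
    static boolean runtimeConcatNeedsIntern() {
        String b = "b"; // not declared final, so "a" + b + "c" is not a constant expression
        String computed = "a" + b + "c";
        return computed != "abc" && computed.intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());    // true
        System.out.println(jvmsExampleHolds());         // true
        System.out.println(runtimeConcatNeedsIntern()); // true
    }
}
```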



  • Now, are string literals subject to garbage collection?

    Well, the only way to destroy the constant_pool is for the String class to be unloaded. That way, Strings referenced by the constant_pool structure would be garbage collected.

    Now then, when are classes unloaded?

    First of all, it is necessary to make clear that there are two types of class loaders. JVMS 5.3 says:


    There are two types of class loaders: user-defined class loaders and the bootstrap class loader supplied by the Java Virtual Machine...


    And then it says (JVMS 2.17.8):

    A class or interface may be unloaded if and only if its class loader is unreachable. The bootstrap class loader is always reachable; as a result, system classes may never be unloaded.
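You can observe the two kinds of loader from Java code. This sketch assumes a HotSpot-style JVM, where Class.getClassLoader() returns null for classes defined by the bootstrap loader (the Javadoc only says some implementations may use null for it); LoaderDemo is a made-up name. From the JVM's point of view, the application class loader that defines LoaderDemo is itself one of the user-defined class loaders the JVMS mentions, since it is an instance of a ClassLoader subclass.

```java
public class LoaderDemo {
    /** Assumption: a null loader means "defined by the bootstrap class loader" on this JVM. */
    static boolean stringDefinedByBootstrap() {
        return String.class.getClassLoader() == null;
    }

    /** Classes loaded from the application class path are defined by a ClassLoader object. */
    static boolean demoHasLoaderObject() {
        return LoaderDemo.class.getClassLoader() != null;
    }

    public static void main(String[] args) {
        System.out.println("String loaded by: " + String.class.getClassLoader()); // String loaded by: null
        System.out.println("LoaderDemo loaded by: " + LoaderDemo.class.getClassLoader());
    }
}
```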


    Conclusions
  • The String pool does exist according to the JVMS, because the JVMS defines a structure to hold string literals.
  • String literals loaded by the bootstrap class loader cannot be garbage collected: the bootstrap class loader is always reachable, so string literals loaded by it are always reachable.
  • A user-defined class loader could load a new String class; literals loaded by a user-defined class loader can be garbage collected once the class loader is unreachable, as our respected forum member showed in our last discussion of this subject (Last discussion).


  • I just hope my conclusions are not wrong. I would appreciate your well-documented feedback and opinions.

    Regards,
    Edwin Dalorzo.
    [ May 17, 2005: Message edited by: Edwin Dalorzo ]
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18670
    [Edwin]: Well, the only way to destroy the constant_pool is that the string class be unloaded. That way Strings referenced by the constant_pool structure would be garbage collected.

    There's more than one constant_pool and more than one runtime constant pool, and there's also a pool of interned strings. I think you're confusing all three. Each class has a constant_pool structure in its class file, and (once loaded) a corresponding runtime constant pool in memory. But those aren't the same as the pool referenced in the API for intern(), which is the same one referenced in JLS2 3.10.5. All of these are real, but they have separate and distinct roles and responsibilities.

    There's just one intern pool, and it's apparently implemented using soft references (or a native-code equivalent). There are also constant pools for each loaded class, which seem to be implemented with hard references. Soft references won't prevent GC, but hard references will. For a string which originally came from a literal, the only way to make it eligible for GC is if (a) the class which declared the literal (not necessarily the String class) is unloaded, and (b) any other hard references to the same string are removed. Condition (a) in turn requires that the class loader that loaded that class be made eligible for GC.

    This is not too different from what you describe. The most important difference is that any class can declare a string literal, and each such class has a constant pool of its own which will prevent GC of the associated String object unless and until the class itself is eligible for GC. It's not just the String class that this applies to.
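The hard-versus-soft distinction in this model can be shown with java.lang.ref.SoftReference. In this sketch (hypothetical names, and a plain Object rather than a literal, since a literal stays hard-referenced by its declaring class) the local variable plays the role of the runtime constant pool and the SoftReference plays the role Jim assigns to the intern pool. As long as the hard reference exists, the soft one cannot be cleared, however often the collector runs.

```java
import java.lang.ref.SoftReference;

public class ReachabilityDemo {
    /** A soft reference cannot be cleared while its referent is still strongly (hard) reachable. */
    static boolean hardReferencePreventsClearing() {
        Object hard = new Object();                               // stands in for the constant pool's reference
        SoftReference<Object> soft = new SoftReference<>(hard);  // stands in for the intern pool's reference
        System.gc(); // only a request; the JVM is free to ignore it
        // 'hard' is still used after gc(), so the object stays strongly reachable throughout
        return soft.get() == hard;
    }

    public static void main(String[] args) {
        System.out.println(hardReferencePreventsClearing()); // true
        // Once no hard reference remains, the collector MAY clear a soft reference,
        // but nothing guarantees when, so that case is deliberately not asserted here.
    }
}
```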

    [Edwin]: String literals loaded by the bootstrap class loader cannot be garbage collected: the bootstrap class loader is always reachable, so string literals loaded by it are always reachable.

    True. (Though since not all literals are loaded by the bootstrap loader, this is of limited relevance.)

    [Edwin]: A user-defined class loader could load a new String class; literals loaded by a user-defined class loader can be garbage collected once the class loader is unreachable, as our respected forum member showed in our last discussion of this subject (Last discussion).

    Sorta true. Better if you replace "String class" with "class (of any type)", as per the above discussion.


    "I'm not back." - Bill Harding, Twister
    Edwin Dalorzo
    Ranch Hand

    Joined: Dec 31, 2004
    Posts: 961
    I get it, Jim.

    I might have confused a few topics in my writing. I understand there are many different types of constant pools. But tell me what you think of this argument.


    When the class Pochita is loaded, a CONSTANT_String_info structure associated with it is created. Then, according to JVMS 5.1, if the string "Code Master" is not yet interned, it is first interned into the String class, and then that same reference is stored in the CONSTANT_String_info structure of class Pochita.

    I had believed that interned strings were held in the CONSTANT_String_info of the String class, and therefore both the class Pochita and the String class would hold references to the same string object. So if the Pochita class was loaded by the bootstrap class loader, the "Code Master" string could not be garbage collected (unless, of course, the class had been loaded by a user-defined class loader and that class loader had been garbage collected).

    However, you mentioned that interned strings are apparently implemented using soft references (or a native-code equivalent), and I was wondering where I can get more information about it. Would you please post a reference to your source of information?

    Thanks for your reply, Jim. I will appreciate your insight!
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18670
    Hi, Edwin!

    [Edwin]: When the class Pochita is loaded, a CONSTANT_String_info structure associated with it is created.

    Technically CONSTANT_String_info is a format for a part of the class file, which is created at compile time and loaded at run time. There is a corresponding in-memory structure, a runtime constant pool, which is created when a class is loaded. (One such pool per loaded class.) From here on, I will assume that all references to CONSTANT_String_info are actually to the corresponding runtime constant pool.

    [Edwin]: Then, according to JVMS 5.1, if the string "Code Master" is not yet interned, it is first interned into the String class, and then that same reference is stored in the CONSTANT_String_info structure of class Pochita.

    Yes. That is, the String object itself will exist on the heap; one reference will exist in the intern pool, and one in the runtime constant pool of class Pochita. The API makes it sound like the intern pool is somewhere within the String class, but it looks like it's implemented in native code, which could reside anywhere, I suppose. So I wouldn't normally say that anything is interned "into the String class", because I'm not sure where it really is. Maybe it doesn't matter, since apparently we can never unload the String class. Wherever the intern pool is, it's in one place, and as far as I can tell it can never be unloaded. (Although individual string references may be removed from the pool by garbage collection.)

    [Edwin]: I had believed that interned strings were held in the CONSTANT_String_info of the String class, and therefore both the class Pochita and the String class would hold references to the same string object. So if the Pochita class was loaded by the bootstrap class loader, the "Code Master" string could not be garbage collected (unless, of course, the class had been loaded by a user-defined class loader and that class loader had been garbage collected).

    An interned string is not referenced in the runtime constant pool of the String class. If it came from a string literal, then it is referenced in the runtime constant pool of the class which declared the literal. And it's also referenced in the intern pool. But that's different from the runtime constant pool of class String.

    The latter part of your paragraph is correct - if Pochita was loaded by the bootstrap loader, then the "Code Master" object will never be eligible for garbage collection (as far as I know). That's because of the runtime constant pool in Pochita, not the runtime constant pool in String.

    [Edwin]: However, you mentioned that interned strings are apparently implemented using soft references (or a native-code equivalent), and I was wondering where I can get more information about it. Would you please post a reference to your source of information?

    I don't know of an authoritative source for this in the documentation - the info was determined experimentally. Note that the JLS and API never (as far as I know) clearly specify whether the references are hard references or soft references. We've seen that if a class is unloaded and then reloaded, it's possible for a string literal from that class to refer to a different instance the second time than the first time. From the description we've been given of the way the intern pool works, this is only possible if the intern pool had lost the first reference sometime after the class was unloaded and before it was reloaded. That would imply that the reference held by the intern pool is not able (by itself) to prevent garbage collection. Thus, it's a soft reference, or some sort of native code thingie which is like a soft reference.

    However, it seems that garbage collection of an interned string can only occur if the class which declared the string is unloaded. This implies that the runtime constant pool for that class has a hard reference to the String. As long as the class remains loaded, the hard reference in the runtime constant pool prevents collection. Once the declaring class is unloaded, the remaining soft reference in the intern pool is insufficient to prevent collection.
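Jim's model can be sketched as a toy in plain Java: one shared pool that holds strings only weakly, plus one hard-referencing list per "class". Everything here is hypothetical (ToyInternPool, ConstantPool and resolveLiteral are invented names), and it uses WeakReference values in a WeakHashMap, the usual canonicalizing-map idiom, where Jim speaks of soft references; either kind leaves a string collectable once nothing holds it hard. The real intern pool lives inside the JVM and is not implemented like this.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.WeakHashMap;

/** Toy model only: not how the JVM's native intern pool is implemented. */
public class ToyInternPool {
    // WeakHashMap holds keys weakly; values are wrapped in WeakReference so they do not
    // keep their own keys alive. The pool by itself therefore never prevents collection.
    private final WeakHashMap<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns the pooled instance equal to s, adding s first if none is pooled. */
    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }

    /** Hypothetical stand-in for one class's runtime constant pool: it holds hard references. */
    static final class ConstantPool {
        private final List<String> entries = new ArrayList<>();

        /** Mimics string literal derivation: intern the characters, then keep a hard reference. */
        String resolveLiteral(ToyInternPool internPool, String chars) {
            String s = internPool.intern(chars);
            entries.add(s);
            return s;
        }
    }

    public static void main(String[] args) {
        ToyInternPool internPool = new ToyInternPool();
        ConstantPool pochita = new ConstantPool(); // plays "class Pochita"
        ConstantPool other = new ConstantPool();   // another class declaring the same literal

        String a = pochita.resolveLiteral(internPool, new String("Code Master"));
        String b = other.resolveLiteral(internPool, new String("Code Master"));
        System.out.println(a == b); // true: both "classes" share the pooled instance
    }
}
```

If both ConstantPool objects (and the locals a and b) became unreachable, nothing would hold the pooled object hard any more, so the collector could clear its pool entry. That mirrors what Jim inferred from the unload/reload experiment.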
    Edwin Dalorzo
    Ranch Hand

    Joined: Dec 31, 2004
    Posts: 961
    Thanks for your excellent reply, Jim.

    I agree with most of what you wrote. However, I still feel there is not much information, at least not rock-solid data, about what the String intern method really does.

    I will do some more research on this aspect of the String class. Thanks for your feedback. If I learn something else, I will let you know. I hope you do the same.

    The discussion was very interesting. Thanks again.
     