JavaRanch » Java Forums » Java » Java in General
How many objects can you create in JVM with very large RAM.

Ravi Gupt
Greenhorn

Joined: Oct 16, 2007
Posts: 17
Hello Guys,

Is there a limit on how many objects can be created on the heap? Assume the heap size is set very large (more than 2^31).

If there is no limit, then how do we get a unique hash code for each object? Remember that hashCode() in the Object class returns an int.

Consider a hypothetical powerful computer with an extremely large amount of RAM. If there is a limit, will the same limit hold true for such a computer as well?

Regards,
Ravi
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39409
    
  28
I think the default size for the heap is 25% of available RAM, but I am not certain.
You don’t have to return unique hash codes from different objects; indeed in some circumstances (equals() returns true) you must return the same hash code, so number of hash codes does not restrict the possible number of objects. So I think there is no rule restricting the number of possible objects, if you can make the heap memory large enough to fit them all in.
Ravi Gupt
Greenhorn

Joined: Oct 16, 2007
Posts: 17
Hi Campbell,

Thanks for the response, but I think you didn't get the question right.

Assume a class that does not override the hashCode() method (from the Object class). If you create several objects of this class, each will have a unique hash code value (calculated by the JVM using the object's memory address, etc.). So basically each object will have a unique hash code value if the programmer does not choose to override hashCode() in that class.

Now assume you have virtually unlimited memory (heap size) at your disposal and you create a zillion such objects, so they have to share some hash code values.

The following is a quote from the Javadoc of the Object class:
" As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.) "


I want to understand the meaning of text highlighted in red. What is the impractical scenario that forces the JVM to return non-unique hash code values for objects?

Regards,
Ravi
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39409
    
  28
You should try to give different hash codes to different objects. Should you have 4294967297 objects, and there are 4294967296 potential different hash codes, you are absolutely guaranteed to have a duplication. What it means is that you must try to have different hash codes. If among those 4294967297 objects, you have 4294967294 different hash codes, you are doing quite well. Only 3 collisions. If, however, you have 12 hash codes among 4294967297 objects, you are doing really badly.
I have written about how to find out about hash code methods before: look here, and in the link I gave to Bloch’s book.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657
    
    8

Ravi Gupt wrote:I want to understand the meaning of text highlighted in red.


It looks perfectly understandable to me. However it appears you're going beyond that, and you want to know why the original designers considered some undescribed actions to be reasonably practical and other undescribed actions to be unreasonable or impractical.

This seems rather pointless to me, especially since it's impossible to answer the question. But since it isn't absolutely necessary for unequal objects to have different hashcodes anyway, it's perfectly reasonable for a class's hashcode method to produce equal hashcodes for unequal objects sometimes. And the documentation says that might happen. Which is perfectly reasonable.
Ravi Gupt
Greenhorn

Joined: Oct 16, 2007
Posts: 17
Hi Paul,

thanks for reply.

As you said, "it isn't absolutely necessary for unequal objects to have different hashcodes anyway". I want to know in which cases different objects would have the same hash code value. What are those "sometimes" cases when that happens?

I understand one has to dig into the algorithm that the designers used to implement hashCode(), for which the memory address is one of several parameters. We are just guessing at possibilities here.

One possible case is when the JVM runs out of available unique hash code values.

Is there any other case?

Thanks
Ravi
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657
    
    8

I still don't understand what value you expect to get from examining the details of a decision from 15 years ago along with documentation which was clearly put there just to allow programmers to do something reasonable, rather than to document specific features.

And your approach of guessing what those details might have been is useless, if your goal is to find out the actual details.

However if you want to speculate in general about why people might write a hashcode algorithm which might produce duplicate hashcodes for unequal objects, you could certainly do that. Just bear in mind that such an algorithm is never designed so that it produces duplicate hashcodes under certain well-defined conditions; the actual situation is that they are designed so that they might sometimes produce duplicates because it's unavoidable. Nobody cares exactly what duplicates it might produce or when it might produce them because -- as you know -- it doesn't matter.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657
    
    8

Here's an example: the String class. Each String object has a hashcode, of course. There are 2^32 possible hashcodes, as you know, but there are far more than 2^32 distinct possible Strings. And by far more I mean unimaginably more. Not just 2^1000 or even 2^1000000, but unimaginably more than even that.

And so it follows that an unbelievably huge number of distinct Strings all have the same hashcode.

But it doesn't matter. It might be possible to look at the hashcode algorithm for String and bring in the heavy-duty mathematics to explain exactly in what case two different Strings would have the same hashcode, but that would be a complete waste of time because it doesn't matter.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14269
    
  21

Ravi Gupt wrote:One possible case is when the JVM runs out of available unique hash code values.

The JVM cannot "run out of available unique hash codes". The JVM isn't generating or handing out hash codes to objects; the idea that "the JVM runs out of hash codes" isn't a valid idea at all. What the hash code of an object is, entirely depends on the implementation of the hashCode() method of the class of the object. You could implement your own hashCode() method that always returns the same value:
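The snippet that belongs here is missing from this copy of the thread. A minimal sketch of what such a class might look like follows; the class name `ConstantHash`, its `name` field, and the constant 42 are all made up for illustration. It also shows that a HashSet still gives correct results with such a hash code, because equals() decides membership and the hash code only picks the bucket.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class for illustration; the name and field are not from the original post.
final class ConstantHash {
    private final String name;

    ConstantHash(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ConstantHash
                && ((ConstantHash) other).name.equals(name);
    }

    // Legal: equal objects return equal hash codes, which is all the contract requires.
    @Override
    public int hashCode() {
        return 42;
    }

    // Adds three distinct values and one duplicate; returns the resulting set size.
    static int demoSetSize() {
        Set<ConstantHash> set = new HashSet<>();
        set.add(new ConstantHash("a"));
        set.add(new ConstantHash("b"));
        set.add(new ConstantHash("c"));
        set.add(new ConstantHash("a")); // equal to the first element, so not added again
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(demoSetSize()); // prints 3
    }
}
```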

That would be perfectly valid.

The only problem is that it would not be very efficient if you stored such objects in a collection class that uses hash codes, such as HashSet, or used them as keys in a HashMap.

The number of objects that you can create does not have anything at all to do with hash codes.


Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    
  22

Ravi Gupt wrote:I want to know in which cases different objects would have the same hash code value.

Simply put: you can't. And the reason you can't is that the implementation for Object.hashCode() is (a) not specified, and (b) is JVM-specific.
That means that if, after poring for weeks over reams of JVM code, you finally discovered what the actual hashcode algorithm was, it might change completely when you get the next version (or indeed, possibly even after a patch).

However, one possible scenario for two objects having the same Object.hashCode():
1. The method returns the last 32 bits of the object's memory address.
2. You request hashcodes for two objects that are exactly 4 GB apart in memory.

Unlikely, but possible.
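The scenario above can be sketched in a few lines. This assumes, purely for illustration, that the hash is the low 32 bits of a 64-bit address; the method name `hashOf` and the addresses are made up, and no real JVM is claimed to work this way.

```java
final class TruncatedAddressHash {
    // Hypothetical hash: keep only the low 32 bits of a 64-bit address.
    static int hashOf(long address) {
        return (int) address;
    }

    public static void main(String[] args) {
        long first = 0x00007F0012345678L;   // made-up address
        long second = first + (1L << 32);   // exactly 2^32 bytes (4 GiB) higher
        System.out.println(hashOf(first) == hashOf(second)); // prints true
    }
}
```

Any two addresses that differ by a multiple of 2^32 bytes collide under this scheme, since the cast to int discards the upper 32 bits.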

Winston


Ravi Gupt
Greenhorn

Joined: Oct 16, 2007
Posts: 17
@Jesper,

We are not talking about an overridden hashCode() method here. We are talking about Object.hashCode(). Please read the question again.


@Winston,

This is exactly the answer I was looking for. You rock.

Thanks,
Ravi




Dmitry Katsubo
Greenhorn

Joined: Jul 21, 2011
Posts: 5
I have recently learned that the Java heap has two parts: young generation and old generation. In order to move objects from one part of heap to another, object references should be not direct, but indirect. Does anyone know how this is implemented? Depending on the implementation, this can limit the number of objects that the JVM can create.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Dmitry Katsubo wrote:I have recently learned that the Java heap has two parts: young generation and old generation. In order to move objects from one part of heap to another, object references should be not direct, but indirect. Does anyone know how this is implemented? Depending on the implementation, this can limit the number of objects that the JVM can create.


Have you looked in the Java Virtual Machine Specification to see if the implementation is defined in there: http://docs.oracle.com/javase/specs/jvms/se7/html/index.html

My guess is it is not, and if it is not then the implementation is un-known and likely to change from one JVM to another (and probably not terribly interesting unless you plan on implementing a JVM yourself).


Steve
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14269
    
  21

Ravi Gupt wrote:If there is no limit, then how do we get a unique hash code for each object?

This presumption is false, objects do not have unique hash codes, and Object.hashCode() does not guarantee that objects have unique hash codes.
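One way to see this for yourself is to create a batch of plain objects and count how many distinct values Object.hashCode() returns. The sketch below is only an experiment, not a statement about any particular JVM: the class and method names are made up, and the number of duplicates you see (possibly none) depends entirely on the JVM and its settings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IdentityHashCollisions {
    // Creates `count` live objects and returns how many distinct hash codes they produced.
    static int distinctHashCodes(int count) {
        List<Object> keepAlive = new ArrayList<>(count); // keep every object reachable
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < count; i++) {
            Object o = new Object();
            keepAlive.add(o);
            seen.add(o.hashCode()); // Object.hashCode(), not overridden
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        int distinct = distinctHashCodes(count);
        System.out.println(count + " objects, " + distinct + " distinct hash codes");
    }
}
```

If the second number is smaller than the first, at least two live objects shared a hash code, and nothing went wrong.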
Ivan Jozsef Balazs
Rancher

Joined: May 22, 2012
Posts: 867
    
    5
Paul Clapham wrote:Here's an example: the String class.


Actually the API docs describe the way hashCode is computed, so it is not difficult to fabricate Strings even of the length two that share the same hashCode:
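The code from this post is missing in this copy. A minimal sketch of how such strings could be generated follows; it relies on the documented formula s[0]*31^(n-1) + ... + s[n-1], which for a two-character string is 31 * s.charAt(0) + s.charAt(1). The class and method names are made up.

```java
final class StringHashCollisions {
    // Builds the two-character string that starts with `first` and hashes to `targetHash`.
    static String withHash(char first, int targetHash) {
        int second = targetHash - 31 * first;
        if (second < 0 || second > Character.MAX_VALUE) {
            throw new IllegalArgumentException("no two-character string for this pair");
        }
        return "" + first + (char) second;
    }

    public static void main(String[] args) {
        int target = "H ".hashCode(); // 31 * 'H' + ' ' = 2264
        for (char first = 'H'; first >= 'A'; first--) {
            String s = withHash(first, target);
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

Some of the second characters fall outside printable ASCII, so a console may show them as `?`.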

Dmitry Katsubo
Greenhorn

Joined: Jul 21, 2011
Posts: 5
Steve Luke wrote:Have you looked in the Java Virtual Machine Specification to see if the implementation is defined in there: http://docs.oracle.com/javase/specs/jvms/se7/html/index.html

My guess is it is not, and if it is not then the implementation is un-known and likely to change from one JVM to another (and probably not terribly interesting unless you plan on implementing a JVM yourself).


Your guess is right: it cannot be defined in JLS. No way. It is too specific. But let's narrow the question: Sun JVM version 6. How many objects can it create? To answer this question we need a person who knows how memory management is implemented in a concrete JVM.

Coming back to my theory: if the JVM wants to move objects from the young generation to the old generation, it needs either an additional reference "layer" or a list of back references to the places where the object is referred to. The latter approach is insane, so I assume that there is an intermediate reference layer.

I just thought that it does not matter how you organize the dereferencing: on a 32-bit platform an object reference may hold 2^32 unique values, thus the JVM cannot create more than this number of objects (this is also true for a 64-bit JVM with compressed pointers, -XX:+UseCompressedOops). In practice the number is much smaller, because each (even empty) object is far bigger than 1 byte, but with one of the techniques below it becomes theoretically possible.

Now coming back to indirect references:

It can be implemented using a small bridging pointer:

java_reference may be 32-bit on any platform if the JVM manages to always allocate bridging records in the first 4 GB. However, pointer_to_object_data may be 32- or 64-bit, depending on the platform.

Maintaining such small bridging records brings a bit of overhead: if a record is allocated on the heap in the normal way, the allocator needs to record that this area is in use. So a 4-byte pointer_to_object_data may need an extra 32 bytes or so of "service" records.

Another approach is to use a table of bridging pointers. It looks similar:

The JVM will grow this table dynamically and keep its base address. "Free" table records are marked with the special value "0", and finding the next "free" slot is O(n), unless the JVM holds a basket of some reasonable number of free slots.
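The table idea can be modelled in plain Java. This is a toy sketch of the scheme described in this post, not a description of how any real JVM manages memory: "addresses" are ordinary long values, and every name (`HandleTable`, `allocate`, `resolve`, `move`, `free`) is made up. The stack of free handles plays the role of the "basket" of free slots.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model only: "addresses" are plain long values, not real memory.
final class HandleTable {
    private static final long FREE = 0L;   // marker for an unused slot
    private long[] slots = new long[4];    // handle -> current object address
    private final Deque<Integer> freeSlots = new ArrayDeque<>(); // recycled handles
    private int nextUnused = 0;

    // Registers an object address and returns the handle that refers to it.
    int allocate(long address) {
        if (address == FREE) {
            throw new IllegalArgumentException("address 0 is reserved");
        }
        int handle;
        if (!freeSlots.isEmpty()) {
            handle = freeSlots.pop();      // reuse a released slot, no scan needed
        } else {
            if (nextUnused == slots.length) {
                slots = Arrays.copyOf(slots, slots.length * 2); // grow the table
            }
            handle = nextUnused++;
        }
        slots[handle] = address;
        return handle;
    }

    // Dereferences a handle to the object's current address.
    long resolve(int handle) {
        return slots[handle];
    }

    // Moving an object changes only its table entry; handles held elsewhere stay valid.
    void move(int handle, long newAddress) {
        slots[handle] = newAddress;
    }

    // Releases a handle so that its slot can be reused.
    void free(int handle) {
        slots[handle] = FREE;
        freeSlots.push(handle);
    }

    public static void main(String[] args) {
        HandleTable table = new HandleTable();
        int h = table.allocate(0x1000L);
        table.move(h, 0x9000L);            // e.g. a move to the old generation
        System.out.println(Long.toHexString(table.resolve(h))); // prints 9000
    }
}
```

The cost of this design is one extra memory read on every dereference, in exchange for being able to move an object by updating a single table entry.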

Also, needless to say, Java objects are 8-byte aligned, which means that the lower 3 bits are always zero. In other words, pointer_to_object_data should be left-shifted by 3 to get the actual address in memory. This allows Java to reference 2^35 bytes = 32 GB of RAM with 32-bit pointers.
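The shift arithmetic can be checked with a short sketch. The class and method names are made up, and the code only models the encoding of an 8-byte-aligned offset in 32 bits; it says nothing about how a JVM chooses the heap base.

```java
final class CompressedPointer {
    static final int SHIFT = 3; // 8-byte alignment: the low 3 bits of an offset are zero

    // Encodes an 8-byte-aligned offset below 2^35 into a 32-bit value.
    static int compress(long offset) {
        if ((offset & 7) != 0) {
            throw new IllegalArgumentException("not 8-byte aligned");
        }
        if (offset < 0 || offset >= (1L << 35)) {
            throw new IllegalArgumentException("outside the 32 GiB range");
        }
        return (int) (offset >>> SHIFT);
    }

    // Decodes the 32-bit value back to a byte offset, treating it as unsigned.
    static long decompress(int compressed) {
        return Integer.toUnsignedLong(compressed) << SHIFT;
    }

    public static void main(String[] args) {
        long reachable = (1L << 32) << SHIFT;                 // 2^35 bytes
        System.out.println(reachable / (1L << 30) + " GiB");  // prints 32 GiB
        long offset = 20L * (1L << 30);                       // 20 GiB, beyond a plain 32-bit range
        System.out.println(decompress(compress(offset)) == offset); // prints true
    }
}
```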
Ivan Jozsef Balazs
Rancher

Joined: May 22, 2012
Posts: 867
    
    5
Ivan Jozsef Balazs wrote:
Actually the API docs describe the way hashCode is computed, so it is not difficult to fabricate Strings even of the length two that share the same hashCode:


H ->2264
G?->2264
F^->2264
E}->2264
D?->2264
C»->2264
BÚ->2264
A?->2264

Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Dmitry Katsubo wrote:
Steve Luke wrote:Have you looked in the Java Virtual Machine Specification to see if the implementation is defined in there: http://docs.oracle.com/javase/specs/jvms/se7/html/index.html

My guess is it is not, and if it is not then the implementation is un-known and likely to change from one JVM to another (and probably not terribly interesting unless you plan on implementing a JVM yourself).


Your guess is right: it cannot be defined in JLS. No way. It is too specific.

That is the JVM specification, not the JLS. Different documents.

Dmitry Katsubo wrote:But let's narrow the question: Sun JVM version 6

Probably need to go even finer. Sun JVM version 6.whatever on Windows 7 x86 architecture, possibly with specific command line options switched. The point is:
1) It isn't specified so it isn't standard
2) It isn't specified so it can change
3) It isn't specified so you don't know unless you read the code. Anything else is speculation.
4) It isn't specified so if you do figure something out on some platform, then you can't take advantage of it, because you could be wrong on another configuration and system (code no longer portable).
5) It doesn't affect your day-to-day programming. Most likely you will run out of RAM (~16 GB for references alone, at (2^32)-1 objects with 32 bits per reference) long before you get to object count limits, and if you get that many objects you probably have other issues.
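For the memory estimate in point 5, it helps to work in bytes rather than bits: (2^32)-1 references at 32 bits each come to roughly 128 gigabits, which is just under 16 GiB. A small sketch of that arithmetic (the class and method names are made up):

```java
final class ReferenceMemory {
    // Bytes needed to store `count` references of `bitsPerReference` bits each.
    static long bytesFor(long count, int bitsPerReference) {
        return count * (bitsPerReference / 8);
    }

    public static void main(String[] args) {
        long count = (1L << 32) - 1;
        long bytes = bytesFor(count, 32);
        System.out.println(bytes + " bytes, about "
                + bytes / (double) (1L << 30) + " GiB");
    }
}
```

This counts only the references themselves; each object also has its own header and fields, so the real footprint would be considerably larger.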
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    
  40


Don't have a direct answer, but here is another note to chew on. Years ago, I worked on a few applications that had 800GB+ footprints. The JVMs were configured to have most of the memory as heap (in comparison).

They all worked fine. If there is an internal limitation (that isn't specified), we certainly didn't hit it.

Henry

Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    
  28

Just to clarify, 2^64 is about 18 billion billion. The number of stars in the Milky Way galaxy is about 300 billion. So you will run out of stars in the Milky Way before you run out of unique addresses on a 64-bit system. More importantly, you will run out of memory before you run out of unique addresses. With 64-bit addressing, we are set for a long, long, long time, says the guy who was jealous of his cousin for having 64K more RAM.
Ivan Jozsef Balazs
Rancher

Joined: May 22, 2012
Posts: 867
    
    5
In order to move objects from one part of heap to another, object references should be not direct, but indirect.


This seems obvious, but if the garbage collector tracks *all* references to a given object, it can reassign them when moving the object to another place.
So it is conceivable to avoid indirect references despite the garbage collection, as I was recently told somewhere on the forums.

 
 