
maximum length of String = 32k ?

Arron Zhang
Greenhorn

Joined: Dec 27, 2002
Posts: 15
maximum length of String = 32k ?


novelist & programmer
Linda Jones
Ranch Hand

Joined: Aug 17, 2002
Posts: 57
Simplistically, yes.
Here's a discussion of the topic:
String length
Linda
Robbie shi
Greenhorn

Joined: Jan 05, 2003
Posts: 28
Java has no limit - the JLS does not specify one. There is an implicit limit if one presumes that a String must be implemented as a single array.
Because of the way the JVM spec is laid out, there are limits on the length of a String constant and on how it is specified.
There is a limit for each object based on the Java heap. Strings live on the heap, so a single one can never be bigger than the heap. Keep in mind that copying a large string effectively doubles the space it needs. And characters take two bytes each, so one meg of characters takes up 2 meg of space.
There is also a limit based on serialization, due to a 'bug' which might or might not have been fixed (the Sun folks were arguing about whether it is a bug or not). This limits it to 64k, or maybe 32k, but only when serialization is used.
For most basic purposes the limits that matter are the heap size and the serialization limit.
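One place where a 64k limit can be seen directly in the standard library is DataOutputStream.writeUTF, which rejects any String whose encoded form is longer than 65535 bytes. Whether this is the same limit the post above has in mind is an assumption; the sketch below only shows that this particular limit exists. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

class WriteUtfLimit {

    /** Returns true if DataOutputStream.writeUTF accepts a String of n 'a' chars. */
    static boolean fitsInWriteUtf(int n) throws IOException {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('a');                 // 'a' encodes to exactly one byte
        }
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(sb.toString());    // length prefix is an unsigned 16-bit value
            return true;
        } catch (UTFDataFormatException e) {
            return false;                   // encoded form exceeded 65535 bytes
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("65535 chars accepted: " + fitsInWriteUtf(65535));
        System.out.println("65536 chars accepted: " + fitsInWriteUtf(65536));
    }
}
```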

----
Robbies
-----------------------------
1.java IDE tool : JawaBeginer
2.Java Jar tool : JavaJar
http://www.pivotonic.com
-----------------------------
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
If you read the other posts after the one Robbie quoted (uncredited), you see there are corrections to the statement "Java has no limit". The JLS does specify that the String class must follow the String API. The String API has methods int length() and char[] toCharArray() which can only work if the number of chars in the String is less than or equal to Integer.MAX_VALUE, which is 2^31 - 1 = 2147483647. This would take about 4 GB of memory as a character array (2 bytes per char). This is the only hard limit imposed by the language. In practice we are usually limited by memory. Try running the following program using java -mx8G (setting max heap size to 8 gigabytes, which is a hideously large amount for most of us):
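The program itself is missing from this copy of the thread. What follows is only a guess at a test of this kind, not the original code: it keeps doubling a String and reports each length it manages to create. The class name, method name and default cap are all assumptions.

```java
// Hypothetical reconstruction: the original program was not preserved.
class StringGrowth {

    /**
     * Doubles a String until doubling again would exceed maxLength chars,
     * or until memory runs out. Returns the longest length actually created.
     */
    static int growUntil(int maxLength) {
        String s = "x";
        int reached = s.length();
        try {
            while (s.length() <= maxLength / 2) {
                s = s + s;                  // allocates a new String twice as long
                reached = s.length();
                System.out.println("Created a String of length " + reached);
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Out of memory after length " + reached);
        }
        return reached;
    }

    public static void main(String[] args) {
        // The default cap of 2^24 chars keeps the demo modest; pass a larger
        // cap (and give the JVM a larger heap) to push further.
        int cap = args.length > 0 ? Integer.parseInt(args[0]) : 1 << 24;
        growUntil(cap);
    }
}
```

Running it with a small heap and a large cap should end with the out-of-memory message rather than a stack trace, since the error is caught inside growUntil.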

Unless you actually have several gigabytes of RAM, this program will slow down considerably as the strings get successively larger - the excess memory is stored on disk, which takes much longer to access. Personally I witnessed it create a string of size 134217728 before I got tired of waiting for it. But if you've got the memory available, plus time to wait, then there's no reason you can't see much larger strings created - up to the stated limit.


"I'm not back." - Bill Harding, Twister
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
I modified the program slightly (to use a StringBuffer) and it went pretty smoothly until 16777216 before disk crunching for a while. The next number was 33554432 (at which point I gave up--except Control-C wouldn't stop the VM!). I'm still hearing disk crunch. Hmmmm. We'll see what happens.
Maulin Vasavada
Ranch Hand

Joined: Nov 04, 2001
Posts: 1871
hi Jim,
i have a confusion here as per your post that says,

"The String API has methods int length() and char[] toCharArray() which can only work if the number of chars in the String is less than or equal to Integer.MAX_VALUE, which is 2^31 - 1 = 2147483647."

i agree that the String API must be followed for String objects, but i don't think that can be the reason for a 4GB limit on a String object as you described. If, say, i have a String object > 4GB, then length() would truncate (cast) the length, which has effectively become a long, to an int and return a negative value (as the highest bit becomes 1, and all the fundamentals you know...).
this obviously won't be the true length of the string, but the API can't do much about it if the object size violates the return type in the API spec. OR i don't know whether the JVM throws an Exception if we exceed the limit that length()'s return value allows...
the logic of a heap size limitation fits my mind better.
i will have to test what happens when we have a string > 4GB and call length() on it, but i don't have that much RAM so i am not sure that is going to work for me...

regards
maulin
Arron Zhang
Greenhorn

Joined: Dec 27, 2002
Posts: 15
I tested String lengths from 32k to 15M (my computer has 384M of memory).
The length can be any number less than 15M.
When the length is 15M, I get an exception:
Exception in thread "main" java.lang.OutOfMemoryError
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Maulin- well, the API for length() doesn't say anything about taking the size of the String and casting it from long to int before returning the number. It says it returns the length, period. If it can't do that, it's unable to comply with its own API.
I know that there are a number of places in Java where longs get converted to ints, which may create negative numbers and assorted annoying effects. But each of these cases is actually documented somewhere - it would be exceedingly bad manners for them to throw in this type of conversion here without warning anyone.
Also, consider the toCharArray() method. Any array is limited to having an int number of elements - how else would you access the higher indices? So a String longer than Integer.MAX_VALUE couldn't return a char[] array that actually contains the complete contents of the String - another violation of the API. And what about many other String methods like lastIndexOf(char)? Again, it returns an int - which may not be able to hold the index of the correct answer.
So, what can a String do to avoid these violations? Well, it's always allowed to throw a RuntimeException or Error like OutOfMemoryError, the moment an attempt is made to create a string that's beyond the theoretical limit. As we've seen, this error is usually thrown well before reaching the theoretical limit. It's true that it might be nice if the String API actually documented what does occur in this case, but it's not required to do so for unchecked exceptions.
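To put numbers on the int limit and on the silent narrowing Maulin describes, here is a small sketch. It is plain arithmetic rather than anything String does internally; the class and method names are invented for the example, and the 2 bytes per char figure is the one used in this thread.

```java
class IntLengthLimit {

    /** What a long length would become if it were silently narrowed to int. */
    static int narrow(long length) {
        return (int) length;        // keeps only the low 32 bits
    }

    /** Bytes needed for n chars at 2 bytes per char (the figure used in this thread). */
    static long bytesForChars(long n) {
        return n * 2L;
    }

    public static void main(String[] args) {
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("bytes for that many chars = "
                + bytesForChars(Integer.MAX_VALUE));
        System.out.println("(int) 3000000000L = " + narrow(3_000_000_000L));
    }
}
```

A hypothetical length of 3,000,000,000 comes back as -1294967296 after the cast, which is the kind of undocumented surprise the API would have to warn about if it behaved that way.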
Arron - did you reset the max heap size? I mis-stated the option to use here - it's -Xmx rather than -mx, and G is not a recognized suffix. (I guess they figured no one would try setting the heap this big.) So to set a heap size of 4 GB, for example, you'd use
java -Xmx4096m MyClass
If you don't do this, you're getting a default heap size of 64M. This seems to explain your results - a String of 15M chars probably takes 30M bytes. And if you've got this in a StringBuffer and you add another 1M chars onto it, you're creating a new internal char[] array of length 16M, taking up 32M bytes. Both old and new char arrays need to be kept in memory at the same time, at least long enough to copy values from one to the other. So you're using 62M right there - it's not difficult to imagine that you've done something else slightly different which causes you to use a little more instead, crossing the 64M line.
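The arithmetic in that paragraph can be written out as a tiny sketch. It assumes 2 bytes per char, as stated earlier in the thread, and treats "M" loosely as one million; the names are made up for illustration.

```java
class BufferGrowthMemory {

    static final long BYTES_PER_CHAR = 2;   // assumption taken from the thread

    /** Peak bytes while an old char[] is being copied into a new, larger one. */
    static long peakBytesDuringCopy(long oldChars, long newChars) {
        // Both arrays must be alive until the copy has finished.
        return (oldChars + newChars) * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long million = 1_000_000L;
        long peak = peakBytesDuringCopy(15 * million, 16 * million);
        System.out.println("Peak during copy: " + peak / million + "M bytes");
    }
}
```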
Hint - run java using the -verbose:gc option as well, to get messages telling you how big the heap really is (and how much garbage collection is going on). Enjoy...
[ January 06, 2003: Message edited by: Jim Yingst ]
Arron Zhang
Greenhorn

Joined: Dec 27, 2002
Posts: 15
I see.
Thanks.
Maulin Vasavada
Ranch Hand

Joined: Nov 04, 2001
Posts: 1871
hi Jim,
now it's clear. thanks for the detailed explanation. it was good
so now i have got one more question to be asked to the person who claims to know Java "If I have 10GB of character data and want to process it using array how would you go about it?"
regards
maulin
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Ummm... don't? As we've seen it's going to be very hard to get that much data in an array at once, unless your computer has a lot more RAM than mine does. So typically you'd analyse the type of "processing" required and try to figure out how much data needs to be in memory at one time. A common situation is reading data from a file, where you often only need to save & process one line at a time. When you're done processing a given line, reuse the same String variable to store the next line. The previous line can then be garbage collected. See this link for a discussion of this sort of thing. If you find that the relationships in your data are more complex and you can't do the processing you need one line at a time, then it may be preferable to design a database schema to capture all the data relationships you are interested in, and then let the database do the work of searching for things. That's a much more complex topic - but it really depends what sort of data you have, and what sort of processing you need to do.
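As a rough illustration of the one-line-at-a-time approach, the sketch below reads from any Reader and keeps only the current line in memory. Counting lines and chars is just a stand-in for whatever "processing" is really needed, and the class name, method name and file argument are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class LineByLine {

    /** Counts lines and chars while holding only one line in memory at a time. */
    static long[] countLinesAndChars(Reader source) throws IOException {
        long lines = 0;
        long chars = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;                        // the same variable is reused for every line
            while ((line = in.readLine()) != null) {
                lines++;
                chars += line.length();         // stand-in for real processing
            }
        }
        return new long[] { lines, chars };
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineByLine <text-file>");
            return;
        }
        // args[0] is a hypothetical path to a large text file.
        long[] result = countLinesAndChars(new FileReader(args[0]));
        System.out.println(result[0] + " lines, " + result[1] + " chars");
    }
}
```

Because each line becomes unreachable as soon as the next one is read, the memory needed depends on the longest line rather than on the size of the file.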
Guennadiy VANIN
Ranch Hand

Joined: Aug 30, 2001
Posts: 898

Java has no limit - the JLS does not specify a limit.

But JVM specs do:
From Bill Venners' "The lean, mean, virtual machine: An introduction to the basic structure and functionality of the Java Virtual Machine":

The size of an address in the JVM is 32 bits. The JVM can, therefore, address up to 4 gigabytes (2 to the power of 32) of memory, with each memory location containing one byte. Each register in the JVM stores one 32-bit address. The stack, the garbage-collected heap, and the method area reside somewhere within the 4 gigabytes of addressable memory. The exact location of these memory areas is a decision of the implementor of each particular JVM.

From JVM Specs, 1.1
While the Java Virtual Machines would appear to be limited by the bytecode definition to running on a 32-bit
address space machine, it is possible to build a version of the Java Virtual Machine that automatically
translates the bytecodes into a 64-bit form. A description of this transformation is beyond the scope of this
specification.

I would like to understand whether heap can be more than RAM+"Total paging size"

For ex., in my Windows XP
Total paging size for all drives:
(In Windows XP it is in Control Panel – System – Advanced – Performance – Settings – Advanced )
is 386 MB
RAM is 256 MB
So it is roughly 640 MB in total
Can you all, who "crunched" with such patience your hard disks and first of all Arron Zhang, kindly write those in your PCs?
I propose to rename "Performance" forum to "JVM
[ January 14, 2003: Message edited by: yidanneuG ninaV ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
True - the JVM specs limit the JVM internal memory to 4 GB. This is more fundamental than the String API limits described above, since we can imagine a Java implementation which violates parts of the String API but otherwise works as expected - but even if a JVM contains more than 4 GB of memory, compiled bytecode will never access it unless that compiled code uses a new format with addresses longer than 32 bits. This is such a fundamental change that it would completely break compatibility with other Java implementations.
I would like to understand whether heap can be more than RAM+"Total paging size"
Probably it can't. However I tired of my test long before it came close to my total available paging size, so I can't confirm this at the moment.
 