Maximum capacity of arrayList for String objects is one million?

Aaron Ravi Jakobovits
Greenhorn

Joined: Jul 27, 2010
Posts: 9
I have a program in which I need to store approximately 11 million words (about 8 MB) in an ArrayList. My problem is that the ArrayList will only hold 1 million String objects. Even when I explicitly declare a size, like this

or use the ensureCapacity method like this

the ArrayList will still only hold 1 million String objects. Should I be using another data structure? I would really like to use ArrayList, unless it is impossible to store this much data in it. Thanks.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    
    8

No, the maximum capacity of an ArrayList (or any kind of list whatsoever) is limited only by the amount of memory the JVM has available.

Your estimate of the amount of memory required for your data is surely wrong; if 11 million words really required 8 million bytes then each word, on average, would require less than one byte. Whereas in reality a String object requires something like 40 bytes, minimum.

So you may want to look at the possibility of giving your JVM more memory to work with.
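
As a quick sanity check, here is a minimal sketch showing that an ArrayList has no built-in one-million cap. The class name ListCapacityDemo and the helper fill are made up for illustration; every element is a reference to the same String, so the only real memory cost is the list's backing array.

```java
import java.util.ArrayList;
import java.util.List;

public class ListCapacityDemo {
    // Hypothetical helper: adds `count` references to one shared String,
    // so the backing array is the only significant memory cost.
    static List<String> fill(int count) {
        List<String> words = new ArrayList<>(count); // pre-size to avoid repeated array copies
        String shared = "word";
        for (int i = 0; i < count; i++) {
            words.add(shared);
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> words = fill(2_000_000);
        // Prints the element count, which is well past one million.
        System.out.println(words.size());
    }
}
```

If a program like this stops at some count, the cause is usually the loop or the input handling, or an OutOfMemoryError, rather than the list itself.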
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1761
    
    7

An ArrayList can easily hold 11 million String references, provided there's sufficient heap space available to hold all those String objects.
How did you arrive at a maximum of 1 million String/Object references? Did you encounter some sort of error message?

Edit: Refresh the page whydontcha...


Build a man a fire, and he'll be warm for a day. Set a man on fire, and he'll be warm for the rest of his life.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    
  20

The theoretical limit for ArrayList capacity is Integer.MAX_VALUE, a.k.a. 2^31 - 1, a.k.a. 2,147,483,647. But you'll probably get an OutOfMemoryError long before that time because, well, you run out of memory.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3617
    
  14

An ArrayList containing 11 million references should be about 42MB (11,000,000 × 4 bytes = 44,000,000 bytes), assuming a reference is 4 bytes long.

That's 42MB in references alone. The actual Strings aren't even considered yet.
Aaron Ravi Jakobovits
Greenhorn

Joined: Jul 27, 2010
Posts: 9
Okay, so the file is 10.288 MB (10,288 KB) and contains 10,534,015 Strings, to be precise. I think my problem is that the ArrayList is declared within my main method. Would declarations and definitions within a main method be stored dynamically on the heap or on the stack?
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3617
    
  14

Are you saying that each String in the file is on average 1 byte long?
Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687
    
  16

Is it necessary to hold the whole document in memory?


Aaron Ravi Jakobovits
Greenhorn

Joined: Jul 27, 2010
Posts: 9
If you do the math, 10,288kb * 1000 = 10,288,000, so yes, a little larger than 1 byte per string on average; but how can that be? I just googled the size of a string in Java and it is at least 4 bytes, right? Now I'm totally confused. I can post the code and add the file as an attachment if anyone would like to check this, but I don't know where I could have made an error.
Aaron Ravi Jakobovits
Greenhorn

Joined: Jul 27, 2010
Posts: 9
Okay, I made a mistake, big time. I'm using a BufferedInputStream object and its .read() method to iterate through the file. The .read() method apparently returns one byte at a time, not one token, so I was counting bytes rather than words. So my question: does declaring the ArrayList inside the main method of the program limit the number of elements it can hold, and if so, why?
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3013
    
  10
Aaron Ravi Jakobovits wrote:If you do the math, 10,288kb * 1000 = 10,288,000, so yes, a little larger than 1 byte per string on average; but how can that be?

Well, the most likely scenarios are:

(1) You are mistaken, and the file is much bigger than 10,288kb.

(2) You are mistaken, and the file contains far fewer than 10,534,015 Strings.

(3) You are right, and almost all of the "strings" are exactly one byte in length, or maybe zero, and you neglected to tell us the magic formula by which you determine how long a single "string" (or maybe "line") is. Maybe all "strings" are exactly one character? Otherwise it seems like you need to allocate some more bytes to tell us how long each string is.

Aaron Ravi Jakobovits wrote:I just googled the size of a string in Java and it is at least 4 bytes, right? Now I'm totally confused. I can post the code and add the file as an attachment if anyone would like to check this, but I don't know where I could have made an error.

Hmmm, 4 bytes still seems an underestimate, but whatever. We don't know where you could have made an error either, but it seems that posting the code may be the best course of action.
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3617
    
  14

You shouldn't use BufferedInputStream to read character data.

Use BufferedReader instead, or maybe a Scanner.
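
A minimal sketch of the BufferedReader approach might look like the following. The class name WordReader and the method readWords are hypothetical, and splitting on whitespace is an assumption about what counts as a "word" in the file. A StringReader stands in for the real file here; in practice a FileReader (or Files.newBufferedReader) would be passed instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class WordReader {
    // Hypothetical helper: reads whitespace-separated words from any character source.
    static List<String> readWords(Reader source) {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Split each line on runs of whitespace; skip the empty token
                // that split() produces for a blank line.
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Stand-in input; replace with a FileReader for a real file.
        List<String> words = readWords(new StringReader("the quick brown\nfox jumps"));
        System.out.println(words.size() + " words: " + words);
    }
}
```

Counting words.size() here gives the number of tokens, not the number of bytes, which is the distinction that went wrong with BufferedInputStream.read().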
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14117
    
  16

But how did you come to the conclusion that there is a limit of 1 million strings in an ArrayList? Did you get an error while you tried to run your program? Perhaps an OutOfMemoryError?

If you really need to hold all those 10.5 million strings in memory at once, you might want to give the JVM some more memory by using the -Xmx command line switch. For example:

java -Xmx512m com.mypackage.MyProgram

to give the JVM max. 512 MB memory to work with. The default for the max. amount of memory when using a 32-bit JVM on Windows is quite low, I think 64 MB. If you have 10.5 million strings I can imagine that you'd easily be using more than 64 MB memory.
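
To see what limit your JVM is actually running with, something like this small sketch can help. The class name HeapCheck is made up for illustration; Runtime.maxMemory() is the standard call that reports the maximum amount of memory the JVM will attempt to use.

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in whole megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run once with and once without -Xmx to compare the reported values.
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

The reported figure is often slightly different from the -Xmx value, so treat it as an approximation.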

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    
  20

Note that a 32-bit JVM on Windows allows you to go up to about 1.5GB, not more. That's a Windows limitation. If you ever need to use more than 1.5GB inside a JVM you'll need to start using a 64-bit JVM, which of course also requires a 64-bit Windows.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38517
    
  23
I think this isn't a "beginning" question, so I shall move it.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    
    8

Aaron Ravi Jakobovits wrote:If you do the math, 10,288kb * 1000 = 10,288,000, so yes, a little larger than 1 byte per string on average; but how can that be? I just googled the size of a string in Java and it is at least 4 bytes, right?


But those two numbers are only loosely related. The first thing to realize is that a character in Java requires two bytes of memory (it's a Unicode character). So if the bytes in your file are all ASCII characters, you need at least twice as much as 10,288 KB to store them as string data. Second, a String is implemented as an object containing an array of characters and some other control information. The estimates I have seen for this say it's more like 40 bytes than 4 bytes. So roughly speaking you need 40 bytes per String as overhead plus 2 bytes for each character in the data. And then there are the references to those Strings, which you store in that list. Those are what take 4 bytes, so there's another 4 bytes of overhead for each String.

And don't take those numbers as precise information. They are estimates with various degrees of accuracy and might depend on your environment. For example a 64-bit Java might require more memory to store references than a 32-bit Java. Or not... I don't really know.
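
The rough arithmetic above can be sketched as follows. The class name StringMemoryEstimate is made up, the three constants are the ballpark figures from this post rather than measured values, and the input of one million words averaging six characters is a hypothetical example, not the real file.

```java
public class StringMemoryEstimate {
    // Ballpark figures from the discussion; real values depend on the JVM.
    static final long OVERHEAD_PER_STRING = 40; // bytes of object overhead per String
    static final long BYTES_PER_CHAR = 2;       // bytes per character
    static final long BYTES_PER_REFERENCE = 4;  // bytes per list slot

    // Rough total: per-String overhead and list reference, plus character data.
    static long estimateBytes(long stringCount, long totalChars) {
        return stringCount * (OVERHEAD_PER_STRING + BYTES_PER_REFERENCE)
                + totalChars * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // Hypothetical input: 1,000,000 words averaging 6 characters each.
        long bytes = estimateBytes(1_000_000L, 6_000_000L);
        System.out.println(bytes / (1024 * 1024) + " MB (rough estimate)");
    }
}
```

With these assumed figures the overhead per String outweighs the character data itself, which is why the file size on disk is a poor guide to the heap needed.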
 