It's not a secret anymore!*
The moose likes Performance and the fly likes OutOfMemory error Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Android Security Essentials Live Lessons this week in the Android forum!
JavaRanch » Java Forums » Java » Performance
Bookmark "OutOfMemory error" Watch "OutOfMemory error" New topic
Author

OutOfMemory error

Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
This one is killing me. I'm using java.sql.ResultSet to read in data, proucing a Vectored String from the ResultSet. So far, so good, as long as I have System.gc() after the first 10,000 iterations. Just for yuks, I do another System.gc() before I start creating an Excel worksheet using HSSF. NOW is when it gets interesting. I thought the Vector was large; this portion just KILLS the memory. I don't even get to the 1,000th iteration of building the table and it craps out. I'm at a loss. Thanks very much.

Pat
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
    
    5
Can you just write a CSV file with the results and let excel read that?
Bill
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
No. It is required to be an Excel spreadsheet.
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30138
    
150

Pat,
A CSV is an unformatted spreadsheet. Even if you need the formatting, you can use it as an intermediate step. This reduces the number of things in memory at any given time. General process:

1) Iterate through resultset writing each row to a file. You may need to flush the file every bunch of records.
2) Read in the file one line at a time and write to the HSSF document one line at a time. Again, you may need to flush to preserve memory.

This essentially limits you to one row of data in memory at a given time. Once you have this working, you can optimize it by reading in X rows at a time (such as 100.)


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 860
10,000 rows really doesn't seem to be big enough to clobber your server. are you sure there isn't something else in the code that is the problem? post your code.


http://www.jamonapi.com/ - a fast, free open source performance tuning api.
JavaRanch Performance FAQ
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Did you increase the memory for the VM? Did you use a profiler to see which objects are causing the memory problem?


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
Okay, guys, here's the scoop. (Sorry I haven't gotten back to anyone sooner, but it got real intense here.)

I learned something interesting about Strings -- they seem to have quite an appetite for memory. When I approached one of my colleagues here about employing another strategy, he mentioned that he was surprised that a Vector would have an issue since it doesn't have memory restrictions like Strings do, to which I told him that the Vector was loaded with String arrays:

Vector data = new Vector();
// int numCols = -get the number of columns from the ResultSet metadata
while (rs.hasData());
{
String temp = new String[numCols];
for (int i = 0; i < numCols; i++)
{
temp = rs.getData(i+1); // ResultSet starts at 1, not 0
}
data.addElement(temp);
}

This was fine for under 13,000 rows. As soon as it went over, man did it blow up. No System.gc() could help, although I tried. Adding the System.gc() got it thru creating the Vector for the big market (25K rows), but died when it got to creating the spreadsheet. So when my colleague (a god, in my mine) mentioned that I shouldn't have had a problem with the Vector, I decided to try a Vectored Vector:


Vector data = new Vector();
// int numCols = -get the number of columns from the ResultSet metadata
while (rs.hasData());
{
Vector temp = new Vector();
for (int i = 0; i < numCols; i++)
{
temp.addElement(rs.getData(i+1)); // ResultSet starts at 1, not 0
}
data.addElement(temp);
}

Wow!!! You would not believe the difference! Hardly any memory used by comparison and EVERY SINGLE THING PRODUCED A REPORT! No muss, no fuss. Granted, the largest market took nearly three minutes to do so, but hey, that was much better than its former condition: bombing out (in Production) after 15 minutes with an OutOfMemoryError. So I thought I'd pass this on to you.

Also, quick note, if you're reading from Sybase and grabbing a ResultSet, do NOT do immediate processing except to load another object as I did above. Apparently, trying to process the data directly from Sybase holds up everything and since Sybase is being manipulated all the time, it takes much longer than doing a straight read and write. Our processing now takes 1/3 the time it did just by loading a Vector and then creating the spreadsheet rather than writing directly to the spreadsheet from Sybase.

Thanks to all of you for your ideas and responses. Wonderful to have a great support team.

Cheers!

Pat
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    
  34

Aside from the fact that none of this code will compile (is "data" in the first example a String or a String[]? ResultSet has neither a "hasData()" method nor a getData() method; etc.) If the basic gist of this is that you're using Vectors instead of String[] to hold Strings, then your new version uses a bit more memory, not less.

I'm glad you got your program working, but don't fool yourself into thinking that you've found a general principle here. I suspect during the rewriting, you fixed the real bug, whatever it was -- because it sure wasn't what you've described here.


[Jess in Action][AskingGoodQuestions]
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Ernest is right - the Vector class uses arrays internally, and even needs to create new ones while it is growing in size. The final arrays is also likely to be bigger than the one you hand-coded.

And of course it also doesn't have to do with any "size limitations" regarding Strings - your String handling doesn't seem to have changed at all!
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
Actually, Earnest, ResultSet <i>does</i> have hasData() and getData() since I'm successfully compiling and using my class, otherwise, why would I say it works. And the problem wasn't that I fixed a bug without knowing it; in further research, what I suspected to be true was, in fact, definitely true: Reading a ResultSet within an open Connection will slow down the process considerably, especially if Sybase is processing at the time.

Which leads back to the original problem prior to my using Vectors: they were reading the data directly into the spreadsheet. By using the code I described,
Vector data = new Vector();
// int numCols = -get the number of columns from the ResultSet metadata
while (rs.hasData());
{
String temp = new String[numCols];
for (int i = 0; i < numCols; i++)
{
temp = rs.getData(i+1); // ResultSet starts at 1, not 0
}
data.addElement(temp);
}
, I added speed and efficiency, although, logically speaking, it should have slowed things down by adding another data passthru. Hey, I'm new enough that I'll try anything, and if it works, heck, I'm happy to pass it on. I'm also old enough to keep my mouth shut when someone tells me that what they did worked because I may not know as much as I thought I did .

As far as Vectors and Arrays, yes, I'm quite aware of what the Vector is doing. However, a String array is what caused the memory consumption since, it turns out, Strings (arrays or no) apparently comsume actual memory whereas Vectors do not. Check it out for yourselves. I got this from our local guru who was surprised that a Vector would cause any memory problem. Of course, I could very easily have misunderstood the reasoning behind it, so please feel free to correct me <i>when you have the facts</i> so that we may all learn. Personally, as soon as he said that the Vector shouldn't have caused any problems, I immediately switched from the String[] array and used Vectoring instead. That simple switch changed everything and now my version of the class is in Production with a lot of really positive feedback coming in.

This was quite fun, albeit initially terribly frustrating. Man! I love a challenge! It was especially fun when I realized that not only did using Vectors get rid of my OutOfMemory problem, it turns out that it also sped up the entire process so that the maximum amount of time it now takes to create the data has dropped from 15+ minutes to under 3 -- that's an 80% improvement. Pretty durned cool, buckaroos! Yeah, I am so psyched! Talk about being a cloud nine!!
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
Just so you know, I use the following to get the ResultSet:

ResultSet rs = Statement.getResultSet();

I'm researching this now, but Object has getData(), and since everything inherits from Object, ResultSet has getData(). I love inheritance. Just wish it were bigger and financially associated....
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
Okay, I found the problem. The hasData() method was made up -- I was quickly typing and remembered wrongly. It was "while (rs.next())". Sorry for the minor miscommunication there.
Steven Bell
Ranch Hand

Joined: Dec 29, 2004
Posts: 1071
However, a String array is what caused the memory consumption since, it turns out, Strings (arrays or no) apparently comsume actual memory whereas Vectors do not.

I'm sorry but I simply can't let nonsense like this go by without a comment. All Objects that are created consume memory. Where do you think all that stuff you're putting into the Vector gets stored? Out in space somewhere?

A Vector is nothing more than an Object[] that will garuntee it doesn't run out of space and is syncronized. If anything using Vectors is less efficient as the default behavior of a Vector is to double in size each time it grows plus it starts at size of 10 so it will grow many times in your program continually creating larger Object[]'s and putting the GC to work cleaning up the ones it gets rid of.

I'm glad your program is running better, but don't fool yourself into thinking it was the Vector vs String[] that did it.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    
  34

Originally posted by Pat Flickner:
Object has getData(), and since everything inherits from Object, ResultSet has getData().


Object has no such method!
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
...which was acknowledged in Pat's last message, yes?


"I'm not back." - Bill Harding, Twister
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18548
    
  40

Pat isn't the only person with the misconception that Strings waste memory. I have encountered a few others that also stated this.

If I had to guess, I speculate that it was probably caused by poor use of strings. The fact that it generates intermediate string objects when you build a string. Or the fact that you generate a new string everytime that you change it.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    
  34

Originally posted by Jim Yingst:
...which was acknowledged in Pat's last message, yes?


He said that "hasData()" is really called "next()", but there was still the explanation regarding Object and "getData()". His original code used both methods.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I figured it was covered by "The hasData() method was made up" - but I suppose it doesn't hurt to be a little more explicit about the scope of the problem. Carry on, then.
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
Okay, then <i>you</i> replace Vector with String in this method, test it on 25K rows from a database (preferably Sybase ASE 12.0) and tell me why the memory shoots up the way it does with String (AND takes so long) while Vecotr does not:



And I'm sure with your tremendous backgrounds, you can figure out how to bypass the DataSourceWrapper portion since that is ours and I am not allowed to show it to you.

Pat
[ June 06, 2005: Message edited by: Ilja Preuss ]
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
...and one more item: there are 54 rows, 30 of which contain Strings that can be up to 1,000 characters.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    
  34

Originally, we were discussing whether using a String array was somehow worse than using a Vector. Now you said to replace the Vector with a String. To do the latter, of course, you'd have to change the code significantly, as a String is nothing like a Vector. Do you mean a String[]?

If so, then there are still questions. At what size do you allocate the array? Do you grow it a little at a time, or allocate at one fixed size? In other words, we can't make a meaningful comparison of one real piece of code with an imaginary one -- we'd have to see both versions, as you wrote them.

No one is calling you a liar, Pat: I'm absolutely positive that you ran two different versions of this code and this Vector one is much better and faster than the other one. But I'm also pretty sure something is really wrong with the other version, the one you haven't shown us. Right now I'm interested in finding out what was wrong and explaining it.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
I agree with Ernest - seeing your other version should be enlightening.

And please use code tags to preserve the formatting of your code. I will edit your post above accordingly.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
As far as I can tell, your code doesn't ever read 25k rows - it stops at 10k, no matter how many there would be in the database. Did your other version do that, too?
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
You're right. That should have been edited out. I was using that initially for the String arrays since 10,000 seemed to be the maximum allowed. I kept it to test the Vector and deleted it in the version I sent to Production. Sorry for sending the dev version.
Steven Bell
Ranch Hand

Joined: Dec 29, 2004
Posts: 1071
It would be interesting to see the original code for comparison.
Pat Flickner
Ranch Hand

Joined: Nov 17, 2003
Posts: 173
Oh, it was pretty simple to get the number of rows since I loaded the String arrays to the Vector. I used the getColumnCount to set the size of the array:



And you might want to watch your memory while you're running this; it runs out pretty fast.
Steven Bell
Ranch Hand

Joined: Dec 29, 2004
Posts: 1071
Just because this whole thing was bugging me I wrote up a little memory usage test between a Vector of Vectors and a Vector of String[].

Here is the code:


And the results I got from running it:

Showing that the String[] uses less memory than a Vector consistently. Noticed I ran both twice and changed the order so as to rule out any inconsistency based on order or other issues. Memory usage seems to be constant based on the size and type in this case.

This was run on Win2000 with the 1.4.2_05 jre.
 
jQuery in Action, 2nd edition
 
subject: OutOfMemory error
 
Similar Threads
System.gc()
Help for "Inconsistent classes map" error
Garbage collection output doubt
why two gc() method ?
Question on Garbage Collection