Pulling millions of unique values from Oracle?

Bill Gathen

Joined: Aug 03, 2003
Posts: 5
I'm rewriting an app that pulls a dozen data items for about six million rows in an Oracle database. About half of these are strings, and a majority are unique (physical addresses and 10-digit phone numbers).
I've been profiling my memory usage, and the garbage collector gets me back down to a fairly constant size every time it runs, but it gets behind at the threshold, which means it allocates a little extra space before clearing the memory, and the heap gets bigger every cycle.
Is there a way of pulling them (or streaming them) directly into a StringBuffer, to prevent the string pool from getting enormous from all the unique values? I've been trying getAsciiStream and getCharacterStream but not having much luck.
Alternatively, is there a way to tune the GC so the heap isn't reallocated before the collection runs?
Has anyone run into a problem like this before? Am I barking up the wrong tree?
Thanks in advance,
Loren Rosen
Ranch Hand

Joined: Feb 12, 2003
Posts: 156
Let me see if I understand what you're doing. You're doing a database query that returns millions of rows, looping through the result set, doing something with each row in turn (printing a mailing label or something). The key point is that once you finish an iteration of the loop, you're done with that row. So you'd expect the memory usage to have a sawtooth pattern, dropping down to the same constant low point each time garbage collection completes. But instead each low point is a little higher than the previous low point. Is that correct?
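For reference, the loop described above, combined with the getCharacterStream idea from the original question, might look roughly like the sketch below. It is only an illustration: the table and column names (customers, address), the fetch size, and the class and method names are assumptions, not anything taken from the actual app. Whether streaming saves memory also depends on the driver, which may still build a String internally. As far as I know, values returned by getString are ordinary heap objects rather than interned string-pool entries, so they are collectable once the row is done.

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RowBuffer {

    // Replace the contents of 'buf' with everything readable from 'in'.
    // The same StringBuilder and char[] are reused for every row, so the
    // loop itself does not allocate a new String per column value.
    static void readInto(Reader in, StringBuilder buf, char[] chunk) throws IOException {
        buf.setLength(0);          // discard the previous row's characters
        if (in == null) {
            return;                // SQL NULL: leave the buffer empty
        }
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.append(chunk, 0, n);
        }
    }

    // Hypothetical query: "customers" and "address" are made-up names.
    // Returns the number of rows visited.
    static long scan(Connection conn) throws SQLException, IOException {
        StringBuilder buf = new StringBuilder(256);
        char[] chunk = new char[256];
        long rows = 0;
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(500);  // a hint to the driver; the value is a guess
            try (ResultSet rs = st.executeQuery("SELECT address FROM customers")) {
                while (rs.next()) {
                    try (Reader r = rs.getCharacterStream(1)) {
                        readInto(r, buf, chunk);
                    }
                    // ... work with 'buf' here; nothing from this row
                    // needs to stay reachable after the iteration ends.
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

readInto can be tried without a database by handing it a java.io.StringReader, which is a quick way to check the buffer reuse before pointing scan at a real connection.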
Bill Gathen

Joined: Aug 03, 2003
Posts: 5
The bottom of the sawtooth is constant. It's not a textbook memory leak, where something is not being dereferenced and lives until the program dies.
It's the *top* of the sawtooth that I'm concerned with. If the initial allocated heap is 2 meg, it seems to let the objects pile up until just below 2 meg, then run gc. The used memory drops drastically, then starts building back up.
The core problem seems to be that between the time the gc decides to run again and the time it actually starts freeing memory, the main thread has added a couple more objects (running past 2 meg) and has to allocate more space for the heap. Now the allocated heap space is bigger, so it goes longer before gc'ing the next time, with the same lag problem increasing the size yet again. Repeat x,000 times and the heap has gotten very large.
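One way to confirm the pattern described above is to log used versus allocated heap from inside the row loop. The sketch below uses only java.lang.Runtime; the class name HeapWatch and the output format are made up for illustration. If the bottom of the sawtooth is flat, "used" should return to about the same value after each collection while "allocated" creeps upward.

```java
public class HeapWatch {

    // Bytes currently occupied by objects (live or not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Bytes the JVM has currently reserved for the heap.
    static long allocatedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Upper limit the heap is allowed to grow to.
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // One line suitable for logging every N rows.
    static String snapshot() {
        return "used=" + usedBytes()
             + " allocated=" + allocatedBytes()
             + " max=" + maxBytes();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

If the growth of the allocated heap is the only concern, starting the JVM with the initial and maximum heap sizes set to the same value (for example java -Xms256m -Xmx256m MyApp, where the size and class name are placeholders) should keep the heap from being resized at all. The collector then has to run when that fixed space fills up instead of asking for more.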