JavaRanch » Java Forums » Java » Java in General

Handling millions of rows without using arrays

Andavar Perumal
Greenhorn

Joined: Mar 14, 2012
Posts: 5
Hi,
I am fetching millions of rows from a table and storing them in arrays. While processing the arrays, system performance is very slow. I tried a LinkedList, but that did not help either. Is there a better way to handle millions of rows than arrays, without hurting system performance?

Thanks.
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61424
    

What makes you think that any other construct will use less memory than an array?

Do you really think it's reasonable to attempt to hold this much data in memory?


[Asking smart questions] [Bear's FrontMan] [About Bear] [Books by Bear]
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 859

Why would you want to store millions of rows in memory (arrays)?

If it is to generate some synopsis or totals, then you will be better off using a DB stored procedure.
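To illustrate the point with a toy sketch: let the database do the arithmetic so that only one number crosses the wire, instead of millions of rows. Here a method stands in for the server-side aggregate (with a real database this would be something like SELECT SUM(val) FROM t, or a stored procedure — the names below are made up for illustration):

```java
import java.util.stream.LongStream;

public class AggregateInDb {
    // Stand-in for the database. Pretend the million "rows" live here,
    // on the server, not in the application's memory.
    static long sumOnServer(long fromId, long toId) {
        // Plays the role of "SELECT SUM(val) FROM t WHERE id BETWEEN ? AND ?".
        return LongStream.rangeClosed(fromId, toId).sum();
    }

    public static void main(String[] args) {
        // The client receives a single long, not millions of rows.
        long total = sumOnServer(1, 1_000_000);
        System.out.println(total); // 500000500000
    }
}
```

The point is where the work happens: the client never allocates an array of a million elements, it just asks for the answer.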

WP
Andavar Perumal
Greenhorn

Joined: Mar 14, 2012
Posts: 5
Bear Bibeault wrote:What makes you think that any other construct will use less memory than an array?

Do you really think it's reasonable to attempt to hold this much data in memory?


Yes, in my application I need to handle that many rows from the table.
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    

Well you are screwed then! SCREWED!!



You really need to ask yourself why you need millions of rows? Can the work be done without loading millions of rows? Can the work be done by the database? Can you do the work by loading batches of rows from the database?
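The batch idea above can be sketched in a few lines. An Iterator stands in for a server-side cursor (a real JDBC ResultSet with setFetchSize() streams rows the same way); the method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;

public class BatchProcessor {
    static final int BATCH_SIZE = 10_000;

    // Stand-in for a server-side cursor over the table's rows.
    static Iterator<Long> openCursor(long rowCount) {
        return LongStream.rangeClosed(1, rowCount).iterator();
    }

    // Aggregate the rows while holding at most BATCH_SIZE of them in memory.
    static long processInBatches(long rowCount) {
        Iterator<Long> cursor = openCursor(rowCount);
        List<Long> batch = new ArrayList<>(BATCH_SIZE);
        long total = 0;
        while (cursor.hasNext()) {
            batch.add(cursor.next());
            if (batch.size() == BATCH_SIZE || !cursor.hasNext()) {
                // Do the per-batch work, then let the batch be reclaimed.
                total += batch.stream().mapToLong(Long::longValue).sum();
                batch.clear();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(processInBatches(1_000_000)); // 500000500000
    }
}
```

Memory stays flat at one batch regardless of how many rows the table has.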
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 859

LMAO @Jayesh

Yes, SCREWED pretty much sums it up.

WP
Andavar Perumal
Greenhorn

Joined: Mar 14, 2012
Posts: 5
Jayesh A Lalwani wrote:Well you are screwed then! SCREWED!!



You really need to ask yourself why you need millions of rows? Can the work be done without loading millions of rows? Can the work be done by the database? Can you do the work by loading batches of rows from the database?



First I fetch the data from the database and split it equally into 3 parts. Then each part is assigned to execute on one of the servers, like Data Mover in PeopleSoft. That is my task. Could you help with that?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12805
    
Exactly what operation do you need to perform on these rows?

Why does this operation require that all rows be in memory at one time?

Bill
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    

I don't know what Data Mover in PeopleSoft is.

Can you split your data into smaller parts? When you give data to data mover (or whatever that is) can you give it in 10 parts? or 100? or 1000?

Or can you tell Data Mover to retrieve data directly from the database? Like just give Data Mover the IDs instead of whole rows? Or maybe a criteria that Data Mover can use to fetch the records itself?

Can you fetch rows for only one part at a time? Don't load all the data in memory and then divide it into parts. Load one part at a time and send it to Data Mover.
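One way to sketch the "one part at a time" idea: compute an ID range per part and fetch only that slice before handing it off. Everything here is illustrative — the 3-way split, the ID column, and the hand-off to Data Mover are all assumptions about the real system:

```java
public class PartitionedFetch {
    static final int PARTS = 3;

    // ID range [from, to] for one of PARTS slices over ids 1..maxId.
    static long[] partRange(long maxId, int part) {
        long chunk = (maxId + PARTS - 1) / PARTS; // ceiling division
        long from = part * chunk + 1;
        long to = Math.min(maxId, from + chunk - 1);
        return new long[] { from, to };
    }

    public static void main(String[] args) {
        long maxId = 1_000_000;
        for (int part = 0; part < PARTS; part++) {
            long[] r = partRange(maxId, part);
            // Fetch ONLY this slice, e.g.
            //   SELECT * FROM t WHERE id BETWEEN r[0] AND r[1]
            // hand it to Data Mover, let it be garbage-collected,
            // then move on to the next slice.
            System.out.println("part " + part + ": " + r[0] + ".." + r[1]);
        }
    }
}
```

At no point do all three parts coexist in memory, which is the whole trick.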


Looking at it holistically, based on the info you have given here: when you are building a system that distributes processing, it seems pretty silly to move your data from the database to your server, and then move the data back to a worker. You should reduce the hops that the data takes. Your worker should fetch the data that it needs directly from the database. Of course, you have to balance the need to reduce the amount of IO against the number of concurrent connections to the database. So, the general guideline that I follow is: small, frequently-accessed pieces of data should be retrieved by the controller and then distributed to the workers by the controller, but large data should be batched into manageable sizes, retrieved directly by the worker, and cached in the worker.
 