handle large list in memory

 
ben oliver
Ranch Hand
Posts: 375
If I have a large List (half a million objects) and manipulate its data, it causes a heap size problem (let's put aside the option of increasing the heap size, as that's not what I'm interested in for this thread). Can I take this approach: break the large List into 100 small sublists, list_1, list_2, ... list_100?

After I finish processing list_1, I do

list_1 = null;

Similarly, after I finish list_2, I do

list_2 = null;

Loading all half a million objects into one List eats too much memory; that's why I'm thinking about this approach. Now, if list_1, list_2, ... list_99 do not get garbage collected, then this does not help at all. But if it gets list_1, list_2, etc. garbage collected promptly, then it probably helps.

Any thoughts? Thanks.
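A minimal sketch of what I have in mind, assuming the data can be read slice by slice from its source instead of all at once; Record, loadChunk, and process are hypothetical stand-ins for my real types and code:

import java.util.ArrayList;
import java.util.List;

// Sketch only: process half a million records one chunk at a time instead of
// building a single huge List. Record/loadChunk/process are placeholders.
public class ChunkedProcessor {

    static class Record { final int value; Record(int v) { value = v; } }

    static final int TOTAL = 500_000;
    static final int CHUNK_SIZE = 5_000;

    public static void main(String[] args) {
        for (int start = 0; start < TOTAL; start += CHUNK_SIZE) {
            List<Record> chunk = loadChunk(start, CHUNK_SIZE);
            process(chunk);
            // 'chunk' goes out of scope at the end of each iteration, so the
            // explicit "list_n = null" assignments aren't even needed here.
        }
    }

    // Stand-in for reading one slice from a file or database.
    static List<Record> loadChunk(int start, int size) {
        List<Record> chunk = new ArrayList<>(size);
        for (int i = start; i < start + size; i++) chunk.add(new Record(i));
        return chunk;
    }

    static void process(List<Record> chunk) {
        long sum = 0;
        for (Record r : chunk) sum += r.value; // placeholder work
    }
}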
 
Marshal
Posts: 28193
So it's the manipulation of these list elements which causes you to run out of memory, not just getting them all into memory in the first place? In that case the question is far too imprecise to answer. We have no idea why you're running out of memory, so it's impossible to guess whether that strategy would fix the problem. My guess is that it wouldn't, but that could be wrong too. If you posted some details and showed us the code which causes the memory overflow, that might elicit some suggestions.
 
ben oliver
Ranch Hand
Posts: 375
Let me rephrase it this way: if the entire list is too big, I don't want to create it, load it into memory, and process it all at once. Can I instead create a sublist and process it, then create the 2nd sublist and process it, then the 3rd, and so on? Does this help? I am not sure, because garbage collection is not guaranteed for each sublist, right?
 
Steve Luke
Bartender
Posts: 4179
The lists and the contents of the sublists will get garbage collected when they are no longer referenced, at the next GC cycle - and there will be at least one GC cycle before you get the memory error. Will this prevent the memory error? That depends on the processing. If the processing doesn't produce any lasting memory use, then probably. But if the processing either generates new objects that remain referenced, or passes the objects stored in the sublists to other locations where references to them are maintained - then no, your strategy would not work.
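For example, here is a contrived sketch (all names made up) of processing that defeats the scheme, because every element stays reachable from a long-lived collection even after the sublists are dropped:

import java.util.ArrayList;
import java.util.List;

// Sketch of the failure mode: the sublists themselves are dropped, but
// process() copies every element into a collection that lives forever,
// so the contents can never be collected.
public class RetainedReferences {

    static final List<int[]> survivors = new ArrayList<>(); // lives forever

    public static void main(String[] args) {
        for (int n = 0; n < 100; n++) {
            List<int[]> sublist = new ArrayList<>();
            for (int i = 0; i < 5_000; i++) sublist.add(new int[256]);
            process(sublist);
            sublist = null; // the List object itself can be collected...
        }
        // ...but its contents cannot: 'survivors' still references them all,
        // so the footprint ends up the same as one giant list.
    }

    static void process(List<int[]> sublist) {
        survivors.addAll(sublist); // the lasting references described above
    }
}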
 
Alex Hurtt
Ranch Hand
Posts: 98
I would try to take a multi-threaded approach to this problem if possible. Maybe you could make a thread pool of size 10 or so and hand off 100-element chunks of the list to worker threads to process. I just picked 10 as an arbitrary number... I have no idea what this list actually contains or what you are trying to do with the data. You would have to find the right balance of chunk size and number of threads to yield the best performance.
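Something along these lines, as a rough sketch (pool size, chunk size, and loadChunk are all placeholders to tune):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: a fixed pool of 10 workers, each handed a 100-element chunk.
public class PooledChunks {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int start = 0; start < 500_000; start += 100) {
            final List<Integer> chunk = loadChunk(start, 100); // hypothetical loader
            pool.submit(() -> process(chunk));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Stand-in for reading one slice from the real source.
    static List<Integer> loadChunk(int start, int size) {
        List<Integer> chunk = new ArrayList<>(size);
        for (int i = start; i < start + size; i++) chunk.add(i);
        return chunk;
    }

    static void process(List<Integer> chunk) { /* placeholder work */ }
}

(Note that as written the executor's internal queue would end up holding every chunk at once; a bounded queue or submitting in batches would be needed to actually cap memory.)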
 
Steve Luke
Bartender
Posts: 4179
I don't think the use of threads has any bearing on the memory footprint the original poster has described, so it is probably a little off track to suggest that as a solution. It may make the processing faster, but it may even make the memory footprint larger (since multiple sublists may be processed at once, instead of allowing the memory from one sublist to be cleared before the next one is accessed). But as has been said - without more knowledge there is not a whole lot we can suggest that isn't guesswork.
 
Ranch Hand
Posts: 59

Alex Hurtt wrote:I would try to take a multi-threaded approach to this problem if possible. Maybe you could make a thread pool of size 10 or so and hand off 100-element chunks of the list to worker threads to process. I just picked 10 as an arbitrary number... I have no idea what this list actually contains or what you are trying to do with the data. You would have to find the right balance of chunk size and number of threads to yield the best performance.

Just to add to what Alex said: if you have any I/O operation to read the huge data into memory, the I/O should be done in a single thread; having multiple threads doing I/O will be slow. However, you can use a few threads to process the data once it has been read into memory.
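For example, roughly (a sketch that assumes line-oriented file input; the file name and format are made up):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: one thread does all the reading; a few workers do the processing.
// The bounded queue keeps the reader from racing ahead of the workers.
public class SingleReaderWorkers {

    static final String POISON = "\u0000EOF"; // sentinel telling a worker to stop

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1_000);
        int workers = 4;
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    String line;
                    while (!(line = queue.take()).equals(POISON)) process(line);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The single I/O thread (here, the main thread) reads everything.
        try (BufferedReader in = new BufferedReader(new FileReader("huge-data.txt"))) {
            String line;
            while ((line = in.readLine()) != null) queue.put(line);
        }
        for (int i = 0; i < workers; i++) queue.put(POISON); // one per worker
        pool.shutdown();
    }

    static void process(String line) { /* placeholder work */ }
}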

 
Alex Hurtt
Ranch Hand
Posts: 98

Steve Luke wrote:I don't think the use of threads has any bearing on the memory footprint the original poster has described, so it is probably a little off track to suggest that as a solution. It may make the processing faster, but it may even make the memory footprint larger (since multiple sublists may be processed at once, instead of allowing the memory from one sublist to be cleared before the next one is accessed). But as has been said - without more knowledge there is not a whole lot we can suggest that isn't guesswork.

Well, it may or may not. But maybe what I'm envisioning isn't clear enough, so let me clarify... You have some huge master list of stuff from some source. My initial assumption is that this list is NOT initially in memory; it might be in a file or a database or who knows where. Or maybe it is in memory and that is not the problem, but the processing is somehow what takes up the additional memory. Let's say you have 10 worker threads processing 100-element 'chunks' of this master list at a time; you'd never be working with more than 1000 elements of the list at any given time. If my understanding of the poster's proposed solution is correct, he was proposing to divide the list size by some arbitrary number, say n, take the quotient of that division, and create that many n-sized lists. I'm not really sure I see what that buys either. Also, I'm not sure why you'd need to create a new 'sub' list reference after each iteration... can't you just reassign the existing reference to a new 'sub' list object?

Not that I disagree with what you've said here... it could be true. I think we just need to know more specifics to be able to help find the 'right' solution.
 
Steve Luke
Bartender
Posts: 4179
I agree, Alex... I didn't mean to look like I was criticizing. My only point was that there wasn't enough information about the scenario to make a real suggestion.

Alex Hurtt wrote:...and create that many n-sized lists. I'm not really sure I see what that buys either. Also, I'm not sure why you'd need to create a new 'sub' list reference after each iteration... can't you just reassign the existing reference to a new 'sub' list object?

True, you don't need to make more than one variable/reference. Reusing the same one is just as good, and will be easier to manage...
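In code, that reuse is just this (a trivial sketch; the tiny lists stand in for real chunks):

import java.util.Arrays;
import java.util.List;

// Sketch: one reusable variable; each reassignment drops the last reference
// to the previous chunk, making it eligible for collection.
public class ReusedReference {
    public static void main(String[] args) {
        List<Integer> chunk;
        for (int i = 0; i < 3; i++) {
            chunk = Arrays.asList(i, i + 1, i + 2); // previous chunk now unreachable
            System.out.println(chunk);
        }
    }
}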

 
Martin Vashko
Sheriff
Posts: 3837
Ben,

as others have pointed out, splitting the list into sublists should help, assuming that you do not load the whole list into memory and then try to split it into sublists. If you load, process, and dispose of one sublist at a time, it could work.

If you process the items in your list one by one, you might be able to successfully employ the producer-consumer pattern. It might be much cleaner and more straightforward than managing the sublists, and easily parallelizable too.
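A bare-bones sketch of that pattern (makeItem and handle stand in for your loading and processing code):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Producer-consumer sketch: items flow through a bounded queue, so at most
// about 100 of them are ever held in memory at once.
public class ProducerConsumer {

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 500_000; i++) queue.put(makeItem(i));
                queue.put("DONE"); // sentinel: no more items
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                String item;
                while (!(item = queue.take()).equals("DONE")) handle(item);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }

    static String makeItem(int i) { return "item-" + i; } // stand-in for loading

    static void handle(String item) { /* placeholder work */ }
}

Adding more consumer threads is then a matter of starting more of the same consumer, which is the parallelization mentioned above.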
 
Alex Hurtt
Ranch Hand
Posts: 98

Steve Luke wrote:I agree, Alex... I didn't mean to look like I was criticizing. My only point was that there wasn't enough information about the scenario to make a real suggestion.

Alex Hurtt wrote:...and create that many n-sized lists. I'm not really sure I see what that buys either. Also, I'm not sure why you'd need to create a new 'sub' list reference after each iteration... can't you just reassign the existing reference to a new 'sub' list object?

True, you don't need to make more than one variable/reference. Reusing the same one is just as good, and will be easier to manage...

I guess the fact that he was doing this was what led me to think it would be more efficient to use a fixed number of threads instead of creating n list objects for processing. I couldn't see why he was thinking of doing that unless he was thinking of processing them in parallel.
 
ben oliver
Ranch Hand
Posts: 375
I doubt that the sublist approach (without loading the entire big list into memory first) would really help. The JVM gives no guarantee that it will garbage collect a sublist even when it is no longer referenced, because you don't know when a garbage collection cycle occurs. So maybe, as the code proceeds through the other sublists, the earlier sublists still have not been collected by the time it reaches the final sublist.

Isn't this possible?
 
Martin Vashko
Sheriff
Posts: 3837

ben oliver wrote:I doubt that the sublist approach (without loading the entire big list into memory first) would really help. The JVM gives no guarantee that it will garbage collect a sublist even when it is no longer referenced, because you don't know when a garbage collection cycle occurs. So maybe, as the code proceeds through the other sublists, the earlier sublists still have not been collected by the time it reaches the final sublist.

Isn't this possible?

No, it isn't. (Assuming you do not erroneously keep references to sublists you want to have garbage collected.)

As has already been pointed out in this thread, garbage collection is guaranteed to happen when new memory cannot be allocated; an OutOfMemoryError occurs only when the allocation cannot be made even after garbage collection.

There are various garbage collection strategies, which may depend on many factors. This is why it cannot be predicted when GC occurs, and why you have no means to guarantee that, e.g., a certain part of code will not be interrupted by GC and therefore take longer than expected to execute. This is why the uncertainty of GC is stressed: you generally do not know when it will happen, and in particular you do not know for sure that it will happen after a System.gc() call.

Think about it: all Java programs allocate and deallocate objects all the time. If GC did not run when memory is tight, all but the simplest Java applications would mysteriously fail with OutOfMemoryError. You're not the only one who has ever allocated memory in Java. All you really need to take care of is not keeping references to the lists and other objects you want GC to reclaim.
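If you want to convince yourself, here is a quick sketch: the loop below allocates far more memory in total than any reasonable heap, yet it completes, because each array is unreachable by the time the next one is allocated:

// Sketch: ~100 GB allocated in total, but never more than one block is
// reachable at a time, so GC reclaims each block before the next allocation.
public class GcKeepsUp {
    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            byte[] block = new byte[1_000_000]; // ~1 MB, dropped next iteration
            block[0] = 1; // touch it so the allocation is not trivially dead
        }
        System.out.println("Finished without OutOfMemoryError.");
    }
}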
 