OutOfMemory issue while creating XSSFWorkbook instance.

 
Shankara Sharma
Ranch Hand
Posts: 48
Hi All,

As per our business requirements, we need to read multiple Excel files (in both .xls and .xlsx formats) from different locations in a multithreaded environment. Each thread is responsible for reading one file. To test performance, we created two file sets in both .xls and .xlsx formats: one set has just 20 rows of data, while the other contains 300,000 rows. We are able to successfully read both files in .xls format and load the data into the table. Even for the 20-row .xlsx file, our source code works fine.

But when the execution flow starts reading the large .xlsx file, the application server terminates abruptly. When I started tracing down the issue, I found that it occurs while creating the XSSFWorkbook instance. Refer to the code snippet below:
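A minimal sketch of the kind of code involved, inferred from the output below (the class name, method shape and file handling are placeholders, not the exact production code):

import java.io.File;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XlsxImportJob {
    // One call per thread; each thread is responsible for one file.
    static void process(File file) throws Exception {
        System.out.println("Process XLSX file " + file.getName() + " start.");
        OPCPackage pkg = OPCPackage.open(file);
        System.out.println("Created OPCPackage instance.");
        // For the 300,000-row file execution never gets past this line:
        // the XSSF user model parses the entire sheet XML into memory here.
        XSSFWorkbook xssfWorkbook = new XSSFWorkbook(pkg);
        System.out.println("Created XSSFWorkbook instance.");
        SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(xssfWorkbook);
        System.out.println("Created SXSSFWorkbook instance.");
        // ... read rows and load the data into the table ...
        System.out.println("Process XLSX file " + file.getName() + " end.");
    }
}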


Output:
Process XLSX file EXCEL_300K.xlsx start.
Process XLSX file EXCEL.xlsx start.
Created OPCPackage instance.
Created OPCPackage instance.
Created XSSFWorkbook instance.
Created SXSSFWorkbook instance.
Process XLSX file EXCEL.xlsx end.

For the larger file set, the execution hangs at the XSSFWorkbook creation, causing a heap space issue. Please help me fix this issue.

Thanks in advance.

Thanks,

Sankar.

 
Tony Docherty
Bartender
Posts: 3323
Have you tried increasing the available heap space by starting Java with the -Xmx switch?

For example, to set a maximum heap of 2 GB you would use -Xmx2g; for 500 MB you would use -Xmx500m.
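To confirm what ceiling is actually in effect, you can print it from inside the JVM (a quick sketch; the class name is just illustrative):

public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting (approximately).
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}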
 
Shankara Sharma
Ranch Hand
Posts: 48
Currently I have allocated a maximum limit of 1.5 GB that the JVM can use. I guess memory is not an issue because I am able to read 300,000 rows in .xls format.
 
Tony Docherty
Bartender
Posts: 3323

Shankara Sharma wrote: I guess memory is not an issue because I am able to read 300,000 rows in .xls format.


In your first post you said you "had a heap space issue", so my initial guess was that it was a memory issue. BTW, just because you can read 300,000 rows doesn't necessarily mean you can create a new workbook: it may be that reading the 300,000 rows has used all available memory, or maybe you have a memory leak so that after reading so many workbooks it falls over, etc.

Please can you post the exact error and stack trace so we can see what is actually happening.
 
Shankara Sharma
Ranch Hand
Posts: 48
Actually, our application has a mechanism to track the memory used when we run a job. When I read 300,000 rows from the .xls file, I observed that after some time the memory usage stabilized. That's not the case while reading the .xlsx file: the memory used by the job increased over time and at some point crossed the maximum memory limit set for the JVM, leading to a server crash and an OutOfMemory error.

FYI, I was reading one large-volume .xlsx file per job run.
 
Tony Docherty
Bartender
Posts: 3323
OK, so that suggests some form of resource leak. Make sure you are properly releasing the worksheets/rows/cells you have read in and no longer need.

This also points to an issue related to the library you are using to read the Excel sheet (from the name XSSFWorkbook I'm guessing Apache POI) rather than a Java issue, so I'll copy this thread to the Open Source Projects forum where you may get some additional help.
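A sketch of the kind of cleanup meant here, assuming the POI user model is being used (the file handling is illustrative): copy the values you need out of the workbook, then close the underlying package and drop every reference to POI objects.

import java.io.File;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class WorkbookCleanup {
    static void importFile(File file) throws Exception {
        OPCPackage pkg = OPCPackage.open(file);
        try {
            XSSFWorkbook workbook = new XSSFWorkbook(pkg);
            XSSFSheet sheet = workbook.getSheetAt(0);
            // Copy cell values into your own data structures here rather
            // than keeping Row/Cell objects alive after the job finishes.
        } finally {
            // Release the package so its buffers can be reclaimed; after
            // this point, hold no references to any POI objects.
            pkg.close();
        }
    }
}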
 
Shankara Sharma
Ranch Hand
Posts: 48
The problem is that, even before getting any sheet/row/cell information, the program hangs at the point shown in my first post, while creating the XSSFWorkbook instance.

I agree with you, Tony. Maybe the issue is with the library itself. I am using the Apache POI 3.10 library. I have started a new thread in another Java forum: OutOfMemory.
I appreciate your efforts, Tony, in helping me resolve the issue.
 
Tony Docherty
Bartender
Posts: 3323
Actually, I was suggesting the resource leak was before you try to open this workbook, i.e. in previous workbooks you have opened. BTW, what size is the workbook it fails on, and have you tried opening an empty workbook at this point?
 
Shankara Sharma
Ranch Hand
Posts: 48
I have created 6 different workbooks of differing sizes (50K, 100K, 150K, 200K, 250K and 300K rows). I was importing one workbook per job, and after the job run we delete the file. I am pretty sure that there are no resource leaks (the file input stream is handled in a finally block). The import mechanism we used worked for 50K to 200K records; beyond that I am facing the OOM issue. Moreover, the size of the file with 300K records is around 11 MB.
 
Tony Docherty
Bartender
Posts: 3323
It's not just the input stream you have to worry about (and not releasing a few streams is unlikely to give an OOME unless they have massive buffers); it's all the objects that hold the data that has been read in. It's a long time since I used POI, so I can't remember whether you have to explicitly close it down after you've finished using it, but at the very least your code should release all references to POI objects so they can be garbage collected.
I suggest you run a memory profiler (or frequently print out the values of the current memory state) to see how your code uses and releases memory as it is running - see java.lang.Runtime, in particular the freeMemory(), maxMemory() and totalMemory() methods.
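For the print-it-out option, something along these lines works (the class and label names are just illustrative):

public final class MemLog {
    private MemLog() { }

    // Call at interesting points, e.g. MemLog.log("before XSSFWorkbook").
    public static void log(String label) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        long used = (rt.totalMemory() - rt.freeMemory()) / mb;
        System.out.println(label + ": used=" + used + "MB, total="
                + rt.totalMemory() / mb + "MB, max=" + rt.maxMemory() / mb + "MB");
    }
}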
 
Marshal
Posts: 28193
An XSSF document requires a lot of memory, so if you try to load a large document you can run out of memory.

I seem to remember that there's a streaming API for Apache POI's Excel support, but that's all I remember about it. I'd suggest you look into that.
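For reading, the streaming route in POI is the event API: XSSFReader hands you each sheet's XML as a stream, which you parse with SAX instead of building the whole object model in memory. A rough sketch (the file name and handler logic are placeholders; real code would also resolve shared-string cell values via reader.getSharedStringsTable()):

import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class StreamingXlsxRead {
    public static void main(String[] args) throws Exception {
        OPCPackage pkg = OPCPackage.open("EXCEL_300K.xlsx", PackageAccess.READ);
        try {
            XSSFReader reader = new XSSFReader(pkg);
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if ("row".equals(localName)) {
                        // Handle one row at a time; nothing accumulates on the heap.
                    }
                }
            });
            Iterator<InputStream> sheets = reader.getSheetsData();
            while (sheets.hasNext()) {
                InputStream sheet = sheets.next();
                try {
                    parser.parse(new InputSource(sheet));
                } finally {
                    sheet.close();
                }
            }
        } finally {
            pkg.close();
        }
    }
}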
 