aspose file tools*
The moose likes Other Open Source Projects and the fly likes POI: OutOfMemoryError on reading big excel file Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Products » Other Open Source Projects
Bookmark "POI: OutOfMemoryError on reading big excel file" Watch "POI: OutOfMemoryError on reading big excel file" New topic
Author

POI: OutOfMemoryError on reading big excel file

Renjith Panikar
Greenhorn

Joined: Nov 06, 2012
Posts: 28

I am getting an OutOfMemoryError while trying to read a excel file(.xlsm) of size ~2Mb.
I am running this application on a websphere server. Is there any way i could read this file with out changing any server configurations?

Have tried to use this approach, but the red line gave me OutOfMemoryError.

OPCPackage pkg = OPCPackage.open(new File("sample.xlsm"));
XSSFWorkbook wb_template;
wb_template = new XSSFWorkbook(
pkg
);

SXSSFWorkbook wb = new SXSSFWorkbook(wb_template);
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9427
    
    2

Is increasing server heap space an option? If yes, try it!


SCJP 1.4, SCWCD 1.4 - Hints for you, Certified Scrum Master
Did a rm -R / to find out that I lost my entire Linux installation!
Renjith Panikar
Greenhorn

Joined: Nov 06, 2012
Posts: 28

I can do it for local deployment(in my machine), but dont have control over the server to which it will be deployed after development.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42289
    
  64
From the documentation I gather that SXSSF is meant more for creating documents, not reading them: http://poi.apache.org/spreadsheet/how-to.html#sxssf

A low-memory approach to reading would be the XSSF event API: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api

2 MB doesn't sound all that much, though. What are the JVM memory settings of the server?


Ping & DNS - my free Android networking tools app
Renjith Panikar
Greenhorn

Joined: Nov 06, 2012
Posts: 28

Not sure about the current heap size. It will be the default of websphere 7.
Even increasing the heap size as given below dint helped me.
Initial heap size = 256 MB
Maximum heap size = 2048 MB


Planning to go with http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api
Renjith Panikar
Greenhorn

Joined: Nov 06, 2012
Posts: 28

What is relId of a sheet?
I couldn't understand from apache doc.
I tried with both sheetName & Sheet number. Both dint worked out.

--apache doc
public java.io.InputStream getSheet(java.lang.String relId)
throws java.io.IOException,
InvalidFormatException
Returns an InputStream to read the contents of the specified Sheet.
Parameters:
relId - The relationId of the sheet, from a r:id on the workbook
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42289
    
  64
Not sure. Maybe open the file manually (xlsx files are just zip files containing various XML documents) and see if somewhere in there you find a sheet attribute that looks like "r:id". (And, of course, see what that value is in various spreadsheets so as to figure out the logic behind it.)
Renjith Panikar
Greenhorn

Joined: Nov 06, 2012
Posts: 28

Sorry for the very very late update on this thread, but better late than never.
I have resolved this issue. Hope if i share it, it will be helpful for some one else.

Thanks Ulf Dittmer, followign reply helped me to resolve it
Maybe open the file manually (xlsx files are just zip files containing various XML documents) and see if somewhere in there you find a sheet attribute that looks like "r:id"


In detail,

Rename the .xlsx file as .zip. It will give you a zip file. Open it and you will see a folder called "xl". Inside "xl" folder concentrate on the following things.

worksheets folder : contains the xml representation of each sheet.
sharedStrings.xml : Common place for storing all the string values across sheets.
styles.xml : common place for holding the styles(color, formula...) applied to cells.
workbook.xml : Contains the information about all sheets in the .xlsx file


Now start the journey from workbook.xm. Inside this file all sheets are covered in <sheets> tag.
sheets are represented as <sheet> Eg: <sheet name="General" sheetId="15" r:id="rId1"/>
Take r:id, that will help you to access the General.xml which represents the General sheet of .xlsx file.

Once you have r:id, http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api will help you to read the XML file and get its contents.

SharedStringsTable sharedStringsTable = reader.getSharedStringsTable(); // using this we can access sharedStrings.xml
StylesTable stylesTable = reader.getStylesTable(); // using this we can access styles.xml

Since we are streaming the XML, we dont have to worry about memory consumption.

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42289
    
  64
Thanks for reporting back how the issue was solved; that should help others.
 
It is sorta covered in the JavaRanch Style Guide.
 
subject: POI: OutOfMemoryError on reading big excel file