Performance of POI-HSSF

Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 9962
    
163

We just started using POI-HSSF, the Java API for accessing Microsoft Excel format files. We have xls files which contain huge amounts of data. While testing the APIs provided by POI-HSSF, we observed that creating an HSSFWorkbook from a stream takes around 6 seconds (the workbook had 1 sheet and 20000 records). Here's the code that we used to load the xls:



Are there any performance metrics available for POI-HSSF? Is there any way this performance can be improved? Our xls files start at 20000 records and might even run to 200000 records.

P.S.: If this topic is more appropriate in the Other Open Source Projects forum, please move it there. Since this is a performance issue, I thought I had better ask it here.


Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    
  34

I doubt there's a "silver bullet" that will magically make it faster. You might try wrapping a BufferedInputStream around the FileInputStream before passing it to the POIFSFileSystem constructor (by the way, are you making sure your file gets closed somehow?) but I'd be surprised if that helped dramatically.

The only other choice would be to dig into the HSSF source code and see if you can find optimization opportunities. The first step would be to use a profiling tool to find out where the time is actually going.
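
For what it's worth, here is a minimal sketch of the BufferedInputStream wrapping suggested above. It is not the original poster's code. The class name `BufferedLoadSketch`, the `StreamLoader` callback, and the byte-counting stand-in are all made up for illustration, so that the snippet runs without POI on the classpath. With POI available, the loader would presumably be `in -> new HSSFWorkbook(new POIFSFileSystem(in))`. The try-with-resources block also takes care of closing the file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BufferedLoadSketch {

    /** Hypothetical callback standing in for whatever parses the stream. */
    @FunctionalInterface
    public interface StreamLoader<T> {
        T load(InputStream in) throws IOException;
    }

    /**
     * Opens the file, wraps it in a BufferedInputStream, hands it to the
     * loader, and closes the stream even if the loader throws.
     */
    public static <T> T loadBuffered(Path file, StreamLoader<T> loader) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            return loader.load(in);
        }
    }

    /** Stand-in loader that only counts bytes; it exercises the plumbing without POI. */
    public static long countBytes(InputStream in) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java BufferedLoadSketch <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long start = System.nanoTime();
        // Assumption: with POI on the classpath the loader would be
        //   in -> new HSSFWorkbook(new POIFSFileSystem(in))
        Long bytes = loadBuffered(file, BufferedLoadSketch::countBytes);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(bytes + " bytes read in " + millis + " ms");
    }
}
```

Timing the same file with and without the BufferedInputStream wrapper would show whether buffering matters here. A profiler is still the better tool for finding where the time actually goes.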


Jaikiran Pai

"by the way, are you making sure your file gets closed somehow?"

Yes, once the xls is loaded into the HSSFWorkbook, the file is closed.

"You might try wrapping a BufferedInputStream around the FileInputStream before passing it to the POIFSFileSystem constructor"

Will certainly give this a try.

"The only other choice would be to dig into the HSSF source code and see if you can find optimization opportunities. The first step would be to use a profiling tool to find out where the time is actually going."

Yes, that's the last option we will be left with.

Thanks, Ernest Friedman-Hill, for the input. Will give the BufferedInputStream approach a try.
Vlado Zajac
Ranch Hand

Joined: Aug 03, 2004
Posts: 245
The standard HSSFWorkbook API may need a large amount of memory.

There is another API in POI that allows reading (not writing) xls files without storing the whole file (or some representation of it) in memory, which reduces memory usage.

It is described here (Event API):
http://jakarta.apache.org/poi/hssf/how-to.html
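
To make the idea concrete, here is a rough sketch of the event-style pattern in plain Java. It does not use POI's classes: `EventStyleSketch`, `RecordListener`, and `SumListener` are hypothetical names, and the text lines stand in for spreadsheet records. In POI itself the corresponding roles are played by `HSSFListener.processRecord` and `HSSFEventFactory.processEvents`, as described on the how-to page. The point is only that each record is handed to a callback and then dropped, so memory use does not grow with the number of rows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class EventStyleSketch {

    /** Hypothetical callback; plays the role that HSSFListener.processRecord plays in POI. */
    public interface RecordListener {
        void onRecord(String record);
    }

    /** Pushes one record at a time to the listener; nothing is retained after the callback returns. */
    public static void processEvents(Reader source, RecordListener listener) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            listener.onRecord(line);
        }
    }

    /** Example listener that keeps only a running count and total, not the records themselves. */
    public static class SumListener implements RecordListener {
        private long count;
        private double sum;

        @Override
        public void onRecord(String record) {
            count++;
            sum += Double.parseDouble(record.trim());
        }

        public long getCount() { return count; }
        public double getSum() { return sum; }
    }

    public static void main(String[] args) throws IOException {
        SumListener listener = new SumListener();
        // Three in-memory "records" stand in for rows read from a file.
        processEvents(new StringReader("1.5\n2.5\n4.0\n"), listener);
        System.out.println(listener.getCount() + " records, sum = " + listener.getSum());
    }
}
```

The trade-off is the same as with the real Event API: the listener sees each record once and in order, so anything it needs later (totals, lookups) has to be kept explicitly.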

[ August 11, 2006: Message edited by: Vlado Zajac ]
Jaikiran Pai

Thanks, Vlado Zajac, for pointing out the event model documentation. Will give it a try.
 