Reading word Documents Using JAVA

sushil grover
Greenhorn

Joined: Nov 28, 2008
Posts: 12
Hi All,

I have to read Word documents (97-2003 format) using Java. The data is in tabular form, as name-value pairs. I have already implemented this using
Apache POI, but I am facing an issue: when I edit a Word 2003 document in Office 2007, the POI API throws a NullPointerException related to
LittleEndian. According to the Apache site, it is a bug that will be resolved in a coming release.

http://apache-poi.1045710.n5.nabble.com/Unable-to-get-paragraphs-from-test-doc-td2314726.html

My question: has anybody used OpenOffice for reading Word documents? If so, please share some sample code.

Friends, please share your opinions.

Thanks,
sushil
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41631
    
The linked discussion is from 2005; whatever it describes as being in the future has most likely happened by now. Why do you think your problem is related to that, and why do you think endianness has anything to do with it? Can you post an SSCCE?


sushil grover
Greenhorn

Joined: Nov 28, 2008
Posts: 12
Hi Ulf,

Thanks for your reply. I have tested it with the latest POI release, 3.7, as well as the 3.8 beta.

Below is the code snippet:

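import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableCell;
import org.apache.poi.hwpf.usermodel.TableRow;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Open the binary Word file through POI's OLE2 file system layer
InputStream fis = new FileInputStream(fileName);
POIFSFileSystem fs = new POIFSFileSystem(fis);
HWPFDocument doc = new HWPFDocument(fs);

Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++) {
    Paragraph tablePar = range.getParagraph(i); // Here I am getting the exception
    if (tablePar.isInTable()) {
        Table table;
        try {
            table = range.getTable(tablePar);
        } catch (Exception e) {
            // Skip paragraphs for which no table can be obtained
            continue;
        }
        // Walk every cell of the table, row by row
        for (int rowIdx = 0; rowIdx < table.numRows(); rowIdx++) {
            TableRow row = table.getRow(rowIdx);
            for (int colIdx = 0; colIdx < row.numCells(); colIdx++) {
                TableCell cell = row.getCell(colIdx);
            }
        }
    }
}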

I am getting the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 16
at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:46)
at org.apache.poi.hwpf.sprm.SprmOperation.getOperand(SprmOperation.java:98)
at org.apache.poi.hwpf.sprm.ParagraphSprmUncompressor.unCompressPAPOperation(ParagraphSprmUncompressor.java:87)
at org.apache.poi.hwpf.sprm.ParagraphSprmUncompressor.uncompressPAP(ParagraphSprmUncompressor.java:63)
at org.apache.poi.hwpf.model.PAPX.getParagraphProperties(PAPX.java:136)
at org.apache.poi.hwpf.usermodel.Range.getParagraph(Range.java:828)
at com.jp.processor.Docfile_Reading.main(Docfile_Reading.java:62)

Sometimes it is a NullPointerException while uncompressing instead.
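
One thing that may be worth ruling out first: a document re-saved from Office 2007 can end up as OOXML (.docx, a ZIP container) rather than the binary 97-2003 format that HWPF reads. Below is a minimal sketch, using only the JDK, that looks at the first bytes of the file. The class name DocFormatCheck and its methods are made up for illustration and are not part of POI. It only inspects the file signature and says nothing about whether the content inside is well-formed.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the container format
// of a file from its leading signature bytes.
class DocFormatCheck {
    // Signature of an OLE2 compound document (binary .doc, .xls, .ppt)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Signature of a ZIP local file header (used by .docx and other OOXML files)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Returns "OLE2", "ZIP", or "UNKNOWN" based on the leading bytes.
    static String detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return "OLE2";
        if (startsWith(head, ZIP_MAGIC)) return "ZIP";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Reads up to 8 bytes from the stream; returns fewer if the stream is shorter.
    static byte[] readHeader(InputStream in) throws IOException {
        byte[] buf = new byte[8];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) break;
            n += r;
        }
        return Arrays.copyOf(buf, n);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocFormatCheck <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(detect(readHeader(in)));
        }
    }
}
```

Run it as java DocFormatCheck followed by the path of the document. If it reports OLE2, the container is still the binary format, so the problem would lie further inside the file, which would be consistent with a stack trace that fails in getParagraph rather than when the file is opened.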

