Extracting Text from Word Doc

 
Mike London
Bartender
Hello,

I downloaded the latest POI from Apache (poi_3.15), but getting even the most basic Word code working is not straightforward.

Here are two examples I tried after lots of web searches:



Generates this error:



Then, looking around, I see that the netbeans/XMLException is deprecated and no longer in use, and I can't find any links on how to refactor existing code.

----

Trying another example...


Gives this error stack:




XSSF seems to deal with Excel, but I couldn't find any examples that worked using XSSF.

--

So, how do I just read a simple Word (XML) 2011 document in Java if the Apache stuff doesn't handle it?

I'm sure it's simple, but, so far, I can't find a single example that works.

Thanks in advance.

- mike

 
Rob Spoor
Sheriff
You should start by using XWPFDocument and XWPFWordExtractor instead of HWPFDocument and WordExtractor.
 
Mike London
Bartender

Rob Spoor wrote: You should start by using XWPFDocument and XWPFWordExtractor instead of HWPFDocument and WordExtractor.



It looks like you missed the first part of my posting above where I did just what you suggested. Please note that particular error stack.

No simple working example that I can find.

Thanks in advance.

- mike
 
Knute Snortum
Sheriff
Hmm... this worked fine for me:

The only thing I did differently was create an XWPFWordExtractor object and then close it when I was done.  I also used POI v3.13.
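
For anyone following along, here is a minimal sketch of the approach described in this post: open the file as an XWPFDocument, wrap it in an XWPFWordExtractor, and close the extractor when done. It is not the poster's original code. The class name and the default file name sample.docx are placeholders, and it assumes the POI OOXML jars plus XMLBeans are on the classpath and that the POI version in use lets the extractor be closed via try-with-resources.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocxTextWithPoi {          // hypothetical class name
    public static void main(String[] args) throws IOException {
        // "sample.docx" is a placeholder; pass a real path as the first argument.
        String path = args.length > 0 ? args[0] : "sample.docx";

        try (InputStream in = new FileInputStream(path);
             // The extractor wraps the document; closing the extractor
             // releases the underlying package as well.
             XWPFWordExtractor extractor =
                     new XWPFWordExtractor(new XWPFDocument(in))) {
            System.out.println(extractor.getText());
        }
    }
}
```

If this fails with a NoClassDefFoundError for org/apache/xmlbeans/XmlException, the XMLBeans jar is most likely missing from the classpath.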
 
Tony Docherty
Bartender

Then, looking around, I see that the netbeans/XMLException is deprecated


It's not the netbeans.XMLException class you appear to need; it's the XMLBeans class org.apache.xmlbeans.XmlException, which is in the Apache XMLBeans bundle and can be downloaded from https://xmlbeans.apache.org.

I'm not sure why POI needs this without including it or referencing it in the download notes, but the stack trace you have shown suggests that it does.
 
Mike London
Bartender

Tony Docherty wrote:

Then, looking around, I see that the netbeans/XMLException is deprecated


It's not the netbeans.XMLException class you appear to need; it's the XMLBeans class org.apache.xmlbeans.XmlException, which is in the Apache XMLBeans bundle and can be downloaded from https://xmlbeans.apache.org.

I'm not sure why POI needs this without including it or referencing it in the download notes, but the stack trace you have shown suggests that it does.



Yep, as I also noted in my original posting, this download is now extinct with no clear replacement.

If you visit this site: http://attic.apache.org/projects/xmlbeans.html

You'll see what I mean.

Hence, my posting here.

Still baffled.

Thanks for your reply.

- mike
 
Tony Docherty
Bartender
That page gives a link to an archived version of XmlBeans (http://archive.apache.org/dist/xml/xmlbeans/) which can be downloaded.
 
Mike London
Bartender

Tony Docherty wrote: That page gives a link to an archived version of XmlBeans (http://archive.apache.org/dist/xml/xmlbeans/) which can be downloaded.



Yes, I understand how to download this code, but I was hoping to find working code that doesn't require extinct and mothballed projects.

There doesn't seem to be a single current example of how to read a Word document in Java, like the one I posted about (that is, using currently supported APIs).

Strange.

OK, I guess that's my answer.  Good to know.

Thanks!

- mike
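
If avoiding the extra library is the goal, one alternative is worth sketching. It is not a POI approach and nobody in the thread suggested it. A .docx file is a ZIP archive whose main body is stored as XML in the entry word/document.xml, so the JDK's own zip and StAX classes can pull out the text runs. The class name DocxPlainText and the default file name are placeholders. The sketch only reads the main body (no headers, footers, footnotes or comments), ignores all formatting, and emits one line per paragraph.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DocxPlainText {            // hypothetical class name

    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    public static String extractText(InputStream docx)
            throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return readBody(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static String readBody(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Document content is untrusted input: turn off DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        StringBuilder text = new StringBuilder();
        boolean inText = false;          // true while inside a w:t element
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                String name = reader.getLocalName();
                if (name.equals("t")) {
                    inText = true;
                } else if (name.equals("tab")) {
                    text.append('\t');
                } else if (name.equals("br")) {
                    text.append('\n');
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                String name = reader.getLocalName();
                if (name.equals("t")) {
                    inText = false;
                } else if (name.equals("p")) {
                    text.append('\n');   // end of paragraph
                }
            } else if (inText && (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA)) {
                text.append(reader.getText());
            }
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "sample.docx" is a placeholder; pass a real path as the first argument.
        String path = args.length > 0 ? args[0] : "sample.docx";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.print(extractText(in));
        }
    }
}
```

This only works for the XML-based .docx format. The older binary .doc format is not a ZIP archive and still needs a library such as POI's HWPF classes.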
 
Mike London
Bartender

Knute Snortum wrote: Hmm... this worked fine for me:

The only thing I did differently was create an XWPFWordExtractor object and then close it when I was done.  I also used POI v3.13.



Thanks.

Yeah, I got this to work, finally, also. In my case, I used:

1. org.apache.xmlbeans:xmlbeans:2.6.0 and
2. poi-3 (latest version).

It seems odd that there isn't a currently supported API to read Word docs, or at least that's my interpretation of this mini project.

Appreciate all the help.

- mike
 