Reading contents from microsoft word document

Amirtharaj Chinnaraj
Ranch Hand

Joined: Sep 28, 2006
Posts: 241
Hi guys,

I need to read a Microsoft Word document and print its contents to the console. While doing that I ran into a problem: I am getting some characters that are not present in the document. When I do the same thing with a text (*.txt) file, things are fine.
jeroen dijkmeijer
Ranch Hand

Joined: Sep 26, 2003
Posts: 131
I think you should have a look at the Apache POI framework.
regards,
Jeroen.
Ulf Dittmer
Rancher

Joined: Mar 22, 2005
Posts: 42958
.doc files contain many bytes that are not part of the actual text (e.g., layout and formatting information), which is why reading one like a plain text file prints stray characters. If you just want the text, use POI as suggested. This page explains how it can be used for text extraction.
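
To see why reading a .doc like a text file produces stray characters, here is a minimal sketch using only the JDK. It checks whether a file starts with the 8-byte signature of the OLE2 compound file container, which is the binary format that classic .doc files are stored in. The class name `DocSniffer` and the method `looksLikeOle2` are made up for this illustration and are not part of POI or any other library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Apache POI.
class DocSniffer {

    // Signature found at the start of OLE2 compound files,
    // the binary container used by classic .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the data begins with the OLE2 signature.
    static boolean looksLikeOle2(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (data[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocSniffer <file>");
            return;
        }
        // Reading the whole file keeps the sketch short; only the first 8 bytes matter.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        if (looksLikeOle2(data)) {
            System.out.println("Binary OLE2 container: use a parser such as POI, not a Reader.");
        } else {
            System.out.println("No OLE2 signature: probably not a classic .doc file.");
        }
    }
}
```

If the check succeeds, the file is a binary container and printing it character by character will show the container's structure along with the text, which matches what the original poster saw.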
 
 