Reading contents from microsoft word document

Amirtharaj Chinnaraj
Ranch Hand

Joined: Sep 28, 2006
Posts: 230
Hi guys,

I need to read a Microsoft Word document and print its contents to the console. When I do that, I get some ASCII characters that are not present in the document. When I do the same thing with a text (*.txt) file, everything works fine.
jeroen dijkmeijer
Ranch Hand

Joined: Sep 26, 2003
Posts: 131
I think you should have a look at the Apache POI framework.
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 39548
.doc files contain many characters that are not part of the actual text (e.g., layout information and such). If you just want the text, use POI as suggested. This page explains how it can be used for text extraction.
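
To see why the raw read prints junk, here is a minimal sketch that checks whether a file starts with the 8-byte signature of an OLE2 compound file, the binary container that .doc files use. It relies only on the standard library. The class and method names (DocSniffer, looksLikeOle2, isOle2Container) are made up for illustration and are not part of POI. Other Office formats such as .xls and .ppt share the same container, so a match only tells you the file is binary, not that it is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of POI: detects the OLE2 container header.
final class DocSniffer {
    // Signature that opens every OLE2 compound file (.doc, .xls, .ppt, ...).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given bytes begin with the OLE2 signature.
    static boolean looksLikeOle2(byte[] head) {
        if (head.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (head[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first 8 bytes of the file and checks them against the signature.
    static boolean isOle2Container(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[OLE2_MAGIC.length];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
            return read == head.length && looksLikeOle2(head);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocSniffer <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        if (isOle2Container(file)) {
            System.out.println("Binary OLE2 file: do not print it as text, use a parser such as POI.");
        } else {
            System.out.println("No OLE2 header found: the file may be plain text.");
        }
    }
}
```

A .txt file fails this check, which is why printing it byte by byte works, while a .doc file passes it and needs a real parser to get at the text.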