Reading from .doc or .docx file

Sawan Mishra
Ranch Hand

Joined: Oct 24, 2013
Posts: 47

I understand the above program, but my problem is that reading from an
MS Word file (.doc or .docx) and writing the result to the console gives
unexpected output.
How can I read from a .doc file and write its content to the console correctly?

Thanks in advance,
with regards
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14911

Microsoft Word .doc and .docx files are not simple text files that you can read this way with a FileReader.

You'll need a library that understands the specific MS Word file formats, such as Apache POI.

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42956
Those are structured file formats which contain much else besides the plain text. You need to use a library like Apache POI (which can extract the plain text, and also provides an API to get at the structured content).
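
To illustrate the "structured file format" point, here is a minimal sketch using only the JDK. It relies on the fact that a .docx file is a ZIP archive whose main text lives in an entry named word/document.xml, which is why a FileReader prints garbage. The class name DocxTextSketch and the helpers sampleDocx and extractText are made up for this illustration, and sampleDocx builds a tiny stand-in archive rather than a complete Word document. The sketch does not handle the older binary .doc format, and it ignores headers, footnotes and the like, so a library such as Apache POI remains the practical choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DocxTextSketch {
    // XML namespace used by WordprocessingML elements such as w:p and w:t.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: builds a tiny stand-in for a .docx, i.e. a ZIP
    // archive holding only word/document.xml. Real files contain more parts.
    static byte[] sampleDocx() {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>World</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the text of each paragraph (w:p) followed by a newline.
    static String extractText(byte[] docx) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.getName().equals("word/document.xml")) {
                    continue; // skip styles, relationships, images, ...
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations, since the input may be untrusted.
                factory.setFeature(
                        "http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = factory.newDocumentBuilder().parse(zip);
                StringBuilder text = new StringBuilder();
                NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    // The visible text of a paragraph sits in its w:t elements.
                    NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                    text.append('\n');
                }
                return text.toString();
            }
            throw new IllegalArgumentException("no word/document.xml entry found");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalArgumentException("could not parse document XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a path to a .docx file, or run without arguments for the sample.
        byte[] docx = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : sampleDocx();
        System.out.print(extractText(docx));
    }
}
```

Run without arguments, this prints the two sample paragraphs, "Hello" and "World", on separate lines. Given the path to a real .docx file, it prints that document's body text paragraph by paragraph.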
Paweł Baczyński

Joined: Apr 18, 2013
Posts: 1218

Don't read a .doc file as a regular text file!

Tony Docherty

Joined: Aug 07, 2007
Posts: 2703
You need to use a library that understands the format that .doc and .docx files are saved in. Fortunately, there are free libraries available, such as POI, which can be found at
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 43295
Too difficult for “beginning”: moving.