
Reading from .doc or .docx file

Sawan Mishra
Ranch Hand

Joined: Oct 24, 2013
Posts: 47

I understand the above program, but my problem is that reading from an
MS Word file (.doc or .docx) and writing the result to the console gives
unexpected output.
How can I read from a .doc file and write its content to the console correctly?

thanks in advance
with regards
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 15081

Microsoft Word .doc and .docx files are not simple text files that you can read this way with a FileReader.

You'll need a library that understands the specific MS Word file formats, such as Apache POI.
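
As a rough illustration of that suggestion, here is a minimal sketch of text extraction with Apache POI. It assumes the poi, poi-ooxml and poi-scratchpad jars are on the classpath. The class name WordTextDump and the idea of passing the file path as the first command-line argument are made up for this example. Choosing the reader by file extension is a simplification, not something POI requires.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical example class; requires Apache POI on the classpath.
public class WordTextDump {

    // Extracts the plain text of a binary .doc file (POI's HWPF component).
    static String readDoc(InputStream in) throws IOException {
        try (WordExtractor extractor = new WordExtractor(in)) {
            return extractor.getText();
        }
    }

    // Extracts the plain text of a .docx file (POI's XWPF component).
    static String readDocx(InputStream in) throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in))) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordTextDump <file.doc|file.docx>");
            return;
        }
        String path = args[0];
        try (InputStream in = new FileInputStream(path)) {
            // Assumption for this sketch: the extension matches the real format.
            boolean isDocx = path.toLowerCase(Locale.ROOT).endsWith(".docx");
            System.out.println(isDocx ? readDocx(in) : readDoc(in));
        }
    }
}
```

Both methods return the document text as a single String, so it can be printed to the console like any other text.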

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
Those are structured file formats which contain much else besides the plain text. You need to use a library like Apache POI (which can extract the plain text, and also provides an API to get at the structured content).
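
To make the "structured format" point concrete: a .docx file is a ZIP archive of XML parts, with the main text in word/document.xml, and a .doc file is a binary container. The sketch below uses only the JDK. It builds a tiny stand-in archive in memory (not a complete document that Word would open) and contrasts decoding the raw bytes as characters with opening the archive. The class and method names are invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; JDK only, no Word library involved.
public class DocxIsZipDemo {

    // Builds a stand-in for a .docx: a ZIP archive holding only word/document.xml.
    static byte[] buildFakeDocx(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            String xml = "<w:document><w:body><w:p><w:r><w:t>" + text
                    + "</w:t></w:r></w:p></w:body></w:document>";
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Roughly what a FileReader does: decode the raw file bytes as characters.
    static String readAsPlainText(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.ISO_8859_1);
    }

    // Opens the archive and returns the XML in word/document.xml, or null if absent.
    static String readDocumentXml(byte[] fileBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(fileBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    ByteArrayOutputStream xml = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        xml.write(buffer, 0, n);
                    }
                    return new String(xml.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = buildFakeDocx("Hello from Word");
        // The raw bytes begin with the ZIP signature, not with the document text.
        System.out.println(readAsPlainText(docx).startsWith("PK")); // prints true
        // Even after unzipping, the text is wrapped in XML markup.
        System.out.println(readDocumentXml(docx));
    }
}
```

Even the unzipped XML still needs parsing to get at the text, which is the work a library like POI does for you.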
Paweł Baczyński

Joined: Apr 18, 2013
Posts: 1583

Don't read a .doc file as a regular text file!

OCPJP 6, 7, 8, OCMJD 6
Tony Docherty

Joined: Aug 07, 2007
Posts: 2836
You need to use a library that understands the format that .doc and .docx files are saved in. Fortunately, there are free libraries available, such as POI, which can be found at
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46340
Too difficult for “beginning”: moving.
I agree. Here's the link: