
Reading from .doc or .docx file

Sawan Mishra
Ranch Hand

Joined: Oct 24, 2013
Posts: 44

I understand the above program, but my problem is that reading from an
MS Word file (.doc or .docx) and writing the result to the console gives
unexpected output.
How can I read from a .doc file and write its content to the console correctly?

Thanks in advance,
with regards
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14193
    
  20

Microsoft Word .doc and .docx files are not simple text files that you can read this way with a FileReader.

You'll need a library that understands the specific MS Word file formats, such as Apache POI.
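
One way to see why a FileReader produces unexpected output is to look at the first few bytes of the file. The sketch below uses only the standard library; the class name WordFormatSniffer and its sniff method are made up for illustration. It assumes that legacy .doc files start with the OLE2 compound-file signature and that .docx files start with a ZIP local-file header. Other formats share those containers (.xls for OLE2, .jar for ZIP), so it identifies the container only and does not prove the file is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: reports which binary container a file starts with.
final class WordFormatSniffer {

    // OLE2 compound-file signature, the container used by legacy .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local-file header ("PK" 03 04), the container used by .docx files.
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    // Returns "OLE2", "ZIP", or "UNKNOWN" for the given leading bytes.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) {
            return "OLE2";
        }
        if (startsWith(head, ZIP)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordFormatSniffer <file>");
            return;
        }
        byte[] head = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Keep reading until 8 bytes are collected or the file ends.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        System.out.println(sniff(Arrays.copyOf(head, filled)));
    }
}
```

Either result means the file is a binary container rather than text, which is why decoding it character by character prints garbage.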


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42035
    
  64
Those are structured file formats which contain much else besides the plain text. You need to use a library like Apache POI (which can extract the plain text, and also provides an API to get at the structured content).
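
To illustrate the "structured" part: a .docx file is a ZIP archive, and the main body text sits in an XML entry named word/document.xml. The following is a rough, standard-library-only sketch of pulling the paragraph text out of that entry. It is not a replacement for Apache POI. DocxPeek, extractText and sampleDocx are hypothetical names, and the code assumes the usual (transitional) WordprocessingML namespace. It ignores headers, footnotes and formatting, may repeat text from nested paragraphs such as text boxes, and does not handle the binary .doc format at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads paragraph text straight out of a .docx archive.
final class DocxPeek {

    // Namespace of WordprocessingML elements (assumed: transitional OOXML).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text, one line per <w:p> paragraph.
    static String extractText(byte[] docxBytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue;
                }
                // Copy the current entry; read() returns -1 at the end of the entry.
                ByteArrayOutputStream xml = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    xml.write(chunk, 0, n);
                }
                return textOf(xml.toByteArray());
            }
            throw new IllegalArgumentException("no word/document.xml entry found");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Concatenates the <w:t> text runs of every <w:p> paragraph.
    private static String textOf(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            StringBuilder text = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < runs.getLength(); j++) {
                    text.append(runs.item(j).getTextContent());
                }
                text.append('\n');
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse word/document.xml", e);
        }
    }

    // Builds a tiny stand-in archive for the demo. It is NOT a valid Word file
    // (no [Content_Types].xml), and the paragraphs must not contain '<' or '&'.
    static byte[] sampleDocx(String... paragraphs) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
            .append("<w:document xmlns:w=\"").append(W_NS).append("\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that .docx; otherwise use the in-memory sample.
        byte[] bytes = args.length > 0
            ? Files.readAllBytes(Paths.get(args[0]))
            : sampleDocx("Hello", "World");
        System.out.print(extractText(bytes));
    }
}
```

Run without arguments, this prints "Hello" and "World" on separate lines. For real documents, POI's extractor classes (XWPFWordExtractor for .docx, WordExtractor for .doc) cover the many cases this sketch skips.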


Paweł Baczyński
Bartender

Joined: Apr 18, 2013
Posts: 1012
    
  16

Don't read a .doc file as a regular text file!
http://stackoverflow.com/questions/7102511/how-read-doc-or-docx-file-in-java


Formerly Pawel Pawlowicz
Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2302
    
  49
You need to use a library that understands the format that .doc and .docx files are saved in. Fortunately, there are free libraries available, such as Apache POI, which can be found at http://poi.apache.org/
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39062
    
  23
Too difficult for “beginning”: moving.
 