.doc files contain many bytes that are not part of the actual text (for example, formatting and layout information). If you only want the text, use Apache POI as suggested. This page explains how it can be used for text extraction.
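A minimal sketch of what that looks like, assuming Apache POI 4.x or later with the `poi-scratchpad` artifact on the classpath (that is where the HWPF classes for the old binary .doc format live). `WordExtractor.getText()` returns just the document text, without the formatting data. The class name `DocTextDemo` and the `extractText` helper are only illustrative:

```java
import org.apache.poi.hwpf.extractor.WordExtractor;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DocTextDemo {

    // Hypothetical helper: reads a binary Word 97-2003 (.doc) stream and
    // returns only its text, skipping the layout/formatting records.
    // Assumes POI 4.x+, where WordExtractor is Closeable.
    static String extractText(InputStream in) throws IOException {
        try (WordExtractor extractor = new WordExtractor(in)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocTextDemo <file.doc>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(extractText(in));
        }
    }
}
```

Note that this only handles the legacy .doc format; .docx files are a different (XML-based) format and POI handles them with separate classes.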