That's hard. Structured document formats (like MS Office and PDF) are not easy to read programmatically. The Apache POI library has classes to extract text from DOC/DOCX files, but the library does not run on Android out of the box (because it relies on AWT classes that do not exist on Android). Maybe you can strip the library down to just the text extraction classes; those should be easier to port to Android than the whole library.
Similarly, PDFBox can extract text from PDFs; I'm not sure if it runs on Android.
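For DOCX specifically, there is a rougher alternative to porting POI. A .docx file is a ZIP archive whose main body is stored as XML in the entry word/document.xml, so plain text can be pulled out with java.util.zip and a SAX parser, both of which are available on Android. This is not how POI does it. It is only a minimal sketch, and the class name DocxTextSketch and the method extractText are made up for illustration. It ignores headers, footers, footnotes and all formatting, and it does nothing for the old binary .doc format or for PDF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of POI or any other library.
final class DocxTextSketch {

    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String WML =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a DOCX stream, one line per paragraph.
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return parseBody(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a DOCX file?");
    }

    private static String parseBody(InputStream xml) throws Exception {
        final StringBuilder out = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // match on namespace + local name, not on the "w:" prefix

        DefaultHandler handler = new DefaultHandler() {
            private boolean inText; // true while inside a <w:t> text run

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (!WML.equals(uri)) return;
                if (local.equals("t")) inText = true;
                else if (local.equals("tab")) out.append('\t');
                else if (local.equals("br")) out.append('\n');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (!WML.equals(uri)) return;
                if (local.equals("t")) inText = false;
                else if (local.equals("p")) out.append('\n'); // end of paragraph
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) out.append(ch, start, length);
            }
        };

        factory.newSAXParser().parse(xml, handler);
        return out.toString();
    }
}
```

On Android you would pass it whatever InputStream you get for the file, e.g. DocxTextSketch.extractText(new FileInputStream(file)). If you need tables, footnotes or anything beyond raw paragraphs, you are back to needing a real library.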