I want to read a Word document and print its contents to the console. But when I read it the way I would a text file, it displays some weird characters. Can anyone shed some light on how I should proceed?
Thanks in advance.
posted 8 years ago
DOC is a binary file format, so you can't read it the way you would read a text file. A library that can extract the text from a .doc file is Jakarta POI (now Apache POI); you can find some usage examples here.
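To see why a text reader prints garbage, it can help to look at the first bytes of the file. Legacy .doc files are stored in an OLE2 compound container, which starts with a fixed 8-byte signature instead of readable characters. The sketch below uses only the JDK; the class and method names (`DocFormatCheck`, `hasOle2Signature`, `fileHasOle2Signature`) are made up for illustration. It only detects the container and does not extract any text; for that you would still hand the file to a library such as POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of POI or any other library.
class DocFormatCheck {
    // The 8-byte signature at the start of every OLE2 compound file,
    // the binary container used by legacy .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes begin with the OLE2 signature.
    static boolean hasOle2Signature(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (header[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first 8 bytes of a file and checks them against the signature.
    static boolean fileHasOle2Signature(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return total == header.length && hasOle2Signature(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocFormatCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        if (fileHasOle2Signature(file)) {
            System.out.println("Binary OLE2 container: use a library such as POI, not a text reader.");
        } else {
            System.out.println("No OLE2 signature found.");
        }
    }
}
```

Note that the same signature is shared by other legacy Office formats (.xls, .ppt), so a match tells you the file is a binary container, not that it is definitely a Word document.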