I want to read a Word document and print its contents to the console. But when I read it the way I would read a text file, it displays some weird characters. Can anyone shed some light on how I should proceed?
Thanks in advance.
Dinakar Kasturi.
Ulf Dittmer
DOC is a binary file format, so you can't read it the way you would read a text file. An API that can extract the text from a DOC file is Jakarta POI (now Apache POI); you can find some usage examples here.
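To see why the text-file approach prints garbage, here is a minimal sketch using only the standard library. It checks whether a file starts with the 8-byte OLE2 compound-file signature that legacy .doc files begin with. The class name `DocSniffer` and the method `looksLikeOle2` are made up for this illustration and are not part of POI; the actual text extraction would still be done by POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of any library.
class DocSniffer {
    // Signature found at the start of an OLE2 compound file (legacy .doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given bytes begin with the OLE2 signature.
    static boolean looksLikeOle2(byte[] head) {
        if (head.length < OLE2_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads the first 8 bytes of the file and checks them against the signature.
    static boolean looksLikeOle2(Path file) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                total += n;
            }
        }
        return looksLikeOle2(Arrays.copyOf(head, total));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocSniffer <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        if (looksLikeOle2(file)) {
            System.out.println(file + " is an OLE2 binary container; use a library such as POI to extract the text.");
        } else {
            System.out.println(file + " does not start with the OLE2 signature.");
        }
    }
}
```

If the check reports an OLE2 container, a Reader will just decode the container's bytes as characters, which is where the weird output comes from. The check only identifies the container, so it cannot tell a Word document from an Excel or PowerPoint file stored the same way.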