Hi, I want to read the contents of a .doc file. I am reading the file byte by byte to search for a word. Microsoft Word adds some extra bytes when a new .doc file is created, and I don't want to read those extra bytes. Currently I am reading the file using FileInputStream. I have tried FileReader as well, but it doesn't work. I don't want to use the POI API because I need to read .txt and .rtf files as well. Does anybody know how to avoid reading those extra bytes?
Why don't you just use a different implementation of "com.foo.bar.FileContentReader" based on what kind of a file the user selects? The easiest way to do this would be to match the filename suffix (".doc", ".txt", etc.), or you could read the first N bytes and figure out whether that matches the Word file format (if it does, then continue reading from the beginning with "com.foo.bar.WordFileContentReader", if it doesn't, continue with "com.foo.bar.TextFileContentReader").
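To make the two options concrete, here is a minimal sketch of both checks. It assumes the .doc files are in the binary Word format, which is stored in an OLE2 compound file that begins with a fixed 8-byte signature, and that RTF files begin with "{\rtf". The class and method names (FormatSniffer, bySuffix, classify, byContent) are made up for illustration and do not come from any library. Note that the OLE2 signature is shared by other Office formats such as .xls, so a match only tells you the file is not plain text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: decides which kind of reader to use for a file.
final class FormatSniffer {
    enum Kind { WORD, RTF, TEXT }

    // Signature at the start of every OLE2 compound file (binary .doc, .xls, ...).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // RTF documents start with the characters {\rtf
    private static final byte[] RTF_MAGIC = { '{', '\\', 'r', 't', 'f' };

    // Option 1: trust the filename suffix.
    static Kind bySuffix(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".doc")) return Kind.WORD;
        if (lower.endsWith(".rtf")) return Kind.RTF;
        return Kind.TEXT;
    }

    // Option 2: look at the first bytes of the file.
    static Kind classify(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.WORD;
        if (startsWith(head, RTF_MAGIC)) return Kind.RTF;
        return Kind.TEXT; // anything unrecognised is treated as plain text
    }

    // Reads up to 8 bytes from the file and classifies them (needs Java 9+).
    static Kind byContent(Path file) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(head, 0, head.length);
        }
        return classify(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The result of bySuffix or byContent can then select the reader implementation, so the Word-specific code only ever sees Word files and the plain FileInputStream path only sees .txt files.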
Sorry if I'm late in replying. Apache's POI project is the best option if you want to read MS Word files.
What exactly do you want to do? Do you only want to read the files, or are you converting them into other formats too? If it's the latter, "CambridgeDoc's Java Doc Library" is quite useful. Check them out.
If you have any questions about format conversions, feel free to ask. Good luck.
Hi, thank you for your suggestion. I just want to read the files and check whether the word entered is present in the file or not. I'll try this one. Thanks once again.
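For the "is this word present" check itself, a rough sketch for the plain-text case might look like the following. It assumes the file is UTF-8 text and that a match means a whole word, compared case-insensitively; WordSearch, containsWord and fileContainsWord are hypothetical names chosen for this example. For a .doc file the text would first have to be extracted by a Word-aware reader (for example POI), after which the same containsWord check applies to the extracted string.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper: checks whether a word occurs in some text.
final class WordSearch {
    // True if 'word' appears in 'text' as a whole word, ignoring case.
    static boolean containsWord(String text, String word) {
        // Pattern.quote keeps characters such as '.' or '+' in the word literal.
        Pattern p = Pattern.compile(
            "\\b" + Pattern.quote(word) + "\\b",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(text).find();
    }

    // Plain-text files only; the UTF-8 charset is an assumption.
    static boolean fileContainsWord(Path file, String word) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return containsWord(text, word);
    }
}
```

Searching decoded text rather than raw bytes avoids false matches inside the file's binary structure, which is the problem the extra bytes in a .doc file cause.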
subject: How to read Microsoft doc file without reading extra bytes added by Microsoft.