| Author |
How to read Microsoft doc file without reading extra bytes added by Microsoft.
|
Kranti Datrange
Greenhorn
Joined: Aug 21, 2004
Posts: 3
|
|
Hi, I want to read the contents of a .doc file byte by byte to search for a word. Microsoft Word adds some extra bytes when a new .doc file is created, and I don't want to read those extra bytes. Currently I am reading the file using FileInputStream. I have tried FileReader as well, but it doesn't work. I don't want to use the POI API because I need to read .txt and .rtf files as well. Does anybody know how to avoid reading those extra bytes?
|
 |
Jessica Sant
Sheriff
Joined: Oct 17, 2001
Posts: 4313
|
|
|
Moving this to Advanced Java. Please post replies there.
|
 |
Lasse Koskela
author
Sheriff
Joined: Jan 23, 2002
Posts: 11962
|
|
|
Why don't you just use a different implementation of "com.foo.bar.FileContentReader" based on what kind of a file the user selects? The easiest way to do this would be to match the filename suffix (".doc", ".txt", etc.), or you could read the first N bytes and figure out whether that matches the Word file format (if it does, then continue reading from the beginning with "com.foo.bar.WordFileContentReader", if it doesn't, continue with "com.foo.bar.TextFileContentReader").
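
A rough sketch of the "read the first N bytes" idea, in case it helps. The class and enum names here (FormatSniffer, Format) are made up for illustration and are not part of any library. It assumes the .doc is the binary Word format, which is stored in an OLE2 compound file beginning with the signature D0 CF 11 E0 A1 B1 1A E1, and that RTF files begin with the characters "{\rtf". Anything else is simply treated as plain text, which is an assumption you may want to tighten.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not from any library): guesses a file's format
// from its leading bytes so the caller can pick a matching reader.
final class FormatSniffer {

    enum Format { WORD_DOC, RTF, PLAIN_TEXT }

    // Signature of the OLE2 compound file container used by binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // RTF documents start with the ASCII characters "{\rtf".
    private static final byte[] RTF_MAGIC = { '{', '\\', 'r', 't', 'f' };

    // Classifies a buffer holding the first bytes of a file.
    static Format detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Format.WORD_DOC;
        if (startsWith(head, RTF_MAGIC)) return Format.RTF;
        return Format.PLAIN_TEXT; // assumption: everything else is text
    }

    // Reads up to eight bytes from the file and classifies them.
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[OLE2_MAGIC.length];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break; // file is shorter than the signature
                n += r;
            }
            return detect(Arrays.copyOf(head, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FormatSniffer <file>");
            return;
        }
        System.out.println(detect(Paths.get(args[0])));
    }
}
```

Once the format is known, you would hand the file to whichever reader implementation handles it. Note that the same OLE2 signature is shared by other legacy Office files (.xls, .ppt), so combining this check with the filename suffix is probably safer than relying on either one alone.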
|
Author of Test Driven (2007) and Effective Unit Testing (2013) [Blog] [HowToAskQuestionsOnJavaRanch]
|
 |
Ashish Vegaraju
Ranch Hand
Joined: Aug 19, 2004
Posts: 47
|
|
Hi, sorry for the late reply. Apache's POI project is the best choice if you want to read MS Word files. What exactly do you want to do: only read the files, or also convert them into other formats? If it is the latter, then "CambridgeDoc's Java Doc Library" is quite useful. Check them out. If you have any questions regarding format conversions, feel free to ask. Good luck.
|
 |
Kranti Datrange
Greenhorn
Joined: Aug 21, 2004
Posts: 3
|
|
Hi, thank you for your suggestion. I just want to read the files and check whether the word entered is present in the file or not. I'll try this one. Thanks once again.
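
For reference, the check described here could be sketched roughly as follows, assuming the text has already been extracted from the file into a String by whichever reader matches the format. WordSearch and containsWord are hypothetical names, and the whole-word, case-insensitive matching is an assumption about what "present" should mean.

```java
import java.util.regex.Pattern;

// Hypothetical helper: whole-word, case-insensitive search in extracted text.
final class WordSearch {

    // Returns true if 'word' occurs in 'text' as a whole word.
    // Assumes 'word' starts and ends with a letter, digit, or underscore,
    // since \b only matches at the edge of such characters.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Hello World\nThis is a test document.";
        System.out.println(containsWord(text, "world"));
        System.out.println(containsWord(text, "doc"));
    }
}
```

Pattern.quote keeps any regex metacharacters in the search word from being interpreted, so the user's input is matched literally.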
|
 |