
MS Word *.doc to *.HTML

dar
Ranch Hand

Joined: Nov 08, 2001
Posts: 45
Hi, ranchers
I want to convert an AB.DOC file to an AB.HTML file in Java. Does anyone have a good suggestion?
Best Regards,
chison
karl koch
Ranch Hand

Joined: May 25, 2001
Posts: 388
hi
check out the Apache POI project (here)
k
dar
Ranch Hand

Joined: Nov 08, 2001
Posts: 45
Thank you for the reply.
POI only handles Word 97.
But I need Word 2000 support. Ideally it would convert all MS Word formats: Word 97, Word 2000, and Word XP.

Originally posted by karl koch:
hi
check out the apache POI project (here)
k
Michael Crutcher
Ranch Hand

Joined: Feb 18, 2002
Posts: 48
Well, I think you're out of luck. Word uses a proprietary binary format. You could join the POI team and contribute to the effort, but there is no quick fix. Figuring out the intricacies of a proprietary file format is no trivial matter; until you or the POI team invest time in the newer file formats, you really don't have an option.
I don't know of any projects that are further along than POI.
Michael
Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
Actually, it's gonna be a headache, BUT... you're in luck. You can use JaWin or Jacob ( http://www.danadler.com/jacob/index.html ) to work with COM from Java. And through COM, Microsoft has an API to get a Word doc (including Word 2000) as an XML document. You can do that through COM in Java. Hang on, I think I still have some code dealing with Word from Java (it's not entirely complete, but should help you get started). It uses Jacob.

That should all work (assuming you've got jacob.jar in your classpath)! All I ask is that when you finish the work I started here, please post it on this site so the rest of us can benefit too! Thanks! Let me know if this is a help!
Robert
[Edited to break up a couple really long lines that were distorting the rest of the page - Jim]
[ January 24, 2003: Message edited by: Jim Yingst ]
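The code listing from the post above is not included in this copy of the thread. As a rough stand-in (this is not Robert's Jacob code), here is a minimal sketch of the same COM automation idea using only the standard library: Java writes a small VBScript that asks Word to open the document and save it as HTML, then runs it with Windows Script Host. It assumes Windows with Word installed and `cscript` on the PATH. The class `WordToHtml` and its method names are made up for illustration, and file paths are assumed to be plain ASCII.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: drives Word through Windows Script Host rather than Jacob. */
final class WordToHtml {

    // Numeric value of Word's WdSaveFormat.wdFormatHTML constant.
    static final int WD_FORMAT_HTML = 8;

    /** Swaps the file extension for .html, e.g. AB.DOC becomes AB.html. */
    static String htmlNameFor(String docName) {
        int dot = docName.lastIndexOf('.');
        String base = dot > 0 ? docName.substring(0, dot) : docName;
        return base + ".html";
    }

    /** Wraps a value in a VBScript string literal; embedded quotes are doubled. */
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Builds the VBScript that opens the document and saves it as HTML. */
    static String buildScript(String docPath, String htmlPath) {
        return String.join("\r\n",
            "Set word = CreateObject(\"Word.Application\")",
            "word.Visible = False",
            "Set doc = word.Documents.Open(" + quote(docPath) + ")",
            "doc.SaveAs " + quote(htmlPath) + ", " + WD_FORMAT_HTML,
            "doc.Close False",
            "word.Quit",
            "");
    }

    /** Writes the script to a temp file, runs it with cscript, returns the exit code. */
    static int convert(Path doc) throws IOException, InterruptedException {
        Path in = doc.toAbsolutePath();
        Path html = in.resolveSibling(htmlNameFor(in.getFileName().toString()));
        Path script = Files.createTempFile("doc2html", ".vbs");
        try {
            // Assumption: ASCII-only paths, so the script is written as ASCII.
            Files.write(script,
                buildScript(in.toString(), html.toString())
                    .getBytes(StandardCharsets.US_ASCII));
            Process p = new ProcessBuilder("cscript", "//NoLogo", script.toString())
                .inheritIO()
                .start();
            return p.waitFor();
        } finally {
            Files.deleteIfExists(script);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WordToHtml <file.doc>");
            System.exit(2);
        }
        System.exit(convert(Paths.get(args[0])));
    }
}
```

Running `java WordToHtml AB.DOC` should leave the HTML file next to the original. Because Word itself does the conversion, this handles whatever formats the installed Word version can open, but it also means it only works on a machine that has Word.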
Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
One more thing:
You can figure out how to turn Word docs into XML by looking at the VB source code to this free program:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnword2k/html/odc_expwordtoxml.asp
Don't worry, VB is a cinch to figure out. They have the code for download. Just convert to Java! (All I was basically doing was creating Java wrappers for their COM counterparts. It wasn't necessary, but it's A LOT easier to work with!)
My code, after making the classes, looks like this:

Nice clean Java code. The same thing using regular Java-COM code is not so clean and nice as it has dispatch and pointer calls all over the place.
Anyways, good luck and let us know how it goes!
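The wrapper code from this post is also not included in this copy of the thread. To illustrate the wrapper idea only, here is a hedged sketch in which every name is hypothetical: `ComInvoker` stands in for whatever late-bound COM call mechanism is used (Jacob's dispatch calls would sit behind it), and `WordApplication` and `WordDocument` are thin typed wrappers so that calling code never touches dispatch details. `RecordingInvoker` is a fake that just logs calls, so the example runs without Word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a late-bound COM call. */
interface ComInvoker {
    Object invoke(String target, String method, Object... args);
}

/** Typed wrapper for the Word Application object. */
final class WordApplication {
    private final ComInvoker com;

    WordApplication(ComInvoker com) { this.com = com; }

    WordDocument open(String path) {
        com.invoke("Documents", "Open", path);
        return new WordDocument(com);
    }

    void quit() { com.invoke("Application", "Quit"); }
}

/** Typed wrapper for a Word Document object. */
final class WordDocument {
    // Numeric value of Word's WdSaveFormat.wdFormatHTML constant.
    static final int FORMAT_HTML = 8;

    private final ComInvoker com;

    WordDocument(ComInvoker com) { this.com = com; }

    void saveAsHtml(String path) { com.invoke("Document", "SaveAs", path, FORMAT_HTML); }

    void close() { com.invoke("Document", "Close", false); }
}

/** Fake invoker that records each call instead of talking to Word. */
final class RecordingInvoker implements ComInvoker {
    final List<String> calls = new ArrayList<>();

    @Override
    public Object invoke(String target, String method, Object... args) {
        calls.add(target + "." + method + Arrays.toString(args));
        return null;
    }
}

final class WrapperDemo {
    public static void main(String[] args) {
        RecordingInvoker com = new RecordingInvoker();
        WordApplication word = new WordApplication(com);

        // The calling code reads like plain Java, with no dispatch plumbing.
        WordDocument doc = word.open("AB.DOC");
        doc.saveAsHtml("AB.html");
        doc.close();
        word.quit();

        com.calls.forEach(System.out::println);
    }
}
```

In a real setup the `ComInvoker` implementation would forward to the COM bridge, and the wrappers would stay the same. That separation is the point of the approach described above: the messy dispatch calls live in one place.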