
MS Word *.doc to *.HTML

 
dar
Ranch Hand
Posts: 45
Hi, ranchers
I want to convert an AB.doc file to an AB.HTML file in Java. Does anyone have a good suggestion?
Best Regards,
chison
 
karl koch
Ranch Hand
Posts: 388
Hi,
Check out the Apache POI project (here).
k
 
dar
Ranch Hand
Posts: 45
Thank you for the reply.
POI only supports Word 97.
But I need Word 2000 support. Ideally it would convert all MS Word formats: Word 97, Word 2000, and Word XP.

 
Michael Crutcher
Ranch Hand
Posts: 48
Well, I think you're out of luck. Word uses a proprietary binary format. You could join the POI team and contribute to the effort, but there is no quick fix. Figuring out the intricacies of a proprietary file format is no trivial matter; until you or the POI team invest time in the newer file formats, you really don't have an option.
I don't know of any projects that are further along than POI.
Michael
 
Robert Paris
Ranch Hand
Posts: 585
Actually, it's gonna be a headache, BUT... you're in luck. You can use JaWin or Jacob ( http://www.danadler.com/jacob/index.html ) to work with COM from Java. Through COM, Microsoft exposes an API to get a Word doc (including Word 2000) as an XML document, and you can call that API from Java. Hang on, I think I still have some code dealing with Word from Java (it's not entirely complete, but should help you get started). It uses Jacob.

That should all work (assuming you've got jacob.jar in your classpath)! All I ask is that when you finish the work I started here, please post it on this site so the rest of us can benefit too! Thanks! Let me know if this is a help!
Robert
[Edited to break up a couple really long lines that were distorting the rest of the page - Jim]
[ January 24, 2003: Message edited by: Jim Yingst ]
 
Robert Paris
Ranch Hand
Posts: 585
One more thing:
You can figure out how to turn word docs into XML by looking at the VB source code to this free program:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnword2k/html/odc_expwordtoxml.asp
Don't worry, VB is a cinch to figure out. They have the code for download. Just convert it to Java! (All I was basically doing was creating Java wrappers for their COM counterparts. It wasn't necessary, but it's a LOT easier to work with!)
My code, after making the classes, looks like this:

Nice clean Java code. The same thing using regular Java-COM code is not so clean and nice as it has dispatch and pointer calls all over the place.
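For illustration only, here is a minimal sketch of the wrapper idea, not the listing from this post. Everything in it is an assumption: ComDispatcher is a hypothetical stand-in for a Java-COM bridge (it is not the Jacob or JaWin API), RecordingDispatcher is a fake that only logs calls so the sketch runs without Word installed, and the member names (Documents, Open, SaveAs, Close, Quit) are taken from Word's automation object model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demo entry point: drives the wrappers against the fake dispatcher.
class WordWrapperDemo {
    public static void main(String[] args) {
        RecordingDispatcher fake = new RecordingDispatcher();
        WordApplication word = new WordApplication(fake, "Word.Application");
        WordDocument doc = word.open("AB.doc");
        doc.saveAsHtml("AB.html");
        doc.close();
        word.quit();
        // Print the string-based dispatch calls the wrappers issued.
        fake.calls.forEach(System.out::println);
    }
}

// Hypothetical bridge interface (assumption): invokes a named member on a COM object.
interface ComDispatcher {
    Object invoke(Object target, String member, Object... args);
}

// Wrapper for the Word application object; callers never see member-name strings.
class WordApplication {
    private final ComDispatcher com;
    private final Object app;

    WordApplication(ComDispatcher com, Object app) {
        this.com = com;
        this.app = app;
    }

    WordDocument open(String path) {
        Object documents = com.invoke(app, "Documents");
        return new WordDocument(com, com.invoke(documents, "Open", path));
    }

    void quit() {
        com.invoke(app, "Quit");
    }
}

// Wrapper for a single open document.
class WordDocument {
    // Value of Word's wdFormatHTML save-format constant.
    static final int WD_FORMAT_HTML = 8;

    private final ComDispatcher com;
    private final Object doc;

    WordDocument(ComDispatcher com, Object doc) {
        this.com = com;
        this.doc = doc;
    }

    void saveAsHtml(String path) {
        com.invoke(doc, "SaveAs", path, WD_FORMAT_HTML);
    }

    void close() {
        // false = do not prompt to save changes
        com.invoke(doc, "Close", false);
    }
}

// Fake dispatcher for the demo: records each call and returns the member name as a handle.
class RecordingDispatcher implements ComDispatcher {
    final List<String> calls = new ArrayList<>();

    @Override
    public Object invoke(Object target, String member, Object... args) {
        calls.add(target + "." + member + Arrays.toString(args));
        return member;
    }
}
```

To use this against real Word, you would swap RecordingDispatcher for an implementation that forwards invoke to whichever COM bridge you picked. The wrapper classes stay the same, which is the whole point of hiding the dispatch calls.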
Anyways, good luck and let us know how it goes!
 