JavaRanch » Java Forums » Java » Java in General

How to convert a Word doc to HTML/WML/XML?

Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
Does anyone know how to convert a Word doc to any of the following using Java (on Linux and/or Windows): HTML, XML, or WML?
Barry Andrews
Ranch Hand

Joined: Sep 05, 2000
Posts: 523

Check out HDF (Horrible Document Format) in the Jakarta POI project.
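A library like HDF only covers the reading half of the job; you still have to write the output format yourself. The sketch below shows that output half under a stated assumption: the paragraph text has already been extracted from the .doc by some other means, and the class and method names (`ParagraphsToHtml`, `toHtml`) are hypothetical, not part of POI. It escapes the characters that are special in markup and wraps each paragraph in a `<p>` element.

```java
import java.util.List;

/**
 * Hypothetical output stage for a Word-to-HTML converter.
 * Assumes paragraph text was already extracted from the .doc file elsewhere;
 * this class only escapes the text and wraps it in a minimal HTML page.
 */
public class ParagraphsToHtml {

    // Escape characters that are significant in HTML/XML text content.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap each extracted paragraph in a <p> element inside a minimal page.
    static String toHtml(String title, List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html><head><title>").append(escape(title))
          .append("</title></head><body>\n");
        for (String p : paragraphs) {
            sb.append("<p>").append(escape(p)).append("</p>\n");
        }
        sb.append("</body></html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample paragraphs standing in for text pulled out of a .doc file.
        List<String> paras = List.of("Sales & marketing", "Q1 < Q2");
        System.out.print(toHtml("Report", paras));
    }
}
```

Because XML and WML are both XML-based, the same escaping applies to them; only the surrounding elements in `toHtml` would change.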
Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
I suspect HDF was never tested (even by its author?). It always fails with a NegativeArraySizeException, so I submitted a bug report; they accepted it, so I guess they couldn't solve it either. I can't figure out the problem, but I'll say this:
1. For anyone who understands this: somehow the cbGrpprlChpx field in the LVLF is -1. How is that possible? Is -1 a null value? If so, how should that null be handled?
2. HDF is some of the worst code I've seen in a while (no offense meant). I was surprised, since it's an Apache project, but there's NO exception handling ANYWHERE. The program either completes (which it never does for me) or throws an exception and exits without cleaning anything up or closing its streams. It's ugly.
3. There are NO comments explaining what the code is doing at any point. I had to do A LOT of research just to figure out what it was doing.
4. If anyone has a solution, let me know! I need this to work! The problem only occurs when the document contains multi-level lists (such as bullets).
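Points 1 and 2 can be illustrated with a small, self-contained sketch. It is not HDF code, and the thread does not settle whether -1 in cbGrpprlChpx is a real null marker. One general way a Java binary parser ends up with a negative length, offered here only as a possibility, is reading an unsigned one-byte count into Java's signed `byte`, so that 0xFF comes back as -1 and `new byte[-1]` throws NegativeArraySizeException. The record layout below (a one-byte length followed by that many data bytes) and the names `LengthPrefixedReader`, `readSigned`, `readUnsigned`, and `parse` are hypothetical. The sketch also uses try-with-resources so the stream is closed even when parsing throws.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Illustrative sketch, not HDF code. Assumes a hypothetical record made of
 * a one-byte length followed by that many bytes of data.
 */
public class LengthPrefixedReader {

    // Signed read: 0xFF becomes -1, and new byte[-1] throws
    // NegativeArraySizeException.
    static byte[] readSigned(DataInputStream in) throws IOException {
        int len = in.readByte();          // signed: range -128..127
        byte[] data = new byte[len];      // throws if len < 0
        in.readFully(data);
        return data;
    }

    // Unsigned read of the same byte (0..255) never yields a negative length.
    static byte[] readUnsigned(DataInputStream in) throws IOException {
        int len = in.readUnsignedByte();  // unsigned: range 0..255
        byte[] data = new byte[len];
        in.readFully(data);
        return data;
    }

    // try-with-resources closes the stream even if parsing throws.
    static int parse(InputStream raw) throws IOException {
        try (DataInputStream in = new DataInputStream(raw)) {
            return readUnsigned(in).length;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = new byte[1 + 255];
        record[0] = (byte) 0xFF;          // length byte: 255 unsigned, -1 signed

        try {
            readSigned(new DataInputStream(new ByteArrayInputStream(record)));
        } catch (NegativeArraySizeException e) {
            System.out.println("signed read failed: " + e);
        }
        System.out.println("unsigned read length: "
                + parse(new ByteArrayInputStream(record)));
    }
}
```

If -1 turns out to be a genuine "no data" marker in the format instead, the alternative fix would be an explicit `if (len < 0)` branch that treats the field as empty; which interpretation is correct would have to be checked against the Word binary format documentation.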
I agree. Here's the link: