Word 2003 to XML via XSLT

Eric Pascarello

Joined: Nov 08, 2001
Posts: 15385
Has anyone here tried to convert a Word 2003 document into XML via XSLT? I may soon have a requirement to grab data from a Word doc and put it into a database. If it could be done with an XSLT, the extraction would be easier to change later.

I am finding poor documentation on the process. Hopefully someone has some insight into this matter.

Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

I just did that a couple of days ago. First I saved the document as XML (I don't believe that the .doc format is XML itself). Then I eyeballed the XML to find the bits I wanted to extract, and messed around with the XSLT until it extracted only those bits.

Okay, that's not very professional. A quick hack, but it did what I needed. But I know Microsoft has schemas for the XML version of Word 2003. Have you seen this page yet? Looks like a good place to start.
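
A minimal sketch of the approach described above, assuming the document was saved from Word 2003 as XML (WordprocessingML, namespace http://schemas.microsoft.com/office/word/2003/wordml). The class name WordMlExtract, the output element names paragraphs and para, and the tiny sample document are made up for illustration; a real stylesheet would select whichever bits you actually need. It uses only the standard javax.xml.transform API that ships with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: pulls paragraph text out of Word 2003 XML with an XSLT.
class WordMlExtract {

    // Stylesheet: one <para> per w:p, containing the concatenated text of its w:t runs.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + "    xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'"
      + "    exclude-result-prefixes='w'>"
      + "  <xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
      + "  <xsl:template match='/'>"
      + "    <paragraphs>"
      + "      <xsl:for-each select='//w:p'>"
      + "        <para>"
      + "          <xsl:for-each select='.//w:t'><xsl:value-of select='.'/></xsl:for-each>"
      + "        </para>"
      + "      </xsl:for-each>"
      + "    </paragraphs>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

    // Tiny stand-in for a document saved as XML from Word 2003 (illustrative only).
    static final String SAMPLE =
        "<w:wordDocument xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>"
      + "<w:body>"
      + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>World</w:t></w:r></w:p>"
      + "<w:p><w:r><w:t>Second</w:t></w:r></w:p>"
      + "</w:body>"
      + "</w:wordDocument>";

    // Applies the stylesheet to the given Word XML and returns the result as a string.
    static String extract(String wordXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(wordXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(SAMPLE));
    }
}
```

For a real file, the StringReader sources could be swapped for new StreamSource(new File(...)), and the stylesheet kept in its own .xsl file so the extraction rules can change without recompiling.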
Madhav Lakkapragada
Ranch Hand

Joined: Jun 03, 2000
Posts: 5040
Glad to note that something is "free" from M$.

- m

Take a Minute, Donate an Hour, Change a Life
Prabha Enjeti

Joined: Oct 30, 2005
Posts: 2

I have a similar task: converting an MS Word document to XML.
The Word document contains images and graphs. Someone suggested that I use the Apache POI framework for this task. Can someone please suggest how to go about it?
