JavaRanch » Java Forums » Engineering » XML and Related Technologies
html to xml (wordml) conversion

karthik venkatesan
Greenhorn

Joined: Dec 31, 2004
Posts: 26
Hi,

I need to convert HTML tags to WordML format without losing the HTML formatting (bold, italic, etc.). Is there any way to accomplish this? For example, suppose I have HTML input like the following:

<html>
<head>
</head>
<body>
<B> This is bold text</B>
</body>
</html>

The resulting WordML should display the text "This is bold text" in bold.



The output should be...
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Sure, there's a way to do it. You just convert the <B> element to its equivalent in WordML. Generally speaking I would convert the HTML to well-formed XML (using JTidy or TagSoup or some similar product), then use XSLT to transform that into WordML.
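A minimal sketch of the XSLT step described above, using only the JDK's built-in `javax.xml.transform` API. It assumes the HTML has already been made well-formed (the JTidy/TagSoup step is not shown), and it relies on the Word 2003 XML (WordML) convention that a run is made bold by a `<w:b/>` element inside its `<w:rPr>` run properties. The class name `HtmlToWordMl` and the "everything in one paragraph" output layout are illustrative choices, not an existing converter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlToWordMl {

    // Minimal XSLT 1.0 stylesheet: handles only body text and <b>/<B>.
    // Assumes the HTML has already been tidied into well-formed XML.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>"
        + "<xsl:output method='xml' indent='yes'/>"
        // Wrap the whole body in one WordML document containing one paragraph.
        + "<xsl:template match='/'>"
        + "<w:wordDocument><w:body><w:p>"
        + "<xsl:apply-templates select='//body/node()'/>"
        + "</w:p></w:body></w:wordDocument>"
        + "</xsl:template>"
        // <b> or <B> (XML is case-sensitive) becomes a run with bold switched on.
        + "<xsl:template match='b|B'>"
        + "<w:r><w:rPr><w:b/></w:rPr><w:t><xsl:value-of select='.'/></w:t></w:r>"
        + "</xsl:template>"
        // Non-blank text outside <b> becomes a plain run. Other elements fall
        // through to the built-in rules, so their text also ends up unformatted.
        + "<xsl:template match='text()[normalize-space()]'>"
        + "<w:r><w:t><xsl:value-of select='.'/></w:t></w:r>"
        + "</xsl:template>"
        // Whitespace-only text (indentation between tags) is dropped.
        + "<xsl:template match='text()'/>"
        + "</xsl:stylesheet>";

    public static String convert(String wellFormedHtml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(wellFormedHtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String html = "<html><head></head><body><B> This is bold text</B></body></html>";
        System.out.println(convert(html));
    }
}
```

Each HTML construct you want to keep needs its own template like the `b|B` one, which is why a fully generic version grows large quickly.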

However, I don't know WordML, so I can't tell you how it marks text as bold. But as I said, I am sure it can be done. So if you actually meant to ask how to do that in WordML, then sorry, I don't know.
karthik venkatesan
Greenhorn

Joined: Dec 31, 2004
Posts: 26
Thanks for the reply, Paul. I see your point. But the problem is that we don't know how users are going to structure their HTML input (for example, they may use nested tables, image links, ...).

So it may not be possible to write a generic XSL template to convert arbitrary HTML input. I found online that Microsoft has released a generic template to convert WordML to HTML, but not the reverse, which is what I need.

So I would appreciate hearing from anybody who has come across the same issue and found a solution (if any).
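Part of what makes unknown input hard is nesting: `<b>` inside `<i>` inside `<td>` and so on. One way to handle nested *inline* formatting is to walk the DOM and pass the active formatting down to each text node, emitting one WordML run per text node. The sketch below assumes well-formed input and covers only bold, italic and underline; tables, images and links are not converted (their text is kept, their structure is lost). `InlineRuns` is a hypothetical name, and the output is a `<w:p>` fragment that assumes the surrounding document declares the `w:` prefix.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InlineRuns {

    // Parses well-formed (X)HTML and returns one WordML paragraph fragment.
    // The "w:" prefix is left undeclared: the fragment is meant to go inside
    // a <w:body> whose document already binds it to the WordML namespace.
    public static String toParagraph(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder("<w:p>");
        walk(doc.getDocumentElement(), false, false, false, out);
        return out.append("</w:p>").toString();
    }

    // Depth-first walk; formatting flags are inherited by all descendants.
    private static void walk(Node node, boolean bold, boolean italic,
                             boolean underline, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue();
            if (text.trim().isEmpty()) {
                return; // skip whitespace used only for layout
            }
            out.append("<w:r>");
            if (bold || italic || underline) {
                out.append("<w:rPr>");
                if (bold) out.append("<w:b/>");
                if (italic) out.append("<w:i/>");
                if (underline) out.append("<w:u w:val=\"single\"/>");
                out.append("</w:rPr>");
            }
            out.append("<w:t>").append(escape(text)).append("</w:t></w:r>");
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String tag = node.getNodeName().toLowerCase(Locale.ROOT);
            bold |= tag.equals("b") || tag.equals("strong");
            italic |= tag.equals("i") || tag.equals("em");
            underline |= tag.equals("u");
            // Any other element (table, img, a, ...) is simply descended into.
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, bold, italic, underline, out);
        }
    }

    // Escapes the characters that are not allowed raw in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toParagraph("<p><b>bold <i>both</i></b> plain</p>"));
    }
}
```

For the input in `main`, the text "both" comes out in a run carrying both `<w:b/>` and `<w:i/>`, and " plain" in a run with no properties. Block-level structures such as tables would each need their own mapping, which is where most of the effort of a generic converter lies.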
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Ah, I see, you are looking for a generic XSLT to convert HTML into WordML. And you don't want to write it yourself because it's a large and complicated task and surely somebody must have done it already? I would definitely agree with that.

But I don't exactly see a solution when I search the web for one. If you look at this article, for example, you can see the complexities involved in converting HTML to XSL-FO, and exactly the same would be required in converting to WordML.
Neerav Narielwala
Ranch Hand

Joined: Dec 08, 2006
Posts: 106
I have been using WordML as a content management tool and using XSLT to convert it to XHTML via Cocoon. Is this of use to you? If so, I can post an example.
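The stylesheet used with Cocoon was not posted, so the following is only a hypothetical sketch of that WordML to XHTML direction, run with plain JAXP instead of Cocoon. It assumes Word 2003 WordML input and recognises just paragraphs (`w:p`) and bold runs (`w:r` with `w:rPr/w:b`); the class name `WordMlToXhtml` is illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WordMlToXhtml {

    // WordML paragraphs/runs -> XHTML <p>/<b>. Only bold is recognised;
    // every other run property is ignored.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'"
        + " xmlns='http://www.w3.org/1999/xhtml'"
        + " exclude-result-prefixes='w'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<html><body><xsl:apply-templates select='//w:body/w:p'/></body></html>"
        + "</xsl:template>"
        // Each WordML paragraph becomes an XHTML paragraph.
        + "<xsl:template match='w:p'><p><xsl:apply-templates select='w:r'/></p></xsl:template>"
        // A run whose properties contain <w:b/> becomes <b>. A pattern with a
        // predicate has default priority 0.5, beating the plain 'w:r' rule (0).
        + "<xsl:template match='w:r[w:rPr/w:b]'><b><xsl:value-of select='w:t'/></b></xsl:template>"
        // Any other run: copy the text of its (first) <w:t> without markup.
        + "<xsl:template match='w:r'><xsl:value-of select='w:t'/></xsl:template>"
        + "</xsl:stylesheet>";

    public static String convert(String wordMl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(wordMl)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String doc = "<w:wordDocument xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>"
                + "<w:body><w:p><w:r><w:rPr><w:b/></w:rPr><w:t>This is bold text</w:t></w:r>"
                + "</w:p></w:body></w:wordDocument>";
        System.out.println(convert(doc));
    }
}
```

This direction is easier because WordML's structure (body, paragraphs, runs) is fixed and known, whereas arbitrary HTML is not.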

