JavaRanch » Java Forums » Engineering » XML and Related Technologies
How to handle nested html tags in XSL?

kapil Gupta
Ranch Hand

Joined: Dec 17, 2001
Posts: 89
I am trying to convert HTML with nested tags to XML using XSLT.
My HTML looks like this:

The outer span contains an inner span. I get the inner span using xsl:for-each inside the outer one, but I am unable to process the leftover text, i.e. "outer continues".

Is there any way to get the leftover text so I can convert it into something like this:


Thanks,
Kapil
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18886

If you're using normal XSLT methods, then text nodes are copied to the output by default. But you're using the procedural xsl:for-each instead of recursively processing the nodes of the XML tree, so you lose that feature.
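To see the difference Paul describes, here is a small sketch run through the JDK's built-in javax.xml.transform API (XSLT 1.0). The input is a hypothetical reconstruction, since Kapil's original HTML sample did not survive, and the `out`/`inner` element names are made up for the demo. The first stylesheet uses xsl:for-each and only ever sees the inner span. The second relies on xsl:apply-templates and the built-in template rules, which copy every text node they reach.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ForEachVsApplyTemplates {
    // Hypothetical input modeled on the thread; the original HTML was lost.
    static final String INPUT =
        "<div><span>outer starts <span>inner</span> outer continues</span></div>";

    // Procedural style: only the nodes picked by for-each are ever visited,
    // so the text around the inner span is never output.
    static final String FOR_EACH_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><out>"
        + "<xsl:for-each select='div/span/span'>"
        + "<inner><xsl:value-of select='.'/></inner>"
        + "</xsl:for-each>"
        + "</out></xsl:template>"
        + "</xsl:stylesheet>";

    // Recursive style: the built-in rules walk every node and copy text nodes,
    // so only the inner span needs a template of its own.
    static final String APPLY_TEMPLATES_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><out><xsl:apply-templates/></out></xsl:template>"
        + "<xsl:template match='span/span'><inner><xsl:apply-templates/></inner></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies a stylesheet to an XML string and returns the serialized result.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(FOR_EACH_XSL, INPUT));
        System.out.println(transform(APPLY_TEMPLATES_XSL, INPUT));
    }
}
```

The for-each version prints `<out><inner>inner</inner></out>` and drops "outer continues". The apply-templates version prints `<out>outer starts <inner>inner</inner> outer continues</out>`.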

If you want to convert span elements to richtext elements and keep the rest of the document unchanged, then start with an identity transformation and add the following template to convert span to richtext:

Then the xsl:apply-templates element will automatically copy all the text and attributes below the span element.
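Here is a minimal sketch of that suggestion, again with a hypothetical input and the JDK's built-in XSLT processor: an identity template plus one override for span. The `class` attribute is only there to show that attributes are carried over. Nested spans come out as nested richtext elements, which is the result Kapil reports in his next post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SpanToRichtext {
    // Hypothetical input; the thread's original HTML sample was lost.
    static final String INPUT =
        "<div><span class='x'>outer starts <span>inner</span> outer continues</span></div>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity transformation: copy every attribute and node unchanged.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Override for span: emit richtext, keeping its attributes and children.
        + "<xsl:template match='span'>"
        + "<richtext><xsl:apply-templates select='@*|node()'/></richtext>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies XSL to an XML string and returns the serialized result.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT));
    }
}
```

For the sample input this prints `<div><richtext class="x">outer starts <richtext>inner</richtext> outer continues</richtext></div>`.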
kapil Gupta
Ranch Hand

Joined: Dec 17, 2001
Posts: 89
Thanks for your reply, Paul.
I tried the code you suggested, but it produced output of this form:
But I want to close the outer richtext when the inner one starts, and open a new outer richtext when the inner one closes. Basically, I don't want any hierarchy in the generated XML.

Thanks,
Kapil
[ February 11, 2007: Message edited by: kapil Gupta ]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18886

Okay. So you don't just want to convert span elements to richtext elements. Then you will have to express what you do want to do instead.

Requirements that talk about start tags and end tags are hard to implement in XSLT. You need a requirement that talks about elements and nodes. If you have a text node inside a span element, what do you want the result to look like?
kapil Gupta
Ranch Hand

Joined: Dec 17, 2001
Posts: 89
I'm sorry for not stating my requirements clearly. I'll try to make them clearer with an example. As I mentioned in my first post, I want to convert HTML to XML, and the HTML is in this form:

Now I want to convert it to XML in this form:

Basically, div is converted to a paragraph tag and span to richtext.
The only difference in the XML is that a richtext element never contains another richtext, the way spans can contain other spans. As soon as a nested span appears, I want to close the outer richtext, start a new richtext for the inner span, and then open another richtext after the inner one closes.
Thanks,
Kapil
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18886

As soon as a nested span comes I want to close the outer richtext and start a new richtext for inner and then open a richtext after inner one closes.
As I said, requirements like this are extremely difficult to implement in XSLT. You will not get anywhere until you rephrase that in terms of elements and nodes.

Let's try this as requirements:
1. If a text node is a descendant of one or more span elements, it should be replaced in the output tree by a richtext element containing only that text node.
2. A span element should be replaced in the output tree by its text descendants with requirement 1 applied.

This translates into XSLT as:

Let's see if that works for a start.
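Paul's actual stylesheet was lost from the page. The sketch below is one way the two requirements could be written, with the div-to-paragraph rule from Kapil's description added. The input is hypothetical. Requirement 1 becomes a template for text nodes that have a span ancestor. Requirement 2 becomes a span template that emits no element of its own and just applies templates to its children.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FlattenSpans {
    // Hypothetical input; the thread's original HTML sample was lost.
    static final String INPUT =
        "<div><span>outer starts <span>inner</span> outer continues</span></div>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity transformation for everything not handled below.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // From Kapil's description: div becomes paragraph.
        + "<xsl:template match='div'>"
        + "<paragraph><xsl:apply-templates select='@*|node()'/></paragraph>"
        + "</xsl:template>"
        // Requirement 2: a span leaves no element behind; only its children are processed.
        + "<xsl:template match='span'><xsl:apply-templates/></xsl:template>"
        // Requirement 1: each text node below any span becomes its own richtext.
        + "<xsl:template match='text()[ancestor::span]'>"
        + "<richtext><xsl:value-of select='.'/></richtext>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies XSL to an XML string and returns the serialized result.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT));
    }
}
```

For the sample input this yields `<paragraph><richtext>outer starts </richtext><richtext>inner</richtext><richtext> outer continues</richtext></paragraph>`: three sibling richtext elements and no nesting. Whitespace-only text inside a span would also get its own richtext. Adding `[normalize-space()]` to the text pattern is one way to let those nodes fall through to the identity template instead.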
kapil Gupta
Ranch Hand

Joined: Dec 17, 2001
Posts: 89
I was able to generate the XML in the required format after applying the transformation you suggested.
Thanks for helping me out, Paul.
Kapil
kapil Gupta
Ranch Hand

Joined: Dec 17, 2001
Posts: 89
With the new requirements, I have to handle some more HTML tags, such as bold, italic and underline. The HTML is in this form:

I would like to convert it to this form:

Basically, a new richtext starts as soon as an HTML tag is found.
I am using the following XSL to transform it:


This code adds only one attribute to the richtext tag: it adds the italic attribute but not the bold attribute, which is applied before italic in the same span.
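Kapil's stylesheet and samples are missing, so the exact cause can't be confirmed. One approach that avoids the problem is to decide all attributes per text node by looking at its ancestors. That way every enclosing b, i and u contributes, instead of letting only one formatting element's template build the richtext. The attribute names `bold`, `italic` and `underline`, the value `true`, and the input are hypothetical stand-ins for whatever the real HTML and target XML use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FormattingAttributes {
    // Hypothetical input: bold wraps italic, followed by an underlined run.
    static final String INPUT =
        "<div><span>plain <b>bold <i>bold italic</i></b><u>underlined</u></span></div>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity transformation for everything not handled below.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='div'>"
        + "<paragraph><xsl:apply-templates select='@*|node()'/></paragraph>"
        + "</xsl:template>"
        // Formatting wrappers leave no element behind; only their content is processed.
        + "<xsl:template match='span|b|i|u'><xsl:apply-templates/></xsl:template>"
        // Each formatted text node gets one richtext, with an attribute for
        // every enclosing b, i and u, not just the nearest one.
        + "<xsl:template match='text()[ancestor::span or ancestor::b or ancestor::i or ancestor::u]'>"
        + "<richtext>"
        + "<xsl:if test='ancestor::b'><xsl:attribute name='bold'>true</xsl:attribute></xsl:if>"
        + "<xsl:if test='ancestor::i'><xsl:attribute name='italic'>true</xsl:attribute></xsl:if>"
        + "<xsl:if test='ancestor::u'><xsl:attribute name='underline'>true</xsl:attribute></xsl:if>"
        + "<xsl:value-of select='.'/>"
        + "</richtext>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies XSL to an XML string and returns the serialized result.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT));
    }
}
```

With this input, the "bold italic" run comes out as `<richtext bold="true" italic="true">bold italic</richtext>`, so both attributes survive. The xsl:attribute instructions must come before xsl:value-of, because attributes cannot be added to an element after its content has started.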
 