Parsing the CDATA section in XML using XML Pull Parser

sn omen
Greenhorn

Joined: May 23, 2013
Posts: 6
Sample XML



With the below code I was able to retrieve <title>, <published> and <author> values within the <entry> tag.



From the <content> tag, how can I retrieve the "href" value within the <a> tag and the text value (The BJP is likely to anoint Narendra Modi.....) from the <p><span> tag?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18903
    

It looks like the content of that CDATA section is a fragment of HTML: not an HTML document, but a bunch of HTML tags. So the first step is to get the contents of the CDATA section into a String (using your XML parser). The second step is to parse that String using an HTML parser; an XML parser cannot be relied on here, because the fragment need not be well-formed XML. Make sure you choose an HTML parser which is capable of dealing with "tag soup".
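A minimal sketch of the first step, for illustration only. It assumes the JDK's built-in StAX pull parser (javax.xml.stream) as a stand-in for XmlPullParser, since StAX ships with the JDK and follows the same pull model. The class name ContentExtractor and the method firstContent are hypothetical, and the element name "content" is taken from the question.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ContentExtractor {

    /**
     * Returns the character content of the first <content> element as a
     * String, or null if the document has no such element.
     */
    static String firstContent(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "content".equals(reader.getLocalName())) {
                        // getElementText() concatenates plain text and CDATA
                        // sections up to the matching end tag.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The returned String is the raw HTML fragment, and that is what would be handed to the HTML parser in the second step. With XmlPullParser itself, next() reports the content of a CDATA section as an ordinary TEXT event, so getText() on that event should give the same String.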
g tsuji
Ranch Hand

Joined: Jan 18, 2011
Posts: 544
    
Before suggesting how to do it, I would point out that the provider can at any time supply the payload as escaped text rather than as a CDATA section without violating much, so a solution should allow for that freedom.

Further, to simplify life, I would just use a regex to pick up the href, hoping the href is normal enough.

This is what you can do.
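A minimal sketch of the regex idea described above, not the code originally posted. It assumes the HTML fragment has already been read into a String and that each href is a quoted attribute on an <a> tag. The class name HrefFinder and the method hrefs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefFinder {

    // Matches href="..." or href='...' inside an <a ...> start tag.
    // Group 1 is the opening quote, group 2 is the attribute value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every quoted href value found on an <a> tag, in document order. */
    static List<String> hrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }
}
```

Because the method only sees a String, it works the same whether the provider sent the payload as CDATA or as escaped text. It does not decode HTML entities such as &amp; inside the URL, and it ignores unquoted href values; an HTML parser would be the sturdier choice if the markup is less regular.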
sn omen
Greenhorn

Joined: May 23, 2013
Posts: 6
With the above code I'm getting the href value within the <a> tag. How can I retrieve the text within the <span> tag?
 