
Parsing the CDATA section in XML using XML Pull Parser

 
sn omen
Greenhorn
Posts: 8
Sample XML



With the code below I was able to retrieve the <title>, <published> and <author> values within the <entry> tag.



From the <content> tag, how can I retrieve the "href" value of the <a> tag and the text (The BJP is likely to anoint Narendra Modi.....) inside the <p><span> tag?
 
Paul Clapham
Sheriff
Posts: 21111
It looks like the contents of that CDATA section are a fragment of HTML. Not an HTML document, just a bunch of HTML tags. So the first step is to get the contents of the CDATA section into a String (using your XML parser). The second step is to parse that String using an HTML parser -- no XML parser can be relied on to deal with it. Make sure you choose an HTML parser which is capable of dealing with "tag soup".
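A minimal sketch of those two steps, using only classes that ship with the JDK so it can be run as-is. It makes two substitutions: StAX (`XMLStreamReader`) stands in for the XML Pull Parser from the question, and Swing's lenient `ParserDelegator` stands in for a dedicated tag-soup parser such as jsoup or TagSoup. The class name `CdataHtmlSketch` and its methods are made up for illustration, and the sketch collects every text node rather than only the text inside <span>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not code from this thread.
class CdataHtmlSketch {
    public final List<String> hrefs = new ArrayList<>();
    public final StringBuilder text = new StringBuilder();

    /** Step 1: read the character content of the first <content> element. */
    public static String contentText(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "content".equals(reader.getLocalName())) {
                // getElementText() returns plain text and CDATA content alike
                return reader.getElementText();
            }
        }
        return "";
    }

    /** Step 2: run the HTML fragment through a lenient HTML parser. */
    public static CdataHtmlSketch parseFragment(String html) throws Exception {
        CdataHtmlSketch result = new CdataHtmlSketch();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        result.hrefs.add(href.toString());
                    }
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Collects all text nodes, separated by a space
                result.text.append(data).append(' ');
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return result;
    }
}
```

Calling `contentText(feedXml)` and passing the result to `parseFragment(...)` would leave the link targets in `hrefs` and the visible text in `text`. Narrowing the text to a particular tag would need extra state in the callback, which is left out here.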
 
g tsuji
Ranch Hand
Posts: 666
Before suggesting how to do it, I would point out that the provider can at any time supply the payload as plain (escaped) text rather than as a CDATA section without breaking anything, so a solution should allow for that freedom.

Further, to simplify life, I would just use a regex to pick up the href, hoping the href is regular enough.

This is what you can do.
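A minimal sketch of that regex idea (not the poster's original code), assuming the text of <content> has already been read into a String. With XmlPullParser, `next()` reports both plain text and CDATA sections as a TEXT event, so the same String comes out either way. The class name `HrefRegexSketch` and the method `hrefs` are made up for illustration, and the pattern only copes with quoted href values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not code from this thread.
class HrefRegexSketch {
    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    // Group 1 is the quote character, group 2 is the attribute value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every quoted href value found in the HTML fragment, in order. */
    public static List<String> hrefs(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group(2));
        }
        return found;
    }
}
```

Unquoted attribute values or markup hidden inside comments would slip past this pattern, which is the price of skipping a real HTML parser.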
 
sn omen
Greenhorn
Posts: 8
With the above code I'm getting the href value within <a> tag. How can I retrieve the text within the <span> tag??
 