
Get XML's CDATA using SAXParser

 
tasos georgiou
Greenhorn
Posts: 8
Hi. I'm building an RSS reader and I want to get the text content of a CDATA section. I've managed to get whatever is contained inside the CDATA, but I'm only interested in the text, without, for example, links, image src attributes, or greater-than/less-than signs. I've tried to do that to some extent using regex, but that becomes complex. Is there another way to do this, or is regex the only solution?
 
Paul Clapham
Sheriff
Posts: 21111
Hi Tasos, welcome to the Ranch!

Am I correct in guessing that what you get out of the CDATA is some kind of HTML data? And that it isn't necessarily well-formed HTML?

If so, then regex isn't going to be very useful in extracting the text and discarding the markup. Regular expressions don't work well with languages with recursive grammars like HTML. So what I suggest is that you should get an HTML parser and parse the contents of the string. Then extract only the text nodes from the parsed HTML and discard everything else.
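As a rough illustration of that suggestion, here is a minimal sketch that uses the HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`) to collect only the text nodes. The class name `CdataTextExtractor`, the method `extractText`, and the sample string are made up for this example, and a third-party HTML parser could be used in the same way. It assumes the CDATA content has already been read into a `String`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: keeps the text nodes of an HTML fragment, drops the markup.
final class CdataTextExtractor {

    static String extractText(String html) {
        StringBuilder text = new StringBuilder();

        // The parser calls handleText once per run of text between tags;
        // tags and attributes (href, src, ...) never reach this callback.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };

        try {
            // Third argument: ignore any charset declared inside the HTML.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        // Sample CDATA content, invented for the example.
        String cdata = "<p>Breaking news: <a href=\"http://example.com/story\">read more</a>"
                + "<img src=\"http://example.com/pic.jpg\"> today</p>";
        System.out.println(extractText(cdata));
    }
}
```

The exact whitespace in the result depends on how the parser splits the text runs, so it may need some normalising afterwards.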

 
tasos georgiou
Greenhorn
Posts: 8
I've managed to do it for now with regex. Besides, my CDATA comes from an RSS feed and there are only a couple of links and pictures. Thanks for your help.
 
Paul Clapham
Sheriff
Posts: 21111
Yes, if you're only getting simple and predictable data then you can make a regex work. But later if you find the data is not as predictable, or it is more complex, you may find that you can't make a working regex any more.
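To make that concrete, here is a small sketch of a naive tag-stripping regex that handles simple input but breaks once an attribute value contains a `>`. The pattern and the class name `RegexTagStripper` are only an assumed example of the kind of regex in question, not the one actually used in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical example of a naive regex-based tag stripper.
final class RegexTagStripper {

    // Matches '<', then anything that is not '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String simple = "<p>Hello <b>world</b></p>";
        String tricky = "<a title=\"a > b\" href=\"#\">link</a>";

        // Predictable markup: prints "Hello world"
        System.out.println(stripTags(simple));

        // A '>' inside an attribute value ends the match too early,
        // so part of the tag leaks through: prints " b" href="#">link"
        System.out.println(stripTags(tricky));
    }
}
```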
 