Hi,
I am working on a project that collects a lot of data from the web (saved in .80 files), so these pages contain every possible form of data: HTML, XML, HTML inside XML, and CSS, as well as pages like this one:
http://www.usustatesman.com/se/the-statesman-rss-1.544390
I need to remove ALL the tags (of ANY kind) from the content of these pages and get plain text.
Is there a parser that can do this for me, or any other way to remove these tags?
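One possible approach, sketched here under my own assumptions, uses only Python's standard-library `html.parser` (the choice of Python is an assumption; nothing above requires it). `TextExtractor`, `extract_text` and `CDATA_RE` are hypothetical helper names. The sketch drops every tag, discards the contents of `<script>` and `<style>`, unwraps `<![CDATA[...]]>` sections with a regex so the HTML inside them is parsed too, and runs a second pass because RSS feeds often carry entity-escaped HTML (`&lt;p&gt;`) that only turns into real tags after the first pass. A standalone CSS file has no tags, so it would come back essentially unchanged.

```python
import re
from html.parser import HTMLParser

# Assumption: CDATA sections (common in RSS) hold markup we also want stripped,
# so we unwrap them before parsing instead of relying on the parser's CDATA handling.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


class TextExtractor(HTMLParser):
    """Collects text nodes, dropping every tag and the bodies of <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; or &lt; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)


def _strip_once(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


def extract_text(markup: str, passes: int = 2) -> str:
    """Strip tags repeatedly so entity-escaped HTML (e.g. &lt;p&gt;) is removed too."""
    text = CDATA_RE.sub(r"\1", markup)
    for _ in range(passes):
        cleaned = _strip_once(text)
        if cleaned == text:  # nothing left to strip
            break
        text = cleaned
    return text


if __name__ == "__main__":
    page = "<html><head><style>p{color:red}</style></head><body><p>Hello <b>world</b></p></body></html>"
    print(extract_text(page))  # Hello world
```

Joining text nodes with spaces keeps `<p>a</p><p>b</p>` from becoming `ab`, at the cost of splitting words that are broken across inline tags (`Hel<b>lo</b>` gives `Hel lo`).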
Thank you so much!