I need to convert a String that holds the contents of an HTML file into plain text by removing all the HTML tags.
Detail:
I have an HTML file. One of my Java classes reads the entire HTML page into a single string, and now I want all the HTML tags removed from that string.
Thank you very much in advance. Hari
You have a String containing the HTML file and you want to convert it to text?
If we save the file as .txt then it will be saved as text, so what is the problem?
Do you want to remove the HTML tags and extract only the text?
Does your HTML contain images etc.?
Please tell me exactly what you want.
My problem is that I have an activity to run that takes all the mails in my mailbox and puts them in a database.
People may send HTML email. Previously I was just concatenating all the Multipart data of the body into one string and posting that to the database.
But for HTML mails, the HTML tags are also being concatenated and saved in the DB.
So now I just want to remove all the HTML tags and save only the actual content/information.
Harish Ponduri wrote:
So now I just want to remove all the HTML tags and save only the actual content/information.
Well, it depends on how you want it removed. If you don't need it to be 100% accurate and can live with parts of some tags being left behind, then using regular expressions is probably the easiest route.
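To make the regular expression route concrete, here is a minimal sketch. The class and method names (RegexTagStripper, stripTags) are made up for illustration. It assumes a tag is anything between a '<' and the next '>', which is exactly the kind of shortcut that can go wrong: a '>' inside an attribute value, the contents of script and style blocks, and entities such as &amp; are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class RegexTagStripper {
    // Anything from '<' up to the next '>' is treated as a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Runs of whitespace left behind after the tags are removed.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        // Replace each tag with a space so adjacent words do not run together,
        // then collapse the resulting whitespace.
        String withoutTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(stripTags(html)); // prints "Hello world"
    }
}
```

Because every tag becomes a space, text such as "<b>world</b>!" comes out as "world !", so this is only good enough when rough text is acceptable.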
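If you need to be accurate, and your HTML is well formed (in other words, it parses as XML, like XHTML), then using JAXP would work. JAXB is also bundled with Java 6, though it is geared toward binding XML to Java classes rather than pulling text out of arbitrary markup.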
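As a rough idea of what the JAXP route could look like, the sketch below parses the string into a DOM and asks the root element for its text content. The class and method names (XhtmlTextExtractor, extractText) are hypothetical. It assumes the input really is well-formed XML: an unclosed <br> or an undeclared entity such as &nbsp; will make the parser throw, and a DOCTYPE that points at an external DTD may cause the parser to try to fetch it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class XhtmlTextExtractor {
    static String extractText(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            // getTextContent() concatenates every text node below the element,
            // which leaves the text with all markup removed.
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(xhtml)); // prints "Hello world"
    }
}
```

HTML taken from real mails is often not well formed, so expect this to reject a fair share of input.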
If you need to be accurate, and the HTML can be anything, then you'll need a third party parser. Just google for an HTML parser. There are lots of open source ones.
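Which third party parser to pick is up to you, so instead of guessing, here is a sketch that uses the lenient HTML parser that already ships with the JDK in the Swing packages. Note that this is a swapped-in alternative, not one of the third party libraries mentioned above, and it is an older parser that may not cope with every modern page. The class and method names (LenientHtmlText, extractText) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, for illustration only.
class LenientHtmlText {
    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();
        // The parser calls handleText for each run of text it finds between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };
        try {
            // The last argument tells the parser to ignore charset declarations in the
            // document, which do not matter when the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        // Unclosed tags are tolerated, unlike with an XML parser.
        System.out.println(extractText("<html><body><p>Hello <b>world</b><br>bye</body></html>"));
    }
}
```

The exact spacing of the result depends on how the parser splits the text into runs, so treat the output as text to be cleaned up further rather than a faithful rendering.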
Can you briefly explain what JAXP is and where I can get information on it?
Is there a tutorial for it, or do you have a PDF?
Could you please share some documentation related to JAXP?