I need to parse an HTML page and pull the text out of a specific HTML tag. This is the first time I have worked on this. I can read the tags and their ids, and also the complete text of the page, but I have no idea how to read the text enclosed in one specific tag. I have posted my code below. I want to grab only the text within <td id="dept1">Sales</td>, i.e. "Sales" in this case. Please help me.
-- Mazhar
[ October 09, 2008: Message edited by: Mazhar Ismail ]
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35237
I would guess that you need to override the "handleText" method.
Just remember that handleText is not required to deliver all the text in a node in one go. Use a StringBuilder to collect the pieces; you can finish up in the handleEndTag method.
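A rough sketch of that approach, assuming the original code uses the Swing parser (HTMLEditorKit.ParserCallback driven by ParserDelegator), which is where handleText and handleEndTag come from. The class name TagTextExtractor and the extract helper are made up for illustration; they are not from the original post.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text inside the first <tag id="..."> element.
public class TagTextExtractor extends HTMLEditorKit.ParserCallback {
    private final HTML.Tag wantedTag;
    private final String wantedId;
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;        // > 0 while we are inside the wanted element
    private boolean done = false; // true once the wanted element has been closed

    public TagTextExtractor(HTML.Tag wantedTag, String wantedId) {
        this.wantedTag = wantedTag;
        this.wantedId = wantedId;
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (done || !tag.equals(wantedTag)) {
            return;
        }
        if (depth > 0) {
            depth++; // same kind of tag nested inside the wanted one
        } else if (wantedId.equals(attrs.getAttribute(HTML.Attribute.ID))) {
            depth = 1; // found the element we are looking for
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // May be called several times for one element, so append every piece.
        if (depth > 0) {
            text.append(data);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (depth > 0 && tag.equals(wantedTag)) {
            depth--;
            if (depth == 0) {
                done = true; // the collected text is now complete
            }
        }
    }

    public String getText() {
        return text.toString().trim();
    }

    // Parses the given HTML and returns the text of the matching element ("" if none).
    public static String extract(String html, HTML.Tag tag, String id) throws IOException {
        TagTextExtractor callback = new TagTextExtractor(tag, id);
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        }
        return callback.getText();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><table><tr>"
                + "<td id=\"dept1\">Sales</td><td id=\"dept2\">Marketing</td>"
                + "</tr></table></body></html>";
        System.out.println(extract(html, HTML.Tag.TD, "dept1"));
    }
}
```

The flag is set in handleStartTag when the id matches, handleText appends while the flag is set, and handleEndTag marks the result as complete. For a real page you would pass a Reader over the file or URL stream instead of a StringReader.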
Originally posted by Rene Larsen: A HTML page is basically a XML document
If you're lucky. HTML allows (or at least browsers tolerate) improperly nested tags, missing end tags, missing quotes around attribute values, and much more that is not allowed in XML.
That's why XHTML was invented. It's basically HTML that truly is XML. For instance, it requires <br> to be closed: <br />.
[ October 10, 2008: Message edited by: Rob Prime ]