How to get the string offsets of a tag found in a Document object?
posted 6 years ago
I'm currently using Mozilla Html Parser to take a string representation of an HTML response and parse it into a Document object. I then traverse the DOM looking for a tag. Once I find a specific target tag, say one with a specified attribute, I'd like to get the string offsets of where it was found. I need these because I display the HTML in a text pane and highlight the matched area using the offsets.
Before, I was using a regex to find the tag and just passed the matcher offsets to my highlighter class.
However, I think using an HTML parser is the better way; I'm just having difficulty locating the match in the corresponding HTML string.
If you can't figure this stuff out for yourself, then sorry, you have a big headache.
I haven't ever heard of this parser. Normally I don't mind reading the API documentation of open-source packages on behalf of people on forums, but those guys don't have a link to the API on their site. (Downloading the whole thing and extracting the API is beyond my curiosity level.) And they don't have a forum or a mailing list as far as I can see. One page of their documentation says their parser is "compatible with SAX parsers", whatever that means. Normally with a SAX parser you would attach an org.xml.sax.Locator object to your content handler and use that, but I have no idea how that relates to your code.
Edit: after some more googling I found that they do have a forum.
But you could try TagSoup; it has a SAX interface.
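For what it's worth, here is a rough sketch of the Locator idea, using the SAX parser that ships with the JDK on a well-formed snippet. It is not tested against Mozilla Html Parser or TagSoup; any parser that exposes an XMLReader and actually supplies a Locator should accept the same handler, but that is an assumption. The names TagOffsetFinder, find and toOffset are made up for this example. Two caveats: a Locator reports line and column rather than a character offset, so the code converts them, and SAX defines the reported position as the first character after the event's text, so the start of the tag is found by scanning backwards for '<'.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: finds the character span of a start tag via a SAX Locator.
final class TagOffsetFinder {

    /**
     * Returns {start, end} offsets (end exclusive) of the first start tag
     * named tagName that carries attribute attrName, or null if none matches.
     */
    static int[] find(String markup, String tagName, String attrName) throws Exception {
        final int[][] result = new int[1][];
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands over its Locator before parsing starts (if it has one).
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (result[0] != null || locator == null) {
                    return; // already found, or the parser gave us no Locator
                }
                if (!qName.equals(tagName) || atts.getValue(attrName) == null) {
                    return; // not the tag we are looking for
                }
                int line = locator.getLineNumber();
                int column = locator.getColumnNumber();
                if (line < 1 || column < 1) {
                    return; // position not available
                }
                // The Locator points just past the start tag, so this is the end offset.
                int end = toOffset(markup, line, column);
                // Assumption: no '<' appears inside the tag's attribute values.
                int start = markup.lastIndexOf('<', end - 1);
                result[0] = new int[] {start, end};
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), handler);
        return result[0];
    }

    /** Converts a 1-based line/column pair into a 0-based character offset. */
    static int toOffset(String text, int line, int column) {
        int offset = 0;
        for (int i = 1; i < line; i++) {
            offset = text.indexOf('\n', offset) + 1; // jump to the start of the next line
        }
        return offset + column - 1;
    }
}
```

The two numbers in the returned array could then be handed to the highlighter class the same way the regex matcher offsets were. If the text pane normalizes line endings, the offsets may need adjusting to match what the pane holds.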