Hello!
My situation: on a site written in Java there are articles containing HTML. Page A shows only a preview of each article (at most 500 characters). If an article is longer than 500 characters, it is cut down to 500 characters and followed by a "...more" link to page B, where the visitor can read the whole article.
The problem is that the articles contain HTML. For example, if the cut text looks like "<table><tr><td><b> Hello, click <a href="...">here", the tags left open will break the layout of the whole page.
I'm now looking for a tool that could "clean up" such HTML code by removing unclosed tags. I looked at jTidy (a Java implementation of W3C HTML Tidy), but it seems too complicated, and in the end I simply couldn't find the feature I need in the jTidy javadocs!
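To make the request concrete, here is a rough sketch of the behaviour I am after, using only the JDK. The names TagCloser and closeOpenTags are made up for illustration, and this is not how jTidy works. It is a variant that closes the dangling tags rather than removing them, which should protect the rest of the page equally well. It assumes attribute values never contain ">" and it does not handle an entity such as "&amp;" being cut in half.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagCloser {
    // group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closing tag
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*?(/?)>");

    // Elements that never take a closing tag (partial list, enough for a sketch)
    private static final Set<String> VOID =
            new HashSet<>(Arrays.asList("br", "hr", "img", "input", "meta", "link"));

    static String closeOpenTags(String fragment) {
        // Drop a tag that the cut split in half, e.g. a trailing "<a hre"
        int lastOpen = fragment.lastIndexOf('<');
        if (lastOpen > fragment.lastIndexOf('>')) {
            fragment = fragment.substring(0, lastOpen);
        }

        // Track the currently open elements, most recent first
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(fragment);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (VOID.contains(name) || !m.group(3).isEmpty()) {
                continue; // nothing to close for <br>, <img ... />, etc.
            }
            if (m.group(1).isEmpty()) {
                open.push(name);                 // opening tag
            } else {
                open.removeFirstOccurrence(name); // closing tag: forget the nearest match
            }
        }

        // Append closing tags for whatever is still open, innermost first
        StringBuilder result = new StringBuilder(fragment);
        while (!open.isEmpty()) {
            result.append("</").append(open.pop()).append('>');
        }
        return result.toString();
    }
}
```

With this, the fragment from my example would come out with "</a></b></td></tr></table>" appended. Note that the 500-character cut itself would still count markup as characters, so the visible text may be much shorter than 500.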
Maybe you could give me some advice?
P.S. I'm considering an alternative: simply stripping ALL tags from the text with a regular expression. But that is a solution I hope I won't be forced to use :/
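For reference, that fallback would look roughly like the sketch below. The names TagStripper, stripTags and preview are made up for illustration. The pattern is crude: it treats anything between "<" and ">" as a tag, so a literal "<" in the text would also be swallowed, and entities such as "&amp;" are left as they are.

```java
import java.util.regex.Pattern;

final class TagStripper {
    // Anything from "<" up to the next ">" is considered a tag
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    // Strip first, then cut, so the cut can never land inside a tag
    static String preview(String html, int maxLength) {
        String text = stripTags(html);
        return text.length() <= maxLength ? text : text.substring(0, maxLength);
    }
}
```

Stripping before cutting at least means the 500 characters are all visible text, but all formatting and links in the preview are lost.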