On your first question, hmmm... if you really don't want any third-party tools, then the only thing I can think of is to parse the HTML using SAX and add any missing end elements. What if a start element is missing? Removing the end element would be my answer. HTML ignores it anyway, so it's not really adding value to your HTML.
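Here is a rough sketch of that idea, not a finished converter. One caveat: a stock SAX XML parser stops at the first well-formedness error, so this sketch stands in for the event stream with a simple regex tokenizer and keeps a stack of open elements. The names TagBalancer and balance are made up for illustration, and the list of void elements is only a partial assumption. It does not quote attribute values, escape entities, or cope with a ">" inside an attribute value or tags inside comments.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: balances start/end tags so every element is closed.
final class TagBalancer {
    // Partial list (an assumption) of HTML elements that never take an end tag.
    private static final Set<String> VOID = new HashSet<>(
            Arrays.asList("br", "hr", "img", "input", "meta", "link"));

    // group 1 = "/" for an end tag, group 2 = element name, group 3 = attributes
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>");

    static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // currently open elements, innermost first
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // copy the text between tags unchanged
            last = m.end();
            String name = m.group(2).toLowerCase();
            boolean isEnd = !m.group(1).isEmpty();
            if (!isEnd) {
                String attrs = m.group(3).trim();
                boolean selfClosed = attrs.endsWith("/");
                if (selfClosed) {
                    attrs = attrs.substring(0, attrs.length() - 1).trim();
                }
                String head = "<" + name + (attrs.isEmpty() ? "" : " " + attrs);
                if (VOID.contains(name) || selfClosed) {
                    out.append(head).append("/>"); // empty element, nothing to track
                } else {
                    out.append(head).append(">");
                    open.push(name);
                }
            } else if (open.contains(name)) {
                // Add the missing end tags for anything still open inside this element.
                while (!open.peek().equals(name)) {
                    out.append("</").append(open.pop()).append(">");
                }
                out.append("</").append(open.pop()).append(">");
            }
            // Otherwise: an end tag with no matching start tag, so it is dropped.
        }
        out.append(html.substring(last));
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append(">");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p>a<b>b</p>")); // prints <p>a<b>b</b></p>
    }
}
```

Note that this only balances tags; it knows nothing about HTML's implied end tags, so two <li> start tags in a row end up nested rather than as siblings.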
Additionally, search this forum for HTML to XML converter. It was discussed before, but I don't have the links handy.
On your second question, we have an FAQ just for that.
- m