building dom tree from html file

Frank Piorko
Greenhorn

Joined: Apr 30, 2001
Posts: 2
Hi all,
I have been given the task of building a DOM tree from an HTML file.
I have two questions about this.
1. Does anyone know a good way to build a DOM tree from an
HTML file? (The HTML is not well-formed, which is a problem for a DOM parser.)
2. Does anyone know a good API that can do this?
Thanks for your help.
Frank Piorko
Ajith Kallambella
Sheriff

Joined: Mar 17, 2000
Posts: 5782
Frank - anything that is not a well-formed XML document is not an XML document. You will first have to think about making it well-formed. Any XML parser will error out if you try to parse a malformed document.
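
To make the well-formedness point concrete, here is a minimal sketch using the JAXP DOM parser that ships with the JDK. The class name WellFormedCheck, the helper tryParse, and the sample markup are made up for illustration; they are not from any particular library or from Frank's files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class WellFormedCheck {

    /** Returns the DOM tree for well-formed markup, or null if the XML parser rejects it. */
    public static Document tryParse(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            // Raised for markup that is not well-formed, e.g. an unclosed <br> or <p>.
            return null;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical "tag soup": <br> and the first <p> are never closed.
        String html = "<html><body><p>one<br><p>two</body></html>";
        // The same content written as well-formed XHTML.
        String xhtml = "<html><body><p>one<br/></p><p>two</p></body></html>";

        System.out.println(tryParse(html) == null ? "rejected" : "parsed");   // prints "rejected"
        System.out.println(tryParse(xhtml) == null ? "rejected" : "parsed");  // prints "parsed"
    }
}
```

The first string is ordinary HTML that a browser accepts, yet the XML parser refuses it because the tags do not nest and close properly. The second string carries the same content as XHTML and yields a DOM tree.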


Open Group Certified Distinguished IT Architect. Open Group Certified Master IT Architect. Sun Certified Architect (SCEA).
Holger Prause
Ranch Hand

Joined: Oct 09, 2000
Posts: 47
Yeah - I am also looking for such a solution. I know HTML is not well-formed, but there must be some custom parser out there that builds a DOM tree from HTML.

Ajith Kallambella
Sheriff

Joined: Mar 17, 2000
Posts: 5782
Why not tweak the HTML and make it well-formed??
Remember - a malformed XML document isn't an XML document in the first place. So parsing has no meaning in that context!
Frank Piorko
Greenhorn

Joined: Apr 30, 2001
Posts: 2
I cannot make the HTML files well-formed by hand.
There are too many of them. Every few days the application
receives many HTML files from other programmers
who are not familiar with the XML/HTML problem.
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
Frank, as Ajith said, you can convert your HTML to XHTML (well-formed HTML). You do not need to do it "by hand"; just search for "Converting HTML to XHTML" on the Internet, and you'll find something like this: http://www.vbxml.com/xhtml/articles/html_to_xhtml/default.asp
Or you can check this site: http://www.xmlsoftware.com/convert/
W4F looks good.
or HEX on http://www.xmlsoftware.com/parsers/
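
If none of those tools fit, one more option is to read the HTML with a forgiving parser and build the DOM tree yourself. Below is a rough sketch of that idea using the lenient HTML parser bundled with Swing (javax.swing.text.html.parser.ParserDelegator), which is a different technique from the converters listed above. The class names LenientHtmlToDom and TreeBuilder are made up for illustration. The sketch copies only elements and text, not attributes or comments, and it assumes every tag name is also a legal XML name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class, for illustration only.
public class LenientHtmlToDom {

    /** Parses possibly malformed HTML and returns a W3C DOM tree of its elements and text. */
    public static Document parse(String html) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            // The Swing parser reports tags and text through callbacks as it reads.
            new ParserDelegator().parse(new StringReader(html), new TreeBuilder(doc), true);
            return doc;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Turns parser callbacks into DOM nodes, tracking the currently open element. */
    private static final class TreeBuilder extends HTMLEditorKit.ParserCallback {
        private final Document doc;
        private Node current;

        TreeBuilder(Document doc) {
            this.doc = doc;
            this.current = doc;
        }

        private Element append(HTML.Tag tag) {
            // A DOM document allows only one root element, so anything that
            // would become a second root is attached to the existing root.
            Node parent = (current == doc && doc.getDocumentElement() != null)
                    ? doc.getDocumentElement() : current;
            Element element = doc.createElement(tag.toString());
            parent.appendChild(element);
            return element;
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            current = append(tag);
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (current != doc) {
                current = current.getParentNode();
            }
        }

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // Closing tags of elements unknown to the parser arrive here flagged
            // with ENDTAG; skip them so they do not become empty elements.
            if (attrs.isDefined(HTML.Attribute.ENDTAG)) {
                return;
            }
            append(tag);
        }

        @Override
        public void handleText(char[] data, int pos) {
            // Text directly under the document node is not allowed in a DOM tree.
            if (current != doc) {
                current.appendChild(doc.createTextNode(new String(data)));
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>one<p>two<br></body></html>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getElementsByTagName("p").getLength());
    }
}
```

This parser only understands old HTML (roughly HTML 3.2), so treat the sketch as a starting point rather than a finished solution. For a large batch of files from other people, a dedicated HTML-to-XHTML converter is probably the more robust choice.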

[This message has been edited by Mapraputa Is (edited April 30, 2001).]


Uncontrolled vocabularies
"I try my best to make *all* my posts nice, even when I feel upset" -- Philippe Maquet
 