Forum: JavaRanch, XML and Related Technologies
Extract the headings from an HTML file and an XML file

JayaSiji Gopal
Hi,

I have two external systems, and both of them contain HTML files. How do I pull the HTML headings out of these files and compare them to see if there are any differences?

Any ideas?


SCJP 1.4, SCWCD 1.4

Thanks in advance!
Jayashree.
Lasse Koskela
You could:
1) do an HTTP GET request for the page,
2) read the HTML until you encounter the <head> element,
3) read the HTML into a StringBuffer until you encounter the </head> element,
4) construct an XML DOM document containing just the <head> element, and
5) compare that DOM with the corresponding DOM from the other page using XMLUnit's Diff class.
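The steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not Lasse's code: it assumes Java 11+ (for java.net.http.HttpClient) and assumes each page's <head> section is well-formed XML (XHTML style, e.g. <meta ... />). Instead of reading into a StringBuffer, it uses a regular expression to cut out the <head>...</head> text. Because XMLUnit is not part of the JDK, step 5 swaps in the standard DOM Level 3 method Node.isEqualNode as a stand-in for XMLUnit's Diff: it ignores attribute order, and the sketch strips whitespace-only text nodes so indentation does not matter, but unlike XMLUnit it only answers "same or different" and does not report where the differences are. The class and method names (HeadCompare, fetch, extractHead, toDom, sameHead) are hypothetical.

```java
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: a sketch of the five steps, not a library API.
public class HeadCompare {

    // Matches <head> or <head attr="..."> (but not <header>) up to the first </head>.
    private static final Pattern HEAD = Pattern.compile(
            "<head(?:\\s[^>]*)?>.*?</head\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 1: fetch the raw HTML with an HTTP GET (java.net.http, Java 11+).
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Steps 2-3: keep only the text from <head> through </head>.
    static String extractHead(String html) {
        Matcher m = HEAD.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no <head> element found");
        }
        return m.group();
    }

    // Step 4: parse the <head> fragment into a DOM (assumes well-formed XML).
    static Document toDom(String headXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(headXml)));
            doc.getDocumentElement().normalize(); // merge adjacent text nodes
            stripWhitespace(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("<head> is not well-formed XML", e);
        }
    }

    // Remove whitespace-only text nodes so indentation differences are ignored.
    private static void stripWhitespace(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards because the NodeList is live and shrinks as we remove.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
        }
    }

    // Step 5 (stand-in for XMLUnit's Diff): structural equality of the two <head>
    // elements; attribute order is ignored, element order and text are not.
    static boolean sameHead(String htmlA, String htmlB) {
        Node a = toDom(extractHead(htmlA)).getDocumentElement();
        Node b = toDom(extractHead(htmlB)).getDocumentElement();
        return a.isEqualNode(b);
    }
}
```

To compare two live pages you would call something like HeadCompare.sameHead(HeadCompare.fetch(url1), HeadCompare.fetch(url2)). Real-world HTML heads often contain unclosed tags such as <meta charset="utf-8">, in which case toDom throws an IllegalArgumentException; those pages would need an HTML-aware parser or a cleanup pass (e.g. HTML Tidy) before the XML step. With XMLUnit on the classpath, its Diff class could replace isEqualNode and also tell you what differs.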


Author of Test Driven (2007) and Effective Unit Testing (2013)
 