Forum: JavaRanch, XML and Related Technologies
Extract the headings from an HTML file and an XML file

JayaSiji Gopal
Ranch Hand

Joined: Sep 27, 2004
Posts: 303
Hi,

I have two external systems, and both of them contain HTML files. How do I pull the HTML headings out of these files and compare them to see whether there are any differences?

Any ideas?


SCJP 1.4, SCWCD 1.4

Thanks in advance!
Jayashree.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
You could:
1) do an HTTP GET request for the page,
2) read the HTML until you encounter the <head> element,
3) copy the HTML from there into a StringBuffer until you encounter the closing </head> tag,
4) construct an XML DOM document with just the <head> element in it, and
5) compare that DOM with the similar DOM built from the other page, using XMLUnit's Diff class.
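The five steps above can be sketched in Java roughly as follows. This is a minimal, JDK-only sketch under several assumptions: it needs Java 11+ for `java.net.http.HttpClient` (step 1), it cuts out the `<head>` section with a regular expression rather than reading the stream line by line, and it assumes that section is well-formed XML (self-closed `<meta ... />` and `<link ... />` tags, quoted attributes, no HTML-only entities such as `&nbsp;`), which a lot of real-world HTML is not. For step 5 it swaps in the DOM Level 3 method `org.w3c.dom.Node.isEqualNode` instead of XMLUnit's `Diff`, so nothing outside the JDK is needed. The class name `HeadCompare` and all of its method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative, not from any library.
class HeadCompare {

    // Matches "<head>" or "<head ...>" (but not "<header>") up to the first "</head>".
    private static final Pattern HEAD =
            Pattern.compile("<head[\\s>].*?</head>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 1: plain HTTP GET of the page (java.net.http, Java 11+).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Steps 2-3: keep only the text from <head> through </head>.
    static String extractHead(String html) {
        Matcher m = HEAD.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no <head> section found");
        }
        return m.group();
    }

    // Step 4: parse the <head> fragment into a DOM. Assumes the fragment is
    // well-formed XML; otherwise the parser rejects it.
    static Document toDom(String headXml) {
        // Drop whitespace between tags so indentation is not reported as a difference.
        String compact = headXml.replaceAll(">\\s+<", "><");
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(compact)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("<head> is not well-formed XML", e);
        }
    }

    // Step 5: compare the two <head> elements. isEqualNode stands in for
    // XMLUnit's Diff here; it ignores attribute order but not child order.
    static boolean sameHead(String htmlA, String htmlB) {
        Document a = toDom(extractHead(htmlA));
        Document b = toDom(extractHead(htmlB));
        return a.getDocumentElement().isEqualNode(b.getDocumentElement());
    }

    public static void main(String[] args) {
        String pageA = "<html><head><title>Home</title><meta name=\"a\" content=\"1\"/></head><body/></html>";
        String pageB = "<html>\n<head>\n  <title>Home</title>\n  <meta content=\"1\" name=\"a\"/>\n</head></html>";
        // Only whitespace and attribute order differ, so this prints true.
        System.out.println(sameHead(pageA, pageB));
    }
}
```

Against live pages the call would be `sameHead(fetch(urlA), fetch(urlB))` with your own two URLs. Note that `isEqualNode` only answers yes or no; if you need to know *what* differs, XMLUnit 1.x's `DetailedDiff` wraps a `Diff` and lists every difference through `getAllDifferences()`. If the pages' `<head>` sections are not well-formed XML, step 4 would need a tolerant HTML parser instead of `DocumentBuilder`.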


Author of Test Driven (2007) and Effective Unit Testing (2013)