Extract the headings from an html file and an xml file

JayaSiji Gopal
Ranch Hand

Joined: Sep 27, 2004
Posts: 303
Hi,

I have two external systems, and both of them contain HTML files. How do I pull the HTML headings out of these files and compare them to see if there are any differences?

Any ideas?


SCJP 1.4, SCWCD 1.4
Thanks in advance!
Jayashree.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
You could
1) do an HTTP GET request for each page,
2) read the HTML until you encounter the <head> element,
3) read the HTML into a StringBuffer until you encounter the </head> element,
4) construct an XML DOM document with just the <head> element in it, and
5) compare the DOM with another page's similar DOM using XMLUnit's Diff class.
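
The steps above can be sketched roughly as follows. This is only a minimal illustration under a few assumptions, not a definitive implementation: the class name HeadCompare and its methods are hypothetical, pages are assumed to be UTF-8, and the <head> section is assumed to be well-formed XML (real-world HTML often is not, in which case it would need to be cleaned up first with an HTML tidying tool). To keep the sketch free of external dependencies, step 5 uses the standard DOM method Node.isEqualNode rather than XMLUnit's Diff class; isEqualNode only answers "same or not", so XMLUnit remains the better choice if you need a report of what differs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; all names here are illustrative assumptions.
final class HeadCompare {

    // Matches <head ...> through </head>, ignoring case, without matching <header>.
    private static final Pattern HEAD = Pattern.compile(
            "<head(\\s[^>]*)?>.*?</head\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 1: fetch the page with an HTTP GET (assumes the page is UTF-8).
    static String fetch(String url) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Steps 2-3: keep only the text from <head> up to and including </head>.
    static String extractHead(String html) {
        Matcher m = HEAD.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no <head> element found");
        }
        return m.group();
    }

    // Step 4: build a DOM document containing just the <head> element.
    static Document parseHead(String headXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(headXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("<head> is not well-formed XML", e);
        }
    }

    // Step 5: compare the two <head> DOM trees (stand-in for XMLUnit's Diff).
    static boolean sameHead(String htmlA, String htmlB) {
        Document a = parseHead(extractHead(htmlA));
        Document b = parseHead(extractHead(htmlB));
        return a.getDocumentElement().isEqualNode(b.getDocumentElement());
    }
}
```

With two page URLs of your own, a call such as HeadCompare.sameHead(HeadCompare.fetch(urlA), HeadCompare.fetch(urlB)) would tell you whether the two <head> sections match. Note that the comparison is strict: differences in whitespace or element-name case between the two heads count as differences.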


Author of Test Driven (2007) and Effective Unit Testing (2013)
 
 