
Extract the headings from an html file and an xml file

JayaSiji Gopal
Ranch Hand

Joined: Sep 27, 2004
Posts: 303
Hi,

I have two external systems, both of which contain HTML files. How do I pull the HTML headings out of these files and compare them to see if there are any differences?

Any ideas?


SCJP 1.4, SCWCD 1.4

Thanks in advance!
Jayashree.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
You could:
1) do an HTTP GET request,
2) read the HTML until you encounter the <head> element,
3) read the HTML into a StringBuffer until you encounter the </head> element,
4) construct an XML DOM document with just the <head> element in it, and
5) compare the DOM with another page's similar DOM using XMLUnit's Diff class.
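
Steps 2 to 5 could be sketched roughly as follows. This is only a minimal illustration under a few assumptions: both pages have already been fetched into strings (step 1 is left out), the <head> section is well-formed XML (XHTML-style, no unclosed tags or HTML-only entities), and the JDK's built-in Node.isEqualNode stands in for XMLUnit's Diff class so the example has no external dependencies. The class and method names (HeadCompare, extractHead, sameHead) are made up for this example.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class HeadCompare {

    // Matches from the opening <head ...> tag through the closing </head> tag.
    // The [\s>] after "head" keeps the pattern from matching <header>.
    private static final Pattern HEAD = Pattern.compile(
            "<head[\\s>].*?</head\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Steps 2-3: pull the <head>...</head> fragment out of the page text.
    // Returns null when the page has no <head> element.
    static String extractHead(String html) {
        Matcher m = HEAD.matcher(html);
        return m.find() ? m.group() : null;
    }

    // Step 4: build a DOM document from the fragment.
    // Assumption: the fragment is well-formed XML; otherwise parse() throws.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Step 5: structural comparison of the two <head> elements.
    // isEqualNode is a stand-in for XMLUnit's Diff; it only answers
    // "equal or not" and treats whitespace differences as differences.
    static boolean sameHead(String htmlA, String htmlB) throws Exception {
        String a = extractHead(htmlA);
        String b = extractHead(htmlB);
        if (a == null || b == null) {
            return a == null && b == null;
        }
        return parse(a).getDocumentElement()
                .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String pageA = "<html><head><title>Home</title></head><body><p>one</p></body></html>";
        String pageB = "<html><head><title>Home</title></head><body><p>two</p></body></html>";
        String pageC = "<html><head><title>About</title></head><body><p>one</p></body></html>";
        System.out.println(sameHead(pageA, pageB)); // same head, different body
        System.out.println(sameHead(pageA, pageC)); // different title
    }
}
```

If you need to know what differs rather than just whether the two differ, XMLUnit's Diff is the better fit for step 5. Real-world HTML is often not well-formed, in which case the fragment would need cleaning up before it can be parsed as XML.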


Author of Test Driven (2007) and Effective Unit Testing (2013)
 
 