
Extract the headings from an html file and an xml file

 
JayaSiji Gopal
Ranch Hand
Posts: 303
Hi,

I have two external systems, both of which contain HTML files. How do I pull the HTML headings out of these files and compare them to see if there are any differences?

Any ideas?
 
Lasse Koskela
author
Sheriff
Posts: 11962
You could:
1) do an HTTP GET request for each page,
2) read the HTML until you encounter the <head> element,
3) read the HTML into a StringBuffer until you encounter the </head> element,
4) construct an XML DOM document with just the <head> element in it, and
5) compare that DOM with the other page's corresponding DOM using XMLUnit's Diff class.
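
The steps above can be sketched roughly as follows, using only the JDK (Java 11+ for HttpClient). Treat it as a sketch under some assumptions: the class and method names (HeadCompare, fetch, extractHead, parseHead, sameHead) are made up for illustration; the <head> section is assumed to be well-formed XML (XHTML-style, with closed <meta/> tags and no HTML-only entities such as &nbsp;), otherwise the parser will throw; and the DOM method Node.isEqualNode stands in for XMLUnit's Diff class, so it only answers yes or no instead of listing the individual differences.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; all names here are illustrative.
final class HeadCompare {

    // Matches <head ...> through </head>, case-insensitively, across lines.
    // The "\s" after "head" keeps it from matching <header>.
    private static final Pattern HEAD = Pattern.compile(
            "<head(\\s[^>]*)?>.*?</head\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 1: fetch the page with an HTTP GET.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Steps 2-3: cut the <head>...</head> section out of the page.
    static String extractHead(String html) {
        Matcher m = HEAD.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no <head> element found");
        }
        return m.group();
    }

    // Step 4: build a DOM document containing just the <head> element.
    // Assumes the extracted markup is well-formed XML.
    static Document parseHead(String headXml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(headXml)));
        stripBlankText(doc.getDocumentElement());
        return doc;
    }

    // Removes whitespace-only text nodes so indentation does not count as a difference.
    private static void stripBlankText(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
        }
    }

    // Step 5: compare the two <head> DOMs (JDK stand-in for XMLUnit's Diff).
    static boolean sameHead(String htmlA, String htmlB)
            throws ParserConfigurationException, SAXException, IOException {
        Document a = parseHead(extractHead(htmlA));
        Document b = parseHead(extractHead(htmlB));
        return a.getDocumentElement().isEqualNode(b.getDocumentElement());
    }
}
```

With two page URLs in hand, the call would look something like HeadCompare.sameHead(HeadCompare.fetch(urlA), HeadCompare.fetch(urlB)). If the pages are ordinary HTML rather than XHTML, you would probably need to run them through an HTML-cleaning parser first, and if you want to know what differs rather than just whether anything does, that is where XMLUnit's Diff earns its keep.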
 