
Html to Java Object

 
Mark Spritzler
I'm sure there has to be an easy way to do this: I want to scrape a small piece of information from a web page. The page is HTML, obviously, so with XPath I should be able to get to the exact element in the page, read its value, and automatically create a simple Java Value Object to hold that piece of information.

I want to use JAXB 2 with an annotation on the Value Object class, so that I can just call an unmarshaller to get the data from the HTML into the Java object.

Does anyone have a good link to example code like this?

Thanks

Mark
 
Bear Bibeault
Be aware that HTML is not XML. If your source is XHTML, then you'll have an easier time of things as the markup will be well-formed (if correct).
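
To illustrate the well-formedness point, here is a minimal sketch using only the JDK's built-in XML parser. The class name WellFormedCheck and the sample markup are made up for this example. It shows that a strict XML parser accepts XHTML-style markup but rejects ordinary HTML with an unclosed tag.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    /** Returns true if the markup parses as XML, false if the parser rejects it. */
    static boolean isWellFormed(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised for markup that is not well-formed XML.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // XHTML style: every element is closed.
        String xhtml = "<html><body><p>Hello<br/></p></body></html>";
        // Common HTML style: <br> is never closed, which is an error in XML.
        String html = "<html><body><p>Hello<br></p></body></html>";

        System.out.println(isWellFormed(xhtml)); // true
        System.out.println(isWellFormed(html));  // false
    }
}
```

If the page fails a check like this, the standard XML tooling (including an XML unmarshaller) cannot read it directly, and the markup has to be cleaned up first.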

I haven't had this need in quite some time, but way back when there seemed to be plenty of 3rd-party libraries out there to scrape HTML.
 
Ulf Dittmer
Not sure if it satisfies your definition of "Java Value Object", but both TagSoup and NekoHTML can make a DOM object out of HTML (after regularizing it). So you would get a Node or Element object to play with.
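
Once a DOM Document is available, the JDK's own XPath API can pull a single value out of it and hand it to a plain value object. The following is a rough sketch under stated assumptions: the PageValue class, the sample markup, and the XPath expression are all invented for illustration, and the Document is built here from well-formed markup with the JDK parser as a stand-in for whatever TagSoup or NekoHTML would produce from real HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, for illustration only.
class DomToValueObject {

    /** Hypothetical value object holding the single scraped value. */
    static final class PageValue {
        private final String text;
        PageValue(String text) { this.text = text; }
        String getText() { return text; }
    }

    /** Evaluates the XPath expression against the document and wraps the result. */
    static PageValue extract(Document doc, String xpathExpression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) returns the string value of the first matching node.
        String value = xpath.evaluate(xpathExpression, doc);
        return new PageValue(value.trim());
    }

    /** Stand-in for an HTML-cleaning parser: only accepts well-formed markup. */
    static Document parseWellFormed(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        String markup = "<html><body><div id=\"price\"> 42.50 </div></body></html>";
        Document doc = parseWellFormed(markup);
        PageValue value = extract(doc, "//div[@id='price']");
        System.out.println(value.getText()); // 42.50
    }
}
```

With a Document produced by an HTML-cleaning parser, the element names may come back namespaced or in a different case depending on the parser's configuration, so the XPath expression may need adjusting.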
 