
Html to Java Object

Mark Spritzler

Joined: Feb 05, 2001
Posts: 17276

So, I am sure there has to be an easy way to do this, but I want to scrape a small piece of information from a web page. The page is HTML, obviously, and with XPath I should be able to get to the exact element in that page. Then I want the value in that element, and I want to automatically create a simple Java Value Object to hold that piece of information.

I want to use JAXB 2 with an annotation on the Value Object class, so I can just call an unmarshaller to get the data from the HTML into the Java object.

Does anyone have a good link to example code like this?



Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63529

Be aware that HTML is not XML. If your source is XHTML, then you'll have an easier time of things as the markup will be well-formed (if correct).

I haven't had this need in quite some time, but way back when, there seemed to be plenty of third-party libraries out there for scraping HTML.

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42959
Not sure if it satisfies your definition of "Java Value Object", but both TagSoup and NekoHTML can make a DOM object out of HTML (after regularizing it). So you would get a Node or Element object to play with.
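
For what it's worth, here is a minimal sketch of the XPath-to-value-object step using only JDK classes. It assumes the markup is already well-formed (XHTML-like, with no DOCTYPE or namespace declaration); for real-world HTML the Document would instead come from a tidying parser such as TagSoup or NekoHTML, and the XPath step should look similar, though element-name case and namespaces may differ. The names XhtmlScrape, PageValue, and extract are made up for illustration, and this does not use JAXB.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses well-formed markup and pulls one value out via XPath.
class XhtmlScrape {

    static PageValue extract(String xhtml, String xpathExpr) {
        try {
            // Plain JDK XML parser; this only works if the input is well-formed.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // Evaluate the expression and take the string value of the first match.
            XPath xpath = XPathFactory.newInstance().newXPath();
            String raw = xpath.evaluate(xpathExpr, doc);

            return new PageValue(raw.trim());
        } catch (Exception e) {
            // Wrap parser/XPath failures so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not extract value", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page, assumed well-formed.
        String page = "<html><body><p>Price: "
                + "<span id=\"price\"> 42.50 </span></p></body></html>";
        PageValue value = extract(page, "//span[@id='price']");
        System.out.println(value.getText()); // prints 42.50
    }
}

// Hypothetical value object holding the scraped text.
class PageValue {
    private final String text;

    PageValue(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }
}
```

If the expression matches nothing, evaluate returns an empty string, so the value object would hold "" rather than null; a real version would probably want to check for that.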