How to grab data from <div> under <table> by using htmlparser library?

Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
Hello everyone.

I am trying to use the htmlparser library to grab data (price and item name) from a web page that looks very much like the following one.

<table class="item">
<tr>
<td>
<div class="title">Desktop</div>
</td>
<td>
<div class="price">$1,200</div>
</td>
</tr>
</table>

My code is very complex.

I used two parsers: one to search for the <div> tag whose class equals "title", and the other to search for the <div> tag whose class equals "price".
I do not know the htmlparser library well. I started using it only two weeks ago, and I find it very hard to find any samples for it on Google.
Does anyone have a better idea?
Any help is appreciated.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18134

You could help us to help you: how about providing us with a link to the documentation? Or a link to the product's home page?

(You can post URLs with the "URL" button which you will see above the box you post in.)
Cameron ax
Thank you.
This is a general question, not about one particular case.
I do not have a link for this.

I want to write a small program to grab data from a website such as Amazon, eBay, or any other online shopping site.
I found that the product info is always in a <div> or <span> placed under a <table> tag.
I know how to get all the <div> tags from a single page,
but I want to know whether there is a way to get all the <div>s from one particular <table> tag, and which method in the htmlparser library does that.
Thank you.
Paul Clapham

Well, if you don't actually have the htmlparser software, then this seems like kind of a pointless question.

I assume you don't have it, otherwise you would have a link to where you downloaded it from. Or did somebody give you a copy? Maybe you could ask them?
Cameron ax
Sorry,
I thought you were asking about the target HTML page I want to parse.

So you are asking about the htmlparser software?

Yes, I do have a link for it:

http://htmlparser.sourceforge.net/

That is the one I am using.
Paul Clapham

Sorry, I don't see anything in their "Samples" page there. I suppose the "FilterBuilder" example might be sort of what you're looking for.

However your project is a rather dubious one anyway. All of the sites you mentioned, I'm pretty sure, have terms of use which forbid people from accessing the sites via computer programs. You should at least check out the terms of use on each site before you start trying to scrape its pages.
Cameron ax

The target websites I listed were just examples, to show what my target HTML page will look like.

I just want to know whether there is a method in the htmlparser library that can find a DIV tag sitting under a TABLE tag. Does anyone have experience grabbing data from an HTML page like this? That is all.

Thank you for the quick response and your help.
Paul Clapham

I'm sure you could do what you asked with that HTMLParser project. I doubt that there's a specific method to do it, though. Although now that I look through the API documentation, there's a Parser class and it has a method named "extractAllNodesThatMatch". Possibly -- actually quite likely now that I look at the docs more -- you could use that.

But yeah, the project really doesn't have much in the way of useful examples. You're pretty much left to trawl through the docs and figure it out for yourself. Although looking at their examples wouldn't do you any harm.
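
For reference, here is a minimal sketch of how extractAllNodesThatMatch might be used to collect the <div>s under one particular <table>. It assumes the htmlparser jar is on the classpath. The class name DivsUnderTable and the inline HTML string are made up for illustration, and the filter classes (TagNameFilter, HasAttributeFilter, AndFilter) come from the org.htmlparser.filters package as I read the API docs. Treat it as a starting point, not something tested against real pages.

```java
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical class name; requires the htmlparser library on the classpath.
class DivsUnderTable {
    public static void main(String[] args) throws ParserException {
        // Sample markup copied from the first post.
        String html = "<table class=\"item\"><tr>"
                + "<td><div class=\"title\">Desktop</div></td>"
                + "<td><div class=\"price\">$1,200</div></td>"
                + "</tr></table>";

        Parser parser = Parser.createParser(html, "UTF-8");

        // Step 1: find every <table class="item"> in the page.
        NodeFilter itemTables = new AndFilter(
                new TagNameFilter("table"),
                new HasAttributeFilter("class", "item"));
        NodeList tables = parser.extractAllNodesThatMatch(itemTables);

        // Step 2: for each table, search its descendants for <div> tags.
        for (int i = 0; i < tables.size(); i++) {
            Node table = tables.elementAt(i);
            NodeList children = table.getChildren();
            if (children == null) {
                continue; // empty table, nothing to search
            }
            // 'true' makes the search recursive (tr -> td -> div).
            NodeList divs = children.extractAllNodesThatMatch(
                    new TagNameFilter("div"), true);
            for (int j = 0; j < divs.size(); j++) {
                // Print the text content of each div found under this table.
                System.out.println(divs.elementAt(j).toPlainTextString().trim());
            }
        }
    }
}
```

The second argument (true) on the NodeList call asks for a recursive search. As far as I can tell from the docs, the one-argument form on a NodeList only tests the nodes in the list itself, so calling it on a list of tables would not reach tags nested inside them.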
Cameron ax
I tried the extractAllNodesThatMatch method. First, I used it to get all the <table> tags from the web page; it returns a node list, which I call tablelist.
After that, I tried to use the same method to get the <span> tags from tablelist.
But somehow it does not work.

For now, I use regular expressions to get the content I want, but I am still looking for a simpler way to parse an HTML file.
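
A rough sketch of the regular-expression approach, using only java.util.regex. The class name RegexDivScrape and the extract helper are hypothetical, and the pattern assumes the exact markup shown in the first post: double-quoted class attributes and no tags nested inside the <div>. Real pages will break it easily, so it is a stopgap rather than a parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; uses only the JDK.
class RegexDivScrape {
    // Matches <div class="title">TEXT</div> or <div class="price">TEXT</div>
    // where TEXT contains no nested tags.
    private static final Pattern CLASSED_DIV = Pattern.compile(
            "<div\\s+class=\"(title|price)\"\\s*>([^<]*)</div>",
            Pattern.CASE_INSENSITIVE);

    // Returns a map from class name ("title" or "price") to the div's text.
    // If a class appears more than once, the last occurrence wins.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = CLASSED_DIV.matcher(html);
        while (m.find()) {
            fields.put(m.group(1).toLowerCase(), m.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<table class=\"item\"><tr>"
                + "<td><div class=\"title\">Desktop</div></td>"
                + "<td><div class=\"price\">$1,200</div></td>"
                + "</tr></table>";
        System.out.println(extract(html));
    }
}
```

Because the map keeps only one value per class, a page with several items would need a list of maps, one per <table>, instead.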
 