Is there any possible way to parse an html page when it "completed"?

Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
Hello everyone.

I am not sure whether there is a way to get data from an HTML file after its scripts have finished running.

Most HTML files come with <script> elements.

When I try to parse one in Java, I can only get the data as it is before the script runs.

But an HTML page can look very different after its scripts have finished running.

My question is how to get the data once the page has finished.

I have been searching for this for the past two weeks and could not find anything.

I will take JavaScript as an example.

Here is my test.html for testing:



And here is my Java file




Sure enough, it cannot get both pieces of text from my test HTML file.

Does anyone have a better idea?
Any help is appreciated.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42264
    
  64
It will definitely not work if you read the HTML from the file system like that, because there is no JavaScript interpreter that would execute the script.
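
To make the point concrete, here is a minimal sketch using only the JDK. The page content, the class name `StaticParseDemo`, and the helper `divText` are hypothetical and are not the original poster's files; the regular expression merely stands in for any static parser that reads markup without running scripts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaticParseDemo {
    // Hypothetical page: the div is empty in the markup and is only
    // filled in when a JavaScript engine actually runs the script.
    static final String HTML =
        "<html><body>\n"
      + "<div id=\"out\"></div>\n"
      + "<script>document.getElementById('out').innerHTML = 'filled by script';</script>\n"
      + "</body></html>";

    // Returns the raw markup between <div id="..."> and </div>, or null if
    // no such div exists. Nothing is executed; the text is only inspected.
    static String divText(String html, String id) {
        Pattern p = Pattern.compile(
            "<div id=\"" + Pattern.quote(id) + "\">(.*?)</div>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The div is found, but its content is still empty.
        System.out.println("[" + divText(HTML, "out") + "]"); // prints "[]"
    }
}
```

The text 'filled by script' is present in the file, but only as part of the script source, so a parser that does not execute JavaScript never sees it inside the div.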

You could try a library like HtmlUnit, which has excellent support for JavaScript, although I'm not sure if that works with files from the file system (as opposed to those accessed via HTTP). I'm also not sure if that would give you the altered HTML after running the scripts, but it's worth checking out.
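
A rough sketch of what that might look like follows. It assumes the HtmlUnit library is on the classpath and uses the `org.htmlunit` package names of HtmlUnit 3.x and later (the 2.x releases use `com.gargoylesoftware.htmlunit` instead). The class name `RenderedPageDemo`, the helper `divTextAfterScripts`, and the test page are hypothetical, and the 5-second wait is an arbitrary choice.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class RenderedPageDemo {
    // Loads the page at the given URL, lets HtmlUnit run its scripts, and
    // returns the text content of the element with the given id.
    static String divTextAfterScripts(String url, String id) throws Exception {
        try (WebClient client = new WebClient()) {
            // Real-world pages often contain scripts HtmlUnit cannot fully run;
            // do not abort the whole load because of them.
            client.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = client.getPage(url);
            // Give timers and background requests up to 5 seconds to finish.
            client.waitForBackgroundJavaScript(5_000);
            return page.getElementById(id).getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical local test page whose div is filled in by a script.
        Path file = Files.createTempFile("test", ".html");
        Files.writeString(file,
            "<html><body><div id=\"out\"></div>"
          + "<script>document.getElementById('out').innerHTML = 'filled by script';</script>"
          + "</body></html>");
        System.out.println(divTextAfterScripts(file.toUri().toString(), "out"));
    }
}
```

If the script runs as expected, the div's text should reflect what the script wrote rather than the empty markup. `page.asXml()` can be used in the same place to dump the whole document as it stands after the scripts have run.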


Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
HtmlUnit.

I will give it a try. Thank you.

I have another question about parsing online HTML.

For some HTML pages, when I view the page source in a web browser, I can see the data there.
But when I read the same page from Java, I cannot reach the data.
When I save the page and read it as a local file, I can read the data.

Why is that?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42264
    
  64
Can you give an example of such a page? Which Java code are you using to read it - the one you posted above?
Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
Yes, I use HtmlCleaner for parsing HTML files.

I do not have an example right now. I will find one. Please check my post later.

Thank you for your help. It is appreciated.
I have been searching for this for a looong time.
I have no clue at all.
Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
Here is my example.

This is a link for searching for iMac on eBay.
eBay.com

When I look at its HTML source, I find that <div class="s2 distLoc"> contains all the information about the seller.
Here is my code.



But with this code, I cannot reach the data I expected.
The <div class="s2 distLoc"> is totally blank until a JavaScript function is executed via "Customise view" -> "Seller information".

Can you tell me how to do that?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42264
    
  64
Try HtmlUnit. HtmlCleaner has no concept of JavaScript.
Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
I have never used HtmlUnit before. It seems that if I want to execute JavaScript, I need to work out the name of the JavaScript function.
Is that right?

For a page like eBay's, how can I find out which JavaScript I should execute?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42264
    
  64
Oh, the JavaScript is only executed if you click on something? Then you need to dig into the HTML/JavaScript to see what "Customise view" -> "Seller information" does, and try to emulate that with the HtmlUnit API. Sounds tricky and brittle at best.

With a company like eBay I would assume that they offer an API that gives you the information you're looking for in a much more direct way, though.
Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
When I click "Customise view", a menu pops up, and then I need to tick the "Seller information" checkbox to display all the seller information.

I will try to find out which function does that. I may bring the question back here if I fail to find it.

Thank you so much.
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61413
    
  67

eBay has a very rich API for getting all sorts of information. The way that you are going about this is like trying to drive a car standing on your head in the seat while wearing a straightjacket.


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18651
    
    8

Also, from your eBay User Agreement:

You agree that you will not use any robot, spider, scraper, or other automated means to access our sites for any purpose without our express handwritten permission.
Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
Bear Bibeault wrote: eBay has a very rich API for getting all sorts of information. The way that you are going about this is like trying to drive a car standing on your head in the seat while wearing a straightjacket.


LOL. Life would be much less fun if we did not try things we think cannot be done.
Cameron ax
Greenhorn

Joined: Oct 04, 2012
Posts: 18
Paul Clapham wrote: Also, from your eBay User Agreement:

You agree that you will not use any robot, spider, scraper, or other automated means to access our sites for any purpose without our express handwritten permission.


Of course, I won't do anything against eBay policy. It just happens to be a good case study.
 