JavaRanch » Java Forums » Engineering » General Computing

Screen scraping (extracting data from a webpage) in Java

muthu bharathi
Ranch Hand

Joined: Dec 10, 2008
Posts: 97
Hi everybody,


I want to know how to do screen scraping in Java. Or is there an open-source tool that extracts data from a website and stores it as XML, Excel, or some other format?


Please help me as soon as possible.


--
Regards,
M. Bharathi
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41124
    
I'd probably use a library like jWebUnit for downloading the pages, and extracting the relevant parts. Then you can use any XML- or XLS-creating library you like for storing the interesting parts.
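For the storing step, here is a minimal sketch that uses only the JDK's built-in XML classes (`javax.xml`, `org.w3c.dom`) rather than any particular third-party library. It assumes the interesting parts have already been extracted into a list of strings; the class name and the `items`/`item` element names are made up for illustration.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: writes already-extracted values as <items><item>...</item></items>.
class ScrapedDataToXml {
    static String toXml(List<String> values) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("items");
            doc.appendChild(root);
            for (String value : values) {
                Element item = doc.createElement("item");
                item.setTextContent(value); // characters such as & are escaped when serialized
                root.appendChild(item);
            }
            // Identity transform: serializes the DOM tree as XML text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }
}
```

Calling `toXml(List.of("Widget", "A & B"))` yields an `<items>` element with two `<item>` children, with the `&` escaped as `&amp;`. Writing Excel files would need a separate spreadsheet library in place of the transformer step.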


muthu bharathi
Ranch Hand

Joined: Dec 10, 2008
Posts: 97
Hi Ulf,

Thanks for your quick response.

I searched the net and found an open-source tool. It works fine for HTTP only, but I need to scrape data from HTTPS as well.

I'm a newbie. I tried to write the code using jWebUnit, but it's not working. Can you give me some sample code for jWebUnit? I'd also like to know whether jWebUnit supports HTTPS, because I need to extract data from HTTPS pages too.

Awaiting your reply.

--
With Thanks
M. Bharathi
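On the HTTPS question, the JDK itself (Java 11+) can download HTTPS pages independently of jWebUnit. The sketch below uses `java.net.http.HttpClient`, not jWebUnit's API; the class name, method names and User-Agent string are made up for illustration. The TLS handshake checks the server certificate against the JVM's default truststore, so a site with a certificate from a widely used CA should need no extra setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical downloader: fetches page source over HTTP or HTTPS.
class HttpsFetcher {
    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Builds a GET request; only http:// and https:// URLs are accepted.
    static HttpRequest buildRequest(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme();
        if (!"https".equalsIgnoreCase(scheme) && !"http".equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException("Expected an http(s) URL: " + url);
        }
        return HttpRequest.newBuilder(uri)
                .header("User-Agent", "scraper-sketch/0.1") // illustrative value
                .GET()
                .build();
    }

    // Downloads the page as a String; certificate checks use the default truststore.
    String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

`new HttpsFetcher().fetch("https://...")` returns the raw HTML, which can then be handed to whatever extraction step you use.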
Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 347
This may be what you need: it's the first result for "jwebunit https" on Google.

The site talks about untrusted certificates, so jWebUnit may already trust a number of certificates from certificate authorities. Perhaps it uses the Java truststore itself?
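Which CAs the JVM trusts by default can be checked through the standard JSSE API. A minimal sketch (the class name is made up): passing `null` to `TrustManagerFactory.init` selects the default truststore, normally the JRE's `cacerts` file (a `jssecacerts` file takes precedence if present) unless overridden with the `javax.net.ssl.trustStore` system property. Whether jWebUnit's HTTP engine relies on this default is not something the thread confirms.

```java
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: exposes the JVM's default X.509 trust manager.
class DefaultTrustStore {
    static X509TrustManager defaultTrustManager() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init((KeyStore) null); // null selects the default truststore
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    return (X509TrustManager) tm;
                }
            }
            throw new IllegalStateException("No X509TrustManager available");
        } catch (NoSuchAlgorithmException | KeyStoreException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`defaultTrustManager().getAcceptedIssuers().length` reports how many CA certificates are trusted. A site whose certificate chains to a CA outside that set would need the certificate imported (for example with `keytool -importcert`) or a custom truststore.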
muthu bharathi
Ranch Hand

Joined: Dec 10, 2008
Posts: 97
Hi,


Thanks for your response. I have scraped data over HTTP/HTTPS using an open-source web data extractor tool.

But there's one issue with that tool: I cannot scrape data from HTTPS pages that use a session. Please help me or guide me on this issue. I searched the net, but had no success.


--
with thanks,
M. Bharathi
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38025
    
Too difficult a question for beginners. Moving.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41124
    
But one issue in that tool. i cannot scrap the data from https having session(the page has session).

Why not? jWebUnit supports cookies, if that's what is used for the sessions. If the sessions use URL rewriting, then there's no problem to begin with.
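The cookie case can be sketched with the JDK alone (Java 11+): an `HttpClient` built with a `java.net.CookieManager` stores `Set-Cookie` headers (such as a `JSESSIONID` session cookie) and sends them back on later requests to the same site, so a login POST followed by page GETs through the same client stay in one session. This is not jWebUnit's API; the class name, the `formEncode`/`postForm`/`get` helpers and any form field names are hypothetical and depend on the target site.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical session-aware scraper: one cookie jar shared by all requests.
class SessionAwareScraper {
    // In-memory cookie store; accepts cookies only from the server that set them.
    final CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);

    // Responses store their Set-Cookie headers in `cookies`; matching cookies
    // are added to later requests automatically.
    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(cookies)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Encodes form fields as application/x-www-form-urlencoded (UTF-8).
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Login step: POST the site's login form; the session cookie is kept in `cookies`.
    HttpResponse<String> postForm(URI uri, Map<String, String> fields)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    // Fetches a page behind the login using the same session.
    String get(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

With URL rewriting instead, the session ID travels inside the links themselves (for example as a `;jsessionid=...` path parameter in Java web applications), so following the page's own links keeps the session.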
muthu bharathi
Ranch Hand

Joined: Dec 10, 2008
Posts: 97
Hi,


I tried a lot, but I can't get any output. Please give me some sample source code.

--
Regards,
M. Bharathi
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41124
    
What have you tried? Post a relevant code excerpt. What, exactly, happened when you ran it?
nhu dinh thuan
Greenhorn

Joined: Jul 20, 2004
Posts: 3
muthu bharathi wrote: Is there an open-source tool that extracts data from a website and stores it as XML, Excel, or some other format?


You can view screenshots here: http://binhgiang.sourceforge.net/xmlalbum/screenshots.html

You can also download the free version of the web data extractor from http://binhgiang.sourceforge.net/site/download.jsp.

VDer is built on a Java HTML parser, which you can download from http://sourceforge.net/projects/binhgiang/files/htmlparser/HTMLParser2_Build9.zip/download. It is open source.
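The thread doesn't show the API of VDer or of that HTML parser. As a general illustration of parser-based extraction (rather than searching the raw text), here is a minimal sketch using the HTML parser that ships with the JDK, `javax.swing.text.html.parser.ParserDelegator`; it collects the `href` of every `<a>` tag. The class name is made up, and this built-in parser is based on an HTML 3.2 DTD, so a dedicated HTML library is usually a better fit for modern pages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical extractor: returns the href values of all <a> tags, in document order.
class LinkExtractor {
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

The same callback style works for other data: overriding `handleText` while inside a `<td>` tag, for instance, collects table cell text.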
muthu bharathi
Ranch Hand

Joined: Dec 10, 2008
Posts: 97
Hi,


Thanks for your valuable guidance....

One thing I need to know: can it scrape HTTPS data?

--
With thanks,
M. Bharathi
 
 