Screen scraping (extracting data from a web page) in Java

 
Ranch Hand
Posts: 97
Hi everybody,

I want to know how to do screen scraping in Java. Is there an open source tool that extracts data from a website and stores it as XML, Excel, or a similar format?

Please help me as soon as possible.


--
Regards,
M. Bharathi
 
Rancher
Posts: 43016
I'd probably use a library like jWebUnit for downloading the pages and extracting the relevant parts. Then you can use any XML- or XLS-creating library you like for storing the interesting parts.
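
To make the download / extract / store steps concrete, here is a minimal sketch that uses only the JDK (Java 11 or later) instead of jWebUnit, so it is not jWebUnit's API. The class name `ScrapeSketch`, its method names, and the choice of "all link targets" as the interesting part are assumptions for illustration. A regular expression is only adequate for simple, well-formed pages; a real HTML parser is the safer choice for anything else.

```java
import java.io.StringWriter;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class; none of these names come from jWebUnit.
class ScrapeSketch {
    // Captures the double-quoted href value of <a ...> tags.
    private static final Pattern HREF =
            Pattern.compile("<a\\s[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Step 1: download the page body as text (works for http and https URLs).
    static String download(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 2: extract the relevant parts, here every link target in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Step 3: store the extracted values as a small XML document.
    static String toXml(List<String> links) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("links");
            for (String link : links) {
                xml.writeStartElement("link");
                xml.writeCharacters(link); // escapes &, < and > as needed
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass a URL as the first argument, or run without arguments on a tiny inline page.
        String html = args.length > 0
                ? download(args[0])
                : "<p><a href=\"https://example.com/a\">A</a></p>";
        System.out.println(toXml(extractLinks(html)));
    }
}
```

Keeping the three steps in separate methods means the extraction can be tried on a saved HTML string before any network access is involved.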
 
muthu bharathi
Ranch Hand
Posts: 97
Hi Ulf,

Thanks for your quick response.

I searched the net and found an open source tool, but it works only for HTTP, and I need to scrape data from HTTPS pages.

I'm a newbie. I tried to write the code using jWebUnit, but it's not working. Can you give me sample code for jWebUnit? I also want to know whether jWebUnit supports HTTPS, because I need to extract data from HTTPS pages too.

Awaiting your reply.

--
With Thanks
M. Bharathi
 
Ranch Hand
Posts: 378
This may be what you need: the first result for "jwebunit https" on Google.

The site talks about untrusted certificates, so jWebUnit may already trust a number of certificates from certificate authorities. It might be using the Java truststore itself?
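
For reference, this is roughly how the truststore question looks in plain JDK code. It is a sketch, not jWebUnit's API, and whether jWebUnit delegates to the JVM's TLS stack is only a guess here. The class name `TrustStoreSketch` and its method names are made up for illustration. Passing `null` as the keystore makes the JDK fall back to its default truststore (the CA certificates shipped with the JVM); passing a keystore you loaded yourself is the usual way to accept a self-signed or otherwise untrusted certificate.

```java
import java.io.InputStream;
import java.net.http.HttpClient;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper class for illustration only.
class TrustStoreSketch {
    // Builds a TLS context that trusts the certificates in the given keystore.
    // A null keystore means "use the JVM's default truststore".
    static SSLContext contextTrusting(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, tmf.getTrustManagers(), null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not set up TLS trust", e);
        }
    }

    // Loads a keystore file, for example one created with the JDK's keytool.
    static KeyStore loadTrustStore(Path file, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            store.load(in, password);
        }
        return store;
    }

    // An HTTP client whose https connections are validated against the given context.
    static HttpClient clientFor(SSLContext context) {
        return HttpClient.newBuilder().sslContext(context).build();
    }

    public static void main(String[] args) throws Exception {
        // With two arguments (keystore path, password) use a custom truststore,
        // otherwise fall back to the JVM default.
        KeyStore store = args.length == 2
                ? loadTrustStore(Path.of(args[0]), args[1].toCharArray())
                : null;
        SSLContext context = contextTrusting(store);
        System.out.println("TLS context ready: " + context.getProtocol());
    }
}
```

If the site's certificate is signed by a well-known authority, the default truststore is normally enough and no custom keystore is needed.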
 
muthu bharathi
Ranch Hand
Posts: 97
Hi,


Thanks for your response. I have scraped data over both HTTP and HTTPS using an open source web data extractor tool.

But there is one issue with that tool: I cannot scrape data from HTTPS pages that use a session. Please help me or guide me on this issue. I searched the net, but without success.


--
with thanks,
M. Bharathi
 
Marshal
Posts: 69889
Too difficult a question for beginners. Moving.
 
Ulf Dittmer
Rancher
Posts: 43016

But there is one issue with that tool: I cannot scrape data from HTTPS pages that use a session.


Why not? jWebUnit supports cookies, if that's what's used for the sessions. If the sessions use URL rewriting, then there's no problem to begin with.
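
To illustrate the cookie case without relying on jWebUnit's API, here is a sketch using the JDK's own `CookieManager` and `HttpClient` (Java 11 or later). The class `SessionSketch`, its method names, the example.com URLs, and the `JSESSIONID` value are all assumptions for illustration. The idea is that one cookie jar is shared by every request, so the session cookie set by the login response is sent back on later requests. The `remember` helper only simulates what the client does automatically when a real response arrives.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only.
class SessionSketch {
    // One jar shared by all requests is what keeps the session alive.
    static CookieManager newCookieJar() {
        return new CookieManager(null, CookiePolicy.ACCEPT_ALL);
    }

    // A client that stores cookies from responses and replays them on later requests.
    static HttpClient clientWith(CookieManager jar) {
        return HttpClient.newBuilder()
                .cookieHandler(jar)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Simulates a response header such as "Set-Cookie: JSESSIONID=...; Path=/app".
    static void remember(CookieManager jar, URI uri, String setCookieValue) {
        try {
            jar.put(uri, Map.of("Set-Cookie", List.of(setCookieValue)));
        } catch (IOException e) {
            throw new IllegalStateException("could not store cookie", e);
        }
    }

    // Returns the Cookie header value the jar would attach to a request for this URI.
    static String cookieHeaderFor(CookieManager jar, URI uri) {
        try {
            Map<String, List<String>> headers = jar.get(uri, Map.of());
            return String.join("; ", headers.getOrDefault("Cookie", List.of()));
        } catch (IOException e) {
            throw new IllegalStateException("could not read cookies", e);
        }
    }

    public static void main(String[] args) {
        CookieManager jar = newCookieJar();
        HttpClient client = clientWith(jar); // use this client for the login and every later page
        remember(jar, URI.create("https://example.com/app/login"), "JSESSIONID=abc123; Path=/app");
        System.out.println(cookieHeaderFor(jar, URI.create("https://example.com/app/data")));
    }
}
```

With URL rewriting the session id travels inside the links of each downloaded page, so following those links as they appear is enough and no cookie handling is involved.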
 
muthu bharathi
Ranch Hand
Posts: 97
Hi,


I have tried a lot, but I can't get any output. Please give me some sample source code.




--
regds,
M. Bharathi
 
Ulf Dittmer
Rancher
Posts: 43016
What have you tried? Post a relevant code excerpt. What, exactly, happened when you ran it?
 
Greenhorn
Posts: 3

muthu bharathi wrote: Hi everybody,


I want to know how to do screen scraping in Java. Is there an open source tool that extracts data from a website and stores it as XML, Excel, or a similar format?


Please help me as soon as possible.


--
Regards,
M. Bharathi



See the screenshots here: http://binhgiang.sourceforge.net/xmlalbum/screenshots.html

You can download the free version of the web data extractor from http://binhgiang.sourceforge.net/site/download.jsp.

VDer is built on a Java HTML parser, which you can download from http://sourceforge.net/projects/binhgiang/files/htmlparser/HTMLParser2_Build9.zip/download. It is open source.
 
muthu bharathi
Ranch Hand
Posts: 97
Hi,


Thanks for your valuable guidance.

One thing I need to know: can it scrape data from HTTPS pages?



--
With Thanks
M. bharathi
 