crawler

 
Greenhorn
Posts: 16
Can a web crawler extract the HTML source of a URL that only works in a cookie-enabled browser?
 
Rancher
Posts: 1337
Some can, some can't, so it depends on which one in particular you're asking about. If you haven't settled on a specific one, start here: http://java-source.net/open-source/crawlers
 
Sunil Baboo
Greenhorn
Posts: 16
Hi,

Thanks for your quick response. Well, it's java spider. I know it works great for other web links, but I am not sure it can work for a web link that requires a cookie-enabled browser.


Thank you


 
Lester Burnham
Rancher
Posts: 1337
What is "java spider" - some software you wrote? Downloaded? Bought?
 
Ranch Hand
Posts: 577
@Lester

Well, it's java spider.


Sunil's referring to JSpider in the link you provided.
 
Sunil Baboo
Greenhorn
Posts: 16
Actually, my concern is this:
I have an application that crawls web links. If a link requires a login, it logs in as well. But some sites require a cookie-enabled browser, and when I attempt to log in to them, I always get sent back to the login page. In my application, the crawler first requests the login page, retrieves the cookie information from the response header, and then requests another page after login, attaching the cookies the server sent. This has worked for most sites, but not for sites that ask for a cookie-enabled browser. Am I missing something that would let my application crawl those pages after login?

Thanks in advance.
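One possible cause, offered as an assumption rather than a diagnosis: many servers issue a new session ID at the moment a user logs in (a common defence against session fixation), so cookies captured only from the login page go stale after the login POST. The sketch below is a hypothetical `SimpleCookieJar` class built on the JDK's `java.net.HttpCookie.parse`; it merges the `Set-Cookie` headers from every response and rebuilds the `Cookie` header for the next request. It deliberately ignores domain, path, and expiry rules, so it is only an illustration.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical minimal cookie jar (not part of JSpider or any library).
 * Merges Set-Cookie headers from EVERY response, not just the login page.
 * Ignores domain, path and expiry, so treat it as a sketch only.
 */
public class SimpleCookieJar {
    // Keyed by cookie name: a later Set-Cookie with the same name replaces the old value.
    private final Map<String, String> cookies = new LinkedHashMap<>();

    /** Call this with the Set-Cookie header values of every response you receive. */
    public void storeFromResponse(List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                cookies.put(cookie.getName(), cookie.getValue());
            }
        }
    }

    /** Value to send in the Cookie header of the next request, e.g. "a=1; b=2". */
    public String cookieHeader() {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        SimpleCookieJar jar = new SimpleCookieJar();
        // 1. GET the login page: the server hands out a pre-login session ID.
        jar.storeFromResponse(List.of("JSESSIONID=before-login; Path=/; HttpOnly"));
        // 2. POST the credentials: the server may replace the session ID here.
        jar.storeFromResponse(List.of("JSESSIONID=after-login; Path=/; HttpOnly",
                                      "remember=yes; Path=/"));
        System.out.println(jar.cookieHeader()); // JSESSIONID=after-login; remember=yes
    }
}
```

If your crawler only reads cookies from the first response, step 2 is exactly where it would keep sending the old `before-login` ID and land back on the login page.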
 
Bartender
Posts: 2856
The sites that need cookies enabled usually store the session ID in a cookie and do not support URL-based session IDs (URL rewriting).
So if your client doesn't handle cookies, you don't get a session, and hence you get the login page.
Hope that helps
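To illustrate the point about cookie handling, here is a small offline sketch using the JDK's `java.net.CookieManager`. It feeds a simulated `Set-Cookie` response header (the `example.com` URLs and the `JSESSIONID` value are made up) into a manager and then asks which `Cookie` header it would send on the next request. With `CookiePolicy.ACCEPT_NONE` nothing is replayed, which is what the server sees as "no session"; with `CookiePolicy.ACCEPT_ALL` the session cookie goes back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class CookiePolicyDemo {
    /** Returns the Cookie header values a CookieManager with this policy would send. */
    static List<String> cookieHeaderFor(CookiePolicy policy) {
        CookieManager manager = new CookieManager(null, policy); // null = default in-memory store
        try {
            // Simulated response from the login page (hypothetical site and session ID).
            manager.put(URI.create("https://example.com/login"),
                    Map.of("Set-Cookie", List.of("JSESSIONID=abc123; Path=/")));
            // Headers the manager would add to the next request on the same site.
            Map<String, List<String>> headers =
                    manager.get(URI.create("https://example.com/members"), Map.of());
            return headers.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cookieHeaderFor(CookiePolicy.ACCEPT_NONE)); // []
        System.out.println(cookieHeaderFor(CookiePolicy.ACCEPT_ALL));  // [JSESSIONID=abc123]
    }
}
```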
 
Naren Chivukula
Ranch Hand
Posts: 577
Hi Sunil,
Ideally, web applications should fall back to URL rewriting to track sessions when cookies are not enabled. But it seems the sites you are accessing don't do that. You can get into the sites you are unable to log in to by enabling cookies in your browser.
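To make "URL rewriting" concrete: when cookies are off, a Servlet container such as Tomcat can embed the session ID in each link as a `;jsessionid=` path parameter (this is what `HttpServletResponse.encodeURL` does when it decides cookies aren't available). The `rewrite` helper below is hypothetical and uses no servlet API; it only imitates that URL format so you can see what a crawler would receive. On sites that do this, a cookie-less crawler keeps its session simply by following the rewritten links.

```java
public class UrlRewritingSketch {
    /**
     * Hypothetical helper imitating container-style URL rewriting:
     * inserts ";jsessionid=<id>" after the path, before any query string or fragment.
     */
    static String rewrite(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) cut = query;
        if (fragment >= 0 && fragment < cut) cut = fragment;
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        // Prints http://example.com/shop/cart;jsessionid=ABC123?item=42
        System.out.println(rewrite("http://example.com/shop/cart?item=42", "ABC123"));
    }
}
```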
 
Sunil Baboo
Greenhorn
Posts: 16
Hello Amit,

Yes, you are right: the sites that need cookies enabled store the session in a cookie, so a crawler fails on any site that demands a cookie-enabled browser. Am I right?

I know that the crawler has no cookie support. Can we enable cookies in the crawler through code, something like a crawler with an embedded web browser?
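As a sketch, assuming Java 11+ and its built-in `java.net.http.HttpClient` (this is not JSpider's own API): giving the client a `CookieManager` makes it store cookies from the responses it receives, including redirect responses, and send matching cookies back automatically, much like a browser does. The login URL, target URL, and the `username`/`password` form field names in the hypothetical `loginAndFetch` method are placeholders you would replace with the real form's values. If a site sets its cookie from JavaScript, no plain HTTP client will see it, and you would need to drive a real or headless browser instead.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CookieEnabledCrawler {
    private final HttpClient client = HttpClient.newBuilder()
            // Stores Set-Cookie headers and replays matching cookies on later requests.
            .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
            // Follows redirects (e.g. the 302 after a successful login), except https -> http.
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    HttpClient client() {
        return client;
    }

    /** Hypothetical login flow; URLs and form field names are assumptions. */
    public String loginAndFetch(String loginUrl, String user, String password, String targetUrl)
            throws IOException, InterruptedException {
        // 1. GET the login page so the server can hand out its session cookie.
        client.send(HttpRequest.newBuilder(URI.create(loginUrl)).GET().build(),
                HttpResponse.BodyHandlers.discarding());

        // 2. POST the credentials; the cookie handler attaches the session cookie.
        String form = "username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
        client.send(HttpRequest.newBuilder(URI.create(loginUrl))
                        .header("Content-Type", "application/x-www-form-urlencoded")
                        .POST(HttpRequest.BodyPublishers.ofString(form))
                        .build(),
                HttpResponse.BodyHandlers.discarding());

        // 3. Fetch the protected page with whatever cookies the login produced.
        return client.send(HttpRequest.newBuilder(URI.create(targetUrl)).GET().build(),
                HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        CookieEnabledCrawler crawler = new CookieEnabledCrawler();
        System.out.println("Cookie handler present: "
                + crawler.client().cookieHandler().isPresent());
    }
}
```

Because the same `HttpClient` instance is reused for all three requests, the cookies from the login page and from the login response are shared automatically; creating a new client per request would throw the session away.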
 