Some can, some can't, so it depends on which one in particular you're asking about. If you haven't settled on a specific one, start here: http://java-source.net/open-source/crawlers
Thanks for your quick response. Well, it's a Java spider. I know it works well for other web links, but I am not sure it can work for links that require a cookie-enabled browser.
Actually, my concern is:
I have an application that crawls web links. If a link requires login, it logs in as well. But some sites require a cookie-enabled browser. When I attempt to log in to such a site, I am always sent back to the login page. In my application, the crawler first requests the login page, retrieves the cookie information from the response headers, and then requests another page after login, attaching the cookies the server sent. This has been working for most sites, but not for sites that ask for a cookie-enabled browser. Am I missing something that would let my application crawl those pages after login?
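For reference, the manual flow described above (read the cookies from the login response, then attach them to the next request) can be sketched roughly as follows. This is only an illustration under assumptions, not the actual crawler code: the class name `CookieHeaderSketch`, the method `buildCookieHeader`, and the sample cookie values are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class CookieHeaderSketch {

    // Turns the values of the Set-Cookie response headers into a single
    // value suitable for the Cookie request header.
    static String buildCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            // Keep only the leading "name=value" part; attributes such as
            // Path or HttpOnly are instructions to the client and are not sent back.
            int semicolon = value.indexOf(';');
            String pair = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Hypothetical Set-Cookie values, as a login response might send them.
        List<String> fromLoginResponse = List.of(
                "JSESSIONID=abc123; Path=/; HttpOnly",
                "lang=en; Path=/");
        System.out.println(buildCookieHeader(fromLoginResponse));
        // prints: JSESSIONID=abc123; lang=en
    }
}
```

With `HttpURLConnection`, the input list would typically come from `conn.getHeaderFields().get("Set-Cookie")` and the result would be attached with `nextConn.setRequestProperty("Cookie", header)`. If any cookie from the login exchange is left out, the server sees no session and may answer with the login page again.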
Sites that need cookies enabled usually store the session in a cookie and do not support URL-based session IDs or URL encoding.
So if your cookie functionality isn't on, you don't get a session, and hence you get the login page.
Hope that helps
Hi Sunil,
Ideally, web applications should fall back to URL rewriting for session handling when cookies are not enabled. But it seems the sites you are accessing do not do that. You can get into the sites you are unable to log in to by enabling cookies in your browser.
Cheers,
Naren
(OCEEJBD6, SCWCD5, SCDJWS, SCJP1.4 and Oracle SQL 1Z0-051)
Yes, you are right, the sites that need cookies enabled store the session in a cookie. So the crawler fails to crawl sites that demand a cookie-enabled browser. Am I right?
I know that the crawler has no cookies enabled. Can we enable cookies on the crawler by code? Something like a crawler with an embedded web browser?
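On the question of enabling cookies by code: if the crawler is built on `java.net.HttpURLConnection` (an assumption, since the thread does not say which HTTP client it uses), one possible approach is to install a `java.net.CookieManager` as the default cookie handler, which makes the JVM store and resend cookies much like a browser does. The sketch below is only illustrative; the class name `CrawlerCookieJar`, its helper methods, and the `example.com` URLs are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class CrawlerCookieJar {

    // Installs a JVM-wide cookie jar. After this call, HttpURLConnection
    // stores cookies from responses and attaches them to later requests.
    static CookieManager enableCookies() {
        CookieManager jar = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(jar);
        return jar;
    }

    // Feeds one Set-Cookie value into the jar as if it had arrived in a
    // response from the given URL (used here to demonstrate without a network).
    static void store(CookieManager jar, String url, String setCookieValue) {
        try {
            jar.put(URI.create(url), Map.of("Set-Cookie", List.of(setCookieValue)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the Cookie header values the jar would attach to a request for the URL.
    static List<String> cookiesFor(CookieManager jar, String url) {
        try {
            return jar.get(URI.create(url), Map.of()).getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager jar = enableCookies();
        // Hypothetical session cookie from a login response.
        store(jar, "http://example.com/login", "JSESSIONID=abc123; Path=/");
        System.out.println(cookiesFor(jar, "http://example.com/members"));
    }
}
```

In a real crawler only the `enableCookies()` call would be needed, made once at startup before the first request; `store` and `cookiesFor` exist here just to show what the jar does. Other HTTP libraries, such as Apache HttpClient, keep their own cookie store and would need to be configured through their own API instead.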