Writing a spider

Dale DeMott
Ranch Hand

Joined: Nov 02, 2000
Posts: 515
Okay, so I have an application that needs to be spidered. The requirements are:
1) It needs to be able to fill out a form field before spidering.
2) It needs to handle JavaScript.
3) It needs to start at a specified location after the form field has been filled out.
I was thinking about writing this using HttpUnit. Has anyone written a spider with it? Does anyone have other ideas, or know of programs I might be able to use? Any ideas would be appreciated.
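
For illustration only, the crawl part of these requirements (start at a specified location and follow links) could be sketched roughly as below. This is not HttpUnit code: the class name `SpiderSketch`, the `fetcher` function, and the `maxPages` limit are all hypothetical names made up for the sketch. It assumes the form submission and any session handling happen inside whatever `fetcher` does, it does not execute JavaScript, it does not resolve relative URLs, and it finds links with a naive regex that will miss some in badly formed HTML.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpiderSketch {
    // Naive href matcher (an assumption for this sketch); malformed HTML needs a tolerant parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value found in the page text, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Breadth-first crawl from startUrl. The fetcher maps a URL to its HTML (or null on failure);
    // in a real spider it would wrap an HTTP client that has already submitted the form.
    static Set<String> crawl(String startUrl, Function<String, String> fetcher, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already seen this URL
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // fetch failed or page unknown
            }
            for (String link : extractLinks(html)) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny in-memory "site" stands in for real HTTP requests.
        Map<String, String> site = new HashMap<>();
        site.put("/home", "<a href=\"/a\">A</a> <a href='/b'>B</a>");
        site.put("/a", "<a href=\"/home\">back</a> <a href=\"/c\">C</a>");
        site.put("/b", "no links here");
        site.put("/c", "<a href=\"/a\">A</a>");
        System.out.println(crawl("/home", site::get, 10));
    }
}
```

Keeping the fetcher as a plain function means the crawl loop can be tried against canned pages first and pointed at a real HTTP library later.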
Dale DeMott

By failing to prepare, you are preparing to fail.
Benjamin Franklin (1706 - 1790)
Cindy Glass
"The Hood"

Joined: Sep 29, 2000
Posts: 8521
I guess that we are not into creepy crawling critters in Intermediate.
Let's move this to Advanced and see if they can offer some advice.

"JavaRanch, where the deer and the Certified play" - David O'Meara
William Brogden
Author and all-around good cowpoke

Joined: Mar 22, 2000
Posts: 13037
What is implied by "needs to handle JavaScript"?
Do you mean that it needs to parse out forms, etc., that have JavaScript mixed in with the HTML, or that it has to execute the JavaScript?
I just used HttpClient (from the Jakarta Commons toolkit) to create a load tester that faked responding to a form. I had to use JTidy to get a parsed DOM representation of the page because the HTML was not well formed.
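
To give a rough idea of what "faking a response to a form" involves, here is a minimal sketch that builds the `application/x-www-form-urlencoded` body a browser would send for a simple form. It uses only the standard library, not HttpClient or JTidy, and the class name `FormPostSketch` and the field names `username` and `query` are made up for the example. The resulting string would then be sent as the body of a POST request by whichever HTTP library you use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormPostSketch {
    // Encodes form fields as name=value pairs joined by '&', in map iteration order.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field names; a real form's names come from its <input> elements.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "dale");
        fields.put("query", "spider & crawl");
        System.out.println(encodeForm(fields)); // prints username=dale&query=spider+%26+crawl
    }
}
```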