Writing a spider

Dale DeMott
Ranch Hand

Joined: Nov 02, 2000
Posts: 515
Okay... so I have an application that needs to be spidered. The requirements are:
1) It needs to be able to fill out a form field before spidering.
2) It needs to handle JavaScript.
3) It needs to start at a specified location after the form field has been filled out.
I was thinking about writing this using HttpUnit. Has anyone written a spider with it? Does anyone have any other ideas or programs that I might be able to use? Any ideas would be appreciated.
Regards,
Dale DeMott
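For reference, here is a minimal sketch of the crawl loop that requirement 3 implies: start at a given URL and follow same-host links breadth-first. It uses only the JDK rather than HttpUnit. The names `SpiderSketch`, `extractLinks`, `crawl`, and the `fetch` hook are hypothetical and exist only for illustration. The sketch assumes the form has already been submitted and that `fetch` returns the HTML for a URL using that logged-in session. It does not execute JavaScript, and the regex is a crude stand-in for a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpiderSketch {
    // Matches href="..." or href='...' inside <a> tags. Assumption: pages are
    // simple enough for a regex; malformed HTML needs a tolerant parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the absolute URLs of all anchors on a page, resolved against baseUrl.
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(2).trim();
            int hash = raw.indexOf('#');
            if (hash >= 0) {
                raw = raw.substring(0, hash); // a fragment points into the same page
            }
            String lower = raw.toLowerCase();
            if (raw.isEmpty() || lower.startsWith("javascript:") || lower.startsWith("mailto:")) {
                continue; // not a fetchable page
            }
            try {
                links.add(base.resolve(raw).normalize().toString());
            } catch (IllegalArgumentException e) {
                // skip hrefs that are not valid URIs
            }
        }
        return links;
    }

    // Breadth-first crawl from startUrl, staying on the start URL's host.
    // fetch is a hypothetical hook: it returns the page HTML, or null on failure.
    static List<String> crawl(String startUrl, Function<String, String> fetch, int maxPages) {
        String host = URI.create(startUrl).getHost();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            if (html == null) {
                continue;
            }
            visited.add(url);
            for (String link : extractLinks(url, html)) {
                String linkHost = URI.create(link).getHost();
                if (host != null && host.equals(linkHost) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

Links that only appear after a script runs will not be found this way, so requirement 2 still needs a tool that can actually execute JavaScript.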


By failing to prepare, you are preparing to fail.
Benjamin Franklin (1706 - 1790)
Cindy Glass
"The Hood"
Sheriff

Joined: Sep 29, 2000
Posts: 8521
I guess that we are not into creepy crawling critters in Intermediate.
Let's move this to Advanced and see if they can offer some advice.


"JavaRanch, where the deer and the Certified play" - David O'Meara
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
What is implied by "needs to handle JavaScript"?
Do you mean it needs to parse out forms, etc. that have JavaScript mixed in with the HTML, or that it has to execute the JavaScript?
I just used HttpClient (from the Jakarta Commons toolkit) to create a load tester that faked responding to a form. I had to use JTidy to get a parsed DOM representation of the page because the HTML was not well formed.
Bill
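As a rough illustration of "faking a response to a form", the sketch below builds the same kind of POST a browser would send. It is not the Jakarta Commons HttpClient code described above: it swaps in the JDK's own `java.net.http` API (Java 11+) so the example has no external dependencies. The class name `FormPostSketch`, the methods `encodeForm` and `buildFormPost`, and the field names in the usage note are all hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

class FormPostSketch {
    // Encodes name/value pairs as application/x-www-form-urlencoded,
    // the default encoding a browser uses for a POSTed HTML form.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Builds (but does not send) a POST to the form's action URL.
    static HttpRequest buildFormPost(String actionUrl, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }
}
```

To send it, something like `HttpClient.newBuilder().cookieHandler(new java.net.CookieManager()).build().send(request, HttpResponse.BodyHandlers.ofString())` should work. The cookie handler matters if the application tracks the filled-in form through a session cookie. The field names have to match the `name` attributes of the real form's inputs, which is where a parsed DOM of the page helps.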
 