Originally posted by Ulf Dittmer: So the input would be a web page, and the output would be a list of all URLs on that web page?
Mhhh, my initial understanding was that the input would be a website address, and the output would be the URLs of all pages that belong to that site.
To which the answer would have been: not possible in general, not with Java or any other language. The HTTP protocol simply doesn't provide the necessary information.
Originally posted by Rob Prime: Like I said, parse the page and filter out the right attributes.
Of course SRC is not the only one. The following could also be used: ACTION (forms), BACKGROUND, CODEBASE, SRC (images, iframes, etc.).
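To make the attribute filtering concrete, here is a rough sketch in Java. It is not a real HTML parser: it scans the page text with a regular expression, which is enough to show the idea but will miss unquoted attribute values and can be fooled by comments or scripts. The class name UrlAttributeScanner and the method extractUrls are made up for this example, and HREF is added to the list on the assumption that ordinary links should be collected too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlAttributeScanner {

    // Attributes named in the post, plus HREF (an assumption: ordinary links use it).
    // The lookbehind keeps names such as "data-src" from matching as "src".
    private static final Pattern URL_ATTRIBUTE = Pattern.compile(
            "(?<![\\w-])(action|background|codebase|href|src)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    // Returns the quoted values of all URL-carrying attributes, in document order.
    public static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_ATTRIBUTE.matcher(html);
        while (m.find()) {
            // Group 2 holds a double-quoted value, group 3 a single-quoted one.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            urls.add(value);
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<body background='bg.png'><a href=\"/next\">next</a>"
                + "<img SRC=\"logo.gif\"><form action=\"/search\"></form></body>";
        for (String url : extractUrls(html)) {
            System.out.println(url);
        }
    }
}
```

The values come back exactly as written in the page, so relative ones still have to be resolved against the page's address. For real pages a proper HTML parser would be the more robust choice.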