I am trying to create a Java application that will interact with websites. For example, my application may have to navigate to a certain website, extract the text on the page, compute results, fill in a form and submit it. Can anyone tell me the best way to go about building such a system? Would I have to create the components that speak HTTP or HTTPS myself, or do APIs for this already exist?
I came across the HtmlUnit API, which is primarily used for testing, and Java browsers like Lobo and JRex that seem to have an API too. How do these compare?
The premier library for this is jWebUnit, IMO. No need to deal with HTTP or HTML on a low level; that's all been done before. Don't be put off that it's billed as a "unit testing tool": it works just fine as a general-purpose web access library.
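For a sense of what such a library takes off your hands, here is a rough JDK-only sketch of the fetch / extract / submit cycle from the question. It is not how jWebUnit or HtmlUnit work internally, and the URLs and the "summary" field name are made up for illustration. The tag stripping is deliberately crude (no scripts, entities, cookies or JavaScript), which is exactly the part a real library handles properly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PlainHttpSketch {

    /** URL-encodes a single form key or value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds an application/x-www-form-urlencoded body from the fields. */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) body.append('&');
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    /** Crude text extraction: replaces tags with spaces and collapses whitespace. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Reads a stream to the end and decodes it as UTF-8 (an assumption about the page). */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Fetches a page with a plain GET request. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    /** Submits the fields as a form POST and returns the response body. */
    static String post(String url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URLs and field name, for illustration only.
        String text = stripTags(get("http://example.com/report"));
        Map<String, String> form = new LinkedHashMap<>();
        form.put("summary", text.length() + " characters of text");
        System.out.println(post("http://example.com/submit", form));
    }
}
```

Even this small amount of plumbing ignores sessions, redirects with cookies, and scripted pages, which is the argument for using an existing library instead.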
Thanks for the replies. I will check them out and get back.
One more thing: is anyone aware of a similar API that might support interactions with applets as well? The reason I ask is that a large number of the sites I need to perform these functions on might serve their content as applets. I know extracting any text from an applet is going to be tough, but is it even possible? What about interactions with the applet, like button clicks?
That's tough. From within the same JVM, the java.awt.Robot class could be used to control a GUI to a certain extent, but from a different JVM that would be much harder. Going out on a limb, I'd say it's impossible to do in the general case where you don't know the applet beforehand. And even if the applet GUI is known, extracting text that was painted on the screen amounts to OCR; I foresee numerous hard problems that way.
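To illustrate the same-JVM case only, here is a minimal sketch of driving a click with java.awt.Robot. It assumes you already know where the button is on screen; the bounds used in main are made up, and in practice you would get them from the component itself (for example via getLocationOnScreen() and getSize()). It needs a real display, so it bails out in a headless environment.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.event.InputEvent;

public class RobotClickSketch {

    /** Returns the centre point of a rectangle given in screen coordinates. */
    static Point center(Rectangle bounds) {
        return new Point(bounds.x + bounds.width / 2, bounds.y + bounds.height / 2);
    }

    /** Moves the mouse pointer to p and performs a left click there. */
    static void click(Robot robot, Point p) {
        robot.mouseMove(p.x, p.y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }

    public static void main(String[] args) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; Robot cannot be used.");
            return;
        }
        // Hypothetical on-screen bounds of a known button.
        Rectangle buttonBounds = new Rectangle(200, 150, 80, 30);
        Robot robot = new Robot();
        robot.setAutoDelay(50); // short pause after each generated event
        click(robot, center(buttonBounds));
    }
}
```

Note that this only generates input events. It does nothing to help with reading back what the GUI painted, which is the harder half of the problem.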
Hmm, OK, here is another idea. Presumably all the data displayed by the applet also comes in through a socket connection made by the browser, right? So if I wrote the browser (or used an API that acts as a mock browser), I would have access to the data flowing in and out of the applet. If that is the case, the data should follow a definite pattern and could be extracted, unless it is encrypted.
Is this even possible and has someone attempted this?
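One way to picture the idea is a small logging relay that sits between the client and the server and copies every byte to a log as it passes. The sketch below is only that, a sketch: it assumes the applet's connection can actually be pointed at the relay (which depends on how the applet opens its connection), the port and host are made up, it handles a single connection, and encrypted traffic would show up as opaque bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class LoggingRelaySketch {

    /** Copies everything from in to out, also writing each chunk to log. Returns the byte count. */
    static long tee(InputStream in, OutputStream out, OutputStream log) {
        try {
            byte[] chunk = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                out.flush();
                log.write(chunk, 0, n);
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a thread that relays one direction of the conversation. */
    static Thread pump(InputStream in, OutputStream out, OutputStream log) {
        Thread t = new Thread(() -> {
            try {
                tee(in, out, log);
            } catch (UncheckedIOException e) {
                // The other side closed the connection; nothing more to relay.
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        int listenPort = 9000;             // hypothetical local port
        String targetHost = "example.com"; // hypothetical real server
        int targetPort = 80;

        try (ServerSocket server = new ServerSocket(listenPort);
             Socket client = server.accept();
             Socket target = new Socket(targetHost, targetPort)) {
            // Relay both directions, logging the raw bytes to standard output.
            Thread up = pump(client.getInputStream(), target.getOutputStream(), System.out);
            Thread down = pump(target.getInputStream(), client.getOutputStream(), System.out);
            up.join();
            down.join();
        }
    }
}
```

Whether the captured bytes follow a recognizable pattern depends entirely on the protocol the applet speaks; the relay only gets you the raw data.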