
using a crawler to invoke a google search & analyse google results

 
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
Hi,

I am really into Java and software agents and want to focus my Java coding on that. I want to write a crawler that accepts a search topic, invokes a Google search and analyzes the results. Starting from a Java crawler template I found online, I edited the code and set up my own custom link-analysis algorithms. My problem is the part where the app accepts the user's text, passes it to the Google search engine and retrieves the Google results (I am designing it to be a stand-alone app or plugin).

Thanks
 
Rancher
Posts: 43016
76
What, specifically, are you having a problem with? What is or is not working as expected?
 
Daniel Arnold
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
I am not sure how (from a stand-alone app) the input text can be passed to the Google search engine and the results retrieved (the crawler will then go through the retrieved links). I am trying to avoid using a browser.
 
Ulf Dittmer
Rancher
Posts: 43016
76
You could use the HttpClient library to pass the search query to Google and retrieve the result. You'll have to spend some time reverse-engineering the format of the search URL, though; it's not as simple as (e.g.) http://www.google.com/?q=jebediah+springfield.

You might also want to check if Google has a proper REST API for doing searches; for low search volumes it would probably be free to use.
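
As a minimal sketch of the query-string part of that suggestion, the snippet below URL-encodes the user's text and appends it to a search URL using only the JDK. The class name `GoogleSearchUrl` and the method `buildSearchUrl` are made up for illustration, and the base URL and the `q` parameter are assumptions taken from the URLs quoted in this thread; Google's real URL format may need more parameters and can change.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class GoogleSearchUrl {

    // Assumed base URL, copied from the URL quoted in this thread; not an official API endpoint.
    private static final String BASE = "http://www.google.com/search?q=";

    // Encodes the raw user text so spaces and reserved characters are safe in a query string.
    public static String buildSearchUrl(String userText) {
        try {
            return BASE + URLEncoder.encode(userText, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only builds and prints the URL; no network request is made here.
        System.out.println(buildSearchUrl("jebediah springfield"));
    }
}
```

The resulting string is what would be handed to whichever HTTP client fetches the page. Keeping URL construction in its own method makes it easy to adjust once the real URL format has been worked out.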
 
Marshal
Posts: 69831
278
… and welcome to the Ranch
 
Daniel Arnold
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
Thanks
 
Daniel Arnold
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
Hi,

I am using the HttpClient (4.x) library and trying to get it to return the search results, but I keep getting an error.



And the error I receive is:

Fatal transport error: www.google.com
java.net.UnknownHostException: www.google.com
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
at java.net.InetAddress.getAddressesFromNameService(Unknown Source)
at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:278)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:162)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:640)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1066)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1044)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1035)
at HttpClientTutorial.main(HttpClientTutorial.java:47)

When I try the URL (http://www.google.com/search?q=batman&btnG=Google+Search&aq=f&oq=) in a browser, it displays the results directly. I understand enough of the error to know that it is an issue with the source of the request, but I can't pin down what exactly.

Thanks

 
Sheriff
Posts: 21972
106
Eclipse IDE Spring VI Editor Chrome Java Ubuntu Windows
Is your browser using a proxy? If so, you must use the same proxy with HttpClient as well.
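
To illustrate what "using the same proxy" means in code, here is a rough sketch that routes a request through an HTTP proxy. It deliberately uses the JDK's own `java.net.Proxy` and `URLConnection` rather than HttpClient, so it runs without extra libraries. The class name `ProxyFetchSketch`, the host `proxy.example.com` and the port `8080` are placeholders, not values from this thread; the real ones would come from the browser's proxy settings.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

public class ProxyFetchSketch {

    // Describes an HTTP proxy; host and port are placeholders to be replaced with real settings.
    public static Proxy httpProxy(String host, int port) {
        // createUnresolved skips the DNS lookup when the address object is built.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Prepares a connection that will go through the proxy; nothing is sent until it is read.
    public static URLConnection openViaProxy(String url, Proxy proxy) throws IOException {
        return new URL(url).openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = httpProxy("proxy.example.com", 8080);
        URLConnection conn = openViaProxy("http://www.google.com/search?q=batman", proxy);
        // Reading conn.getInputStream() would perform the actual request via the proxy.
        System.out.println("Prepared " + conn.getURL() + " via " + proxy);
    }
}
```

With HttpClient 4.x releases of that era, the corresponding setting is, as far as I know, the `ConnRoutePNames.DEFAULT_PROXY` parameter set to an `HttpHost` on the client's params; check the documentation for the exact version in use.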
 