to list all links

 
vas vas
Greenhorn
Posts: 19
Hi All,

I am trying to save all the links (the results) produced by a search engine to a file.

For example, when you query Google for "java", I want all the results to be saved to a file.

Can anyone tell me how to proceed, i.e. how to capture the links from the web page? By links, I mean only the relevant result links (the ones shown in green on Google).

I have to implement this in Java. What are the relevant APIs?

Thanking You,
Suman Tedla.
 
Jeroen Wenting
Ranch Hand
Posts: 5093
You'd have to write a parser to extract the information you want from the HTML.
It's possible, but since HTML wasn't meant to be used like this and is often messy, it can be tough going.
Maybe Google offers a means to get search results in a machine-friendly XML format, in which case this becomes easier.
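
For illustration only, here is a minimal sketch of the parsing approach. Rather than a hand-written parser, it swaps in the lenient HTML parser that ships with the JDK (javax.swing.text.html). The class name AnchorCollector and the method extractLinks are made up for this example, and it collects every href on the page, so picking out only the actual search results would still need site-specific filtering.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href value of every <a> tag in a page.
class AnchorCollector extends HTMLEditorKit.ParserCallback {
    final List<String> links = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            // Anchors without an href (e.g. named anchors) are skipped.
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    // Parses the given HTML text and returns the hrefs in document order.
    static List<String> extractLinks(String html) {
        AnchorCollector collector = new AnchorCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"http://example.com/a\">A</a>"
                + "<a href=\"http://example.com/b\">B</a></body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The page text itself would have to be fetched first (for example with java.net.URL), which is left out here.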
 
vas vas
Greenhorn
Posts: 19
Hi,

But I need to extract links not only from Google, but also from a few other search engines.

Please, can anyone help me?
 
pascal betz
Ranch Hand
Posts: 547
Google offers the Google API. It is very easy to use and would avoid parsing the HTML.
You need to register with Google, you are limited to 1000 queries a day, and you cannot develop commercial products without asking Google first (see the Google API FAQ).


pascal
 
Jeroen Wenting
Ranch Hand
Posts: 5093
If you need more data, you need to write more code to extract it.
No way around it I'm afraid. There's no magic bullet.

And you'd better make sure you have permission to do what you intend, because it may well violate the sites' terms of use, which could lead to nasty legal trouble!
 
Azriel Abramovich
Ranch Hand
Posts: 38
Google has an API which you can use to get the results, but mind you, it is limited to X queries per day and is not for commercial use.

Other than that, I think these sites make a living from people looking at their pages, which is why it is not so easy to get the "real" links from them.

I am talking from experience ;-)

Azriel
 
vas vas
Greenhorn
Posts: 19
Hi,

Thanks for your suggestions. I have succeeded in extracting links from Google and Yahoo. I used the java.util.regex package.

Thank you so much.
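
The actual code was not posted in this thread. As a rough sketch of what a java.util.regex approach might look like, the example below pulls quoted href values out of anchor tags. The class name RegexLinkExtractor, the pattern, and the sample HTML are all assumptions made for illustration; a regex like this only handles quoted href attributes and will miss or mangle unusual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the code used by the poster.
class RegexLinkExtractor {
    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    // Group 1 is the opening quote, group 2 is the link itself.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every quoted href found in the HTML, in order of appearance.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> "
                + "<A class='x' HREF='http://example.com/b'>B</A></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

To save the results to a file, as asked in the first post, the returned list could be passed to java.nio.file.Files.write, which writes one entry per line.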
 