
to list all links

 
Greenhorn
Posts: 19
Hi All,

I am trying to save all the links (or results) produced by a search engine to a file.

I mean, when you query Google for "java", I want all of the results to be saved to a file.

Can anyone help me with how to proceed, i.e. how to capture the links from the web page? By links, I mean only the relevant ones (the links shown in green on Google's results page).

I have to implement this in Java. What are the relevant APIs?

Thanking You,
Suman Tedla.
 
Ranch Hand
Posts: 5093
You'd have to write a parser to extract the information you want from the HTML.
It's possible, but since HTML wasn't meant to be used like this and is often messy, it can be tough going.
Google may offer a way to get search results in a machine-friendly XML format, in which case this becomes easier.
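A minimal sketch of such a parser, using the HTML parser that ships with the JDK (`javax.swing.text.html`), assuming the page has already been downloaded into a `Reader`. The class and method names here are my own illustration, not code from the thread:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlLinkParser {

    // Collects the value of every href attribute found on <a> tags.
    public static List<String> extractLinks(Reader html) throws Exception {
        final List<String> links = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // 'true' tells the parser to ignore any charset declaration in the page.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"http://example.com/\">Example</a></body></html>";
        for (String link : extractLinks(new StringReader(page))) {
            System.out.println(link);
        }
    }
}
```

Because this uses a real parser rather than string matching, it copes better with the messy markup real pages tend to have; you would still need per-engine logic to keep only the result links.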
 
vas vas
Greenhorn
Posts: 19
Hi,

But what I need is to extract links not only from Google, but also from a few other search engines.

Please, can anyone help me?
 
Ranch Hand
Posts: 547
Google offers the Google API. It is very easy to use and would avoid the parsing of HTML.
You need to register with Google; you are limited to 1000 queries a day, and you cannot develop commercial products without asking Google first (see the Google API FAQ).


pascal
 
Jeroen Wenting
Ranch Hand
Posts: 5093
If you need more data, you need to write more code to extract it.
There's no way around it, I'm afraid. There's no magic bullet.

And you'd better make sure you have permission to do what you intend, because it may well violate the terms of use of the sites, which could lead to nasty legal trouble!
 
Ranch Hand
Posts: 38
Google has an API which you can use to get the results... but mind you, it is limited to X queries per day and not for commercial use.

Other than that, I think these sites make a living from people looking at their pages, so that is why it is not so easy to get the "real" links out of them.

I am talking from experience ;-)

Azriel
 
vas vas
Greenhorn
Posts: 19
Hi,

Thanks for your suggestions. I have succeeded in extracting links from Google and Yahoo, using the java.util.regex package.

Thank you so much.
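For reference, a rough sketch of the regex approach described above. The pattern here is my own heuristic, not the poster's actual code, and regexes are inherently fragile against real-world HTML (attribute order, single quotes, missing quotes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLinkExtractor {

    // Matches href="..." inside <a ...> tags; a rough heuristic, not a full HTML parser.
    private static final Pattern LINK =
            Pattern.compile("<a\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));   // group 1 is the quoted URL
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<a href=\"http://example.com/\">Example</a> "
                    + "<a href=\"http://example.org/\">Other</a>";
        System.out.println(extractLinks(page));
    }
}
```

For each search engine you would then add a filter (e.g. on the host name or URL prefix) so that only the actual result links, not ads or navigation, end up in the file.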
 