JavaRanch » Java Forums » Java » Java in General

to list all links

vas vas
Greenhorn

Joined: Dec 13, 2004
Posts: 19
Hi All,

I am trying to save all the links (or results) returned by a search engine to a file.

I mean, when you query Google for "java", I want all the results to be saved to a file.

Can anyone help me with how to proceed, i.e. how to capture the links from the web page? By links, I mean only the relevant ones (the result links shown in green on Google).

I have to implement this in Java. What are the relevant APIs?

Thanking You,
Suman Tedla.
Jeroen Wenting
Ranch Hand

Joined: Oct 12, 2000
Posts: 5093
You'd have to write a parser to extract the information you want from the HTML.
It's possible, but since HTML wasn't meant to be used like this and is often messy, it can be tough going.
Maybe Google offers a means to get search results in a machine-friendly XML format, in which case this becomes easier.


42
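
As a rough illustration of the "write a parser" suggestion above, here is a minimal sketch that collects the href of every anchor tag using the HTML parser bundled with the JDK (javax.swing.text.html). The class name AnchorParser and the method extractLinks are hypothetical names chosen for this example, and the sketch assumes the page HTML has already been downloaded into a String. It returns every link on the page; picking out only the actual search results would still need site-specific filtering.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href attribute of every <a> tag in a page.
class AnchorParser {

    static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<>();

        // The callback is invoked by the parser for each opening tag it meets.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the page,
            // since the input is already a decoded String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

Calling AnchorParser.extractLinks(pageHtml) returns the links in document order. A real parser tolerates messy markup better than hand-written string matching, which is the main reason to prefer it.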
vas vas
Greenhorn

Joined: Dec 13, 2004
Posts: 19
Hi,

But what I need is to extract links not only from Google but also from a few other search engines.

Please, can anyone help me?
pascal betz
Ranch Hand

Joined: Jun 19, 2001
Posts: 547
Google offers the Google API. It is very easy to use and would avoid parsing the HTML.
You need to register with Google, you are limited to 1000 queries a day, and you cannot develop commercial products without asking Google first (see the Google API FAQ).


pascal
Jeroen Wenting
Ranch Hand

Joined: Oct 12, 2000
Posts: 5093
If you need more data, you need to write more code to extract it.
No way around it I'm afraid. There's no magic bullet.

And you'd better make sure you have permission to do what you intend, because it may well violate the sites' terms of use, which could lead to nasty legal trouble!
Azriel Abramovich
Ranch Hand

Joined: Dec 10, 2003
Posts: 38
Google has an API which you can use to get the results... but mind you, it is limited to X queries per day and is not for commercial use.

Other than that, I think these sites make a living from people looking at their pages, which is why it is not so easy to get the "real" links from them.

I am talking from experience ;-)

Azriel


Don't be shy, be quiet!
vas vas
Greenhorn

Joined: Dec 13, 2004
Posts: 19
Hi,

Thanks for your suggestions. I have succeeded in extracting links from Google and Yahoo. I used the java.util.regex package.

Thank you so much.
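
The poster did not share the code, so the following is only a sketch of what a java.util.regex approach might look like, not the actual solution used. The class name RegexLinkExtractor, the methods extractLinks and saveLinks, and the pattern itself are assumptions made for illustration. The pattern only handles quoted href values and will miss unquoted or unusually formatted attributes, which is the usual trade-off of matching HTML with regular expressions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls href values out of anchor tags with a regex.
class RegexLinkExtractor {

    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    // Group 1 captures the quoted URL.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    // Writes one link per line to the given file (requires Java 8 or later).
    static void saveLinks(List<String> links, Path file) {
        try {
            Files.write(file, links);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Given the downloaded page in a String named pageHtml, something like RegexLinkExtractor.saveLinks(RegexLinkExtractor.extractLinks(pageHtml), Paths.get("links.txt")) would write the links to a file, which is what the original question asked for. The file name links.txt is just a placeholder.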