Reading URL with Java

Farakh khan
Ranch Hand

Joined: Mar 22, 2008
Posts: 742
URL yahoo = new URL("http://www.yahoo.com/");

This gets the main URL. How can I read all the URLs related to this link, e.g. http://mail.yahoo.com or http://www.yahoo.com/cc/bb/tt.asp, etc.?

Thanks & best regards
Ben Souther
Sheriff

Joined: Dec 11, 2004
Posts: 13410

You would have to parse the results and generate a new URL for each link found.
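
A rough, untested sketch of that idea, using only java.net and a naive regex for href="..." values (a real HTML parser would catch more links than this demo does):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkLister {

    // Naive pattern for href="..." attributes; fine for a demo, not for real-world HTML.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Do an HTTP GET on the page and turn every href it contains into a URL object.
    public static List<URL> listLinks(URL page) throws Exception {
        StringBuilder html = new StringBuilder();
        BufferedReader in = new BufferedReader(new InputStreamReader(page.openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            html.append(line).append('\n');
        }
        in.close();

        List<URL> links = new ArrayList<URL>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(new URL(page, m.group(1)));   // resolves relative links against the page
            } catch (MalformedURLException e) {
                // skip javascript: links and other hrefs that aren't valid URLs
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        for (URL link : listLinks(new URL("http://www.yahoo.com/"))) {
            System.out.println(link);
        }
    }
}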


Farakh khan
Ranch Hand

Joined: Mar 22, 2008
Posts: 742
Originally posted by Ben Souther:
You would have to parse the results and generate a new URL for each link found.


Can you please explain with a code example?

Thanks for your reply
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1836
    

Based on a URL object you can perform an HTTP GET of the HTML document to which the URL points. Once you have the HTML document you would have to parse its body to retrieve all the links (URLs) it contains and convert those to new URL objects. For those URL objects you can do the same and that way you might end up indexing every page of the web site. You just have to be smart about which URLs to retrieve, or the process might take a wee bit of time as it indexes half the pages on the internet...
[ March 28, 2008: Message edited by: Jelle Klap ]
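
Something along those lines, as a self-contained, untested sketch: it keeps a set of visited URLs, stays on the starting host, and stops after a fixed number of pages so it doesn't try to index half the internet. The regex-based link extraction is deliberately naive.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCrawler {

    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) throws Exception {
        URL start = new URL("http://www.yahoo.com/");
        int maxPages = 50;                       // hard cap so the crawl stays small
        String host = start.getHost();           // only follow links on the same host

        Set<String> visited = new HashSet<String>();
        Queue<URL> toVisit = new LinkedList<URL>();
        toVisit.add(start);

        while (!toVisit.isEmpty() && visited.size() < maxPages) {
            URL page = toVisit.poll();
            if (!visited.add(page.toString())) {
                continue;                        // already indexed this one
            }
            System.out.println("Indexing: " + page);

            // Fetch the HTML and queue every same-host link we haven't seen yet.
            for (URL link : extractLinks(page)) {
                if (host.equals(link.getHost()) && !visited.contains(link.toString())) {
                    toVisit.add(link);
                }
            }
        }
    }

    // HTTP GET the page and pull href values out with a naive regex.
    private static Set<URL> extractLinks(URL page) {
        Set<URL> links = new HashSet<URL>();
        try {
            StringBuilder html = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(page.openStream()));
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
            in.close();

            Matcher m = HREF.matcher(html);
            while (m.find()) {
                try {
                    links.add(new URL(page, m.group(1)));  // resolve relative links
                } catch (MalformedURLException e) {
                    // ignore javascript: links and other non-URL hrefs
                }
            }
        } catch (Exception e) {
            System.err.println("Could not read " + page + ": " + e);
        }
        return links;
    }
}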

Build a man a fire, and he'll be warm for a day. Set a man on fire, and he'll be warm for the rest of his life.
Farakh khan
Ranch Hand

Joined: Mar 22, 2008
Posts: 742
Originally posted by Jelle Klap:
Based on a URL object you can perform an HTTP GET of the HTML document to which the URL points. Once you have the HTML document you would have to parse its body to retrieve all the links (URLs) it contains and convert those to new URL objects. For those URL objects you can do the same and that way you might end up indexing every page of the web site. You just have to be smart about which URLs to retrieve, or the process might take a wee bit of time as it indexes half the pages on the internet...

[ March 28, 2008: Message edited by: Jelle Klap ]


Thanks for the prompt response.
Can you please point me to an example or tutorial?

Thanks again
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1836
    

This article should get you where you need to go:

http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/
Ben Souther
Sheriff

Joined: Dec 11, 2004
Posts: 13410

Originally posted by Farakh khan:


Can you please explain with a code example?

Thanks for your reply


That would take more time than I have right now.

There is a popular Unix program called wget that is used to replicate websites for mirroring. Out of curiosity, I googled 'wget java implementation' to see if anyone had written a Java version, and found this project:
http://www.openwfe.org/apidocs/openwfe/org/misc/Wget.html

I'm sure, with a little searching, you could find others that do the same thing.
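
For the core idea, a bare-bones, wget-like fetch of a single URL to a local file needs nothing beyond java.net and java.io. A minimal, untested sketch (not the OpenWFE class linked above):

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

// Minimal, wget-like download of one URL to a local file.
public class MiniWget {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.yahoo.com/");
        InputStream in = url.openStream();
        OutputStream out = new FileOutputStream("page.html");
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);     // copy the response body byte-for-byte
        }
        out.close();
        in.close();
    }
}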
Ben Souther
Sheriff

Joined: Dec 11, 2004
Posts: 13410

Jelle beat me to it.
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1836
    

For once.
Bill Shirley
Ranch Hand

Joined: Nov 08, 2007
Posts: 457
Everyone answering has assumed you want to follow all the links found on the original page.

Your question implies you are trying to find all subdomains and/or all subdirectories available at a site. This is not necessarily possible; there is no standard way to do it. Crawling the site as hinted at *might* be successful.


Bill Shirley - bshirley - frazerbilt.com
if (Posts < 30) you.read( JavaRanchFAQ);
Farakh khan
Ranch Hand

Joined: Mar 22, 2008
Posts: 742
Originally posted by Jelle Klap:
This article should get you where you need to go:

http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/


The link is very useful.

Thanks
 