How to read a html page from internet.

ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
I want to read an HTML page from the internet. An HTML page is easy to read offline, but I don't know how to read one from the internet. If anybody can give me code, I will be very thankful.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41489
    
I'm assuming you're trying to write Java code that reads a particular HTML file, and then saves it to disk? This code may get you started. Saving to disk is left as an exercise to the reader.

Note that this only deals with the HTML; if you want to download the images, styles, JavaScript and other resources used by that page, it gets a whole lot more complicated.
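For readers following along, here is a minimal sketch of such a reader. It assumes the page is UTF-8 encoded, and the names `PageReader`, `readAll` and `fetch` are made up for illustration; it is not necessarily the code originally posted here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PageReader {

    // Reads everything from the given reader, line by line, into one string.
    // The reader is closed when done.
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Opens the URL and returns the HTML it serves.
    // Assumption: the page is UTF-8 encoded.
    static String fetch(String address) throws IOException {
        URL url = new URL(address);
        return readAll(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // The default address is the page mentioned in this thread.
        String address = args.length > 0 ? args[0] : "http://seriouswheels.com/cars-a.htm";
        System.out.print(fetch(address));
    }
}
```

Letting `main` declare `throws IOException` means any network failure shows up with its full stack trace instead of being swallowed.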


ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
This code doesn't work. Actually, I want to read all the links and data from a site and store them in a database. For example, I want to read all the data from the site
"http://seriouswheels.com/cars-a.htm" and store specific data from it in a database.
[ August 23, 2006: Message edited by: ehsan dar ]
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41489
    
This code doesn't work.

That's not a very useful error description: TellTheDetails.

If I understand correctly that you're trying to duplicate the actual data on that web site (and not the HTML) in your database, then be aware that it is most likely covered by some sort of copyright, which might make that illegal.
ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
This is my code, which I wrote in the main method. It compiles and executes fine but did not show the result.
try {
    URL url = new URL("http://seriouswheels.com/cars-a.htm");

    BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
    String str;
    while ((str = br.readLine()) != null) {
        System.out.println(str);
    }
    br.close();
} catch (MalformedURLException e) {
}
Actually, if you look at the site, there is an index from "A" to "Z" which contains car names and their models. What I want is to read each page and store the car name and its model in the database. But the read operation must be from the internet, not from a saved page. If you can help me, I will be very thankful.
[ August 23, 2006: Message edited by: ehsan dar ]
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14104
    

The code that Ulf gave you reads the page from the Internet and prints the HTML code on the console.

If you want to load the pages that are referenced on that page by hyperlinks, then you have to parse the HTML code, find all the links in it and read the pages that the links point to. You would have to look for <a href="..."> HTML tags in the HTML text and extract the URLs of the other pages from them.

Write some code yourself, and if you get stuck, please ask more questions.
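As a starting point, link extraction could be sketched roughly as below. This is a rough sketch only: it uses a regular expression, which is fragile on real-world HTML (a proper HTML parser is more robust), and the names `LinkExtractor` and `extractLinks` are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {

    // Matches <a ... href="..."> or <a ... href='...'>, ignoring case.
    // Group 1 captures the value of the href attribute.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the href values of all anchor tags found in the HTML text,
    // in document order. Relative links are returned as written.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not taken from the real site.
        String html = "<a href=\"cars-b.htm\">B</a> <a href=\"cars-c.htm\">C</a>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Relative links such as `cars-b.htm` still have to be resolved against the address of the page they came from before they can be fetched.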


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41489
    
but did not show the result.

What did it show, then? Are you certain that no exception is thrown? Having an empty catch block is rarely a good idea, and definitely not here.
ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
try
{
    // Create a URL for the desired page
    URL url = new URL("http://seriouswheels.com/cars-a.htm");
    // Read all the text returned by the server
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String str;
    while ((str = in.readLine()) != null)
    {
        System.out.println(str);
    }
    in.close();
}
catch (MalformedURLException e)
{
    System.out.println("Error");
}
catch (IOException e)
{
    System.out.println("page not open");
}

When I run this code, an exception is thrown and "page not open" is printed.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41489
    
Post the stack trace of the exception (i.e., what gets printed if you wrote "e.printStackTrace();"). It contains information on what is going wrong.
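In case it helps, a minimal sketch of what that could look like. The class name `StackTraceDemo` and the helper `stackTraceOf` are made up for illustration; the helper only shows how to capture the same text as a string, for example for logging.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.URL;

class StackTraceDemo {

    // Returns the text that printStackTrace() would print for this error.
    static String stackTraceOf(Throwable error) {
        StringWriter buffer = new StringWriter();
        error.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            URL url = new URL("http://seriouswheels.com/cars-a.htm");
            try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            // MalformedURLException is a subclass of IOException, so this
            // one handler covers both. Print the details instead of hiding them.
            e.printStackTrace();
        }
    }
}
```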
ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
Sir, sorry for disturbing you again and again. There are almost 15 exceptions, which I am posting here.
java.net.UnknownHostException: Seriouswheel.com
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:177)
at java.net.Socket.connect(Socket.java:507)
at java.net.Socket.connect(Socket.java:457)
at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:365)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:477)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:214)
at sun.net.www.http.HttpClient.New(HttpClient.java:287)
at sun.net.www.http.HttpClient.New(HttpClient.java:299)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:792)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:744)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:669)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:913)
at java.net.URL.openStream(URL.java:1007)
at HtmlReader.main(HtmlReader.java:67)

Thanks in advance.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41489
    
Just for the record, that is just one exception. The lines below it just point to the exact places in the code where it happens. But the important clue is right in this line:

java.net.UnknownHostException: Seriouswheel.com

The host you're trying to connect to is not accessible, which could happen for a number of reasons. Earlier you mentioned "Seriouswheels.com" (with an "s") - now the s is missing. Is that intentional? Can you ping the server? Can you reach it if you use its IP address instead of the host name?
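One way to test from Java itself whether a host name resolves at all is `InetAddress.getByName`. The sketch below is only an illustration; the class name `HostCheck` and the method `resolve` are made up, and it checks DNS resolution only, not whether the server answers.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostCheck {

    // Returns the IP address the host name resolves to,
    // or null if the name cannot be resolved.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass a bare host name, without "http://" or a path.
        String host = args.length > 0 ? args[0] : "seriouswheels.com";
        String address = resolve(host);
        if (address == null) {
            System.out.println("Cannot resolve " + host);
        } else {
            System.out.println(host + " resolves to " + address);
        }
    }
}
```

If this reports that the name cannot be resolved, the same `UnknownHostException` will occur in `URL.openStream()`, whatever the rest of the code looks like.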
ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
The missing 's' is just a typing mistake. How can I ping? And how do I get its IP address, and how do I use it?
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8866
    

Ping will give you the IP address:



ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
Oh sorry, I know that command; I thought you might mean pinging through some API. I always do that, and my computer answers: "Ping request could not find host http://seriouswheels.com/. Please check the name and try again."
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8866
    

If your network cannot connect to the remote server the problem is not with your Java code. Can you ping the IP directly? That would indicate a DNS problem. Can you ping other host names? That would indicate a problem with that particular remote server.
[ August 25, 2006: Message edited by: Joe Ess ]
ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
When I ping the IP address, the message "destination host unreachable" appears. If I try to ping yahoo or hotmail by host name, the ping command doesn't work either and says "ping request could not find host name."
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8866
    

Are you even connected to the internet?
ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
Yes, of course I am.
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8866
    

Is there a proxy between you and the internet?
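If there is one, Java has to be told about it. A hedged sketch using `java.net.Proxy` follows; the class name `ProxyExample` and the host `proxy.example.com` with port 8080 used in the usage text are placeholders, and the real values have to come from the network administrator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

class ProxyExample {

    // Describes an HTTP proxy. Nothing is resolved or contacted here.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: java ProxyExample <proxyHost> <proxyPort> <url>");
            System.out.println("e.g.:  java ProxyExample proxy.example.com 8080 http://seriouswheels.com/cars-a.htm");
            return;
        }
        Proxy proxy = httpProxy(args[0], Integer.parseInt(args[1]));
        // The connection is only established when the stream is requested.
        URLConnection connection = new URL(args[2]).openConnection(proxy);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The same effect can be had without code changes by setting the standard `http.proxyHost` and `http.proxyPort` system properties when starting the JVM.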
ehsan dar
Greenhorn

Joined: Aug 21, 2006
Posts: 18
Actually, I don't know much about proxy settings. I will contact my network administrator and then tell you.