
retrieve the HTML page of any URL without using java.net.URL

salman khalid
Greenhorn

Joined: Jun 05, 2005
Posts: 9
I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes. Any good suggestions would be highly appreciated. Thanks in advance.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19656

Try Apache's HttpClient. It may use URL in the background, though; I'm not sure about that.


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

salman khalid wrote: I want to develop a simple class that can fetch the HTML contents of a URL without using java.net.URL or java.net.URLConnection classes.


Why?

Anyway, the way to do that would be to write code which does this:

  • Extracts the name of the host from the URL
  • Extracts the port from the URL
  • Connects to that host on that port using a Socket
  • Using the HTTP protocol, sends a GET request to the host
  • Receives the response from the host and interprets it according to the HTTP protocol


You may find that your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.
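As one example of what "interpreting the response according to the HTTP protocol" can involve: an HTTP/1.1 server may send the body with "Transfer-Encoding: chunked" (RFC 2616, section 3.6.1), where the body arrives as pieces, each prefixed by its length in hexadecimal, ending with a zero-length chunk. A minimal decoding sketch follows, assuming the whole body is already in memory as an ISO-8859-1 string (one char per byte) and that any trailer headers after the last chunk can be ignored; ChunkedDecoder and decodeChunked are hypothetical names.

```java
/**
 * Hypothetical sketch: decode a body sent with "Transfer-Encoding: chunked".
 * Assumptions: the raw body is already a String read as ISO-8859-1 (one char per byte),
 * and trailer headers after the last chunk are ignored.
 */
public class ChunkedDecoder {

    static String decodeChunked(String raw) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            // Each chunk starts with "<hex size>[;extensions]\r\n".
            int lineEnd = raw.indexOf("\r\n", pos);
            if (lineEnd < 0) {
                throw new IllegalArgumentException("missing chunk-size line");
            }
            String sizeField = raw.substring(pos, lineEnd);
            int semi = sizeField.indexOf(';'); // drop any chunk extensions
            if (semi >= 0) {
                sizeField = sizeField.substring(0, semi);
            }
            int size = Integer.parseInt(sizeField.trim(), 16);
            if (size == 0) {
                return out.toString(); // last chunk reached; trailers are ignored
            }
            int dataStart = lineEnd + 2;
            out.append(raw, dataStart, dataStart + size);
            pos = dataStart + size + 2; // skip the CRLF that follows the chunk data
        }
    }

    public static void main(String[] args) {
        String raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        System.out.println(decodeChunked(raw)); // prints "Wikipedia"
    }
}
```

Sending an HTTP/1.0 request is one way to avoid chunked bodies altogether, at the cost of opening a new connection for every page.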
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Rob Prime wrote: Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.

I'm pretty sure it doesn't use URLConnection; that caused problems for me when I tried to use it in an applet.
salman khalid
Greenhorn

Joined: Jun 05, 2005
Posts: 9

Paul Clapham wrote:
salman khalid wrote: I want to develop a simple class that can fetch the HTML contents of a URL without using java.net.URL or java.net.URLConnection classes.

Why?

Anyway, the way to do that would be to write code which does this:

  • Extracts the name of the host from the URL
  • Extracts the port from the URL
  • Connects to that host on that port using a Socket
  • Using the HTTP protocol, sends a GET request to the host
  • Receives the response from the host and interprets it according to the HTTP protocol

You may find your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.

Thanks for the response. I agree with you that it will not be a simple class. I have implemented your suggested method; the following code snippet describes it. However, there is a problem with this approach: it does not retrieve the HTML contents when the URL also contains a file path.

For example, the URL "www.google.com" returns HTML contents, but "http://www.oracle.com/technetwork/java/index.html" does not.

salman khalid
Greenhorn

Joined: Jun 05, 2005
Posts: 9

Rob Prime wrote: Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.

I will try Apache's HttpClient and let you know.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19656

Please UseCodeTags next time. They preserve indentation and add syntax highlighting. I've added them to your code, and you can see it's much easier to read now.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4168

salman khalid wrote: ... but there is a problem in this approach and that is that it does not retrieve HTML contents when a URL contains the file path as well.

You will have to format the GET request properly. Section 5 ("Request") of RFC 2616 may help:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html

Steve
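Section 5.1 of that RFC defines the request line as Method SP Request-URI SP HTTP-Version CRLF. When the client talks directly to the origin server, the Request-URI is the absolute path of the resource (which must be sent as "/" if the URL has no path), and an HTTP/1.1 request must also carry a Host header (section 14.23). A sketch of building such a request, assuming no proxy is involved; RequestLine and buildGet are hypothetical names:

```java
/**
 * Hypothetical sketch of formatting a GET request per RFC 2616, section 5.
 * Assumption: the request goes straight to the origin server (no proxy), so the
 * Request-URI is the absolute path and the host name travels in the Host header.
 */
public class RequestLine {

    static String buildGet(String host, String absPath) {
        // RFC 2616 5.1.2: the absolute path cannot be empty; send "/" instead.
        String path = absPath.isEmpty() ? "/" : absPath;
        return "GET " + path + " HTTP/1.1\r\n" // Method SP Request-URI SP HTTP-Version CRLF
                + "Host: " + host + "\r\n"     // required in every HTTP/1.1 request
                + "Connection: close\r\n"      // ask the server to close the connection afterwards
                + "\r\n";                      // a blank line ends the header section
    }

    public static void main(String[] args) {
        System.out.print(buildGet("www.oracle.com", "/technetwork/java/index.html"));
    }
}
```

For http://www.oracle.com/technetwork/java/index.html this produces the request line "GET /technetwork/java/index.html HTTP/1.1" followed by "Host: www.oracle.com". Because this is an HTTP/1.1 request, the server is free to reply with a chunked body, which the reading code then has to decode.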
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19656

Which is why I suggested HttpClient, as it will do all the hard work for you.
     