retrieve the HTML page of any URL without using java.net.URL

salman khalid
Greenhorn

Joined: Jun 05, 2005
Posts: 9
I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes. Any good suggestions would be highly appreciated. Thanks in advance.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19541
    

Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18114
    

salman khalid wrote: I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes.


Why?

Anyway, the way to do that would be to write code which does this:

  • Extracts the name of the host from the URL
  • Extracts the port from the URL
  • Connects to that host on that port using a Socket
  • Using the HTTP protocol, sends a GET request to the host
  • Receives the response from the host and interprets it according to the HTTP protocol


You may find your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.
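
The steps above can be sketched roughly as follows. This is only a minimal illustration, not code from this thread: the class name RawHttpGet and its methods are made up for the example, and the host, port and path are assumed to have been split out of the URL already. It speaks plain HTTP/1.0 so that the server closes the connection after the response and should not reply with chunked transfer encoding. Redirects, HTTPS and character sets are all ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and methods are illustrative only.
class RawHttpGet {

    // Step 4: the text of a minimal GET request. Each line ends with CRLF and
    // an empty line marks the end of the request headers.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // Steps 3 to 5: connect with a Socket, send the request, read the raw response.
    static String fetchRaw(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // ISO-8859-1 maps each byte to one char, so no bytes are lost here;
            // a real client would look at the charset in Content-Type instead.
            return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    // Step 5, first part: the status code from a line such as "HTTP/1.0 200 OK".
    static int statusCode(String rawResponse) {
        int lineEnd = rawResponse.indexOf("\r\n");
        String statusLine = lineEnd < 0 ? rawResponse : rawResponse.substring(0, lineEnd);
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Step 5, second part: everything after the blank line that ends the headers.
    static String body(String rawResponse) {
        int headerEnd = rawResponse.indexOf("\r\n\r\n");
        return headerEnd < 0 ? "" : rawResponse.substring(headerEnd + 4);
    }
}
```

Calling RawHttpGet.fetchRaw("www.example.com", 80, "/") (a placeholder host) and passing the result to statusCode and body would cover the simplest case. Anything other than a 200 status, a redirect for instance, needs extra handling, and that is where the class stops being simple.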
    Paul Clapham
    Bartender

    Joined: Oct 14, 2005
    Posts: 18114
        

    Rob Prime wrote:Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.

    I'm pretty sure it doesn't use URLConnection; that caused problems for me when I tried to use it in an applet.
    salman khalid
    Greenhorn

    Joined: Jun 05, 2005
    Posts: 9
    Paul Clapham wrote:
    salman khalid wrote: I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes.

    Why?

    Anyway, the way to do that would be to write code which does this:

  • Extracts the name of the host from the URL
  • Extracts the port from the URL
  • Connects to that host on that port using a Socket
  • Using the HTTP protocol, sends a GET request to the host
  • Receives the response from the host and interprets it according to the HTTP protocol

    You may find your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.


    Thanks for the response. I agree with you that it will not be a simple class. I have implemented the method you suggested. The following code snippet shows it, but there is a problem with this approach: it does not retrieve the HTML contents when the URL also contains a file path.

    For example, the URL "www.google.com" returns HTML contents, but "http://www.oracle.com/technetwork/java/index.html" does not.

    salman khalid
    Greenhorn

    Joined: Jun 05, 2005
    Posts: 9
    Rob Prime wrote:Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.




    I will try Apache's HttpClient and let you know.
    Rob Spoor
    Sheriff

    Joined: Oct 27, 2005
    Posts: 19541
        

    Please UseCodeTags next time. It preserves indentation, and adds syntax highlighting. I've added them to your code, and you can see it's much easier to read now.
    Steve Luke
    Bartender

    Joined: Jan 28, 2003
    Posts: 3934
        

    salman khalid wrote:... but there is a problem in this approach and that is that it does not retrieve HTML contents when a URL contains the file path as well.


    You will have to properly format the GET request. This URL may help:
    http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
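
    As a rough illustration of what "properly format" means here: the request line carries only the path part of the URL, and the host name goes into a separate Host header (mandatory in HTTP/1.1, the version that RFC describes). The sketch below is an assumption about the kind of code needed, not the original poster's code. The class name GetRequestFormatter and its methods are invented for the example, and it handles only plain http URLs without user info, fragments or IPv6 literals.

```java
// Hypothetical helper; names are illustrative, not from the thread.
class GetRequestFormatter {

    // Splits "http://host[:port][/path]" (scheme optional) by hand, without
    // java.net.URL, and returns {host, port, path}.
    static String[] split(String url) {
        String rest = url;
        int schemeEnd = rest.indexOf("://");
        if (schemeEnd >= 0) {
            rest = rest.substring(schemeEnd + 3);   // drop "http://"
        }
        int slash = rest.indexOf('/');
        String authority = slash < 0 ? rest : rest.substring(0, slash);
        // An empty path must be sent as "/", never as an empty string.
        String path = slash < 0 ? "/" : rest.substring(slash);
        int colon = authority.indexOf(':');
        String host = colon < 0 ? authority : authority.substring(0, colon);
        String port = colon < 0 ? "80" : authority.substring(colon + 1);
        return new String[] { host, port, path };
    }

    // Builds the request text: request line, Host header, blank line.
    static String format(String url) {
        String[] parts = split(url);
        String host = parts[0];
        String port = parts[1];
        String path = parts[2];
        // The port is only added to the Host header when it is not the default.
        String hostHeader = port.equals("80") ? host : host + ":" + port;
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + hostHeader + "\r\n"
             + "Connection: close\r\n"   // ask the server to close after replying
             + "\r\n";
    }
}
```

    For "http://www.oracle.com/technetwork/java/index.html" this connects to www.oracle.com on port 80 and sends "GET /technetwork/java/index.html HTTP/1.1" followed by "Host: www.oracle.com". Passing the whole URL as the host name, or always requesting "/", would explain why only bare host names worked.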

    Steve
    Rob Spoor
    Sheriff

    Joined: Oct 27, 2005
    Posts: 19541
        

    Which is why I suggested HttpClient, as it will do all the hard work for you.
     