retrieve the HTML page of any URL without using java.net.URL

 
salman khalid
Greenhorn
Posts: 9
I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes. Any good suggestions in this regard will be highly appreciated. Thanks in advance.
 
Rob Spoor
Sheriff
Posts: 20526
Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.
 
Paul Clapham
Sheriff
Posts: 21107
salman khalid wrote:I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes.


Why?

Anyway the way to do that would be to write code which does this:

  • Extracts the name of the host from the URL
  • Extracts the port from the URL
  • Connects to that host on that port using a Socket
  • Using the HTTP protocol, sends a GET request to the host
  • Receives the response from the host and interprets it according to the HTTP protocol


You may find your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.
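
As a rough illustration of those steps, here is a minimal sketch. It assumes plain HTTP only (no HTTPS, redirects, or chunked decoding), defaults to port 80 when the URL names no port, and does not handle IPv6 literals. The class name `RawHttpFetcher`, the `Target` holder, and the method names are made up for this example. It returns the raw response, status line and headers included, so separating the HTML body from the headers is still left to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a page over a raw Socket, without java.net.URL.
class RawHttpFetcher {

    /** Host, port and path pulled out of a URL string by hand. */
    static final class Target {
        final String host;
        final int port;
        final String path;

        Target(String host, int port, String path) {
            this.host = host;
            this.port = port;
            this.path = path;
        }
    }

    /** Splits e.g. "http://host:8080/a/b.html" into host, port and path. */
    static Target parse(String url) {
        String rest = url;
        int schemeEnd = rest.indexOf("://");
        if (schemeEnd >= 0) {
            String scheme = rest.substring(0, schemeEnd);
            if (!scheme.equalsIgnoreCase("http")) {
                throw new IllegalArgumentException("Only plain http is handled: " + url);
            }
            rest = rest.substring(schemeEnd + 3);
        }
        // Everything before the first '/' is host[:port]; the rest is the path.
        int slash = rest.indexOf('/');
        String authority = slash >= 0 ? rest.substring(0, slash) : rest;
        String path = slash >= 0 ? rest.substring(slash) : "/";
        int colon = authority.indexOf(':');
        String host = colon >= 0 ? authority.substring(0, colon) : authority;
        int port = colon >= 0 ? Integer.parseInt(authority.substring(colon + 1)) : 80;
        return new Target(host, port, path);
    }

    /** Sends one GET and returns the raw response: status line, headers and body. */
    static String fetch(String url) throws IOException {
        Target t = parse(url);
        try (Socket socket = new Socket(t.host, t.port)) {
            // HTTP/1.0 keeps things simple: the server closes the connection when done.
            String request = "GET " + t.path + " HTTP/1.0\r\n"
                    + "Host: " + t.host + "\r\n"
                    + "\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toString("ISO-8859-1");
        }
    }
}
```

Calling `RawHttpFetcher.fetch("http://example.com/index.html")` would then return the whole response as one string. Many sites answer plain HTTP with a redirect to HTTPS, which this sketch does not follow.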
     
    Paul Clapham
    Sheriff
    Posts: 21107
    Rob Prime wrote:Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.

    I'm pretty sure it doesn't use URLConnection; that caused problems for me when I tried to use it in an applet.
     
    salman khalid
    Greenhorn
    Posts: 9
    Paul Clapham wrote:
    salman khalid wrote:I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes.

    Why?

    Anyway the way to do that would be to write code which does this:

      • Extracts the name of the host from the URL
      • Extracts the port from the URL
      • Connects to that host on that port using a Socket
      • Using the HTTP protocol, sends a GET request to the host
      • Receives the response from the host and interprets it according to the HTTP protocol

    You may find your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.


    Thanks for the response. I agree with you that it will not be a simple class. I have implemented the method you suggested. The following code snippet describes this method, but there is a problem with this approach: it does not retrieve the HTML contents when the URL contains a file path as well.

    For example, the URL "www.google.com" returns HTML contents, but "http://www.oracle.com/technetwork/java/index.html" does not.

     
    salman khalid
    Greenhorn
    Posts: 9
    Rob Prime wrote:Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.




    I will try Apache's HttpClient and then let you know.
     
    Rob Spoor
    Sheriff
    Posts: 20526
    Please use code tags next time. They preserve indentation and add syntax highlighting. I've added them to your code, and you can see it's much easier to read now.
     
    Steve Luke
    Bartender
    Posts: 4181
    salman khalid wrote:... but there is a problem in this approach and that is that it does not retrieve HTML contents when a URL contains the file path as well.


    You will have to format the GET request properly, including the path part of the URL in the request line. This URL may help:
    http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
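
To make that concrete, here is a small sketch of what a well-formed request could look like for a URL with a path. It is not a diagnosis of the code posted above. It only shows the shape that section of the RFC describes: a request line of method, path, and HTTP version, each line ended by CRLF, and a blank line to finish the headers. The class and method names are made up for this example, and the `Connection: close` header is an assumption chosen so that the server closes the socket once the response has been sent.

```java
// Hypothetical helper: builds the text of a GET request for a given host and path.
class GetRequestBuilder {

    /**
     * Returns a complete HTTP/1.1 GET request.
     * The path goes in the request line; the host name goes in the Host header.
     */
    static String build(String host, String path) {
        // The request target must not be empty; "/" stands for the site root.
        String target = (path == null || path.isEmpty()) ? "/" : path;
        return "GET " + target + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"        // required by HTTP/1.1
                + "Connection: close\r\n"         // ask the server to close when done
                + "\r\n";                         // blank line ends the headers
    }
}
```

For the Oracle URL above, `GetRequestBuilder.build("www.oracle.com", "/technetwork/java/index.html")` produces a request line of `GET /technetwork/java/index.html HTTP/1.1`. Note that an HTTP/1.1 server may send the body with chunked transfer encoding, which the reading side then has to decode.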
     
    Rob Spoor
    Sheriff
    Posts: 20526
    Which is why I suggested HttpClient, as it will do all the hard work for you.
     