You don't get the whole result because you don't read the whole result. Instead you stop reading earlier than that because of this:
By the way, that readLine() method is deprecated. The API documentation has some suggestions about what you should be using instead.
Also, you said this:
When i use the URL class getcontent I get all the html, but i need to use sockets.
That doesn't quite make sense to me, as the URL class does use sockets. So if you use that, you are using sockets.
mj zammit
Ranch Hand
Joined: Nov 16, 2008
Posts: 49
Hi Paul
Thanks for the reply. What I meant when I said that I want to use sockets and not URL is that I want to use low-level sockets. I have followed your suggestions, but I am still only getting half of the HTML. This is the code:
This is the output i am getting on the console:
Line 1: <html>
Line 2: <head>
Line 3: <meta NAME="GENERATOR" Content="Microsoft FrontPage 12.0">
Line 4: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Line 5: <title>Nature Net</title>
Line 6: <link REL="stylesheet" HREF="styles/style.css" TYPE="text/css">
Line 7: <script src="include/i_javascript.js" type="text/javascript"></script>
Line 8:
Line 9: <style type="text/css">
Line 10: .style1 {
Line 11: text-align: center;
Line 12: }
Line 13: .style2 {
Line 14: border-width: 0px;
Line 15: }
Line 16: .style5 {
Line 17: color: #E4761F;
Line 18: }
Line 19: </style>
Line 20:
Line 21: </head>
Line 22: <body leftmargin="0" topmargin="0" bgcolor="#FFFFFF">
Line 23: <table border="0" cellpadding="0" cellspacing="0" width="780">
Line 24: <tr>
Line 25: <td width="195" bgcolor="#4346D3" align="center" valign="middle">
Line 26: <img src="images/naturenetlogo2.gif" width="92" height="92" align="middle"></td>
Line 27: <td width="585">
Line 28: <table border="0" cellpadding="0" cellspacing="0">
Line 29: <tr>
Line 30: <td width="443" height="138" bgcolor="#84C55F" align="center" valign="center">
Line 31: <img src="images/headertitle.gif" alt="Naturenet The Environmental Learning Network" width="409" height="96">
Line 32: </td>
Line 33: <td width="142" bgcolor="#84C55F">
Line 34: <img src="images/headerpic1.gif" id="rightuppergraphic" alt="" width="142" height="140">
Line 35: </td>
Line 36: </tr>
Line 37: <tr>
Line 38: <td colspan="2" height="22" bgcolor="#FBE590" align="right"><a href="contact.html" class="navlink">
Line 39: contact us</a> |
Line 40: <a href="sitemap.html" class="navlink">sitemap</a> </td>
Line 41: </tr>
Line 42: </table>
Line 43: </td>
Line 44: </tr>
Line 45: <tr>
Line 46: <td width="195" height="6" bgcolor="#FBE590"></td><td bgcolor="#84C55F"></td>
Line 47: </tr>
Line 48: <tr>
Line 49: <td width="195" height="500" bgcolor="#FBE590" valign="top">
Line 50: <table border="0" cellpadding="0" cellspacing="0"><tr><td bgcolor="#FBE590" width="5"></td>
Line 51: <td bgcolor="#FBE590">
Line 52: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
That is all the contents found in the DataInputStream
I also had the following contents in the console:
The request header :
GET /styles/style.css HTTP/1.0
User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4
Referer:http://127.0.0.1:8080/?getURL=www.naturenet.com
Accept: text/css,*/*;q=0.1
Host: 127.0.0.1:8080
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __qca=1223192728-64253405-47139338; __utma=96992031.3524520648312145400.1227907344.1227907344.1227907344.1; __utmz=96992031.1227907344.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
The URLs you pointed me to do everything in bytes, but what I want to achieve in the end is to get the HTML of any URL so I can manipulate it on my web server before outputting the results. I cannot do that with bytes, right?
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35241
You can't convert the bytes to strings until you know which encoding they're in, and you won't know that until you've inspected the META tag that specifies it.
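To make that concrete, here is a minimal sketch of the idea, not a complete solution. It assumes you already have the whole response as a byte array. The class name CharsetSniffer, the 2048-byte scan window and the fallback charset are all made up for illustration. It first decodes the start of the data as ISO-8859-1 (which maps every byte to exactly one character, so the ASCII markup stays readable), looks for a charset=... declaration, and only then decodes the full data with the charset it found.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class CharsetSniffer {
    // Matches charset=iso-8859-1 as well as charset="utf-8"
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    // Looks for a charset declaration near the start of the data.
    static Charset detect(byte[] data, Charset fallback) {
        // Assumption: the declaration appears within the first 2048 bytes.
        int window = Math.min(data.length, 2048);
        String probe = new String(data, 0, window, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(probe);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: use the fallback below.
            }
        }
        return fallback;
    }

    // Decodes the bytes only after the charset has been determined.
    static String decode(byte[] data) {
        return new String(data, detect(data, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] page = ("<meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=iso-8859-1\">")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(page, StandardCharsets.UTF_8));
    }
}
```

Once you have a String you can manipulate the HTML as text. The important part is the order: collect all the bytes first, find the encoding, then convert.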
I tried it this way, but I am still getting only half the HTML.
Thank you for your patience. I am new to networking and would really like to get this working with low-level sockets...
I am working with low-level sockets because the URL class can only do POST and GET requests out of the HTTP methods. Is this true? Can it do other HTTP methods such as DELETE?
mj zammit
Ranch Hand
Joined: Nov 16, 2008
Posts: 49
When you say you have to inspect the META tag, does that mean finding the charset value?
mj zammit
Ranch Hand
Joined: Nov 16, 2008
Posts: 49
I tried to decode using UTF-8, but if I am not mistaken this is for text/html content. The code is as follows:
But I am still getting only half the HTML.
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35241
7
but i am still getting only half the html
Read the ReadDoesntDoWhatYouThinkItDoes page I linked to; it explains the problem.
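In case it helps, here is a rough sketch of what that page is getting at. I haven't seen which read call your code uses, so this assumes the problem is the usual one: a single read(byte[]) call returns as soon as some bytes are available, which is often fewer than the buffer holds. The class name ReadFully and the trickle helper are invented for the demonstration; trickle just simulates a stream that hands data over in small pieces, the way a network connection can.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, for illustration only.
class ReadFully {
    // Keeps reading until read() reports end of stream (-1).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // only the first n bytes are valid
        }
        return out.toByteArray();
    }

    // Simulates a slow source: never delivers more than max bytes per read.
    static InputStream trickle(InputStream in, final int max) {
        return new FilterInputStream(in) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, max));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[10000];

        // One read call: the buffer is large enough, yet it is only partly filled.
        byte[] buffer = new byte[10000];
        int single = trickle(new ByteArrayInputStream(page), 7).read(buffer);
        System.out.println("single read returned " + single + " bytes");

        // Looping until -1 collects everything.
        byte[] all = readAll(trickle(new ByteArrayInputStream(page), 7));
        System.out.println("loop collected " + all.length + " bytes");
    }
}
```

With a real socket the loop ends when the server closes the connection, which is what normally happens after an HTTP/1.0 response like the one your request asks for.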