Quickest way to read in a Web page

 
Greenhorn
Posts: 4
I'm using the following code to read in the contents of a Web page:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URL url = new URL(formURL);
URLConnection conn = url.openConnection();

// Wrap the connection's byte stream in a buffered character reader
InputStream input = conn.getInputStream();
InputStreamReader reader = new InputStreamReader(input);
BufferedReader in = new BufferedReader(reader);

String line;

while ((line = in.readLine()) != null) {
    /* processing code here */
}

Is there a faster way to read the web page? This code segment seems to take longer than I would like.

Thanks.
 
Ranch Hand
Posts: 539
I can't speak for what's fastest, but here's the code I use:



In terms of efficiency, one important thing is to make sure you're using StringBuffers not Strings (as String concatenation is expensive). However, it may be that the speed of your network is sufficiently slow that the concatenation isn't what's causing the slowness.

BTW, this code is hackish ... I've only used it when playing around. I haven't put much time into writing it particularly prettily. In particular, the while loop is nasty and C-like. I would rewrite it in production code...but you get the idea
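For what it's worth, here is a minimal sketch of that kind of loop with a StringBuffer accumulating the lines. All the names are mine, and a StringReader stands in for the network stream so the snippet runs on its own:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class PageBuffer {
    // Accumulate lines in one StringBuffer: each append reuses the same
    // internal array instead of copying the whole result, which is what
    // '+' on Strings does on every iteration.
    static String slurp(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuffer page = new StringBuffer();
        String line;
        while ((line = in.readLine()) != null) {
            page.append(line).append('\n');
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for the URLConnection stream, so this runs offline
        String html = "<html>\n<body>hello</body>\n</html>";
        System.out.println(slurp(new StringReader(html)));
    }
}
```

Swapping the StringReader for `new InputStreamReader(new URL(formURL).openStream())` gives the networked version.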



--Tim
[ June 27, 2004: Message edited by: Tim West ]
 
Ranch Hand
Posts: 3178
Tim West's explanation is reasonable... Using StringBuffer is better than using String, at least when processing long strings of characters...

Craig Sullivan, what do you mean by "faster way to read the web page"? Do you mean which readers or input streams are supposed to be used in your code? Could you provide more info about your code so that we can help you much more than you can imagine?
 
Craig Sullivan
Greenhorn
Posts: 4
I want to know the fastest way to get the content of the page from the server to my Java client.
 
Tim West
Ranch Hand
Posts: 539
Well, you're limited by two things:

  • The speed of the connection between the remote server and your local box.
  • The speed of your Java code.


For the latter, implement a decent solution that uses a BufferedReader and StringBuffers, not Strings. I'm not aware of anything else that will significantly increase your code speed in this situation. If there is anything, I'm sure someone else will point it out soon.

Then, unless your connection is really fast, I'd say it's highly likely that your connection, not the code, is the performance bottleneck. So, upgrade your inter/intranet facilities.

In any case, it should be relatively simple to profile your code to work out which methods are taking the most time. Then you can decide where to optimise next.

--Tim
     
Craig Sullivan
Greenhorn
Posts: 4
My main concern is with reducing round-trips to and from the server. Is there a certain method of downloading the data from the server that will reduce round-trips?

For example, does BufferedReader.readLine() use more round trips than BufferedReader.read()? I tried changing the buffer size, but the largest download I could get was 2555 bytes. Is the max buffer size dependent on HTTP, or is there some parameter within the JDK that I can change?
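Neither readLine() nor read() changes how many network round trips happen; both just drain the same buffered stream. A sketch of block reads with an explicit buffer size follows; the sizes are illustrative choices of mine, not anything the JDK or HTTP mandates:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BlockRead {
    // Read in large chunks instead of line-by-line. The 64 KB buffer
    // and 8 KB chunk sizes are arbitrary; neither is a protocol limit.
    static int countChars(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source, 64 * 1024);
        char[] chunk = new char[8192];
        int total = 0;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for the network stream
        System.out.println(countChars(new StringReader("0123456789"))); // prints 10
    }
}
```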
     
Tim West
Ranch Hand
Posts: 539
Hmm. I'm not qualified to give a definitive answer at this point, but I can offer some more thoughts.

Firstly, the size of any given packet (at the lowest level) is determined by your Maximum Transmission Unit, or MTU. This is an OS-level concern, and something Java has no control over. For Ethernet NICs it's generally around 1500 bytes.

This is the maximum packet size. It includes all the HTTP/TCP/IP headers, checksums and whatever else the different layers of the network stack put in. So, you don't get a huge amount of data in an individual packet. I'm not familiar enough with the various protocols to know for sure, but I think any network connection always involves round trips of a sort - the TCP 3-way handshake to start, then the process of acknowledging each packet from the source and requesting more data. Do you want to reduce this sort of round trip, or have I missed something?

However, all this is transparent to a Java app. As far as Java's concerned, you get a byte stream (well, URLConnection.getInputStream() returns an InputStream) and read happily away.

I think from a Java point of view, all you can do is use a larger buffer in the BufferedReader. Then you avoid the possibility that the buffer could fill and the connection would have to stall. That said, I'd guess most OSs buffer network connections themselves, but that is complete speculation.

Anyway, there are some random thoughts that may or may not help.

I'm curious, though - what do you mean the largest download you could get was 2555 bytes? Is that one packet, or the total download size?

Dunno whether I helped or not jus' then, but there ya go

--Tim
     
Craig Sullivan
Greenhorn
Posts: 4
Here's the deal as far as I know: TCP has a flexible window. TCP will send more or fewer packets at one time without an ACK from the client, depending on the speed and stability of the network.

After I created a BufferedReader, I called available() on the underlying stream and got back 2555, or some such. This tells me that I can only read in 2555 bytes at one time.

When I use BufferedReader.readLine() to read an 8 MB web page, it takes 25 seconds. When I use my web browser, it takes 5 seconds. Somewhere, I don't know where, the amount I can read in at one time is being limited to 2555 bytes. I don't believe my TCP/IP stack is limiting my download. I believe there is some parameter in Java that is limiting the number of bytes I can download at one time to 2555. If the download size were bigger, not as many ACKs would be sent from my client, and the download would be faster.

I may need to dig into the JDK to see what's going on.
[ July 02, 2004: Message edited by: Craig Sullivan ]
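One way to separate the code cost from the network cost is to time both read styles against an in-memory page. This harness is my own sketch, with a ByteArrayInputStream standing in for the connection:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ReadTiming {
    // Time a line-oriented read: bytes -> chars -> String per line
    static long timeLineRead(byte[] data) throws IOException {
        long start = System.nanoTime();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data)));
        while (in.readLine() != null) { /* discard */ }
        return System.nanoTime() - start;
    }

    // Time a raw byte read: no character conversion at all
    static long timeByteRead(byte[] data) throws IOException {
        long start = System.nanoTime();
        InputStream in = new ByteArrayInputStream(data);
        byte[] buf = new byte[8192];
        while (in.read(buf) != -1) { /* discard */ }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[8 * 1024 * 1024]; // 8 MB stand-in page
        java.util.Arrays.fill(page, (byte) 'x');
        System.out.println("readLine: " + timeLineRead(page) / 1000000 + " ms");
        System.out.println("read:     " + timeByteRead(page) / 1000000 + " ms");
    }
}
```

If both numbers are far under 25 seconds, the code isn't the bottleneck; the network is.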
     
Tim West
Ranch Hand
Posts: 539
Hmm, this is out of my depth now :-)

To confirm your ideas on packets, you might like to use Ethereal (or something similar) to see if there are any obvious differences between the way Java is doing TCP/IP as compared to your web browser.

Also, did you play with the size of the buffer in the BufferedReader? Make it 8 MB and see if you get speeds comparable with the browser.

Anyway, what I'm writing now is speculation more than well-founded advice, so take it at your peril. It would be interesting to know what the cause of all this is, though.

Whoa, just a thought after all this - should we be using a BufferedInputStream, not a BufferedReader? I would think we want buffering as "close to the network" as possible. Erm, maybe someone else can comment. I dunno what the relative merits of a BufferedReader vs. a BufferedInputStream are (I mean, besides the obvious).
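A sketch of that idea - buffer at the byte layer and convert to text once at the end - might look like the following. The names and the charset choice are mine; real code should take the charset from the HTTP response headers:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteFirst {
    // Buffer at the byte layer; convert to text once, at the end,
    // instead of per-line inside a Reader.
    static String fetch(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw, 32 * 1024);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        // Charset is a guess here; a real client should use the response header
        return out.toString("ISO-8859-1");
    }

    public static void main(String[] args) throws IOException {
        // ByteArrayInputStream stands in for the connection's stream
        byte[] page = "<html>ok</html>".getBytes();
        System.out.println(fetch(new ByteArrayInputStream(page))); // prints <html>ok</html>
    }
}
```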


--Tim
[ July 01, 2004: Message edited by: Tim West ]
     
(instanceof Sidekick)
Posts: 8791
Your code will likely be much faster than the Internet. I have a little program that downloads files and shows the bytes per second after every 1k bytes. I can run one thread or five and the BPS is the same for each. My code is not the bottleneck. If I had a need for 50 threads it might be.
     
Author and all-around good cowpoke
Posts: 13078
BufferedReader.readLine()

That has a huge overhead: converting a byte stream to characters, building a line, finally converting to a String.

For speed:
1. Never convert to characters - stay with bytes.
2. Start with a monstrous byte[] and read directly into it - probably with the read(buf, offset, length) method, where length is the result of calling available().

Or you might use the ServletInputStream readLine(buf, off, len) method, which will return -1 at the end of the stream and will let you count lines.
Bill
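A sketch of point 2 above - one big preallocated byte[] filled via read(buf, offset, length). Note that available() only reports what's already buffered, so looping until end-of-stream is safer than trusting it; the buffer size here is my assumption, and a real page could exceed it:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RawGrab {
    // Fill a preallocated buffer with read(buf, offset, length) until
    // end-of-stream or the buffer is full; returns bytes actually read.
    static int readInto(InputStream in, byte[] buf) throws IOException {
        int offset = 0;
        int n;
        while (offset < buf.length
                && (n = in.read(buf, offset, buf.length - offset)) != -1) {
            offset += n;
        }
        return offset;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[16 * 1024 * 1024]; // "monstrous" buffer
        // ByteArrayInputStream stands in for the network stream
        int got = readInto(new ByteArrayInputStream("hello".getBytes()), buf);
        System.out.println(got); // prints 5
    }
}
```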
     
Tim West
Ranch Hand
Posts: 539
Hmm, so based on William's post, using a BufferedInputStream over a BufferedReader is definitely a good thing - you get the advantages of buffering without the overhead of character/String conversion.

-Tim
     
William Brogden
Author and all-around good cowpoke
Posts: 13078
Right - but I think that reading directly from the input stream would be best. Remember, the operating system's TCP/IP stack already has a buffer to hold a packet (or maybe more than one) - there is no need to introduce another buffer; just grab the bytes as they become available.
Bill