I am trying to figure out a problem with a client that requests and receives files from a server. The requests are pipelined, so knowing the exact length of each response body is important. If the content type is "text/<anything>" I read the body into a character array and save that array to a file. Even with large files I am able to keep track of the Content-Length and the number of chars I have read so far.
However, when the file is not text, I have problems saving it. If the file is a PDF, for example, I read the body into a byte array and then write that to a file, but the number of bytes read in seems to be smaller than the Content-Length.
For example I have a pdf that has a content-length of 36975 bytes. After reading in 35799 of that my variables say I have 2159 left, but only 983 bytes are read in. With my logic, upon the next read there is nothing left. Initially I thought my logic was incorrect, but I use the same logic for text files. The only difference is for text I use a BufferedReader and anything else uses InputStream for reading from the socket.
Really, if you're just receiving files and saving them locally, you should be treating them all as streams of bytes. Converting the bytes could possibly cause problems if you use the wrong encoding. Converting bytes to chars and then back to bytes is at best a waste of effort.
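As a rough illustration of treating the body as bytes only, here is a minimal sketch that copies exactly Content-Length bytes from an InputStream to an OutputStream. The class and method names (BodyCopier, copyBody) are made up for this example and are not from the original poster's code. It loops because a single read() on a socket stream may return fewer bytes than requested without meaning the body is finished.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, for illustration only.
class BodyCopier {

    // Copies exactly `length` bytes from `in` to `out`.
    // Never asks for more than the bytes still owed, so the next
    // pipelined response is left untouched in the stream.
    static void copyBody(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            int toRead = (int) Math.min(buffer.length, remaining);
            int n = in.read(buffer, 0, toRead); // may return fewer than toRead
            if (n == -1) {
                // The stream really ended before the promised length arrived.
                throw new EOFException("Stream ended with " + remaining + " bytes still expected");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
    }
}
```

In a real client `in` would be the socket's InputStream and `out` a FileOutputStream; the same method would work for text and binary responses alike, since no character decoding is involved.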
However if you say your code for copying the stream of bytes has a problem, and you're asking for ideas, then my idea would be to investigate the problem and fix it if necessary.
I say "if necessary" because you don't say that the files are being truncated, you just say your logic doesn't seem to process the number of bytes you think it should process. So, first step, find out if the files are actually being truncated. For example try to open one of the PDFs in Acrobat Reader. If they aren't being truncated then you don't actually have a problem.
Or if they are being truncated, then I would recommend looking at the code to see why. If you can't see why then you could post it here and ask about it.
Joined: Feb 10, 2011
From what I have found I agree I need to keep the data as a stream of bytes. The following is the code I have for doing that.
For any binary file, index always returns -1 before bodyReadIn = bodyLength. This results in a corrupted file. What am I missing here? I really appreciate the help.
James McIntyre wrote: From what I have found I agree I need to keep the data as a stream of bytes.
But you're using a Reader. That directly contradicts the idea of keeping it as a stream of bytes.
And worse, it's a BufferedReader, which means it uses a buffer. So the BufferedReader reads a large chunk of data from the InputStream you gave it into its buffer, and you go through that buffer looking for something. Eventually you decide you've found it, so then you start reading from the InputStream. You think you will start reading immediately after the last data you got from the BufferedReader. But no. The BufferedReader still has more data in its buffer, which it has already taken from the InputStream. You won't get that data by reading from the InputStream.
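One way around this, sketched here under the assumption that the header lines end in LF or CRLF, is to read the header lines from the very same InputStream that will later supply the body, one byte at a time, so nothing past the end of the line is ever consumed. The names HeaderLineReader and readLine are hypothetical and only serve this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class HeaderLineReader {

    // Reads one line directly from the byte stream and returns it without
    // its line terminator. Stops right after the '\n', so the next read on
    // `in` starts at the first byte that follows the line.
    // Returns null if the stream is already at its end.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // drop the CR of a CRLF terminator
        }
        // ISO-8859-1 maps every byte to one char, so no byte is lost or altered.
        return new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
    }
}
```

If byte-at-a-time reads on the socket turn out to be too slow, wrapping the socket stream once in a BufferedInputStream and using that single object for both the headers and the body should avoid the problem too, because then only one buffer exists.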
I misunderstood what you said initially. Thank you for clearing that up for me!
subject: HTTP receiving problems with non text files.