
How get full Stream of document into other InputStream?

Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
I have a ZipInputStream. I move to a ZipEntry and then I use the ZipInputStream to read the document data until zis.read() == -1. In normal cases this works fine, however, I am using Apache's POI API and for some reason it cannot correctly read from the ZipInputStream (don't worry you don't need to know ANYTHING about POI to answer my question).
Now, when I do the following (real rough code):
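A minimal sketch of the kind of buffering described here, not the code from the original post: copy the current entry into a ByteArrayOutputStream, then hand the library a plain ByteArrayInputStream over those bytes. The class and method names (ZipEntryBuffer, bufferCurrentEntry) are made up for illustration, and it assumes getNextEntry() has already been called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not from the original post.
final class ZipEntryBuffer {
    private ZipEntryBuffer() {}

    /**
     * Copies the rest of the current zip entry into memory and returns a
     * plain in-memory stream over it. Assumes the caller has already
     * positioned the stream with zis.getNextEntry().
     */
    static InputStream bufferCurrentEntry(ZipInputStream zis) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 at the end of the current entry, not of the whole archive
        while ((n = zis.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
        }
        // toByteArray() makes a copy, so peak memory is roughly twice the entry size
        return new ByteArrayInputStream(out.toByteArray());
    }
}
```

This only works while the whole entry fits in a single byte[], which is exactly the limit the questions below are about.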

It works fine and reads the InputStream no problem. So I know I just need to give it a "regular" input stream and I'm ok. My question(s) then are this:
1. I know that a file can be larger than an int in size (this is why File.length() returns long), but a byte array can only be an int in length. So the InputStream solution I have above ONLY works for files up to the limit of int. What would be a better solution that can handle larger streams? (I have no problem extending InputStream.)
2. What happens if I can keep reading from the inputstream and the ByteArrayOutputStream's buffer is full? Will it just dump what's at the beginning?
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hi Robert,

1. I know that a file can be larger than an int in size (this is why File.length() returns long), but a byte array can only be an int in length. So the InputStream solution I have above ONLY works for files up to the limit of int. What would be a better solution that can handle larger streams? (I have no problem extending InputStream.)

Using channels instead of streams should guarantee the ability to read a file of any size. In particular java.nio.channels.FileChannel. Of course you'll need the 1.4 SDK for that.

2. What happens if I can keep reading from the inputstream and the ByteArrayOutputStream's buffer is full? Will it just dump what's at the beginning?

I'm guessing an ArrayIndexOutOfBoundsException would be thrown.
Hope this helps,
Michael Morris


Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage - to move in the opposite direction. - Ernst F. Schumacher
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
What happens if I can keep reading from the inputstream and the ByteArrayOutputStream's buffer is full? Will it just dump what's at the beginning?
No, it should keep resizing the buffer as necessary to accommodate the increased size. Unless you overload it so much it's forced to throw OutOfMemoryError. Much like a StringBuffer.
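A small sketch to illustrate the resizing: it starts a ByteArrayOutputStream with a deliberately tiny initial capacity and writes far more than that. GrowingBufferDemo and writeBytes are names invented for this example.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not from the thread.
final class GrowingBufferDemo {
    private GrowingBufferDemo() {}

    /** Writes count bytes into a buffer that starts at initialCapacity and returns the final size. */
    static int writeBytes(int initialCapacity, int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(initialCapacity);
        for (int i = 0; i < count; i++) {
            // The internal array is reallocated as needed; earlier bytes are kept, not dropped
            out.write(i & 0xFF);
        }
        return out.size();
    }
}
```

Calling GrowingBufferDemo.writeBytes(16, 100000) completes without an exception and reports all 100000 bytes; only running out of heap (or hitting the array size limit) stops the growth.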
Using channels instead of streams should guarantee the ability to read a file of any size. In particular java.nio.channels.FileChannel. Of course you'll need the 1.4 SDK for that.
You can read a huge file with a FileInputStream too. The problem is, you can't store the thing all in memory at once (e.g. in a byte[] array using a ByteArrayInputStream) if the memory required exceeds certain numbers - even if you're using nio classes. For a byte array, its max size is Integer.MAX_VALUE. The nio equivalent would probably be to use a MappedByteBuffer - but here again, the indices used for the buffer are of type int, so the max size is the same. Even with other possible constructs, the max JVM stack memory you can ever use is I think eight times that, Math.pow(2, 32) * 4 = 16 GB -- because the JVM instruction set uses no more than 32 bits to refer to its internal memory addresses, and the word length is 4 bytes. (Unless there's some sort of multi-paging scheme I've overlooked.)
What would be a better solution that can handle larger streams
Basically, if you can't fit all your data into a byte[] array or maybe an int[] array (and what sort of huge beast are you running your program on, anyway?) you'll need to break the data into smaller chunks and figure out how to process it as you go. Read some, process it somehow, write the processed results to a file or something, and repeat (dumping what you just read). This will usually improve efficiency a lot too, as your program is less of a memory hog. How you achieve this depends on what sort of data you have and what sort of processing you need to do.
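As a rough illustration of the read-some, process-some loop: the sketch below counts the bytes and computes a CRC32 of a stream of any length using one fixed 8 KB buffer. The checksum is just a stand-in for "process it somehow", and the names ChunkedReader and countAndChecksum are invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical example class, not from the thread.
final class ChunkedReader {
    private ChunkedReader() {}

    /** Returns { total bytes read, CRC32 of those bytes } without holding the whole stream in memory. */
    static long[] countAndChecksum(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] chunk = new byte[8192];
        long total = 0;   // long, so the count can exceed Integer.MAX_VALUE
        int n;
        while ((n = in.read(chunk)) != -1) {
            crc.update(chunk, 0, n);   // process this chunk, then let it be overwritten
            total += n;
        }
        return new long[] { total, crc.getValue() };
    }
}
```

Memory use stays constant no matter how large the input is, because each chunk is discarded once it has been processed.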


"I'm not back." - Bill Harding, Twister
Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
First off, let me just say Jim is a genius.
I solved my problem! I ended up not storing the data, but I figured out what was wrong. It turns out that:
1. ZipInputStream.available() will return either 0 or 1, NEVER more than that.
2. For some reason (I believe to do with inflating) when you call ZipInputStream.read(byte[],int,int) it will frequently not be able to read back the length of bytes you want even though there ARE more left in the stream. If you call read again, it will then read it.
So my solution was this:
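The snippet from the original post is not reproduced here. What follows is only a sketch of one way to get the behaviour described, under the assumption that a wrapper around the ZipInputStream is acceptable: it keeps calling read() until the requested length is filled or the stream ends. FullReadInputStream is a made-up name.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper, not the code from the original post.
final class FullReadInputStream extends FilterInputStream {
    FullReadInputStream(InputStream in) {
        super(in);
    }

    /** Fills buf with exactly len bytes unless the underlying stream ends first. */
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        // Check the count first, then read, so no byte is consumed and thrown away
        while (count < len) {
            int n = in.read(buf, off + count, len - count);
            if (n == -1) {
                break;   // underlying stream is exhausted
            }
            count += n;
        }
        return count == 0 ? -1 : count;
    }
}
```

Wrapping the stream as new FullReadInputStream(zis) lets code that expects full blocks read from it without copying the entry into memory first.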

This solves the problem and allows it to work with the code that likes to read from a stream in blocks. OH! And make sure that the check for the count is first! NOT the reading of the byte, I had it the opposite and was losing a byte every read.
[ February 20, 2003: Message edited by: Robert Paris ]
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451

Using channels instead of streams should guarantee the ability to read a file of any size. In particular java.nio.channels.FileChannel. Of course you'll need the 1.4 SDK for that.

I wasn't suggesting that you can't handle a huge file with streams, nor was I suggesting that with nio you could somehow bypass the hard constraints of the JVM and read the whole file into memory. I am learning nio and my understanding is that a FileChannel can read in any byte in the file regardless of its position, even if it is byte number 2^32 + 1, without loading the previous 2^32 bytes into physical memory. I am seeing a Channel as analogous to an array in virtual memory. Is my understanding correct?
Michael Morris
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I wasn't suggesting that you can't handle a huge file with streams, nor was I suggesting that with nio you could somehow bypass the hard constraints of the JVM and read the whole file into memory.
Okey-doke. I guess I conflated what Robert seemed to be asking for, with what you were saying.
I am learning nio
Me too to be honest, so let the reader beware.
and my understanding is that a FileChannel can read in any byte in the file regardless of its position, even if it is byte number 2^32 + 1, without loading the previous 2^32 bytes into physical memory.
That seems to be right. But you can also do much the same with any InputStream, using skip(long) and a little extra logic:
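The code that originally followed is not reproduced here; this is a sketch of the general idea under stated assumptions, not the poster's snippet. InputStream.skip(long) is allowed to skip fewer bytes than requested, so the "extra logic" is a loop that keeps skipping until the target offset is reached. StreamSkipper and skipFully are invented names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not the code from the original post.
final class StreamSkipper {
    private StreamSkipper() {}

    /** Advances the stream by exactly toSkip bytes, or throws EOFException if it ends first. */
    static void skipFully(InputStream in, long toSkip) throws IOException {
        long remaining = toSkip;
        while (remaining > 0) {
            long skipped = in.skip(remaining);   // may skip fewer bytes than asked
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() returned 0: probe with a one-byte read to tell "stalled" from "end of stream"
            if (in.read() == -1) {
                throw new EOFException("stream ended with " + remaining + " bytes left to skip");
            }
            remaining--;
        }
    }
}
```

After skipFully(in, position) returns, the next read() delivers the byte at that offset, and none of the skipped bytes had to be kept in memory.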

It might be slower "only" skipping Integer.MAX_VALUE bytes each time (or less for some streams maybe) but it will almost certainly be a lot faster than reading everything along the way. So FileChannel doesn't seem vastly different in this ability - though it's a bit simpler to use, and probably faster in its internal implementation.
I think the main thing FileChannel has here that FileInputStream doesn't, is the ability to move backwards to an earlier position in the file. This simply isn't possible with a FileInputStream; you'd need to dump it and create a new one. You can sorta do it with a PushbackInputStream, but you're limited by (theoretically) Integer.MAX_VALUE range, or (practically) the amount of memory you want to devote to the pushback buffer - and these are bytes that have to be read into memory; if you skip() them, you can't push back. And let's not even get into the horrid RandomAccessFile, which attempts to do too many different things, none of them well. On occasions where I had to move back in a file, I found it was much quicker to just make a new FileInputStream positioned at the beginning, and skip() to where I wanted. RAF was just abysmally slow. And it encourages people to ignore encoding issues, but that's another issue.
Sorry, not what you were talking about. I just like to rant about RAF. Onward...
I am seeing a Channel as analogous to an array in virtual memory. Is my understanding correct?
Yeah, I think so.
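To make the "array in virtual memory" picture concrete, here is a hedged sketch that fetches a single byte at an arbitrary long offset with FileChannel.read(ByteBuffer, long). ChannelPeek and byteAt are invented names, and the try-with-resources syntax assumes Java 7 or later rather than the 1.4 SDK discussed in this thread.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical example class, not from the thread.
final class ChannelPeek {
    private ChannelPeek() {}

    /** Returns the unsigned byte at the given offset, or -1 if the offset is at or past the end of the file. */
    static int byteAt(String path, long position) throws IOException {
        try (FileInputStream fis = new FileInputStream(path);
             FileChannel channel = fis.getChannel()) {
            ByteBuffer one = ByteBuffer.allocate(1);
            // Positional read: does not move the channel's own position
            // and does not read the bytes before the offset
            int n = channel.read(one, position);
            if (n < 1) {
                return -1;
            }
            return one.get(0) & 0xFF;
        }
    }
}
```

Because the offset is a long, byteAt(path, (1L << 32) + 1) is a legal call on a large enough file, and a later call with a smaller offset works just as well, which is the backwards movement a plain FileInputStream lacks.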
[ February 20, 2003: Message edited by: Jim Yingst ]