I am using an external library that reads bytes from an InputStream and assumes it is UTF-8 text, when the stream actually contains text in another character encoding. So I need to translate the actual bytes from ISO-8859-15 (or so) to UTF-8. What I want to avoid is reading the entire stream into a String first. Any ideas? I hoped there was a commons-lang or commons-io util that would do this for me, but I didn't find any.
I saw that, but then I've got a Reader, and I really need an InputStream. Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case. Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.
Jeroen Kransen wrote: Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case.
Yeah, welcome to the world of Java I/O. It's like that a lot.
Jeroen Kransen wrote: Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.
I'm not sure where that idea came from. Unicode is a character set, and UTF-8 and UTF-16 are both encodings of it defined within the same standard. Java does use UTF-16 for the chars inside a String, but that doesn't restrict which encodings you can read or write; there are classes to handle either of these encodings, and many more.
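To make the distinction concrete, here's a small sketch (assuming Java 7+ for StandardCharsets): the same String produces different bytes depending on which charset you ask for. The euro sign is a handy test character because ISO-8859-15 added it at position 0xA4.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign

        // In UTF-8 the euro sign takes three bytes: E2 82 AC.
        byte[] utf8 = euro.getBytes(StandardCharsets.UTF_8);

        // In ISO-8859-15 it is a single byte: A4.
        byte[] latin9 = euro.getBytes(Charset.forName("ISO-8859-15"));

        System.out.println(utf8.length + " vs " + latin9.length); // 3 vs 1
    }
}
```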
The main problem is, as you've said, that you can get a Reader but what you need is an InputStream. You could do this by writing everything to a byte array or file, and then rereading it. Or if you want something quicker and/or with lower memory requirements (assuming the stream is fairly large), then you can try something like this:
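One way to sketch that idea: wrap the original stream in an InputStreamReader for the source charset, then put an InputStream on top that re-encodes the decoded chars to UTF-8 chunk by chunk. The class name and buffer size here are mine; a production version should use CharsetEncoder to handle a surrogate pair that happens to be split across two reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical sketch: an InputStream that transcodes from one charset
// to another on the fly, without buffering the whole stream.
class TranscodingInputStream extends InputStream {
    private final Reader reader;          // decodes source bytes to chars
    private final Charset target;         // e.g. UTF-8
    private final char[] chars = new char[1024];
    private byte[] buffer = new byte[0];  // encoded bytes waiting to be read
    private int pos = 0;

    TranscodingInputStream(InputStream in, Charset source, Charset target) {
        this.reader = new InputStreamReader(in, source);
        this.target = target;
    }

    @Override
    public int read() throws IOException {
        while (pos >= buffer.length) {
            int n = reader.read(chars);
            if (n == -1) {
                return -1; // source exhausted
            }
            // Caveat: encoding each chunk separately can break a surrogate
            // pair split across reads; CharsetEncoder would handle that.
            buffer = new String(chars, 0, n).getBytes(target);
            pos = 0;
        }
        return buffer[pos++] & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        byte[] latin = "caf\u00E9 \u20AC"
                .getBytes(Charset.forName("ISO-8859-15"));
        InputStream utf8 = new TranscodingInputStream(
                new ByteArrayInputStream(latin),
                Charset.forName("ISO-8859-15"),
                Charset.forName("UTF-8"));
        int b, count = 0;
        while ((b = utf8.read()) != -1) {
            count++;
        }
        System.out.println(count); // 6 source bytes become 9 UTF-8 bytes
    }
}
```

The library you hand this stream to then sees valid UTF-8 bytes, which is what it was assuming all along.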
Which is more work than we might like, but oh well. It may be possible to do this faster with NIO, but the basic idea would be the same.