JavaRanch » Java Forums » Java » I/O and Streams

Translate CharSet of InputStream

Jeroen Kransen
Greenhorn

Joined: Mar 31, 2003
Posts: 6
I am using an external library that reads bytes from an InputStream and assumes they are UTF-8 text, while they are actually text in another character encoding. So I need to translate the bytes from ISO-8859-15 (or similar) to UTF-8. What I want to avoid is having to read the entire stream into a String first. Any ideas? I hoped there was a commons-lang or commons-io utility that would do this for me, but I didn't find one.
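For contrast, here is a minimal sketch of the in-memory conversion the question is trying to avoid. It assumes Java 9 or later (for InputStream.readAllBytes), and the class and method names are made up for illustration. It decodes the whole stream into a String and re-encodes it as UTF-8, so memory use grows with the size of the input:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class InMemoryTranscode {
    // Hypothetical helper: buffers the entire input, so memory use is proportional to its size.
    static InputStream toUtf8InMemory(InputStream in, Charset source) throws IOException {
        byte[] raw = in.readAllBytes();                        // Java 9+: read every byte into memory
        String text = new String(raw, source);                 // decode: source bytes -> chars
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);   // encode: chars -> UTF-8 bytes
        return new ByteArrayInputStream(utf8);
    }

    public static void main(String[] args) throws IOException {
        byte[] latin9 = { 'a', (byte) 0xA4 };  // 0xA4 is the euro sign in ISO-8859-15
        InputStream utf8 = toUtf8InMemory(new ByteArrayInputStream(latin9),
                Charset.forName("ISO-8859-15"));
        System.out.println(utf8.readAllBytes().length);  // prints 4: 'a' is 1 byte, the euro sign 3
    }
}
```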

Jeroen
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1761

You could wrap the InputStream with an InputStreamReader and use a CharsetDecoder to decode from ISO-8859-15 to Unicode. If you browse through InputStreamReader's Javadoc page you should find an appropriately overloaded constructor.
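A minimal sketch of that suggestion, assuming the bytes are ISO-8859-15 (the class and method names are hypothetical). InputStreamReader has constructors taking a charset name, a Charset, or a CharsetDecoder; the decoder overload is used here because it lets you choose strict error handling:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class Latin9Decoding {
    // Wraps a byte stream assumed to hold ISO-8859-15 text in a Reader.
    // REPORT makes the Reader throw on bad input instead of substituting a replacement char.
    static Reader latin9Reader(InputStream in) {
        CharsetDecoder decoder = Charset.forName("ISO-8859-15").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = { 'a', (byte) 0xA4 };  // 0xA4 is the euro sign in ISO-8859-15
        try (Reader reader = latin9Reader(new ByteArrayInputStream(raw))) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            System.out.println(text.toString().equals("a\u20AC"));  // prints true
        }
    }
}
```

Note that this gives you a Reader of decoded characters, not an InputStream of UTF-8 bytes; the re-encoding step is still missing.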

Jeroen Kransen
Greenhorn

Joined: Mar 31, 2003
Posts: 6
I saw that, but then I've got a Reader, and I really need an InputStream. Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case. Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3011
Jeroen Kransen wrote: Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case.

Yeah, welcome to the world of Java I/O. It's like that a lot.

Jeroen Kransen wrote: Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.

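I'm not sure where that idea came from. Unicode is the character set; UTF-8 and UTF-16 are both encoding forms defined by the Unicode standard. Internally Java uses both in various ways (char and String values hold UTF-16 code units, while class files store string constants in a modified UTF-8). In any case, the standard library has classes to convert to and from either of these encodings, and many more, so once you have a Reader its characters can be written out as UTF-8.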
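A small illustration of that point (not from the original post): a Java String holds UTF-16 code units in memory, but the byte encoding is chosen only when the text is converted to bytes, so the same euro sign becomes three bytes in UTF-8 and two in UTF-16:

```java
import java.nio.charset.StandardCharsets;

public class EncodingForms {
    public static void main(String[] args) {
        String euro = "\u20AC";  // one character: U+20AC EURO SIGN
        // In memory the String holds UTF-16 code units...
        System.out.println(euro.length());                                    // prints 1
        // ...but the byte encoding is picked only when converting to bytes.
        System.out.println(euro.getBytes(StandardCharsets.UTF_8).length);     // prints 3 (E2 82 AC)
        System.out.println(euro.getBytes(StandardCharsets.UTF_16BE).length);  // prints 2 (20 AC)
    }
}
```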

The main problem is, as you've said, that you can get a Reader but what you need is an InputStream. You could handle this by writing everything to a byte array or file and then rereading it. Or, if you want something quicker and/or with lower memory requirements (assuming the file is fairly large), you can try something like this:
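One way to do it is with a pipe fed by a background thread. This is a sketch under stated assumptions rather than a tested library routine: the helper name toUtf8 and the buffer sizes are hypothetical, while PipedInputStream, PipedOutputStream and OutputStreamWriter are standard java.io classes. The thread decodes the source bytes and re-encodes them as UTF-8 into the pipe, and the reading end of the pipe is what you hand to the library:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PipedTranscoder {
    // Hypothetical helper: returns a stream of UTF-8 bytes re-encoded from 'in',
    // which is assumed to hold text in the 'source' charset. Only the pipe buffer
    // and a small char buffer are held in memory at any time.
    static InputStream toUtf8(InputStream in, Charset source) throws IOException {
        PipedInputStream result = new PipedInputStream(8192);
        PipedOutputStream sink = new PipedOutputStream(result);
        Thread pump = new Thread(() -> {
            try (Reader reader = new InputStreamReader(in, source);
                 Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
                char[] buf = new char[4096];
                int n;
                while ((n = reader.read(buf)) != -1) {
                    writer.write(buf, 0, n);  // encodes the chars as UTF-8 into the pipe
                }
            } catch (IOException e) {
                // Sketch only: the writer is closed on the way out, so the reading side
                // just sees end of stream. A real version should pass this error on.
                e.printStackTrace();
            }
        }, "charset-pump");
        pump.setDaemon(true);  // don't keep the JVM alive if the consumer stops reading
        pump.start();
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] latin9 = { 'a', (byte) 0xA4 };  // 0xA4 is the euro sign in ISO-8859-15
        try (InputStream utf8 = toUtf8(new ByteArrayInputStream(latin9),
                Charset.forName("ISO-8859-15"))) {
            System.out.println(utf8.readAllBytes().length);  // prints 4
        }
    }
}
```

The weak spot of this design is the catch block: an I/O failure in the pump thread closes the pipe, so the library sees a shorter stream rather than an exception.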

Which is more work than we might like, but oh well. It may be possible to do this faster with NIO, but the basic idea would be the same.
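As a sketch of the NIO direction (an assumption about what it could look like, not benchmarked, so no claim that it is faster): an InputStream subclass, here given the hypothetical name EncodingInputStream, that pulls characters from the Reader and encodes them with a java.nio.charset.CharsetEncoder on demand. It needs no pipe or second thread, and encoding errors surface in the caller's thread:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical adapter: exposes the chars of a Reader as bytes in a target charset,
// encoding on demand in the calling thread.
public class EncodingInputStream extends InputStream {
    private final Reader reader;
    private final CharsetEncoder encoder;
    private final CharBuffer chars = CharBuffer.allocate(1024);  // kept in "read" mode
    private final ByteBuffer bytes = ByteBuffer.allocate(4096);  // kept in "read" mode
    private boolean endOfInput = false;
    private boolean flushed = false;

    public EncodingInputStream(Reader reader, Charset target) {
        this.reader = reader;
        this.encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        chars.flip();  // start empty
        bytes.flip();
    }

    @Override
    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            if (flushed) return -1;
            refill();
        }
        return bytes.get() & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, b.length);  // Java 9+
        if (len == 0) return 0;
        while (!bytes.hasRemaining()) {
            if (flushed) return -1;
            refill();
        }
        int n = Math.min(len, bytes.remaining());
        bytes.get(b, off, n);
        return n;
    }

    // Called only when all previously encoded bytes have been consumed.
    private void refill() throws IOException {
        bytes.clear();
        if (!endOfInput) {
            chars.compact();  // keep leftovers, e.g. a high surrogate split across reads
            if (reader.read(chars) == -1) endOfInput = true;
            chars.flip();
        }
        CoderResult result = encoder.encode(chars, bytes, endOfInput);
        if (result.isError()) result.throwException();
        if (endOfInput && result.isUnderflow() && encoder.flush(bytes).isUnderflow()) {
            flushed = true;  // all input has been encoded and flushed
        }
        bytes.flip();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] latin9 = { 'a', (byte) 0xA4 };  // 0xA4 is the euro sign in ISO-8859-15
        Reader reader = new InputStreamReader(new ByteArrayInputStream(latin9),
                Charset.forName("ISO-8859-15"));
        try (InputStream utf8 = new EncodingInputStream(reader, StandardCharsets.UTF_8)) {
            System.out.println(utf8.readAllBytes().length);  // prints 4
        }
    }
}
```

The compact() before each read is what keeps a surrogate pair that straddles two Reader reads from being split. For the original question, usage would look like new EncodingInputStream(new InputStreamReader(original, Charset.forName("ISO-8859-15")), StandardCharsets.UTF_8).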