Translate CharSet of InputStream

 
Jeroen Kransen
Greenhorn
Posts: 6
I am using an external library that reads bytes from an InputStream and assumes that it is UTF-8 text while it is actually text in another char encoding. So I need to translate the actual bytes from ISO-8859-15 (or so) to UTF-8. What I want to avoid is to have to read the entire stream into a String first. Any ideas? I hoped that there was a commons-lang or commons-io util that would do this for me, but I didn't find any.

Jeroen
 
Jelle Klap
Bartender
Posts: 1951
You could wrap the InputStream with an InputStreamReader and use a CharsetDecoder to decode from ISO-8859-15 to Unicode. If you browse through InputStreamReader's Javadoc page you should find an appropriately overloaded constructor.
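To illustrate the suggestion above, here is a minimal sketch, assuming the JVM supports the ISO-8859-15 charset. The class name `Latin9Decoding` and the method `open` are hypothetical names chosen for this example; it uses the `InputStreamReader(InputStream, Charset)` constructor.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, named for illustration only.
class Latin9Decoding {
    /** Wraps the stream in a Reader that decodes its bytes as ISO-8859-15. */
    static Reader open(InputStream in) {
        // Charset.forName throws UnsupportedCharsetException if the JVM lacks this charset.
        Charset latin9 = Charset.forName("ISO-8859-15");
        return new InputStreamReader(in, latin9);
    }
}
```

The result is a Reader of decoded characters, not a stream of re-encoded bytes, so on its own it does not yet give the library the InputStream it expects.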
 
Jeroen Kransen
Greenhorn
Posts: 6
I saw that, but then I've got a Reader, and I really need an InputStream. Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case. Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.

 
Mike Simmons
Ranch Hand
Posts: 3028
Jeroen Kransen wrote: Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case.

Yeah, welcome to the world of Java I/O. It's like that a lot.

Jeroen Kransen wrote: Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.

I'm not sure where that idea came from. Unicode includes both UTF-8 and UTF-16 within its standard, and internally Java uses both forms in various ways. But there are classes to handle either of these encodings, and many more.
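As a small illustration of that point, the same String can be turned into bytes in either encoding through the standard library. This is only a sketch; the class name `EncodingSizes` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named for illustration only.
class EncodingSizes {
    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the text occupies when encoded as UTF-16 (big-endian, no byte order mark). */
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }
}
```

For the euro sign, for example, `utf8Length("\u20AC")` is 3 and `utf16Length("\u20AC")` is 2: same character, different byte encodings.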

The main problem is, as you've said, you can get a Reader but what you need is an InputStream. You could do this by writing everything to a byte array or file, and then rereading it. Or if you want something quicker and/or with lower memory requirements (assuming the file is fairly large), then you can try something like this:
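A minimal sketch of the streaming approach, under some assumptions: the class name `ReencodingInputStream` is hypothetical, the buffer sizes are arbitrary, and characters that cannot be encoded are replaced rather than reported. It decodes through an InputStreamReader and re-encodes to UTF-8 with a CharsetEncoder, a small buffer at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical wrapper: reads bytes in a source charset, serves them as UTF-8. */
class ReencodingInputStream extends InputStream {
    private final Reader reader;
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private final ByteBuffer bytes = ByteBuffer.allocate(4096);
    private boolean eof = false;   // the underlying reader is exhausted
    private boolean done = false;  // the encoder has been flushed

    ReencodingInputStream(InputStream in, Charset sourceCharset) {
        this.reader = new InputStreamReader(in, sourceCharset);
        chars.flip();  // both buffers start out empty, ready to be drained
        bytes.flip();
    }

    /** Ensures at least one encoded byte is available; returns false at end of stream. */
    private boolean fill() throws IOException {
        while (!bytes.hasRemaining()) {
            if (done) return false;
            bytes.clear();
            if (!eof) {
                chars.compact();  // keep leftovers, e.g. half of a surrogate pair
                if (reader.read(chars) < 0) eof = true;
                chars.flip();
            }
            CoderResult result = encoder.encode(chars, bytes, eof);
            if (eof && result.isUnderflow()) {
                // All input consumed: flush; underflow here means nothing is left.
                if (encoder.flush(bytes).isUnderflow()) done = true;
            }
            bytes.flip();
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        return fill() ? (bytes.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!fill()) return -1;
        int n = Math.min(len, bytes.remaining());
        bytes.get(b, off, n);
        return n;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Usage would be along the lines of `new ReencodingInputStream(original, Charset.forName("ISO-8859-15"))`, passing the result to the library in place of the original stream. Only about a kilobyte of characters is held in memory at any time.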

Which is more work than we might like, but oh well. It may be possible to do this faster with NIO, but the basic idea would be the same.
 