Unicode Streams

Nagendra Prasad
Ranch Hand

Joined: Jul 11, 2002
Posts: 219
Hello,
I need some clarification on a few concepts surrounding Unicode streams within a Java program.
The scenario is as follows:
I have an XML file which is encoded as UTF-8. I need to read this as
a Unicode stream. Characters that lie outside a particular
Unicode range need to be replaced with their hex equivalents.
I was thinking of reading the input stream one byte at a time and checking
whether each byte falls within the range or outside it.
However, depending on the character, it could be represented by more than one
byte (UTF-8, I believe, uses between 1 and 4 bytes). How can I tell whether the byte I am reading stands on its own (i.e. a single-byte representation) or whether I need to read the next one to work out which character it is?
Could anyone please shed some light? My head is spinning!


Best Regards,
Nagendra Prasad.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
If you want to interpret a stream as Unicode chars, you probably want a Reader or Writer. I recommend using an InputStreamReader to convert a stream using a particular specified encoding:
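The code that followed this sentence is not preserved in the thread. As a rough sketch of the idea, assuming a reasonably modern JDK (Java 7 or later for StandardCharsets and try-with-resources): wrap the stream in an InputStreamReader so that the decoder, not your code, deals with the 1-to-4-byte question, then inspect whole characters. The class name UnicodeEscaper, the MAX_ALLOWED bound and the `&#x...;` output format are all hypothetical choices for illustration; the thread never says which range or which hex notation is wanted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class UnicodeEscaper {

    // Hypothetical upper bound of the "allowed" range (here: plain ASCII).
    static final int MAX_ALLOWED = 0x7F;

    // Decodes the stream as UTF-8 and replaces every code point above
    // MAX_ALLOWED with an XML-style hex reference such as &#xE9;.
    static String escapeOutsideRange(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c = reader.read();
            while (c != -1) {
                int codePoint = c;
                int next = reader.read();
                // A Reader hands back UTF-16 code units, so characters above
                // U+FFFF arrive as a surrogate pair that must be recombined.
                if (Character.isHighSurrogate((char) c)
                        && next != -1
                        && Character.isLowSurrogate((char) next)) {
                    codePoint = Character.toCodePoint((char) c, (char) next);
                    next = reader.read();
                }
                if (codePoint <= MAX_ALLOWED) {
                    out.appendCodePoint(codePoint);
                } else {
                    out.append("&#x")
                       .append(Integer.toHexString(codePoint).toUpperCase())
                       .append(';');
                }
                c = next;
            }
        }
        return out.toString();
    }
}
```

Passing the encoding explicitly matters: without it, InputStreamReader falls back to the platform default charset, which may not be UTF-8. For larger inputs, wrapping the reader in a BufferedReader would likely reduce the cost of the per-character read() calls.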


"I'm not back." - Bill Harding, Twister
Nagendra Prasad
Ranch Hand

Joined: Jul 11, 2002
Posts: 219
Jim,
Thanks for the response.
This seems to be the way forward.
Do you have any comment on the performance impact of these actions
if the size of the XML is moderate (10-12K)?
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Do you have any comment on the performance impact of these actions
if the size of the XML is moderate (10-12K)?

Not really. 10-12K doesn't sound very big to me; I don't think you'll notice any performance problem. (Unless you're processing a lot of 10-12K files.) If you find you need to speed things up, then if you're using Java 1.4 you can use a FileChannel instead, and use the Charset class to encode and decode bytes/chars. It's probably unnecessary though.
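For reference, the FileChannel and Charset route might look roughly like the sketch below. It is only an illustration under assumptions: the class and method names (ChannelDecoder, readUtf8) are made up, it reads the whole file into memory (reasonable for files of this size, not for very large ones), and it uses FileChannel.open and StandardCharsets from Java 7 onward. On 1.4 itself you would obtain the channel from FileInputStream.getChannel() and the charset from Charset.forName("UTF-8").

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelDecoder {

    // Reads the whole file through a FileChannel and decodes it as UTF-8.
    static String readUtf8(Path file) throws IOException {
        Charset charset = StandardCharsets.UTF_8;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumes the file fits comfortably in memory (size < 2 GB).
            ByteBuffer bytes = ByteBuffer.allocate((int) channel.size());
            while (bytes.hasRemaining() && channel.read(bytes) != -1) {
                // A single read may not fill the buffer, so loop until
                // the buffer is full or end of file is reached.
            }
            bytes.flip(); // switch the buffer from writing to reading
            CharBuffer chars = charset.decode(bytes);
            return chars.toString();
        }
    }
}
```

Charset.decode replaces malformed byte sequences with the replacement character instead of failing; if bad input should be reported, a CharsetDecoder configured with CodingErrorAction.REPORT is the stricter option.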
 