
character encodings

sree kov
Greenhorn

Joined: Jul 23, 2001
Posts: 15
Due to my project requirements, we need a converter from any generic object to a String and from a String back to an object.
We plan to use serialization for that, and I have a question about converting the binary data (the serialized output, in the form of a byte array) to a String.
Currently we use our own encoding scheme, which converts each byte of binary data into its two-character hexadecimal representation.
Here is my question: can I use a character encoding scheme for the same purpose? Will ISO-8859-1 work, since it is an 8-bit encoding?
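For readers following along, the two-character hexadecimal scheme described above could look roughly like the sketch below. The class name HexCodec and its method names are made up for illustration; the poster's actual code is not shown in the thread.

```java
// Hypothetical helper (names are illustrative, not from the thread).
final class HexCodec {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encodes each byte as exactly two lowercase hex characters.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(DIGITS[(b >> 4) & 0x0F]); // high nibble
            sb.append(DIGITS[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }

    // Decodes a string produced by toHex back into the original bytes.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

The output is plain ASCII and twice the size of the input, which is the trade-off this scheme makes for being safe to store or transmit as text.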
paul wheaton
Trailboss

Joined: Dec 14, 1998
Posts: 20572

Well, there's lots of stuff I don't understand about your question, so I'm going to try and make the best of it ...
If you are going to use serialization, doesn't that just take care of everything for you? Once you use it, you sort of don't have to worry about any of that anymore.
Am I on the wrong track?


sree kov
Greenhorn

Joined: Jul 23, 2001
Posts: 15
Hi Paul,
Basically, my problem is that once the object is serialized I get a byte array out of it, and I want to construct a String from it.
At a later point, I need to extract the byte array from the String and reconstruct the object from it.
While constructing a String from the byte[], I am using the constructor new String(byte[]). I am working on a Windows platform, where the default encoding scheme is UTF-16. When I get the byte[] back from the String using String.getBytes(), the two byte arrays do not match, so I am not able to reconstruct the object.
I figured out that this is because UTF-16 is meant for encoding the Unicode character set, whereas the byte[] contains binary data (i.e. the serialized object).
When I used the ISO-8859-1 scheme, my problem was solved (my test program worked fine).
I want to make sure the code does not break, because it could be any object at runtime. So my question is: is it okay to use the ISO-8859-1 encoding scheme to encode binary data, in my case the serialized object in the form of a byte[]?
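A minimal sketch of the round trip described above, assuming Java 7 or later for StandardCharsets and try-with-resources. The no-argument new String(byte[]) and String.getBytes() use the platform default charset, which varies between systems and may not map every byte value to a character. Java's ISO-8859-1 charset maps each byte value 0x00 to 0xFF to the character with the same code point, so naming it explicitly on both sides returns the same bytes, provided nothing alters the String in between. The class name Latin1ObjectCodec and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (names are illustrative, not from the thread).
final class Latin1ObjectCodec {

    // Serializes the object and maps each resulting byte to one char.
    static String objectToString(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Name the charset explicitly instead of relying on the default.
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Maps each char back to one byte and deserializes the object.
    static Object stringToObject(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of serialized object not found", e);
        }
    }
}
```

The resulting String contains control characters and other non-printable values, so this only helps if whatever stores or transports the String preserves every char unchanged. If it might not, a hex or similar printable encoding is the safer choice.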
paul wheaton
Trailboss

Joined: Dec 14, 1998
Posts: 20572

I do not understand the need to convert it to a string, but I'll assume it has something to do with some sort of limitation. Perhaps a pipeline that uses the eighth bit of a byte for parity.
Are you familiar with uuencoding?
Is size an issue? If not, perhaps a simple hex representation would be nice and simple. If so, perhaps some zip compression followed by some form of uuencoding.
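If size did matter, the "compress, then text-encode" idea could be sketched as below. Note the substitution: the standard Java library has no uuencode class that I am aware of, so this sketch uses Base64 (java.util.Base64, Java 8 or later) in its place, and GZIP rather than a zip archive for the compression step. The class name CompressedTextCodec is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (names are illustrative, not from the thread).
final class CompressedTextCodec {

    // Compresses the bytes with GZIP, then encodes them as Base64 text.
    static String encode(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZIP trailer has been written.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverses encode: Base64 text back to bytes, then decompresses.
    static byte[] decode(String text) {
        byte[] compressed = Base64.getDecoder().decode(text);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Base64 uses four printable ASCII characters for every three bytes, compared with two characters per byte for hex. Whether the compression step pays off depends on the data; very small or already-compressed inputs can come out larger.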
sree kov
Greenhorn

Joined: Jul 23, 2001
Posts: 15
Hi Paul,
Thank you. Since size is not an issue, I am using a simple hex representation, and that solved the problem.
I still have one small question. ISO-8859-1 also uses an 8-bit encoding, so would that not serve the purpose as well?
Thanks in advance.
 