
character encodings

 
sree kov
Greenhorn
Posts: 15
Due to my project requirements, we need a converter from any generic object to a String and from a String back to an object.
We plan to use serialization for that, and I have a question about converting the binary data (the serialized output, a byte array) to a String.
Currently we use our own encoding scheme, which converts each byte of binary data into its two-character hexadecimal representation.
Can I use a character encoding scheme for this instead? Will ISO-8859-1 work, since it is an 8-bit encoding?
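The hex scheme described in the first post could look roughly like the following Java sketch. The class and method names (HexCodec, toHex, fromHex) are hypothetical; this is an illustration of the technique, not the poster's actual code.

```java
import java.util.Arrays;

public class HexCodec {

    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    // Encodes each byte as exactly two hexadecimal characters.
    public static String toHex(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            int v = data[i] & 0xFF;                  // treat the byte as unsigned 0..255
            out[2 * i] = HEX_DIGITS[v >>> 4];        // high nibble
            out[2 * i + 1] = HEX_DIGITS[v & 0x0F];   // low nibble
        }
        return new String(out);
    }

    // Decodes a string produced by toHex back into the original bytes.
    public static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x7F, (byte) 0x80};
        String hex = toHex(original);
        System.out.println(hex);                                   // aced00057f80
        System.out.println(Arrays.equals(original, fromHex(hex))); // true
    }
}
```

The output uses only the characters 0-9 and a-f, so it is plain ASCII and safe to store or send almost anywhere, at the cost of doubling the size.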
 
paul wheaton
Trailboss
Pie
Posts: 21484
Well, there's lots of stuff I don't understand about your question, so I'm going to try and make the best of it ...
If you are going to use serialization, doesn't that just take care of everything for you? Once you use it, you sort of don't have to worry about any of that anymore.
Am I on the wrong track?
 
sree kov
Greenhorn
Posts: 15
Hi Paul,
Basically, my problem is this: once the object is serialized, I get a byte array out of it, and I want to construct a String from it.
Later, I need to extract the byte array from the String and reconstruct the object from it.
To construct a String from the byte[], I use the constructor new String(byte[]), which decodes the bytes with the platform's default encoding (on Windows this is typically windows-1252, not UTF-16; UTF-16 is only how Java stores String contents internally). When I get the byte[] back from the String using String.getBytes(), the two byte arrays do not match, so I am not able to reconstruct the object.
As far as I can tell, this is because the default encoding is meant for text, whereas the byte[] contains arbitrary binary data (i.e. the serialized object). Byte values that the encoding does not map to a character are replaced during decoding, so they cannot be recovered.
When I used ISO-8859-1 instead, my problem was solved (my test program worked fine).
I want to make sure the code does not break, because it could be any object at runtime. So my question is: is it okay to use the ISO-8859-1 encoding to encode binary data, in my case the serialized object in the form of a byte[]?
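A minimal sketch of the ISO-8859-1 approach asked about above, assuming Java 9+ (for List.of). The class and method names (Latin1RoundTrip, serialize, bytesToString, roundTripsAllBytes, ...) are hypothetical. It relies on the fact that ISO-8859-1 maps each of the 256 byte values to the character with the same code (U+0000 to U+00FF), so decoding and re-encoding returns exactly the same bytes, whereas UTF-8 replaces invalid byte sequences with U+FFFD.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Latin1RoundTrip {

    // Serializes any Serializable object into a byte array.
    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object from bytes produced by serialize().
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // ISO-8859-1 maps byte 0xNN to char U+00NN, so no byte value is lost.
    public static String bytesToString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static byte[] stringToBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Returns true if all 256 byte values survive a byte[] -> String -> byte[] trip in cs.
    public static boolean roundTripsAllBytes(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return Arrays.equals(all, new String(all, cs).getBytes(cs));
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> original = new ArrayList<>(List.of("alpha", "beta"));
        String asText = bytesToString(serialize(original));
        Object restored = deserialize(stringToBytes(asText));
        System.out.println(original.equals(restored));                       // true
        System.out.println(roundTripsAllBytes(StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTripsAllBytes(StandardCharsets.UTF_8));      // false
    }
}
```

Passing the charset explicitly also avoids depending on the platform default that new String(byte[]) and getBytes() use. The lossless round trip is only guaranteed inside the JVM, though: the resulting String contains control characters and non-ASCII characters, so it may not survive storage or transmission unless every component along the way also treats it as ISO-8859-1. A hex string, by contrast, is plain ASCII.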
 
paul wheaton
Trailboss
Pie
Posts: 21484
I do not understand the need to convert it to a String, but I'll assume it has something to do with some sort of limitation. Perhaps a pipeline that uses the eighth bit of each byte for parity.
Are you familiar with uuencoding?
Is size an issue? If not, perhaps a simple hex representation would be nice and simple. If so, perhaps some zip compression followed by some form of uuencoding.
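The "compress, then encode as text" suggestion could look roughly like this in Java 11+. This sketch swaps in Base64 for uuencoding: both turn every 3 bytes into 4 printable characters, and Base64 ships with the JDK as java.util.Base64 (Java 8+), whereas the JDK has no public uuencode API. GZIP (java.util.zip) stands in for "some zip compression". The class and method names (CompressedText, encode, decode) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedText {

    // GZIP-compresses the bytes, then encodes the result as ASCII-only Base64 text.
    public static String encode(byte[] data) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams should not fail here
        }
        // The GZIP stream is closed (trailer written) before the bytes are read out.
        return Base64.getEncoder().encodeToString(compressed.toByteArray());
    }

    // Reverses encode(): Base64-decodes the text, then decompresses it.
    public static byte[] decode(String text) {
        byte[] compressed = Base64.getDecoder().decode(text);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. the text was not produced by encode()
        }
    }

    public static void main(String[] args) {
        byte[] original = "some repetitive payload ".repeat(50).getBytes(StandardCharsets.UTF_8);
        String text = encode(original);
        System.out.println(original.length + " bytes -> " + text.length() + " Base64 chars");
        System.out.println(Arrays.equals(original, decode(text))); // true
    }
}
```

Base64 output is about 4/3 the size of its input, versus 2x for hex, and GZIP adds a small fixed header and trailer, so compression mainly pays off for larger or repetitive payloads.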
 
sree kov
Greenhorn
Posts: 15
Hi Paul,
Thank you. Since size is not an issue, I am using a simple hex representation, and that solved the problem.
I still have a small question, though. ISO-8859-1 is also an 8-bit encoding, so wouldn't it serve the purpose too?
Thanks in advance.