
Convert UTF-8 into ASCII format

 
Ranch Hand
Posts: 42
Hi All,

I have a string in UTF-8 format, and the requirement is to convert this string to ASCII format before passing it to a database.

Any help is welcome.

Thanks
 
Ranch Hand
Posts: 1970
Java Strings are always in UTF-16, so I guess you do not have a String in UTF-8, but instead have some bytes in UTF-8. If you have a String and you think it's in UTF-8, then something has gone wrong somewhere in the design, I reckon.

You can make a String from your bytes, using the constructor String(byte[] bytes, String charSetName). Pass "UTF-8" as the charSetName.

You can then write your string into bytes using a different encoding, via the method of String called getBytes(String charSetName).
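The two steps above (decode the UTF-8 bytes into a String, then encode that String as ASCII) might look roughly like this. It is a minimal sketch, assuming Java 7+ for `StandardCharsets`; the class and method names are hypothetical. It uses the `Charset` overloads rather than the charset-name strings because the documented behaviour of `getBytes(String)` for unmappable characters is unspecified, whereas `getBytes(Charset)` always substitutes the charset's replacement byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode UTF-8 bytes into a String (UTF-16 inside the
// JVM), then encode that String as US-ASCII bytes.
final class Utf8ToAscii {

    static byte[] utf8ToAscii(byte[] utf8Bytes) {
        // Step 1: bytes -> chars, interpreting the input as UTF-8.
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        // Step 2: chars -> bytes. getBytes(Charset) replaces any character
        // with no US-ASCII equivalent by the replacement byte '?'.
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] input = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes
        byte[] ascii = utf8ToAscii(input);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // caf?
    }
}
```

Note that the "é" is not transliterated to "e"; it is simply lost and replaced by "?".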

If you really want true 7-bit ASCII, be aware that many Unicode characters simply cannot be represented. Also, you might be able to shortcut the above procedure, because UTF-8 has 7-bit ASCII as a subset.
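The shortcut works because 7-bit ASCII text has exactly the same bytes in UTF-8, and every multi-byte UTF-8 sequence uses only bytes of 0x80 or above. A minimal sketch of that check (the class and method names are hypothetical): if every byte is below 0x80, the UTF-8 bytes already are ASCII and can be passed on as they are.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiShortcut {

    // True if every byte is in 0x00..0x7F. Java bytes are signed, so any
    // byte >= 0x80 (part of a multi-byte UTF-8 sequence) is negative.
    static boolean isPureAscii(byte[] utf8Bytes) {
        for (byte b : utf8Bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // Pure-ASCII input: the UTF-8 and US-ASCII encodings are identical.
        System.out.println(isPureAscii(plain)); // true
        System.out.println(Arrays.equals(plain,
                "hello".getBytes(StandardCharsets.US_ASCII))); // true
        // Anything else needs the full decode/encode round trip.
        System.out.println(isPureAscii(accented)); // false
    }
}
```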
 
Ranch Hand
Posts: 961
One way to do it would be:



You could also use the CharsetEncoder and CharsetDecoder classes.
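Edwin's original snippet did not survive, and the following is not a reconstruction of it. As a separate illustration of the CharsetEncoder route, here is a minimal sketch (assuming Java 7+; `StrictAscii` and `encodeStrict` are hypothetical names) that encodes to US-ASCII but fails loudly on unmappable characters instead of silently writing "?". That can be useful when losing data before a database insert is not acceptable.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictAscii {

    // Encodes text as US-ASCII; throws (an UnmappableCharacterException,
    // a subclass of CharacterCodingException) on any non-ASCII char.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // encode() returns a buffer with position 0 and limit after the last byte.
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // canEncode() lets you check up front without encoding anything.
        System.out.println(StandardCharsets.US_ASCII.newEncoder().canEncode("na\u00efve")); // false
        try {
            encodeStrict("na\u00efve");
        } catch (CharacterCodingException e) {
            System.out.println("not representable in ASCII: " + e.getClass().getSimpleName());
        }
    }
}
```

Swapping `CodingErrorAction.REPORT` for `CodingErrorAction.REPLACE` gives the same "?" substitution as `String.getBytes(Charset)`.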
 
Peter Chase
Ranch Hand
Posts: 1970

Originally posted by Edwin Dalorzo:
One way to do it would be:




Er, what? Your string "utf" is UTF-16 encoded, like all Java strings. You can't have a Java String that is UTF-8 encoded. A stream of bytes can be UTF-8 encoded, but not a String.
 
Ranch Hand
Posts: 518
You didn't mention which database and JDBC driver you are using.

The drivers that I'm familiar with will do any conversions necessary to store Java String objects into the database in the correct character encoding.

If your database uses ASCII or another 8-bit character set, the driver should handle all conversions from UTF-16 to that character set.
 
Edwin Dalorzo
Ranch Hand
Posts: 961
Hi, Peter.

You continue to say that Java Strings are always in UTF-16. Why do you say that?

The encoding of the Java Strings is determined by the default encoding used by the JVM, declared in the file.encoding property.

Another, very different thing is the encoding of the Java source files (*.java), which might be UTF. But that does not have anything to do with your application.

Strings are encoded according to each particular environment, and you can just as easily convert a string from one encoding to another. The String class provides methods for such purposes, as does the java.nio.charset package.

So, Peter, how come you say all Strings in Java are UTF-16?

The example that I wrote is a way to convert a String from whatever format it is in into ASCII format. I assumed the format is UTF-X, without implying that this is always the case.

Another option to convert a String from one encoding to another is to use the java.nio.charset package, by means of the Encoder and Decoder classes.
 
Ranch Hand
Posts: 547
Edwin,


The String(byte[]) constructor creates a String object assuming the bytes are in the default platform encoding, so depending on that encoding the string might get messed up. And if you call the getBytes() method again, you will not get ASCII bytes back but bytes in the platform default encoding.
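The no-argument `getBytes()` and `String(byte[])` use `Charset.defaultCharset()`, which varies by platform (from Java 18 it defaults to UTF-8). Passing a charset explicitly removes that dependency. A minimal sketch of what "messed up" looks like, assuming Java 7+; ISO-8859-1 is used here only to simulate a mismatched default, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ExplicitCharsets {

    public static void main(String[] args) {
        String original = "caf\u00e9";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 63 61 66 C3 A9

        // Decoding with the charset that produced the bytes restores the text.
        String right = new String(utf8, StandardCharsets.UTF_8);
        // Decoding with a different charset turns each of the two UTF-8
        // bytes for the accented letter into a separate char.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original));          // true
        System.out.println(wrong.length());                  // 5
        System.out.println(wrong.equals("caf\u00c3\u00a9")); // true
    }
}
```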

UTF-16:
I think what Peter is referring to is the "internal encoding", the encoding used inside the JVM (Peter?).

String API (1.5):

A String represents a string in the UTF-16 format ...



pascal
 
Marshal
Posts: 28193

Originally posted by Edwin Dalorzo:
Hi, Peter.

You continue to say that Java Strings are always in UTF-16. Why do you say that?

The encoding of the Java Strings is determined by the default encoding used by the JVM, declared in the file.encoding property.

[...]

Another option to convert a String from one encoding to another is to use the java.nio.charset package, by means of the Encoder and Decoder classes.

Sorry, but this is all totally incorrect. Peter is right: all Java Strings are sequences of chars, and all Java chars are UTF-16 code units. (Before Unicode 4.0 it was simpler: a char was just a Unicode character.)
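A quick way to see the UTF-16 representation from inside Java: a character in the Basic Multilingual Plane takes one char, while a character outside it (here U+1F600, written with escapes so the source file's encoding does not matter) takes two chars, a surrogate pair. A minimal sketch; the class name is hypothetical.

```java
final class Utf16Units {

    public static void main(String[] args) {
        String euro = "\u20ac";       // U+20AC EURO SIGN: a single char
        String grin = "\uD83D\uDE00"; // U+1F600: high + low surrogate

        System.out.println(euro.length());                             // 1
        System.out.println(grin.length());                             // 2 (UTF-16 code units)
        System.out.println(grin.codePointCount(0, grin.length()));     // 1 (Unicode code point)
        System.out.println(Integer.toHexString(grin.codePointAt(0)));  // 1f600
        System.out.println(Character.isHighSurrogate(grin.charAt(0))); // true
    }
}
```

None of these results depend on `file.encoding`; they only reflect how the String stores its chars.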

An encoding is a method of converting between a Java String (which consists of chars) and an array of bytes. The String.getBytes(encoding) method maps from chars to bytes, and the new String(bytes, encoding) constructor maps from bytes to chars.
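Because the encoding belongs to the chars-to-bytes step and not to the String, the same String yields different byte arrays depending on the charset passed in. A minimal sketch, assuming Java 7+; `encodedLength` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BytesPerCharset {

    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // 5 chars
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 6
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 5
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));   // 10
        // The "UTF-16" charset writes a 2-byte big-endian byte-order mark first.
        System.out.println(encodedLength(text, StandardCharsets.UTF_16));     // 12

        // Decoding with the same charset gets the original String back.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```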

Sometimes people are sloppy and start talking about "UTF-8 strings" when they really have an array of bytes that was encoded using UTF-8, or perhaps a String that was decoded from an array of bytes using UTF-8. But that's misleading and incorrect.

You may not have realized that a file is also an array of bytes. So a Reader converts those bytes into chars, and a Writer converts chars into bytes. You're correct that the default encoding used by Readers and Writers comes from the file.encoding property; you can use different encodings by using an InputStreamReader or an OutputStreamWriter and specifying the encoding.
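A minimal sketch of Readers and Writers with an explicit encoding, assuming Java 7+ (for try-with-resources and `StandardCharsets`). In-memory byte streams stand in for a file so the example runs anywhere; with a real file you would wrap a `FileInputStream` or `FileOutputStream` the same way. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class ExplicitReaderWriter {

    // chars -> bytes: the Writer encodes with the charset it was given,
    // not with file.encoding.
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the Writer flushes its buffered bytes into 'out'
        return out.toByteArray();
    }

    // bytes -> chars: the Reader decodes with the charset it was given.
    static String read(byte[] bytes) throws IOException {
        StringWriter result = new StringWriter();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = write("na\u00efve");
        System.out.println(bytes.length);                     // 6
        System.out.println(read(bytes).equals("na\u00efve")); // true
    }
}
```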

Likewise the data you get from a socket connection is a stream of bytes; if it is text then it can be converted into String data using some encoding.

If you go to the documentation for that java.nio.charset package that you referred to, you will see it says what I just said. For example a Charset is "A named mapping between sequences of sixteen-bit Unicode characters and sequences of bytes." Its encode() method is "Convenience method that encodes Unicode characters into bytes in this charset." Its decode() method is "Convenience method that decodes bytes in this charset into Unicode characters." There's no string-to-string conversion going on at all. Because Strings don't have encodings. Only sequences of bytes do.
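The `Charset.encode()` and `Charset.decode()` convenience methods make the point concrete: the only way to "convert a String to ASCII" is to go chars to bytes and, if you need a String again, bytes back to chars. A minimal sketch; `throughAscii` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

final class CharsetRoundTrip {

    // No String-to-String step exists: the text goes chars -> bytes -> chars.
    static String throughAscii(String text) {
        Charset ascii = Charset.forName("US-ASCII");
        // encode(): Unicode chars -> bytes in this charset; unmappable chars
        // become the charset's replacement byte ('?' for US-ASCII).
        ByteBuffer bytes = ascii.encode(text);
        // decode(): bytes in this charset -> Unicode chars.
        CharBuffer chars = ascii.decode(bytes);
        return chars.toString();
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("r\u00e9sum\u00e9")); // r?sum?
    }
}
```

The resulting String is still UTF-16 internally; it just no longer contains any non-ASCII characters.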
 
Edwin Dalorzo
Ranch Hand
Posts: 961
I did not know that, Peter and Paul.

Thanks for the clarification. I guess I will have to do some research about it.

Thanks!
 