[J2ME] From Unicode to UTF-8

Adriano Bellavita
Ranch Hand

Joined: Mar 11, 2010
Posts: 37
Hi all,

I have to convert a Unicode String to its UTF-8 encoding.

I'm working with emoticons so:

this is my input:

U+1F600 (or \uD83D\uDE03, chars associated with it)

this should be the output

f0 9f 98 80

How can I get this?

Ty and BR,

Adriano
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14114
    

Something like this:

By the way, that gives me f0 9f 98 83, not f0 9f 98 80.
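The snippet from this post is not preserved in this copy of the thread. What follows is only a minimal sketch of the approach it describes (encode with `getBytes("UTF-8")`, then print each byte as hex), assuming a Java SE runtime whose UTF-8 encoder handles surrogate pairs. The class and method names are illustrative, not the original poster's.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not the code from the original post.
class Utf8HexSketch {

    // Encodes s as UTF-8 and formats every byte as two lowercase hex digits.
    static String toUtf8Hex(String s) {
        byte[] bytes;
        try {
            bytes = s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is expected to be available on every Java SE runtime.
            throw new RuntimeException(e.toString());
        }
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            int v = bytes[i] & 0xFF;        // Java bytes are signed; map to 0..255
            if (v < 0x10) sb.append('0');   // pad to two digits
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf8Hex("\uD83D\uDE03")); // surrogate pair for U+1F603
        System.out.println(toUtf8Hex("\uD83D\uDE00")); // surrogate pair for U+1F600
    }
}
```

The pair \uD83D\uDE03 is the UTF-16 form of U+1F603, not U+1F600, which is why the last byte comes out as 83 rather than 80.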


Adriano Bellavita
Ranch Hand

Joined: Mar 11, 2010
Posts: 37
It doesn't work...

If I try this solution, I'm wondering about 2-byte chars. Each char of the "Hello world" String is made of 2 bytes.

In my case, my String is "😀": an emoticon!

To make clearer what I'm trying to do, I'll give an example:

We can easily convert a String using the getBytes method when the Unicode code point of every char of the String lies between 0x0000 and 0xFFFF.

The "😀" code point overflows that range: to encode it as chars, we need 2 chars (not one, so more than 2 bytes...), as we can see here:

http://www.utf8-chartable.de/unicode-utf8-table.pl

The "😀" code point is 0x1F600 (Unicode: so something like 0001|F600???) and its UTF-8 encoding is f0 9f 98 80 (hex).

So I have to represent a single character ("😀") as if it were composed of three (or four???) bytes...

How can I do this?
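For reference, the "2 chars" mentioned above are a UTF-16 surrogate pair, and the split is plain integer arithmetic. A small sketch, assuming the input is a supplementary code point (U+10000 to U+10FFFF); the class and method names are made up for illustration:

```java
// Hypothetical helper showing how a code point above 0xFFFF becomes two chars.
class SurrogateSketch {

    // Splits a supplementary code point into its high and low surrogate.
    // No range checking is done; callers are assumed to pass 0x10000..0x10FFFF.
    static char[] toSurrogates(int codePoint) {
        int v = codePoint - 0x10000;                // leaves a 20-bit value
        char high = (char) (0xD800 + (v >> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F600);
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
    }
}
```

For 0x1F600 this gives D83D and DE00, so the pair D83D DE03 quoted in the first post corresponds to U+1F603 instead.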

Adriano Bellavita
Ranch Hand

Joined: Mar 11, 2010
Posts: 37
Jesper de Jong wrote:Something like this:

By the way, that gives me f0 9f 98 83, not f0 9f 98 80.


Wow.... Give me a moment....

Ok, you use getBytes("UTF-8")...

But then what do you do?

How could you obtain f0 9f 98 83?

If I print the byte array, the "for" loop prints:

-19
-96
-67
-19
-72
-125

........
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

I don't know how you could get that. You can never get more than four UTF-8 bytes for a Unicode character. When I run that code the bytes in the resulting array are -16, -97, -104, -125. But that's the decimal representation assuming the byte value is signed. The hexadecimal string representation of those bytes is F0, 9F, 98, 83.
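To compare the two results directly, the signed decimal values can be masked with 0xFF and printed as hex. A short sketch (the class name and the two arrays are just the values quoted in this thread, wrapped for illustration):

```java
// Hypothetical helper: prints signed Java bytes as unsigned hex.
class SignedBytesToHex {

    static String hex(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            int v = bytes[i] & 0xFF;        // -16 becomes 240, i.e. 0xF0
            if (v < 0x10) sb.append('0');
            sb.append(Integer.toHexString(v).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] six  = { -19, -96, -67, -19, -72, -125 }; // values reported on Java 1.4 / MIDP
        byte[] four = { -16, -97, -104, -125 };          // values reported on Java 7
        System.out.println(hex(six));
        System.out.println(hex(four));
    }
}
```

The six bytes come out as ED A0 BD ED B8 83, which is each surrogate (D83D and DE03) encoded separately as a three-byte sequence, rather than the single four-byte sequence F0 9F 98 83 for the whole code point.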
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

... Well, that's interesting. When I take the six bytes you say you got, and convert them to a String assuming they were UTF-8, I do actually get "\uD83D\uDE03". Here's the code I wrote:



I'm using Java 7. I recall seeing something in some JVM change report about fixing code to use canonical UTF-8, but don't remember when that was. What version of Java are you using?

And just in case we are on the wrong track here, why do you have to convert a String to the hexadecimal representation of its UTF-8 encoding?
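The code referred to in this post is not preserved here. A rough sketch of such a round-trip check, assuming it used `new String(bytes, "UTF-8")` as mentioned later in the thread; the class and method names are invented for illustration:

```java
import java.io.UnsupportedEncodingException;

// Hypothetical round-trip check, not the code from the original post.
class DecodeUtf8Sketch {

    // Decodes the bytes as UTF-8.
    static String decode(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e.toString());
        }
    }

    // Renders each UTF-16 code unit as a backslash-u escape with four hex digits.
    static String escape(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            String h = Integer.toHexString(s.charAt(i)).toUpperCase();
            sb.append("\\u");
            for (int p = h.length(); p < 4; p++) sb.append('0');
            sb.append(h);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] six  = { -19, -96, -67, -19, -72, -125 };
        byte[] four = { -16, -97, -104, -125 };
        System.out.println(escape(decode(six)));   // result depends on the JVM's decoder
        System.out.println(escape(decode(four)));
    }
}
```

The four-byte input decodes to the pair D83D DE03 on any conforming decoder. How the six-byte input is treated may differ between Java versions, since a strict UTF-8 decoder is allowed to reject separately encoded surrogates.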
Adriano Bellavita
Ranch Hand

Joined: Mar 11, 2010
Posts: 37
Hi,

I'm using Java 1.4, MID profile.

I only want to obtain what this table shows:

http://www.utf8-chartable.de/unicode-utf8-table.pl

If you go to the "U+1F600 ... U+1F64F - Emoticons" section, you'll see that the block starts at code point U+1F600 and ends at U+1F64F.

So I want each Unicode entry converted into the corresponding UTF-8 bytes.

My starting point is the Unicode code point (or its char representation), not the String.

My end point is its hexadecimal representation.

TY in advance,

Adriano
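One way to go from a code point straight to the UTF-8 column of that table, without relying on the platform's encoder, is to build the bytes by hand. A minimal sketch, assuming the code point is passed as an `int`; the names are illustrative, and it uses only integer arithmetic, `StringBuffer` and `Integer.toHexString`, so it should not depend on Java 5 APIs:

```java
// Hypothetical hand-written encoder: Unicode code point -> UTF-8 bytes -> hex.
class CodePointToUtf8 {

    // Encodes one code point as 1 to 4 UTF-8 bytes.
    // Surrogate values (0xD800..0xDFFF) are not rejected in this sketch.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point");
        }
        if (cp < 0x80) {
            return new byte[] { (byte) cp };
        }
        if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }

    // Formats bytes as space-separated lowercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            int v = bytes[i] & 0xFF;
            if (v < 0x10) sb.append('0');
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Walk the Emoticons block and print one table row per code point.
        for (int cp = 0x1F600; cp <= 0x1F64F; cp++) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + "  " + toHex(encode(cp)));
        }
    }
}
```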

Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

Okay, you're using Java 1.4, which means that you have to use the UTF-16 encoding of the character (as you did) rather than using the character directly, which Java 5 allows you to do.

At any rate it seems that you are generating something which appears to be a UTF-8 version of that character in some way, at least it converts back to the character via new String(bytearray, "UTF-8"). However I still think you need to explain your original problem, rather than trying to discuss a (possibly) failed solution to that unknown problem.
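If the input arrives as two UTF-16 chars rather than as a code point, the pair can be recombined with integer arithmetic, which avoids Java 5 methods such as `Character.toCodePoint`. A small sketch under the assumption that the two chars really are a high and a low surrogate (nothing is validated; the names are illustrative):

```java
// Hypothetical helper: combines a UTF-16 surrogate pair into one code point.
class SurrogatesToCodePoint {

    // Assumes high is in 0xD800..0xDBFF and low is in 0xDC00..0xDFFF.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = toCodePoint((char) 0xD83D, (char) 0xDE03);
        System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
    }
}
```

The resulting code point can then be encoded to UTF-8 by hand, which sidesteps an encoder that treats each surrogate as a separate character.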
Adriano Bellavita
Ranch Hand

Joined: Mar 11, 2010
Posts: 37
TY for your reply.

Let's take a look at the table shown at this URL:

unicode-utf8

I must obtain the result in the third column, starting from the value in the first one.

That's my problem...
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

Let me be clearer, then. The problem I am asking about is the problem to which "I must obtain the result of the third column, starting from the value of the first one" is your idea of a solution. There may be better ways of solving that unknown problem, but we can't know until we know what that problem is.
 