How to find out the UTF-8 value of any String (Arabic or similar languages) at runtime in Java

 
Sarmad Thebo
Greenhorn
Posts: 9
Netbeans IDE Eclipse IDE Java
I've searched the internet for hours hoping to find a way, but I didn't find anything!
I want to know if there is any way to find out the UTF-8 value of any string. For example, in a JTextField, the user enters a word in my local language:
لاهور

Now, is there any way to find out the UTF-8 value of لاهور? Something like \u0644!
This is my first post in this forum.
If you can't see the word I posted, then check this page: www.thekawish.com/
 
Bartender
Posts: 1111
Eclipse IDE Oracle VI Editor
Would URLEncoder be of any use to you?
 
Campbell Ritchie
Marshal
Posts: 69810
277
  • Likes 1
Welcome to the Ranch

That isn't UTF-8. That is UTF-16. You can read about UTF-8 on the Unicode website, Wikipedia, or Joel Spolsky's website and you can find charts here.
Yes, you can print out the characters of a String very simply:

[campbell@campbellscomputer java]$ java CharacterPrinter "Campbell Ritchie" e=mc² ر "ℙ(INT × INT)"
C = 0x0043
a = 0x0061
m = 0x006d
p = 0x0070
b = 0x0062
e = 0x0065
l = 0x006c
l = 0x006c
= 0x0020
R = 0x0052
i = 0x0069
t = 0x0074
c = 0x0063
h = 0x0068
i = 0x0069
e = 0x0065
e = 0x0065
= = 0x003d
m = 0x006d
c = 0x0063
² = 0x00b2
ر = 0x0631
ℙ = 0x2119
( = 0x0028
I = 0x0049
N = 0x004e
T = 0x0054
= 0x0020
× = 0x00d7
= 0x0020
I = 0x0049
N = 0x004e
T = 0x0054
) = 0x0029

Of course that isn't UTF-8 but UTF-16. You should be able to use the formulae given in the links to convert UTF-16 to the n bytes required for UTF-8.
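
The source of CharacterPrinter was not posted, so here is a minimal sketch of what such a program might look like. The class layout and the describe helper are assumptions; only the output format is taken from the terminal listing above.

```java
// Hypothetical reconstruction: the original CharacterPrinter source was not posted.
final class CharacterPrinter {

    /** Formats one UTF-16 code unit in the style "C = 0x0043". */
    static String describe(char c) {
        return String.format("%c = 0x%04x", c, (int) c);
    }

    public static void main(String[] args) {
        // Print every char of every command-line argument, one per line.
        for (String arg : args) {
            for (char c : arg.toCharArray()) {
                System.out.println(describe(c));
            }
        }
    }
}
```

Each value printed is one Java char, that is, one UTF-16 code unit. A character outside the Basic Multilingual Plane would show up as two lines, one per surrogate.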
 
Sarmad Thebo
Greenhorn
Posts: 9
Netbeans IDE Eclipse IDE Java
I want to make a desktop app! My idea is to compare words in my local language.
If it were English, it would have been a piece of cake! But I can't directly use Arabic or similar words in my program,
for example:


I'm sure you can't do this ^

 
Bartender
Posts: 1166
17
Netbeans IDE Java Linux
  • Likes 1

Sarmad Thebo wrote:
Now, is there any way to find out the UTF-8 value of لاهور? Something like \u0644!



I'm not sure you really mean UTF-8; you seem to actually mean UTF-16 code units. The UTF-16 code unit values can be obtained by iterating through the string and formatting each character as a four-digit hex value.


If you do mean UTF-8, then you can get the UTF-8 bytes of the string and then iterate through the bytes, converting them to hex.
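
The snippets in this post did not survive, so the following is only a sketch of the two approaches described, not the poster's original code. The class name UnicodeDump and both method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative, not from the original post.
final class UnicodeDump {

    /** Returns each UTF-16 code unit as a Java-style escape, e.g. \u0644. */
    static String toUtf16Escapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    /** Returns the UTF-8 encoding of the text as space-separated hex bytes. */
    static String toUtf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default input is U+0644 ARABIC LETTER LAM; pass a word as an argument to try others.
        String word = args.length > 0 ? args[0] : "\u0644";
        System.out.println(toUtf16Escapes(word));
        System.out.println(toUtf8Hex(word));
    }
}
```

For the single letter U+0644, the first method gives \u0644 and the second gives the two UTF-8 bytes d9 84, which shows how the same character looks different in the two encodings.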

Edit: Once again I'm far too slow!
 
Campbell Ritchie
Marshal
Posts: 69810
277
Of course you can use the equals() method on Arabic text.
But text fields are not intended for entering options. They are intended for entering data. More a case of name = nameField.getText();
You would usually use buttons or menu items for options.
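
To illustrate the point about equals(), here is a small sketch. The class and method names are hypothetical, and the entered text would typically come from something like nameField.getText(). The letters are written as Unicode escapes so the file compiles under any source encoding; the characters can also be typed directly into a string literal if the file is saved as UTF-8 and compiled with javac -encoding UTF-8.

```java
// Hypothetical example; WordMatch and matches are illustrative names.
final class WordMatch {

    /** True if the entered text, ignoring surrounding whitespace, equals the expected word. */
    static boolean matches(String entered, String expected) {
        return entered != null && entered.trim().equals(expected);
    }

    public static void main(String[] args) {
        String expected = "\u0631"; // U+0631 ARABIC LETTER REH
        // In a Swing app the first argument would be the text field's contents.
        System.out.println(matches(" \u0631 ", expected));
        System.out.println(matches("\u0644", expected));
    }
}
```

String.equals() compares the chars one by one, so it behaves the same for Arabic-script text as for English.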

Something went wrong with the copy and paste on the Arabic text when I ran that little program. The original on the terminal looked a lot better. Where the = is at the very left, a space has been missed out by the software.
 
Sarmad Thebo
Greenhorn
Posts: 9
Netbeans IDE Eclipse IDE Java

Campbell Ritchie wrote:
Of course you can use the equals() method on Arabic text.
But text fields are not intended for entering options. They are intended for entering data. More a case of name = nameField.getText();
You would usually use buttons or menu items for options.

Something went wrong with the copy and paste on the Arabic text when I ran that little program. The original on the terminal looked a lot better. Where the = is at the very left, a space has been missed out by the software.




Dude, this language is not Arabic! Like I said, it's my local language; it's called Sindhi. I searched the internet for Unicode literals for Sindhi, and every page I visited mentioned only UTF-8! Now I have to see the differences between them! *sigh*

I know about controls.

See this:


The code should be something like this. I'm too lazy to write the main method, but you get the idea.
I have tried this, and it is not working for me!
I intend to make a text-to-speech app based on word comparisons: if the word matches, a method containing a query would run, playing an audio file.
 
Paul Clapham
Marshal
Posts: 25677
69
Eclipse IDE Firefox Browser MySQL Database
  • Likes 1
Arabic (the script), Sindhi (the language), it doesn't matter. You don't need to mess with UTF-8 or UTF-16 for ordinary String processing, since both of those are encodings of Unicode and Java fully supports Unicode.

Now you might have to make sure you've chosen a suitable font for your Swing components. That would be one which supports all of the characters you need to use. (Google "Sindhi font" if you don't already have one.) You should also make sure your keyboard is configured to support those characters too.

You may well find that Sindhi web pages are encoded in UTF-8; there's nothing special about that, I always encode my web pages in UTF-8 even though they are in English. UTF-8 can represent all Unicode characters.
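
A quick way to convince yourself of this is to push some text through both encodings and compare the results. This is only a sketch; the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo; EncodingRoundTrip and its methods are illustrative names.
final class EncodingRoundTrip {

    /** Encodes the text as UTF-8 bytes and decodes it back to a String. */
    static String viaUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Encodes the text as UTF-16 bytes and decodes it back to a String. */
    static String viaUtf16(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String word = "\u0644\u0627"; // two Arabic-script letters, U+0644 and U+0627
        // Both round trips give back a String equal to the original.
        System.out.println(viaUtf8(word).equals(word));
        System.out.println(viaUtf16(word).equals(word));
    }
}
```

The byte arrays differ between the two encodings, but once decoded you are back to the same String, which is why equals() and the other String methods don't care how the text was stored or transmitted.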
 
Sarmad Thebo
Greenhorn
Posts: 9
Netbeans IDE Eclipse IDE Java

Paul Clapham wrote:
Arabic (the script), Sindhi (the language), it doesn't matter. You don't need to mess with UTF-8 or UTF-16 for ordinary String processing, since both of those are encodings of Unicode and Java fully supports Unicode.

Now you might have to make sure you've chosen a suitable font for your Swing components. That would be one which supports all of the characters you need to use. (Google "Sindhi font" if you don't already have one.) You should also make sure your keyboard is configured to support those characters too.

You may well find that Sindhi web pages are encoded in UTF-8; there's nothing special about that, I always encode my web pages in UTF-8 even though they are in English. UTF-8 can represent all Unicode characters.



Thanks, this really helped clear up my confusion. I wish I could give you a +10.
Yes, I will download the Sindhi fonts now! Thank you once again.
 
Campbell Ritchie
Marshal
Posts: 69810
277
  • Likes 1
As Paul C says, it doesn't matter which language you use.

The multiple if blocks and the use of addActionListener(this), even if you find them in many books, do not look at all good to me.
There is an object-oriented way to find your audio files: a Map.
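
As a rough sketch of the Map idea (the class name, method names, and file names here are all hypothetical, and how the audio actually gets played is left out):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup table from a word to the name of its audio file.
final class AudioLookup {

    private final Map<String, String> audioFiles = new HashMap<>();

    /** Associates a word with an audio file name. */
    void register(String word, String fileName) {
        audioFiles.put(word, fileName);
    }

    /** Returns the audio file for the word, or an empty Optional if none is registered. */
    Optional<String> find(String word) {
        return Optional.ofNullable(audioFiles.get(word));
    }

    public static void main(String[] args) {
        AudioLookup lookup = new AudioLookup();
        lookup.register("\u0644", "lam.wav"); // made-up file name
        // One lookup replaces a chain of if blocks, however many words are registered.
        System.out.println(lookup.find("\u0644").orElse("no audio for that word"));
        System.out.println(lookup.find("x").orElse("no audio for that word"));
    }
}
```

Adding a new word then means adding one entry to the Map rather than another if block.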
 