
Unicode issue (for Arabic)

 
Ash Kondhalkar
Ranch Hand
Posts: 43
Hi,

In our bilingual application we need to convert Arabic numbers to English ones.
As I understand it, converting Arabic numbers to English numbers is a two-step process:
1. Convert the Arabic number to Unicode.
2. Convert the Unicode to an English number.
I am stuck on the first step; I already have an algorithm for the second. Does anybody have a utility to convert Arabic to Unicode? The JDK ships native2ascii.exe, but it works on two text files: it reads Arabic text from one file and writes Unicode to another. Because of some limitations I cannot use native2ascii.exe.

Will someone suggest a way out?
Regards
Ashwin
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
There are no different versions of Unicode for Arabic and English (see this). A character's Unicode value is the same regardless of the encoding a file uses; what changes is how the text renders if the file is saved in one encoding and read in another.

You mention native2ascii.exe - are we talking about properties files or standard text files?
[ December 13, 2006: Message edited by: Paul Sturrock ]
 
Ash Kondhalkar
Ranch Hand
Posts: 43
Hi Paul,

Thanks for the quick reply. Yes, I understand that there are no different versions of Unicode for English and Arabic. I am sorry if my message suggested otherwise.
native2ascii.exe takes two text files, a source and a destination. If I write Arabic characters in the source file and run the exe with the paths of both files, the destination file will contain the Unicode values of the Arabic characters.
But the issue is that I cannot use native2ascii.exe.
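
For what it's worth, the same kind of conversion can be done in memory, with no files involved, because a Java String already holds Unicode characters. The following is only a minimal sketch: the class and method names (`UnicodeEscapes`, `toUnicodeEscapes`) are made up for illustration, and it assumes the goal is the backslash-u escape form for every non-ASCII character.

```java
final class UnicodeEscapes {

    // Hypothetical helper: copies ASCII characters unchanged and writes every
    // other character as a backslash-u escape with four hex digits.
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                // A char is already a UTF-16 code unit, so the cast gives its numeric value.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes so this source file stays pure ASCII.
        System.out.println(toUnicodeEscapes("\u0663\u0664"));
    }
}
```

The point is that step 1 is not really a separate conversion: once the text is in a String, each char's numeric value is available directly.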

Regards
Ashwin
 
Joanne Neal
Rancher
Posts: 3742
16
Until now I didn't realise that Arabic languages had their own numbers. Having looked it up on Wikipedia, it seems that the only difference is the symbols used to represent each of the digits. If that is correct, and assuming you are holding the Arabic number in a String, can't you just use a HashMap that has the Arabic symbols as keys and the 'Western' symbols as values, and then convert each character individually to create a 'Westernized' String?
Or have I totally misunderstood what you mean by Arabic numbers?
 
Ash Kondhalkar
Ranch Hand
Posts: 43
Hi Joanne,

Thanks for the reply. What you have understood is exactly correct. I am using Eclipse for Java development. I tried storing Arabic characters in a HashMap, and when I tried to save the file I got this message:
"Save could not be completed. Reason: Some characters cannot be mapped using "Cp1252" character encoding. Either change the encoding or remove the characters which are not supported by the "Cp1252" character encoding."
I also doubt whether this is the correct way to do it. Even if I change the character encoding of my editor, everybody in the team would have to do the same! Kindly let me know your thoughts.
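
One way around the editor encoding, sketched here under the assumption that the Arabic characters only appear as literals in the source: write them as Unicode escapes. The file then contains nothing but ASCII, so it should save under Cp1252 without anyone changing editor settings. The class and constant names below are hypothetical.

```java
final class EscapedSource {

    // Arabic-Indic digit zero, written as an escape instead of the glyph itself.
    static final char ARABIC_ZERO = '\u0660';

    // The digits 2, 0, 0, 6 in Arabic-Indic form, again using only ASCII in the source.
    static final String ARABIC_2006 = "\u0662\u0660\u0660\u0666";

    public static void main(String[] args) {
        // The compiler turns each escape into the real character before compiling.
        System.out.println((int) ARABIC_ZERO);    // numeric value of the character
        System.out.println(ARABIC_2006.length()); // number of characters in the string
    }
}
```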

Regards
Ashwin
 
Joanne Neal
Rancher
Posts: 3742
16
As we are only talking about 10 characters, can't you just use a simple if/else statement?

if (character is Arabic symbol for 0)
    return "0";
else if (character is Arabic symbol for 1)
    return "1";
else if (character is Arabic symbol for 2)
    return "2";
else
    etc.
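
As a runnable variation on that chain (a sketch, not necessarily what the poster had in mind): the ten Arabic-Indic digits sit at consecutive code points, U+0660 to U+0669, so the whole if/else ladder can be collapsed into one range check and a subtraction. `DigitChain`, `toWesternDigit` and `toWestern` are illustrative names.

```java
final class DigitChain {

    // Maps one Arabic-Indic digit to its Western counterpart; any other
    // character is returned unchanged.
    static char toWesternDigit(char c) {
        if (c >= '\u0660' && c <= '\u0669') {
            // The offset from Arabic-Indic zero is the digit's numeric value.
            return (char) ('0' + (c - '\u0660'));
        }
        return c;
    }

    // Converts a whole string character by character.
    static String toWestern(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(toWesternDigit(text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toWestern("\u0661\u0662\u0663"));
    }
}
```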

I'm not sure I understand your last paragraph. I thought you just wanted to convert a numeric string, so where does the editor come into it?
 
Edwin Dalorzo
Ranch Hand
Posts: 961
I tried this code in BeanShell and it worked fine:



It printed:
{٨=8, ٩=9, ٢=2, ٣=3, ٠=0, ١=1, ٦=6, ٧=7, ٤=4, ٥=5}

Now, in order to get the corresponding value as a European number, you simply call digits.get('٣') and you should get 3 as the result.
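
The snippet from this post is not shown above, so the following is only a guess at its shape, written as plain Java rather than BeanShell: a map named `digits` in the post (here the hypothetical static field `DIGITS`) from each Arabic-Indic digit to its integer value. The keys are written as escapes so the source stays ASCII.

```java
import java.util.HashMap;
import java.util.Map;

final class ArabicDigitMap {

    // Keys: Arabic-Indic digits U+0660 to U+0669. Values: 0 to 9.
    static final Map<Character, Integer> DIGITS = new HashMap<>();

    static {
        for (int value = 0; value <= 9; value++) {
            DIGITS.put((char) ('\u0660' + value), value);
        }
    }

    public static void main(String[] args) {
        // HashMap does not promise any particular iteration order when printed.
        System.out.println(DIGITS);
        // Look up the Arabic-Indic digit three.
        System.out.println(DIGITS.get('\u0663'));
    }
}
```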

PS.
I used the Arabic-Indic Unicode digits (U+0660-U+0669).
The Eastern Arabic-Indic Unicode digits are at U+06F0-U+06F9.
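
If both ranges need to be handled, one option is to skip the map and let the JDK do the lookup: `Character.digit(ch, 10)` returns the decimal value of any Unicode decimal digit, which covers both blocks mentioned here. This is a sketch with a hypothetical class and method name, and it is broader than the map approach, since digits from other scripts would be converted too.

```java
final class BothDigitRanges {

    // Replaces every Unicode decimal digit with its ASCII form and leaves
    // all other characters as they are.
    static String normalizeDigits(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Character.digit returns -1 when c is not a digit in radix 10.
            int value = Character.digit(c, 10);
            out.append(value >= 0 ? (char) ('0' + value) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One digit from U+0660-U+0669 and one from U+06F0-U+06F9.
        System.out.println(normalizeDigits("\u0663\u06F4"));
    }
}
```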
[ December 13, 2006: Message edited by: Edwin Dalorzo ]
 
Ash Kondhalkar
Ranch Hand
Posts: 43
Hi Edwin,

Thank you for your reply; it helped and my problem is solved. I also needed to handle the comma, so I added one more entry to the HashMap, passing 1548 to the Character constructor!
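
For anyone reading later: decimal 1548 is U+060C, the Arabic comma. A minimal sketch of that extra entry follows, assuming a character-to-character map (the map type, the class name `ArabicSymbols` and the method `toWestern` are assumptions, not taken from the post). It uses `Character.valueOf` because the `Character` constructor has been deprecated since Java 9.

```java
import java.util.HashMap;
import java.util.Map;

final class ArabicSymbols {

    static final Map<Character, Character> SYMBOLS = new HashMap<>();

    static {
        // Arabic-Indic digits to Western digits.
        for (int i = 0; i < 10; i++) {
            SYMBOLS.put((char) (0x0660 + i), (char) ('0' + i));
        }
        // Decimal 1548 is the Arabic comma, mapped to the ASCII comma.
        SYMBOLS.put(Character.valueOf((char) 1548), ',');
    }

    // Replaces mapped characters and keeps everything else as is.
    static String toWestern(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = SYMBOLS.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toWestern("\u0661\u060C\u0662\u0663\u0664"));
    }
}
```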

Regards
Ashwin
 