In our bilingual application we need to convert Arabic numbers to English. As I see it, the conversion is a two-step process: 1. Convert the Arabic number to Unicode. 2. Convert the Unicode to an English number. I am stuck on the first step; I already have an algorithm for the second. Does anybody have a utility to convert Arabic to Unicode? The JDK ships native2ascii.exe, but it works on two text files: it reads Arabic text from one file and writes the Unicode values to another. I cannot use native2ascii.exe because of some limitations on our side.
There are no separate versions of Unicode for Arabic and English (see this). A character's Unicode value is the same regardless of the encoding a file uses; however, the text will render differently if the file is saved in one encoding and read back in another.
You mention native2ascii.exe - are we talking about properties files or standard text files? [ December 13, 2006: Message edited by: Paul Sturrock ]
Thanks for the quick reply. Yes, I understand that there are no separate versions of Unicode for English and Arabic; I am sorry if my message suggested otherwise. native2ascii.exe takes two text files, a source and a destination. If I write Arabic characters in the source file and run the exe with the paths of both files, the destination file will contain the Unicode values of the Arabic characters. The issue is that I cannot use native2ascii.exe.
Until now I didn't realise that Arabic languages had their own numbers. Having looked it up on Wikipedia, it seems that the only difference is the symbols used to represent each of the digits. If that is correct, then assuming you are holding the Arabic number in a String, can't you just use a HashMap that has the Arabic symbols as keys and the 'Western' symbols as values, and then convert each character individually to create a 'Westernized' String? Or have I totally misunderstood what you mean by Arabic numbers?
Thanks for the reply. What you have understood is exactly correct. I am using Eclipse for Java development. I tried storing Arabic characters in a HashMap, and when I tried to save the file Eclipse gave this message: "Save could not be completed. Reason: Some characters cannot be mapped using "Cp1252" character encoding. Either change the encoding or remove the characters which are not supported by the "Cp1252" character encoding." I also doubt whether this is the correct way to do it. Even if I change the character encoding of my editor, everybody in the team would have to do the same! Kindly let me know your thoughts.
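One possible way around the Cp1252 save error, offered as a sketch rather than as anything posted in this thread: write the Arabic-Indic digits as `\uXXXX` escapes instead of literal characters. The source file then contains only ASCII, so it saves under Cp1252 and nobody on the team has to change their editor encoding. The class name `ArabicDigitMap` and the method `buildDigits` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ArabicDigitMap {
    // Hypothetical helper: maps the Arabic-Indic digits (U+0660 to U+0669) to '0'..'9'.
    // The escape below keeps this source file pure ASCII.
    static Map<Character, Character> buildDigits() {
        Map<Character, Character> digits = new HashMap<Character, Character>();
        for (int i = 0; i < 10; i++) {
            digits.put((char) ('\u0660' + i), (char) ('0' + i));
        }
        return digits;
    }

    public static void main(String[] args) {
        Map<Character, Character> digits = buildDigits();
        // Look up the Arabic-Indic digit three (U+0663).
        System.out.println(digits.get('\u0663'));
    }
}
```

The compiler translates the escapes before it parses the code, so the map holds exactly the same characters as if they had been typed in directly.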
As we are only talking about 10 characters, can't you just use a simple if/else statement?
if (character is arabic symbol for 0) return "0";
else if (character is arabic symbol for 1) return "1";
else if (character is arabic symbol for 2) return "2";
else etc.
I'm not sure I understand your last paragraph. I thought you just wanted to convert a numeric string - where does the editor come into it?
Now, in order to get the corresponding values in European numbers:
You simply do digits.get('٣') and you should get 3 as the result.
PS. I used the Arabic-Indic Unicode digits (U+0660-U+0669). The Eastern Arabic-Indic Unicode digits are at U+06F0-U+06F9. [ December 13, 2006: Message edited by: Edwin Dalorzo ]
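As a rough sketch of how the per-character lookup could be applied to a whole string: the hypothetical `toWesternDigits` method below is not from the original post. Instead of a hand-built map it leans on the standard `Character.digit`, which returns the decimal value of any Unicode decimal digit, so it covers both the Arabic-Indic (U+0660-U+0669) and the Eastern Arabic-Indic (U+06F0-U+06F9) ranges. Note that it would also convert digits from other scripts, which may or may not be wanted.

```java
class DigitConverter {
    // Hypothetical helper: replaces every Unicode decimal digit with its ASCII
    // equivalent and passes all other characters through unchanged.
    static String toWesternDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int value = Character.digit(c, 10); // -1 if c is not a decimal digit
            out.append(value >= 0 ? (char) ('0' + value) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0661, U+0662, U+0663 are Arabic-Indic 1, 2, 3;
        // U+06F4 is Eastern Arabic-Indic 4.
        System.out.println(toWesternDigits("\u0661\u0662\u0663\u06F4"));
    }
}
```

A map gives tighter control over exactly which characters are translated; the `Character.digit` route is shorter and needs no table to maintain.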
Thank you for your reply, it did help me and my problem is solved. I needed to handle the comma as well, so I added one more entry to the HashMap, with 1548 as the value passed to the Character constructor!
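The extra comma entry might look roughly like the sketch below. Code point 1548 is U+060C, the Arabic comma. The post does not show the actual code, so the names `ArabicCommaEntry`, `buildSymbols` and `symbols`, and the choice to map the Arabic comma to an ASCII comma, are assumptions for illustration. `Character.valueOf` is used here in place of the constructor mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

class ArabicCommaEntry {
    // Hypothetical helper: the ten Arabic-Indic digits plus the Arabic comma.
    static Map<Character, Character> buildSymbols() {
        Map<Character, Character> symbols = new HashMap<Character, Character>();
        for (int i = 0; i < 10; i++) {
            symbols.put((char) (0x0660 + i), (char) ('0' + i));
        }
        // 1548 decimal is U+060C (Arabic comma); mapping it to ',' is an assumption.
        symbols.put(Character.valueOf((char) 1548), ',');
        return symbols;
    }

    public static void main(String[] args) {
        Map<Character, Character> symbols = buildSymbols();
        System.out.println(symbols.get((char) 1548));
    }
}
```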