JavaRanch » Java Forums » Java » Java in General

Unicode issue(for Arabic)

Ash Kondhalkar
Ranch Hand

Joined: Jun 14, 2006
Posts: 43
Hi,

In our bilingual application we need to convert Arabic numbers to English.
Converting Arabic numbers to English numbers is a two-step process:
1. Convert the Arabic number to Unicode.
2. Convert the Unicode to an English number.
I am stuck on the first step; I already have an algorithm for the second. Does anybody have a utility to convert Arabic to Unicode? The JDK includes native2ascii.exe, but it works on two text files: it reads Arabic text from one file and writes the Unicode values to another. Due to some limitations, I cannot use native2ascii.exe.

Can someone suggest a way out?
Regards
Ashwin
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

There are not different versions of Unicode for Arabic and English (see this). The Unicode value of a character is the same regardless of the encoding a file uses; however, it will be stored as different bytes if saved to a file in a different encoding.
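A minimal Java sketch of that point: the string holds the single character U+0663 (ARABIC-INDIC DIGIT THREE), written as a \u escape so the source file itself stays ASCII. Its code point never changes; only the bytes returned by getBytes depend on the charset. ISO-8859-1 is included to show what happens when a charset has no mapping for the character (getBytes substitutes the charset's replacement byte, '?'). The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SameCharDifferentBytes {
    public static void main(String[] args) {
        // One character: U+0663 ARABIC-INDIC DIGIT THREE
        String three = "\u0663";

        // The code point is fixed by Unicode, independent of any file encoding
        System.out.printf("code point: U+%04X%n", three.codePointAt(0));

        // The byte representation depends on the charset used to encode it
        System.out.println("UTF-8:      " + toHex(three.getBytes(StandardCharsets.UTF_8)));      // D9 A3
        System.out.println("UTF-16BE:   " + toHex(three.getBytes(StandardCharsets.UTF_16BE)));   // 06 63
        // Latin-1 has no Arabic digits, so the unmappable char becomes '?' (3F)
        System.out.println("ISO-8859-1: " + toHex(three.getBytes(StandardCharsets.ISO_8859_1))); // 3F
    }

    // Formats bytes as space-separated uppercase hex, e.g. "D9 A3"
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```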

You mention native2ascii.exe - are we talking about properties files or standard text files?
[ December 13, 2006: Message edited by: Paul Sturrock ]

Ash Kondhalkar
Ranch Hand

Joined: Jun 14, 2006
Posts: 43
Hi Paul,

Thanks for the quick reply. Yes, I understand that there are not different versions of Unicode for English and Arabic. I am sorry if my message suggested otherwise.
native2ascii.exe takes two text files, a source and a destination. If I write Arabic characters in the source file and run the exe with the paths of the source and destination files, the destination file will contain the Unicode values of the Arabic characters.
But the issue is that I cannot use native2ascii.exe.
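The escaping that native2ascii writes to its output file can also be produced in memory, with no files involved. A minimal sketch, assuming the Arabic text is already in a Java String: every char outside ASCII is replaced by a \uXXXX escape. The class and method names are hypothetical; this imitates the tool's output format rather than calling the tool, and the lowercase hex is a choice made here.

```java
public class InMemoryEscaper {
    // Replaces every non-ASCII char with a backslash-u escape (4 lowercase hex digits),
    // similar in form to the text native2ascii writes to its destination file.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c); // ASCII passes through unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String arabic = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three
        System.out.println(escapeNonAscii(arabic));
    }
}
```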

Regards
Ashwin
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3742
Until now I didn't realise that Arabic languages had their own numbers. Having looked it up on Wikipedia, it seems that the only difference is the symbols used to represent each of the digits. If that is correct, then assuming you are holding the Arabic number in a String, can't you just use a HashMap that has the Arabic symbols as keys and the 'Western' symbols as values, and then convert each character individually to create a 'Westernized' String?
Or have I totally misunderstood what you mean by Arabic numbers?
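A minimal sketch of the HashMap approach, assuming the input uses the Arabic-Indic digits U+0660 to U+0669. The keys are computed from that contiguous range, so no Arabic characters have to appear in the source file. Class and method names are hypothetical, and leaving unmapped characters unchanged is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class ArabicDigitMap {
    // Maps each Arabic-Indic digit (U+0660 to U+0669) to its Western digit.
    private static final Map<Character, Character> DIGITS = new HashMap<>();
    static {
        for (int i = 0; i <= 9; i++) {
            DIGITS.put((char) ('\u0660' + i), (char) ('0' + i));
        }
    }

    // Converts each character individually; anything not in the map is kept as-is.
    static String westernize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            Character mapped = DIGITS.get(c);
            sb.append(mapped != null ? mapped : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(westernize("\u0661\u0669\u0668\u0664")); // 1984
    }
}
```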


Joanne
Ash Kondhalkar
Ranch Hand

Joined: Jun 14, 2006
Posts: 43
Hi Joanne,

Thanks for the reply. Actually, your understanding is exactly correct. I am using Eclipse for Java development. I tried storing Arabic characters in a HashMap, and when I tried to save the file it gave this message:
" Save could not be completed. Reason: Some characters cannot be mapped using "Cp1252" character encoding. Either change the encoding or remove the characters which are not supported by the "Cp1252" character encoding."
I also doubt whether this is the correct way to do it. Even if I change the character encoding of my editor, everybody on the team will have to do the same! Kindly let me know your thoughts on this.

Regards
Ashwin
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3742
As we are only talking about 10 characters, can't you just use a simple if/else statement?

if (c == '\u0660')          // Arabic-Indic digit zero
    return "0";
else if (c == '\u0661')     // Arabic-Indic digit one
    return "1";
else if (c == '\u0662')     // Arabic-Indic digit two
    return "2";
// else if ... and so on, up to the digit nine
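Because the ten digits occupy a contiguous range, the if/else chain can also be replaced by the JDK's own Unicode data: Character.digit(c, 10) returns the numeric value of any Unicode decimal digit, which covers both the Arabic-Indic (U+0660 to U+0669) and Eastern Arabic-Indic (U+06F0 to U+06F9) ranges. A minimal sketch; the class and method names are hypothetical, and returning non-digits unchanged is an assumption.

```java
public class DigitConverter {
    // Returns the Western digit for any Unicode decimal digit
    // (Arabic-Indic, Eastern Arabic-Indic, ASCII, ...); other chars are returned unchanged.
    static char toWesternDigit(char c) {
        int value = Character.digit(c, 10); // -1 if c is not a decimal digit
        return value >= 0 ? (char) ('0' + value) : c;
    }

    // Applies toWesternDigit to every char of the string.
    static String toWestern(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(toWesternDigit(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toWestern("\u0662\u0660\u0660\u0666")); // 2006 (Arabic-Indic)
        System.out.println(toWestern("\u06F2\u06F0\u06F0\u06F6")); // 2006 (Eastern Arabic-Indic)
    }
}
```

Unlike a hand-written chain, this needs no per-digit literals at all, so the source encoding question does not arise.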

I'm not sure I understand your last paragraph. I thought you just wanted to convert a numeric string; where does the editor come into it?
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
I tried this code in BeanShell and it worked fine:



It printed:
{٨=8, ٩=9, ٢=2, ٣=3, ٠=0, ١=1, ٦=6, ٧=7, ٤=4, ٥=5}

Now, to get the corresponding value in European digits, you simply call digits.get('٣') and you should get 3 as the result.

PS.
I used the Arabic-Indic Unicode digits (U+0660 to U+0669).
The Eastern Arabic-Indic Unicode digits are U+06F0 to U+06F9.
[ December 13, 2006: Message edited by: Edwin Dalorzo ]
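A minimal sketch of a map like the one printed above, assuming Character keys and Integer values; it also registers the Eastern Arabic-Indic range from the PS so that both forms resolve to the same value. The class and method names are hypothetical. HashMap iteration order is not guaranteed, so printing the map may not reproduce the order shown above.

```java
import java.util.HashMap;
import java.util.Map;

public class ArabicDigits {
    // Builds a lookup from both Arabic digit ranges to their integer values.
    static Map<Character, Integer> buildDigits() {
        Map<Character, Integer> digits = new HashMap<>();
        for (int i = 0; i <= 9; i++) {
            digits.put((char) ('\u0660' + i), i); // Arabic-Indic
            digits.put((char) ('\u06F0' + i), i); // Eastern Arabic-Indic
        }
        return digits;
    }

    public static void main(String[] args) {
        Map<Character, Integer> digits = buildDigits();
        System.out.println(digits.get('\u0663')); // 3
        System.out.println(digits.get('\u06F3')); // 3
    }
}
```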
Ash Kondhalkar
Ranch Hand

Joined: Jun 14, 2006
Posts: 43
Hi Edwin,

Thank you for your reply; it helped, and my problem is solved. I also needed to handle the comma, so I added one more entry to the HashMap, passing 1548 (U+060C, the Arabic comma) to the Character constructor!
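A minimal sketch of that extension, assuming the Arabic comma should come out as an ASCII ','. (char) 1548 is U+060C ARABIC COMMA; here the digits are handled with a range check instead of map entries. The class name, the constant name, and the choice to keep all other characters unchanged are assumptions.

```java
public class ArabicNumberConverter {
    // 1548 decimal == U+060C ARABIC COMMA
    static final char ARABIC_COMMA = (char) 1548;

    // Converts Arabic-Indic digits and the Arabic comma to their ASCII equivalents.
    static String convert(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ARABIC_COMMA) {
                sb.append(',');
            } else if (c >= '\u0660' && c <= '\u0669') {
                sb.append((char) ('0' + (c - '\u0660')));
            } else {
                sb.append(c); // leave everything else unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("\u0661\u060C\u0662\u0663\u0664")); // 1,234
    }
}
```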

Regards
Ashwin