reading Arabic from a properties file

Skip Cole
Ranch Hand

Joined: Jan 05, 2001
Posts: 175
Hi,

When reading Arabic out of a UTF-8 properties file, the Arabic gets garbled.

I've seen some postings that indicate that one has to use the String.getBytes method to transform the garbled string into proper Arabic.

Has anyone done this, and do you have a couple of working lines of code?

I've played with it, but haven't gotten it to work. Maybe I'm missing something obvious.

Thanks in advance,
Skip


If you love me, you will visit docs.opensimplatform.org
(FYI, Getting it tattooed on is a bit much.)
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

Properties files have a problem with Arabic. See this.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
It is not a problem specific to Arabic, as far as I understand.

The thing is that properties files cannot contain characters outside ISO Latin-1 directly. But you can write the strings using \uXXXX escape sequences (with a lowercase u):

Like this



This should work. There is also a plugin for Eclipse that lets you write the properties file in your language and then converts it automatically to Unicode escape sequences. And of course, there is also the native2ascii tool that comes with the JDK.
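As a rough illustration of the escape approach (not the example originally posted here), the sketch below loads a property whose value is written entirely as \uXXXX escapes. The key name "greeting", the class name, and the sample word are made up for the example; the file content is held in a string so the snippet is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class EscapedPropertiesDemo {

    // Hypothetical file content: the key "greeting" and its value are invented.
    // The doubled backslashes keep the escapes literal, so that Properties
    // (and not the Java compiler) is what decodes them.
    static final String FILE_CONTENT =
            "greeting=\\u0645\\u0631\\u062D\\u0628\\u0627\n";

    static String loadGreeting() {
        Properties props = new Properties();
        // The escaped text is pure ASCII, so the Latin-1 decoding done by
        // load(InputStream) cannot damage it.
        try (InputStream in = new ByteArrayInputStream(
                FILE_CONTENT.getBytes(StandardCharsets.ISO_8859_1))) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    public static void main(String[] args) {
        // Whether this displays correctly depends on the console encoding and font.
        System.out.println(loadGreeting());
    }
}
```

A UTF-8 file can be converted to this escaped form with something like native2ascii -encoding UTF-8 messages_utf8.properties messages.properties, where both file names are placeholders.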
[ December 14, 2006: Message edited by: Edwin Dalorzo ]
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336


It is not a problem specific to Arabic, as far as I understand.

Indeed. My response was a little glib. If you look at the source for the Properties load method you'll find it uses an InputStreamReader with the ISO Latin-1 charset. So it's a problem with all languages that contain characters not supported by that charset.
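To make the Latin-1 point concrete, and to show the String.getBytes trick asked about at the top of the thread, here is a minimal sketch. It assumes the file really is UTF-8 and is loaded through load(InputStream); the class name, key, and sample word are hypothetical. Because ISO Latin-1 maps every byte to exactly one character, re-encoding the garbled value as Latin-1 gives back the original bytes, which can then be decoded as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class Latin1RoundTripDemo {

    // Sample Arabic word, written with escapes so the source compiles under any encoding.
    static final String ARABIC = "\u0645\u0631\u062D\u0628\u0627";

    // Simulates loading a UTF-8 encoded properties file the usual way.
    static String loadRaw() {
        byte[] utf8File = ("greeting=" + ARABIC + "\n").getBytes(StandardCharsets.UTF_8);
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(utf8File)) {
            props.load(in); // decodes each byte as one Latin-1 character
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    // Undoes the wrong decoding: recover the raw bytes, then read them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = loadRaw();
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(ARABIC));
    }
}
```

The repair only works if nothing else has touched the string in between, so it is a workaround rather than a fix for the underlying encoding mismatch.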
Skip Cole
Ranch Hand

Joined: Jan 05, 2001
Posts: 175
Thanks for these replies.

I'm going to read the text out of a database instead.

It's a lot of text, and it's not just numbers, and converting it to escape sequences seems a path fraught with peril. Arabic is cursive, and the shape of a letter depends on whether it is the first, middle or last character. Too often I've seen Arabic represented with each character disjoint, in the shape it has when it is the first letter. This is not legible to an Arabic reader.

Java you cruel mistress!!

I'm bummed. I'm one of Java's biggest fans, but this lack of functionality (we should get to dictate the encoding of the properties file) really bums me out.

Don't worry. I'll survive. It's time to hit the eggnog ;-)

Skip
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Originally posted by Skip Cole:
Arabic is cursive, and the shape of the letter depends on if it is the first, middle or last character. Too often I've seen Arabic get represented with each character disjoint, and in the shape that it has when it is the first letter. This is not legible to an Arabic reader.
Sure, but that's a font problem, isn't it? In ordinary Unicode text ARABIC LETTER SHEEN is stored as a single code point whatever its position; the font and text renderer are supposed to pick the first, middle, or last shape. Somehow.
we should get to dictate the encoding of properties file
I notice that Java 5 lets you use XML-formatted properties files. I haven't tried them or even read up on them but I would assume that since XML deals so nicely with Unicode, all those ugly restrictions that applied to the plain-text versions would have gone away and you could put anything you liked into the XML without having to escape it for Java.
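For what it's worth, a quick sketch of the XML route using Properties.storeToXML and Properties.loadFromXML (both present since Java 5). The class name, key, and comment string are made up, and the data goes through an in-memory buffer rather than a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

final class XmlPropertiesDemo {

    // Writes one property as UTF-8 XML, reads it back, and returns the loaded value.
    static String roundTrip(String value) {
        Properties out = new Properties();
        out.setProperty("greeting", value); // hypothetical key
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            // The third argument names the encoding declared in the XML header.
            out.storeToXML(buffer, "hypothetical example", "UTF-8");
            Properties in = new Properties();
            in.loadFromXML(new ByteArrayInputStream(buffer.toByteArray()));
            return in.getProperty("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println("survived round trip: " + arabic.equals(roundTrip(arabic)));
    }
}
```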
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
JDK 6 also adds load() and store() methods that take a Reader and Writer respectively. These allow you to specify whatever encoding you want, without using the XML stuff.
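A minimal sketch of the Reader-based load, assuming JDK 6 or later and a UTF-8 file. The class and method names are invented for the example; the caller supplies the stream (for instance a FileInputStream on the properties file).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class ReaderPropertiesDemo {

    // Loads properties from a stream that is known to contain UTF-8 text.
    static Properties loadUtf8(InputStream in) {
        Properties props = new Properties();
        // Wrapping the stream in a Reader is what lets us choose the charset.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        byte[] file = "greeting=\u0645\u0631\u062D\u0628\u0627\n"
                .getBytes(StandardCharsets.UTF_8);
        Properties props = loadUtf8(new java.io.ByteArrayInputStream(file));
        System.out.println(props.getProperty("greeting"));
    }
}
```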

If you're stuck with an earlier JDK and want to specify your own encoding, just abandon Properties and use a plain Map instead. It's not hard to write a method to read each line in a file, trim, ignore comments, look for the first '=' to delimit between key and value, then split the line into two parts and put them in the map. It's a little more complex if you want to allow multiline values or special escape sequences, but shouldn't be too bad.
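The plain-Map approach described above might look roughly like the sketch below. It is written in current Java for brevity (an older JDK would need the generics and try-with-resources spelled out differently), it deliberately skips multiline values and escape sequences, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class SimplePropertiesParser {

    // Parses "key=value" lines from a Reader; the caller picks the encoding
    // when constructing the Reader (e.g. an InputStreamReader with UTF-8).
    static Map<String, String> parse(Reader source) {
        Map<String, String> result = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Skip blank lines and the two comment styles properties files use.
                if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
                    continue;
                }
                int eq = line.indexOf('='); // the first '=' separates key from value
                if (eq < 0) {
                    continue; // no delimiter: this sketch simply ignores the line
                }
                result.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "# sample\ngreeting = \u0645\u0631\u062D\u0628\u0627\n";
        System.out.println(parse(new StringReader(text)).keySet());
    }
}
```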
[ December 14, 2006: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Edwin Dalorzo
Ranch Hand

Joined: Dec 31, 2004
Posts: 961
Have you tried a java.util.ListResourceBundle? I did, and it worked fine for me even with Unicode strings containing Arabic characters.

The only thing I had to change was the encoding of my source files, in order to support the Unicode characters I was defining in the Java file.
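Something along these lines, as a sketch only: the bundle class name, key, and sample word are made up. The value is written with Unicode escapes here so the snippet compiles under any source encoding; with the compiler told the encoding (for example javac -encoding UTF-8) the Arabic text could be typed directly into the literal.

```java
import java.util.ListResourceBundle;

// Hypothetical bundle. To look it up via ResourceBundle.getBundle the class
// would need to be public and follow the bundle naming convention; here it
// is instantiated directly to keep the example short.
final class ArabicMessages extends ListResourceBundle {

    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            // key, value
            { "greeting", "\u0645\u0631\u062D\u0628\u0627" },
        };
    }

    public static void main(String[] args) {
        ArabicMessages bundle = new ArabicMessages();
        System.out.println(bundle.getString("greeting"));
    }
}
```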


Java you cruel mistress!!


In JDK 1.6 java.util.PropertyResourceBundle lets you use a java.io.Reader to load the properties file. This of course lets you read whatever encoding you want.
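A short sketch of that constructor in use, assuming JDK 6 or later and a UTF-8 encoded file; the class name, method name, and key are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class Utf8BundleDemo {

    // Builds a resource bundle from a stream of UTF-8 encoded properties text.
    static ResourceBundle load(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            // The Reader-based constructor reads the whole file up front,
            // so closing the reader afterwards is safe.
            return new PropertyResourceBundle(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = "greeting=\u0645\u0631\u062D\u0628\u0627\n"
                .getBytes(StandardCharsets.UTF_8);
        ResourceBundle bundle = load(new ByteArrayInputStream(file));
        System.out.println(bundle.getString("greeting"));
    }
}
```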

Also, if you are working with Eclipse you could get a Resource Bundle Editor plugin that does all the dirty work for you, converting all the strings to Unicode escape sequences.
[ December 14, 2006: Message edited by: Edwin Dalorzo ]
 