Mixed Arabic and Latin text (with a number) in a String causes issues

 
Greenhorn
Posts: 12
Hi all,

I hope someone knows the answer to this issue!

I've got an application in Java which calls another app via a callable statement, passing various parameters across. One of these parameters is a buffer containing transaction information.

Now we're working on an Arabic proof of concept, and when I create this buffer containing Arabic, Latin and numeric data, the order of the string is messed up.

E.g.

The data may be:

I
AED
17
شارة النقاط
1.7
101010
Test seventeen
17D

This appears in the buffer (with some padding) as:
IAED17شارة النقاط 1.7 101010Test seventeen 17D

As you can see, the order has changed - the numeric fields appear to have been included with the Arabic text.

This causes the API we're calling to break, as you can imagine - expected fields are not appearing in the right place.

The string is being built as follows:
// Concatenate each formatted field into a single parameter buffer.
StringBuffer paramBuffer = new StringBuffer(BUFFER_INITIAL_CAPACITY);
while (iterator.hasNext()) {
    String fieldName = iterator.next().toString();
    // Look up the field object by its name.
    FieldApi field = (FieldApi) fields.get(fieldName);

    // Keep a running total of the field lengths.
    tempCtr = tempCtr + field.getLength();
    if (field.getLength() == 0) {
        break; // We have reached the end of the fields
    }

    // Append each formatted field to the parameter buffer.
    paramBuffer.append(field.getString());
}

Now, I suspect that I could get around this by constructing a byte array and using a setBytes method on my callableStatement. However, the setup of my JDBC connection means that I can't use setBytes (translateBinary is set to true, giving a data type mismatch exception), and this setup is something I can't change.

I'm sure this is a problem with the fact that Arabic is written right to left rather than left to right, and the String I create attempts to format it in this manner (and gets it wrong). Is there a way I can tell the String to leave things exactly as they're added, rather than messing around with them?

Any help would be appreciated! :-)

Cheers,

Dan
 
Marshal
Posts: 28193
95
Eclipse IDE Firefox Browser MySQL Database
You don't say how you are creating and loading this buffer. So it's hard to comment on that aspect of the question. Perhaps you're just concatenating the strings?

You also don't say how you are looking at the buffer to get the text you are complaining about. Since Arabic (as you know) is read from right to left, software which displays text with mixed Arabic and Latin characters has to make decisions about what's RTL and what's LTR. So what you see there may actually not be in the same order as what's actually in the buffer.
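
One way to check this, as a rough sketch and not anything taken from the original application: dump the buffer's code points in storage order, so that no viewer gets a chance to reorder them. The class and method names (LogicalOrderDump, dump) are made up for illustration.

```java
// Hypothetical debugging helper; not part of the original application.
final class LogicalOrderDump {

    // Returns one "index: U+XXXX" line per code point, in the order the
    // characters are actually stored in the String (logical order).
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        int index = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.append(index).append(": ")
               .append(String.format("U+%04X", cp))
               .append('\n');
            i += Character.charCount(cp); // step over surrogate pairs correctly
            index++;
        }
        return out.toString();
    }
}
```

If the code points come out in the order the fields were appended, the String itself is intact and the reordering is happening in whatever is displaying it.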
 
Dan Lingard
Greenhorn
Posts: 12
Hi Paul,

Thanks for your response. I create the buffer by appending the various field values to a StringBuffer, and then doing a .toString on it (code snippet in my original post). I've gone so far as to create a byte array working through the characters in each field and then later creating a string from the array, but with the same effect when I come to inspect it in debug (although I think the order is maintained correctly within the array).

Something I have noticed is that if I paste the buffer into MS Word, the order is correct - if I paste it into Notepad, the order is messed up. However, I have inspected the buffer when it's passed to the stored procedure via a callable statement, and unfortunately it's this messed up version of the buffer that makes its way across the JDBC connection.

Any options or light you (or anyone else) can shed on this would be greatly appreciated.

Dan
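
For the Word-versus-Notepad difference, one possible way to take the viewer's reordering out of the picture is to wrap a display-only copy of the buffer in Unicode's LEFT-TO-RIGHT OVERRIDE (U+202D) and POP DIRECTIONAL FORMATTING (U+202C) characters. This is only a sketch under the assumption that the viewer honours those control characters; the class and method names (DisplayOrder, forceLeftToRight) are hypothetical.

```java
// Hypothetical display helper; the wrapped copy is for inspection only.
final class DisplayOrder {
    private static final char LRO = '\u202D'; // LEFT-TO-RIGHT OVERRIDE
    private static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Returns a copy of the text that a bidi-aware viewer should lay out
    // strictly left to right, i.e. in the same order as it is stored.
    static String forceLeftToRight(String logical) {
        return LRO + logical + PDF;
    }
}
```

The wrapped copy is two characters longer than the original, so it should never be the value sent to the stored procedure; it only helps when eyeballing field positions. Arabic words will look reversed in that view, which is expected.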
 
Greenhorn
Posts: 1
This is 3 years late, but may help others.
In Arabic (and, I think, in Modern Hebrew too, which uses two number systems: an old Hebrew one for religious purposes and the borrowed European one for everyday use and complex math), the letters go r-t-l while the numbers go l-t-r.

In trying to write Unicode apps, I find that OS support (even with Windows 7 SP1) is lacking for mixed-direction text. (For example, when editing Arabic/Hebrew text, the left arrow on the keyboard makes the cursor move to the right.)
I am not sure if Java has its own Unicode handling logic, or if it simply passes that to the host OS for processing.
I am also not sure if Microsoft has Unicode handling in the Common Controls that are part of the OS's set of shared DLLs, or if Unicode text handling is in the often-used Visual C runtime library that is fixed into a given app when it is built.

But to your original problem... I don't have an answer, as I have not used JDBC in years, but...

Because numbers can go either way, I think the text handler is writing 1.7 and 101010 r-t-l because the previous text was Arabic, and in Arabic, that is how the numbers would be written. It is only when the Latin text is encountered that it realizes the text must now go l-t-r. But as it cannot print "Test seventeen" on top of "101010 1.7", it must hightail it to the next open space. As the string began in Latin and it has to write in Latin, I assume its logic is to jump to the right side of the string and display "Test seventeen17D".

read this --> http://danielschereck.com/wp2002arabia/wp-arabicnumbers.htm
maybe this too --> http://www.unicode.org/reports/tr9/

I -> Latin, to the right
AED -> Latin, to the right
17 -> Latin was previous, so to the right
شارة النقاط <- Arabic, to the left
1.7 <- Arabic was previous, so to the left
101010 <- Arabic was previous, so to the left
Test seventeen -> Latin, to the right; crap, no free space, can't double print, hop to the "end"
17D -> Latin was previous, so to the right

IAED17شارة النقاط 1.7 101010Test seventeen 17D
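
On the question of whether Java has its own handling: the JDK does ship java.text.Bidi, an implementation of the Unicode Bidirectional Algorithm. A rough sketch of using it to list the directional runs it finds in a string follows; the class and method names (BidiRuns, describe, isMixed) are made up for illustration, and the analysis says nothing about what a particular JDBC driver does with the text.

```java
import java.text.Bidi;

// Hypothetical inspection helper built on the JDK's java.text.Bidi.
final class BidiRuns {

    // True if the text contains both left-to-right and right-to-left runs.
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT).isMixed();
    }

    // One line per run: "start-limit level N", with indexes in storage order.
    // Even levels are laid out left to right, odd levels right to left.
    static String describe(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            out.append(bidi.getRunStart(i)).append('-')
               .append(bidi.getRunLimit(i))
               .append(" level ").append(bidi.getRunLevel(i))
               .append('\n');
        }
        return out.toString();
    }
}
```

Running describe on the buffer shows which stretches a display would treat as right-to-left. The analysis only describes layout; the characters in the String stay in the order they were appended.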


In short, better OS and library support for applications is needed for fixing this.. or if it is fixed, better documentation.
 
Marshal
Posts: 79180
377
Welcome to the Ranch

The Windows® command line is notorious for its small range of supported characters.
 
Dan Lingard
Greenhorn
Posts: 12
Hi all - thanks for the reply, even if it was a little after the fact!

In the end I reported this as a bug to the JDBC provider, and they fixed it to work as I thought it should... Whether this then broke the driver for others I can't say!