UTF8 java + arabic

Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
Hi - I need to get Arabic text into a Java String. I have saved the Arabic as UTF-8 and am wondering about the correct way to get that into a String. Googling gives me lots of suggestions, so I'm just wondering which is the correct one.

Thanks

L
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
How is this data currently stored? Binary in a database?


Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
In a text file, as UTF-8.
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
OK, then you want to be concerned with how you're reading in that data. That is, use a ByteBuffer with UTF-8 encoding when decoding the file's bytes.
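The code originally attached to this post is missing from the thread. As a stand-in, here is a minimal sketch of one way to do what the post describes, assuming Java 7 or later (for java.nio.file and StandardCharsets). The class name Utf8FileReader, its method names, and the temporary sample file are hypothetical and not taken from the original post.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
class Utf8FileReader {
    // Wraps the raw bytes in a ByteBuffer and decodes them as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    // Reads the whole file into memory, then decodes it as UTF-8.
    static String readUtf8File(Path path) throws IOException {
        return decodeUtf8(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 sample to a temporary file, then read it back.
        String sample = "\u0633\u0644\u0627\u0645"; // four Arabic letters, written as escapes
        Path tmp = Files.createTempFile("arabic", ".txt");
        try {
            Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
            String text = readUtf8File(tmp);
            System.out.println(text.equals(sample)); // prints "true"
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The key point is that the charset is named explicitly at the moment bytes become characters, so the result does not depend on the platform default encoding.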


[ September 15, 2005: Message edited by: Max Habibi ]
Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
Thanks, you're a lifesaver. L
Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
Just to check:

The String itself will be UTF-8, though, and not converted to UTF-16? This is plugging into something else that will not handle UTF-16. Thanks

L
Grahamsmit Smith
Greenhorn

Joined: Nov 09, 2005
Posts: 2
Hi:

I am having some difficulty with UTF-8 encoded characters in Java.

My XML has a question which contains Cyrillic characters. My Java servlet renders this as HTML with a form for the reply. The HTML produced displays OK in the browser (the response type on the Java servlet has to be set to "text/html; charset=UTF-8" for this to work).

I have to send Cyrillic characters back in the response to the question, in a text field on the form. The browser is sending back a byte stream (which I am printing here as hex): d0b3d0bed180d0bed0b4 (this is a Cyrillic word correctly encoded as UTF-8).

However, on collecting the response (using request.getParameterValues(fieldname)), the servlet returns the byte stream: d0b3d0bed13fd0bed0b4. A mistake in the sixth byte (80 has become 3f)!

Has anyone heard of this problem? I suspect the problem is in the Java UTF-8 converter.

Regards

Graham
Grahamsmit Smith
Greenhorn

Joined: Nov 09, 2005
Posts: 2
I now know the answer, thanks to Bruno Van Haetsdaele.

Before calling request.getParameterValues(fieldname), one should call request.setCharacterEncoding("UTF-8").
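Why the charset has to be set before the parameters are read can be illustrated without a servlet container. The container must pick some charset to turn the submitted bytes into a String, and the Servlet specification's default for request bodies is ISO-8859-1 when none has been specified. The sketch below decodes the exact bytes quoted above both ways. The class FormDecodingDemo and its helpers are hypothetical, and the wrong-charset branch shows the general failure mode rather than reproducing the exact 3f byte reported above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class FormDecodingDemo {
    // Converts a hex string such as "d0b3" into the bytes it denotes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decodes the hex-encoded bytes with the given charset.
    static String decode(String hex, Charset charset) {
        return new String(hexToBytes(hex), charset);
    }

    public static void main(String[] args) {
        String sent = "d0b3d0bed180d0bed0b4"; // bytes the browser sent, from the post above

        // Decoded as UTF-8: five Cyrillic letters.
        String right = decode(sent, StandardCharsets.UTF_8);
        // Decoded as ISO-8859-1: every byte becomes its own character.
        String wrong = decode(sent, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // prints 5
        System.out.println(wrong.length()); // prints 10
    }
}
```

In the servlet itself, request.setCharacterEncoding("UTF-8") plays the role of choosing the UTF-8 branch, and it only takes effect if it runs before the first parameter is read.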

Grahamsmit
Vlado Zajac
Ranch Hand

Joined: Aug 03, 2004
Posts: 245
Originally posted by Lucy Sommerman:
Just to check:

The String itself will be UTF-8, though, and not converted to UTF-16? This is plugging into something else that will not handle UTF-16. Thanks

L


Java Strings are sequences of 16-bit characters (UTF-16). You can (and probably need to) convert the String to a byte array or write it to a stream to plug it into "something else". In both cases the character encoding can be specified.
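Both routes mentioned here, converting to a byte array and writing to a stream, can be sketched in a few lines. This is a minimal illustration assuming Java 7 or later; the class Utf8OutputDemo and its method names are hypothetical, and a ByteArrayOutputStream stands in for whatever stream the "something else" actually reads from.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Utf8OutputDemo {
    // Route 1: encode the String directly into a UTF-8 byte array.
    static byte[] toUtf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Route 2: write the String through a Writer that encodes as UTF-8.
    static byte[] writeAsUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String arabic = "\u0633\u0644\u0627\u0645"; // four Arabic letters, written as escapes
        System.out.println(arabic.length());              // prints 4 (UTF-16 code units)
        System.out.println(toUtf8Bytes(arabic).length);   // prints 8 (UTF-8 bytes)
        System.out.println(writeAsUtf8(arabic).length);   // prints 8
    }
}
```

The String stays UTF-16 in memory either way; only the bytes handed to the other system are UTF-8.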
 
 