
UTF8 java + arabic

Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
Hi - I need to get Arabic text into a Java String. I have saved the Arabic as UTF-8 and am wondering about the correct way to read that into a String. Googling gives me lots of suggestions, so I am just wondering which one is correct.

Thanks

L
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
How is this data currently stored? Binary in a database?


Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
In a text file, saved as UTF-8.
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
OK, then what you need to be concerned with is how you read that data in. That is, use a ByteBuffer with UTF-8 encoding, as follows:
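A minimal sketch of the ByteBuffer approach, not necessarily the code from the original post. It assumes a Java 7+ runtime; the class name Utf8FileReader and the file name arabic.txt are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, named here only for illustration.
class Utf8FileReader {

    // Decode raw UTF-8 bytes into a String by way of a ByteBuffer.
    // Malformed byte sequences are replaced with U+FFFD rather than rejected.
    static String decodeUtf8(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        CharBuffer chars = StandardCharsets.UTF_8.decode(buffer);
        return chars.toString();
    }

    // Read the whole file into memory, then decode it as UTF-8.
    static String readUtf8File(Path path) throws IOException {
        return decodeUtf8(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // "arabic.txt" is an assumed file name; pass a real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "arabic.txt");
        String text = readUtf8File(path);
        System.out.println(text.length() + " UTF-16 code units read");
    }
}
```

The important part is naming the charset explicitly instead of relying on the platform default. If the file starts with a byte order mark, it will show up as a leading U+FEFF character in the String and may need to be stripped.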


[ September 15, 2005: Message edited by: Max Habibi ]
Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
Thanks, you are a lifesaver. L
Lucy Sommerman
Ranch Hand

Joined: Nov 25, 2004
Posts: 61
Just to check:

Will the String itself be UTF-8, and not converted to UTF-16? This is plugging into something else that will not handle UTF-16. Thanks

L
Grahamsmit Smith
Greenhorn

Joined: Nov 09, 2005
Posts: 2
Hi:

I am having some difficulty with UTF-8 encoded characters in Java.

My XML has a question which contains Cyrillic characters. My Java servlet renders this as HTML with a form for the reply. The HTML produced displays OK in the browser (the response type on the Java servlet has to be set to "text/html; charset=UTF-8" for this to work).

I have to send Cyrillic characters back in the response to the question, in a text field on the form. The browser is sending back a byte stream (which I am printing here as hex): d0b3d0bed180d0bed0b4 (this is a Cyrillic word correctly encoded as UTF-8).

However, on collecting the response (using request.getParameterValues(fieldname)) the servlet returns the byte stream: d0b3d0bed13fd0bed0b4. There is a mistake in the sixth byte: 80 has become 3f!

Has anyone heard of this problem? I suspect the problem is in the Java UTF-8 converter.

Regards

Graham
Grahamsmit Smith
Greenhorn

Joined: Nov 09, 2005
Posts: 2
I now know the answer, thanks to Bruno Van Haetsdaele.

Before calling request.getParameterValues(fieldname),
one should call request.setCharacterEncoding("UTF-8").

Grahamsmit
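To see why the request encoding matters, here is a small JDK-only sketch (no servlet API) that decodes the same ten bytes two ways. It assumes the container fell back to a single-byte charset such as ISO-8859-1, which is the traditional servlet default when the request names no encoding; the thread does not confirm that this is exactly what happened. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the decoding mismatch.
class FormDecodingDemo {

    // Turn a hex string such as "d0b3" into the bytes it spells out.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // What the servlet should do with the form bytes.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // What happens if a single-byte default charset is applied instead (assumption).
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = fromHex("d0b3d0bed180d0bed0b4");
        String right = decodeAsUtf8(raw);
        String wrong = decodeAsLatin1(raw);
        // Five Cyrillic letters when decoded as UTF-8 ...
        System.out.println(right.length());
        // ... but ten unrelated characters when each byte is treated as one character.
        System.out.println(wrong.length());
    }
}
```

Once the bytes have been turned into the wrong characters, the 0x80 byte maps to a character that many encodings cannot represent, so it is written back out as '?' (0x3f). Calling request.setCharacterEncoding("UTF-8") before the first parameter read avoids the wrong decode in the first place.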
Vlado Zajac
Ranch Hand

Joined: Aug 03, 2004
Posts: 245
Originally posted by Lucy Sommerman:
Just to check:

Will the String itself be UTF-8, and not converted to UTF-16? This is plugging into something else that will not handle UTF-16. Thanks

L


Strings are sequences of 16-bit chars (UTF-16 code units), so a String in memory is never UTF-8. You can (and probably need to) convert the String to a byte array, or write it to a stream, to plug it into "something else". In both cases the character encoding can be specified.
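A short sketch of both options, assuming Java 7+ for StandardCharsets. The class name Utf8Encoding and its method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class Utf8Encoding {

    // Option 1: convert the String to a UTF-8 byte array.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Option 2: write the String to a stream through a Writer that encodes as UTF-8.
    static void writeUtf8(String text, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // The Arabic word "salam", written with escapes so the source file encoding does not matter.
        String text = "\u0633\u0644\u0627\u0645";
        byte[] bytes = toUtf8(text);
        // Prints "4 chars, 8 UTF-8 bytes": each of these letters takes two bytes in UTF-8.
        System.out.println(text.length() + " chars, " + bytes.length + " UTF-8 bytes");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(text, out);
        System.out.println(out.size() + " bytes written to the stream");
    }
}
```

Whatever consumes the data only ever sees the bytes, so it does not matter that the String was UTF-16 in memory as long as the encoding is named at the point of conversion.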