My understanding is that there are many EBCDIC encodings possible, and you would need to know exactly which EBCDIC encoding is used here. For Arabic, a common choice is apparently Cp420. This is supported in Java - on older JDK versions you may need to include the file charsets.jar in your classpath. Try something like this:
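The code that originally followed this post was not preserved in the thread. A minimal sketch of the two-step conversion being suggested (decode the bytes as Cp420, then re-encode as UTF-8); the class and method names here are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp420ToUtf8 {

    // Decode EBCDIC Arabic (Cp420) bytes into a Java String, then
    // re-encode that String as UTF-8 bytes.
    static byte[] convert(byte[] ebcdicBytes) {
        String text = new String(ebcdicBytes, Charset.forName("Cp420")); // step 1: decode
        return text.getBytes(StandardCharsets.UTF_8);                    // step 2: encode
    }

    public static void main(String[] args) {
        // 0xF1 0xF2 0xF3 are the EBCDIC digits '1', '2', '3'
        byte[] in = { (byte) 0xF1, (byte) 0xF2, (byte) 0xF3 };
        System.out.println(new String(convert(in), StandardCharsets.UTF_8));
    }
}
```

If `Charset.forName("Cp420")` throws an UnsupportedCharsetException, the extended charsets are not on your classpath (on older JDKs, add charsets.jar as noted above).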
If the encoding is something other than Cp420, you may need to find additional encoding support elsewhere. You may find this documentation useful.
"I'm not back." - Bill Harding, Twister
The following is my code. Here I am reading bytes of data, specifying that they are in Cp420 (EBCDIC Arabic) format, and then writing to the output file in UTF-8 format.
However, there seems to be some problem. Some junk characters are getting written to the file, especially the ones where the hex value is alphanumeric, for example 8D, 8C, etc. If the hex value is numeric, then the output is correct.
What am I doing wrong in the code?
Also, I need to insert a carriage return after every fixed number of bytes of data read.
Originally posted by Nikhil Bansal: But the problem is with Arabic. For example, there are hex values like 064E and 064F for Arabic characters. When I send them as output, I am getting some junk characters like ?
If you are sure that those codes are the correct Unicode codes for the characters, then the problem is not in the EBCDIC to Unicode conversion.
Of course, you need to have a font that contains those Unicode characters, otherwise you can't display them. Where is your output going, to a Unicode text file? What software are you using to view the output? Are you using a font that contains the Arabic characters?
You have two steps in your little piece of code there. The first reads the bytes and attempts to convert them to chars using the CP420 charset, and the second converts those chars to bytes using the UTF-8 charset and writes them out.
Personally I would have used an InputStreamReader that specified CP420 and an OutputStreamWriter that specified UTF-8 rather than the low-level byte-fiddling that you have there. But that shouldn't matter, because it should end up with the same result.
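A sketch of that stream-based alternative, assuming the data is read from and written to plain files (the file paths, class name, and buffer size are illustrative):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamConvert {

    // Copy src to dst, decoding from Cp420 and encoding to UTF-8.
    // The Reader/Writer pair handles both conversions transparently.
    static void convertFile(File src, File dst) throws IOException {
        try (Reader in = new InputStreamReader(
                     new FileInputStream(src), Charset.forName("Cp420"));
             Writer out = new OutputStreamWriter(
                     new FileOutputStream(dst), StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

This avoids holding the whole file in memory and leaves all charset handling to the two streams.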
The problem is that you have "?" appearing in the final result where it should not appear, and that almost always means an encoding or decoding failure. So, which of the two steps is producing these ? characters?
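One way to answer that question is to dump the code points immediately after the decode step. If the decoder hit bytes it could not map, you will see U+FFFD (the replacement character) at this point; if the expected code points (e.g. U+064E) are already present, the decode succeeded and the problem lies in the encoding or the viewer. The `hexDump` helper here is a hypothetical utility, not part of the original code:

```java
import java.nio.charset.Charset;

public class DumpChars {

    // Render each char of s as a U+XXXX code point, space-separated.
    static String hexDump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("U+%04X ", (int) s.charAt(i)));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Inspect the result of step 1 (decoding) in isolation:
        // U+FFFD here means the Cp420 decode itself failed.
        String decoded = new String(new byte[] { (byte) 0xF1 },
                                    Charset.forName("Cp420"));
        System.out.println(hexDump(decoded));
    }
}
```

Since UTF-8 can represent every Unicode character, a literal ? in the UTF-8 output file usually means the damage happened before the encode step, or that the viewer's font cannot display the characters.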
I've already checked the date of the post, and I know it's old ... But I was facing the same problem, so I took some time to find a solution.
I know others will Google and find this post, so I didn't want them to spend as much time as I did fixing it.