This week's book giveaway is in the OCAJP 8 forum. We're giving away four copies of OCA Java SE 8 Programmer I Study Guide and have Edward Finegan & Robert Liguori on-line! See this thread for details.
On a Mac OSX system the application takes input from a file, keyboard, or from pasting into a JTextArea. The input is processed from the characters of the JTextArea's javax.swing.text.Document. All works fine until Hebrew characters are entered (0590-05ff in Unicode).
Pasting of Unicode Hebrew (from Firefox) looks correct, but the characters of the document are only single bytes, 0x3F, when taken from the document. When the JTextArea is filled
from an input file, via a CharacterStream, the Hebrew characters are replaced by nonsense. A Hebrew keyboard typing into the JTextArea produces a correct appearance, but the results returned from the Document are still short, 8 bit.
I've read somewhere that some AWT components aren't really Unicode compatible, but that Swing components are.
The '3F' character is a question mark. This suggests to me that at some point in the process your characters are undergoing an encoding to bytes, using a charset which doesn't support those Hebrew characters.
I don't see any place in your description which suggests a conversion from chars to bytes, but then I can't exactly tell what the process is. I started to write a test program (although I don't have a Mac) but then I realized I didn't know exactly what you were doing with the Document. So could you post a small test program which demonstrates the problem?
Thanks for your observation. I thought the '?' was because the system font didn't support Hebrew characters. My app shouldn't convert char to bytes, HOWEVER, the analysis I gave did foolishly use new String(char x).getBytes() rather than getbytes("UTF-16"). When I did this, the Hebrew characters were recognizable from the JTextArea.
I'm now searching for a char to bytes operation in my software. Thanks for the tip that '?' is more than a font failure indication.