I've been scratching my head over this one for months; I would have thought it simple enough, but hey. I want to type a Unicode escape code, say \u3088, into a JOptionPane.showInputDialog prompt, and then have a showMessageDialog come back with よ, the Japanese character. I've read through and adapted code from the "UTF encoding Tutorial" and have come up with this:
So when I set a breakpoint where strTest1 is assigned, I see "日本語文字列", which is the correct Japanese text for those escape codes. But, and this is the bit that's causing me the headache, when I enter \u3088 or \u65e5 into the JOptionPane.showInputDialog prompt, it comes back with the literal text \u3088 (or \u65e5) rather than the character. The problem is obviously centered around reading in the string from the input dialog, because when the string is set explicitly in the code, it works OK.
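To illustrate the difference described above without the dialogs, here is a minimal sketch. It assumes the dialog hands back exactly the six characters that were typed (backslash, u, 3, 0, 8, 8), which is written as a doubled backslash in the code. The class and method names (`EscapeDemo`, `fromSource`, `asTyped`) are made up for this example and are not from the original code.

```java
public class EscapeDemo {

    // The escape is written directly in the source file,
    // like the string that is set explicitly in the code.
    static String fromSource() {
        return "\u3088";
    }

    // Stand-in for the dialog input (an assumption): the same six
    // characters the user would type, with the backslash doubled so
    // that it stays a literal backslash in the resulting string.
    static String asTyped() {
        return "\\u3088";
    }

    public static void main(String[] args) {
        // Lengths are printed instead of the text itself so the output
        // does not depend on the console encoding.
        System.out.println(fromSource().length());           // prints 1
        System.out.println(asTyped().length());              // prints 6
        System.out.println(fromSource().equals(asTyped()));  // prints false
    }
}
```

The hard-coded string holds a single character, while the stand-in for the typed string holds six, which matches what ends up in the two text files.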
Why is this?
The text files contain "日本語文字列" (test1.txt) and "\u3088" (test2.txt), or whatever you type in. It's frustrating the hell out of me because I sense a layer of obfuscation that I haven't been able to get around yet.
Do I have to go down to the byte level?
Is it UTF-8 / UTF-16 related?
Looking at the text files with a hex editor reveals that there's no byte order mark either, not that UTF-8 needs one explicitly. But yet again, after coming up blank when I thought I had a new angle on it, I've hit a wall. Any insight would be appreciated.