Passing UTF-8 strings to Jython PythonInterpreter exec function is not working

Suresh Manjunath
Greenhorn

Joined: Nov 14, 2012
Posts: 1
I am trying to use Jython's org.python.util.PythonInterpreter to execute some Python code from within Java.
The input Python string comes from an external source and so may contain non-ASCII text, for example Japanese in UTF-8 or Shift_JIS (SJIS).

For Japanese input, the output from Python is always ??? instead of the original characters. It is not a problem with print: I wrote Python code to print the exact hex dump and it was 0x3F 0x3F 0x3F.
Printing the String in Java gives the correct output.
The Python code also works correctly.
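One way to reproduce those 0x3F bytes in plain Java is sketched below. It assumes the string gets narrowed to a single-byte charset somewhere between Java and the interpreter; the post does not confirm where the loss actually happens. The class name QuestionMarkDemo, the helper hexDump and the sample text are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode the text with the given charset and render each byte as 0xNN.
    static String hexDump(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input: three Japanese characters.
        String japanese = "\u65E5\u672C\u8A9E";

        // ISO-8859-1 cannot represent these characters, so getBytes
        // substitutes the replacement byte '?' (0x3F) for each one.
        System.out.println(hexDump(japanese, StandardCharsets.ISO_8859_1));
        // prints: 0x3F 0x3F 0x3F

        // UTF-8 can represent them, using three bytes per character.
        System.out.println(hexDump(japanese, StandardCharsets.UTF_8));
        // prints: 0xE6 0x97 0xA5 0xE6 0x9C 0xAC 0xE8 0xAA 0x9E
    }
}
```

If the hex dump seen inside Python matches the first line, the characters were most likely already replaced before the Python code ran.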

In short, I need to get the Japanese characters to print in Python when passed in from Java via PythonInterpreter.


Thanks!

Code:

Output:

15
???
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1050
    

Warning - this response presents a frig that should not be used unless absolutely necessary, and I hate it. Those of you of a sickly or nervous disposition, please stop reading now.

The problem seems to be that the exec() method passes the content of the Japanese string to the Python interpreter as bytes created using one of the single-byte character encodings, but then uses those bytes as if they were UTF-8 bytes. The frig is illustrated by

What this does is get the bytes of the string using UTF-8 and then treat them as the bytes of a string encoded as ISO-8859-1! From experience I know that ISO-8859-1 maps all 256 byte values to and from characters without loss.
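A minimal sketch of the conversion described above, using only the standard library. The class name Utf8AsLatin1 and the helpers smuggle and recover are invented for this illustration, and the Jython call is deliberately left out, so this only shows the string transformation and that it loses nothing. StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class Utf8AsLatin1 {
    // Take the UTF-8 bytes of the text and reinterpret each byte as one
    // ISO-8859-1 character, so every char in the result is <= 0xFF.
    static String smuggle(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the trick: turn the chars back into bytes and decode as UTF-8.
    static String recover(String smuggled) {
        byte[] raw = smuggled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample input: three Japanese characters.
        String japanese = "\u65E5\u672C\u8A9E";

        String smuggled = smuggle(japanese);
        // Three characters became nine, one per UTF-8 byte.
        System.out.println(smuggled.length());                  // prints: 9
        // The round trip restores the original text exactly.
        System.out.println(recover(smuggled).equals(japanese)); // prints: true
    }
}
```

The string returned by smuggle is what would be handed to the interpreter in place of the original. Inside Python it is then a byte string holding UTF-8, which would presumably still need decoding before being used as text.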

I don't know enough about the PythonInterpreter class, but on the surface it seems flawed when it comes to character encoding. There has to be a better way of dealing with this.
 
 