XML -> SAX -> MYSQL conversion losing character encoding...

Ezra Simon
Greenhorn

Joined: Sep 18, 2004
Posts: 9
Hi,

I have a big UTF-8 XML file that contains English, French, German, and Japanese text:

<?xml version="1.0" encoding="UTF-8"?>

I parse it with a SAX parser in the standard way:

SAXParser parser = new SAXParser();
parser.parse(xmlFile);

and it gets inserted into a MySQL database via a prepared statement:

pstmt = con.prepareStatement("INSERT INTO...

At some point the Japanese text loses its encoding and ends up in the database as a bunch of question marks ("???"). Strangely, though, the accented French and German characters are fine.
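
One possible explanation for this symptom, not confirmed anywhere in this thread: somewhere along the way the text is being encoded with a single-byte charset such as ISO-8859-1. That charset can represent French and German accented letters but not Japanese, and Java substitutes "?" for every character it cannot encode. The sketch below only demonstrates that behaviour; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkSketch {

    /** Encodes text to bytes with the given charset, then decodes it back. */
    static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) replaces unmappable characters with the
        // charset's replacement byte, which is '?' for ISO-8859-1.
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String european = "caf\u00e9 M\u00fcnchen";   // accented Latin letters
        String japanese = "\u65e5\u672c\u8a9e";       // three kanji

        // Latin-1 keeps the accented letters but cannot hold the kanji.
        System.out.println(roundTrip(european, StandardCharsets.ISO_8859_1).equals(european));
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent both, so nothing is lost.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese));
    }
}
```

If the "???" shows up at a given stage, the charset used at that stage (the HTML response writer, the JDBC connection, the table column) is worth checking first.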

I am pretty sure it loses the encoding between XML and Java (not between Java and MySQL), because when I try printing to an HTML page before going to the DB, the same problem occurs.

Any ideas? Do I maybe need to explicitly set the encoding of the InputSource?
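
For reference, a minimal sketch of what setting the encoding on the InputSource could look like. It assumes the standard JAXP SAXParserFactory rather than the directly constructed SAXParser shown earlier, and the class name, method name, and handler are hypothetical. When the parser is given a byte stream and the file already declares encoding="UTF-8", the explicit setEncoding call is normally redundant, but it rules the parser out as the source of the loss.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Utf8SaxSketch {

    /** Parses UTF-8 encoded XML bytes and returns all character data. */
    static String parseText(byte[] xmlBytes) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser has already decoded bytes into Java chars here.
                text.append(ch, start, length);
            }
        };
        try {
            // Hand the parser raw bytes and state the encoding explicitly.
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            source.setEncoding("UTF-8");
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String expected = "caf\u00e9 \u65e5\u672c\u8a9e";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><t>" + expected + "</t>";
        String parsed = parseText(xml.getBytes(StandardCharsets.UTF_8));
        // If this prints true, the characters survived parsing intact.
        System.out.println(parsed.equals(expected));
    }
}
```

Comparing the parsed string against a known value, as main does, is a more reliable check than printing it, because the console or HTML page may itself use an encoding that cannot display Japanese.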

Thanks for any help,

E.
[ January 23, 2005: Message edited by: Ezra Simon ]
Ezra Simon
Greenhorn

Joined: Sep 18, 2004
Posts: 9
Actually, after some further testing, this seems to be a MySQL problem, not an XML one. I will post the specifics, but if anyone has any info, it would be helpful.

Thanks.
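
The specifics were never posted in this thread, so the following is only a guess at the usual MySQL-side fix: asking MySQL Connector/J to talk UTF-8 through the useUnicode and characterEncoding connection properties. The host, database, user, and class names below are placeholders, and the target table's columns would also need a UTF-8 character set for the Japanese text to be stored intact.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class MysqlUtf8UrlSketch {

    /** Builds a Connector/J URL that requests UTF-8 on the connection. */
    static String utf8Url(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    /** Opens a connection with the UTF-8 URL; needs a reachable MySQL server. */
    static Connection open(String host, String database, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(utf8Url(host, database), user, password);
    }

    public static void main(String[] args) {
        // "localhost" and "testdb" are placeholder values, not from the thread.
        System.out.println(utf8Url("localhost", "testdb"));
    }
}
```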
 