This is (hopefully) a really simple question, but there is so much information on this topic (a lot of it seemingly suspect) that it's hard to separate the good from the bad.
In a nutshell, I need to read text encoded in ISO-8859-1 and save it in a database as UTF-8.
Specifically, I have an XML file that begins with:
<?xml version="1.0" encoding="ISO-8859-1" ?>
I am parsing it like so:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse("test.xml");
I am writing to a MySQL database, which I am opening like so:
conn = DriverManager.getConnection(
    "jdbc:mysql://" + host + "/" + db
    + "?useUnicode=yes&characterEncoding=UTF-8"
    + "&user=" + user + "&password=" + pass);
which should take care of the database end of things (I think).
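For concreteness, here is a minimal sketch of how the database end might look with a PreparedStatement, so that the driver handles the string encoding instead of hand-built SQL. The class name Utf8Insert, the helper names, and the table and column names (people, name) are assumptions made up for illustration; only the URL parameters come from the snippet above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; none of these names come from the original code.
class Utf8Insert {
    // Builds the JDBC URL with the same encoding parameters as the snippet above.
    public static String buildUrl(String host, String dbName) {
        return "jdbc:mysql://" + host + "/" + dbName
                + "?useUnicode=yes&characterEncoding=UTF-8";
    }

    // Opens a connection, passing the credentials separately from the URL.
    public static Connection open(String host, String dbName,
                                  String user, String pass) throws SQLException {
        return DriverManager.getConnection(buildUrl(host, dbName), user, pass);
    }

    // Inserts one value; "people" and "name" are assumed table/column names.
    public static void insertName(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO people (name) VALUES (?)")) {
            ps.setString(1, name); // a plain Java String, no manual re-encoding
            ps.executeUpdate();
        }
    }
}
```

This still assumes the table or column itself uses a UTF-8 character set on the MySQL side, which the connection string alone does not guarantee.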
What happens in the middle is what concerns me -- how do I convert what I am reading from the ISO-8859-1-encoded XML into strings that can be correctly inserted into my tables?
From what I understand, such a conversion should be possible and perhaps simple -- what I'm looking for is a good idiomatic way of getting the job done.
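To make the question concrete, here is a small self-contained check of what happens in the middle. It rests on my understanding that the parser decodes the bytes according to the encoding named in the XML declaration and hands back ordinary Java Strings, in which case no explicit conversion step would be needed. The class name Latin1XmlCheck, the method rootText, and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical test class; the names are not from the original code.
class Latin1XmlCheck {
    // Parses raw XML bytes and returns the text content of the root element.
    public static String rootText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>"
                   + "<name>caf\u00e9</name>";
        // Encode the sample as ISO-8859-1, so the e-acute is the single byte 0xE9.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);

        String text = rootText(latin1);
        // The parsed String holds the character U+00E9, not the raw byte.
        System.out.println(text.equals("caf\u00e9"));
        // The same String encodes to 5 bytes in UTF-8 (the e-acute takes two).
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

If this is right, the strings coming out of the DOM can be passed to the database layer as they are, and the UTF-8 encoding only happens when the driver sends them over the connection.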
Thanks in advance for any advice!!