Character encoding question

 
Greenhorn
Posts: 10
This is (hopefully) a really simple question, but there is so much information on this topic, a lot of it seemingly suspect, that it's hard to separate the good from the bad.

In a nutshell, I need to read text encoded in ISO-8859-1 and save it in a database as UTF-8.

Specifically, I have an XML file that begins with:

<?xml version="1.0" encoding="ISO-8859-1" ?>

I am parsing it like so:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse ( "test.xml" );

I am writing to a MySQL database, which I am opening like so:

conn = DriverManager.getConnection (
    "jdbc:" + "mysql://" + host + "/" + db
    + "?useUnicode=yes&characterEncoding=UTF-8"
    + "&user=" + user + "&password=" + pass );

which should take care of the database end of things (I think).

What happens in the middle is what concerns me -- how do I convert what I am reading from the ISO-8859-1 encoded XML into strings that can be correctly inserted into my tables?

From what I understand, such a conversion should be possible and perhaps simple -- what I'm looking for is a good idiomatic way of getting the job done.

Thanks in advance for any advice!!
 
Marshal
Posts: 28193
"Stephen Dedalus", please check your private messages regarding an important administrative matter.

Thank you.
 
Paul Clapham
Marshal
Posts: 28193
That particular question is simple: It's the responsibility of the XML parser to convert from bytes to chars, using the encoding specified in the XML document, if it can.

And since you have just passed the name of the file to the parser, it can read the document, discover the encoding, and then continue to read the file using that encoding. However, you could have interfered with the process by passing, for example, a FileReader to the parser. If that FileReader happened to use the wrong encoding, then problems might arise.
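
To make that difference concrete, here is a rough sketch (not anything from your project). The class name, method names and the tiny sample document are all invented for the illustration, and it uses in-memory bytes instead of test.xml so it can run on its own. The first method hands the parser raw bytes, so the parser applies the declared ISO-8859-1 encoding itself. The second decodes the same bytes with the wrong charset before the parser ever sees them, which is roughly what a Reader with a mismatched encoding would do.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
class ParserEncodingDemo {

    // Sample document (an assumption for this sketch). The accented letter
    // is U+00E9, which ISO-8859-1 stores as the single byte 0xE9.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?><name>Caf\u00e9</name>";

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // The parser receives bytes and decodes them using the encoding
    // named in the XML declaration.
    static String parseFromBytes() throws Exception {
        byte[] raw = XML.getBytes(StandardCharsets.ISO_8859_1);
        Document doc = newBuilder().parse(new ByteArrayInputStream(raw));
        return doc.getDocumentElement().getTextContent();
    }

    // The bytes are decoded as UTF-8 by us (wrongly) before parsing.
    // The parser only sees characters, so it cannot repair the damage.
    static String parseFromWronglyDecodedChars() throws Exception {
        byte[] raw = XML.getBytes(StandardCharsets.ISO_8859_1);
        String misdecoded = new String(raw, StandardCharsets.UTF_8);
        Document doc = newBuilder().parse(new InputSource(new StringReader(misdecoded)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes to parser:   " + parseFromBytes());
        System.out.println("wrong Reader path: " + parseFromWronglyDecodedChars());
    }
}
```

Once the parser has done its job, what you get back from the DOM are ordinary Java Strings, which carry no trace of the file's original encoding.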

In general, apart from the XML context, you use an InputStreamReader to convert an InputStream from bytes to chars, and you provide that InputStreamReader with the desired encoding. There are commonly-used ways to avoid that decision and to just use the system's default encoding, such as the FileReader I mentioned above. That isn't always a good thing, particularly with XML documents whose encoding doesn't match that default.
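
A minimal sketch of that general pattern follows, assuming nothing about your actual files: the class and method names are made up, and a byte array stands in for a real FileInputStream. The point is only that the charset is stated explicitly when the bytes become chars, and stated again if you ever turn the chars back into bytes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class ExplicitCharsetDemo {

    // Reads the first line of the stream, decoding bytes with the charset
    // the caller names instead of the platform default.
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "Caf" followed by byte 0xE9, which is e-acute in ISO-8859-1.
        byte[] latin1 = { 0x43, 0x61, 0x66, (byte) 0xE9 };
        String text = readFirstLine(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);

        // Encoding the same String as UTF-8 yields a different byte sequence.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars, " + utf8.length + " UTF-8 bytes");
    }
}
```

With a real file you would wrap a FileInputStream the same way.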

You should really read Oracle's I/O tutorial, particularly the introductory sections about byte streams and character streams.
 
William Alfred
Greenhorn
Posts: 10
Perfect -- that's what I was hoping. As a matter of fact, yes, I'm just passing the name of the file to the parser. Since the encoding is explicitly specified in the XML declaration (and since it's a well-known one), it looks as if I don't need to do anything.

And thanks for the links -- they are quite helpful.

Cheers!
 