multilingual JSP/Servlets

 
Ranch Hand
Posts: 76
Hi,
I'm new here and not sure if anyone has asked this question before. I'm developing a multilingual site (English and Japanese) and I'm thinking of storing all text and templates in a MySQL database for easy modification. So the question is: is there anything I should consider/know/read before I proceed? I've read that MySQL supports different character sets and so does Java. Are there any hidden issues I could stumble on?
Thanks,
D.
 
Ranch Hand
Posts: 1258
My impression is that MySQL has really lagged on the whole i18n thing in the last few years. They've gotten a bit better, but still... I'm not impressed.
I've thought about the same issue, and one possible solution is to extract (or normalize) the displayable strings in your database into separate per-language tables.
So, if in one language you have:
Table PRODUCTS
with columns PRODUCT_ID, NAME, DESCRIPTION, PRICE
You could extract the displayable things so that you have ...
Table PRODUCTS
with columns PRODUCT_ID, PRICE
Table PRODUCTS_EN
with columns PRODUCT_ID, NAME, DESCRIPTION
Table PRODUCTS_JP
with columns PRODUCT_ID, NAME, DESCRIPTION
ad nauseam. (Each language table needs PRODUCT_ID so its rows can be joined back to PRODUCTS.)
That's just my impression though. I believe Microsoft's Commerce Server internationalizes its databases in a similar way, but I'm not quite sure.
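A minimal sketch of how a lookup against that layout might be built, assuming each language table carries a PRODUCT_ID column to join on. The class name, the language whitelist, and the method name are hypothetical; only the table and column names come from the layout above.

```java
import java.util.Locale;
import java.util.Set;

public class LocalizedProductQuery {
    // Hypothetical whitelist: one entry per PRODUCTS_xx table that exists.
    private static final Set<String> LANGUAGES = Set.of("EN", "JP");

    // Builds the SELECT for one language. The table suffix cannot be a bind
    // parameter, so it is checked against the whitelist before concatenation.
    static String forLanguage(String lang) {
        String suffix = lang.toUpperCase(Locale.ROOT);
        if (!LANGUAGES.contains(suffix)) {
            throw new IllegalArgumentException("Unsupported language: " + lang);
        }
        return "SELECT p.PRODUCT_ID, p.PRICE, t.NAME, t.DESCRIPTION "
             + "FROM PRODUCTS p JOIN PRODUCTS_" + suffix + " t "
             + "ON t.PRODUCT_ID = p.PRODUCT_ID "
             + "WHERE p.PRODUCT_ID = ?";
    }

    public static void main(String[] args) {
        System.out.println(forLanguage("jp"));
    }
}
```

The product id stays a `?` placeholder so it can be bound through a PreparedStatement; only the table suffix is spliced into the SQL text.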
ALSO, MySQL's collation (sorting) routines will most likely still freak out if you try to do comparisons in WHERE clauses on internationalized strings. In some cases (you'll have to check their website to be sure), I don't think Unicode is fully supported. So you may just have to store your strings as a byte array or BLOB or whatever, which means your selects will be rather rudimentary if you need WHERE clauses on Japanese strings and so forth.
I'm rambling -- I hope this gets you a bit started. I'm afraid with issues like this, pleas for guidance and help will mostly fall on deaf ears here.
 
Daniil Sosonkin
Ranch Hand
Posts: 76
Thanks, I got it figured out (I think), although some Japanese characters still don't show up after storing in MySQL. Here is what I do:
1) Set content-type in JSP to desired charset (UTF-8, ISO-8859-1, etc...)
2) Before stuffing a value into database I convert it into ISO:
String value = new String(submitted.getBytes("ISO-8859-1"));
3) Store...
4) Retrieve and convert back into desired charset:
String value = new String(rs.getString("value").getBytes("ISO-8859-1"), charset);
5) display in JSP
This assumes you know the desired charset, and I do.
Do you know if Java fully supports UTF-8, or do I need some special extended library?
Thanks
 
Wanderer
Posts: 18671
although some Japanese characters still don't show up after storing in MySQL.
Is that to say that many/most Japanese characters do show up correctly? Are you sure they're the right characters? Have you shown the result to someone who understands Japanese (assuming you don't)?
Your code looks rather strange to me, so I'm initially rather skeptical.
1) Set content-type in JSP to desired charset (UTF-8, ISO-8859-1, etc...)
Sounds good.
2) Before stuffing a value into database I convert it into ISO:
String value = new String(submitted.getBytes("ISO-8859-1"));

That sounds, at best, unnecessary, and more likely, seriously wrong. Well, at best it might be a necessary hack to compensate for something seriously wrong elsewhere, but I would really hope that it would be possible to fix the problem elsewhere, not here.
Java Strings are sequences of Unicode characters, period. You can't change the encoding of a String. You can change the encoding used when you read the String from some binary source, or when you write it somewhere else. If you find it necessary to use getBytes() and create a new String using a different encoding, that's a strong indication the original String was not read correctly in the first place. I'd say for simplicity you should configure the JSP to UTF-8, period. ISO-8859-1 cannot and will not transfer Japanese characters correctly.
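To make the point about ISO-8859-1 concrete, here is a small sketch (class and method names are just for illustration) that encodes a Japanese string to bytes and decodes it again with the same charset. With UTF-8 the text survives; with ISO-8859-1 every character is unmappable and is replaced by '?' during encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encodes the text to bytes and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // the three characters of "nihongo"

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // true

        // ISO-8859-1 has no Japanese characters; getBytes substitutes '?'.
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1)); // ???
    }
}
```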
I suspect that the real problem is with the database though. MySQL uses ISO-8859-1 (which it calls latin1) by default, which again is not what you want if you wish to store Japanese characters. (Unless you're putting them in BLOBs or something; then you can do whatever you want, but you lose the ability to do things like search for text.) You should be able to tell MySQL to use a different encoding. If the DB is set up correctly, and if your data was read from the JSP correctly, then you shouldn't need to think about the DB's character encoding at all in your Java code. You just use PreparedStatement's setString() method to load the data into your insert or update statement, and execute it. The driver and DB will translate that Unicode String into whatever encoding the DB is configured to use.
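A rough sketch of what that looks like in JDBC, assuming the connection and the table are already configured for an encoding that can hold Japanese. The table SITE_TEXT and its columns are hypothetical names used only for illustration, and obtaining the Connection (driver, URL, credentials) is left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TextStore {
    // Hypothetical table and column names, not from the original posts.
    static final String INSERT_SQL =
        "INSERT INTO SITE_TEXT (text_key, value) VALUES (?, ?)";
    static final String SELECT_SQL =
        "SELECT value FROM SITE_TEXT WHERE text_key = ?";

    // Stores the String as-is; no getBytes()/new String() conversion.
    static void save(Connection con, String key, String value) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setString(1, key);
            ps.setString(2, value);
            ps.executeUpdate();
        }
    }

    // Reads the String back; returns null when the key is not present.
    static String load(Connection con, String key) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SELECT_SQL)) {
            ps.setString(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("value") : null;
            }
        }
    }
}
```

Neither method mentions a charset: the translation between Java's Unicode Strings and the database encoding is left to the driver and the server.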
One other thing - in the unlikely event that you really do need the new String(submitted.getBytes("ISO-8859-1")) conversion, you should probably look into explicitly specifying the character set to be used by the String constructor. Right now you're relying on the platform default encoding, which means if you move to a different machine you might get a very different result.
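As an illustration of the "necessary hack" case, the sketch below assumes the text arrived as UTF-8 bytes but was wrongly decoded as ISO-8859-1 somewhere upstream. Both charsets are named explicitly, so the result does not depend on the platform default. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeRepair {
    // Simulates the upstream bug: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes it: recover the raw bytes, then decode them as UTF-8.
    // Lossless because ISO-8859-1 maps each byte value to exactly one character.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E";
        String garbled = misdecode(japanese);
        System.out.println(garbled.equals(japanese));         // false
        System.out.println(repair(garbled).equals(japanese)); // true
    }
}
```

This only repairs text that was misread in exactly this way; fixing the place where the wrong decoding happens is still the better option.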
3) Store...
4) Retrieve and convert back into desired charset:
String value = new String(rs.getString("value").getBytes("ISO-8859-1"), charset);

Same objections as before. If the DB is configured correctly, and the data was inserted correctly, then String value = rs.getString("value") is all you should need here.
5) display in JSP
Which should work fine if the JSP is still using an encoding that can handle Japanese.
 
Daniil Sosonkin
Ranch Hand
Posts: 76
The requirement is that I need to store text in more than one charset in a single table. The languages would include English, Russian, Hebrew, Japanese, etc. Any suggestions on how to do that?
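One way to check whether a single encoding could cover all of those languages is to ask Java's charset encoders directly. This is only a sketch with made-up sample strings (greetings in English, Russian, Hebrew, and Japanese); it says nothing about how a particular MySQL version handles the data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CharsetCoverage {
    // Sample text in English, Russian, Hebrew, and Japanese.
    static final List<String> SAMPLES = List.of(
        "Hello",
        "\u041F\u0440\u0438\u0432\u0435\u0442",
        "\u05E9\u05DC\u05D5\u05DD",
        "\u3053\u3093\u306B\u3061\u306F");

    // True if the charset can encode every sample without replacement.
    static boolean coversAll(Charset charset) {
        for (String sample : SAMPLES) {
            // A fresh encoder per check avoids any leftover encoder state.
            if (!charset.newEncoder().canEncode(sample)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(coversAll(StandardCharsets.UTF_8));      // true
        System.out.println(coversAll(StandardCharsets.ISO_8859_1)); // false
    }
}
```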