what should the default charset be?

Mike Curwen
Ranch Hand

Joined: Feb 20, 2001
Posts: 3695

I'm having a bit of trouble with charsets and encodings. My problem is specifically related to JavaMail and webapps, but I'm posting in the general forum because I think my difficulty comes from a general misunderstanding of charsets and encodings.

I've got a website that I am in the process of i18n-enabling.

The two languages are French and English. So far, I've had no real trouble with the French accented characters. Everything just appears to work the way I'd expect.

In TextPad, I can see my é and è (and any other accents) fine. I view the file info and it tells me my document "code set" is ANSI. Not sure what 'code set' is, perhaps they mean char set?

Anyways.. I upload the i18n properties file containing French words (and thus, special characters) to my web server. I then use the java.util.Locale to retrieve the localized text and it all works. The web pages have the é, etc, etc.
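A possible reason the properties file "just works" here: `java.util.Properties.load(InputStream)` always decodes the stream as ISO-8859-1, and the accented letters in an "ANSI" (Windows-1252) file have the same byte values in ISO-8859-1. The following is only a minimal sketch of that behaviour; the class name, method names, and the `error.sorry` key are made up for illustration and are not part of the site's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; names are illustrative only.
final class PropertiesCharsetDemo {

    /** Loads properties from a byte stream; Properties always decodes it as ISO-8859-1. */
    static Properties loadLatin1(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Loads properties through a Reader, so the caller decides how the bytes are decoded. */
    static Properties loadWithCharset(byte[] fileBytes, Charset charset) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(fileBytes), charset)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute; escapes keep this source file independent of its own encoding.
        String line = "error.sorry=D\u00e9sol\u00e9s\n";
        byte[] latin1 = line.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = line.getBytes(StandardCharsets.UTF_8);

        // Latin-1 bytes read the classic way: the accents survive.
        System.out.println(loadLatin1(latin1).getProperty("error.sorry"));
        // UTF-8 bytes read the classic way: each accent turns into two wrong characters.
        System.out.println(loadLatin1(utf8).getProperty("error.sorry"));
        // UTF-8 bytes read through a Reader with an explicit charset: the accents survive.
        System.out.println(loadWithCharset(utf8, StandardCharsets.UTF_8).getProperty("error.sorry"));
    }
}
```

If that is what is happening, the strings are already correct inside the JVM once they come out of the properties file, and any damage happens later, when they are turned back into bytes.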

Another part of the site I'm i18n'ing is the generated feedback emails. The bodies of these emails contain static text as well as dynamic text. The static text is pulled out of the same properties file. When I pull these strings out of the file and send them through JavaMail, I get message bodies that look like:

"D?sol?s. Le syst?me d?extraction des mots de passe est pr?sentement hors d?usage."

When it should read:
"Désolés. Le système d'extraction des mots de passe est présentement hors d'usage."

The special characters are not being properly decoded? It's using the wrong charset?
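The `?` pattern is what Java produces when a string is encoded with a charset that cannot represent some of its characters: `String.getBytes(Charset)` substitutes the charset's replacement byte, which for US-ASCII is `?`. The sketch below reproduces the symptom with the JDK alone; it does not use JavaMail, and the class and method names are hypothetical. The `d?extraction` in the broken output suggests (this is a guess) that the apostrophe in the properties file is a typographic one, so the sample uses `\u2019`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class AsciiRoundTripDemo {

    /** Encodes text with the given charset, then decodes the bytes with the same charset. */
    static String roundTrip(String text, Charset charset) {
        // Characters the charset cannot represent are replaced by its replacement byte.
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // \u00e9 = e-acute, \u2019 = right single quotation mark (assumed, see lead-in).
        String french = "D\u00e9sol\u00e9s, d\u2019extraction";

        // US-ASCII has no accented letters or curly quotes: prints "D?sol?s, d?extraction".
        System.out.println(roundTrip(french, StandardCharsets.US_ASCII));

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(french, StandardCharsets.UTF_8).equals(french));
    }
}
```

So the characters are not being decoded wrongly on the receiving end; they are most likely already lost when the message body is encoded on the server.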

I view the message headers, and observe:
Content-Type: text/plain; charset=ANSI_X3.4-1968

I was under the impression that UTF-8 was Java's 'default'?

Investigating my System properties programmatically, I discover:

file.encoding = ANSI_X3.4-1968

Hmm.. the same as my email.
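For reference, `ANSI_X3.4-1968` is one of the JDK's alias names for plain US-ASCII. On older JDKs the default charset is taken from the operating system locale rather than being fixed to UTF-8, so a server running under a bare `C`/`POSIX` locale would plausibly report this value. A small check along these lines shows what a given JVM is using; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; names are illustrative only.
final class DefaultCharsetCheck {

    /** Resolves a charset name or alias to the JDK's canonical name for it. */
    static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    public static void main(String[] args) {
        // The raw system property, as observed in the post.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset the JVM uses when no charset is passed explicitly.
        System.out.println("default charset = " + Charset.defaultCharset().name());
        // The alias from the mail header resolves to US-ASCII.
        System.out.println("ANSI_X3.4-1968  = " + canonicalName("ANSI_X3.4-1968"));
    }
}
```

If the default really is US-ASCII, any API that falls back to the platform default when building bytes will drop the accents, which would match the `charset=ANSI_X3.4-1968` header on the broken emails.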

To make matters worse, there are other emails the system generates that have a different header (just text/plain, with no charset specified), and *these* emails manage to output the correct special characters.

Where might my encodings/charsets be off?