what should the default charset be?

Mike Curwen
Ranch Hand

Joined: Feb 20, 2001
Posts: 3695

I'm having a bit of trouble with charsets and encodings. My problem is specifically related to JavaMail and webapps, but I'm posting in the general forum because I think my difficulty is a general misunderstanding of charsets and encodings.

I've got a website that I am in the process of i18n-enabling.

The two languages are French and English. So far, I've had no real trouble with the French accented characters. Everything just appears to work the way I'd expect.

In TextPad, I can see my é and è (and any other accents) fine. I view the file info and it tells me my document "code set" is ANSI. Not sure what 'code set' is; perhaps they mean charset?

Anyways.. I upload the i18n properties file containing French words (and thus, special characters) to my web server. I then use java.util.Locale to retrieve the localized text and it all works. The web pages have the é, etc.
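For reference, a minimal sketch of why an "ANSI" file can work here, assuming the properties file ends up being read through java.util.Properties.load(InputStream). That method decodes the stream as ISO-8859-1, and on a Western Windows setup an "ANSI" file stores é as the same single byte (0xE9). The class name, method name, and the key error.sorry are made up for illustration, and the accented letters are written as Unicode escapes so the example does not depend on how the source file itself is saved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEncodingDemo {
    // Load properties from raw file bytes. Properties.load(InputStream)
    // always decodes the stream as ISO-8859-1, whatever the platform default is.
    static Properties loadFromBytes(byte[] fileBytes) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(fileBytes));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical entry; the value is "Desoles" with e-acute in both places.
        String line = "error.sorry=D\u00e9sol\u00e9s\n";
        String expected = "D\u00e9sol\u00e9s";

        // File saved with one byte per accented letter (Latin-1 style): round-trips.
        Properties latin1 = loadFromBytes(line.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(expected.equals(latin1.getProperty("error.sorry"))); // true

        // Same text saved as UTF-8: each e-acute becomes two bytes and is misread.
        Properties utf8 = loadFromBytes(line.getBytes(StandardCharsets.UTF_8));
        System.out.println(expected.equals(utf8.getProperty("error.sorry"))); // false
    }
}
```

If that assumption holds, the text is already correct once it is a Java String, so any damage would have to happen later, when the String is encoded for output.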

Another part of the site I'm i18n'ing is generated/feedback emails. The bodies of the emails contain static text as well as dynamic text. The static text is being pulled out of the properties file as well. When I pull these strings out of the file and send them through JavaMail, I get message bodies that look like:

"D?sol?s. Le syst?me d?extraction des mots de passe est pr?sentement hors d?usage."

When it should read:
"Désolés. Le système d'extraction des mots de passe est présentement hors d'usage."

The special characters are not being properly decoded? It's using the wrong charset?
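One way output like this can arise, shown as a minimal sketch rather than a diagnosis: when a String is encoded with a charset that cannot represent a character, String.getBytes(Charset) substitutes the charset's replacement byte, which for US-ASCII is '?'. The class and method names below are hypothetical, and the accented letters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AsciiReplacementDemo {
    // Encode the text with the given charset, then decode the bytes with the
    // same charset. Characters the charset cannot represent are lost on encode.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "Desoles. Le systeme" with the French accents as Unicode escapes.
        String original = "D\u00e9sol\u00e9s. Le syst\u00e8me";

        // US-ASCII has no accented letters, so each one is replaced by '?'.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII)); // D?sol?s. Le syst?me

        // ISO-8859-1 and UTF-8 can both represent these characters.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.ISO_8859_1))); // true
        System.out.println(original.equals(roundTrip(original, StandardCharsets.UTF_8)));      // true
    }
}
```

The '?' characters here come from the encoding step, not from decoding, which matches the shape of the broken email body above.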

I view the message headers, and observe:
Content-Type: text/plain; charset=ANSI_X3.4-1968

I was under the impression that UTF-8 was Java's 'default'?

Investigating my System properties programmatically, I discover:

file.encoding = ANSI_X3.4-1968

Hmm.. the same as my email.
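A small sketch of the kind of check involved, with hypothetical class and method names. It prints the file.encoding property and the JVM's default charset, and resolves the name ANSI_X3.4-1968 to its canonical form. That name is a registered alias of US-ASCII, so a JVM reporting it is running with a 7-bit default, which on older JDKs is taken from the platform locale rather than fixed at UTF-8.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Resolve a charset name or alias to the canonical name Java uses for it.
    static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    public static void main(String[] args) {
        // Both values depend on the JVM version and the environment it was started in.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset().name());

        // ANSI_X3.4-1968 is an alias, not a separate charset.
        System.out.println(canonicalName("ANSI_X3.4-1968")); // US-ASCII
    }
}
```

Running this in the same environment as the webapp (same user, same startup script) matters, since the default can differ between an interactive shell and a server process.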

To make matters worse, there are other emails the system generates that have a different header (just text/plain, with no charset specified), and *these* emails manage to output the correct special characters.

Where might my encodings/charsets be off?