
Urgent - Need an API to identify the character encoding

Anu Manian
Greenhorn

Joined: May 31, 2007
Posts: 3
Hi,

I am wondering whether there are any Java APIs that can identify the character encoding of content when the Content-Type header has no charset parameter. I need help urgently. So far I have tried NGramJ and the following two:

http://www.i18nfaq.com/chardet.html

http://www.jetbrains.com/idea/openapi/5.0/com/intellij/openapi/vfs/CharsetToolkit.html

I used CharsetToolkit, but it only distinguishes among UTF-8, UTF-16LE and UTF-16; it does not identify other encodings such as TIS-620. I am new to this as well, so I am not sure whether I am using it correctly. Please advise.
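To make the limitation concrete, here is a minimal sketch of the "does it decode cleanly?" style of check, using only the standard java.nio.charset classes. This is not how CharsetToolkit or chardet work internally; the class name CharsetGuesser and both method names are made up for illustration. It tries each candidate charset with a strict decoder and returns the first one that reports no errors.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only.
class CharsetGuesser {

    // True if every byte sequence in data is valid in the given charset.
    public static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns the first candidate (in list order) that decodes without errors.
    // Candidates this JVM does not support are skipped.
    public static Optional<Charset> firstMatch(byte[] data, List<String> candidates) {
        for (String name : candidates) {
            if (Charset.isSupported(name)) {
                Charset cs = Charset.forName(name);
                if (decodesCleanly(data, cs)) {
                    return Optional.of(cs);
                }
            }
        }
        return Optional.empty();
    }
}
```

A check like this is reliable for ruling UTF-8 in or out, because most byte sequences from other encodings are not valid UTF-8. It cannot tell one single-byte encoding from another, though: a charset such as ISO-8859-1 accepts any byte sequence, so Thai bytes would "decode cleanly" under it as well. Telling TIS-620 apart from other single-byte encodings needs a statistical detector that looks at character frequencies.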

Also, any samples showing how to use chardet would be appreciated.

One thing I am not sure about: when I send a message containing Thai characters from Hotmail to one of my Exchange accounts, with my browser encoding set to Thai (TIS-620) but my Hotmail account language set to English, the message looks like gibberish in Outlook.

So I need a charset detector that tells me which encoding was used for the content, so that I can decode it and re-encode it as UTF-8. (If you choose English as the language option, the Hotmail server does not put a charset parameter in the Content-Type header.)
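For the decode and re-encode step itself, once the source encoding is known, something like the following sketch should be enough. The class name Transcoder and the method toUtf8 are placeholders, and the charset name is assumed to come from whatever detector ends up being used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Transcoder {

    // Decodes raw bytes using the named source charset, then re-encodes them as UTF-8.
    // Charset.forName throws UnsupportedCharsetException if this JVM lacks the charset.
    public static byte[] toUtf8(byte[] raw, String sourceCharsetName) {
        Charset source = Charset.forName(sourceCharsetName);
        String text = new String(raw, source);          // bytes -> characters
        return text.getBytes(StandardCharsets.UTF_8);   // characters -> UTF-8 bytes
    }
}
```

A call would look like Transcoder.toUtf8(messageBytes, "TIS-620"), where messageBytes is the undecoded message body. Whether "TIS-620" is available depends on the JVM's installed charsets, so checking Charset.isSupported("TIS-620") first is probably wise.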

Any immediate response would be appreciated.
 