
Diff b/w UTF-8 and ANSI

 
Ranch Hand
Posts: 60
Hi All,

I'd like to know the difference between ANSI encoding and UTF-8 encoding. For example, if I have a file, how can I test whether it is an ANSI file or a UTF-8 file, or how do I prove that a given file is a UTF-8 file?

Also, can I determine the hex values of the bytes in a given UTF-8 file and compare them with Unicode values?

I'd like to know more about ANSI, ASCII, Unicode, UTF-8, etc. If there is a basic tutorial, please give the link.

Regards

Nikhil Bansal
 
Java Cowboy
Posts: 16084
Wikipedia: UTF-8

UTF-8 is a variable-length character encoding. ASCII characters are always 7 bits and are usually stored in 8-bit bytes. Text in UTF-8 uses between 1 and 4 bytes per character, not always 1 byte like ASCII.
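
To make the variable length visible, and to touch on the question about comparing hex values with Unicode values, here is a minimal sketch. The class name Utf8Bytes and the helper hex are made up for illustration; the sample characters are my own choice. It prints each character's Unicode code point next to the hex of its UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Returns the UTF-8 bytes of s as space-separated upper-case hex.
    // (Hypothetical helper, not part of any library.)
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Samples: Latin A, e with acute, euro sign, and an emoji outside
        // the Basic Multilingual Plane (written as a surrogate pair).
        String[] samples = { "A", "\u00E9", "\u20AC", "\uD83D\uDE00" };
        for (String s : samples) {
            int codePoint = s.codePointAt(0);
            System.out.printf("U+%04X -> %s%n", codePoint, hex(s));
        }
        // Prints:
        // U+0041 -> 41
        // U+00E9 -> C3 A9
        // U+20AC -> E2 82 AC
        // U+1F600 -> F0 9F 98 80
    }
}
```

Note that only the ASCII character has a byte value equal to its code point; the others are spread over 2, 3 and 4 bytes.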
 
Ranch Hand
Posts: 1970
The one true place to look is http://www.unicode.org/.

Text files on disk sometimes have a few bytes at the start that indicate their Unicode encoding. This is called the Byte Order Mark, or BOM. If it is there, you can identify the encoding with near certainty, and hence decode the text file correctly. Java is pretty good at that stuff: see String, Reader, Writer etc.
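
A BOM check can be sketched roughly as follows. BomSniffer, detect and startsWith are names invented for this example, and the file path is assumed to come from the command line. The byte sequences are the standard Unicode BOMs. One caveat: FF FE 00 00 could in principle also be a UTF-16LE BOM followed by a NUL character, so the sketch simply prefers the longer match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomSniffer {
    // Returns the encoding implied by a leading BOM, or null if there is none.
    static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // Test the 4-byte UTF-32 marks first: FF FE 00 00 also starts with FF FE.
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return null;
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Read at most the first 4 bytes of the file named on the command line.
        byte[] head = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
                n += r;
            }
        }
        String encoding = detect(Arrays.copyOf(head, n));
        System.out.println(encoding == null ? "no BOM found" : encoding);
    }
}
```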

If there is no BOM in a text file, there is no 100% guaranteed way to determine its encoding. Some third-party libraries (I don't know any URLs; try Google) will try to guess it for you.
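
Without a BOM, one heuristic you can do yourself is to check whether the bytes are well-formed UTF-8. This is only a sketch: Utf8Check and isValidUtf8 are names invented here, and a "true" result is evidence, not proof, because a pure ASCII file is also valid UTF-8 and valid in most "ANSI" code pages. A "false" result does show the file is not UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    // Returns true if data decodes as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory; fine for a quick check on small files.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "well-formed UTF-8" : "not UTF-8");
    }
}
```

For example, the single byte E9 (e with acute in Windows-1252) is rejected, while the two bytes C3 A9 (the same character in UTF-8) are accepted.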

Oh, and please don't use this text-messaging abbreviation garbage in your posts. You have a real keyboard, so type "difference between", not "diff b/w". Using unnecessary abbreviations annoys some readers, and also makes it harder for people who have limited English.
[ June 26, 2006: Message edited by: Peter Chase ]
 
Nikhil Bansal
Ranch Hand
Posts: 60
Thanks a lot, Peter. I found your input useful.

I have one more question: why are there so many types of encoding? I mean, why not just one encoding?

Regards

Nikhil Bansal
 
Peter Chase
Ranch Hand
Posts: 1970
Many encodings are just a result of history. Most computer applications have to work with other applications or operating systems that carry a legacy of such old encodings. Therefore, we need to continue supporting them.

If we could start from scratch, we'd probably all choose to use Unicode. I suspect that the need for a fixed-size version (like UTF-32; UTF-16 is fixed-size only for characters in the Basic Multilingual Plane) and a variable-size version (like UTF-8) would continue.

Maybe the problems will gradually subside. Unicode seems to be getting widely adopted and perhaps other encodings will gradually fade away. We can only hope!

But remember, the fact that computing is complicated and difficult is why we get to keep our jobs. If it was easy, the management and marketing could do it!
[ June 27, 2006: Message edited by: Peter Chase ]
 