
UTF-8 and database.

 
Rahul Bhattacharjee
Ranch Hand
Posts: 2308
Hi all,

I was just wondering: if the encoding is set to UTF-8 for my database and I have defined a field as varchar(4), then how many bytes does the database reserve for this field?

In UTF-8, a single character can take anywhere from 1 to 4 bytes.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
This may depend on the specific database you're using, but in my experience, it means the number of bytes, not the number of characters. Which means it can be difficult to know in advance whether a given string is too long for a field, if the characters are not ASCII chars. If necessary, you can convert a String to a byte[] (using the appropriate encoding) and measure its length, to find out if it's too big for a VARCHAR field. You should check the documentation for the specific database you're using to be sure.
[ May 29, 2007: Message edited by: Jim Yingst ]
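Jim's suggestion can be sketched in Java like this. (The helper name fitsInVarchar and the byte limit of 4 are illustrative, standing in for a VARCHAR(4) column on a database that measures VARCHAR length in bytes; check your own database's semantics.)

```java
import java.nio.charset.StandardCharsets;

public class VarcharCheck {

    // Returns true if the UTF-8 encoding of s fits within maxBytes,
    // e.g. the byte budget of a VARCHAR(4) column on a database that
    // counts bytes rather than characters.
    static boolean fitsInVarchar(String s, int maxBytes) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        return encoded.length <= maxBytes;
    }

    public static void main(String[] args) {
        // 4 ASCII characters encode to exactly 4 bytes, so this fits.
        System.out.println(fitsInVarchar("abcd", 4));
        // 'é' encodes to 2 bytes in UTF-8, so "héllo" needs 6 bytes and does not fit.
        System.out.println(fitsInVarchar("héllo", 4));
    }
}
```

The key point is that the check happens on the encoded byte[], not on String.length(), which counts UTF-16 code units rather than encoded bytes.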
 
Rahul Bhattacharjee
Ranch Hand
Posts: 2308
I also think it should work that way.

But it's very confusing when the character set of the database is something multibyte like UTF-8 and we define the column type as varchar(4), which suggests it can store 4 characters (which can be any number of bytes in UTF-8).

Perhaps databases that support UTF-8 as the encoding don't allocate a fixed number of bytes for such columns up front, and instead use a data structure that accommodates variable-length chunks of bytes.
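To see the variable byte widths concretely, here is a small sketch that prints the UTF-8 encoded size of a few sample characters (the specific characters are just illustrative picks from the 1-, 2-, 3-, and 4-byte ranges of UTF-8):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    public static void main(String[] args) {
        // ASCII letter: 1 byte in UTF-8
        System.out.println("A".getBytes(StandardCharsets.UTF_8).length);
        // Latin small letter e with acute: 2 bytes
        System.out.println("é".getBytes(StandardCharsets.UTF_8).length);
        // Hiragana letter A: 3 bytes
        System.out.println("あ".getBytes(StandardCharsets.UTF_8).length);
        // A character outside the BMP (a surrogate pair in Java): 4 bytes
        System.out.println("\uD83D\uDE00".getBytes(StandardCharsets.UTF_8).length);
    }
}
```

So four characters in a column could occupy anywhere from 4 to 16 bytes, which is exactly why byte-counting and character-counting databases behave differently for the same VARCHAR(4) declaration.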
 