UTF-8 and database.

Rahul Bhattacharjee
Ranch Hand

Joined: Nov 29, 2005
Posts: 2308
Hi all,

I was just wondering: if the encoding of my database is set to UTF-8 and I have defined a field as varchar(4), how many bytes does the database reserve for this field?

After all, a single character in UTF-8 can take anywhere from 1 to 4 bytes.

Rahul Bhattacharjee
Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
This may depend on the specific database you're using, but in my experience the declared length means the number of bytes, not the number of characters. That makes it difficult to know in advance whether a given string is too long for a field when it contains non-ASCII characters. If necessary, you can convert the String to a byte[] (using the appropriate encoding) and measure its length to find out whether it's too big for a VARCHAR field. Check the documentation for the specific database you're using to be sure.
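
To illustrate the byte[] check, here is a minimal sketch. It assumes the column limit is counted in bytes and that the database stores UTF-8; the class name Utf8Length and the helpers utf8Length and fitsInBytes are made up for this example and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded string fits in a column limited to maxBytes bytes.
    // Assumption: the database measures the VARCHAR length in bytes.
    static boolean fitsInBytes(String s, int maxBytes) {
        return utf8Length(s) <= maxBytes;
    }

    public static void main(String[] args) {
        // "abcd" is plain ASCII, "caf\u00e9" ends in e-acute,
        // "\u20ac\u20ac" is two euro signs.
        String[] samples = { "abcd", "caf\u00e9", "\u20ac\u20ac" };
        for (String s : samples) {
            System.out.println(s + ": " + s.length() + " chars, "
                    + utf8Length(s) + " bytes, fits in 4 bytes: "
                    + fitsInBytes(s, 4));
        }
    }
}
```

The e-acute encodes to 2 bytes and the euro sign to 3, so a 4-character string such as "caf\u00e9" already needs 5 bytes. Keep in mind that String.length() counts UTF-16 code units, so it is not a reliable character count for text outside the Basic Multilingual Plane either.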
[ May 29, 2007: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Rahul Bhattacharjee
Ranch Hand

Joined: Nov 29, 2005
Posts: 2308
I think so too.

But it's very confusing when the character set of the database is a multibyte one like UTF-8 and we define the column type as varchar(4), which suggests it can store 4 characters (each of which can take several bytes in UTF-8).

Perhaps databases that support UTF-8 do not allocate a fixed number of bytes for a column up front, and instead use a data structure that accommodates variable-length chunks of bytes.