
UTF-8 and database.

Rahul Bhattacharjee
Hi all,

I was wondering: if the encoding of my database is set to UTF-8 and I define a field as varchar(4), how many bytes does the database reserve for that field?

After all, a single character in UTF-8 can take anywhere from 1 to 4 bytes.


Jim Yingst
This may depend on the specific database you're using, but in my experience the length means the number of bytes, not the number of characters. That makes it difficult to know in advance whether a given string is too long for a field when its characters are not all ASCII. If necessary, you can convert a String to a byte[] (using the appropriate encoding) and measure its length to find out whether it's too big for a VARCHAR field. You should check the documentation for the specific database you're using to be sure.
[ May 29, 2007: Message edited by: Jim Yingst ]
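
As a rough illustration of the byte[] check described above, here is a minimal sketch. It assumes the column limit is counted in bytes and that the database stores text as UTF-8. The class name Utf8Length, the helper names, and the limit of 4 are hypothetical, chosen only to match the varchar(4) example. StandardCharsets requires Java 7 or later; on older versions, getBytes("UTF-8") does the same job.

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the UTF-8 encoding of s fits within maxBytes.
    // maxBytes is an assumed byte limit for the column, e.g. 4 for varchar(4)
    // on a database that counts bytes rather than characters.
    static boolean fitsInBytes(String s, int maxBytes) {
        return utf8Length(s) <= maxBytes;
    }

    public static void main(String[] args) {
        String ascii = "abcd";                        // 4 characters, 1 byte each
        String accented = "\u00e9\u00e9\u00e9\u00e9"; // 4 characters, 2 bytes each
        System.out.println(utf8Length(ascii));        // prints 4
        System.out.println(utf8Length(accented));     // prints 8
        System.out.println(fitsInBytes(accented, 4)); // prints false
    }
}
```

Both strings are four characters long, but only the ASCII one fits in four bytes. Whether your database actually counts bytes or characters is something only its documentation can confirm.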

"I'm not back." - Bill Harding, Twister
Rahul Bhattacharjee
I think so too.

But it's very confusing when the character set of the database is a multibyte one like UTF-8 and we define the column type as varchar(4), which suggests it can store 4 characters (and in UTF-8 that could be anywhere from 4 to 16 bytes).

Perhaps databases that support UTF-8 do not allocate a fixed number of bytes for a column up front, and instead use a data structure that accommodates variable-length chunks of bytes.