UTF-8 and database.

Rahul Bhattacharjee
Ranch Hand

Joined: Nov 29, 2005
Posts: 2300
Hi all,

I was wondering: if my database's encoding is set to UTF-8 and I define a field as varchar(4), how many bytes does the database reserve for that field?

After all, a single character in UTF-8 can take 1, 2, 3 or 4 bytes.


Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18670
This may depend on the specific database you're using, but in my experience the length means the number of bytes, not the number of characters. That makes it difficult to know in advance whether a given string is too long for a field if the characters are not all ASCII. If necessary, you can convert a String to a byte[] (using the appropriate encoding) and measure its length to find out whether it's too big for a VARCHAR field. Check the documentation for the specific database you're using to be sure.
[ May 29, 2007: Message edited by: Jim Yingst ]
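To make the byte[] suggestion concrete, here is a minimal sketch. It assumes the column limit is counted in bytes and that the database stores text as UTF-8; the class and method names (Utf8Length, utf8Length, fitsInBytes) are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    // Number of bytes the string occupies once encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded string fits a column limited to maxBytes bytes.
    // Assumption: the database counts the VARCHAR length in bytes.
    public static boolean fitsInBytes(String s, int maxBytes) {
        return utf8Length(s) <= maxBytes;
    }

    public static void main(String[] args) {
        String ascii = "abcd";
        String accented = "\u00e9\u00e9\u00e9\u00e9"; // four times 'e' with acute accent

        System.out.println(utf8Length(ascii));        // 4
        System.out.println(utf8Length(accented));     // 8
        System.out.println(fitsInBytes(accented, 4)); // false
    }
}
```

Both strings are four characters long, but only the ASCII one fits a 4-byte limit. Whether your database actually counts bytes or characters is something only its documentation can tell you.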
Rahul Bhattacharjee
Ranch Hand

Joined: Nov 29, 2005
Posts: 2300
I think so too.

But it's very confusing when the character set of the database is a multibyte one such as UTF-8 and we define the column type as varchar(4), which suggests it can store 4 characters (in UTF-8 that could be anywhere from 4 to 16 bytes).

Maybe databases that support UTF-8 as an encoding do not allocate a fixed number of bytes for a column up front, and instead use a data structure that can accommodate variable-length chunks of bytes.
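To illustrate the gap between "4 characters" and the bytes needed to store them, here is a rough sketch that counts characters as Unicode code points and compares that with the UTF-8 byte length. The class and method names are hypothetical, and it says nothing about how any particular database lays out its storage.

```java
import java.nio.charset.StandardCharsets;

public class CharsVersusBytes {
    // Characters counted as Unicode code points, not UTF-16 code units.
    public static int charCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Bytes needed to store the string as UTF-8.
    public static int byteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "abcd",                                             // 1 byte per character
            "\u20ac\u20ac\u20ac\u20ac",                         // euro sign, 3 bytes each
            "\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00"  // U+1F600, 4 bytes each
        };
        for (String s : samples) {
            System.out.println(charCount(s) + " characters, " + byteCount(s) + " bytes");
        }
    }
}
```

This prints 4 characters for every sample, with 4, 12 and 16 bytes respectively, so a column that really holds 4 characters of UTF-8 needs room for up to 16 bytes.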
 