File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes Fit or not fit (Oracle varchar2 problem) Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Fit or not fit (Oracle varchar2 problem)" Watch "Fit or not fit (Oracle varchar2 problem)" New topic
Author

Fit or not fit (Oracle varchar2 problem)

D Rog
Ranch Hand

Joined: Feb 07, 2004
Posts: 472

I met a strange problem, although a field is defined as varchar2(4000), it can fit different number of character, because Unicode, UTF-8 or whatever. I guess Oracle field size is defined in bytes, not characters. So is any way to figure out upfront if certain string will fit varchar2 field without actual executing INSERT or UPDATE? Another problem is, if I know that a string can't fit and going to truncate it, then the process seems should be iterative. So an ideal solution for me can be just call some function which with minimum spending of computer resources can tell me how to truncate a string to fit a particular varchar2 field.Or perhaps you can suggest something clever?


Retire your iPod and start with HD Android music player Kamerton | Minimal J2EE container is here | Light weight full J2EE stack | and build tool | Co-author of "Windows programming in Turbo Pascal"
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3716
    
    5

When you get over 2,000 characters, encoding becomes a big issue (been there). It might be better to declare objects that large as BLOBs (or in the case of Oracle, CLOBs)


My Blog: Down Home Country Coding with Scott Selikoff
Paul Campbell
Ranch Hand

Joined: Oct 06, 2007
Posts: 338
Originally posted by D Rog:
I met a strange problem, although a field is defined as varchar2(4000), it can fit different number of character, because Unicode, UTF-8 or whatever. I guess Oracle field size is defined in bytes, not characters. So is any way to figure out upfront if certain string will fit varchar2 field without actual executing INSERT or UPDATE? Another problem is, if I know that a string can't fit and going to truncate it, then the process seems should be iterative. So an ideal solution for me can be just call some function which with minimum spending of computer resources can tell me how to truncate a string to fit a particular varchar2 field.Or perhaps you can suggest something clever?


Actually your problem is not that strange, but it is likely the result of how UTF-8 characters are encoded. Characters beyond U+FFFF are stored as four bytes. This storage can be even more confusing if your Oracle installation is using AL32UTF8 as its Oracle character set. AL32UTF8 stores these same (greater than U+FFFF) characters as two UTF-16 surrogates encoded in UTF-8 that take up 6 bytes per surrogate character.

As previously stated, a CLOB is likely a better way for you to go.
D Rog
Ranch Hand

Joined: Feb 07, 2004
Posts: 472

Thanks for responses. Yes, CLOB is what I am currently mostly use. My concern was that CLOB can perform not quite well.

Another concern, how effectively Oracle build index against CLOB column?
[ October 10, 2008: Message edited by: D Rog ]
Paul Campbell
Ranch Hand

Joined: Oct 06, 2007
Posts: 338
Originally posted by D Rog:
Thanks for responses. Yes, CLOB is what I am currently mostly use. My concern was that CLOB can perform not quite well.

Another concern, how effectively Oracle build index against CLOB column?

[ October 10, 2008: Message edited by: D Rog ]



All LOB columns have special segments created for them by Oracle. These columns require a special type of indexing.

To define a LOB index:

http://www.dba-oracle.com/t_table_blob_lob_storage.htm

You can also create a text index if the situation warants it:

http://www.dba-oracle.com/oracle_tips_like_sql_index.htm
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3716
    
    5

You're right that CLOBs may not perform well but when you go over 2,000 characters, you don't have many options.
Avi Abrami
Ranch Hand

Joined: Oct 11, 2000
Posts: 1135

Rog,
Have you read the Oracle documentation?
VARCHAR2 columns can be defined as "bytes" (the default) or "characters".
The documentation also details the size calculations Oracle makes for storing VARCHAR2 column data.

Good Luck,
Avi.
Paul Campbell
Ranch Hand

Joined: Oct 06, 2007
Posts: 338
Originally posted by Avi Abrami:
Rog,
Have you read the Oracle documentation?
VARCHAR2 columns can be defined as "bytes" (the default) or "characters".
The documentation also details the size calculations Oracle makes for storing VARCHAR2 column data.

Good Luck,
Avi.


That is true... you can override the default length semantics defined by the NLS_LENGTH_SEMANTICS parameter which defaults to byte(i.e., if you don't specify VARCHAR2(4000) is 4000 bytes).

You can set it to VARCHAR2(CHAR 4000) and Oracle will store the 4000 characters regardless of the number of bytes used to store the 4000 characters.

Your maximum size limit for VARCHAR2 is 4000 regardless of character or bytes.
[ October 15, 2008: Message edited by: Paul Campbell ]
Valentin Jacquemin
Ranch Hand

Joined: Feb 15, 2006
Posts: 35
Paul Campbell wrote:
You can set it to VARCHAR2(CHAR 4000) and Oracle will store the 4000 characters regardless of the number of bytes used to store the 4000 characters.

Your maximum size limit for VARCHAR2 is 4000 regardless of character or bytes.


Just to precise to correct syntax is: VARCHAR2(4000 CHAR).
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Fit or not fit (Oracle varchar2 problem)