i18n

Satish Kumar Kara
Greenhorn

Joined: Feb 29, 2008
Posts: 8
I am working on i18n for my application. For that I need to load some master data into my database, and I need to write SQL scripts to do so. Please let me know the standard format for this. So far I have tried two formats.

1. I put the "\uxxxx" format in the database, but it was stored as-is, as a literal string. I use this format in property files, and the resource bundle works fine with it.

2. I put the "&#xxxx;" format in the database. It works fine in the UI. I also have to generate PDFs from this data using the iText API.

I am also storing data from the application, which is stored in the "¿¿¿¿¿ ¿¿¿ ¿¿¿¿¿" format, and it displays fine in the UI.

I am using JSP, servlets, and an Oracle 10g database.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42632
Both the Java-specific \uxxxx and the HTML-specific &#xxxx; are bad choices for storing data in the DB. Why not store the actual Unicode (as UTF-8) in the DB and convert it to something else if that becomes necessary (which it shouldn't - both web pages and PDFs can handle Unicode just fine)?
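
If values have already been stored in the "&#xxxx;" form, one way to migrate them to real Unicode is to decode them before writing them back. The following is only a minimal sketch: the class name NcrDecoder and its decode method are hypothetical (not part of any library mentioned in this thread), and it handles decimal references only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the thread): turns decimal HTML numeric
// character references such as "&#2346;" into the characters they stand for.
class NcrDecoder {
    // Matches decimal references only; hex forms like "&#x92A;" are left untouched.
    private static final Pattern NCR = Pattern.compile("&#(\\d{1,7});");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Keep the original text if the number is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = "&#2346;&#2340;&#2381;&#2352;_Gift";
        String decoded = decode(stored);
        // Print code points rather than glyphs, since many consoles cannot show Hindi.
        decoded.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

The decoded string can then be written to a VARCHAR2 column through JDBC like any other Java string.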


Satish Kumar Kara
Greenhorn

Joined: Feb 29, 2008
Posts: 8
I am able to store data as Unicode (UTF-8) from the application using JDBC, but I am not able to insert it directly from the command prompt. Can you give me a hint on how to do that?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42632
I'm not sure what you mean by "insert directly from command prompt" - are you using a command line utility to access the DB? If so, be aware that most consoles do not handle Unicode (or much of anything besides US-ASCII, or ISO-8859 at best, actually). But there are any number of GUI DB clients that can be used instead.
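
To illustrate why a non-Unicode console tends to turn such text into question marks, here is a small sketch in Java. The class name LossyConsoleDemo is made up for this example, and ISO-8859-1 merely stands in for whatever code page the console actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens when text passes through a
// charset that has no mapping for its characters.
class LossyConsoleDemo {
    // Encodes the text to bytes in the given charset and decodes it again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // ('?' for ISO-8859-1) for every character it cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The Hindi word for "Hindi", written with escapes so the source stays ASCII.
        String hindi = "\u0939\u093f\u0928\u094d\u0926\u0940_Gift";
        System.out.println(roundTrip(hindi, StandardCharsets.ISO_8859_1)); // prints ??????_Gift
        System.out.println(roundTrip(hindi, StandardCharsets.UTF_8).equals(hindi)); // prints true
    }
}
```

Once the characters have been replaced this way, the original text cannot be recovered, so the conversion has to be avoided rather than repaired afterwards.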
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

Command prompts don't support Unicode, so you can either use a client that does or manually encode it.


Satish Kumar Kara
Greenhorn

Joined: Feb 29, 2008
Posts: 8
Hi, I am using the following steps to insert into the database:

C:\> set NLS_LANG=.AL32UTF8
C:\> sqlplus username/password@database

SQL>INSERT INTO TESI_CODETABLE ( CTCODETYPE, CTCODEID, CTLANGID, CTCODENAME, CTCODEDESC,
CTRELATEDCODEID, CTCREATEDATETIME, CTOBSOLETEFLAG, CTOBSOLETEDATETIME ) VALUES (
'StampDutyType', 'MH-RG-14', 'hi_IN', 'हिन्दी में_Gift', 'NULL', 'NULL', TO_Date( '02/11/2008 04:43:57 PM', 'MM/DD/YYYY HH:MI:SS AM')
, 'N', NULL);

// The characters show up as "??? ??_Gift" in the editor.

Now when I retrieve the data using a SELECT query, it shows the same "??? ??_Gift".

My DB NLS properties:

SQL> select * from nls_database_parameters where parameter like '%SET';

PARAMETER
------------------------------
VALUE
-------------------------------------------------------------------------------
NLS_NCHAR_CHARACTERSET
AL16UTF16

NLS_CHARACTERSET
AL32UTF8

@Paul: a reference would be appreciated. (I prefer manual encoding. Please suggest some tools, so that I can put the encoded values for a particular entry directly in the INSERT statement.)
Satish Kumar Kara
Greenhorn

Joined: Feb 29, 2008
Posts: 8
I got the solution to the problem..

If the database character set is AL32UTF8 (SELECT value FROM nls_database_parameters WHERE parameter='NLS_CHARACTERSET'), then it makes no sense to use NVARCHAR2. You should store data in VARCHAR2.

You can store non-English data in two ways, if you want to use SQL*Plus scripts:

1. Create a script in Notepad, just writing the foreign characters using an appropriate keyboard layout. Store the script in UTF-8 (Save As...->Encoding->UTF-8). Use any hex editor to remove the first three bytes of the file (0xEF 0xBB 0xBF), set the NLS_LANG environment variable to .AL32UTF8, and run the script in SQL*Plus.

2. Create a script in any text editor. Instead of entering character literals directly, put them as arguments to the UNISTR function. Encode non-ASCII characters using their Unicode codes, e.g. the Hindi word "Patra"=pa+ta+virama+ra (letter) should be written as UNISTR('\092a\0924\094d\0930')
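
For option 2, the UNISTR literal does not have to be written by hand. Below is a minimal sketch of a generator in Java; the class name UnistrEncoder and its method are hypothetical. It assumes the UNISTR conventions of a backslash followed by four hex digits per UTF-16 code unit and a doubled backslash for a literal backslash, and it doubles single quotes as usual for SQL string literals.

```java
// Hypothetical helper (not an Oracle or JDBC API): builds a UNISTR('...')
// expression that can be pasted into an ASCII-only SQL*Plus script.
class UnistrEncoder {
    static String toUnistr(String text) {
        StringBuilder sb = new StringBuilder("UNISTR('");
        // Iterate over UTF-16 code units, so characters outside the BMP
        // are emitted as two escapes (a surrogate pair).
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\'') {
                sb.append("''");                 // escape the SQL quote
            } else if (c == '\\') {
                sb.append("\\\\");               // literal backslash for UNISTR
            } else if (c >= 0x20 && c < 0x7f) {
                sb.append(c);                    // printable ASCII stays readable
            } else {
                sb.append(String.format("\\%04x", (int) c));
            }
        }
        return sb.append("')").toString();
    }

    public static void main(String[] args) {
        // "Patra" = pa + ta + virama + ra, as in the example above.
        System.out.println(toUnistr("\u092a\u0924\u094d\u0930"));
        // prints UNISTR('\092a\0924\094d\0930')
    }
}
```

The resulting text contains only ASCII, so the script no longer depends on the editor's or console's encoding.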

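For option 1, the byte order mark can also be removed with a few lines of Java instead of a hex editor. This is only a sketch under the assumption that the script was saved as UTF-8; the class name BomStripper is made up for this example, and the file is overwritten in place, so work on a copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: removes a leading UTF-8 byte order mark
// (0xEF 0xBB 0xBF) from a SQL script.
class BomStripper {
    static byte[] stripBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        // Return the input unchanged when there is no BOM to strip.
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java BomStripper <script.sql>");
            return;
        }
        Path script = Paths.get(args[0]);
        // Overwrites the file in place with the BOM-free content.
        Files.write(script, stripBom(Files.readAllBytes(script)));
    }
}
```
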
Anyway, thanks a lot.