Excel encoding / charset to read multibyte characters from java

phani dar

Greenhorn

Posts: 4

posted 15 years ago

I have to read multibyte (Japanese and Chinese) characters from an Excel sheet and store them in a database table. I am setting the encoding and character set on the Excel driver, but it displays only question marks (???). How can I read the Japanese and Chinese characters using the Excel JDBC-ODBC driver? Any help in this regard is appreciated.

Thanks

Sivaraman Lakshmanan

Ranch Hand

Posts: 231

posted 15 years ago

Hi Phani,
Try setting the character encoding to UTF-8.

Regards,
Sivaraman.L

phani dar

Greenhorn

Posts: 4

posted 15 years ago

Thanks for your reply. I have tried UTF-8, setting the charSet on the driver. Here is the sample code that I am working on. Could you go through it and suggest what might be wrong?

import java.io.*;
import java.sql.*;
import java.util.Properties;

public class TestXLSInput {
    public static void main(String[] args) {
        Connection conn = null;
        String DATABASE_URL = "jdbc:odbc:Driver={Microsoft Excel Driver (*.xls)};DriverID=22;READONLY=false;";
        String DRIVER_NAME = "sun.jdbc.odbc.JdbcOdbcDriver";
        String infile = "ChinJap.xls";
        String tabSheet = "multibyte";

        try {
            // Connection properties handed to the JDBC-ODBC bridge
            Properties info = new Properties();
            info.put("encoding", "utf-8");
            //info.put("charset", "ISO-8859-1");

            Class.forName(DRIVER_NAME);
            conn = DriverManager.getConnection(DATABASE_URL + "DBQ=" + infile, info);
            Statement stmt = conn.createStatement();

            ResultSet rs = stmt.executeQuery("select English,Japanese,Chinese from [" + tabSheet + "$]");
            FileOutputStream fos = new FileOutputStream("temp33.txt");
            Writer writer = new OutputStreamWriter(fos, "utf-8");

            // Print the Japanese column (index 2) of every row and write it to the file
            while (rs.next()) {
                try {
                    String temp = rs.getString(2) + "\n";
                    System.out.println(temp);
                    writer.write(temp);
                } catch (Exception ex) {
                    System.out.println(ex.getMessage());
                }
            }
            writer.close();
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        } finally {
            try {
                if (conn != null) {
                    conn.close();
                }
            } catch (Exception ex) {
                System.out.println("ex " + ex.getMessage());
            }
        }
    }
}

[ November 20, 2008: Message edited by: Martijn Verburg ]

Ulf Dittmer

Rancher

Posts: 43081

posted 15 years ago

Where are you printing this - some kind of console/terminal? Most of those only support displaying ASCII (or maybe ISO-8859), but not something like Chinese/Japanese.
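As an illustration of where question marks can come from (not a diagnosis of the code above), here is a minimal sketch showing that Java substitutes '?' whenever text is encoded with a charset that cannot represent it. The class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encodes the text with the given charset and decodes it again,
    // the way a lossy console or driver layer might.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source file encoding does not matter
        String japanese = "\u65E5\u672C\u8A9E";

        // US-ASCII cannot represent these characters, so each one becomes '?'
        System.out.println(roundTrip(japanese, StandardCharsets.US_ASCII).equals("???"));

        // UTF-8 can represent them, so the text survives unchanged
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese));
    }
}
```

Both lines print true. If the same substitution happens anywhere between the Excel file and the output, the original characters are gone and cannot be recovered downstream.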

phani dar

Greenhorn

Posts: 4

posted 15 years ago

I am writing the strings to a text file and opening it in Edit Pro, which supports UTF-8. But when I open the file, the text is displayed as question marks (???). I need to store the characters as Japanese and Chinese, not as ???. How can I do this using the Excel driver? Is some property or extra logic required?
[ November 20, 2008: Message edited by: phani dar ]

Ulf Dittmer

Rancher

Posts: 43081

posted 15 years ago

Does the editor recognize the file as Unicode? The code you posted doesn't write a BOM, so the editor may be treating it as some other encoding.

Also, does the editor have access to fonts that can display those characters? Some editors use only monospaced fonts (which would rule out Japanese/Chinese fonts, unless they are specially constructed as monospaced, and recognizable as such by the font engine).
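For reference, a minimal sketch of writing a UTF-8 file that starts with a BOM, so that an editor relying on it can detect the encoding. Java's OutputStreamWriter does not add a BOM for UTF-8 on its own. The class name, method name and output file name here are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BomWriterDemo {
    // Returns the UTF-8 bytes of the text, prefixed with the UTF-8 BOM (EF BB BF).
    static byte[] withBom(String text) {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, out, 0, bom.length);
        System.arraycopy(body, 0, out, bom.length, body.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Sample Japanese text written as escapes; "bom-demo.txt" is a hypothetical file name
        byte[] content = withBom("\u65E5\u672C\u8A9E\n");
        Files.write(Paths.get("bom-demo.txt"), content);

        // Show the first three bytes of what was written
        System.out.printf("%02X %02X %02X%n", content[0] & 0xFF, content[1] & 0xFF, content[2] & 0xFF);
    }
}
```

This prints EF BB BF. A BOM only helps the editor pick the right encoding; it does nothing if the strings already contain literal question marks before they are written.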

phani dar

Greenhorn

Posts: 4

posted 15 years ago

Before sending the data to the database, I am testing with a text file. Once the file opens properly in the editor, I will move on to the database. But it does not work even for the text file. The editor has support for Unicode.

When I use the POI HSSF package, it works. But POI consumes more memory, so I would prefer to use the JDBC-ODBC driver to connect to Excel.

Ulf Dittmer

Rancher

Posts: 43081

posted 15 years ago

The editor having support for Unicode is not the same as the editor recognizing a file as being in Unicode. It takes a BOM (or smart code in the editor) to determine that.

In the Java code, before you save the data to a file, have you checked whether the characters are the correct Unicode characters? That would tell you whether the problem is getting the data through JDBC, or saving the data to a file. (Assuming that the data is not saved correctly; from your description it's not clear to me that it isn't.)
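One way to run that check, sketched under the assumption that you can call a helper on the value returned by rs.getString(...): print the Unicode code points instead of the characters, so that console and font limitations do not get in the way. The class and method names are made up for this example.

```java
public class CodePointDump {
    // Formats every code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Genuine Japanese text shows code points in the CJK range
        System.out.println(describe("\u65E5\u672C\u8A9E")); // U+65E5 U+672C U+8A9E

        // Text that was already damaged shows U+003F, the code point of '?'
        System.out.println(describe("???")); // U+003F U+003F U+003F
    }
}
```

If the dump of a value read through JDBC shows U+003F, the characters were lost before the string reached your code, and no change to the file-writing part will bring them back.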

John Dowling

Greenhorn

Posts: 2

posted 14 years ago

Hi Phani,

Did you locate an ODBC driver for Excel that supports multibyte characters? I am having the same issue: characters read from Excel are displayed as ???.

Thanks,

John Maxall

Greenhorn

Posts: 2

posted 13 years ago

I found a solution to the problem of the JDBC-ODBC driver conversion of multibyte characters like Chinese or Japanese. In my case the JDBC driver worked just fine with resultSet.getString(columnName); but not the ODBC driver.

Here is what did work:

Swathi Sriramaneni

Greenhorn

Posts: 1

posted 12 years ago

Hi John,

Could you please describe your solution for reading Chinese characters through JDBC in more detail?

Campbell Ritchie

Marshal

Posts: 79239

posted 12 years ago

Welcome to the Ranch

John Maxall

Greenhorn

Posts: 2

posted 11 years ago

A little more detail on my "solution" (that may not be the solution in all cases, of course)...

The idea is that java.lang.String knows how to do byte conversions: http://docs.oracle.com/javase/tutorial/i18n/text/string.html.

In my example, if I remember or guess correctly, the ODBC driver put the bytes in the database. My Java code reads the bytes, then uses the String constructor to convert the bytes to a String.

So instead of
String firstName = resultSet.getString(colIndex);
use
byte[] bytes = resultSet.getBytes(colIndex);
String firstName = new String(bytes, "utf-8");
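A self-contained sketch of the same idea, with the database replaced by a hard-coded byte array so it can be run anywhere. The helper name is made up, and UTF-8 is only an assumption here: the bytes returned by getBytes depend on the driver and the data source, so the charset has to match whatever was actually stored.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ColumnDecoder {
    // Decodes raw column bytes with an explicit charset.
    // ResultSet.getBytes returns null for SQL NULL, so null is passed through.
    static String decode(byte[] raw, Charset charset) {
        return raw == null ? null : new String(raw, charset);
    }

    public static void main(String[] args) {
        // Stand-in for resultSet.getBytes(colIndex): the UTF-8 bytes of U+65E5
        byte[] raw = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5};

        // Decoding with the charset the bytes were written in restores the character
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals("\u65E5"));

        // Decoding with a different charset yields three unrelated characters instead
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length());
    }
}
```

This prints true and then 3. With a real ResultSet the call would look like decode(resultSet.getBytes(colIndex), StandardCharsets.UTF_8); if the result is still wrong, dumping the raw bytes in hex is a reasonable next step to work out which encoding the driver is actually delivering.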