Problem with è character in Java.

Samir Banerjee
Ranch Hand

Joined: Jun 21, 2010
Posts: 72
Hi,
I have a string in the database, 'Anikèt'. When I fetch it in Java, it comes back as 'Anik�t'. How can I get the actual value, or is there something in Java that can give me a plain 'e' instead?
Thanks in advance!!!
Aniket
Raghavan Muthu
Ranch Hand

Joined: Apr 20, 2006
Posts: 3344

One thing you can do is use the appropriate locale to preserve the character set.

The other thing that comes to mind is the getBytes() method, which gives you the byte representation of the data actually stored in the string.


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41477
    
What is the "it" in "it returns"? Where are you seeing this? Note that consoles, terminals and simple-minded text editors may not be able to display accented characters properly.


Samir Banerjee
Ranch Hand

Joined: Jun 21, 2010
Posts: 72
Thanks Ulf for the reply.

Actually, I have a prepared statement in Java, e.g.:



So if the name is 'Anikèt', the value I get is 'Anik�t'.
I have also tried getBytes("UTF-8"), but it returns a garbage value.

Note: the name column in the DB is a varchar.

Let me know if you need more info.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41477
    
You're missing what I was getting at; where do you see the character? Is it a piece of software that supports accented characters?
Raghavan Muthu
Ranch Hand

Joined: Apr 20, 2006
Posts: 3344

Adding to what Ulf asked: he means to ask how and where you actually use the fetched data stored in the variable name.

Do you print it to the console with System.out.println, or use a logger to write it to a file, etc.?
Samir Banerjee
Ranch Hand

Joined: Jun 21, 2010
Posts: 72
I write this value to an Excel file, and the wrong value is shown in Excel itself.
I use Apache POI to do that. But I can see that the String object I am writing already contains the wrong value when it is fetched from the DB.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41477
    
I write this value to an Excel file, and the wrong value is shown in Excel itself.

This may or may not preserve accented characters correctly; I wouldn't count it as proof that something is wrong.

But I can see that the String object I am writing already contains the wrong value when it is fetched from the DB.

I still don't know where you "see" that - have you examined the character code? If so, post the code you used to do that.
Samir Banerjee
Ranch Hand

Joined: Jun 21, 2010
Posts: 72
What I mean by "see" is when I debug the code at:
String name = rs.getString(1);
When I inspect the value of name, it shows 'è' as '�'.
I tried:

byte[] cn = name.getBytes();
String nm = new String(cn);

This converts '�' to '?'. Obviously this is not correct.
I see this value when I inspect the object, when I do a sysout, and when I write it to an Excel file.
I hope this time it's clear. :P
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41477
    
Yes, things are clearer now - unfortunately, what has become clear is that you're making a couple of mistakes :-)

or I do sysout

See my earlier comments about consoles and terminals not being able to handle accented characters. So this counts for nothing until you've convinced us that wherever the sysout output goes to -which you haven't told us- can display those characters.
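To illustrate why console output alone proves little, here is a minimal sketch that captures the bytes a PrintStream emits for 'è' under different encodings. The class name ConsoleEncodingCheck and the helper printedBytes are hypothetical names made up for this illustration; nothing here comes from the original poster's code. Whether the console then renders those bytes correctly is a separate question that depends on the terminal.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncodingCheck {

    // Hypothetical helper: returns the bytes a PrintStream writes for s
    // when it is configured with the given encoding name.
    static byte[] printedBytes(String s, String encoding) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, encoding);
            out.print(s);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // The encoding the JVM uses when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String accented = "\u00e8"; // 'è', written as an escape so the source file encoding does not matter
        for (String encoding : new String[] {"UTF-8", "ISO-8859-1", "US-ASCII"}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : printedBytes(accented, encoding)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(encoding + ": " + hex.toString().trim());
        }
    }
}
```

The same character leaves the JVM as different bytes depending on the encoding, so a garbled console line cannot tell you whether the String itself is wrong.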

byte[] cn = name.getBytes();
String nm = new String(cn);

Never, ever, use "String.getBytes()" or "new String(byte[])", unless you know what the platform default encoding is, and that encodings do not matter - which is not the case if you're dealing with accented characters. You simply must specify the encoding by using "String.getBytes(String)" and "new String(byte[], String)". (I don't actually understand what the point of those two lines of code is - they're doing the inverse of one another. Or rather, they would, if encodings were handled correctly.)
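As a rough sketch of what specifying the encoding looks like, the example below uses the Charset overloads with StandardCharsets (available since Java 7) rather than the String-name overloads mentioned above; the idea is the same. The class and method names are made up for this illustration, and the sample string is hard-coded rather than read from a database.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetDemo {

    // Encode with an explicitly named charset instead of the platform default.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used for encoding.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode the same bytes with a different charset to show the mismatch.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "Anik\u00e8t"; // 'è' written as an escape so the source file encoding does not matter
        byte[] utf8 = encodeUtf8(name);

        System.out.println(Arrays.toString(utf8));
        // Same charset both ways: the round trip returns the original text.
        System.out.println("UTF-8 round trip intact: " + decodeUtf8(utf8).equals(name));
        // Different charset on the way back: the text is no longer the same.
        System.out.println("Latin-1 decode intact:   " + decodeLatin1(utf8).equals(name));
    }
}
```

Note that a round trip like this cannot repair a String that was already decoded incorrectly before it reached your code; it only shows that encoding and decoding must agree.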

The fastest way to check if things are OK is to print out the numerical character codes of "name".
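A minimal sketch of that check is below. The class CharCodes and the method codePointsOf are hypothetical names, and the sample strings are hard-coded; in the situation discussed here you would pass the value returned by rs.getString(1) instead.

```java
public class CharCodes {

    // Returns the Unicode code points of s as "U+XXXX" values separated by spaces.
    static String codePointsOf(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance by one or two chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Intact value: the fifth code point is U+00E8 (è).
        System.out.println(codePointsOf("Anik\u00e8t"));
        // Damaged value: the fifth code point is U+FFFD (the replacement character).
        System.out.println(codePointsOf("Anik\uFFFDt"));
    }
}
```

The output contains only ASCII, so it reads the same on any console. If the fifth code point is U+00E8, the String is fine and the problem lies in how it is displayed or written. If it is U+FFFD or U+003F, the character was already lost before the String reached this code.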
Samir Banerjee
Ranch Hand

Joined: Jun 21, 2010
Posts: 72
I have also tried:


But no luck. What else should I try?

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41477
    
It seems that you're now trying random things in order to make this work. That's not the way to success with software development. For example, US-ASCII does not contain accented characters, so whatever you're doing with that can't possibly work. And as I said, the getBytes call and the String constructor call are completely unnecessary. What you should do is what I've advised twice by now: print the numerical character codes of all characters of the string in question (without any unnecessary transformations in between).
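The point about US-ASCII can be checked directly. The sketch below is an illustration with made-up names (AsciiLossDemo, canEncode, roundTrip), not a reconstruction of the code that was tried; it asks each charset whether it can represent the string, then shows what a round trip does to it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiLossDemo {

    // True if every character of s can be represented in the given charset.
    static boolean canEncode(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    // Encodes and decodes s with the same charset; characters the charset
    // cannot represent are replaced during encoding and do not come back.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "Anik\u00e8t"; // 'è' written as an escape

        System.out.println("US-ASCII can encode: " + canEncode(name, StandardCharsets.US_ASCII));
        System.out.println("UTF-8 can encode:    " + canEncode(name, StandardCharsets.UTF_8));

        // US-ASCII substitutes '?' for the unmappable character.
        System.out.println("US-ASCII round trip: " + roundTrip(name, StandardCharsets.US_ASCII));
        System.out.println("UTF-8 round trip ok: " + roundTrip(name, StandardCharsets.UTF_8).equals(name));
    }
}
```

Once the '?' has been substituted, the original character is gone, which is why converting through US-ASCII cannot help with accented text.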

I also strongly advise to beef up your knowledge of character encodings; start with The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Samir Banerjee
Ranch Hand

Joined: Jun 21, 2010
Posts: 72
Thanks Ulf
Raghavan Muthu
Ranch Hand

Joined: Apr 20, 2006
Posts: 3344

Ulf Dittmer wrote:
Never, ever, use "String.getBytes()" or "new String(byte[])", unless you know what the platform default encoding is, and that encodings do not matter - which is not the case if you're dealing with accented characters. You simply must specify the encoding by using "String.getBytes(String)" and "new String(byte[], String)". (I don't actually understand what the point of those two lines of code is - they're doing the inverse of one another. Or rather, they would, if encodings were handled correctly.)

The fastest way to check if things are OK is to print out the numerical character codes of "name".


Thank you Ulf for the info and tips!
Martijn Verburg
author
Bartender

Joined: Jun 24, 2003
Posts: 3274
    

Ulf Dittmer wrote:I also strongly advise to beef up your knowledge of character encodings; start with The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).


That article is awesome.


Samir Banerjee
Ranch Hand

Joined: Jun 21, 2010
Posts: 72
Truly...resolved my issue too ...Thanks....
 