Language doubt

Ankit Saxena
Greenhorn

Joined: Jul 06, 2009
Posts: 12


The input text is in a different language, let's say Russian.

When I try to generate the Unicode hexadecimal value for it, the program doesn't recognize the text and displays '?' in place of the text.
The default system file encoding is Cp1252.

Can any1 explain how to do this, or how to change the file encoding?

Thanks.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39828
Welcome to JavaRanch

Please read this about why we don't like people writing "any1" or similar.
By no means an easy beginner's question: moving thread.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14345

Welcome to JavaRanch.

How are you displaying the text that comes out of your program? Are you printing it in a console window? Be aware that the console window on the English version of Windows by default uses a font that does not support most of the characters that are in Unicode. If you try to print a character on the console that's not in the font, you'll get a '?'.

Changing the file encoding will not solve that problem; the console simply isn't able to display those characters with the default font.

Ankit Saxena
Greenhorn

Joined: Jul 06, 2009
Posts: 12
The problem is that I have to take data from a database that may be in some other language, and then display that data in an RTF document.

The steps that I have come up with are:
1. Fetch the data from the database.
2. Convert it into Unicode hex values.
3. Pass the Unicode hex values to the RTF as a string from the Java code.

For French, German and Italian it works, but for other languages like Greek or Russian it does not.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39828
Try it with a very small dataset in Russian, print out the hex values with the %x tag on the command line, then compare the output with a Unicode chart to confirm you actually have Russian letters. Russian is included on this Unicode page.
Ankit Saxena
Greenhorn

Joined: Jul 06, 2009
Posts: 12
I am trying to take the Russian text in a String, but when I try to convert each char into its Unicode hex value, it takes those characters as '?' and displays the Unicode hex value of '?'.

So how can I make the code recognize those characters?
Ankit Saxena
Greenhorn

Joined: Jul 06, 2009
Posts: 12
I am posting the code which I am using.

And the output I am getting is:

original = ???
[B@765291

roundTrip = ???
un=\'3f\'3f\'3f\'3f\'3f\'3f
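
A note on the \'3f values in the output above: 0x3f is the code of '?', which suggests the Russian letters had already been replaced before the hex conversion ran. One way this can happen (a guess, since the posted code was not preserved) is encoding the text into a charset such as Cp1252, which has no Cyrillic letters. The sketch below only illustrates that effect. The class name QuestionMarkDemo and the helper toHexEscapes are made up for this example, and it assumes Java 6 or later for String.getBytes(Charset).

```java
import java.nio.charset.Charset;

public class QuestionMarkDemo {

    // Hypothetical helper: encodes text in the named charset and
    // renders every resulting byte as a two-digit \'xx escape.
    static String toHexEscapes(String text, String charsetName) {
        // getBytes(Charset) substitutes the charset's replacement byte
        // for characters it cannot encode.
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xff;
            sb.append("\\'");
            if (v < 0x10) {
                sb.append('0');
            }
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Russian word from this thread, spelled with Unicode escapes
        // so that the encoding of the source file does not matter.
        String word = "\u0446\u0438\u0442\u0430\u0442\u0430";
        System.out.println(toHexEscapes(word, "windows-1252"));
    }
}
```

With windows-1252 every Cyrillic letter comes out as \'3f, matching the output shown above.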
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39828
Try the String.toCharArray() method to split your writing into chars.
Iterate through the char[] with a for/for-each loop.
Print each character to screen with the %x tags. It may need an int cast. When you compare the hex values with the Unicode page I showed you yesterday, you can check that the correct numbers are shown.
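
The three steps above could look roughly like this. It is a minimal sketch, not the code that was actually run in this thread: the class name CharHexDemo and the method describe are invented here, and String.format needs Java 5 or later.

```java
public class CharHexDemo {

    // Formats one char together with its value in hex.
    // The int cast is what lets %x accept the char.
    static String describe(char c) {
        return String.format("The char %c has the hex value %x", c, (int) c);
    }

    public static void main(String[] args) {
        // Use the first command-line argument if given; otherwise fall back
        // to the word from this thread, spelled with Unicode escapes.
        String text = args.length > 0 ? args[0] : "\u0446\u0438\u0442\u0430\u0442\u0430";
        for (char c : text.toCharArray()) {
            System.out.println(describe(c));
        }
    }
}
```

Whether the characters themselves show up correctly still depends on what the console can display; the hex values do not.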

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39828
I put my code into a simple class and executed it, copying and pasting your original word as a test:
java RussianDemo цитата
The char ц has the hex value 446
The char и has the hex value 438
The char т has the hex value 442
The char а has the hex value 430
The char т has the hex value 442
The char а has the hex value 430
You will have to check against the Unicode page, but that seems to be working. It is on a Linux box; the shell supports Unicode.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14345

Ankit Saxena wrote: I am posting the code which I am using.

I'll repeat what I already wrote above: you are printing Russian characters to a console window with System.out.println(). The console window (in a normal, English version of Windows) by default uses a font that does not support the Russian characters, so you get question marks instead.

Your program might produce the right output, but if you display it in a console window, you won't see it, because the console window can't display it.
Ankit Saxena
Greenhorn

Joined: Jul 06, 2009
Posts: 12
printf is not working for me. I have JDK version 1.3.
Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10523

printf() was added in 1.5


Ankit Saxena
Greenhorn

Joined: Jul 06, 2009
Posts: 12
But then how can it be done? I only have versions 1.3 and 1.4.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19760

System.out.println("The char " + c + " has the hex value " + (int)c);

The printf method basically just puts the last arguments into the first argument, starting from left to right, replacing any part that starts with % (%% is used to show a single % character). Of course it does allow some more formatting (e.g. %04d to print a number with zeros padded to the left if smaller than 1000), but for the rest it's as easy as the above code.


Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 375
You could use Integer.toHexString(c) to print out the Unicode code point in hex.
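
A sketch of that idea which avoids printf, generics and the for-each loop, so it should also compile on older JDKs such as 1.3 or 1.4 (an assumption, not tested on them here). The class name HexNoPrintf and the helper hex4 are invented for this example.

```java
public class HexNoPrintf {

    // Returns the char's value as hex, left-padded with zeros to four digits.
    static String hex4(char c) {
        String hex = Integer.toHexString(c);
        StringBuffer sb = new StringBuffer();
        for (int i = hex.length(); i < 4; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        // Part of the word from this thread, spelled with Unicode escapes.
        String text = "\u0446\u0438\u0442\u0430";
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            System.out.println("The char " + chars[i] + " has the hex value " + hex4(chars[i]));
        }
    }
}
```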

Also, you could write an HTML file with the Unicode code point values as HTML entities, in decimal like &#1094; or in hex like &#x446;, and open it in, say, the latest version of Firefox. You should see the Unicode characters rendered in the browser.

Another way would be to display the Unicode text in some Swing component.
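
The HTML suggestion above can be tried with something like the following sketch. The class name HtmlEntityDemo and the method toEntities are hypothetical, and the code assumes Java 5 or later. It handles only characters that fit in a single char and does not escape markup characters such as & or <.

```java
public class HtmlEntityDemo {

    // Replaces every non-ASCII char with a hexadecimal numeric character
    // reference. ASCII chars are copied through unchanged.
    static String toEntities(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append("&#x").append(Integer.toHexString(c)).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The word from this thread, spelled with Unicode escapes.
        String word = "\u0446\u0438\u0442\u0430\u0442\u0430";
        // The generated page is plain ASCII, so it can be saved in any encoding.
        System.out.println("<html><body><p>" + toEntities(word) + "</p></body></html>");
    }
}
```

Redirecting the output to a file and opening that file in a browser is one way to check the characters without relying on the console.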
Ankit Saxena
Greenhorn

Joined: Jul 06, 2009
Posts: 12
First of all, thanks a lot for the suggestions.

I am getting the Unicode value for the characters, but for the char 'ц' the value is 446, whereas in the Windows-1251 encoding its value is 'f6'. I need that value to pass to the RTF file so that it can display the character properly.

So, how do I do this?
Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 375
I guess this is what you have to do.

OutputStream out = new FileOutputStream("russian.rtf");
OutputStreamWriter os = new OutputStreamWriter(out, "Cp1251");

then use one of the write methods of the OutputStreamWriter to write to the file.
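
To check what such a writer produces, here is a small sketch that swaps the FileOutputStream for a ByteArrayOutputStream so the bytes can be inspected. The class name Cp1251WriterDemo and the method encode are invented for this example, and it assumes the runtime supports the Cp1251 encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Cp1251WriterDemo {

    // Writes text through an OutputStreamWriter with the given encoding
    // and returns the bytes that would otherwise end up in the file.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(out, encoding);
            writer.write(text);
            writer.close(); // flushes the encoder into the byte stream
            return out.toByteArray();
        } catch (IOException e) {
            // Also covers an encoding name the runtime does not know.
            throw new IllegalStateException("Could not encode: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // U+0446 is the first letter of the word used in this thread.
        byte[] bytes = encode("\u0446", "Cp1251");
        System.out.println(Integer.toHexString(bytes[0] & 0xff));
    }
}
```

This prints f6, the Windows-1251 value asked about above.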

Since you are using JDK 1.3, there is a complication. You will need the i18n.jar distributed with the international version of the 1.3 JDK. I am not sure whether 1.3 is still available for download. Hopefully you have it already.

Check the supported character encodings for Java 1.3.
 