convert UTF8 encoded file to Unicode

Doug Cyporyn
Greenhorn

Joined: Jun 03, 2003
Posts: 2
I need to read in a record and replace a character in the record with another character.
The file I am reading is encoded in UTF8 format. In Java, I can read the file and specify the encoding that is used.

//specify file and create input stream with proper encoding
File f = new File("c:\\gme_test.txt");
FileInputStream file = new FileInputStream(f);
InputStreamReader inReader = new InputStreamReader(file, "UTF8");
It is my understanding that when Java reads the file it will convert it to Unicode. So, when I do this:
//read a record from the file into a String object
BufferedReader inBuffer = new BufferedReader(inReader);
String aRecord = inBuffer.readLine();
The result should be a Unicode String. Now I should be able to convert a character in the string. I am using the Unicode literal for the character as follows:

//replace ü with u
aRecord.replace('\u00FC', 'u');
The problem is that the character I am trying to replace is not found in the String. The string looks more like this:
RÃ¼sselsheim
Reading about UTF8, I have learned that some characters can only be represented by two bytes. Others, while encoded by two bytes, can easily be converted without losing anything in translation. It seems to me that the second and third characters in the string above make up the UTF8 character I wish to convert.
Using UltraEdit-32 I was able to open my file and convert the encoding from UTF8 to Unicode. When I did this "RÃ¼sselsheim" became "Rüsselsheim". Then converting from Unicode to ASCII, "Rüsselsheim" remained "Rüsselsheim".
Any ideas?


Doug
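
A doubled character like the one described above is what typically appears when the two UTF-8 bytes of ü (0xC3 0xBC) are decoded or displayed with a single-byte charset such as ISO-8859-1. The thread does not confirm that this is what happened here, so the following is only a sketch of that effect; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode the text as UTF-8, then decode those
    // bytes with the given charset (right or wrong).
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, charset);
    }

    public static void main(String[] args) {
        String city = "R\u00FCsselsheim";

        // Decoding with the matching charset restores the original text.
        String right = decodeUtf8BytesAs(city, StandardCharsets.UTF_8);
        // Decoding with ISO-8859-1 turns each byte into its own char.
        String wrong = decodeUtf8BytesAs(city, StandardCharsets.ISO_8859_1);

        System.out.println(right.length());          // 11
        System.out.println(wrong.length());          // 12
        System.out.println(wrong.indexOf('\u00FC')); // -1, the umlaut is not found
    }
}
```

In the wrongly decoded string, positions 1 and 2 hold U+00C3 and U+00BC instead of a single U+00FC, which is why a search for the umlaut would fail in that situation.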
Jose Botella
Ranch Hand

Joined: Jul 03, 2001
Posts: 2120
Welcome to the Ranch, Doug,
for me this code worked OK

Though the first line was printed in netbeans, not in a DOS window.
[ May 01, 2004: Message edited by: Jose Botella ]

SCJP2. Please Indent your code using UBB Code
Doug Cyporyn
Greenhorn

Joined: Jun 03, 2003
Posts: 2
Thanks for taking the time to reply. I came back to my code a while later and realized it was working. I needed to provide a String to accept the return value from the replace method, since replace does not change the String it is called on.
Sorry for the hassle.
Doug
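
For anyone landing here with the same symptom, the fix described above comes down to String being immutable: replace returns a new String and leaves the original untouched. A minimal sketch, with a hard-coded record standing in for the line read from the file (the class and method names are made up for illustration):

```java
public class ReplaceDemo {
    // Hypothetical helper: returns a copy of the record with the umlaut replaced.
    static String stripUmlaut(String aRecord) {
        return aRecord.replace('\u00FC', 'u');
    }

    public static void main(String[] args) {
        String aRecord = "R\u00FCsselsheim";

        // Result is discarded, so aRecord still contains the umlaut.
        aRecord.replace('\u00FC', 'u');
        System.out.println(aRecord.equals("R\u00FCsselsheim")); // true

        // Assigning the return value keeps the replaced text.
        aRecord = stripUmlaut(aRecord);
        System.out.println(aRecord); // Russelsheim
    }
}
```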
 