Save File As UTF-8

Rudy Simon Yeung
Greenhorn

Joined: Jun 06, 2003
Posts: 15
I would appreciate it if someone could provide a code snippet for saving a file as UTF-8 instead of ANSI.
Maulin Vasavada
Ranch Hand

Joined: Nov 04, 2001
Posts: 1871
hi Rudy
try java.io.DataOutputStream's writeUTF(String) method...
regards
maulin
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
If you look at the API for writeUTF(), you'll see that it also writes two bytes of non-text data representing the length of the string. For general applications this probably isn't what you want; it's only useful if you plan to use readUTF() later to read the data. Typically you're better off with something like:
Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8"));
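To make the difference concrete, here is a minimal sketch that writes the same string both ways into an in-memory buffer and compares the byte counts. The class and method names (WriteUtfComparison, viaWriteUTF, viaWriter) are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class WriteUtfComparison {

    // Bytes produced by DataOutputStream.writeUTF(): a 2-byte length, then the text.
    static byte[] viaWriteUTF(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Bytes produced by an OutputStreamWriter set to UTF-8: the text only.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(viaWriteUTF("abc").length); // prints 5 (2 length bytes + 3 text bytes)
        System.out.println(viaWriter("abc").length);   // prints 3
    }
}
```

A text editor opening the writeUTF() output would show the two length bytes as stray characters at the start of the file.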


hinzsa hinzsa
Greenhorn

Joined: May 14, 2009
Posts: 4
I am having the same problem: I cannot save a file as UTF-8, specifically when converting from ANSI to UTF-8.

Here is my test example. I would appreciate any help.

import java.io.*;

public class UTFFileWriter {

    public static void writeUTFToFile(String path, InputStream in) throws Exception{
        BufferedReader buff_in;
        BufferedWriter buff_out;
        InputStreamReader sin;
        OutputStreamWriter sout;


        try{
            sin = new InputStreamReader(in);
            buff_in = new BufferedReader(sin);

            sout = new OutputStreamWriter (new FileOutputStream(new File(path)),"UTF8");
            buff_out = new BufferedWriter(sout);

        }catch(FileNotFoundException ex){
            ex.printStackTrace();
            throw new Exception("File "+path+" not found");
        }

        try{
            int c;
            while ((c = buff_in.read()) != -1) buff_out.write(c);

        }catch(IOException ex){
            ex.printStackTrace();
            throw new Exception("Exception while copying");
        }

        try{
            buff_in.close();
            buff_out.flush();
            buff_out.close();
        }catch(IOException ex){
            ex.printStackTrace();
            throw new Exception("Exception while closing");
        }
    }

    public static void main(String[] args){

        try{
            UTFFileWriter.writeUTFToFile("C:\\test\\my_utf8_file.xml", new FileInputStream("C:\\test\\my_ansi_file.xml"));

        }catch(Exception e){
            e.printStackTrace();
        }
    }
}
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41154
    
What does "I cannot save" mean? What happens if you run this code?

Note that the standard name of the encoding is "UTF-8", not "UTF8". (Java accepts "UTF8" as an alias, but "UTF-8" is the name to prefer.)

Since you're not specifying any encoding for the InputStreamReader, are you sure the input file is in the platform default encoding (whatever that may be)?
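As a rough sketch of what naming both encodings could look like: the helper below decodes the input with a charset the caller passes in and always encodes the output as UTF-8. The class name Transcoder, the method toUtf8, and the choice of "windows-1252" in main are assumptions for illustration; use whatever encoding the input file really has.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Transcoder {

    // Decodes 'in' using sourceCharset and writes the text to 'out' as UTF-8.
    // Both streams are closed when the copy finishes.
    static void toUtf8(InputStream in, Charset sourceCharset, OutputStream out) {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, sourceCharset));
             Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input file, args[1] = output file (hypothetical command-line usage).
        // "windows-1252" is an assumption about the input; it is not detected.
        toUtf8(new FileInputStream(args[0]),
               Charset.forName("windows-1252"),
               new FileOutputStream(args[1]));
    }
}
```

Passing the source charset explicitly means the result no longer depends on the platform default of the machine that runs the program.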


hinzsa hinzsa
Greenhorn

Joined: May 14, 2009
Posts: 4
Sorry, when I said "save" I meant copying a file and changing its character encoding (from ANSI to UTF-8).
I corrected the code by using "UTF-8" for the output stream; the input stream is in the Windows encoding "Cp1252".
First I open the ANSI file with TextPad and check its encoding by selecting "Save As"; it says ANSI.
Then I run the test program, open the newly created UTF file with TextPad, and check its encoding the same way.
It is still ANSI. Why is it not UTF-8? Thanks in advance for your help.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41154
    
The important thing is not what Textpad thinks; the important thing is whether the file *is* encoded in UTF-8. Does it contain any characters that are *not* part of ASCII? Since UTF-8 files have no distinguishing characteristics that would mark them as UTF-8 *unless* they include actual non-ASCII characters, no editor could recognize them as UTF-8 in that case. (Unless you include a BOM, of course, but your code doesn't do that.)
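A small sketch may help here. It checks whether a string encodes to the same bytes in Cp1252 and in UTF-8, and whether a byte array starts with the UTF-8 BOM (EF BB BF). The class and method names are invented for this example, and it assumes the "windows-1252" charset is available in the JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class EncodingComparison {

    // Assumes the JRE ships the windows-1252 charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True when the text encodes to exactly the same bytes in Cp1252 and UTF-8.
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(CP1252), text.getBytes(StandardCharsets.UTF_8));
    }

    // True when the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("Sir Y Fflint"));  // prints true: ASCII only
        System.out.println(sameBytes("Penarl\u00E2g")); // prints false: contains U+00E2
        System.out.println(hasUtf8Bom("\uFEFFabc".getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

For ASCII-only content the two files are byte-for-byte identical, so an editor has nothing to go on when it guesses the encoding.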
hinzsa hinzsa
Greenhorn

Joined: May 14, 2009
Posts: 4


I have an ANSI file containing Welsh characters. Here is an example:

"A55 Eb Onslip From A550 Jct 35","","Penarlâg","Sir Y Fflint","CYM"

I am trying to convert it into a UTF-8 encoded file. Once I run the test, the same line changes to:

"A55 Eb Onslip From A550 Jct 35","","Penarlâg","Sir Y Fflint","CYM"

I would appreciate any suggestion. Otherwise, thank you for the information; it was very useful.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41154
    
Where are you seeing these characters - in a console? Most consoles can't handle Unicode. Or in some other program? If so, does it understand Unicode, and is it using a font that has that character?
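To illustrate why the viewing program matters, here is a minimal sketch that encodes a string as UTF-8 and then decodes those same bytes with a different charset, the way a viewer that assumes the wrong encoding would. The class and method names are made up, and "windows-1252" as the viewer's charset is only an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class WrongCharsetView {

    // Encodes the text as UTF-8, then decodes those bytes using viewerCharset.
    static String utf8BytesViewedAs(String text, Charset viewerCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, viewerCharset);
    }

    public static void main(String[] args) {
        String original = "Penarl\u00E2g";
        // A viewer that assumes UTF-8 gets the original text back.
        System.out.println(original.equals(utf8BytesViewedAs(original, StandardCharsets.UTF_8)));
        // A viewer that assumes windows-1252 shows two characters where U+00E2 was.
        String garbled = utf8BytesViewedAs(original, Charset.forName("windows-1252"));
        System.out.println(garbled.length() - original.length()); // prints 1
    }
}
```

The file on disk is the same in both cases; only the interpretation differs, which is why an odd-looking character in a viewer does not prove the file is wrongly encoded.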
hinzsa hinzsa
Greenhorn

Joined: May 14, 2009
Posts: 4
This is the scenario: a file with Welsh characters was uploaded through a web application to a Linux box (Red Hat).
I checked the encoding (using file --mime filename) and it is utf-8.
The file is picked up by another Java application running on the same box, which streams it to an Oracle DB running on Windows.
Oracle then saves the CLOB content into a file on the same box for further processing. This is where I have the problem: the encoding is ANSI.

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41154
    
Who says the encoding is ANSI? Textpad? Again, that needn't be correct. Have you looked at the file with a hex editor, and determined that the character is, in fact, not encoded in UTF-8?
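If no hex editor is at hand, a few lines of Java can do a similar check. The sketch below prints bytes as hex and uses a strict CharsetDecoder to test whether they form valid UTF-8. The class Utf8Check and its method names are invented for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class Utf8Check {

    // True when the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Formats the bytes as space-separated two-digit hex values.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] = file to inspect (hypothetical command-line usage).
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(toHex(data));
        System.out.println("valid UTF-8: " + isValidUtf8(data));
    }
}
```

In UTF-8 the character â appears as the two bytes C3 A2, while in Cp1252 it is the single byte E2. Note that an ASCII-only file also passes the validity check, since ASCII is a subset of UTF-8.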
 