If you look at the API for writeUTF(), you'll see that it also writes two bytes of non-text data representing the encoded length of the string (and it uses Java's "modified UTF-8" rather than standard UTF-8). For general applications this probably isn't what you want - it's only good if you plan to use readUTF() later to read the data. Typically you're better off with something like:
Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8"));
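To make the difference visible, here is a minimal sketch that writes the same string both ways into an in-memory buffer and prints the resulting bytes. The class and method names (WriteUtfDemo, viaWriteUTF, viaWriter, hex) are made up for illustration, and StandardCharsets assumes Java 7 or later; on older JVMs you would pass the charset name "UTF-8" instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {

    // Bytes produced by DataOutputStream.writeUTF(): a 2-byte length prefix,
    // followed by the string in modified UTF-8.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes produced by an OutputStreamWriter with an explicit UTF-8 charset:
    // just the encoded text, with no prefix.
    static byte[] viaWriter(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(viaWriteUTF("abc"))); // 00 03 61 62 63
        System.out.println(hex(viaWriter("abc")));   // 61 62 63
    }
}
```

The leading 00 03 in the first line is the length prefix; a text editor opening such a file would see it as two stray control characters before the text.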
"I'm not back." - Bill Harding, Twister
hinzsa hinzsa
Greenhorn
Joined: May 14, 2009
Posts: 4
I am having the same problem: I cannot save a file as UTF-8. Specifically, I am converting from ANSI to UTF-8.
Here is my test example... I would appreciate any help.
Sorry, I said "save", but I meant copying a file and changing its character encoding (from ANSI to UTF-8).
I corrected the code by adding "UTF-8" to the output stream; the input stream uses the Windows encoding "Cp1252".
First I open the ANSI file with TextPad and check its encoding by selecting "Save As"; it says ANSI.
Then I run the test program, open the newly created UTF-8 file with TextPad, and check its encoding the same way.
It is still ANSI. Why is it not UTF-8? Thanks in advance for your help.
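The test code itself did not survive in this thread, so the following is only a sketch of the kind of copy described above (read as Cp1252, write as UTF-8), not the poster's program. The class name TranscodeSketch, its method names, and the command-line arguments are all hypothetical; "windows-1252" is the standard charset name for which Cp1252 is an alias in Java.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {

    // Decodes bytes from 'in' using 'from' and re-encodes them to 'out' using 'to'.
    // The caller owns (and closes) both streams.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, from));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, to));
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push buffered characters through the encoder
    }

    // In-memory convenience wrapper, handy for checking the result byte by byte.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(input), from, out, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TranscodeSketch <cp1252-input> <utf8-output>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            transcode(in, Charset.forName("windows-1252"), out, StandardCharsets.UTF_8);
        }
    }
}
```

With this approach the Cp1252 byte E2 (the character â) comes out as the two UTF-8 bytes C3 A2, while plain ASCII bytes pass through unchanged.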
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35242
The important thing is not what TextPad thinks, the important thing is whether the file *is* encoded in UTF-8. Does it contain any characters that are *not* part of ASCII? Since UTF-8 files have no distinguishing characteristics that would mark them as UTF-8 *unless* they include actual non-ASCII characters, no editor could recognize them as UTF-8 in that case. (Unless you include a BOM, of course, but your code doesn't do that.)
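A small sketch of that point, under the assumption that "ANSI" here means windows-1252: text made only of ASCII characters encodes to exactly the same bytes in both encodings, so nothing in the file can tell an editor which one was used. The class and method names below are invented for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDetectionDemo {

    // Assumption: the "ANSI" code page in question is windows-1252 (Cp1252).
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True if the text encodes to identical bytes in Cp1252 and UTF-8.
    static boolean sameBytesInBoth(String text) {
        return Arrays.equals(text.getBytes(CP1252), text.getBytes(StandardCharsets.UTF_8));
    }

    // True if the data begins with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // ASCII only: the two encodings are indistinguishable.
        System.out.println(sameBytesInBoth("Sir Y Fflint"));   // true

        // \u00E2 is a-circumflex: one byte in Cp1252, two bytes in UTF-8.
        System.out.println(sameBytesInBoth("Penarl\u00E2g"));  // false

        // Writing U+FEFF first is one way to emit a BOM.
        byte[] withBom = "\uFEFFabc".getBytes(StandardCharsets.UTF_8);
        System.out.println(startsWithUtf8Bom(withBom));        // true
    }
}
```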
hinzsa hinzsa
Greenhorn
Joined: May 14, 2009
Posts: 4
I have an ANSI file containing Welsh characters. Here is an example line:
"A55 Eb Onslip From A550 Jct 35","","Penarlâg","Sir Y Fflint","CYM"
I am trying to convert it into a UTF-8 encoded file. Once I run the test, the same line changes to:
"A55 Eb Onslip From A550 Jct 35","","Penarlâg","Sir Y Fflint","CYM"
I would appreciate any suggestions. Otherwise, thank you for the information, it was very useful.
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35242
Where are you seeing these characters - in a console? Most consoles can't handle Unicode. Or in some other program? If so, does it understand Unicode, and is it using a font that has that character?
hinzsa hinzsa
Greenhorn
Joined: May 14, 2009
Posts: 4
This is the scenario: a file with Welsh characters was uploaded through a web application to a Linux box (Red Hat).
I checked the encoding (using file --mime filename) and it is UTF-8.
The file is picked up by another Java application running on the same box, which streams it to an Oracle DB running on Windows.
Oracle then saves the CLOB content to a file on the same box for further processing. This is where I have the problem: the encoding is ANSI.
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 35242
Who says the encoding is ANSI? Textpad? Again, that needn't be correct. Have you looked at the file with a hex editor, and determined that the character is, in fact, not a UTF-8 character?
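If no hex editor is at hand, a strict decoder can do a similar check. The sketch below is one possible approach, with a made-up class name (Utf8Check) and a file path taken from the command line: it asks Java's UTF-8 decoder to report malformed input instead of silently replacing it. A result of true only means the bytes are well-formed UTF-8; pure ASCII passes too.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true if the bytes decode as well-formed UTF-8, false otherwise.
    static boolean isWellFormedUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isWellFormedUtf8(data)
                ? "well-formed UTF-8 (or plain ASCII)"
                : "not UTF-8; possibly a single-byte encoding such as Cp1252");
    }
}
```

For the character in question, "Penarlâg" stored as UTF-8 contains the byte pair C3 A2, which passes this check, whereas the Cp1252 form contains a lone E2 followed by 67 ("g"), which fails it.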