Encoding problem when writing to file system

sven coleman
Greenhorn

Joined: Nov 02, 2006
Posts: 2
Hi,

I'm generating a String containing some HTML that contains many accents. When I try to save it to a file, the accented and Cyrillic characters are mangled. I would like to be able to save a file like this one:

http://www.columbia.edu/kermit/utf8.html


I've tried various encodings for the OutputStreamWriter, MacRoman (I'm in front of a Mac for now) and UTF-8, without any success.

in String -> Study after Velázquez

in file -> Vel�zquez (MacRoman) Vel��zquez (UTF8)

From what I understood, the Mac filesystem is not UTF-8 based, so I just can't output that kind of character correctly...

I guess there is a big issue I'm missing here, but is it possible to produce UTF-8 that renders correctly on Linux/Windows/Mac, and if so, how?



Thanks
[ November 02, 2006: Message edited by: sven coleman ]
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8927

How are you viewing the file? Are you certain that the application can properly render UTF-8?
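
One way to narrow this down, sketched under assumptions: the two symptoms reported above usually have different causes. Two junk characters per accent tends to mean that correct UTF-8 bytes are being decoded with a single-byte charset, while a single ? tends to mean the character was lost at encoding time. The class and method names below are made up for illustration, and ISO-8859-1 stands in for MacRoman because it is guaranteed to exist in every JVM.

```java
import java.nio.charset.StandardCharsets;

class EncodingSymptoms {

    // Encode as UTF-8, then decode the same bytes as ISO-8859-1,
    // which is what a viewer does when it guesses the wrong charset.
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode with a charset that cannot represent the accent.
    // String.getBytes substitutes the charset's replacement byte ('?' for US-ASCII).
    static String writtenAsAscii(String text) {
        byte[] asciiBytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(asciiBytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // \u00e1 is a-acute, escaped so the source file's own encoding cannot interfere.
        String name = "Vel\u00e1zquez";
        System.out.println(utf8ReadAsLatin1(name).length()); // 10: the accent became two characters
        System.out.println(writtenAsAscii(name));            // Vel?zquez
    }
}
```

If the file shows the first pattern, the bytes on disk are probably fine and it is the viewer that needs to be told the encoding.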


sven coleman
Greenhorn

Joined: Nov 02, 2006
Posts: 2
Hi Joe,

Thanks a lot for your answer. I'm new to these encoding problems and I'm lost...
As my end target is a browser, I've tried viewing the file in Firefox, both as rendered HTML and as source. I get the same hieroglyphs.
If I look at it using jEdit, the same characters appear. I guess (?) jEdit at least is using the system default encoding (MacRoman). As for Firefox, I guess (???) it should be able to interpret it correctly if the file really were in UTF-8...

As an attempt to find a workaround, I've successfully converted my messy HTML to XHTML using TagSoup. As XML is mainly UTF-8 encoded, I naively thought third-party XML libraries could handle it for me. Special characters look great in the Eclipse console, but when I save the file using the dom4j XMLWriter:

org.dom4j.io.OutputFormat format = org.dom4j.io.OutputFormat.createCompactFormat();
format.setEncoding("UTF-8");
format.setNewlines(true);
format.setIndentSize(2);
format.setTrimText(false);

org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
        new FileWriter("/mydir/tagSoup2.html"), format);
xmlWriter.write(docXHtml);
xmlWriter.flush();
xmlWriter.close();

I get ? in place of the accents...
I'm on a Mac, and the file system's encoding isn't UTF-8... I read that this could be the problem...
Would it help if I ran the code on Windows or Linux?

Thanks
[ November 02, 2006: Message edited by: sven coleman ]
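
A possible explanation for the snippet above, offered as a sketch rather than a confirmed diagnosis: FileWriter encodes with the platform default charset, and as far as I can tell format.setEncoding("UTF-8") does not change how a Writer handed to XMLWriter turns characters into bytes. Choosing the charset at the point where characters become bytes avoids that. The example uses only java.io; the class and method names are hypothetical, and a ByteArrayOutputStream stands in for the file so the bytes can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8FileOutput {

    // Writes text to the stream as UTF-8, regardless of the platform default charset.
    // The writer (and with it the stream) is closed when the method returns.
    static void writeUtf8(OutputStream out, String text) {
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Rethrown unchecked so that callers do not have to declare IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeUtf8(buffer, "Vel\u00e1zquez");
        // Print each byte in hex; the accent should appear as the pair c3 a1.
        for (byte b : buffer.toByteArray()) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```

For a real file, pass new FileOutputStream("/mydir/tagSoup2.html") instead of the buffer. dom4j's XMLWriter also appears to offer a constructor taking an OutputStream, which would let it apply the OutputFormat encoding itself; check the dom4j Javadoc before relying on that.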
alban maillere
Greenhorn

Joined: Nov 06, 2006
Posts: 6
Hello sven,
I'm not used to Mac systems, but I can tell you that I usually resolve all accent problems (for European languages) by using ISO-8859-15 (or ISO-8859-1 if the former is not supported).

Hope it helps
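
A caveat worth checking, sketched with hypothetical class and method names: the ISO-8859-1 and ISO-8859-15 charsets cover Western European accents but not Cyrillic, which the original question also mentions. CharsetEncoder.canEncode reports whether a charset can represent a given string. ISO-8859-1 is used here because it is one of the charsets every Java platform must support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCoverage {

    // True if every character of the text can be represented in the charset.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String accented = "Vel\u00e1zquez";                    // Latin letters with one accent
        String cyrillic = "\u041c\u043e\u0441\u043a\u0432\u0430"; // a Cyrillic word, escaped

        System.out.println(canEncode(StandardCharsets.ISO_8859_1, accented)); // true
        System.out.println(canEncode(StandardCharsets.ISO_8859_1, cyrillic)); // false
        System.out.println(canEncode(StandardCharsets.UTF_8, cyrillic));      // true
    }
}
```

So a Latin-only charset may be enough for accents alone, but a page mixing accents and Cyrillic probably needs UTF-8.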


Vlado Zajac
Ranch Hand

Joined: Aug 03, 2004
Posts: 245
Filesystem support is only needed for file names. For the file's data, what matters is support in the target program. Any modern browser supports UTF-8.

But the program (browser) must somehow know the encoding of the file.
In HTML, the encoding is declared in the page itself, typically with a meta tag in the head.
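
One common form of that declaration is a meta tag naming the charset. The sketch below builds a small page containing such a tag and encodes it with the same charset, so the declaration and the actual bytes agree. The class name, method names, and markup are assumptions made for the example, not code from this thread.

```java
import java.nio.charset.StandardCharsets;

class HtmlCharsetDeclaration {

    // Wraps a body fragment in a minimal page whose head declares UTF-8.
    static String wrapInHtml(String title, String body) {
        return "<html>\n"
             + "<head>\n"
             + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n"
             + "<title>" + title + "</title>\n"
             + "</head>\n"
             + "<body>" + body + "</body>\n"
             + "</html>\n";
    }

    // The bytes must really be UTF-8, otherwise the declaration misleads the browser.
    static byte[] toUtf8(String html) {
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String page = wrapInHtml("Test", "Study after Vel\u00e1zquez");
        byte[] bytes = toUtf8(page);
        System.out.println(bytes.length - page.length()); // 1: the accent takes one extra byte
    }
}
```

The resulting bytes can then be written to a file through any OutputStream without a further encoding step.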
 