
Encoding problem when writing to file system

sven coleman

Joined: Nov 02, 2006
Posts: 2

I'm generating a String containing some HTML that contains many accents. When I try to save it to a file, the accented and Cyrillic characters are mangled. I would like to be able to save a file like this one:

I've tried various encodings for the OutputStreamWriter, MacRoman (I'm in front of a Mac for now) and UTF-8, without any success.

in String -> Study after Velázquez

in file -> Vel�zquez (MacRoman) Vel��zquez (UTF8)

From what I understood, the Mac filesystem is not UTF-8 based, so I just can't output that kind of character correctly...

I guess there is a big issue I'm missing here, but is it possible to produce some UTF-8 that would render correctly on Linux/Windows/Mac, and if so, how?

[ November 02, 2006: Message edited by: sven coleman ]
Joe Ess

Joined: Oct 29, 2001
Posts: 8997

How are you viewing the file? Are you certain that the application can properly render UTF-8?

[How To Ask Questions On JavaRanch]
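One way to separate "the file is wrong" from "the viewer is decoding it wrong" is to look at the raw bytes. The sketch below is only an illustration (the class and method names are made up for this example): it encodes the sample string as UTF-8, prints the bytes as hex, and then decodes them with the wrong charset to show how one accented letter turns into two junk characters. ISO-8859-1 stands in for the "wrong" charset here because it is guaranteed to exist in every JVM; MacRoman would misread the same bytes in a similar way.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Formats bytes as space-separated upper-case hex, e.g. "C3 A1". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Velázquez", written with an escape so the source file's own encoding cannot interfere
        String text = "Vel\u00e1zquez";

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8)); // 56 65 6C C3 A1 7A 71 75 65 7A

        // Decoding UTF-8 bytes with a single-byte charset splits the two-byte "á" into two characters
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length() + " chars instead of " + text.length()); // 10 chars instead of 9

        // Decoding with the charset that was used for encoding round-trips cleanly
        String ok = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(text)); // true
    }
}
```

If a hex dump of the saved file shows C3 A1 where the á should be, the file itself is valid UTF-8 and the problem is on the viewing side.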
sven coleman

Joined: Nov 02, 2006
Posts: 2
Hi Joe,

Thanks a lot for your answer. I'm new to these encoding problems and I'm lost with them...
As my end target is a browser, I've tried to view the file in Firefox, both as rendered HTML and as source. I get the same hieroglyphs.
If I look at it using JEdit, the same characters appear. I guess (?) JEdit at least is using the system default encoding (MacRoman). As for Firefox, I guess (???) it should be able to interpret it correctly if the file were really in UTF-8...

As an attempt to find a workaround, I've successfully converted my messy HTML to XHTML using TagSoup. As XML is mainly UTF-8 encoded, I naively thought 3rd party XML libraries could handle it for me. Special characters look great in the Eclipse console, but when I save the file using the DOM4J XMLWriter:

format = ...;
format.setTrimText(false);
xmlWriter = new XMLWriter(new FileWriter("/mydir/tagSoup2.html"), format);


I get ? in place of the accents...
I'm on a Mac and the file system's encoding isn't UTF-8... I read that this could be the problem...
Would it help if I ran the code on Windows or Linux?

[ November 02, 2006: Message edited by: sven coleman ]
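A likely explanation for the ? characters: a FileWriter built from just a file name always encodes with the platform default charset (MacRoman on the machine described above) and substitutes ? for anything that charset cannot represent. A minimal sketch of the usual alternative, using only the JDK and assuming nothing about the rest of the program, is to wrap an output stream in an OutputStreamWriter with an explicit charset. The class name, helper name, and temp file are hypothetical; the resulting Writer could presumably be handed to XMLWriter in place of the FileWriter.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileWrite {
    /** Writes text to path as UTF-8, regardless of the platform default charset. */
    static void writeUtf8(Path path, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(Files.newOutputStream(path), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; a temp file keeps the example self-contained
        Path path = Files.createTempFile("tagSoup", ".html");

        // Accented Latin plus Cyrillic ("Москва"), escaped to keep the source file ASCII-only
        String text = "Study after Vel\u00e1zquez, \u041c\u043e\u0441\u043a\u0432\u0430";
        writeUtf8(path, text);

        // Read the bytes back as UTF-8 to confirm nothing was replaced by '?'
        String readBack = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        System.out.println(readBack.equals(text)); // true

        Files.delete(path);
    }
}
```

The operating system does not matter for the file contents: the same code produces the same bytes on Mac, Windows, or Linux, because the charset is named explicitly instead of being taken from the platform.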
alban maillere

Joined: Nov 06, 2006
Posts: 6
Hello sven,
I'm not used to Mac systems, but I can tell you I usually resolve all accent problems (for European languages) by using ISO-8859-15 (or ISO-8859-1 if the former is not supported).

Hope it helps

while(true){ this.put(BeerFactory.newInstance()); }
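A caveat on the ISO-8859 suggestion: those charsets cover Western European accents but not Cyrillic, which the original question also mentions. A quick way to check whether a charset can hold a given string is CharsetEncoder.canEncode. The sketch below is illustrative only (class and method names are made up) and uses ISO-8859-1, which every JVM is required to support; ISO-8859-15 differs from it in only a handful of characters, such as the euro sign.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CanEncodeCheck {
    /** Returns true if every character of text is representable in the given charset. */
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String accented = "Vel\u00e1zquez";                       // "Velázquez"
        String cyrillic = "\u041c\u043e\u0441\u043a\u0432\u0430"; // "Москва"

        System.out.println(fits(accented, StandardCharsets.ISO_8859_1)); // true
        System.out.println(fits(cyrillic, StandardCharsets.ISO_8859_1)); // false
        System.out.println(fits(cyrillic, StandardCharsets.UTF_8));      // true
    }
}
```

Since UTF-8 can encode both kinds of text, it is the safer choice when a single file mixes accented Latin and Cyrillic characters.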
Vlado Zajac
Ranch Hand

Joined: Aug 03, 2004
Posts: 245
Filesystem support is only needed for file names. For file data, support in the target program is what matters. Any modern browser supports UTF-8.

But the program (browser) must somehow know the encoding of the file.
In HTML, encoding is specified this way.
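The snippet that originally followed this sentence did not survive in the thread. A common way to declare the encoding is a meta element in the document head, so the sketch below is an assumption about what was meant, not the poster's original code. It builds a small page carrying that declaration and writes it out as UTF-8 bytes; the class name, helper name, and temp file are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HtmlWithCharset {
    /** Wraps body in a minimal HTML page whose head declares UTF-8. */
    static String page(String body) {
        return "<html>\n<head>\n"
             + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n"
             + "</head>\n<body>\n" + body + "\n</body>\n</html>\n";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output file; a temp file keeps the example self-contained
        Path path = Files.createTempFile("velazquez", ".html");

        // The bytes on disk must really be UTF-8, otherwise the declaration is wrong
        Files.write(path, page("Study after Vel\u00e1zquez").getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + path);
    }
}
```

The declaration and the actual byte encoding have to agree: a page that claims UTF-8 but was written in MacRoman will be mangled just as badly as one with no declaration at all.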