JavaRanch » Java Forums » Java » Java in General

Problem preserving accented characters when writing text to file

Chris Gage
Greenhorn

Joined: Mar 23, 2005
Posts: 17

I saved a large Microsoft Word document containing many accented characters, en dashes, inverted commas, etc., as HTML. Then, using Java code, I converted it to XML and split it into smaller pieces. I then tried to write these pieces out to the filesystem as ISO-8859-1. Since the original MS Word document is CP1252, which is supposed to be a superset of ISO-8859-1, I thought this would be straightforward. But when I reopen these files, all the accented characters, inverted commas, etc. have been converted to question marks.
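A minimal sketch of the behaviour described above, assuming the text has already been read into a Java `String` correctly (the original code is not shown in the thread). `String.getBytes(Charset)` never throws; it replaces every character the target charset cannot represent with that charset's replacement byte, which for ISO-8859-1 is `?`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Latin1Demo {
    // Encodes text as ISO-8859-1 and decodes it again, mimicking a write followed by a read.
    static String roundTripLatin1(String text) {
        // getBytes(Charset) never throws: every character ISO-8859-1 cannot represent
        // is replaced by the charset's default replacement byte, '?' (0x3F).
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e acute) is in ISO-8859-1; U+2013 (en dash) and U+201C/U+201D (curly quotes) are not.
        String sample = "caf\u00e9 \u2013 \u201cquoted\u201d";
        String result = roundTripLatin1(sample);
        // Compare instead of printing the text, so the console's own encoding does not matter.
        System.out.println(result.equals("caf\u00e9 ? ?quoted?")); // true
    }
}
```

Only the en dash and the two curly quotes are lost; the é survives. If accented letters also come out as `?`, one possible cause is that the HTML was decoded with the wrong charset when it was read in, so those characters were already damaged before the write.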

I did some intensive googling and found many recommendations for resolving this problem, but none of the several I tried worked.

Here is my last iteration (still unsuccessful):



What am I doing wrong?
Ramon Anger
Ranch Hand

Joined: Apr 19, 2011
Posts: 56

Could you please provide a sample string or your getInput() method?


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

The problem is that many of the characters you mentioned (curly inverted commas, en dashes, em dashes) are not defined in ISO-8859-1, so they are replaced with question marks when the text is encoded. You should choose a more suitable encoding. (I would suggest UTF-8.)
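Following the UTF-8 suggestion, a minimal sketch of writing each piece with an explicit charset, assuming the pieces are held as `String`s. `Utf8Writer`, `writeUtf8` and `readUtf8` are hypothetical names. Passing the charset explicitly means the platform default encoding is never involved, and reading the file back with the same charset returns the original text.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8Writer {
    // Writes text to the file as UTF-8, which can encode every Unicode character.
    static void writeUtf8(Path file, String text) throws IOException {
        // The charset is passed explicitly, so the platform default is not used.
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Reads the file back, decoding it with the same charset it was written in.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fragment", ".xml");
        String sample = "caf\u00e9 \u2013 \u201cquoted\u201d";
        writeUtf8(file, sample);
        System.out.println(readUtf8(file).equals(sample)); // true
        Files.delete(file);
    }
}
```

Since the pieces are XML, the encoding named in the XML declaration (for example `<?xml version="1.0" encoding="UTF-8"?>`) should match the charset actually used to write the file; otherwise a parser may decode it incorrectly.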
Chris Gage
Greenhorn

Joined: Mar 23, 2005
Posts: 17

That was it. I was so focused on the fact that ISO-8859-1 DOES support the accented characters of most Western European languages that I missed the fact that it DOES NOT have punctuation characters such as proper (curly) inverted commas, em dashes, en dashes, etc.

Thanks for your help, much appreciated.
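To find the problem characters before writing anything, a sketch using `CharsetEncoder.canEncode`, plus a strict encoder that throws `CharacterCodingException` instead of silently substituting `?`. It assumes the JDK provides the `windows-1252` charset (OpenJDK and Oracle JDKs do, but the Java specification only guarantees a few charsets such as UTF-8 and ISO-8859-1). `EncodingCheck` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EncodingCheck {
    // Lists the chars in text that the charset cannot represent. It checks one char at a
    // time, so it is only reliable for characters in the Basic Multilingual Plane.
    static List<Character> unmappable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<Character> missing = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    // Encodes text, throwing CharacterCodingException on the first unmappable character
    // instead of silently writing '?' the way String.getBytes(Charset) does.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // REPORT is already the default for
                .onUnmappableCharacter(CodingErrorAction.REPORT); // a new encoder; set here for clarity
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 \u2013 \u201cquoted\u201d";
        // 3: the en dash and both curly quotes are missing from ISO-8859-1
        System.out.println(unmappable(sample, StandardCharsets.ISO_8859_1).size());
        // 0: windows-1252 maps all of them (assumes the JDK ships this charset)
        System.out.println(unmappable(sample, Charset.forName("windows-1252")).size());
        try {
            encodeStrict(sample, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException e) {
            System.out.println("Cannot encode as ISO-8859-1: " + e);
        }
    }
}
```

This also explains why CP1252 only looks like a superset: it places the curly quotes and dashes in bytes 0x80 to 0x9F, which ISO-8859-1 reserves for control codes, so text that uses them cannot be written as ISO-8859-1.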
 