I saved a large Microsoft Word document containing many accented characters, en dashes, inverted commas, etc. as HTML, then used Java code to convert it to XML and split it up into smaller pieces. I then tried to write these pieces out to the filesystem as ISO-8859-1. Since the original MS Word document is CP1252, which is supposed to be a superset of ISO-8859-1, I thought this should be straightforward. But when I reopen these files, all the accented characters, inverted commas, etc. have been converted to question marks.
I did some intensive googling and found many recommendations for resolving this problem. I tried several of them, but was not able to get any of them to work.
That was it. I was so focused on the fact that ISO-8859-1 DOES support the accented characters of most European languages that I missed the fact that it DOES NOT have punctuation characters such as proper inverted commas, em dashes, en dashes, etc.
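For anyone who finds this thread later, here is a minimal sketch of how to check this from Java. The class and method names (CharsetCheck, roundTrip, canEncode) are made up for illustration and are not from the code discussed above. It assumes the text is already held correctly in a Java String, and it uses UTF-8 as one possible target encoding that can hold all of these characters. Whether UTF-8 suits whatever reads the files afterwards is a separate question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Encode the text to bytes and decode it again with the same charset,
    // which mimics writing a file and reopening it.
    // String.getBytes(Charset) silently substitutes the charset's
    // replacement byte ('?' for ISO-8859-1) for unmappable characters.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    // True if every character in the text can be represented in the charset.
    static boolean canEncode(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9";                    // e with acute accent
        String punct = "\u201cquoted\u201d \u2013 dash";  // curly quotes, en dash

        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;

        System.out.println(canEncode(accented, latin1)); // accented letters fit
        System.out.println(canEncode(punct, latin1));    // Word punctuation does not
        System.out.println(roundTrip(punct, latin1));    // shows the question marks
        System.out.println(roundTrip(punct, utf8).equals(punct));
    }
}
```

Running canEncode over the text before writing each piece would flag the problem up front instead of leaving question marks in the output files.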
Thanks for your help, much appreciated.
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com
subject: Problem preserving accented characters when writing text to file