I saved a large Microsoft Word document containing many accented characters, en dashes, inverted commas, etc. as HTML, then used Java code to convert it to XML and split it into smaller pieces. I then tried to write these pieces out to the filesystem as ISO-8859-1. Since the original MS Word document is CP1252, which is supposed to be a superset of ISO-8859-1, I thought this should be straightforward. But when I reopen these files, all the accented characters, inverted commas, etc. have been converted to question marks.
I did some intensive googling and found many recommendations for resolving this problem. I tried several of them but was not able to get any of them to work.
That was it. I was so focused on the fact that ISO-8859-1 DOES support the accented characters of most European languages that I missed the fact that it DOES NOT have punctuation characters such as proper inverted commas, em dashes, en dashes, etc.
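For anyone hitting the same thing, here is a minimal sketch of how the difference can be checked in Java. CP1252 places the curly quotes and dashes in the byte range 0x80-0x9F, which ISO-8859-1 reserves for control characters, so those characters have no ISO-8859-1 encoding and `String.getBytes` substitutes `?`. The class name `EncodingCheck` and its helper methods are made up for this illustration; they are not from the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Encode the text in the given charset and decode it again,
    // which mimics writing a file and reading it back.
    static String roundTrip(String text, Charset charset) {
        // getBytes replaces unmappable characters with the charset's
        // replacement byte, which is '?' for ISO-8859-1.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // True if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9";                          // e with acute accent
        String punctuation = "\u201cquoted\u201d \u2013 dash";  // curly quotes, en dash

        // Accented letters exist in ISO-8859-1 and survive the round trip.
        System.out.println(roundTrip(accented, StandardCharsets.ISO_8859_1));

        // Curly quotes and the en dash do not, so they come back as '?'.
        System.out.println(roundTrip(punctuation, StandardCharsets.ISO_8859_1));
        System.out.println(canEncode(punctuation, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent all of them.
        System.out.println(canEncode(punctuation, StandardCharsets.UTF_8));
    }
}
```

Calling `canEncode` on each piece before writing it makes the data loss visible instead of silent. One possible way out, assuming the consumers of the files can handle it, is to write the pieces as UTF-8 (or as `Charset.forName("windows-1252")` if the output really must stay single-byte) and to declare the same encoding in the XML prolog.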
Thanks for your help, much appreciated.
subject: Problem preserving accented characters when writing text to file