The readChar() method of DataInputStream reads exactly two bytes and treats them as a single 16-bit Unicode (UTF-16) char. The problem is, most text files aren't stored that way - they're usually in your system's default encoding. On Windows in the Americas and Western Europe this is usually Cp1252 (windows-1252), which is Microsoft's variant of the Latin-1 encoding (a superset of ASCII). It's a one-byte encoding - which means that the DataInputStream is grabbing two Cp1252 characters and reinterpreting them as one Unicode char, which results in gibberish. Instead of DataInputStream, try a FileReader wrapped in a BufferedReader.
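Something along these lines should do it. This is only a minimal sketch: the class name ReadTextFile, the readLines() helper and the "input.txt" file name are placeholders I made up, so substitute your own.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadTextFile {

    // Reads every line of a text file. FileReader decodes the bytes using
    // the default charset; BufferedReader splits the result into lines.
    public static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            // readLine() returns null at end of file
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a hypothetical default; pass a real path as the first argument
        String fileName = args.length > 0 ? args[0] : "input.txt";
        for (String line : readLines(fileName)) {
            System.out.println(line);
        }
    }
}
```

If the file is in some encoding other than the default, an InputStreamReader around a FileInputStream lets you name the charset explicitly in place of the FileReader.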
The FileReader takes care of translating the system default encoding into characters, and the BufferedReader takes care of reading one line at a time. What you do with each line you've read is up to you...
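If you're curious why readChar() produced gibberish in the first place, here's a small sketch you can run. The class name ReadCharDemo and the readFirstChar() helper are made up for illustration; it feeds the two one-byte characters "Hi" to a DataInputStream and shows that they come back fused into a single char.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadCharDemo {

    // Reads one char the way DataInputStream does: two bytes, high byte first.
    public static char readFirstChar(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readChar();
        }
    }

    public static void main(String[] args) throws IOException {
        // "Hi" is the two bytes 0x48 0x69 in ASCII (and in Cp1252)
        byte[] bytes = "Hi".getBytes(StandardCharsets.US_ASCII);
        char c = readFirstChar(bytes);
        // Both bytes are combined into one 16-bit value: prints U+4869
        System.out.printf("U+%04X%n", (int) c);
    }
}
```

U+4869 is a CJK ideograph, nothing like the 'H' and 'i' that were actually in the data.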
"I'm not back." - Bill Harding, Twister
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com