It might be a character encoding problem, depending on how the text is stored in the text files. Once you pull it into Java Strings, though, everything is Unicode (UTF-16 internally), and each of those special characters should normally be a single char, not a combination of two. How exactly do you read the input files? With a Reader, an InputStream, or some combination of the two? And do you specify a charset anywhere when you do?
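
To illustrate what I suspect is happening, here is a minimal sketch. I am assuming the file is stored as UTF-8 and is being decoded with a single-byte charset such as ISO-8859-1; I do not know that this matches your setup. The class name `EncodingDemo` and the helper `decode` are made up for the example, and the byte array stands in for your file contents.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: decodes raw bytes into a String through a Reader
    // that is given an explicit charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) stored as UTF-8 is the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoded with the matching charset: one char.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Decoded with the wrong charset: each byte becomes its own char.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // prints 1
        System.out.println(wrong.length()); // prints 2
    }
}
```

If your code builds a Reader without naming a charset, or converts bytes to chars by hand from an InputStream, that would be the first place I would look.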