Hi all, I have two text files containing non-English text (Hindi, for instance), and I need to compare their contents. Can anybody help me with how to do this in Java? Thanks, Naresh
Why would comparing non-English text be any different from comparing English text? Java uses Unicode internally, so once the text is in memory (decoded with the correct character encoding), it's all the same anyway.
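To illustrate the point about text being "all the same" once it is in memory, here is a minimal sketch that decodes each file with an explicitly named charset and then compares the resulting strings. The class name `TextFileCompare`, the method `sameText`, and the choice of UTF-8 in `main` are assumptions for illustration; use whatever encoding the files were actually saved in.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TextFileCompare {

    /** Decodes each file with its own charset and compares the decoded text. */
    static boolean sameText(Path a, Charset charsetA, Path b, Charset charsetB)
            throws IOException {
        String textA = new String(Files.readAllBytes(a), charsetA);
        String textB = new String(Files.readAllBytes(b), charsetB);
        return textA.equals(textB);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TextFileCompare <file1> <file2>");
            return;
        }
        // Assumption: both files are UTF-8 encoded.
        boolean same = sameText(Paths.get(args[0]), StandardCharsets.UTF_8,
                                Paths.get(args[1]), StandardCharsets.UTF_8);
        System.out.println(same ? "Same text" : "Different text");
    }
}
```

Because the comparison happens on decoded strings, two files holding the same Hindi text in different encodings (say, UTF-8 and UTF-16) would still compare equal, provided each is read with its own charset.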
The char data type in Java is a 16-bit UTF-16 code unit. It can hold Hindi (Devanagari) characters just as well as English (Latin-1) characters, since both fall within Unicode's Basic Multilingual Plane. There should be no difference in handling these character sets.
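A quick way to check the claim about `char` is to compare a string's length in `char` units with its number of Unicode code points. This is only a small sketch; the class name `CharDemo` and the method `fitsInSingleChars` are made up for illustration, and the sample strings are written as Unicode escapes so the source file's own encoding does not matter.

```java
public class CharDemo {

    /** Returns true if every code point in s occupies exactly one char. */
    static boolean fitsInSingleChars(String s) {
        return s.length() == s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "namaste" in Devanagari, six code points in the U+0900 block
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947";
        String english = "hello";

        System.out.println("Hindi fits in single chars: " + fitsInSingleChars(hindi));
        System.out.println("English fits in single chars: " + fitsInSingleChars(english));
    }
}
```

Characters outside the Basic Multilingual Plane (many emoji, for example) take two `char` values each, but `String.equals` still compares them correctly, so this does not change how the comparison is written.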
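How exactly do you need to compare the files? Do you just have to check whether they are exactly the same or not? If that's the case, you don't need to worry about character encodings at all; you can just read the files byte by byte and compare the bytes. (This assumes both files were saved in the same encoding; the same text saved in two different encodings will not match byte for byte.)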
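For the "exactly the same or not" case, a byte-by-byte check could look something like the sketch below. The class name `ByteFileCompare` and the method `sameBytes` are hypothetical names chosen for this example, not anything from a library.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ByteFileCompare {

    /** Returns true if both files contain exactly the same sequence of bytes. */
    static boolean sameBytes(Path a, Path b) throws IOException {
        // Files of different sizes cannot be identical, so skip the read.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false; // first differing byte found
                }
            }
            // Both streams must be exhausted at the same point.
            return inB.read() == -1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ByteFileCompare <file1> <file2>");
            return;
        }
        boolean same = sameBytes(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "Files are identical" : "Files differ");
    }
}
```

On Java 12 or later, `Files.mismatch(a, b)` does the same job in one call: it returns -1 when the files are identical and otherwise the position of the first differing byte.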