
How do I read a file containing non-English text?

 
Vasudhaiv Naresh
Ranch Hand
Posts: 57
Hi All,
I have two text files containing non-English text (Hindi, for instance), and I need to compare their contents. Can anybody help me with how to do this in Java?
Thanks,
Naresh
 
Ulf Dittmer
Rancher
Posts: 42967
Why would comparing non-English text be any different from comparing English text? Java uses Unicode internally, so once the text is in memory, it's all the same anyway.
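
To make that concrete, here is a minimal sketch of reading both files into memory as text and comparing them. It assumes both files are encoded as UTF-8, which the thread does not state; pass whatever charset the files were actually saved in. The class name TextFileCompare and the method sameText are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class TextFileCompare {

    // Decodes both files with the given charset and compares them line by line.
    // Files.readAllLines throws an IOException (MalformedInputException)
    // if a file's bytes are not valid in that charset.
    static boolean sameText(Path first, Path second, Charset charset) throws IOException {
        List<String> a = Files.readAllLines(first, charset);
        List<String> b = Files.readAllLines(second, charset);
        return a.equals(b);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: both files are UTF-8; change this if they are not.
        Charset charset = StandardCharsets.UTF_8;
        boolean same = sameText(Paths.get(args[0]), Paths.get(args[1]), charset);
        System.out.println(same ? "Files contain the same text" : "Files differ");
    }
}
```

Once decoded, the Hindi text is just ordinary String data, so List.equals and String.equals work exactly as they do for English text. Because the comparison is line by line, differences in line terminators (for example \n versus \r\n) are ignored.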
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15272
The char data type in Java holds a 16-bit Unicode (UTF-16) code unit. It can contain Hindi characters as well as English (Latin-1) characters, so there should be no difference in handling these character sets.

How exactly do you need to compare the files? Do you just have to check if they are exactly the same or not? If that's the case, you don't need to worry about character encodings at all; you can just read the files byte by byte and compare the bytes.
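
A minimal sketch of that byte-by-byte check, assuming the files are small enough to load into memory in one go. The class name ByteFileCompare and the method sameBytes are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class ByteFileCompare {

    // Returns true only if both files contain exactly the same bytes.
    // No character decoding happens, so the encoding does not matter here.
    static boolean sameBytes(Path first, Path second) throws IOException {
        // Cheap early exit: files of different sizes cannot be identical.
        if (Files.size(first) != Files.size(second)) {
            return false;
        }
        // Assumption: the files fit comfortably in memory.
        return Arrays.equals(Files.readAllBytes(first), Files.readAllBytes(second));
    }

    public static void main(String[] args) throws IOException {
        boolean same = sameBytes(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "Files are identical" : "Files differ");
    }
}
```

Note that this only answers "are the files exactly the same?". Two files holding the same Hindi text in different encodings (say, UTF-8 and UTF-16) would be reported as different, so in that case the text has to be decoded before comparing.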
 
Vasudhaiv Naresh
Ranch Hand
Posts: 57
Thanks, that helps me.
 