
Identifying Japanese Characters

 
Gaurav Mac Mathur
Ranch Hand
Posts: 47
How can we identify whether a String contains Japanese characters? One way is to go through it character by character and check whether each one lies in 30A0-30FF or 3040-309F, but this is crude. Is there any facility available in Java to do this?
Cheers
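
For reference, the character-by-character check described in the question could be sketched roughly as below. The class and method names (`KanaRangeCheck`, `isKana`, `containsKana`) are made up for illustration. Note that the two ranges mentioned cover only hiragana (3040-309F) and katakana (30A0-30FF), so a String written purely in kanji would not match.

```java
// Sketch only: names are hypothetical, and only the two ranges from the question are checked.
class KanaRangeCheck {

    // True if c lies in the Hiragana (U+3040-U+309F) or Katakana (U+30A0-U+30FF) range.
    static boolean isKana(char c) {
        return (c >= '\u3040' && c <= '\u309F')
            || (c >= '\u30A0' && c <= '\u30FF');
    }

    // True if at least one char of s is kana.
    static boolean containsKana(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isKana(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsKana("Hello"));
        // Hiragana "konnichiwa", written with escapes to avoid source-encoding problems.
        System.out.println(containsKana("\u3053\u3093\u306B\u3061\u306F"));
    }
}
```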
 
Cindy Glass
"The Hood"
Sheriff
Posts: 8521
If there were such a facility, it would have to examine each character anyway. However, I do not know of one, so you would probably be best off doing it this way yourself.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
You could do something like this:
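
The snippet that originally followed is not preserved here. As a minimal sketch of the kind of check this reply goes on to discuss, assuming `Character.UnicodeBlock.of(char)` is used to classify each char: the class name `JapaneseBlockCheck`, the method name, and the starting list of three blocks are all assumptions, and the list is deliberately incomplete.

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashSet;
import java.util.Set;

// Sketch only: the block list is an assumed starting point, not a complete one.
class JapaneseBlockCheck {

    private static final Set<UnicodeBlock> JAPANESE_BLOCKS = new HashSet<>();
    static {
        JAPANESE_BLOCKS.add(UnicodeBlock.HIRAGANA);
        JAPANESE_BLOCKS.add(UnicodeBlock.KATAKANA);
        // Shared with Chinese and Korean text, so Chinese input will match too.
        JAPANESE_BLOCKS.add(UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS);
    }

    // True if at least one char of s belongs to one of the listed blocks.
    static boolean containsJapanese(String s) {
        for (int i = 0; i < s.length(); i++) {
            // UnicodeBlock.of returns null for chars in no defined block;
            // contains(null) is simply false here.
            if (JAPANESE_BLOCKS.contains(UnicodeBlock.of(s.charAt(i)))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsJapanese("plain ASCII"));
        System.out.println(containsJapanese("\u65E5\u672C\u8A9E")); // kanji for "Japanese language"
    }
}
```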

You'll have to look at the API for Character.UnicodeBlock to make a more complete list of blocks. Many of the chars you need are in the various CJK blocks (Chinese/Japanese/Korean unified), which means those blocks may contain some chars that aren't really appropriate for Japanese-only usage. I think. This is a muddy issue which I don't understand well. I think you'll need to test with a lot of data, and consult people who know the language well (assuming you do not), to be sure your list of chars is appropriate.
 
Gaurav Mac Mathur
Ranch Hand
Posts: 47
Thanks Jim,
This was exactly what I wanted to do.

Cheers
Gaurav
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
"This was exactly what I wanted to do."
Well, make sure you've gone through the whole list of "CJK" blocks before you say that. I have no idea what they all do, but if they say CJK they probably have something to do with Japanese...
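
One way to go through the blocks against real data, rather than guessing from the names, is to print which blocks actually occur in some sample Japanese text. A rough sketch, with a hypothetical `BlockSurvey` class and a tiny made-up sample; real testing would need far more text:

```java
import java.lang.Character.UnicodeBlock;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch only: a diagnostic helper, not a Japanese detector.
class BlockSurvey {

    // Returns the Unicode blocks seen in s, in order of first appearance.
    static Set<UnicodeBlock> blocksIn(String s) {
        Set<UnicodeBlock> seen = new LinkedHashSet<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            UnicodeBlock block = UnicodeBlock.of(cp);
            if (block != null) { // null means the code point is in no defined block
                seen.add(block);
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return seen;
    }

    public static void main(String[] args) {
        // One Latin letter, one hiragana, one katakana, one kanji.
        String sample = "A\u3042\u30A2\u6F22";
        for (UnicodeBlock block : blocksIn(sample)) {
            System.out.println(block);
        }
    }
}
```

Any block that turns up in the sample data but is missing from your list is a candidate to add.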
 