I'm using the java.net.URI#getHost() method to extract the top-level domain from a URL.
But this method fails for URLs such as "http://www.schönesdresden.de/resources/internet.2jpg.jpg". It fails because of the umlaut "ö" in the host name.
You can try to use java.net.URL instead of java.net.URI. That will not make getHost() return null. It still can't handle the ö though. For some reason it gets translated from ö (int value 148) to ÷ (int value 246). This conversion already happens in the URL constructor. I've also tried the URL(String, String, String) constructor but I get the same result. The code for URL shows me that the HTTP specific URLStreamHandler is doing this.
Wasn't there some change recently in which you could have domain names in all kinds of character sets? (Goes off and searches the web...) Yes, there was. Here is an example of somebody registering such domains. However I don't know the details of how it works. There must have been an RFC for it.
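For what it's worth, these are internationalized domain names (IDNA, RFC 3490), and since Java 6 the JDK ships java.net.IDN, which converts such a host name to its ASCII ("xn--", Punycode) form. Below is a rough sketch of how that could be combined with java.net.URI, not a definitive solution. The class name IdnHostExample and the helper toAsciiHost are made up for illustration, and the sketch assumes the URL has no IPv6 literal as host.

```java
import java.net.IDN;
import java.net.URI;

public class IdnHostExample {

    // Hypothetical helper: returns the host of a URL in ASCII (Punycode) form.
    static String toAsciiHost(String url) {
        // For a non-ASCII host, URI#getHost() returns null, but the
        // authority part of the URI is still available.
        String authority = URI.create(url).getAuthority();
        if (authority == null) {
            throw new IllegalArgumentException("No authority in: " + url);
        }
        // Strip optional user info and port (IPv6 literals are not handled).
        String hostPort = authority.substring(authority.lastIndexOf('@') + 1);
        int colon = hostPort.lastIndexOf(':');
        String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
        // Convert each label of the host to its ASCII-compatible encoding.
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // \u00f6 is the umlaut "ö"; the escape avoids source-encoding issues.
        String ascii = toAsciiHost(
                "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg");
        System.out.println(ascii);
        // The top-level domain is the last dot-separated label of the host.
        System.out.println(ascii.substring(ascii.lastIndexOf('.') + 1));
    }
}
```

Once the host is plain ASCII, a URI rebuilt with it should return a non-null value from getHost(), and IDN.toUnicode can turn the host back into its readable form for display.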