


fetching top level domain from URL.

Minhaj Mehmood
Ranch Hand

Joined: Jan 22, 2007
Posts: 400

Hi All,

I'm using the java.net.URI#getHost() method to extract the top-level domain from a URL,
but it fails for URLs such as "http://www.schönesdresden.de/resources/internet.2jpg.jpg". It fails because of the umlaut "ö" in the host name.

Please Advise.


SCJP6 96% | SCWCD5 81% | SCDJWS5 79%
Sunny Bhandari
Ranch Hand

Joined: Dec 06, 2010
Posts: 448

I'm not sure about the java.net package, but you can also do this with plain String operations.
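
A minimal sketch of the String-operations approach, in case it helps. The class and method names (HostByStringOps, hostOf, lastLabel) are made up for illustration and do not come from any library. It assumes an absolute URL of the form scheme://host/path, does not handle IPv6 literals, and treats the last dot-separated label as the top-level domain.

```java
public class HostByStringOps {

    // Hypothetical helper: returns the host part of a URL such as "http://host/path".
    static String hostOf(String url) {
        int schemeEnd = url.indexOf("://");
        int start = (schemeEnd < 0) ? 0 : schemeEnd + 3;

        // The authority ends at the first '/', '?' or '#', or at the end of the string.
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        String authority = url.substring(start, end);

        // Drop an optional "user@" prefix and ":port" suffix (IPv6 literals are not handled).
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }
        int colon = authority.lastIndexOf(':');
        if (colon >= 0) {
            authority = authority.substring(0, colon);
        }
        return authority;
    }

    // Hypothetical helper: returns the text after the last dot, or the whole host if there is none.
    static String lastLabel(String host) {
        int dot = host.lastIndexOf('.');
        return (dot < 0) ? host : host.substring(dot + 1);
    }

    public static void main(String[] args) {
        // \u00f6 is the letter ö, written as an escape so the source file encoding does not matter.
        String url = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";
        String host = hostOf(url);
        System.out.println(host + " -> " + lastLabel(host));
    }
}
```

Because this never goes through java.net parsing, the umlaut is simply carried along as an ordinary character.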


Java Experience
Ralph Cook
Ranch Hand

Joined: May 29, 2005
Posts: 479
It would be very helpful if you would post some code and quote an error message, etc. What you say may be accurate, but it's difficult to help without knowing in more detail what is happening.

rc
Minhaj Mehmood
Ranch Hand

Joined: Jan 22, 2007
Posts: 400

Following is my code:

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19697
    

You can try to use java.net.URL instead of java.net.URI. That will not make getHost() return null. It still can't handle the ö though. For some reason it gets translated from ö (int value 148) to ÷ (int value 246). This conversion already happens in the URL constructor. I've also tried the URL(String, String, String) constructor but I get the same result. The code for URL shows me that the HTTP specific URLStreamHandler is doing this.
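
One way to check what each class really returns, independent of the console or source-file encoding, is to print numeric code points rather than the characters themselves. The sketch below is only an illustration: HostCodePoints and describeNonAscii are hypothetical names. For reference, ö is U+00F6 (decimal 246) in Unicode, while 148 is its position in older DOS code pages such as CP437 and CP850, where 246 is ÷. So the apparent change could come from how the output is displayed, though that is an assumption about the cause and not something confirmed in this thread.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class HostCodePoints {

    // Hypothetical helper: lists every non-ASCII character in s as "U+XXXX (decimal)".
    static String describeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> cp > 127)
         .forEach(cp -> sb.append(String.format("U+%04X (%d) ", cp, cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) throws MalformedURLException {
        // \u00f6 is the letter ö, written as an escape so the source file encoding does not matter.
        String spec = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";

        // java.net.URI: the case from the original question.
        System.out.println("URI host: " + URI.create(spec).getHost());

        // java.net.URL: this constructor is deprecated in recent JDKs but still available.
        String urlHost = new URL(spec).getHost();
        System.out.println("URL host: " + describeNonAscii(urlHost));
    }
}
```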


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Ralph Cook
Ranch Hand

Joined: May 29, 2005
Posts: 479
Actually, I should have known the answer from the original description -- RFC 1738 specifies the characters that are allowed in URLs, and it doesn't include anything with umlauts.


"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."


So it isn't working because it isn't supposed to work. You need to use a URL encoder to encode the umlauted o.
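
For illustration, this is roughly what percent-encoding the umlaut looks like with java.net.URLEncoder. The class name PercentEncodeSketch and its encode helper are hypothetical. Note that URLEncoder produces form encoding (application/x-www-form-urlencoded), which is normally applied to query values; whether a percent-encoded host name is accepted by a given resolver or server is not something this sketch establishes. The Charset overload used here assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodeSketch {

    // Hypothetical helper: percent-encodes s using its UTF-8 bytes.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00f6 is the letter ö; in UTF-8 it is the two bytes C3 B6.
        System.out.println(encode("sch\u00f6nesdresden")); // prints sch%C3%B6nesdresden
    }
}
```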

rc
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Wasn't there some change recently in which you could have domain names in all kinds of character sets? (Goes off and searches the web...) Yes, there was. Here is an example of somebody registering such domains. However I don't know the details of how it works. There must have been an RFC for it.

(Looks back at results of web search...) And of course there's already a Wikipedia page about it: Internationalized domain name.
Minhaj Mehmood
Ranch Hand

Joined: Jan 22, 2007
Posts: 400

I think the answer has been found. Check this: http://weblogs.java.net/blog/2007/03/29/international-domain-names
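
A rough sketch of the internationalized-domain-name approach, using java.net.IDN (available since Java 6) to convert the host to its ASCII "xn--" form. The class IdnHostSketch and its helpers are hypothetical names, and the fallback to getAuthority() assumes that java.net.URI returns a null host for the non-ASCII name, as described earlier in this thread. The last dot-separated label is treated as the top-level domain.

```java
import java.net.IDN;
import java.net.URI;

public class IdnHostSketch {

    // Hypothetical helper: returns the host of url in its ASCII (Punycode) form.
    static String asciiHost(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            // Fall back to the raw authority and strip "user@" and ":port" by hand.
            String authority = uri.getAuthority();
            if (authority == null) {
                throw new IllegalArgumentException("No authority in " + url);
            }
            int at = authority.lastIndexOf('@');
            if (at >= 0) {
                authority = authority.substring(at + 1);
            }
            int colon = authority.lastIndexOf(':');
            if (colon >= 0) {
                authority = authority.substring(0, colon);
            }
            host = authority;
        }
        // Converts each non-ASCII label to "xn--..." and leaves ASCII labels unchanged.
        return IDN.toASCII(host);
    }

    // Hypothetical helper: returns the text after the last dot of the host.
    static String topLevelLabel(String host) {
        int dot = host.lastIndexOf('.');
        return (dot < 0) ? host : host.substring(dot + 1);
    }

    public static void main(String[] args) {
        // \u00f6 is the letter ö, written as an escape so the source file encoding does not matter.
        String url = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";
        String host = asciiHost(url);
        System.out.println("ASCII host:   " + host);
        System.out.println("Unicode host: " + IDN.toUnicode(host));
        System.out.println("Last label:   " + topLevelLabel(host));
    }
}
```

Once the host is in ASCII form, it can be substituted back into the URL string so that URI#getHost() sees only characters it accepts.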
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39044
    
Minhaj Mehmood wrote: I think the answer has been found . . .
Well done finding that solution.
 