| Author |
fetching top level domain from URL.
|
Minhaj Mehmood
Ranch Hand
Joined: Jan 22, 2007
Posts: 400
|
|
Hi all,
I'm using the java.net.URI#getHost() method to extract the top-level domain from a URL,
but it fails for URLs such as "http://www.schönesdresden.de/resources/internet.2jpg.jpg". It fails because of the umlaut "ö" in the host name.
Please advise.
|
SCJP6 96% | SCWCD5 81% | SCDJWS5 79%
|
 |
Sunny Bhandari
Ranch Hand
Joined: Dec 06, 2010
Posts: 446
|
|
|
Not sure about the java.net package, but you could also do that job with String operations.
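A minimal sketch of what the String-based approach could look like. It is not code from the thread: the class name `TldByString` and the helper `topLevelDomain` are hypothetical, and the parsing is deliberately naive (it assumes no user-info part such as `user@host` and no IPv6 literal in the URL).

```java
class TldByString {
    // Hypothetical helper: returns the text after the last dot of the host part,
    // or an empty string if the host contains no dot.
    static String topLevelDomain(String url) {
        // Skip the scheme ("http://") if there is one.
        int schemeEnd = url.indexOf("://");
        int hostStart = (schemeEnd < 0) ? 0 : schemeEnd + 3;

        // The host ends at the first port, path, query, or fragment delimiter.
        int hostEnd = url.length();
        for (int i = hostStart; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == ':' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }

        String host = url.substring(hostStart, hostEnd);
        int lastDot = host.lastIndexOf('.');
        return (lastDot < 0) ? "" : host.substring(lastDot + 1);
    }

    public static void main(String[] args) {
        // \u00f6 is the umlaut "ö", escaped so the source file encoding does not matter.
        String url = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";
        System.out.println(topLevelDomain(url)); // prints "de"
    }
}
```

Because this never hands the string to a URI parser, the umlaut in the host is irrelevant to it; the trade-off is that it does no validation at all.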
|
 |
Ralph Cook
Ranch Hand
Joined: May 29, 2005
Posts: 479
|
|
It would be very helpful if you would post some code and quote an error message, etc. What you say may be accurate, but it's difficult to help without knowing in more detail what is happening.
rc
|
 |
Minhaj Mehmood
Ranch Hand
Joined: Jan 22, 2007
Posts: 400
|
|
Following is my code:
|
 |
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19216
|
|
|
You can try to use java.net.URL instead of java.net.URI. That will not make getHost() return null. It still can't handle the ö though. For some reason it gets translated from ö (int value 148) to ÷ (int value 246). This conversion already happens in the URL constructor. I've also tried the URL(String, String, String) constructor but I get the same result. The code for URL shows me that the HTTP specific URLStreamHandler is doing this.
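A small sketch for comparing the two classes on the URL from the first post. The class and method names (`UriVsUrlHost`, `hostFromUri`, `hostFromUrl`) are made up for illustration, and the sketch only prints what each parser reports as the host; it does not try to explain the character conversion described above.

```java
import java.net.URI;
import java.net.URL;

class UriVsUrlHost {
    // java.net.URI: the host is undefined (null) when the authority
    // cannot be parsed as a server-based authority.
    static String hostFromUri(String s) throws Exception {
        return new URI(s).getHost();
    }

    // java.net.URL: the constructor does not validate the host characters.
    // Note: this constructor is deprecated in recent JDKs but still available.
    static String hostFromUrl(String s) throws Exception {
        return new URL(s).getHost();
    }

    public static void main(String[] args) throws Exception {
        // \u00f6 is the umlaut "ö", escaped so the source file encoding does not matter.
        String s = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";
        System.out.println("URI host: " + hostFromUri(s));
        System.out.println("URL host: " + hostFromUrl(s));
    }
}
```

When printing the result, keep in mind that the console encoding may differ from the encoding the program uses internally, so a non-ASCII character can look different on screen than it is in the String.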
|
SCJP 1.4 - SCJP 6 - SCWCD 5
|
 |
Ralph Cook
Ranch Hand
Joined: May 29, 2005
Posts: 479
|
|
Actually, I should have known the answer from the original description -- RFC 1738 specifies the characters that are allowed in URLs, and it doesn't include anything with umlauts.
"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
So it isn't working because it isn't supposed to work. You need to use a URL encoder to encode the umlauted o.
rc
|
 |
Paul Clapham
Bartender
Joined: Oct 14, 2005
Posts: 16483
|
|
Wasn't there some change recently in which you could have domain names in all kinds of character sets? (Goes off and searches the web...) Yes, there was. Here is an example of somebody registering such domains. However I don't know the details of how it works. There must have been an RFC for it.
(Looks back at results of web search...) And of course there's already a Wikipedia page about it: Internationalized domain name.
|
 |
Minhaj Mehmood
Ranch Hand
Joined: Jan 22, 2007
Posts: 400
|
|
|
I think the answer has been found. Check this: http://weblogs.java.net/blog/2007/03/29/international-domain-names
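For readers who cannot reach the link, here is a hedged sketch of the internationalized-domain-name approach using java.net.IDN (available since Java 6). It assumes the host part is already known separately from the path, and the names `IdnHostExample`, `hostViaIdn`, and `topLevelDomain` are hypothetical, not taken from the linked post.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

class IdnHostExample {
    // Converts the Unicode host to its ASCII (Punycode) form first,
    // then lets java.net.URI parse the all-ASCII result.
    static String hostViaIdn(String unicodeHost, String path) throws URISyntaxException {
        String asciiHost = IDN.toASCII(unicodeHost);
        URI uri = new URI("http", asciiHost, path, null);
        return uri.getHost();
    }

    // Returns the text after the last dot of the host.
    static String topLevelDomain(String host) {
        return host.substring(host.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) throws URISyntaxException {
        // \u00f6 is the umlaut "ö", escaped so the source file encoding does not matter.
        String host = hostViaIdn("www.sch\u00f6nesdresden.de", "/resources/internet.2jpg.jpg");
        System.out.println(host);                  // ASCII form; the umlaut label starts with "xn--"
        System.out.println(topLevelDomain(host));  // prints "de"
        System.out.println(IDN.toUnicode(host));   // converts back for display
    }
}
```

The ASCII form is what goes on the wire for DNS lookups, so it is usually the form to store and compare; IDN.toUnicode is only needed when showing the name to a person.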
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32687
|
|
Minhaj Mehmood wrote: I think the answer has been found . . .
Well done finding that solution.
|
 |