Forum: Java in General

Fetching the top-level domain from a URL

Minhaj Mehmood

Ranch Hand

Posts: 400

posted 13 years ago

Hi All,

I'm using the java.net.URI#getHost() method to extract the top-level domain from a URL, but it fails for URLs such as "http://www.schönesdresden.de/resources/internet.2jpg.jpg". The failure is caused by the umlaut "ö" in the host name.

Please advise.

SCJP6 96% | SCWCD5 81% | SCDJWS5 79%

Sunny Bhandari

Ranch Hand

Posts: 448

posted 13 years ago

I'm not sure about the java.net package, but you can also do this job with plain String operations.
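As a rough illustration of the String-based approach, here is a minimal sketch. The class and method names (HostByStringOps, hostOf, lastLabel) are made up for this example, and it assumes a simple "scheme://host/path" style URL with no IPv6 literal in the host. It does no validation, so treat it as a starting point rather than a complete parser.

```java
// Hypothetical helper, not from any library: pulls the host out of a URL
// string using only String operations, so non-ASCII host names pass through.
final class HostByStringOps {

    // Returns the text between "://" and the first '/', '?' or '#',
    // with any "user@" prefix and ":port" suffix removed.
    static String hostOf(String url) {
        int schemeEnd = url.indexOf("://");
        int start = schemeEnd < 0 ? 0 : schemeEnd + 3;
        int end = url.length();
        for (char stop : new char[] {'/', '?', '#'}) {
            int i = url.indexOf(stop, start);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        String authority = url.substring(start, end);
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }
        // Assumption: no IPv6 literal, so the last ':' starts the port.
        int colon = authority.lastIndexOf(':');
        if (colon >= 0) {
            authority = authority.substring(0, colon);
        }
        return authority;
    }

    // Returns the last dot-separated label of the host, e.g. "de".
    static String lastLabel(String host) {
        int dot = host.lastIndexOf('.');
        return dot < 0 ? host : host.substring(dot + 1);
    }

    public static void main(String[] args) {
        // \u00f6 is the umlaut "ö", escaped to avoid source-encoding issues.
        String url = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";
        String host = hostOf(url);
        System.out.println(host);
        System.out.println(lastLabel(host));
    }
}
```

Note that the last label is only the true top-level domain for simple cases. Suffixes such as "co.uk" need a public suffix list, which this sketch does not attempt.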

Java Experience

Ralph Cook

Ranch Hand

Posts: 479

posted 13 years ago

It would be very helpful if you would post some code and quote an error message, etc. What you say may be accurate, but it's difficult to help without knowing in more detail what is happening.

rc

Minhaj Mehmood

Ranch Hand

Posts: 400

posted 13 years ago

Following is my code:

SCJP6 96% | SCWCD5 81% | SCDJWS5 79%

Rob Spoor

Sheriff

Posts: 22784

posted 13 years ago

You can try to use java.net.URL instead of java.net.URI. That will not make getHost() return null. It still can't handle the ö though. For some reason it gets translated from ö (int value 148) to ÷ (int value 246). This conversion already happens in the URL constructor. I've also tried the URL(String, String, String) constructor but I get the same result. The code for URL shows me that the HTTP specific URLStreamHandler is doing this.

SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6 - OCEJPAD 6
How To Ask Questions How To Answer Questions

Ralph Cook

Ranch Hand

Posts: 479

posted 13 years ago

Actually, I should have known the answer from the original description -- RFC 1738 specifies the characters that are allowed in URLs, and it doesn't include anything with umlauts.

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

So it isn't working because it isn't supposed to work. You need to use a URL encoder to encode the umlauted o.

rc

Paul Clapham

Marshal

Posts: 28226

posted 13 years ago

Wasn't there some change recently in which you could have domain names in all kinds of character sets? (Goes off and searches the web...) Yes, there was. Here is an example of somebody registering such domains. However I don't know the details of how it works. There must have been an RFC for it.

(Looks back at results of web search...) And of course there's already a Wikipedia page about it: Internationalized domain name.
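For what it's worth, the JDK has shipped java.net.IDN since Java 6, which converts an internationalized host name to its ASCII ("xn--") form and back. Below is a minimal sketch of how that might be combined with java.net.URI for the URL in the original question. The class and method names (IdnHostExample, asciiHost, topLevelLabel) are invented for this example. It assumes that when getHost() returns null, the host can be recovered from getAuthority() by stripping any "user@" prefix and ":port" suffix, which will not hold for IPv6 literals.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only.
final class IdnHostExample {

    // Returns the host of the given URL in ASCII-compatible form.
    static String asciiHost(String url) throws URISyntaxException {
        URI uri = new URI(url);
        String host = uri.getHost();
        if (host == null) {
            // Fallback: take the host from the raw authority component.
            String authority = uri.getAuthority();
            if (authority == null) {
                throw new URISyntaxException(url, "URL has no authority component");
            }
            int at = authority.lastIndexOf('@');
            if (at >= 0) {
                authority = authority.substring(at + 1);
            }
            // Assumption: no IPv6 literal, so the last ':' starts the port.
            int colon = authority.lastIndexOf(':');
            if (colon >= 0) {
                authority = authority.substring(0, colon);
            }
            host = authority;
        }
        // Throws IllegalArgumentException if the name cannot be converted.
        return IDN.toASCII(host);
    }

    // Returns the last dot-separated label of the host.
    static String topLevelLabel(String host) {
        int dot = host.lastIndexOf('.');
        return dot < 0 ? host : host.substring(dot + 1);
    }

    public static void main(String[] args) throws URISyntaxException {
        // \u00f6 is the umlaut "ö", escaped to avoid source-encoding issues.
        String url = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";
        String ascii = asciiHost(url);
        System.out.println(ascii);                 // ASCII ("xn--") form of the host
        System.out.println(IDN.toUnicode(ascii));  // converted back for display
        System.out.println(topLevelLabel(ascii));
    }
}
```

The ASCII form is the one to hand to DNS lookups or to APIs that only accept RFC-style host names, while IDN.toUnicode gives the readable form back for display.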