
fetching top level domain from URL.

 
Minhaj Mehmood
Ranch Hand
Posts: 400
Hibernate Java Spring
Hi All,

I'm using the java.net.URI#getHost() method to extract the top-level domain from a URL,
but the method fails for URLs such as "http://www.schönesdresden.de/resources/internet.2jpg.jpg". It fails because of the umlaut "ö" in the host name.
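
A minimal sketch of the behaviour described above, assuming a stock JDK. java.net.URI cannot parse a host containing "ö" as a server-based authority, so getHost() returns null while getAuthority() still holds the raw text. The class and method names (UriHostProbe, hostOf, authorityOf) are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, used only to reproduce the reported behaviour.
class UriHostProbe {

    // \u00f6 is the umlaut "ö"; the escape keeps the source file ASCII-only.
    static final String SAMPLE =
            "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";

    // Returns URI#getHost(), which is null when the authority is not a
    // valid server-based authority (for example a non-ASCII host name).
    static String hostOf(String url) {
        return parse(url).getHost();
    }

    // Returns the raw authority component, which is still available
    // even when getHost() is null.
    static String authorityOf(String url) {
        return parse(url).getAuthority();
    }

    private static URI parse(String url) {
        try {
            return new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a parsable URI: " + url, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("host      = " + hostOf(SAMPLE));
        System.out.println("authority = " + authorityOf(SAMPLE));
    }
}
```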

Please Advise.
 
Sunny Bhandari
Ranch Hand
Posts: 448
Eclipse IDE Firefox Browser Tomcat Server
I'm not sure about the java.net package, but you could also do the job with String operations.
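
A rough sketch of what the String-based approach could look like. It is deliberately naive: it takes the text between "://" and the next "/", "?" or "#", drops any user info and port, and returns whatever follows the last dot. It does not handle IPv6 literals or multi-part suffixes such as "co.uk". The names NaiveTld, hostOf and tldOf are invented for this example.

```java
// Hypothetical helper: plain String handling, no java.net classes involved.
class NaiveTld {

    // Extracts the host part of a URL using only String operations.
    static String hostOf(String url) {
        int start = url.indexOf("://");
        start = (start < 0) ? 0 : start + 3;

        // The authority ends at the first '/', '?' or '#', or at end of input.
        int end = start;
        while (end < url.length() && "/?#".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        String authority = url.substring(start, end);

        // Drop optional "user:password@" prefix.
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }

        // Drop optional ":port" suffix (IPv6 literals are not supported).
        int colon = authority.indexOf(':');
        return (colon >= 0) ? authority.substring(0, colon) : authority;
    }

    // Returns the text after the last dot of the host.
    static String tldOf(String url) {
        String host = hostOf(url);
        int dot = host.lastIndexOf('.');
        return (dot >= 0) ? host.substring(dot + 1) : host;
    }

    public static void main(String[] args) {
        String url = "http://www.sch\u00f6nesdresden.de/resources/internet.2jpg.jpg";
        System.out.println(hostOf(url));
        System.out.println(tldOf(url));
    }
}
```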
 
Ralph Cook
Ranch Hand
Posts: 479
It would be very helpful if you would post some code and quote an error message, etc. What you say may be accurate, but it's difficult to help without knowing in more detail what is happening.

rc
 
Minhaj Mehmood
Ranch Hand
Posts: 400
Hibernate Java Spring
Following is my code:

 
Rob Spoor
Sheriff
Pie
Posts: 20381
46
Chrome Eclipse IDE Java Windows
You can try to use java.net.URL instead of java.net.URI. That will not make getHost() return null. It still can't handle the ö though. For some reason it gets translated from ö (int value 148) to ÷ (int value 246). This conversion already happens in the URL constructor. I've also tried the URL(String, String, String) constructor but I get the same result. The code for URL shows me that the HTTP specific URLStreamHandler is doing this.
 
Ralph Cook
Ranch Hand
Posts: 479
Actually, I should have known the answer from the original description -- RFC 1738 specifies the characters that are allowed in URLs, and it doesn't include anything with umlauts.


"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."


So it isn't working because it isn't supposed to work. You need to use a URL encoder to encode the umlauted o.

rc
 
Paul Clapham
Sheriff
Pie
Posts: 20191
26
MySQL Database
Wasn't there some change recently in which you could have domain names in all kinds of character sets? (Goes off and searches the web...) Yes, there was. Here is an example of somebody registering such domains. However I don't know the details of how it works. There must have been an RFC for it.

(Looks back at results of web search...) And of course there's already a Wikipedia page about it: Internationalized domain name.
 
Minhaj Mehmood
Ranch Hand
Posts: 400
Hibernate Java Spring
I think the answer has been found. Check this: http://weblogs.java.net/blog/2007/03/29/international-domain-names
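
For readers who cannot reach the link, here is a hedged sketch of the internationalized-domain-name approach, assuming it refers to the java.net.IDN class available since Java 6. The Unicode host is converted to its ASCII-compatible ("xn--") form with IDN.toASCII before it is handed to java.net.URI, after which getHost() works again. The class and method names (IdnHost, toAsciiHost, hostViaUri, tldOf) are illustrative, and taking the last label as the TLD is a simplification.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper showing IDN conversion before URI parsing.
class IdnHost {

    // Converts a Unicode host name to its ASCII-compatible encoding.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // Converts an ASCII-compatible host name back to Unicode for display.
    static String toUnicodeHost(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    // Builds a URI from the converted host and returns what getHost() reports.
    static String hostViaUri(String unicodeHost, String path) {
        try {
            return new URI("http://" + toAsciiHost(unicodeHost) + path).getHost();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a parsable URI", e);
        }
    }

    // Simplification: treats the last dot-separated label as the TLD.
    static String tldOf(String host) {
        int dot = host.lastIndexOf('.');
        return (dot >= 0) ? host.substring(dot + 1) : host;
    }

    public static void main(String[] args) {
        // \u00f6 is the umlaut "ö" from the URL in the original question.
        String host = hostViaUri("www.sch\u00f6nesdresden.de",
                                 "/resources/internet.2jpg.jpg");
        System.out.println("ASCII host   = " + host);
        System.out.println("Unicode host = " + toUnicodeHost(host));
        System.out.println("TLD          = " + tldOf(host));
    }
}
```

The path is passed through unchanged here; a path containing non-ASCII characters would need its own percent-encoding, which is a separate concern from the host name.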
 
Campbell Ritchie
Sheriff
Pie
Posts: 47288
52
Minhaj Mehmood wrote: I think the answer has been found . . .
Well done finding that solution.
 