
Parsing an HTML document. URL format problem

 
Daniel Cote
Greenhorn
Posts: 9
I'm parsing HTML pages from a web site. I use this piece of code:

URL url = new URL(urlStr);
content = url.openStream();

'urlStr' is a String that I build beforehand. I've discovered that I get an HTTP 400 error (a malformed URL exception, I think) when the URL contains spaces, e.g.:

http://www.yahoo.es?section=user mail&index=5

I've changed the code to replace spaces with '%20':

urlStr = StringUtils.replace(urlStr," ","%20");
URL url = new URL(urlStr);
content = url.openStream();

and now it works fine. My question is whether there's a method or class that can validate the URL format and convert invalid characters (like the spaces in this case) into a valid encoded form, so that 'url.openStream()' works in every case.

Thanks in advance.
 
Paul Clapham
Sheriff
Posts: 20711
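Yes. The class java.net.URLEncoder exists to do precisely that. Remember to encode only the part of the URL after the ? character, and within it only the individual parameter names and values. If you encode the whole URL or the whole query string, separators such as :, /, ?, & and = get encoded as well.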
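
To illustrate, here is a minimal sketch of that approach, assuming the parameters are available as name/value pairs before the URL is assembled. The class name QueryEncoder and the method buildUrl are hypothetical, made up for this example. Note that URLEncoder uses form encoding, so a space becomes '+' rather than '%20'; that form is normally accepted in a query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of the JDK or of the original post.
final class QueryEncoder {

    // Appends the parameters to the base URL, encoding each name and value
    // separately so that the '?', '&' and '=' separators stay intact.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                sb.append(separator)
                  .append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

With the parameters section = "user mail" and index = "5" (inserted in that order into a LinkedHashMap) and the base "http://www.yahoo.es", buildUrl returns http://www.yahoo.es?section=user+mail&index=5, which can then be passed to new URL(...) as before.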
 
Daniel Cote
Greenhorn
Posts: 9
OK, thanks for your help.
 