Parsing an HTML document. URL format problem

Daniel Cote
Greenhorn

Joined: May 20, 2004
Posts: 9
I'm parsing HTML pages from a web site. I use this piece of code:

URL url = new URL(urlStr);
content = url.openStream();

'urlStr' is a String that I build beforehand. I've discovered that I get an HTTP 400 error (a malformed URL exception, I think) when the URL contains spaces, e.g.:

http://www.yahoo.es?section=user mail&index=5

I've changed the code to replace spaces with '%20':

urlStr = StringUtils.replace(urlStr," ","%20");
URL url = new URL(urlStr);
content = url.openStream();

and now it works fine. My question is whether there's a way (a method, a class...) to validate the URL format and convert invalid characters, such as the spaces in this case, into a form that 'url.openStream()' will accept in every case.

Thanks in advance.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18669
    

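Yes. The class java.net.URLEncoder exists to do precisely that. Remember to encode only the part of the URL after the ? character, and to encode each parameter name and value separately so that the & and = separators are left intact.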
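
As a rough illustration of that advice, here is a minimal sketch that builds the query string from separate name/value pairs and runs each one through URLEncoder. The class name QueryEncodingSketch and the helper buildUrl are hypothetical names made up for this example, and UTF-8 is an assumed encoding; neither comes from the original post. Note that URLEncoder produces form encoding, so a space becomes '+' rather than '%20', which servers normally accept inside a query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryEncodingSketch {

    // Hypothetical helper: appends form-encoded parameters to a base URL
    // that is assumed to contain no query string yet.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                // Encode names and values one by one so that the '&' and '='
                // separators themselves are never escaped.
                sb.append(separator)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }
}
```

With the parameters from the question (section = "user mail", index = "5", added to a LinkedHashMap in that order), buildUrl("http://www.yahoo.es", params) should return http://www.yahoo.es?section=user+mail&index=5, which can then be passed to new URL(...) as before.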
Daniel Cote
Greenhorn

Joined: May 20, 2004
Posts: 9
Ok, thanks for your help
 