Parsing an HTML document. URL format problem

Daniel Cote
Greenhorn

Joined: May 20, 2004
Posts: 9
I'm parsing HTML pages from a web site. I use this piece of code:

URL url = new URL(urlStr);
content = url.openStream();

'urlStr' is a String that I build beforehand. I've discovered that I get an HTTP 400 error (a malformed URL exception, I think) when the URL contains spaces, e.g.:

http://www.yahoo.es?section=user mail&index=5

I've changed the code to replace spaces with '%20':

urlStr = StringUtils.replace(urlStr," ","%20");
URL url = new URL(urlStr);
content = url.openStream();

and now it works fine. My question is whether there is a way (a method, a class...) to validate a URL's format and convert invalid characters, such as the spaces in this case, into a valid form, so that 'url.openStream()' works in every case.

Thanks in advance.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

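Yes. The class java.net.URLEncoder exists to do precisely that. Remember to encode only the part of the URL after the ? character, and to encode each parameter name and value separately so that the & and = separators are left intact.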
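
As a rough sketch of that advice (not necessarily how you would want to structure it in your own code), the example below encodes each parameter name and value with URLEncoder and then joins them onto the base URL. The class name QueryEncodingSketch and the helpers encode and buildUrl are hypothetical, made up for this illustration, and UTF-8 is assumed as the character encoding. Note that URLEncoder writes a space as '+' rather than '%20'; both forms are commonly accepted in a query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class QueryEncodingSketch {

    // Encodes a single parameter name or value (UTF-8 is an assumption here).
    static String encode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    // Appends name/value pairs to the base URL, encoding each piece
    // separately so the '?', '&' and '=' separators stay untouched.
    static String buildUrl(String base, String... namesAndValues)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            sb.append(i == 0 ? '?' : '&');
            sb.append(encode(namesAndValues[i]));
            sb.append('=');
            sb.append(encode(namesAndValues[i + 1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String urlStr = buildUrl("http://www.yahoo.es",
                "section", "user mail",
                "index", "5");
        // prints http://www.yahoo.es?section=user+mail&index=5
        System.out.println(urlStr);
    }
}
```

The resulting string can then be passed to new URL(urlStr) as in the original code. Encoding the whole URL in one call would not work, because the ':' and '/' of the address and the '&' and '=' of the query would be escaped too.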
Daniel Cote
Greenhorn

Joined: May 20, 2004
Posts: 9
Ok, thanks for your help
 
 