Parsing an HTML document. URL format problem

Daniel Cote
Greenhorn

Joined: May 20, 2004
Posts: 9
I'm parsing HTML pages from a web site. I use this piece of code:

URL url = new URL(urlStr);
content = url.openStream();

'urlStr' is a String that I build beforehand. I've discovered that I get an HTTP 400 error (a malformed URL exception, I think) when the URL contains spaces, e.g.:

http://www.yahoo.es?section=user mail&index=5

I've changed the code to replace spaces with '%20':

urlStr = StringUtils.replace(urlStr," ","%20");
URL url = new URL(urlStr);
content = url.openStream();

and now it works fine. My question is whether there's a way (a method, a class...) to validate URL formats and convert invalid characters such as spaces (in this case) into a valid encoding, so that 'url.openStream()' works in any case.

Thanks in advance.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18155

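Yes. The class java.net.URLEncoder exists to do precisely that. Remember to encode only the part of the URL after the ? character, and to encode each parameter name and value separately so that the & and = separators are left intact.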
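A minimal sketch of that advice, assuming the query string is built from separate name/value pairs. The class QueryEncoding and the helpers encode and buildUrl are hypothetical names made up for this illustration; only URLEncoder.encode is the real library call. Note that URLEncoder produces form encoding, so a space comes out as '+' rather than '%20'. Servers normally accept both in a query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of the original post.
final class QueryEncoding {

    // Encodes a single parameter name or value as UTF-8 form data.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Appends name/value pairs to a base URL, encoding each piece on its own
    // so that the '?', '&' and '=' separators are never escaped.
    // A trailing name without a value is ignored.
    static String buildUrl(String base, String... nameValuePairs) {
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            sb.append(i == 0 ? '?' : '&')
              .append(encode(nameValuePairs[i]))
              .append('=')
              .append(encode(nameValuePairs[i + 1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String urlStr = buildUrl("http://www.yahoo.es",
                                 "section", "user mail",
                                 "index", "5");
        // prints http://www.yahoo.es?section=user+mail&index=5
        System.out.println(urlStr);
    }
}
```

The resulting string can then be passed to new URL(urlStr) and url.openStream() as in the original code. Encoding the whole query string in one call would also escape the & and = characters, which is why the pieces are encoded one at a time here.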
Daniel Cote
Greenhorn

Joined: May 20, 2004
Posts: 9
Ok, thanks for your help
 