I had started working on something for a JavaServer Page that would read the contents between the HTML <title></title> tags to find the title of the page. I was looking at the java.net.URL class and its related classes in the Javadoc. The URL class has a method, Object getContent(), but the type of the object it returns depends on the content type (MIME type) of the resource. That's where I realized it was more trouble than it was worth for my purposes. Since all the URLs I was interested in were on my own server, I just used the Servlet methods to obtain the real path of each file and read it with the java.io file classes. That won't work for files that aren't on your own server. I'm sure it can be done, though; it's just going to take some research, either on the web or through the Javadoc. Good luck.
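For what it's worth, here is a minimal sketch of the local-file approach described above. It is not the poster's actual code: the class name TitleReader and both method names are made up for illustration, the file is assumed to be UTF-8, and it uses java.nio.file.Files (Java 7+) rather than the older java.io classes. In a servlet, the path would come from something like getServletContext().getRealPath(...).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class TitleReader {
    // CASE_INSENSITIVE matches <TITLE> as well as <title>;
    // DOTALL lets a title that spans several lines still match.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text between the title tags, or null if there are none. */
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    /** Reads a local file (assumed to be UTF-8) and returns its title. */
    public static String titleOfFile(Path file) throws IOException {
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return extractTitle(html);
    }
}
```

A regular expression is good enough for pulling out one well-formed tag pair, but it is not a real HTML parser, so oddly formed pages may need something sturdier.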
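Just do something like this: <code>
URL url = new URL("http://www.javaranch.com");
// HttpURLConnection has no public constructor; get one from the URL
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.connect();
InputStream in = conn.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
</code> Then you can loop over the reader, calling readLine(), which returns each line of the HTML in turn and null once the end of the stream is reached. You will need imports from java.net and java.io, and the calls can throw IOException.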
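To spell out the readLine() loop, here is a rough sketch. The class name PageFetcher and the methods readAll and fetch are made up for illustration, and it assumes the page is encoded as UTF-8, which a real page may not be.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class PageFetcher {
    /** Reads every line from the source and joins them with '\n'. */
    public static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader reader = new BufferedReader(source);
        String line;
        // readLine() returns null at the end of the stream
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    /** Fetches the page at the given address (assumes UTF-8 content). */
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        try (Reader in = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling PageFetcher.fetch("http://www.javaranch.com") would give you the whole page as one String, which you can then search for the title tags.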