retrieve images from the web

mj zammit
Ranch Hand

Joined: Nov 16, 2008
Posts: 49
I have built a client socket to fetch a web page from a server. When I render the result, the images are not shown. How can I get the images of the requested web page to show?
This is the code of my client:
try {
    // args is assumed to be a String holding the full URL (use args[0] if it is main's String[])
    URL url = new URL(args);
    String host = url.getHost();
    int port = (url.getPort() == -1) ? 80 : url.getPort();   // getPort() is -1 when the URL names no port
    String path = url.getFile().isEmpty() ? "/" : url.getFile();

    SocketAddress sockaddr = new InetSocketAddress(InetAddress.getByName(host), port);
    Socket socket = new Socket();
    int timeoutMS = 2000;
    socket.connect(sockaddr, timeoutMS);

    PrintWriter out = new PrintWriter(socket.getOutputStream());
    InputStream inputStream = socket.getInputStream();
    BufferedReader rd = new BufferedReader(new InputStreamReader(inputStream));

    // HTTP header lines end in CRLF, so write "\r\n" explicitly instead of relying on println()
    out.print("GET " + path + " HTTP/1.1\r\n");
    out.print("Host: " + host + ":" + port + "\r\n");
    out.print("Connection: close\r\n");
    out.print("\r\n");
    out.flush();

    String s = null;
    while ((s = rd.readLine()) != null)
        System.out.println(s);
    rd.close();
} catch (MalformedURLException ex) {
} catch (UnknownHostException ex) {
} catch (IOException ex) {
}
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42292
You'll need to parse the HTML page you got, and then download the images in separate requests. Note that there are various ways of adding images to web pages (IMG tags, CSS, JavaScript), so it's not trivial if you need to be sure you're downloading all the images.

} catch (MalformedURLException ex) {
} catch (UnknownHostException ex) {
} catch (IOException ex) {
}

This is a bad idea. How will you know if there are any problems? At least write the error message to System.err or System.out.
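A minimal sketch of what that reporting could look like. The class name FetchErrors, the helper describe and the example.com address are made up for illustration; they are not part of the original client.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException;

class FetchErrors {
    // Hypothetical helper: builds a one-line message from an exception
    static String describe(String what, Exception ex) {
        return what + ": " + ex.getClass().getSimpleName() + ": " + ex.getMessage();
    }

    public static void main(String[] args) {
        // Placeholder address, used only when no argument is given
        String address = args.length > 0 ? args[0] : "http://www.example.com/";
        try {
            URL url = new URL(address);
            url.openStream().close();   // stands in for any network call that can fail
        } catch (MalformedURLException ex) {
            System.err.println(describe("Bad URL", ex));
        } catch (UnknownHostException ex) {
            System.err.println(describe("Unknown host", ex));
        } catch (IOException ex) {
            System.err.println(describe("I/O problem", ex));
            ex.printStackTrace();       // the stack trace shows where it failed
        }
    }
}
```

The two specific exceptions are caught before IOException because both are subclasses of it.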


mj zammit
Ranch Hand

Joined: Nov 16, 2008
Posts: 49
Yes, it is a bad idea. I have now added a System.out.println to each of them to see which error was caught.
Going back to the images: so I will have to request not only the HTML file but also the images the page uses?
Any ideas on how to do it, please?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18675

Originally posted by mj zammit:
any ideas on how to do it please
Depends what "it" means in that question. Are you asking how to parse the HTML and identify the images you need to download? Are you asking how to download an image? Are you asking how to display the image?

You only said "When i output the result" and didn't say anything about what that "output" looked like, so I don't have any context for answering questions number 1 and 3 of the possibilities. For #2, how to download an image, you do that just like downloading the HTML, but don't use a Reader because that's meant for reading text. Use an InputStream instead.
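As a rough sketch of the byte-oriented approach, the following copies an image as raw bytes. It uses URL.openStream() rather than a hand-written socket request, which is a substitution made here so the HTTP headers do not have to be skipped manually. The class name ImageDownload and both method names are invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ImageDownload {
    // Copies raw bytes; no Reader is involved, so no character decoding can corrupt the data
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Hypothetical convenience method: fetch one image and store it in a file
    static long download(URL imageUrl, Path target) throws IOException {
        try (InputStream in = imageUrl.openStream();
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ImageDownload <image-url> <target-file>");
            return;
        }
        Path target = Paths.get(args[1]);
        long bytes = download(new URL(args[0]), target);
        System.out.println(bytes + " bytes written to " + target);
    }
}
```

The same copy loop would work on a socket's InputStream, provided the response headers have been consumed first.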
mj zammit
Ranch Hand

Joined: Nov 16, 2008
Posts: 49
In the above code my output would be the HTML of any website (for example www.yahoo.com), but when I render this HTML the images do not come up, and I would like to fix that.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42292
The page in all likelihood has relative paths to images - you need to replace those with absolute paths that point to the original site.

Or, as was mentioned above, you need to download the images. In that case, you still need to correct the links in the HTML page, or mimic the server's directory structure locally.
mj zammit
Ranch Hand

Joined: Nov 16, 2008
Posts: 49
Hmmm, I see...
How do I find out the absolute path of an image on a website? Are there special functions for this in Java?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42292
In general, just prepend the scheme and host name of the site. For example, if the image path is "/images/foobar.gif", then the absolute URL is "http://www.yahoo.com/images/foobar.gif".
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18675

It's also possible that the HTML has a <base> tag which specifies that the base is something other than the URL of the page itself.

Also, the URL class in Java has methods for producing a URL from a relative path and a base URL.
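For illustration, one way to do this is the two-argument URL constructor, which takes a base (context) URL and a spec that may be relative. The class name ResolveImageUrl, the method resolve and the sample page address are assumptions made for this sketch.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ResolveImageUrl {
    // Resolves 'relative' against 'base'; a spec that is already absolute is returned as-is
    static URL resolve(String base, String relative) throws MalformedURLException {
        return new URL(new URL(base), relative);
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical page address, used as the base for every lookup
        String page = "http://www.yahoo.com/news/index.html";

        // Path starting with "/" is resolved against the host
        System.out.println(resolve(page, "/images/foobar.gif"));
        // prints http://www.yahoo.com/images/foobar.gif

        // Path without a leading "/" is resolved against the page's directory
        System.out.println(resolve(page, "pics/a.png"));
        // prints http://www.yahoo.com/news/pics/a.png
    }
}
```

If the page carries a <base> tag, its href value would be passed as the base instead of the page URL.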
mj zammit
Ranch Hand

Joined: Nov 16, 2008
Posts: 49
Thanks very much for all your replies, they have helped me a great deal.
I am now having problems getting the contents of a particular HTML tag (for example, I want the value of the src attribute of each img tag). I would like to use regular expressions. Does anyone have any ideas?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42292
Using regexps will cause a variety of problems. I'd use a library like TagSoup or NekoXNI to transform the HTML to XML, and then use the SAX API to work with the XML.
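To give an idea of the SAX half of that approach, here is a small sketch of a handler that collects the src attribute of every img element. It runs the JDK's own SAX parser on input that is already well-formed, so the HTML-to-XML step (TagSoup or similar) is deliberately left out and assumed to have happened beforehand. The class name ImgSrcCollector and the method collect are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ImgSrcCollector extends DefaultHandler {
    final List<String> sources = new ArrayList<>();

    // Called by the parser for every opening tag
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("img".equalsIgnoreCase(qName)) {
            String src = atts.getValue("src");
            if (src != null) {
                sources.add(src);   // img tags without a src attribute are skipped
            }
        }
    }

    // Hypothetical helper: parses well-formed markup and returns the src values in document order
    static List<String> collect(String xhtml) throws Exception {
        ImgSrcCollector handler = new ImgSrcCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.sources;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>Hi</p>"
                + "<img src=\"/images/foobar.gif\" alt=\"a\"/>"
                + "<img alt=\"no source\"/>"
                + "<img src=\"pics/b.png\"/></body></html>";
        System.out.println(collect(page));
        // prints [/images/foobar.gif, pics/b.png]
    }
}
```

Real-world HTML is rarely well-formed, which is why a tidying library would sit in front of the handler. Images referenced from CSS or JavaScript would not be found this way.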
 
 