POI supports Word files, but so far only in a very basic way. It can extract text from
DOC files, but not images.
I'm not aware of any Java library that can convert DOC to HTML. You could try OpenOffice, which can open DOC files and has a Java API. Some information about that is
linked here.
Update: Looks like I spoke too soon. There's now a 3.0 alpha version of POI that seems to have the necessary classes and methods to extract images from DOC files. Have a look at the getPicturesTable method in HWPFDocument.
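A rough sketch of how that might be used. I wrote this against the API as it appears in later POI releases (HWPFDocument.getPicturesTable, PicturesTable.getAllPictures, Picture.suggestFileExtension, Picture.getContent), so the 3.0 alpha may differ in details such as the list being untyped. The class name, the outputName helper, the "input.doc" default and the "bin" fallback extension are all made up for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.Picture;

public class DocPictureDump {

    // Hypothetical helper: builds an output file name for the n-th picture.
    // Falls back to "bin" when no extension is known for the image type.
    static String outputName(int index, String extension) {
        String ext = (extension == null || extension.isEmpty()) ? "bin" : extension;
        return "picture-" + index + "." + ext;
    }

    public static void main(String[] args) throws IOException {
        // "input.doc" is only a placeholder; pass the real path as the first argument.
        String path = args.length > 0 ? args[0] : "input.doc";

        try (InputStream in = new FileInputStream(path)) {
            HWPFDocument doc = new HWPFDocument(in);

            // The pictures table gives access to the images embedded in the document.
            PicturesTable table = doc.getPicturesTable();
            List<Picture> pictures = table.getAllPictures();

            for (int i = 0; i < pictures.size(); i++) {
                Picture picture = pictures.get(i);
                String name = outputName(i, picture.suggestFileExtension());

                // Write the raw image bytes to a file in the working directory.
                try (OutputStream out = new FileOutputStream(name)) {
                    out.write(picture.getContent());
                }
                System.out.println("Wrote " + name);
            }
        }
    }
}
```

This needs the POI jar plus the scratchpad jar (where HWPF lives) on the classpath. It only saves the images; it does not do anything towards the DOC to HTML conversion itself.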
[ January 09, 2007: Message edited by: Ulf Dittmer ]