Extracting images and figures from Word Doc

Ashish Vegaraju
Ranch Hand

Joined: Aug 19, 2004
Posts: 47
Hi,

I'm doing a small project on converting Word files to PDF. I can extract the text, styles, and table data from a Word file and save them in a PDF. My problem is how to extract images and/or figures from a Word doc.

Please don't suggest the Jakarta POI project; image extraction is not yet supported in the latest release!

Thanks in advance.
Ashish.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12803
    
Open Office can export MS Word .doc files as PDF - I don't know how well it handles embedded images though...
http://www.openoffice.org/
Bill
Ashish Vegaraju
Ranch Hand

Joined: Aug 19, 2004
Posts: 47
Hi,

Thanks for the advice, Mr. William. I'm searching the OpenOffice site right now but haven't found anything relevant yet. It would be kind if you could be more specific and give me the link.

Also, I have done half of my project using POI, so it's difficult for me to migrate completely to OpenOffice. What should I do?

Waiting for replies...
Ashish.
[ September 24, 2004: Message edited by: Ashish Vegaraju ]
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12803
    
I just suggested Open Office in case you only needed to do a few documents - I suspect the PDF exporting function is not in Java.
Maybe you are going to be in the position of contributing to the POI project - can POI at least detect the parts of the Word document that represent the images?
Bill
Ashish Vegaraju
Ranch Hand

Joined: Aug 19, 2004
Posts: 47
Hi,

Work on extracting images is currently in progress. Yes, POI can detect the area where the actual image is, but there are no methods to extract the image.
And it is a really tough job to understand the format of an image in a Word file.

Is there anything I can do now?
Ashish.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12803
    
There are web sites such as wotsit! that are devoted to uncovering the details of MS and other application file formats. Try a Google search for "word file format".
Bill
Ashish Vegaraju
Ranch Hand

Joined: Aug 19, 2004
Posts: 47
Hi,

Thank you, Mr. William, for your reply. I found some sites where the file formats are explained very comprehensively. It now seems that I can extract JPEG, GIF, and PNG images from the Word doc, because I now know the header address of these images in the file. Thanks for the support.

If I run into any problems in the future, I will ask for your help again.
Thanks,
Ashish
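
The header-based approach mentioned above could look roughly like the following. This is only a minimal sketch, not the code actually used in the project: it assumes the images are stored as raw, uncompressed JPEG, PNG, or GIF streams somewhere in the file's bytes, and it only reports where each well-known signature starts. The class name ImageSignatureScanner and the method findImageOffsets are made up for illustration. It does not find where an image ends, it can report false positives when the same bytes occur by chance, and it will not detect pictures stored in other formats such as WMF or EMF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: scans raw bytes for well-known image file signatures.
final class ImageSignatureScanner {

    // JPEG streams begin with the SOI marker FF D8 followed by another FF marker byte.
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    // The fixed 8-byte PNG signature.
    private static final byte[] PNG = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    // GIF files begin with the ASCII text "GIF87a" or "GIF89a".
    private static final byte[] GIF87A = "GIF87a".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] GIF89A = "GIF89a".getBytes(StandardCharsets.US_ASCII);

    private ImageSignatureScanner() {
    }

    // Returns a map from byte offset to image type for every signature found.
    static SortedMap<Integer, String> findImageOffsets(byte[] data) {
        SortedMap<Integer, String> hits = new TreeMap<>();
        for (int i = 0; i < data.length; i++) {
            if (startsWith(data, i, JPEG)) {
                hits.put(i, "jpeg");
            } else if (startsWith(data, i, PNG)) {
                hits.put(i, "png");
            } else if (startsWith(data, i, GIF87A) || startsWith(data, i, GIF89A)) {
                hits.put(i, "gif");
            }
        }
        return hits;
    }

    // True if the bytes of sig appear in data starting at offset.
    private static boolean startsWith(byte[] data, int offset, byte[] sig) {
        if (offset + sig.length > data.length) {
            return false;
        }
        for (int j = 0; j < sig.length; j++) {
            if (data[offset + j] != sig[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ImageSignatureScanner <file>");
            return;
        }
        // Reads the whole file into memory, which is fine for typical documents.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (Map.Entry<Integer, String> hit : findImageOffsets(data).entrySet()) {
            System.out.println(hit.getValue() + " signature at offset " + hit.getKey());
        }
    }
}
```

Running it against a .doc file prints one line per candidate offset. To actually save an image, you would still need to work out its length from the image format itself (for example, by walking the PNG chunks up to IEND), which this sketch leaves out.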
 