| Author |
Extracting images and figures from Word Doc
|
Ashish Vegaraju
Ranch Hand
Joined: Aug 19, 2004
Posts: 47
|
|
Hi, I'm doing a small project on converting Word files to PDF. I can extract the text, styles, and table data from a Word file and save them in a PDF. My problem is how to extract images and/or figures from a Word doc. Please don't suggest the Jakarta POI project: image extraction is not yet supported in the latest release! Thanks in advance. Ashish.
|
 |
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 12325
|
|
Open Office can export MS Word .doc files as PDF - I don't know how well it handles embedded images though... http://www.openoffice.org/ Bill
|
Java Resources at www.wbrogden.com
|
 |
Ashish Vegaraju
Ranch Hand
Joined: Aug 19, 2004
Posts: 47
|
|
Hi, thanks for the advice, Mr. William. I'm searching the OpenOffice site right now but haven't found anything relevant yet. It would be kind if you could be more specific and give me the link. Also, I have done half of my project using POI, so it's difficult for me to migrate completely to OpenOffice. What should I do? Waiting for replies. Ashish. [ September 24, 2004: Message edited by: Ashish Vegaraju ]
|
 |
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 12325
|
|
I just suggested Open Office in case you only needed to do a few documents - I suspect the PDF exporting function is not in Java. Maybe you are going to be in the position of contributing to the POI project - can POI at least detect the parts of the Word document that represent the images? Bill
|
 |
Ashish Vegaraju
Ranch Hand
Joined: Aug 19, 2004
Posts: 47
|
|
Hi, work on extracting the image is currently ongoing. Yes, it can detect the area where the actual image is, but there are no methods to extract the image, and it is a really tough job to understand the format of an image in a Word file. Is there anything that I can do now? Ashish.
|
 |
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 12325
|
|
There are web sites such as wotsit! that are devoted to uncovering the details of MS and other application file formats. Try a google search for "word file format". Bill
|
 |
Ashish Vegaraju
Ranch Hand
Joined: Aug 19, 2004
Posts: 47
|
|
Hi. Thank you, Mr. William, for your reply. I found some sites where the file formats are explained very comprehensively. It now seems that I can extract JPEG, GIF, and PNG images from the Word doc, because I now know the header address of these images in the file. Thanks for the support. If I run into any problem in the future, I will ask for your help again. Thanks, Ashish
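For anyone following the same route, locating images by their header bytes can be sketched roughly as below. This is only a minimal illustration, not the code used in the project above and not part of POI: the class and method names (`ImageSignatureScanner`, `scan`, `extractPng`) are made up for this example. It assumes the image bytes sit uncompressed and contiguous in the data being scanned, which is not guaranteed for every Word file (metafile pictures, compressed pictures, or streams fragmented inside the OLE2 container would defeat it).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: scans raw bytes for the standard JPEG/PNG/GIF signatures.
class ImageSignatureScanner {

    /** One signature match: the image type and the offset where it starts. */
    public static final class Hit {
        public final String type;
        public final int offset;

        Hit(String type, int offset) {
            this.type = type;
            this.offset = offset;
        }
    }

    // Well-known leading bytes ("magic numbers") of each image format.
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] GIF87 = {'G', 'I', 'F', '8', '7', 'a'};
    private static final byte[] GIF89 = {'G', 'I', 'F', '8', '9', 'a'};
    // Every PNG ends with an IEND chunk: the tag followed by its fixed CRC.
    private static final byte[] PNG_END =
            {'I', 'E', 'N', 'D', (byte) 0xAE, 0x42, 0x60, (byte) 0x82};

    // True if 'pattern' occurs in 'data' starting exactly at 'pos'.
    private static boolean matchesAt(byte[] data, int pos, byte[] pattern) {
        if (pos + pattern.length > data.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (data[pos + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns every offset at which a known image signature begins. */
    public static List<Hit> scan(byte[] data) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (matchesAt(data, i, JPEG)) {
                hits.add(new Hit("JPEG", i));
            } else if (matchesAt(data, i, PNG)) {
                hits.add(new Hit("PNG", i));
            } else if (matchesAt(data, i, GIF87) || matchesAt(data, i, GIF89)) {
                hits.add(new Hit("GIF", i));
            }
        }
        return hits;
    }

    /**
     * Copies a PNG starting at 'start' up to and including its IEND chunk.
     * Returns an empty array if no IEND chunk is found after 'start'.
     */
    public static byte[] extractPng(byte[] data, int start) {
        for (int i = start; i < data.length; i++) {
            if (matchesAt(data, i, PNG_END)) {
                return Arrays.copyOfRange(data, start, i + PNG_END.length);
            }
        }
        return new byte[0];
    }
}
```

To try it, read the file into a byte array (for example with `java.nio.file.Files.readAllBytes`) and pass it to `scan`. A hit only says that the bytes look like the start of an image, so each extracted candidate should still be checked by opening it in an image viewer or decoder. PNG is the easy case here because its end marker is fixed; finding the end of a JPEG or GIF reliably takes more parsing than this sketch does.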
|
 |