
Extracting images and figures from Word Doc

 
Ashish Vegaraju
Ranch Hand
Posts: 47
Hi,

I'm doing a small project that converts Word files to PDF. I can already extract the text, styles, and table data from a Word file and save them in a PDF. My problem is how to extract images and/or figures from a Word doc.

Please don't suggest the Jakarta POI project: image extraction is not yet supported in the latest release!

Thanks in advance.
Ashish.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
Open Office can export MS Word .doc files as PDF. I don't know how well it handles embedded images, though.
http://www.openoffice.org/
Bill
 
Ashish Vegaraju
Ranch Hand
Posts: 47
Hi,

Thanks for the advice, Mr. William. I'm searching the Open Office site right now but haven't found anything relevant yet. It would be kind if you could be more specific and point me to the link.

Also, I have done half of my project using POI, so it's difficult for me to migrate completely to Open Office. What should I do?

Waiting for replies.
Ashish.
[ September 24, 2004: Message edited by: Ashish Vegaraju ]
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
I only suggested Open Office in case you just needed to convert a few documents. I suspect the PDF exporting function is not written in Java.
Maybe you are going to be in a position to contribute to the POI project. Can POI at least detect the parts of the Word document that represent the images?
Bill
 
Ashish Vegaraju
Ranch Hand
Posts: 47
Hi,

Work on image extraction is currently in progress. Yes, POI can detect the area where the actual image is, but it has no methods to extract the image.
And it is a really tough job to understand the format of an image inside a Word file.

Is there anything that I can do now?
Ashish.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
There are web sites such as wotsit! that are devoted to uncovering the details of MS and other application file formats. Try a Google search for "word file format".
Bill
 
Ashish Vegaraju
Ranch Hand
Posts: 47
Hi,

Thank you, Mr. William, for your reply. I found some sites where the file formats are explained very comprehensively. It now seems that I can extract JPEG, GIF, and PNG images from the Word doc, because I now know the header signatures that mark the start of these images in the file. Thanks for the support.

If I run into any problems in the future, I will ask for your help again.
Thanks,
Ashish
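
For readers following along, the header-based approach described above might be sketched roughly as below. This is only an illustration, not the code used in the thread and not part of POI: the class `ImageSignatureScanner`, its `Hit` type, and the `scan` method are hypothetical names. It simply searches the raw bytes for the standard JPEG, PNG, and GIF signatures and reports candidate offsets. A .doc file is an OLE2 compound document, so image data may be wrapped in other records or split across sectors, and the short JPEG signature can match by accident; treat the offsets as candidates to verify.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a POI class): scans raw bytes for image file signatures.
class ImageSignatureScanner {

    /** One candidate image start: format name plus byte offset of its signature. */
    static final class Hit {
        final String format;
        final int offset;

        Hit(String format, int offset) {
            this.format = format;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return format + "@" + offset;
        }
    }

    // Standard magic numbers for the three formats mentioned in the thread.
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] GIF87 = {'G', 'I', 'F', '8', '7', 'a'};
    private static final byte[] GIF89 = {'G', 'I', 'F', '8', '9', 'a'};

    /** Returns every offset at which one of the known signatures begins. */
    static List<Hit> scan(byte[] data) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (matches(data, i, JPEG)) {
                hits.add(new Hit("jpeg", i));
            } else if (matches(data, i, PNG)) {
                hits.add(new Hit("png", i));
            } else if (matches(data, i, GIF87) || matches(data, i, GIF89)) {
                hits.add(new Hit("gif", i));
            }
        }
        return hits;
    }

    // True if sig occurs in data starting exactly at pos.
    private static boolean matches(byte[] data, int pos, byte[] sig) {
        if (pos + sig.length > data.length) {
            return false;
        }
        for (int j = 0; j < sig.length; j++) {
            if (data[pos + j] != sig[j]) {
                return false;
            }
        }
        return true;
    }

    // Usage: java ImageSignatureScanner path/to/file.doc
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (Hit h : scan(data)) {
            System.out.println(h);
        }
    }
}
```

Finding where an image starts is only half the job: working out where it ends still depends on the format (for example, a PNG stream finishes with its IEND chunk), so each hit would need format-specific parsing before the bytes can be written out as a file.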
 