JavaRanch » Java Forums » Java » Java in General

Read images in PDF document

John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Hi,

I need to read PDF files in Java. I have used PDFBox and successfully read the text content of the PDF file.
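For comparison, reading the ordinary text layer with PDFBox can look roughly like the sketch below. It assumes PDFBox 2.x on the classpath (in PDFBox 3.x, `PDDocument.load` was replaced by `Loader.loadPDF`), and the class name `PdfTextReader` is made up for the example. `PDFTextStripper` only returns text that is stored as text in the PDF; words that are part of an embedded image are not included.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class; assumes PDFBox 2.x is on the classpath.
public class PdfTextReader {

    // Returns the text layer of every page, in page order.
    // Text that exists only as pixels inside an image is NOT returned.
    public static String extractText(File pdf) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractText(new File(args[0])));
    }
}
```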

However, I also need to read the text contained in the images embedded in the PDF file, and I have not been able to do that.

I have tried both iText and PDFBox to read the images in the PDF file. Could you please suggest a way to read the text content of images in a PDF file?
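Getting the images out is the easy half. As a rough sketch, assuming PDFBox 2.x, the hypothetical helper below collects the image XObjects referenced directly from each page's resources and can save them as PNG files. It does not look at inline images or at images nested inside form XObjects. What you get back is a `BufferedImage`, i.e. pixels, so any text in the picture still has to be recognized separately.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper; assumes PDFBox 2.x. Only image XObjects listed
// directly in each page's resources are found (no inline images and
// no images nested inside form XObjects).
public class PdfImageExtractor {

    public static List<BufferedImage> extractImages(File pdf) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        try (PDDocument document = PDDocument.load(pdf)) {
            for (PDPage page : document.getPages()) {
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue; // page has no resource dictionary
                }
                for (COSName name : resources.getXObjectNames()) {
                    PDXObject xobject = resources.getXObject(name);
                    if (xobject instanceof PDImageXObject) {
                        // Decoded pixels only: any "text" in the picture is just pixels here.
                        images.add(((PDImageXObject) xobject).getImage());
                    }
                }
            }
        }
        return images;
    }

    public static void main(String[] args) throws IOException {
        List<BufferedImage> images = extractImages(new File(args[0]));
        for (int i = 0; i < images.size(); i++) {
            File out = new File("image-" + i + ".png");
            ImageIO.write(images.get(i), "png", out);
            System.out.println("Wrote " + out);
        }
    }
}
```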
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19693

That text isn't stored as text in the PDF; it is just part of the image's pixels, so you can't retrieve it as text from the PDF. You'll need OCR (Optical Character Recognition) to extract text from this (or any) image.
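Since OCR engines work on bitmaps, one common approach is to render each page to an image and pass that to the OCR library, which also catches text in images regardless of how they are embedded. A minimal sketch with PDFBox 2.x's `PDFRenderer` is below. The class name is hypothetical, 300 DPI is just a commonly recommended resolution for OCR input rather than a requirement, and the actual OCR call is left out because it depends on which library you pick.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper; assumes PDFBox 2.x on the classpath.
public class PdfPageRenderer {

    // Renders every page to a grayscale image at the given resolution (dots per inch).
    public static List<BufferedImage> renderPages(File pdf, float dpi) throws IOException {
        List<BufferedImage> pages = new ArrayList<>();
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFRenderer renderer = new PDFRenderer(document);
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                pages.add(renderer.renderImageWithDPI(i, dpi, ImageType.GRAY));
            }
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        List<BufferedImage> pages = renderPages(new File(args[0]), 300f);
        for (int i = 0; i < pages.size(); i++) {
            // Each PNG can then be handed to the OCR library of your choice.
            ImageIO.write(pages.get(i), "png", new File("page-" + (i + 1) + ".png"));
        }
    }
}
```

Rendering whole pages is slower than extracting the embedded images, but it means the OCR step sees exactly what a reader sees on the page.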


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Hi Rob,

Can OCR (Optical Character Recognition) be done in Java? If so, are there any JAR files you can suggest?
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19693

It probably can be done, but I haven't done it myself before.
Wim Vanni
Ranch Hand

Joined: Apr 06, 2011
Posts: 96

Ron Cemer's Java OCR

This one could be of use. I haven't tried it myself, so any feedback on your own experience with it would be very welcome on this forum!

Cheers,
Wim
Ove Lindström
Ranch Hand

Joined: Mar 10, 2008
Posts: 326

Have you tried the Asprise version?

http://asprise.net/product/javapdf/
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
 