Problem converting PDF to text in Java

knazeer ahmed
Greenhorn

Joined: Sep 09, 2008
Posts: 8
Hi,
I am having a problem converting a PDF to text.
I have a scanned document in PDF form and I want to extract the data from it. I tried PDFBox and FontBox, but they work only when the PDF contains real text, not text that is part of an image.

Can anyone help me with this?


Thanks and Regards,
Nazeer
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651

You will need to use an OCR library for retrieving text from any kind of image.


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41068
... something like Tesseract.
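
To make the suggestion concrete, here is a minimal sketch of calling the Tesseract command-line tool from Java. It rests on several assumptions that are not from this thread: that the `tesseract` executable is installed and on the PATH, that each scanned page has already been exported from the PDF as an image file, and that the text is English (`eng`). The class and method names (`OcrSketch`, `tesseractCommand`, `ocr`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of PDFBox or Tesseract.
class OcrSketch {

    // Builds the CLI invocation: tesseract <image> stdout -l <language>
    // "stdout" as the output base tells Tesseract to print the text
    // instead of writing it to a file.
    static List<String> tesseractCommand(Path image, String language) {
        return List.of("tesseract", image.toString(), "stdout", "-l", language);
    }

    // Runs Tesseract on one page image and returns the recognized text.
    // Assumes the tesseract binary is on the PATH.
    static String ocr(Path image, String language)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(tesseractCommand(image, language))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop progress messages
                .start();
        String text = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java OcrSketch <page-image>");
            return;
        }
        System.out.print(ocr(Paths.get(args[0]), "eng"));
    }
}
```

For a multi-page scan you would call `ocr` once per page image and concatenate the results. Recognition quality depends heavily on the scan, so the output usually needs checking.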

