File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
Win a copy of Clojure in Action this week in the Clojure forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Problem while converting PDF to text convertion in Java

 
knazeer ahmed
Greenhorn
Posts: 8
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
hi,
I am facing the below problem in converting PDF to text.
I have a scanned document which is in PDF. I want to extract the data from that PDF. I tried with PDFbox and Fontbox. but it will work only when the content of the PDF is real text (but not text in image).


Can any one help me in this?..


Thanks and Regards,
Nazeer
 
Rob Spoor
Sheriff
Pie
Posts: 20381
46
Chrome Eclipse IDE Java Windows
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You will need to use an OCR library for retrieving text from any kind of image.
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
... something like Tesseract.
 
I agree. Here's the link: http://aspose.com/file-tools
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic