
Problem converting PDF to text in Java

 
knazeer ahmed
Greenhorn
Posts: 8
Hi,
I am having trouble converting a PDF to text.
I have a scanned document in PDF format and want to extract the data from it. I tried PDFBox and FontBox, but they only work when the PDF contains real text, not text inside an image.


Can anyone help me with this?


Thanks and Regards,
Nazeer
 
Rob Spoor
Sheriff
Posts: 20526
You will need to use an OCR library for retrieving text from any kind of image.
 
Ulf Dittmer
Rancher
Posts: 42967
... something like Tesseract.
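
To make the OCR suggestion concrete, here is a minimal sketch of one way to do it. It is not from the posters above. It assumes the `pdftoppm` (Poppler) and `tesseract` command-line tools are installed and on the PATH, and it only uses the Java standard library to call them. The class name `ScannedPdfOcr`, its helper methods, the 300 DPI setting and the `eng` language code are illustrative choices, not anything the thread specifies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: rasterize a scanned PDF, then OCR each page image.
// Assumes the external tools pdftoppm and tesseract are installed.
class ScannedPdfOcr {

    // Builds the command that renders every PDF page to <prefix>-<n>.png.
    static List<String> renderCommand(Path pdf, Path prefix, int dpi) {
        return Arrays.asList("pdftoppm", "-r", String.valueOf(dpi), "-png",
                pdf.toString(), prefix.toString());
    }

    // Builds the command that OCRs one image and prints the text to stdout.
    static List<String> ocrCommand(Path image, String language) {
        return Arrays.asList("tesseract", image.toString(), "stdout", "-l", language);
    }

    // Runs a command, returns its standard output, and fails on a non-zero exit code.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String output = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        Path pdf = Paths.get(args[0]);
        Path workDir = Files.createTempDirectory("ocr");

        // Step 1: turn each scanned page into a PNG image.
        run(renderCommand(pdf, workDir.resolve("page"), 300));

        // Step 2: collect the page images in page order (names are zero-padded).
        List<Path> pages;
        try (Stream<Path> files = Files.list(workDir)) {
            pages = files.filter(p -> p.getFileName().toString().endsWith(".png"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        // Step 3: OCR each page and concatenate the recognized text.
        StringBuilder text = new StringBuilder();
        for (Path page : pages) {
            text.append(run(ocrCommand(page, "eng")));
        }
        System.out.println(text);
    }
}
```

If you would rather stay inside the JVM, the same two steps (render pages to images, then run OCR on each image) can be done with PDFBox's page renderer and a Tesseract wrapper such as Tess4J. Either way, expect recognition quality to depend heavily on the scan resolution.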
 