
Problem converting PDF to text in Java

 
knazeer ahmed
Greenhorn
Posts: 8
Hi,
I am having a problem converting a PDF to text.
I have a scanned document in PDF format and I want to extract the data from it. I tried PDFBox and FontBox, but they only work when the PDF contains real text, not text inside an image.

Can anyone help me with this?


Thanks and Regards,
Nazeer
 
Rob Spoor
Sheriff
Posts: 20492
You will need to use an OCR library for retrieving text from any kind of image.
 
Ulf Dittmer
Rancher
Posts: 42967
... something like Tesseract.
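
For illustration, here is a minimal sketch of how that could look from Java, assuming the `tesseract` command-line tool is installed and on the PATH, and that each scanned page has already been saved as an image file (for example by rendering the PDF pages with PDFBox). The class name `TesseractCli`, the helper names, and the file name `page-1.png` are hypothetical; the code needs Java 9 or later. It simply runs `tesseract <image> stdout -l <lang>` and collects whatever text the tool prints.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: runs the Tesseract command-line tool on one page image.
// Assumes "tesseract" is installed and reachable through the PATH.
public class TesseractCli {

    // Builds the command: tesseract <image> stdout -l <lang>
    // "stdout" as the output base makes Tesseract print the recognized text
    // instead of writing it to a .txt file.
    static List<String> buildCommand(Path image, String lang) {
        return List.of("tesseract", image.toString(), "stdout", "-l", lang);
    }

    // Runs Tesseract on the image and returns the recognized text.
    static String ocr(Path image, String lang) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, lang))
                // Discard Tesseract's diagnostic messages so the pipe cannot fill up.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java TesseractCli <page-image> [language]");
            return;
        }
        String lang = args.length > 1 ? args[1] : "eng";
        System.out.print(ocr(Paths.get(args[0]), lang));
    }
}
```

Running `java TesseractCli page-1.png` would then print the recognized text for that page. OCR output on scanned documents is rarely perfect, so the result usually needs some checking or clean-up.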
 