
Reading tiff file content

 
Dinesh Pise
Ranch Hand
Posts: 30
Hi Friends,

I have TIFF image files that are scanned documents. I need to read their text content, e.g. barcode, name of applicant, date of birth (DOB). Can someone please help with this? To repeat, my TIFF files are scanned images.

Regards,
Dinesh Pise
 
Steve Luke
Bartender
Posts: 4181
Hi Dinesh. What you need is to translate images into text, which is done via OCR (Optical Character Recognition). The Java standard library has no built-in OCR support, so you will need to find an OCR library to do the work.
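
To illustrate the distinction, here is a minimal sketch, assuming Java 9 or later (where javax.imageio ships a TIFF reader). It shows that plain Java can decode a scanned TIFF into pixels, but what comes back is an image, not text; the OCR step still has to come from a separate library. The class name TiffLoader is made up for this example.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

// Hypothetical helper: loads a TIFF as pixels only. No text recognition happens here.
final class TiffLoader {

    // Decodes the first image in the file; ImageIO.read returns null when no reader matches.
    static BufferedImage load(File tiff) throws IOException {
        BufferedImage image = ImageIO.read(tiff);
        if (image == null) {
            throw new IOException("No ImageIO reader could decode " + tiff);
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: TiffLoader <file.tif>");
            return;
        }
        BufferedImage image = load(new File(args[0]));
        // Only pixel data is available at this point; an OCR engine is needed to get text.
        System.out.println("Loaded " + image.getWidth() + " x " + image.getHeight() + " pixels");
    }
}
```

The resulting BufferedImage is what you would typically hand to an OCR or barcode library.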
 
Ulf Dittmer
Rancher
Posts: 42967
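The best-known open-source OCR engine is Tesseract; you'll find it easily. It is a native engine rather than a pure Java library, so from Java it is typically used through a wrapper or its command-line tool.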

There are separate libraries for detecting barcodes; searching for "java barcode detection" or some such phrase will find them.
 
Dinesh Pise
Ranch Hand
Posts: 30
Ulf Dittmer wrote: The best-known open-source OCR engine is Tesseract; you'll find it easily.

There are separate libraries for detecting barcodes; searching for "java barcode detection" or some such phrase will find them.


Hi Ulf Dittmer,

Can you please provide a link with more details on Tesseract, in particular how to use it from Java? I have googled for Tesseract but didn't succeed in understanding it.

Thank you.

Regards,
Dinesh Pise
 
Ulf Dittmer
Rancher
Posts: 42967
I've never used Tesseract, so I can't help. But I notice that there's an extensive FAQ on the site, and it also has forums. Those should get you going.
 
Dinesh Pise
Ranch Hand
Posts: 30
Hi,

I have done image reading with aspriseOCR.jar and aspriseTIFF.jar, but this is a paid product. Is Tesseract free for commercial use?
I am posting the code below which I used to read the TIFF content. We also have to put DevIL.dll, ILU.dll and AspriseOCR.dll in windows/system32.




 
Dinesh Pise
Ranch Hand
Posts: 30
Hi Friends,

This time I am trying to read TIFF image content with Tesseract. I have downloaded and installed tesseract.exe. Running tesseract from the command prompt, I got the following error:

Tesseract Open Source OCR Engine v3.02 with Leptonica
Cannot open input file: 700466293_00000001.tif

My question now is how to install Leptonica. I have downloaded leptonica-1.68-win32-lib-include-dirs.zip; can someone please tell me how to set it up with Tesseract?


I have referred to Tesseract.

Thanks & regards,
Dinesh Pise
 
Dinesh Pise
Ranch Hand
Posts: 30
Hi Friends,
The command below works fine: it reads the TIFF file and generates out.txt containing the text content of the image.
I had actually been giving the wrong file name, which is why I got the "Cannot open input file" error after the "Tesseract Open Source OCR Engine v3.02 with Leptonica" banner.


C:\images\tesseract 700466296_00000002.TIF out

Thanks & regards,
Dinesh Pise
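
For anyone wanting to run the same command from Java, here is a minimal sketch under a few assumptions: Java 11 or later, a tesseract executable on the PATH, and the output base name "out" as in the command above. The class name TesseractCli and its methods are made up for this example and are not part of Tesseract. It checks that the input file is readable first, since a wrong file name was the cause of the earlier "Cannot open input file" error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper around the tesseract command-line tool.
final class TesseractCli {

    // Builds the same command line as above: tesseract <image> <outputBase>
    static List<String> buildCommand(Path image, String outputBase) {
        return List.of("tesseract", image.toString(), outputBase);
    }

    // Runs tesseract on the image and returns the recognised text.
    static String ocr(Path image, String outputBase) throws IOException, InterruptedException {
        // Fail early with a clear message if the file name is wrong.
        if (!Files.isReadable(image)) {
            throw new IOException("Input file not found or not readable: " + image.toAbsolutePath());
        }
        Process process = new ProcessBuilder(buildCommand(image, outputBase))
                .inheritIO() // show tesseract's own messages on the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        // tesseract writes its result to <outputBase>.txt in the working directory.
        return Files.readString(Path.of(outputBase + ".txt"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: TesseractCli <file.tif>");
            return;
        }
        System.out.println(ocr(Path.of(args[0]), "out"));
    }
}
```

Calling the command-line tool keeps the Java side free of native DLL setup; a Java wrapper library would be the alternative if you need the text without going through a file.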
 