converting png to tiff and character recognition with tesseract

 
Denis Wen
Ranch Hand
Posts: 33
Hi,

I'm trying to have Tesseract (http://code.google.com/p/tesseract-ocr/) read text from a TIFF image (converted from a PNG source image either with ImageIO on Linux or with Image Converter .EXE on Windows). The output text is either empty or looks like \\\\\\\\\\\\\\\\\\\\\HHHHHHHHHHHH\\\\\\\\\\\\\\\\\UU\\\\\\\\\\\\\\\H\W

Does anyone have an idea what could be causing the problem? I could imagine it is related to the low contrast between the yellow background and the text, or to some attribute that needs to be set when converting from PNG.

original.png
[Thumbnail for original.png]
 
Denis Wen
Ranch Hand
Posts: 33
next image
Filename: converted-with-java-imageio.tif
Description:
File size: 2 Kbytes
[Download converted-with-java-imageio.tif] Download Attachment
 
Denis Wen
Ranch Hand
Posts: 33
Denis Wen wrote:next image
Filename: converted-with-image-converter.tif
Description:
File size: 12 Kbytes
[Download converted-with-image-converter.tif] Download Attachment
 
Ulf Dittmer
Rancher
Posts: 42967
73
Having a bit of experience with image processing (though not much with OCR), I would imagine that it's easier to perform OCR on a black-and-white image than on a colored image. If the images you need to work with are essentially bi-colored like the one shown above, converting it to B/W should not be hard to do, and may yield better results.
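
One way the B/W conversion could be sketched in plain Java is shown below. This is only a minimal illustration, not a tested recipe for Tesseract: the class name BinarizeSketch, the method toBlackAndWhite and the threshold of 128 are assumptions made for the example, and the right threshold depends on how dark the text in the actual image is. It computes a luminance value per pixel and writes either black or white into a 1-bit BufferedImage.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper class; the names and the threshold are illustrative assumptions.
class BinarizeSketch {

    // Returns a 1-bit black-and-white copy of src. Pixels whose luminance
    // (0-255) is at or above the threshold become white, all others black.
    static BufferedImage toBlackAndWhite(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the common 0.299/0.587/0.114 luminance weights.
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                // Use fully opaque ARGB values so the 1-bit palette lookup is unambiguous.
                out.setRGB(x, y, luminance >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }

    // Reads the image named by the first argument and writes a B/W PNG to the second.
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BinarizeSketch <input image> <output png>");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        ImageIO.write(toBlackAndWhite(src, 128), "png", new File(args[1]));
    }
}
```

With a fixed threshold like this, a bright yellow background ends up white and dark text ends up black. If the text is light or the contrast is low, the threshold would need adjusting, and the result would still have to be converted to TIFF before handing it to Tesseract.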
 
Denis Wen
Ranch Hand
Posts: 33
OK, I'll try that. What would you suggest as the best way to convert an image to grayscale? Can it be done with ImageIO somehow?
 