converting png to tiff and character recognition with tesseract

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
Hi,

I'm trying to have Tesseract (http://code.google.com/p/tesseract-ocr/) read text from a TIFF image that was converted from a PNG source, either with ImageIO on Linux or with Image Converter .EXE on Windows. The output text is either empty or looks like \\\\\\\\\\\\\\\\\\\\\HHHHHHHHHHHH\\\\\\\\\\\\\\\\\UU\\\\\\\\\\\\\\\H\W

Does anyone have an idea what could cause the problem? I imagine it is related to the low contrast between the yellow background and the text, or to some attribute that needs to be set when converting from PNG.



[Thumbnail for original.png]

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
next image


[Download converted-with-java-imageio.tif]

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
Denis Wen wrote: next image


[Download converted-with-image-converter.tif]

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41572
    
Having a bit of experience with image processing (though not much with OCR), I would imagine that it's easier to perform OCR on a black-and-white image than on a colored one. If the images you need to work with are essentially two-colored like the one shown above, converting them to B/W should not be hard to do, and may yield better results.
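
As a rough illustration of the B/W idea, here is a minimal sketch that thresholds each pixel on its luminance and writes the result into a 1-bit image. The class name Binarize, the method name toBlackAndWhite, and the threshold value are assumptions made up for this example, not anything from Tesseract or ImageIO; a fixed threshold is only one of several ways to binarize, and the right cutoff for a yellow background would need some experimenting.

```java
import java.awt.image.BufferedImage;

final class Binarize {
    /**
     * Returns a 1-bit black-and-white copy of src.
     * Pixels whose luminance is below threshold (0-255) become black,
     * all others become white. The threshold is a guess to be tuned per image.
     */
    public static BufferedImage toBlackAndWhite(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luma weights (0.299, 0.587, 0.114).
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luminance < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

With a threshold of 128, a pure yellow background pixel (luminance about 225) comes out white and dark text comes out black, which is the kind of input OCR engines tend to handle better.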


Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
OK, I'll try that. What would you suggest as the best way to convert an image to grayscale? With ImageIO somehow?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41572
    
This may give you some ideas: http://blog.codebeach.com/2008/03/convert-color-image-to-gray-scale-image.html
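
For reference, one common way to do the grayscale step with the standard library is to draw the source into a TYPE_BYTE_GRAY BufferedImage and write that out with ImageIO. The sketch below is not taken from the linked article; the class name GrayscaleConvert and the command-line arguments are made up for illustration. It also assumes a TIFF writer is available to ImageIO, which is built in from Java 9 onward; on older JVMs a plugin such as JAI Image I/O would be needed.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class GrayscaleConvert {
    /** Returns an 8-bit grayscale copy of src by drawing it into a gray image. */
    public static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null); // color conversion happens during the draw
        } finally {
            g.dispose();
        }
        return gray;
    }

    /** Usage (hypothetical file names): java GrayscaleConvert original.png gray.tif */
    public static void main(String[] args) throws IOException {
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("No reader found for " + args[0]);
        }
        // ImageIO.write returns false when no writer for the format is registered.
        if (!ImageIO.write(toGray(src), "tiff", new File(args[1]))) {
            throw new IOException("No TIFF writer available on this JVM");
        }
    }
}
```

Checking the return value of ImageIO.write matters here, because a missing TIFF writer otherwise fails silently and leaves no usable output file.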
 