
converting png to tiff and character recognition with tesseract

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
Hi,

I'm trying to get tesseract (http://code.google.com/p/tesseract-ocr/) to read text from a TIFF image. The TIFF was converted from a PNG source, either with ImageIO on Linux or with Image Converter .EXE on Windows. The output text is either empty or looks like \\\\\\\\\\\\\\\\\\\\\HHHHHHHHHHHH\\\\\\\\\\\\\\\\\UU\\\\\\\\\\\\\\\H\W

Does anyone have an idea what could cause the problem? I suspect it is related to the low contrast between the yellow background and the text, or to some attribute that needs to be set when converting from PNG.
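For reference, here is a minimal sketch of the PNG-to-TIFF step done with ImageIO. It assumes Java 9 or later, whose ImageIO ships a built-in TIFF plugin (on Java 8 a third-party plugin such as JAI Image I/O would be needed). It also flattens any alpha channel onto a white background before writing, on the unverified assumption that transparency in the source PNG might confuse tesseract. The class and method names are illustrative, not something from the thread.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PngToTiff {

    /** Draws the source onto an opaque white RGB canvas, dropping any alpha channel. */
    static BufferedImage flattenToRgb(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, src.getWidth(), src.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Writes the image as TIFF; fails loudly if no TIFF writer is registered. */
    static void writeTiff(BufferedImage img, File out) throws IOException {
        if (!ImageIO.write(img, "tiff", out)) {
            throw new IOException("No ImageIO TIFF writer available (Java 9+ or a TIFF plugin is required)");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java PngToTiff input.png output.tif");
            return;
        }
        BufferedImage png = ImageIO.read(new File(args[0]));
        if (png == null) {
            throw new IOException("Could not decode " + args[0]);
        }
        writeTiff(flattenToRgb(png), new File(args[1]));
    }
}
```

Checking the boolean returned by `ImageIO.write` matters here, because ImageIO silently returns `false` instead of throwing when no writer exists for the requested format.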



[Attachment: original.png]

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
Next image:


[Attachment: converted-with-java-imageio.tif]

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33


[Attachment: converted-with-image-converter.tif]

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39578
Having a bit of experience with image processing (though not much with OCR), I would imagine that it's easier to perform OCR on a black-and-white image than on a colored image. If the images you need to work with are essentially bi-colored like the one shown above, converting it to B/W should not be hard to do, and may yield better results.
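A minimal sketch of that B/W conversion, assuming a simple global threshold on BT.601 luminance. The threshold is a hypothetical starting point that may need tuning: pure yellow (255, 255, 0) has a luminance of about 226, so a yellow background should map to white while darker text maps to black. `Binarize` and its methods are illustrative names.

```java
import java.awt.image.BufferedImage;

public class Binarize {

    /** ITU-R BT.601 luma of a packed RGB pixel, in the range 0..255. */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /** Pixels brighter than the threshold become white; all others become black. */
    static BufferedImage toBlackAndWhite(BufferedImage src, int threshold) {
        BufferedImage bw = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                boolean light = luma(src.getRGB(x, y)) > threshold;
                bw.setRGB(x, y, light ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return bw;
    }
}
```

The resulting `TYPE_BYTE_BINARY` image is an ordinary `BufferedImage`, so it can be passed to `ImageIO.write` like any other image before running tesseract on the file.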


Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
OK, I'll try that. What's the best way to convert an image to grayscale? With ImageIO somehow?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39578
This may give you some ideas: http://blog.codebeach.com/2008/03/convert-color-image-to-gray-scale-image.html
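As a rough alternative sketch (not necessarily the approach in the linked article): ImageIO itself only reads and writes image files, so the grayscale conversion is done with Java 2D, here by drawing the image onto a `TYPE_BYTE_GRAY` canvas and letting Java 2D convert the colors. `Grayscale.toGray` is an illustrative name.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class Grayscale {

    /** Returns an 8-bit grayscale copy; Java 2D converts color to gray while drawing. */
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }
}
```

Grayscale alone keeps a low-contrast yellow background as a light gray, so thresholding the result to pure black and white may still be needed for OCR.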
 
 
Similar Threads
Convert PDF Files into PNG, JPEG, TIFF with Page & Text Extraction
ocr from a website - how and how difficult?
How to set photometricInterpretation property while storing TIFF using JAI?
How can I read a text from an image file ?
how to add tiff writer to imageio package?