converting png to tiff and character recognition with tesseract

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
Hi,

I'm trying to have tesseract (http://code.google.com/p/tesseract-ocr/) read text from a TIFF image that was converted from a PNG source, either with ImageIO on Linux or with Image Converter .EXE on Windows. The output text is either empty or looks like \\\\\\\\\\\\\\\\\\\\\HHHHHHHHHHHH\\\\\\\\\\\\\\\\\UU\\\\\\\\\\\\\\\H\W

Does anyone have an idea what could cause the problem? I imagine it is related to the low contrast between the yellow background and the text, or to some attribute that needs to be set when converting from PNG.
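
For reference, the ImageIO conversion step mentioned above might look roughly like the sketch below. The class name PngToTiff is made up for illustration, and it assumes a TIFF writer is registered with ImageIO: one is built in from Java 9 onward, while older JVMs need an extra plugin such as JAI Image I/O. ImageIO.write returns false when no suitable writer is found, so that result is worth checking.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not taken from the original post.
public class PngToTiff {

    /**
     * Reads a PNG file and writes it back out as TIFF.
     * Returns false if no TIFF writer is registered with ImageIO.
     */
    public static boolean convert(File png, File tiff) throws IOException {
        BufferedImage image = ImageIO.read(png);
        if (image == null) {
            // ImageIO.read returns null when no reader understands the file
            throw new IOException("Not a readable image: " + png);
        }
        return ImageIO.write(image, "tiff", tiff);
    }
}
```

If convert returns false, the JVM has no TIFF writer available, which would be worth ruling out before looking at tesseract itself.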



[Thumbnail for original.png]

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
next image


[Download converted-with-java-imageio.tif]

Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
Denis Wen wrote: next image


[Download converted-with-image-converter.tif]

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 35240
Having a bit of experience with image processing (though not much with OCR), I would imagine that it's easier to perform OCR on a black-and-white image than on a colored image. If the images you need to work with are essentially bi-colored like the one shown above, converting it to B/W should not be hard to do, and may yield better results.
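
One possible way to do the B/W conversion suggested here is a simple luminance threshold. The sketch below is only an illustration: the class name Binarizer, the method name, and the integer luminance weights are assumptions, and the right threshold for a yellow background with dark text would have to be found by experiment.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not taken from the thread.
public class Binarizer {

    /**
     * Returns a 1-bit black-and-white copy of src. Pixels whose luminance
     * (0-255) is below the threshold become black, all others white.
     */
    public static BufferedImage toBlackAndWhite(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Weighted sum approximating perceived brightness
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luminance < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

Calling Binarizer.toBlackAndWhite(image, 128) would be a reasonable first try; a bright yellow background has high luminance and should come out white, while dark text should come out black.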


Denis Wen
Ranch Hand

Joined: Nov 11, 2008
Posts: 33
OK, I'll try that. What would you suggest as the best way to convert an image to grayscale? With ImageIO somehow?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 35240
This may give you some ideas: http://blog.codebeach.com/2008/03/convert-color-image-to-gray-scale-image.html
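
As a rough sketch of one common approach (not necessarily the one the linked article uses), the image can be drawn into a BufferedImage of type TYPE_BYTE_GRAY and Java 2D does the color conversion. The class name GrayscaleConverter is made up for illustration.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not taken from the thread or the linked article.
public class GrayscaleConverter {

    /** Returns an 8-bit grayscale copy of src with the same dimensions. */
    public static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            // Drawing into a gray image converts each pixel's color to gray
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }
}
```

ImageIO itself only handles the reading and writing: the source could be loaded with ImageIO.read, passed through toGray, and saved again with ImageIO.write before being handed to tesseract.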
 