optical character recognition (OCR) with java

Ashish S Yadav
Ranch Hand

Joined: Apr 08, 2012
Posts: 31
Is Java suitable for writing OCR software, i.e. software that converts text in a photo to text form? If so, how do I do it? Does it take advanced Java programming and years of experience to write such code?
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 859

1. Yes

2. Process every pixel in the image, and figure out which character it is.

3. Yes

You're better off using an off-the-shelf or open source software package if you need this in a hurry.

WP
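
To make point 2 a little more concrete, here is a minimal sketch of only the "process every pixel" half: it turns an image into a grid of ink / no-ink values that later steps could work on. The class name Binarizer, the method toInk and the fixed threshold are assumptions for illustration, not something from this thread, and real OCR software would normally pick the threshold adaptively.

```java
import java.awt.image.BufferedImage;

class Binarizer {
    // Returns ink[y][x] == true where the pixel is darker than the
    // given threshold (0..255). The threshold is a hypothetical knob.
    static boolean[][] toInk(BufferedImage img, int threshold) {
        int h = img.getHeight();
        int w = img.getWidth();
        boolean[][] ink = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Plain average of the three channels as a rough grey level.
                int gray = (r + g + b) / 3;
                ink[y][x] = gray < threshold;
            }
        }
        return ink;
    }
}
```

You could load a file with javax.imageio.ImageIO.read and pass the result to Binarizer.toInk(img, 128). Working out which character the ink forms is the hard part and is not shown here.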
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987
    

I wouldn't think that the programming would be "advanced". But I would think that the algorithms used to recognize and distinguish letters would be "advanced". Programming the algorithms should be quite straightforward once they have been identified and specified.
Ashish S Yadav
Ranch Hand

Joined: Apr 08, 2012
Posts: 31
I am not in a hurry. I want to develop a simple one myself. Any suggestions for what kind of things I should read or try before I can write such code?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40052
    
Ashish S Yadav wrote:I am not in a hurry. I want to develop a simple one myself. Any suggestions . . . ?
Learn how to live for 1000 years. This is a big problem which has taken thousands of man-years to solve. I don’t know whether the algorithms are published.

You would have to work out the relationship between black and white pixels. For example, if you have a box containing your character and the top left pixel is white, you know it can’t be any of BDEFHKLMNPRTUVWXYZ, because those all have a black pixel at their top left. If that pixel is black, and going straight down from it you find white at the middle left, your letter could be T, V, W, X, Y or Z, because those capital letters have black at the top left and white on the middle left.
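
The elimination idea above could be sketched roughly like this. The two letter lists are the ones from the post; the class name CandidateFilter, the boolean[][] ink grid (true = black) and the choice to apply the middle-left test only when the top left is black are assumptions for illustration. It also assumes the character fills its box exactly and is in a plain capital font.

```java
import java.util.ArrayList;
import java.util.List;

class CandidateFilter {
    // Letter lists taken from the post above.
    static final String BLACK_TOP_LEFT = "BDEFHKLMNPRTUVWXYZ";
    static final String WHITE_MIDDLE_LEFT = "TVWXYZ"; // subset of BLACK_TOP_LEFT

    // ink[y][x] is true for a black pixel; ink[0][0] is the top left of the box.
    static List<Character> candidates(boolean[][] ink) {
        boolean topLeft = ink[0][0];
        boolean middleLeft = ink[ink.length / 2][0];
        List<Character> out = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) {
            boolean letterHasBlackTopLeft = BLACK_TOP_LEFT.indexOf(c) >= 0;
            if (letterHasBlackTopLeft != topLeft) {
                continue; // ruled out by the top left pixel
            }
            if (topLeft) {
                // Assumption: only letters with a black top left are
                // split further by the middle left pixel.
                boolean letterHasWhiteMiddleLeft = WHITE_MIDDLE_LEFT.indexOf(c) >= 0;
                if (letterHasWhiteMiddleLeft == middleLeft) {
                    continue; // ruled out by the middle left pixel
                }
            }
            out.add(c);
        }
        return out;
    }
}
```

Two pixels only narrow 26 letters down to a handful, so a real recogniser would need many more such tests, and it would have to cope with noise, slant and different fonts.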
Nikolay Shi
Greenhorn

Joined: Apr 13, 2012
Posts: 2
If you need to use OCR in your project, creating your own engine is not the best idea, if you ask me :/ However, there aren't many existing developer tools for OCR in Java.

As far as I know, there are no native open-source Java OCR SDKs. There are Java APIs that wrap calls to native interfaces; for example, for Tesseract (http://groups.google.com/group/tesseract-ocr/), one of the most popular open-source OCR engines, there are Java wrappers such as tesjeract (http://code.google.com/p/tesjeract/) and Tess4J (http://tess4j.sf.net/). That could work for you, but it's rather hard to set up and will require you to develop image preprocessing and do font training yourself.

Another option is a cloud service. It requires the end-user application to have an internet connection, but it's independent of your choice of programming language and of resource limitations (which is important on mobile devices, since the OCR process consumes a fairly large amount of resources). Have a look at http://www.ocrsdk.com; it's a cloud-based OCR SDK that lets you upload an image through a web API and returns the OCRed data.

This web-API-based OCR SDK is not free, which may not suit you, but I still recommend you try it out (it has a free 90-day trial without any upfront charges), as its pricing is really affordable compared with enterprise solutions, while it provides enterprise-level OCR accuracy, which is way better than open source. You may also find its code sample repo on GitHub useful: https://github.com/abbyysdk/ocrsdk.com

Hope it helps!
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40052
    
Welcome to the Ranch

I have used abbyy myself, a long time ago, and it was good. I didn’t try any programming with it, however.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    

Ashish S Yadav wrote: Any suggestions for what kind of things I should read or try before I can write such code?


At the risk of stating the obvious, I would suggest doing some research on OCR and its algorithms in general. As already pointed out, you need to understand how OCR works before you can figure out how to put it into Java. Regardless of what language you use, it starts with something like: a matrix of pixels (black/white, greyscale, or color); some math to work out which characters those pixels might represent; the confidence level or probability of it being a given one of those candidate characters. You'll need a decent grasp of those concepts before you can even think about how to write the code.
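
As a toy illustration of those three ingredients (a pixel matrix, some math, a confidence per candidate), here is a naive template-matching sketch. It is not how any particular OCR engine works; the class TemplateMatcher, its method names and the tiny '#'/'.' templates are all made up for this example, and it assumes the glyph and every template have the same dimensions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TemplateMatcher {
    // Builds a pixel matrix from text rows; '#' means ink, anything else is blank.
    static boolean[][] parse(String... rows) {
        boolean[][] grid = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            grid[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) {
                grid[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return grid;
    }

    // Fraction of pixels (0.0 to 1.0) on which glyph and template agree.
    // Assumes both matrices have identical dimensions.
    static double similarity(boolean[][] glyph, boolean[][] template) {
        int same = 0;
        int total = 0;
        for (int y = 0; y < glyph.length; y++) {
            for (int x = 0; x < glyph[y].length; x++) {
                if (glyph[y][x] == template[y][x]) {
                    same++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) same / total;
    }

    // Scores the glyph against every template; the score acts as a crude confidence.
    static Map<Character, Double> scoreAll(boolean[][] glyph,
                                           Map<Character, boolean[][]> templates) {
        Map<Character, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<Character, boolean[][]> e : templates.entrySet()) {
            scores.put(e.getKey(), similarity(glyph, e.getValue()));
        }
        return scores;
    }

    // Returns the candidate with the highest score.
    static char best(Map<Character, Double> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        char bestChar = 0;
        double bestScore = -1.0;
        for (Map.Entry<Character, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                bestChar = e.getKey();
            }
        }
        return bestChar;
    }
}
```

For instance, with 3x3 templates for I (".#.", ".#.", ".#.") and L ("#..", "#..", "###"), the slightly noisy glyph ".#.", ".#.", ".##" agrees with I on 8 of 9 pixels and with L on 4 of 9, so best picks I. Pixel-for-pixel comparison like this breaks down quickly with different sizes, fonts or skew, which is where the serious algorithms come in.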
Ethan Paul
Greenhorn

Joined: Dec 21, 2012
Posts: 1
Hello everyone

I want to implement OCR in Java for my project, but unfortunately I don't know how to begin. It would be great if someone could help me out with this. Please tell me where I can study this so that I can code it in Java.
thanks

regards
ethan
 