I am trying to figure out how to use OCR from the ground up (with
Java somewhere in the back end). Our current system works as follows:
1) Scan multiple documents (using a Canon 5020) and save them in PDF format (usually several hundred of the same type of document, each one for a different person).
2) Using a Java Swing GUI, the user opens each PDF document and assigns a type to it (and a few other parameters). The system then takes the PDF, stores it in an appropriate location, and enters the appropriate database information (the document is always stored as a PDF).
I want to skip the user step and automate it with OCR, but I am not even sure where to start. Should I scan the documents and save them as PDF, then use some OCR program to read through the PDF, or should they be saved in some other format and converted later? What are some good tools for this?
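To make the goal more concrete, here is a rough sketch of the step I would like to automate, assuming the OCR text of a document is already available as a String (how to get that text is exactly what I am asking about). The class name, the keywords, and the type names are all made up for illustration; the real ones would come from our actual forms.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: guesses a document type from text that some
// OCR step has already produced. Nothing here performs OCR itself.
class DocumentTypeGuesser {

    // Hypothetical keyword -> type table, checked in insertion order.
    private static final Map<String, String> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("application for enrollment", "ENROLLMENT");
        KEYWORDS.put("invoice", "INVOICE");
        KEYWORDS.put("release form", "RELEASE");
    }

    static String guessType(String ocrText) {
        if (ocrText == null) {
            return "UNKNOWN";
        }
        // Lowercase and collapse whitespace, since OCR output tends to
        // have irregular spacing, line breaks, and capitalization.
        String normalized = ocrText.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        for (Map.Entry<String, String> entry : KEYWORDS.entrySet()) {
            if (normalized.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        // No keyword matched; such a document would still need a human.
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(guessType("INVOICE\nNo. 1234"));
    }
}
```

Anything that comes back as UNKNOWN would presumably still go through the existing Swing GUI for manual assignment.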
Thanks for any help!