How can I read a text from an image file ?

Gautam Ry
Ranch Hand

Joined: Dec 30, 2008
Posts: 41
I need to read text (an account number) from an image file (.tif).
I tried the following approach:

try {
    File newFile = new File("C:\\Image\\9R6-CCI\\09082010\\K08E091209FT_8021.tif");
    BufferedImage buffImage = ImageIO.read(newFile);
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    ImageIO.write(buffImage, IMAGE_TYPE, os);
    byte[] data = os.toByteArray();
    String imageString = new BASE64Encoder().encode(data);
} catch (Exception e) {
    e.printStackTrace(); // report the failure instead of swallowing it
}


But it threw an error. After googling, I found that ImageIO has some limitations when reading an editable image.
Then I tried the following approach:

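try {
    File newFile = new File("C:\\Image\\9R6-CCI\\09082010\\K08E091209FT_8033.tif");
    byte[] fileData = new byte[(int) newFile.length()];
    // readFully fills the whole array; a single read() may return fewer bytes
    DataInputStream inStream = new DataInputStream(new FileInputStream(newFile));
    inStream.readFully(fileData);
    inStream.close();
    String tempFileData = new String(fileData);
    String imageString = new BASE64Encoder().encode(fileData);
} catch (Exception e) {
    e.printStackTrace(); // report the failure instead of swallowing it
}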


But I didn't get the desired output. The output is shown below:
xTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUx

Please help me resolve this issue.

Thanks and Regards
Gautam

Lester Burnham
Rancher

Joined: Oct 14, 2008
Posts: 1337
Image files contain binary data - they can't be treated like character data. What's more, bitmap image formats (like TIFF, JPEG, GIF, PNG, etc.) do not store the text they display in any easily extractable form. Your best bet is to use an OCR package like Tesseract.
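
To illustrate why the Base64 attempts above cannot produce the account number, here is a minimal sketch. It assumes Java 8 or later and uses java.util.Base64 rather than the BASE64Encoder class from the original post; the class name Base64IsNotText is made up for this example. Base64 only re-expresses the same bytes in printable characters, so decoding gives back the identical binary data and no recognized text.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64IsNotText {

    /** Encodes raw bytes as Base64; this changes the representation only. */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Decodes Base64 back into exactly the bytes that were encoded. */
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // First four bytes of a little-endian TIFF file: "II", then the number 42.
        byte[] header = {0x49, 0x49, 0x2A, 0x00};

        String encoded = encode(header);
        System.out.println(encoded); // prints SUkqAA==

        // The round trip returns the same binary data; nothing was "read" from the image.
        System.out.println(Arrays.equals(header, decode(encoded))); // prints true
    }
}
```

The printed string is just the file's bytes in another alphabet, which is why the output in the question looks like gibberish.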
Gautam Ry
Ranch Hand

Joined: Dec 30, 2008
Posts: 41
Hi Lester,

Many thanks for the useful response.
I need some more details on your post to go ahead.

a) What is an OCR package? Can I download it from the internet?
b) What is Tesseract?

Could you give me some examples?

Thanks again for the reply.

Regards
Gautam
Lester Burnham
Rancher

Joined: Oct 14, 2008
Posts: 1337
OCR = Optical Character Recognition. It has a Wikipedia page that should get you started.

Googling for Tesseract should find its home page pretty quickly; it's not like that's a common word.
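
As a rough starting point, one way to use Tesseract from Java is to run its command-line program as an external process. The sketch below makes several assumptions: a reasonably recent Tesseract is installed and on the PATH, that version accepts "stdout" as the output name, and Java 9 or later is available (for readAllBytes). The class and method names are invented for this example, and how well the account number is recognized depends on the scan quality.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TesseractCli {

    /** Builds the command line "tesseract <image> stdout". */
    static List<String> buildCommand(Path image) {
        return Arrays.asList("tesseract", image.toString(), "stdout");
    }

    /** Runs Tesseract on the image and returns whatever text it recognized. */
    static String recognize(Path image) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show Tesseract's messages on the console
                .start();
        byte[] output = process.getInputStream().readAllBytes(); // Java 9+
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of the scanned image as the first argument.
        System.out.println(recognize(Paths.get(args[0])));
    }
}
```

The recognized text will usually contain more than the account number, so some string processing (for example a regular expression matching the expected number format) would still be needed afterwards.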
 
 