Java API for PDF to text conversion

Ajay Njallacattu
Ranch Hand

Joined: Nov 21, 2006
Posts: 39
Hi,

I'm looking for a free Java API that can convert a PDF to a CSV or text file. I need to extract the data from the PDF.


Can anyone help?


Regards

Ajay
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42602
The http://faq.javaranch.com/java/AccessingFileFormats page points to several libraries that can extract text from a PDF. It'll be unstructured text, though.


Ajay Njallacattu
Ranch Hand

Joined: Nov 21, 2006
Posts: 39

Thanks for the link.

I tried PDFBox, but it does not return the text in a structured form. The other APIs are commercial. Please let me know if there is any other open source API for this.


Regards

Ajay
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42602
JPedal has an open source version as well as a commercial one. But its results will be unstructured text as well.

If you don't have a budget, then you'll need to substitute programming effort for it - check out PDFRenderer. It can display PDFs, so obviously it knows how to access information within them in a structured way. Since it's open source, you can find out how it does that.
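To make the "programming effort" route a little more concrete, here is a minimal sketch of one common way to rebuild table rows from positioned text. It assumes you can already obtain each piece of text together with its page coordinates from whichever library you use; the `Fragment` and `RowGrouper` classes are hypothetical names made up for this illustration and are not part of PDFRenderer, PDFBox, or JPedal. It also assumes that y grows downward, so a smaller y means higher on the page, and that text on the same table row has nearly the same y.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for one piece of text and where it sits on the page. */
final class Fragment {
    final double x;
    final double y;
    final String text;

    Fragment(double x, double y, String text) {
        this.x = x;
        this.y = y;
        this.text = text;
    }
}

final class RowGrouper {
    /**
     * Groups fragments into rows: a fragment joins the current row if its y
     * is within 'tolerance' of the first fragment of that row. Cells in each
     * row are ordered left to right by x.
     */
    static List<List<String>> toRows(List<Fragment> fragments, double tolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        double rowY = 0.0;
        for (Fragment f : sorted) {
            if (rows.isEmpty() || Math.abs(f.y - rowY) > tolerance) {
                rows.add(new ArrayList<>()); // start a new row
                rowY = f.y;
            }
            rows.get(rows.size() - 1).add(f);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for what a PDF library would report.
        List<Fragment> sample = Arrays.asList(
                new Fragment(80, 119.8, "3"),
                new Fragment(10, 100.0, "Name"),
                new Fragment(10, 120.0, "Apple"),
                new Fragment(80, 100.4, "Qty"));
        for (List<String> row : toRows(sample, 2.0)) {
            System.out.println(row);
        }
    }
}
```

The tolerance is a judgment call: too small and one table row splits in two, too large and neighbouring rows merge. If your library reports y growing upward instead, reverse the row order at the end.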
Ajay Njallacattu
Ranch Hand

Joined: Nov 21, 2006
Posts: 39
Hi,

I have tried two or three APIs, but I only get scattered text. My PDF contains only tables. I can change how the PDFs are created if that helps, as long as there is some way in Java to get an ordered CSV or text file out of the PDF.


Please suggest.

Regards

Ajay
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42602
As I said in both previous posts, unstructured text is all you'll get from the existing libraries. You may have to resort to the PDFRenderer approach I suggested if you are prepared to spend time on this, or maybe there are commercial libraries if you prefer to spend money instead.
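For what it's worth, once the text has been arranged into rows and cells by whatever means, writing the CSV itself is the easy part. The sketch below is plain Java with no PDF library involved; `CsvWriter` is a hypothetical name chosen for this example. It quotes any cell that contains a comma, a double quote, or a line break and doubles embedded quotes, which is the usual CSV convention. Lines are joined with `\n` here, which is an assumption; some consumers expect `\r\n`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CsvWriter {
    /** Quotes a cell only when it contains a character that would break the CSV. */
    static String escape(String cell) {
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        if (!needsQuotes) {
            return cell;
        }
        // Embedded double quotes are doubled, then the whole cell is wrapped.
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }

    /** Turns rows of cells into CSV text, one line per row. */
    static String toCsv(List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> escaped = new ArrayList<>();
            for (String cell : row) {
                escaped.add(escape(cell));
            }
            lines.add(String.join(",", escaped));
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // Made-up sample rows, standing in for a table recovered from a PDF.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Name", "Qty"),
                Arrays.asList("Widget, large", "3"));
        System.out.println(toCsv(rows));
    }
}
```

Skipping the quoting step is the usual reason a CSV exported from table data ends up with shifted columns, since any cell containing a comma would otherwise be read as two cells.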
Ajay Njallacattu
Ranch Hand

Joined: Nov 21, 2006
Posts: 39
Hi,

Can you please let me know which commercial libraries are available? Will we be able to get an evaluation copy, so we can test it and make sure we get the preferred output?

Regards

Ajay
Ajay Njallacattu
Ranch Hand

Joined: Nov 21, 2006
Posts: 39
Hi,

I have found another API, ICEpdf, which works perfectly for my requirements. It gives the structured output I need.

Thanks a lot for all the help and guidance.


Regards

Ajay Joseph
VenkatSri Sri
Greenhorn

Joined: Sep 02, 2010
Posts: 1
Hi Ajay,

I have the same requirement. Can you please send me the code you used for this task?

Regards,
Sri
 