Java API for PDF to text conversion

 
Ajay Njallacattu
Ranch Hand
Posts: 39
Hi,

I'm looking for a free Java API that can convert a PDF to a CSV or text file. I need to extract the data from the PDF.


Can anyone help?


Regards

Ajay
 
Ulf Dittmer
Rancher
Pie
Posts: 42967
73
The http://faq.javaranch.com/java/AccessingFileFormats page points to several libraries that can extract text from a PDF. It'll be unstructured text, though.
 
Ajay Njallacattu
Ranch Hand
Posts: 39

Thanks for the link.

I tried PDFBox, but it doesn't give the text in a structured manner. The other APIs are commercial. Please let me know if there is any other open source API for this.


Regards

Ajay
 
Ulf Dittmer
Rancher
Pie
Posts: 42967
73
JPedal has an open source version as well as a commercial one. But its results will be unstructured text as well.

If you don't have a budget, then you'll need to substitute programming effort for it - check out PDFRenderer. It can display PDFs, so obviously it knows how to access information within them in a structured way. Since it's open source, you can find out how it does that.
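
To give an idea of what that programming effort might look like: once you can get at each piece of text together with its position on the page, rebuilding table rows is mostly grouping and sorting. The following is only a rough sketch in plain Java (16 or later, for records). `Fragment`, `toCsv` and `quote` are hypothetical names made up for this example and are not part of PDFRenderer or any other library; getting the fragments out of the PDF is the part you would still have to write. It assumes y grows upwards, as in default PDF user space, and that the cells of one table row share roughly the same baseline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; it does not use any PDF library.
final class FragmentsToCsv {

    // A piece of text plus the page position where it starts (assumed input).
    record Fragment(double x, double y, String text) {}

    // Groups fragments into rows by similar y, orders each row by x,
    // and writes one CSV line per row.
    static String toCsv(List<Fragment> fragments, double yTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Larger y is assumed to be higher on the page, so sort descending.
        sorted.sort((a, b) -> Double.compare(b.y(), a.y()));

        List<List<Fragment>> rows = new ArrayList<>();
        double rowY = 0;
        for (Fragment f : sorted) {
            // Start a new row when the fragment is too far below the row's first fragment.
            if (rows.isEmpty() || Math.abs(f.y() - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = f.y();
            }
            rows.get(rows.size() - 1).add(f);
        }

        StringBuilder csv = new StringBuilder();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble(Fragment::x)); // left to right
            csv.append(row.stream()
                    .map(f -> quote(f.text()))
                    .collect(Collectors.joining(",")));
            csv.append('\n');
        }
        return csv.toString();
    }

    // Quotes a field if it contains a comma, a double quote or a line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return '"' + field.replace("\"", "\"\"") + '"';
        }
        return field;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for text positions read from a PDF.
        List<Fragment> page = List.of(
                new Fragment(200, 700.2, "Qty"),
                new Fragment(50, 700, "Item"),
                new Fragment(50, 680, "Nuts, bolts"),
                new Fragment(200, 679.8, "12"));
        System.out.print(toCsv(page, 2.0));
    }
}
```

The tolerance is needed because text in the same visual row rarely has exactly the same y value. This does nothing about empty cells or cells that wrap over several lines; for those you would also have to compare x positions against known column boundaries.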
 
Ajay Njallacattu
Ranch Hand
Posts: 39
Hi,

I have tried 2-3 APIs, but I'm only getting scattered text. My PDF contains only tables. I can change how the PDFs are created if needed, provided there is some functionality in Java to get an ordered CSV or text format out of the PDF.


Please suggest.

Regards

Ajay
 
Ulf Dittmer
Rancher
Pie
Posts: 42967
73
As I said in both previous posts, unstructured text is all you'll get from the existing libraries. You may have to resort to the PDFRenderer approach I suggested if you are prepared to spend time on this, or maybe there are commercial libraries if you prefer to spend money instead.
 
Ajay Njallacattu
Ranch Hand
Posts: 39
Hi,

Can you please let me know which commercial libraries are available? Will we be able to get an evaluation copy to test, to make sure that we get the preferred output?

Regards

Ajay
 
Ajay Njallacattu
Ranch Hand
Posts: 39
Hi,

I have found a new API, ICEpdf, which works perfectly for my requirements. It gives the structured output I need.

Thanks a lot for all the help and guidance.


Regards

Ajay Joseph
 
VenkatSri Sri
Greenhorn
Posts: 1
Hi Ajay,

I have the same requirement. Can you please send me the code you used for this task?

Regards,
Sri
 