Java API for PDF to text conversion

 
Ranch Hand
Posts: 39
Hi,

I'm looking for a free Java API that can convert a PDF to a CSV or text file. I need to extract the data from the PDF.


Can anyone help?


Regards

Ajay
 
Rancher
Posts: 43081
The http://faq.javaranch.com/java/AccessingFileFormats page points to several libraries that can extract text from a PDF. It'll be unstructured text, though.
 
Ajay Njallacattu
Ranch Hand
Posts: 39

Thanks for the link....

I tried using PDFBox, but it's not giving the text in a structured manner. The other APIs are commercial. Please let me know if there is any other open source API for this.


Regards

Ajay
 
Ulf Dittmer
Rancher
Posts: 43081
JPedal has an open source version as well as a commercial one. But its results will be unstructured text as well.

If you don't have a budget, then you'll need to substitute programming effort for it: check out PDFRenderer. It can display PDFs, so obviously it knows how to access the information within them in a structured way. Since it's open source, you can find out how it does that.
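
As a rough illustration of the programming effort involved: assuming you have already pulled text fragments with their page coordinates out of some library, table rows can be rebuilt by grouping fragments with similar y values and ordering each group by x. The `Fragment` class below is hypothetical (it is not part of PDFRenderer or any other library), y is assumed to grow down the page, and the tolerance is a guess that would need tuning per document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TableRows {

    // Hypothetical positioned text fragment; not a type from any PDF library.
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Groups fragments into rows by y coordinate, then orders each row by x.
    static List<String> toCsvLines(List<Fragment> fragments, double yTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<String> lines = new ArrayList<>();
        List<Fragment> row = new ArrayList<>();
        double rowY = 0;
        for (Fragment f : sorted) {
            // Start a new row once a fragment is farther than the tolerance from the row's first y.
            if (!row.isEmpty() && Math.abs(f.y - rowY) > yTolerance) {
                lines.add(joinRow(row));
                row = new ArrayList<>();
            }
            if (row.isEmpty()) {
                rowY = f.y;
            }
            row.add(f);
        }
        if (!row.isEmpty()) {
            lines.add(joinRow(row));
        }
        return lines;
    }

    // Sorts one row left to right and joins its cells with commas.
    static String joinRow(List<Fragment> row) {
        row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(row.get(i).text));
        }
        return sb.toString();
    }

    // Quotes a cell if it contains a comma, a double quote, or a line break.
    static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = new ArrayList<>();
        fragments.add(new Fragment(200, 100.4, "Price"));
        fragments.add(new Fragment(50, 100.0, "Item"));
        fragments.add(new Fragment(50, 120.0, "Widget, large"));
        fragments.add(new Fragment(200, 119.8, "9.50"));
        for (String line : toCsvLines(fragments, 2.0)) {
            System.out.println(line);
        }
    }
}
```

With the sample fragments in `main`, this prints `Item,Price` followed by `"Widget, large",9.50`. Real tables with merged cells, wrapped text, or empty cells would need more than this, for example snapping x positions to detected column boundaries.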
 
Ajay Njallacattu
Ranch Hand
Posts: 39
Hi,

I have tried 2-3 APIs, but I'm getting only scattered text. My PDF contains only tables. If needed, I can change how the PDFs are created, as long as there is some way in Java to get an ordered CSV or text file out of them.


Please suggest.

Regards

Ajay
 
Ulf Dittmer
Rancher
Posts: 43081
As I said in both previous posts, unstructured text is all you'll get from the existing libraries. You may have to resort to the PDFRenderer approach I suggested if you are prepared to spend time on this, or maybe there are commercial libraries if you prefer to spend money instead.
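
If the unstructured text at least keeps one table row per line, with columns separated by runs of two or more spaces, a small post-processing step may be enough. That layout is an assumption and will not hold for every PDF or every library. The class and method names below are made up for illustration, and the sketch uses only the standard library.

```java
import java.util.ArrayList;
import java.util.List;

class LayoutTextToCsv {

    // Converts extracted text to CSV lines, one per non-blank input line.
    static List<String> toCsv(String extractedText) {
        List<String> out = new ArrayList<>();
        for (String line : extractedText.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines between rows
            }
            out.add(lineToCsv(line));
        }
        return out;
    }

    // Assumption: columns are separated by two or more whitespace characters,
    // while single spaces belong to the cell text.
    static String lineToCsv(String line) {
        String[] cells = line.trim().split("\\s{2,}");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(cells[i]));
        }
        return sb.toString();
    }

    // Quotes a cell if it contains a comma, a double quote, or a line break.
    static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }
}
```

For example, `LayoutTextToCsv.toCsv("Item    Price\n\nWidget, large   9.50\n")` yields the two lines `Item,Price` and `"Widget, large",9.50`. Empty cells are lost with this heuristic, since a missing value just looks like a wider gap.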
 
Ajay Njallacattu
Ranch Hand
Posts: 39
Hi,

Can you please let me know which commercial libraries are available? Will we be able to get an evaluation copy to test, to make sure that we get the preferred output?

Regards

Ajay
 
Ajay Njallacattu
Ranch Hand
Posts: 39
Hi,

I have found another API, ICEpdf, which works perfectly for my requirements. It gives the structured output I need.

Thanks a lot for all the help and guidance.


Regards

Ajay Joseph
 
Greenhorn
Posts: 1
Hi Ajay,

I have the same requirement. Can you please send me the code you used for this task?

Regards,
Sri
 