• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • Ron McLeod
  • Liutauras Vilda
  • Jeanne Boyarsky
Sheriffs:
  • Junilu Lacar
  • Rob Spoor
  • Paul Clapham
Saloon Keepers:
  • Tim Holloway
  • Tim Moores
  • Jesse Silverman
  • Stephan van Hulst
  • Carey Brown
Bartenders:
  • Al Hobbs
  • Piet Souris
  • Frits Walraven

How to parse PDF File

 
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I want to create one component which takes pdf or word file as a input and create xml file as a output, i want to know which java API can useful to do this.
 
Ranch Hand
Posts: 38
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
PDFBox may help you.
 
Bartender
Posts: 6663
5
MyEclipse IDE Firefox Browser Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Manoj Maniraj wrote:PDFBox may help you.



The project looks very promising and it is in its early stages. I know of commercial products that can do this, but are you looking to do something specific with this XML ? Is text extraction from PDF ok ? PDFBox seems to support that. Or do you need some sort of meaningful hierarchical data ?
 
Java Cowboy
Posts: 16084
88
Android Scala IntelliJ IDE Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
There are other APIs for working with PDF files. iText is another well-known Java PDF library.
 
Greenhorn
Posts: 6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in our earlier project.
More info can be found at :
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for
 
Deepak Bala
Bartender
Posts: 6663
5
MyEclipse IDE Firefox Browser Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ravi Vakka wrote:I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in our earlier project.
More info can be found at :
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for



To my knowledge PDF -> XML is not possible with FOP
 
Bartender
Posts: 11445
19
Android Google Web Toolkit Mac Eclipse IDE Ubuntu Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Moving to a more appropriate forum
 
Priya Khamitkar
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have used PDFBox.And my program is working fine.
Thanks to everyone
 
reply
    Bookmark Topic Watch Topic
  • New Topic