
How to parse a PDF file

 
Priya Khamitkar
Greenhorn
Posts: 4
I want to create a component that takes a PDF or Word file as input and produces an XML file as output. Which Java API would be useful for this?
 
Manoj Maniraj
Ranch Hand
Posts: 38
PDFBox may help you.
 
Deepak Bala
Bartender
Posts: 6663
Manoj Maniraj wrote: PDFBox may help you.


The project looks very promising, and it is still in its early stages. I know of commercial products that can do this, but are you looking to do something specific with this XML? Is plain text extraction from the PDF OK? PDFBox seems to support that. Or do you need some sort of meaningful hierarchical data?
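
If plain text extraction turns out to be enough, the XML side can be handled with the JDK alone. The sketch below assumes the text of each page has already been extracted as a string (for example with PDFBox's PDFTextStripper, which is not called here). The class name `PageTextToXml` and the `document`/`page`/`number` names are made up for illustration and are not part of any library.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PageTextToXml {

    // Wraps one string per page in a flat tree:
    // <document><page number="1">...</page>...</document>
    // Element and attribute names are hypothetical, chosen for this sketch.
    public static String toXml(List<String> pageTexts) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("document");
        doc.appendChild(root);

        for (int i = 0; i < pageTexts.size(); i++) {
            Element page = doc.createElement("page");
            page.setAttribute("number", String.valueOf(i + 1));
            // Characters such as & and < are escaped when the tree is serialized
            page.setTextContent(pageTexts.get(i));
            root.appendChild(page);
        }

        // Serialize the DOM tree to a string without the XML declaration
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a real program these strings come from a PDF text extractor
        List<String> pages = Arrays.asList("First page text", "Fees & charges < 5%");
        System.out.println(toXml(pages));
    }
}
```

This only produces one element per page. If you need meaningful hierarchical data (headings, tables, paragraphs), that structure has to be inferred from the extracted text or layout, which is a much harder problem than the wrapping shown here.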
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15216
There are other APIs for working with PDF files. iText is another well-known Java PDF library.
 
Ravi Vakka
Greenhorn
Posts: 6
I remember that we used Apache FOP to convert XML files to PDF and PDF files to XML in an earlier project.
More info can be found at:
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for.
 
Deepak Bala
Bartender
Posts: 6663
Ravi Vakka wrote: I remember that we used Apache FOP to convert XML files to PDF and PDF files to XML in an earlier project.
More info can be found at:
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for.


To my knowledge, PDF -> XML conversion is not possible with FOP.
 
Maneesh Godbole
Saloon Keeper
Posts: 11021
Moving to a more appropriate forum.
 
Priya Khamitkar
Greenhorn
Posts: 4
I have used PDFBox, and my program is working fine.
Thanks to everyone.
 