I want to create a component that takes a PDF or Word file as input and produces an XML file as output. Which Java API would be useful for this?
The project looks very promising, and it is still in its early stages. I know of commercial products that can do this, but are you looking to do something specific with this XML? Is plain text extraction from the PDF enough? PDFBox seems to support that. Or do you need some sort of meaningful hierarchical data?
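If plain text extraction is enough, the XML step itself only needs the JDK's built-in JAXP APIs. The sketch below assumes the text has already been pulled out one string per page (for example with PDFBox's `PDFTextStripper`, calling `setStartPage`/`setEndPage` before each `getText`). The `<document>`/`<page number="…">` layout and the `PagesToXml` class are hypothetical, chosen only for illustration.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PagesToXml {

    /**
     * Wraps already-extracted page texts in a hypothetical
     * <document><page number="1">...</page></document> layout.
     */
    public static String toXml(List<String> pageTexts) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("document");
        doc.appendChild(root);

        for (int i = 0; i < pageTexts.size(); i++) {
            Element page = doc.createElement("page");
            page.setAttribute("number", String.valueOf(i + 1));
            // A text node lets the serializer escape characters such as < and &.
            page.appendChild(doc.createTextNode(pageTexts.get(i)));
            root.appendChild(page);
        }

        // Identity transform: serializes the DOM tree to an XML string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(Arrays.asList("First page text", "Fish & chips")));
    }
}
```

This only preserves page boundaries and raw text; recovering headings, tables or other hierarchy from a PDF is a much harder problem, which is why the "meaningful hierarchical data" question matters.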
I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in an earlier project.
More info can be found at:
http://xmlgraphics.apache.org/fop/
Let me know if this is the one you are looking for.
Ravi Vakka wrote: I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in our earlier project.
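To my knowledge, PDF -> XML is not possible with FOP. FOP is a formatter that renders XSL-FO documents into output formats such as PDF; it does not read PDF files.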
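Neither FOP nor PDFBox covers the Word half of the original question. If the Word files are .docx (not the older binary .doc format), a .docx file is a ZIP package whose main body is already XML (WordprocessingML), so the JDK's `java.util.zip` can get at it directly. This is a minimal sketch under two assumptions: the main part sits at the conventional path `word/document.xml` (strictly, its location is declared in the package relationships), and it is UTF-8 encoded. The class name `DocxBodyXml` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxBodyXml {

    /**
     * Returns the raw XML of the main document part of a .docx package,
     * or null if no entry named word/document.xml is found.
     * Assumes the conventional part name and UTF-8 encoding.
     */
    public static String mainPartXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // read() on a ZipInputStream returns bytes of the current entry only.
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return new String(buf.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(mainPartXml(in));
        }
    }
}
```

The result is Word's own markup (paragraphs, runs, formatting), not a custom schema; turning it into your own XML would still need an XSLT or DOM pass over it.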