
How to parse PDF File

Priya Khamitkar
Greenhorn

Joined: Aug 11, 2009
Posts: 4
I want to create a component that takes a PDF or Word file as input and produces an XML file as output. Which Java API would be useful for this?
Manoj Maniraj
Ranch Hand

Joined: Mar 25, 2009
Posts: 38
PDFBox may help you.


http://manojmaniraj.blogspot.com
Deepak Bala
Bartender

Joined: Feb 24, 2006
Posts: 6661
    

Manoj Maniraj wrote: PDFBox may help you.


The project looks very promising and it is in its early stages. I know of commercial products that can do this, but are you looking to do something specific with this XML? Is text extraction from the PDF OK? PDFBox seems to support that. Or do you need some sort of meaningful hierarchical data?
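
If plain text extraction turns out to be enough, the XML half of such a component might look roughly like the sketch below. It assumes the per-page text has already been pulled out of the PDF (for example with PDFBox's PDFTextStripper) and only uses the XML classes that ship with the JDK. The class name PdfTextToXml, the method toXml, and the element names document and page are made up for illustration; they are not part of any library.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: wraps already-extracted page text in a flat XML document.
class PdfTextToXml {

    // sourceName and pages are assumed to come from a PDF library such as PDFBox;
    // this method does not read the PDF itself.
    public static String toXml(String sourceName, List<String> pages) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        Element root = doc.createElement("document");
        root.setAttribute("source", sourceName);
        doc.appendChild(root);

        // One <page> element per extracted page, numbered from 1.
        for (int i = 0; i < pages.size(); i++) {
            Element page = doc.createElement("page");
            page.setAttribute("number", String.valueOf(i + 1));
            // setTextContent stores raw text; the serializer escapes < and & on output.
            page.setTextContent(pages.get(i));
            root.appendChild(page);
        }

        // Serialize the DOM tree to a string with the JDK's identity transformer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in strings; in a real component these would be the extracted page texts.
        List<String> pages = Arrays.asList("First page text", "Second page: a < b & c");
        System.out.println(toXml("sample.pdf", pages));
    }
}
```

This only gives a flat, page-by-page dump. Anything more hierarchical (headings, tables, fields) would need extra logic on top of whatever the PDF library returns.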


Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14074
    

There are other APIs for working with PDF files. iText is another well-known Java PDF library.


Ravi Vakka
Greenhorn

Joined: Nov 06, 2006
Posts: 6
I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in our earlier project.
More info can be found at:
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for.


Thanks & Regards
Ravi Vakka
SCJP,SCJD,SCWCD
Deepak Bala
Bartender

Joined: Feb 24, 2006
Posts: 6661
    

Ravi Vakka wrote: I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in our earlier project.
More info can be found at :
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for


To my knowledge, PDF -> XML is not possible with FOP.
Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10170
    

Moving to a more appropriate forum.


Priya Khamitkar
Greenhorn

Joined: Aug 11, 2009
Posts: 4
I have used PDFBox, and my program is working fine.
Thanks to everyone.
 