JavaRanch » Java Forums » Products » Other Open Source Projects
How to parse PDF File

Priya Khamitkar
Greenhorn

Joined: Aug 11, 2009
Posts: 4
I want to create a component that takes a PDF or Word file as input and produces an XML file as output. Which Java API would be useful for this?
Manoj Maniraj
Ranch Hand

Joined: Mar 25, 2009
Posts: 38
PDFBox may help you.


http://manojmaniraj.blogspot.com
Deepak Bala
Bartender

Joined: Feb 24, 2006
Posts: 6662
    

Manoj Maniraj wrote: PDFBox may help you.


The project looks very promising, and it is in its early stages. I know of commercial products that can do this, but are you looking to do something specific with this XML? Is plain text extraction from the PDF OK? PDFBox seems to support that. Or do you need some sort of meaningful hierarchical data?


Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14420
    

There are other APIs for working with PDF files. iText is another well-known Java PDF library.


Ravi Vakka
Greenhorn

Joined: Nov 06, 2006
Posts: 6
I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in an earlier project.
More info can be found at:
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for.


Thanks & Regards
Ravi Vakka
SCJP,SCJD,SCWCD
Deepak Bala
Bartender

Joined: Feb 24, 2006
Posts: 6662
    

Ravi Vakka wrote: I remember we used Apache FOP to convert XML files to PDF and PDF files to XML in an earlier project.
More info can be found at:
http://xmlgraphics.apache.org/fop/

Let me know if this is the one you are looking for.


To my knowledge, PDF -> XML is not possible with FOP.
Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10532
    

Moving to a more appropriate forum.


Priya Khamitkar
Greenhorn

Joined: Aug 11, 2009
Posts: 4
I have used PDFBox, and my program is working fine.
Thanks to everyone.
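
For anyone who finds this thread later, here is a minimal sketch of the "plain text to XML" half of the job, using only the JDK's StAX writer. It is not the poster's actual program. It assumes the per-page text has already been extracted by a PDF library (for example with PDFBox's PDFTextStripper); the class name PdfTextToXml, the method toXml, and the element names document and page are all made up for illustration.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PdfTextToXml {

    // Hypothetical helper: wraps already-extracted page text in a flat XML document.
    // The element names (document, page) are illustrative, not a standard schema.
    static String toXml(String sourceName, List<String> pageTexts) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("document");
            xml.writeAttribute("source", sourceName);
            for (int i = 0; i < pageTexts.size(); i++) {
                xml.writeStartElement("page");
                xml.writeAttribute("number", String.valueOf(i + 1));
                // writeCharacters escapes <, > and & in the page text
                xml.writeCharacters(pageTexts.get(i));
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: in a real program these strings come from the PDF library's text extractor.
        List<String> pages = List.of("First page text", "Prices < 10 & rising");
        System.out.println(toXml("sample.pdf", pages));
    }
}
```

This only gives one element per page. If you need the "meaningful hierarchical data" mentioned earlier in the thread (headings, tables, paragraphs), you would have to work that structure out yourself from the extracted text or its positions, since plain text extraction does not provide it.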
 