JavaRanch » Java Forums » Products » Other Open Source Projects

How to Read PDF files

Rakesh Jhamb
Ranch Hand

Joined: Jun 18, 2003
Posts: 154
Hello All

My Java application needs to read PDF files and store some of the PDF details in a database. Can anyone suggest the best free framework or tool for this task?

Thanks
[ November 12, 2008: Message edited by: Veenu Kumar ]

SCJP2, SCWCD
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1280

Hi Veenu,

iText is a very good, easy-to-use library for building and manipulating PDFs. Unfortunately, I can't tell you exactly how to read PDFs, because I've only used iText to create new PDF files, but the documentation should give you more details on this subject.

Marco
Rakesh Jhamb
Ranch Hand

Joined: Jun 18, 2003
Posts: 154
Thanks for your comments, Marco.

I have already done some research on iText, and it's a very good tool for creating PDF files, but I have not found anything in iText that helps with reading particular data from PDF files.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41874
iText can perform some operations on PDFs ("manipulating", as Marco says, is the right word, I think), but it can't read them in a general sense. It can't help you get at the text or formatting information they contain.

What information, exactly, are you trying to extract from the PDF? If it's the text, check out the JPedal library. If it's the formatting information, you're very likely out of luck.


Ping & DNS - my free Android networking tools app
Rakesh Jhamb
Ranch Hand

Joined: Jun 18, 2003
Posts: 154
I want to read information such as the date and account number from the PDF file.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41874
If those are in the document attributes, then I think iText can help you do that.

If those are in the document content (remember that we have no idea what your documents look like), then check out JPedal. But that will give you a single stream of text, not nicely structured content. You might be able to use string search or regular expressions to extract the parts of the text you're interested in.
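A minimal sketch of the regex approach, assuming the extracted text contains hypothetical labels such as "Date:" and "Account Number:" (the real labels and value formats depend entirely on your documents). The extraction step itself is not shown; the `pdfText` string stands in for whatever plain text a library such as JPedal returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatementFieldExtractor {

    // Hypothetical labels and formats: assumes lines like "Date: 12/11/2008"
    // and "Account Number: 1234-5678". Adjust these to the real layout.
    private static final Pattern DATE =
            Pattern.compile("Date:\\s*(\\d{2}/\\d{2}/\\d{4})");
    private static final Pattern ACCOUNT =
            Pattern.compile("Account Number:\\s*([0-9-]+)");

    /** Returns capture group 1 of the first match, or null if there is no match. */
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** Pulls the fields of interest out of the plain text of one PDF. */
    static Map<String, String> extract(String pdfText) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("date", firstMatch(DATE, pdfText));
        fields.put("accountNumber", firstMatch(ACCOUNT, pdfText));
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the text a PDF extraction library would return.
        String pdfText = "ACME Bank Statement\nDate: 12/11/2008\nAccount Number: 1234-5678\n";
        System.out.println(extract(pdfText));
    }
}
```

Returning null for a missing field, rather than throwing, is just one choice; it leaves the caller free to decide what to do with an incomplete document before anything goes into the database.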
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1280

Hi Veenu,

Is it unavoidable for you to have PDF files as input to your application, or could you switch to some other input format?

As Ulf said, it won't be easy to find the information in the content of a PDF unless you can find a library that lets you analyze the structure of a PDF file. Even then, things get worse if you have PDFs with different structures or if the structure changes. Simple pattern matching is of course an option, but you should first consider whether it's possible to find exactly what you want in an entire document.
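Whether a pattern finds "exactly what you want" can be checked mechanically. The following is a sketch, not a library API: it insists that each pattern matches exactly once in the extracted text and fails loudly otherwise, so a layout change shows up as an error instead of wrong data in the database. The label pattern and sample strings are hypothetical; the "Previous Account Number" line shows how a loose pattern can also hit an unintended field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictFieldMatcher {

    /**
     * Returns the single value captured by group 1 of the pattern.
     * Throws if the pattern matches zero times or more than once, which
     * usually means the document does not have the layout the pattern expects.
     */
    static String requireSingleMatch(String fieldName, Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        if (hits.size() != 1) {
            throw new IllegalStateException(
                    fieldName + ": expected exactly 1 match but found " + hits.size() + " " + hits);
        }
        return hits.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical label; adjust to the real documents.
        Pattern account = Pattern.compile("Account Number:\\s*([0-9-]+)");
        String ok = "Date: 12/11/2008\nAccount Number: 1234-5678\n";
        // "Previous Account Number:" also contains "Account Number:", so this is ambiguous.
        String ambiguous = ok + "Previous Account Number: 9999-0000\n";

        System.out.println(requireSingleMatch("accountNumber", account, ok));
        try {
            requireSingleMatch("accountNumber", account, ambiguous);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```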

Marco
Rakesh Jhamb
Ranch Hand

Joined: Jun 18, 2003
Posts: 154
Thanks, Marco,

Yes, users will upload only PDF files, and the format of the PDF files is fixed. But I don't know how to read a particular piece of information from a PDF file.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41874
I'd suggest that you start by checking out the text extraction capability of the JPedal library, so that you get a feeling for what it can do for you. Based on that, you can decide whether it is sufficient.
jose raja
Greenhorn

Joined: Feb 08, 2009
Posts: 5
Hi,

I need to write a print preview function in Java, which means generating PDFs using iText or another library (the ultimate goal is to generate PDF files).

My environment is Java + Struts + Eclipse.

Can anyone please help me?

Thanks,
joe
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41874
jose,
please start a new topic instead of hijacking an existing one. Extracting text from a PDF really has nothing to do with creating a print preview.
jose raja
Greenhorn

Joined: Feb 08, 2009
Posts: 5
[ UD: As I said, please start a new topic instead of hijacking an existing one. ]