
PDF Text Content extraction using iText5.0.5

Divya Kambhatla

Joined: Jan 25, 2011
Posts: 13

I want to extract the text from a PDF using iText 5.0.5. The problem is that when I extract text, everything gets extracted, including page numbers, figure titles, and page titles. I am completely new to the iText API. Could anyone please let me know whether there is a method or interface in iText that can extract ONLY the text content, or at least how I can tell whether the page numbers, page titles, and figure titles are also treated as page text?

Thanks in advance!
Paul Clapham

Joined: Oct 14, 2005
Posts: 19452

Please read this: CarefullyChooseOneForum. Your duplicate post is in a suitable forum so I have locked this one.