PDF Text Content extraction using iText5.0.5

Divya Kambhatla

I want to extract the text from a PDF using iText5.0.5. The problem is that when I extract text, everything gets extracted, including page numbers, figure titles, and page titles. I am completely new to the iText API. Could anyone please tell me whether there is a method or interface in iText that extracts ONLY the body text, or at least how I could tell whether the page numbers, page titles, and figure titles are treated as part of the page text?

Thanks in advance!
Paul Clapham

Please read this: CarefullyChooseOneForum. Your duplicate post is in a suitable forum so I have locked this one.
Don't get me started about those stupid light bulbs.