
PDF Text Content extraction using iText5.0.5

Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
Hi,

I want to extract the text from a PDF using iText 5.0.5. The problem is that when I extract text, everything gets extracted, including page numbers, figure titles, and page titles. I am completely new to the iText API. Could anyone please tell me whether there is a method or interface in iText that extracts ONLY the body text? Failing that, how can I tell whether a given piece of extracted text is a page number, page title, or figure title rather than body text?

Thanks in advance!
Divya.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 16487
    

Please read this: CarefullyChooseOneForum. Your duplicate post is in a suitable forum so I have locked this one.
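
For anyone reading this locked copy of the question: a PDF normally does not label text as "page number" or "page title", so one common workaround is to filter by position and drop anything that falls in the top or bottom margin of the page. Below is a minimal sketch of that idea in plain Java, with no iText dependency. The names `BodyTextFilter`, `Chunk`, and `bodyText`, and the margin sizes, are hypothetical and chosen only for illustration; it assumes you can already obtain each text chunk together with the y-coordinate of its baseline from whatever extraction API you use, and that y is measured upward from the bottom of the page, as in default PDF user space.

```java
import java.util.ArrayList;
import java.util.List;

public class BodyTextFilter {

    /** Hypothetical holder: a piece of text and the y-coordinate of its baseline. */
    public static final class Chunk {
        final String text;
        final float y;

        public Chunk(String text, float y) {
            this.text = text;
            this.y = y;
        }
    }

    /**
     * Keeps only chunks that lie between the bottom margin and the top margin.
     * Coordinates are assumed to grow upward from the bottom edge of the page.
     */
    public static String bodyText(List<Chunk> chunks, float pageHeight,
                                  float topMargin, float bottomMargin) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            boolean inFooter = c.y < bottomMargin;              // e.g. page numbers
            boolean inHeader = c.y > pageHeight - topMargin;    // e.g. running titles
            if (!inFooter && !inHeader) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(c.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data for an A4-sized page (842 points high).
        List<Chunk> page = new ArrayList<>();
        page.add(new Chunk("Chapter 1: Introduction", 800f));
        page.add(new Chunk("First paragraph of body text.", 700f));
        page.add(new Chunk("Second paragraph of body text.", 400f));
        page.add(new Chunk("12", 30f));

        // Treat the top 60 points and bottom 50 points as header and footer zones.
        System.out.println(bodyText(page, 842f, 60f, 50f));
    }
}
```

Position alone will not catch figure titles, since they sit inside the body area; those would need a further heuristic such as font size or proximity to an image, and the margin values have to be tuned per document.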
 
 