Extract only the PDF Page Text Content using iText5.0.5

 
Greenhorn
Posts: 18
Hi,

I want to extract the text from a PDF using iText 5.0.5. The problem is that when I extract text, everything gets extracted, including page numbers, figure titles, and page titles. I am completely new to the iText API. Could anyone please tell me whether there is a method or interface in iText that extracts ONLY the body text, or at least whether page numbers, page titles, and figure titles also count as page text?

Thanks in advance!
Divya.
 
Divya Kambhatla
Greenhorn
Posts: 18

In addition to the above, could anyone please help me understand how a PDF page could be split into its header, footer, and trailer? When I analysed the iText source code, I came across these, along with a PdfBody class. But I do not understand how exactly I could go about creating a PdfBody and extracting the text content from it.

Thank You,
Divya.
 