Text is just that - text. It does not include formatting or layout information. It is notoriously hard to extract that information from PDFs; I'm not aware of any free tool that can do that. If you can spend lots of time on this, check out the PDF-Renderer project. It can render PDFs in Swing, so obviously it has code that knows how to handle layout and styling.
It sounds as if what you actually want is to convert the PDF to some other file format?
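To illustrate why alignment has to be rebuilt rather than simply "extracted", here is a rough sketch of the usual approach: take text fragments together with their page coordinates and place them on a character grid. Everything in it is an assumption for illustration only. The `Fragment` class, the `render` method, the fixed `charWidth` and the `lineTolerance` are hypothetical, not part of PDFBox or any other library. As far as I know, PDFBox's `PDFTextStripper` passes `TextPosition` objects to subclasses, which is one place such coordinates could come from, but this sketch does not depend on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LayoutSketch {

    /** Hypothetical positioned fragment: text plus the x/y of its start, in points. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Rebuilds an approximate plain-text layout from positioned fragments.
     * Assumes y grows downward, charWidth is a guessed average glyph width,
     * and fragments within lineTolerance of a line's first y share that line.
     */
    static String render(List<Fragment> fragments, double charWidth, double lineTolerance) {
        // Sort top to bottom so that lines can be grouped in reading order.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        // Group fragments into lines by comparing y with the line's first fragment.
        List<List<Fragment>> lines = new ArrayList<>();
        double lineY = 0;
        for (Fragment f : sorted) {
            if (lines.isEmpty() || f.y - lineY > lineTolerance) {
                lines.add(new ArrayList<>());
                lineY = f.y;
            }
            lines.get(lines.size() - 1).add(f);
        }

        List<String> rows = new ArrayList<>();
        for (List<Fragment> line : lines) {
            // Within a line, order fragments left to right.
            line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            StringBuilder row = new StringBuilder();
            for (Fragment f : line) {
                int column = (int) Math.round(f.x / charWidth);
                // If the target column is already occupied, keep a single separating space.
                if (row.length() > 0 && column <= row.length()) {
                    row.append(' ');
                }
                // Otherwise pad with spaces up to the target column.
                while (row.length() < column) {
                    row.append(' ');
                }
                row.append(f.text);
            }
            rows.add(row.toString());
        }
        return String.join("\n", rows);
    }

    public static void main(String[] args) {
        // Made-up coordinates, deliberately out of reading order.
        List<Fragment> fragments = Arrays.asList(
                new Fragment(60, 24.5, "12"),
                new Fragment(0, 10, "Name"),
                new Fragment(0, 24, "Bolt"),
                new Fragment(60, 10, "Qty"));
        System.out.println(render(fragments, 6.0, 2.0));
    }
}
```

With the made-up coordinates in `main`, the two columns line up in the printed text. Real PDFs are much messier (proportional fonts, rotated text, multiple columns), so treat this as a starting point only.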
roshan sinha wrote:I extracted text from a PDF using PDFBox,
but the formatting and alignment of the text are not preserved in the extracted text.
How can I extract the text from the PDF with the same format and alignment?
Maybe Apache Tika is one possible solution; moreover, PDFBox is embedded in Tika.
Sudheer, as I pointed out to you elsewhere, Apache Tika does nothing with respect to text extraction for PDFs beyond what PDFBox does. Please don't confuse others by suggesting that it can do things that it can't do.