I am working on a project where I need to convert PDF documents to XML and XSLT.
I am able to extract the text from a PDF, but I cannot read the layout and formatting information of the text and paragraphs. That is, I want to read the font size, font name, style, color, and other formatting attributes of each piece of text or paragraph.
I have tried iText and PDFBox, but have not been able to work out a solution with either.
The PDF Renderer library (http://java.net/projects/pdf-renderer/) can display PDFs, so it has to include code that extracts layout information from them; you can look through its source for the bits and pieces that are of interest to you.
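Once you have the formatting values for each piece of text, writing them out as XML is the simpler half of the job. The following is only a minimal sketch of that step, using nothing but the standard library. The `TextRun` class, the `FormattedRunsToXml` class, and the `paragraph`/`run` element and attribute names are all hypothetical, made up for illustration; how the values are obtained from the PDF library is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical holder for one run of text that shares the same formatting.
// The values would come from whatever PDF library you end up using.
final class TextRun {
    final String text;
    final String fontName;
    final float fontSize;
    final boolean bold;

    TextRun(String text, String fontName, float fontSize, boolean bold) {
        this.text = text;
        this.fontName = fontName;
        this.fontSize = fontSize;
        this.bold = bold;
    }
}

// Hypothetical serializer: turns a list of runs into one <paragraph> element.
final class FormattedRunsToXml {

    // Escape the characters that are not allowed verbatim in XML text or attributes.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Emit each run as a <run> element carrying its formatting as attributes.
    static String toXml(List<TextRun> runs) {
        StringBuilder sb = new StringBuilder("<paragraph>");
        for (TextRun r : runs) {
            sb.append("<run font=\"").append(escape(r.fontName))
              .append("\" size=\"").append(String.format(Locale.ROOT, "%.1f", r.fontSize))
              .append("\" bold=\"").append(r.bold)
              .append("\">").append(escape(r.text)).append("</run>");
        }
        return sb.append("</paragraph>").toString();
    }

    public static void main(String[] args) {
        List<TextRun> runs = new ArrayList<>();
        runs.add(new TextRun("Heading", "Helvetica-Bold", 14f, true));
        runs.add(new TextRun(" body text", "Times-Roman", 10.5f, false));
        System.out.println(toXml(runs));
    }
}
```

Keeping the formatting as attributes on each run means an XSLT stylesheet can later match on them (for example, on the hypothetical `bold` attribute) without re-reading the PDF.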