JavaRanch » Other JSE/JEE APIs

Reading PDF text with font styles

Sanjoo Singh
Ranch Hand

Joined: May 08, 2010
Posts: 33
I am working on a project where I need to convert PDF to XML and XSLT.

I am able to extract text from a PDF, but I cannot read the layout and formatting information of the text and paragraphs. That is, I want to read the font size, font name, style, color and other formatting attributes of each piece of text or paragraph.

I have tried iText and PDFBox but have not been able to find a solution.

Any help in this regard is highly appreciated.
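For context on where this formatting lives: in a PDF page's content stream, the font and size are set by the `Tf` text-state operator (for example `/F1 12 Tf` selects the font resource named F1 at size 12, before any text-matrix scaling) and the fill color by operators such as `rg`. A following `Tj` draws a string using whatever state is current. Bold or italic is not an operator at all; it comes from the font dictionary that the resource name points to (for example a BaseFont of Helvetica-Bold). The following is only a toy sketch, assuming an already-decompressed content stream and handling just `Tf`, `rg` and `Tj` with simple literal strings; the class and record names are hypothetical and not part of any library mentioned in this thread. A real extractor must also handle `TJ`, hex strings, escapes, other color spaces and the graphics-state stack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy scanner for an already-decompressed PDF page content stream (hypothetical helper).
 * It tracks the font resource and size set by Tf and the RGB fill color set by rg,
 * and records that state for each literal string shown with Tj.
 * Not handled: TJ, ', ", hex strings, escaped parentheses, q/Q state save/restore,
 * other color operators, and compressed streams.
 */
public class ContentStreamFonts {

    /** One piece of shown text plus the text state in effect when it was drawn. */
    public record StyledRun(String text, String fontResource, double fontSize,
                            double r, double g, double b) {}

    // Alternatives, matched in document order:
    //   /Name size Tf   |   r g b rg   |   (string) Tj
    private static final Pattern OPS = Pattern.compile(
        "/(\\S+)\\s+([-\\d.]+)\\s+Tf"
      + "|([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s+rg"
      + "|\\(([^()]*)\\)\\s*Tj");

    public static List<StyledRun> parse(String content) {
        List<StyledRun> runs = new ArrayList<>();
        String font = null;                   // no font selected yet
        double size = 0, r = 0, g = 0, b = 0; // PDF's initial fill color is black
        Matcher m = OPS.matcher(content);
        while (m.find()) {
            if (m.group(1) != null) {         // Tf: select font resource and size
                font = m.group(1);
                size = Double.parseDouble(m.group(2));
            } else if (m.group(3) != null) {  // rg: set RGB fill color
                r = Double.parseDouble(m.group(3));
                g = Double.parseDouble(m.group(4));
                b = Double.parseDouble(m.group(5));
            } else {                          // Tj: show a string with the current state
                runs.add(new StyledRun(m.group(6), font, size, r, g, b));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf 0 0 0 rg 72 720 Td (Title) Tj "
                      + "/F2 9 Tf 1 0 0 rg 0 -14 Td (Red note) Tj ET";
        for (StyledRun run : parse(stream)) {
            System.out.println(run);
        }
    }
}
```

To turn `F1` into a real font name and style, you would still have to look it up in the page's `/Resources` font dictionary, which this sketch does not do.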


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42951
iText and PDFBox can't do that.

The PDF Renderer library (http://java.net/projects/pdf-renderer/) can display PDFs, so it must contain code that extracts layout information from them; you could look through it for the parts that are relevant to you.
Sanjoo Singh
Ranch Hand

Joined: May 08, 2010
Posts: 33
Thanks for the help. I am going through it and will update you soon.
Sanjoo Singh
Ranch Hand

Joined: May 08, 2010
Posts: 33
The PDF Renderer project is quite large, and analyzing it is taking a considerable amount of time.

Is there a lighter-weight library (JAR) that would achieve the same result in less time?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42951
No non-commercial ones.
Sanjoo Singh
Ranch Hand

Joined: May 08, 2010
Posts: 33
Thanks. I have found a lightweight library: the JPOD PDF library. It is very small and provides everything I wanted to extract from the PDF.
Lance Wellspring
Greenhorn

Joined: Feb 06, 2012
Posts: 1
I am trying to do the same thing. Could you share your code, or at least provide an example of how to get started?
Thanks for your time.
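The JPOD code itself was not posted in this thread. As a hypothetical starting point for the PDF-to-XML goal from the first post: once some library has produced text runs with their font name, size and color, they could be written out as XML roughly as follows. The `Run` record, the `RunsToXml` class and the `<page>`/`<span>` element names are assumptions for illustration only, not part of JPOD or any other library. Adjacent runs with an identical style are merged, so text drawn by several consecutive operations with the same style ends up in one element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical serializer from styled text runs to a simple XML format. */
public class RunsToXml {

    /** Hypothetical run type: text plus the style attributes an extractor produced. */
    public record Run(String text, String font, double size, String color) {
        boolean sameStyle(Run o) {
            return font.equals(o.font) && size == o.size && color.equals(o.color);
        }
    }

    /** Escape the five XML special characters. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                case '\'' -> sb.append("&apos;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Merge adjacent runs that share a style, then emit one span element per merged run. */
    public static String toXml(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run r : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).sameStyle(r)) {
                Run prev = merged.get(last);
                merged.set(last, new Run(prev.text() + r.text(), prev.font(), prev.size(), prev.color()));
            } else {
                merged.add(r);
            }
        }
        StringBuilder xml = new StringBuilder("<page>\n");
        for (Run r : merged) {
            // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's locale
            xml.append(String.format(Locale.ROOT,
                "  <span font=\"%s\" size=\"%.1f\" color=\"%s\">%s</span>\n",
                escape(r.font()), r.size(), escape(r.color()), escape(r.text())));
        }
        return xml.append("</page>\n").toString();
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
            new Run("Fish ", "Helvetica-Bold", 14, "#000000"),
            new Run("& Chips", "Helvetica-Bold", 14, "#000000"),
            new Run("Served daily", "Times-Italic", 10, "#cc0000"));
        System.out.print(toXml(runs));
    }
}
```

An XSLT stylesheet could then map these `span` elements onto whatever target structure the project needs; grouping runs into real paragraphs would require position information that this sketch does not carry.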
 