Reading PDF text with font styles

Lokesh Tank
Ranch Hand

Joined: May 08, 2010
Posts: 32
I am working on a project where I need to convert PDF to XML and XSLT.

I can extract the text from a PDF, but I cannot read the layout and formatting information of the text and paragraphs. That is, I want to read the font size, font name, style, color, and other formatting attributes of each piece of text or paragraph.

I have tried iText and PDFBox but have not been able to arrive at a solution.

Any help in this regard is highly appreciated.


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41601
iText and PDFBox can't do that.

The http://java.net/projects/pdf-renderer/ library can display PDFs, so it includes code that extracts layout information from PDFs; you can try to find the bits and pieces that are of interest to you in that.


Lokesh Tank
Ranch Hand

Joined: May 08, 2010
Posts: 32
Thanks for the help. I am going through it and will update you soon.
Lokesh Tank
Ranch Hand

Joined: May 08, 2010
Posts: 32
The PDF-renderer project is pretty big, and analyzing it is taking a considerable amount of time.

Is there any other lightweight library (jar) that achieves the same result in less time?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41601
No non-commercial ones.
Lokesh Tank
Ranch Hand

Joined: May 08, 2010
Posts: 32
Thanks. I have found a lightweight library: the JPOD PDF library. It is very small and gives me everything I wanted to extract from the PDF.
Lance Wellspring
Greenhorn

Joined: Feb 06, 2012
Posts: 1
I am trying to do the same thing. Could you share your code, or at least provide an example of how to get started?
Thanks for your time.
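
For anyone getting started on the PDF-to-XML step discussed in this thread, here is a minimal sketch of what could happen after extraction, independent of which library does the reading. It assumes the library has already reported a font name, size, and color for each piece of text. The `Run` class, the `paragraph` and `span` element names, and the attribute names are all made up for illustration; none of them come from JPOD, iText, PDFBox, or PDF-renderer. The sketch merges neighbouring pieces that share a style and writes them out as XML using only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

public class StyledRunXml {

    /** Hypothetical holder: one piece of text plus the style a PDF library reported for it. */
    static final class Run {
        final String text;
        final String fontName;
        final float fontSize;
        final String colorHex;

        Run(String text, String fontName, float fontSize, String colorHex) {
            this.text = text;
            this.fontName = fontName;
            this.fontSize = fontSize;
            this.colorHex = colorHex;
        }

        /** True when both runs use the same font name, size, and color. */
        boolean sameStyle(Run other) {
            return fontName.equals(other.fontName)
                    && fontSize == other.fontSize
                    && colorHex.equals(other.colorHex);
        }
    }

    /** Merges neighbouring runs that share a style, so each span covers as much text as possible. */
    static List<Run> mergeAdjacent(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).sameStyle(run)) {
                Run prev = merged.get(last);
                merged.set(last, new Run(prev.text + run.text,
                        prev.fontName, prev.fontSize, prev.colorHex));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }

    /** Escapes the characters that are unsafe in XML text and double-quoted attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Writes the merged runs as one paragraph element with one span per style change. */
    static String toXml(List<Run> runs) {
        StringBuilder sb = new StringBuilder("<paragraph>");
        for (Run run : mergeAdjacent(runs)) {
            sb.append("<span font=\"").append(escape(run.fontName))
              .append("\" size=\"").append(run.fontSize)
              .append("\" color=\"").append(escape(run.colorHex))
              .append("\">").append(escape(run.text)).append("</span>");
        }
        return sb.append("</paragraph>").toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for whatever the PDF library returns.
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("Hello ", "Helvetica-Bold", 12f, "#000000"));
        runs.add(new Run("world", "Helvetica-Bold", 12f, "#000000"));
        runs.add(new Run(" & more", "Times-Roman", 10f, "#FF0000"));
        System.out.println(toXml(runs));
    }
}
```

Running `main` prints a single paragraph element with two spans, because the first two sample runs share a style and are merged. Filling the `Run` list from a real PDF is the library-specific part and is not covered here.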
 
 