
Reading PDF text with font styles

Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 28
I am working on a project where I need to convert PDF to XML and XSLT.

I can extract the text from a PDF, but I cannot read the layout and formatting information of the text and paragraphs. That is, I want to read the font size, font name, style, color, and other formatting attributes of each piece of text or paragraph.

I have tried iText and PDFBox but could not arrive at a solution.

Any help in this regard would be highly appreciated.


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39549
    
iText and PDFBox can't do that.

The PDF Renderer library (http://java.net/projects/pdf-renderer/) can display PDFs, so it includes code that extracts layout information from them; you can look through its source for the pieces that are of interest to you.


Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 28
Thanks for the help. I am going through it and will update you soon.
Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 28
The PDF-renderer project is pretty big, and analyzing it is taking a considerable amount of time (still ongoing).

Is there any other lightweight library (jar) that would achieve the same result in less time?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39549
    
No non-commercial ones.
Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 28
Thanks. I have found a lightweight library: the jPod PDF library. It is very small and provides everything I wanted to extract from the PDF.
Lance Wellspring
Greenhorn

Joined: Feb 06, 2012
Posts: 1
I am trying to do the same thing. Could you share your code, or at least provide an example of how to get started?
Thanks for your time.
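No code was posted in this thread, so the following is only a rough sketch of the second half of the task: turning already-extracted text runs and their font attributes into XML. It assumes the extraction step (with jPod or any other library) has already produced a list of runs. The class `StyledRunsToXml`, the holder `StyledRun`, the methods `mergeAdjacent`, `escapeXml` and `toXml`, and the `<paragraph>`/`<run>` element names are all hypothetical names chosen for illustration; none of them belong to jPod, iText, or PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

public class StyledRunsToXml {

    /** Hypothetical holder for one run of text plus the font attributes read for it. */
    public static final class StyledRun {
        public final String text;
        public final String fontName;
        public final float fontSize;
        public final String colorHex;

        public StyledRun(String text, String fontName, float fontSize, String colorHex) {
            this.text = text;
            this.fontName = fontName;
            this.fontSize = fontSize;
            this.colorHex = colorHex;
        }

        /** True when both runs share font name, size and color. */
        boolean sameStyle(StyledRun other) {
            return fontName.equals(other.fontName)
                    && fontSize == other.fontSize
                    && colorHex.equals(other.colorHex);
        }
    }

    /** Merges adjacent runs with identical style so each XML element covers one style span. */
    public static List<StyledRun> mergeAdjacent(List<StyledRun> runs) {
        List<StyledRun> merged = new ArrayList<>();
        for (StyledRun run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).sameStyle(run)) {
                StyledRun prev = merged.get(last);
                merged.set(last, new StyledRun(prev.text + run.text,
                        prev.fontName, prev.fontSize, prev.colorHex));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }

    /** Escapes the five characters that are special in XML text and attribute values. */
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Serializes the runs as one <paragraph> element with a <run> child per style span. */
    public static String toXml(List<StyledRun> runs) {
        StringBuilder sb = new StringBuilder("<paragraph>\n");
        for (StyledRun run : mergeAdjacent(runs)) {
            sb.append("  <run font=\"").append(escapeXml(run.fontName))
              .append("\" size=\"").append(run.fontSize)
              .append("\" color=\"").append(escapeXml(run.colorHex))
              .append("\">").append(escapeXml(run.text)).append("</run>\n");
        }
        return sb.append("</paragraph>").toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for whatever the PDF library returns.
        List<StyledRun> runs = new ArrayList<>();
        runs.add(new StyledRun("Hello ", "Helvetica-Bold", 12f, "#000000"));
        runs.add(new StyledRun("world", "Helvetica-Bold", 12f, "#000000"));
        runs.add(new StyledRun(" & more", "Times-Italic", 10.5f, "#FF0000"));
        System.out.println(toXml(runs));
    }
}
```

Merging adjacent runs first keeps the XML small, since PDF content streams often split one visually uniform sentence into many fragments. An XSLT stylesheet could then map the `font`, `size` and `color` attributes to whatever output markup is needed.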
 