Reading PDF text with font styles

Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 18
I am working on a project where I need to convert PDF to XML and XSLT.

I am able to extract text from the PDF, but I am not able to read the layout and formatting information of the text and paragraphs. That is, I want to read the font size, font name, style, color, and other formatting of the text or paragraph.

I have tried iText and PDFBox but could not arrive at a solution.

Any help in this regard is highly appreciated.


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 35241
iText and PDFBox can't do that.

The http://java.net/projects/pdf-renderer/ library can display PDFs, so it includes code that extracts layout information from PDFs; you can try to find the bits and pieces that are of interest to you in that.


Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 18
Thanks for the help. I am going through it and will update you soon.
Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 18
The PDF-renderer project is pretty big, and analyzing it is taking a considerable amount of time.

Is there any other lightweight library (jar) that would achieve the same result in less time?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 35241
No non-commercial ones.
Lokesh Tank
Greenhorn

Joined: May 08, 2010
Posts: 18
Thanks. I have found a lightweight library: the JPOD PDF library. It is very small and provides everything I wanted to extract from the PDF.
Lance Wellspring
Greenhorn

Joined: Feb 06, 2012
Posts: 1
I am trying to do the same thing. Could you share your code, or at least provide an example of how to get started?
Thanks for your time.
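
As a possible starting point for the XML side of this task, here is a minimal sketch. It assumes the PDF library you pick already gives you runs of text together with their font name, font size, and color; how you obtain those depends on the library and is not shown. The `StyledTextXml` and `TextRun` names and the `page`/`span` element names are hypothetical and are not part of JPOD, iText, or PDFBox. Only the JDK's own DOM and transformer classes are used.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StyledTextXml {

    /** Hypothetical holder for one run of text and its formatting. */
    public static final class TextRun {
        final String text;
        final String fontName;
        final float fontSize;
        final String color;

        public TextRun(String text, String fontName, float fontSize, String color) {
            this.text = text;
            this.fontName = fontName;
            this.fontSize = fontSize;
            this.color = color;
        }
    }

    /** Serializes the runs as <page><span font-name=... font-size=... color=...>text</span>...</page>. */
    public static String toXml(List<TextRun> runs) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("page");
        doc.appendChild(root);

        for (TextRun run : runs) {
            Element span = doc.createElement("span");
            span.setAttribute("font-name", run.fontName);
            span.setAttribute("font-size", Float.toString(run.fontSize));
            span.setAttribute("color", run.color);
            // The serializer escapes special characters such as '&' and '<'.
            span.setTextContent(run.text);
            root.appendChild(span);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample data standing in for whatever the PDF library extracts.
        List<TextRun> runs = Arrays.asList(
                new TextRun("Chapter 1", "Helvetica-Bold", 18f, "#000000"),
                new TextRun("Body text & more", "Times-Roman", 11f, "#333333"));
        System.out.println(toXml(runs));
    }
}
```

Keeping the formatting as attributes on each run means an XSLT stylesheet can later match on `@font-name` or `@font-size` to decide how to render the text.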
 
 