
PDFBox: pdf's markup, how-to extract the pdf markup...

Jim Harrison

Joined: Mar 16, 2007
Posts: 29

I've read a lot on http://pdfbox.apache.org but can't find an example, or even confirmation that the tool actually does this.

The PDF file that I'm reading has superscripts. I want to get both the text and the markup content of the PDF file. So a couple of questions:

1. Can PDFBox do this? Their website documents ExtractText (http://pdfbox.apache.org/commandlineutilities/ExtractText.html), but that only outputs the plain text of the PDF.

2. Does anyone have an example of doing this?

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 41109
No, PDFBox has no notion of extracting layout information.

You could check out the source code of https://pdf-renderer.dev.java.net/, which can display PDFs, so it must have a way of accessing the layout data.
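
If you do get hold of per-glyph layout data, spotting superscripts could look roughly like the sketch below. This is only an illustration of the idea: the Glyph class, its fields, and the 0.2 / 0.9 thresholds are hypothetical assumptions, not types or values from PDFBox or PDF Renderer, and the coordinate system is assumed to have y growing upward.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-glyph layout record; NOT a PDFBox or PDF Renderer type.
final class Glyph {
    final String text;
    final double baselineY; // baseline height on the page (assumed: larger = higher)
    final double fontSize;  // font size in points

    Glyph(String text, double baselineY, double fontSize) {
        this.text = text;
        this.baselineY = baselineY;
        this.fontSize = fontSize;
    }
}

final class SuperscriptSketch {
    // Heuristic (assumed thresholds): a glyph is a superscript when it is
    // raised above the body baseline and set in a noticeably smaller font.
    static boolean isSuperscript(Glyph g, double bodyBaseline, double bodyFontSize) {
        return g.baselineY > bodyBaseline + 0.2 * bodyFontSize
                && g.fontSize < 0.9 * bodyFontSize;
    }

    // Renders one line of glyphs, wrapping superscript runs in <sup> tags.
    static String toMarkup(List<Glyph> line) {
        if (line.isEmpty()) {
            return "";
        }
        // Assume the glyph with the largest font size is body text.
        Glyph body = line.get(0);
        for (Glyph g : line) {
            if (g.fontSize > body.fontSize) {
                body = g;
            }
        }
        StringBuilder out = new StringBuilder();
        boolean open = false;
        for (Glyph g : line) {
            boolean sup = isSuperscript(g, body.baselineY, body.fontSize);
            if (sup && !open) {
                out.append("<sup>");
            }
            if (!sup && open) {
                out.append("</sup>");
            }
            out.append(g.text);
            open = sup;
        }
        if (open) {
            out.append("</sup>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = Arrays.asList(
                new Glyph("E=mc", 100.0, 12.0),
                new Glyph("2", 105.0, 8.0));
        System.out.println(toMarkup(line)); // prints E=mc<sup>2</sup>
    }
}
```

The hard part is still getting the baseline and font size for each piece of text out of the PDF; the sketch only covers what you might do with them afterwards.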
