JavaRanch » Java Forums » Java » Java in General
PDFBox: pdf's markup, how-to extract the pdf markup...

Jim Harrison
Ranch Hand

Joined: Mar 16, 2007
Posts: 30
Hello,

I've read a lot on http://pdfbox.apache.org but can't find an example, or even confirm whether the tool actually does this.

The PDF file that I'm reading has superscripts. I want to get both the text and the markup content of the PDF file. So a couple of questions:

1. Can PDFBox do this? I see ExtractText on their website (http://pdfbox.apache.org/commandlineutilities/ExtractText.html), but that just outputs the text of the PDF.

2. Does anyone have an example of doing this?

Thanks...Jim
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42035
    
No, PDFBox has no notion of extracting layout information.

You could check out the source code of https://pdf-renderer.dev.java.net/, which can display PDFs, so it must have a way of accessing the layout data.
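
Once you do have layout data, spotting superscripts is mostly a heuristic. Here is a minimal sketch, assuming you can get each run of text together with its baseline position and font size. The Glyph class, its fields, and the markUp method are hypothetical names made up for illustration; they are not part of PDFBox or pdf-renderer. A run is treated as a superscript when it is smaller than the preceding normal run and its baseline sits higher.

```java
import java.util.List;

final class SuperscriptMarker {

    // Hypothetical holder for one positioned run of text; a real PDF
    // library would have to supply equivalent data.
    static final class Glyph {
        final String text;
        final double baselineY; // baseline height above the page bottom, in points
        final double fontSize;  // font size in points

        Glyph(String text, double baselineY, double fontSize) {
            this.text = text;
            this.baselineY = baselineY;
            this.fontSize = fontSize;
        }
    }

    // Heuristic: a run is a superscript when it is set smaller than the
    // last normal run and its baseline is raised above that run's baseline.
    static String markUp(List<Glyph> line) {
        StringBuilder out = new StringBuilder();
        Glyph base = null;    // last run treated as normal text
        boolean open = false; // true while inside a <sup> element
        for (Glyph g : line) {
            boolean sup = base != null
                    && g.fontSize < base.fontSize
                    && g.baselineY > base.baselineY;
            if (sup && !open) { out.append("<sup>"); open = true; }
            if (!sup && open) { out.append("</sup>"); open = false; }
            if (!sup) { base = g; }
            out.append(g.text);
        }
        if (open) { out.append("</sup>"); }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = List.of(
                new Glyph("E = mc", 700.0, 12.0),
                new Glyph("2", 704.0, 8.0),
                new Glyph(" holds.", 700.0, 12.0));
        System.out.println(markUp(line)); // prints: E = mc<sup>2</sup> holds.
    }
}
```

The thresholds are deliberately crude. Real documents would probably need some tolerance on both comparisons, and the same idea with a lowered baseline would cover subscripts.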

