PDFBox: how to extract a PDF's markup

Jim Harrison
Greenhorn

Joined: Mar 16, 2007
Posts: 29
Hello,

I've read a lot on http://pdfbox.apache.org, but I can't find an example, or even confirm whether the tool supports this.

The PDF file I'm reading contains superscripts, so I want to extract both the text and the markup of the file. A couple of questions:

1. Can PDFBox do this? Their website documents the ExtractText utility (http://pdfbox.apache.org/commandlineutilities/ExtractText.html), but that only extracts the plain text of the PDF.

2. Does anyone have an example of doing this?

Thanks...Jim
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41017
    
No, PDFBox has no notion of extracting layout information.

You could look at the source code of https://pdf-renderer.dev.java.net/. Since it can display PDFs, it must have some way of accessing the layout data.
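Whichever library ends up supplying the layout data, spotting superscripts is mostly a geometry heuristic: a glyph set noticeably smaller than the surrounding text and raised above its baseline. The following is a minimal sketch of that idea, not code taken from PDFBox or pdf-renderer. The `Glyph` record, its fields, and the two threshold ratios are hypothetical; the sketch also assumes PDF-style coordinates (y grows upward) and that the glyphs for one text line have already been collected in reading order. It wraps superscript runs in HTML `<sup>` tags as one possible "markup" output, and needs Java 16 or later for records.

```java
import java.util.List;

public class SuperscriptMarkup {

    // Hypothetical per-glyph layout record. A real extractor would have to
    // supply these values: the glyph text, its baseline y (PDF coordinates,
    // y grows upward) and its font size.
    public record Glyph(String text, float baselineY, float fontSize) {}

    // Assumed thresholds, to be tuned per document: a superscript uses a font
    // at most 80% of the body size and sits at least 25% of the body size
    // above the body baseline.
    static final float SIZE_RATIO = 0.8f;
    static final float RAISE_RATIO = 0.25f;

    // Returns the line's text with superscript runs wrapped in <sup>...</sup>.
    public static String toMarkup(List<Glyph> line) {
        if (line.isEmpty()) {
            return "";
        }
        // Treat the first glyph with the largest font as the body text;
        // its baseline is the reference for "raised".
        Glyph body = line.get(0);
        for (Glyph g : line) {
            if (g.fontSize() > body.fontSize()) {
                body = g;
            }
        }

        StringBuilder out = new StringBuilder();
        boolean inSup = false;
        for (Glyph g : line) {
            boolean smaller = g.fontSize() <= body.fontSize() * SIZE_RATIO;
            boolean raised = g.baselineY() - body.baselineY() >= body.fontSize() * RAISE_RATIO;
            boolean sup = smaller && raised;
            if (sup && !inSup) {
                out.append("<sup>");   // entering a superscript run
            } else if (!sup && inSup) {
                out.append("</sup>");  // leaving a superscript run
            }
            out.append(g.text());
            inSup = sup;
        }
        if (inSup) {
            out.append("</sup>");      // close a run that ends the line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = List.of(
                new Glyph("E", 100f, 12f),
                new Glyph("=", 100f, 12f),
                new Glyph("m", 100f, 12f),
                new Glyph("c", 100f, 12f),
                new Glyph("2", 104f, 8f));
        System.out.println(toMarkup(line));
    }
}
```

For the sample line in `main` (body glyphs at baseline 100 with size 12, and a `2` at baseline 104 with size 8), this prints `E=mc<sup>2</sup>`. Taking the largest font on the line as the body size keeps the heuristic simple; lines that mix heading and body sizes would need a more careful baseline estimate.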

