Is it possible to create 'grouped' highlights using PDFBox?
I am trying to use PDFBox to highlight text within a document. I have extracted the text and have obtained the coordinates, but am having trouble creating annotations. PDFBox annotations (for example: PDAnnotationTextMarkup) only seem to accept one quad and/or rectangle for the highlight area. This means that 2 lines of text cannot be highlighted without creating one large highlight that surrounds both lines. Acrobat itself creates multiple boxes when highlighting multiple lines, and they all are part of the one annotation, so they only have one popup, and will be deleted as a group. Is there a way to replicate this using PDFBox?
The following is a test in which I used Adobe Acrobat Reader to make a single annotation that consists of two highlights (the last line of one paragraph and the first line of the next). The test extracts each individual highlight, deletes all highlights on the page, then reintroduces them.
The System.print output shows two PDAnnotations for my one annotation (made up of two highlight boxes) in Acrobat.
So it seems to treat each highlighted area as a separate PDAnnotation. However, when each PDAnnotation is inserted back into the annotation array separately, they are still grouped together when the document is opened. So they are retaining some kind of connection. My current task is to create two separate annotations in Adobe Acrobat Reader and to try and group them together. I have had no luck over several hours.
Of course, if I have gone off on a tangent and a simpler solution exists, I would love to hear it. But as usual, hope of a clean solution is fading.
So it seems I was misinterpreting the data.
The popup box for each annotation is listed as a separate annotation. So the two annotations I saw in my output were actually one highlight and one popup box. When extracting the PDAnnotations from a page, I need to find out which subclass each one is, and then I can go from there.
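To illustrate the check described above, here is a minimal sketch that tells highlights and popups apart by subtype name. It assumes the names come from something like `PDAnnotation.getSubtype()` and match the PDF specification's /Subtype values ("Highlight", "Popup", and so on). The class `AnnotationSubtypes` and its methods are hypothetical helpers, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not a PDFBox class. */
final class AnnotationSubtypes {

    /** True for the text-markup subtypes named in the PDF specification. */
    static boolean isTextMarkup(String subtype) {
        return "Highlight".equals(subtype)
                || "Underline".equals(subtype)
                || "Squiggly".equals(subtype)
                || "StrikeOut".equals(subtype);
    }

    /** True for the popup window that belongs to a markup annotation. */
    static boolean isPopup(String subtype) {
        return "Popup".equals(subtype);
    }

    /** Keeps only text-markup subtypes, so popups are not counted as highlights. */
    static List<String> markupOnly(List<String> subtypes) {
        List<String> result = new ArrayList<String>();
        for (String subtype : subtypes) {
            if (isTextMarkup(subtype)) {
                result.add(subtype);
            }
        }
        return result;
    }
}
```

With PDFBox itself, an `instanceof` test against `PDAnnotationTextMarkup` and `PDAnnotationPopup` should serve the same purpose.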
As for multiple quad points: I received a prompt reply from Gilard D., who stated that all quad points are bundled into a single quad points array. I misled myself by cramming two quads into a float array and getting a very messed up shape drawn on the page, so I assumed PDFBox was just ignoring the higher indexes. However, I tested Gilard's information by extracting an existing annotation and displaying its quad points. It showed a float array of length 28 or so, so the explanation looks very plausible. Now I just have to work out how the points are arranged within each quad, and then I can get back on track.
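A minimal sketch of how several lines might be packed into one quad points array, using 8 floats (four x/y pairs) per line. The corner order used here (top-left, top-right, bottom-left, bottom-right) is an assumption based on what is commonly reported to render correctly in Acrobat. It is not something confirmed in this thread, so if the shape comes out twisted, try another order. `QuadPointBuilder` and the `{left, bottom, right, top}` row layout are hypothetical, chosen only for illustration.

```java
/** Hypothetical helper, not a PDFBox class. */
final class QuadPointBuilder {

    /**
     * Packs one quad per line into a single array of 8 floats per line.
     * Each input row is {left, bottom, right, top} in PDF user space.
     * ASSUMPTION: corner order is top-left, top-right, bottom-left, bottom-right.
     */
    static float[] toQuadPoints(float[][] lines) {
        float[] quads = new float[lines.length * 8];
        int i = 0;
        for (float[] b : lines) {
            float left = b[0], bottom = b[1], right = b[2], top = b[3];
            quads[i++] = left;  quads[i++] = top;     // top-left
            quads[i++] = right; quads[i++] = top;     // top-right
            quads[i++] = left;  quads[i++] = bottom;  // bottom-left
            quads[i++] = right; quads[i++] = bottom;  // bottom-right
        }
        return quads;
    }

    /** Smallest {left, bottom, right, top} box that encloses every line. */
    static float[] boundingBox(float[][] lines) {
        if (lines.length == 0) {
            throw new IllegalArgumentException("at least one line is required");
        }
        float[] box = lines[0].clone();
        for (float[] b : lines) {
            box[0] = Math.min(box[0], b[0]);
            box[1] = Math.min(box[1], b[1]);
            box[2] = Math.max(box[2], b[2]);
            box[3] = Math.max(box[3], b[3]);
        }
        return box;
    }
}
```

The array from `toQuadPoints` is meant to be handed to `PDAnnotationTextMarkup.setQuadPoints(float[])`, with the bounding box used to build the annotation's rectangle. That way one annotation covers both lines and keeps a single popup.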