In my project, I have to identify all the text contained in a PDF and replace it with some other text. Ideally the replacement text would be meaningful, but since that seems too tedious, I'm thinking of scaling my project back to just *any* text.
The formatting and styling (including images) of the output PDF should be preserved, and the new text should not overflow onto the images. So far I have considered the PDF manipulation libraries iText and Apache PDFBox.
Apache PDFBox has an example program called "ReplaceString", but it needs a specific "string to replace" and a specific "replacement string". Since I need to replace *every* word in the PDF with *any* text, a single string replacement doesn't serve the purpose.
Here is the approach I have in mind:
Read every word, count the number of characters in it, and replace it on the spot with *any* word of the same character count. Maybe a condition on the character count, covering lengths from 1 to 15, would be enough.
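A minimal sketch of the word-for-word idea in plain Java, with no PDF library involved. It assumes a "word" is any run of Unicode letters, uses a hypothetical filler table for lengths 1 to 15 (falling back to a run of `x` for longer words), and roughly mirrors the original word's capitalization. The names `SameLengthReplacer`, `replaceWords` and `fillerFor` are made up for illustration.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SameLengthReplacer {

    // Hypothetical filler words, indexed by length 1..15 (index 0 is unused).
    private static final String[] FILLER = {
        "", "a", "an", "the", "word", "words", "filler", "example",
        "sentence", "paragraph", "substitute", "replacement", "alternatives",
        "substitutions", "reconstructing", "transformations"
    };

    // Assumption: a "word" is any run of Unicode letters; digits,
    // punctuation and whitespace are copied through unchanged.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    /** Replaces every word in text with a filler word of the same length. */
    static String replaceWords(String text) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy the non-word gap as-is
            out.append(fillerFor(m.group()));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    /** Picks a filler of the same length and roughly mirrors the casing. */
    static String fillerFor(String word) {
        int len = word.length(); // counts UTF-16 code units, not glyphs
        String filler;
        if (len < FILLER.length) {
            filler = FILLER[len];
        } else {
            // Longer than 15 characters: fall back to a run of 'x'.
            char[] xs = new char[len];
            Arrays.fill(xs, 'x');
            filler = new String(xs);
        }
        if (len > 1 && word.equals(word.toUpperCase(Locale.ROOT))) {
            return filler.toUpperCase(Locale.ROOT); // all caps stays all caps
        }
        if (Character.isUpperCase(word.charAt(0))) {
            return Character.toUpperCase(filler.charAt(0)) + filler.substring(1);
        }
        return filler;
    }

    public static void main(String[] args) {
        // Prints: Words, THE words! Example 2.0 an the word.
        System.out.println(replaceWords("Hello, PDF world! Version 2.0 of the file."));
    }
}
```

To apply this to an actual PDF, the function would have to run on the text of each string operand of the `Tj`/`TJ` text-showing operators in a page's content stream, which is roughly what the ReplaceString example walks over. Keep in mind that one word is often split across several operands (for example inside a `TJ` array with kerning), that an equal character count does not guarantee an equal width with proportional fonts, and that embedded subset fonts may not contain glyphs for the filler letters.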
My deadline is approaching, and I haven't been able to make much progress because I got off track.
It would be great if someone could guide me on how to approach this, and point me to any similar work done in the past that I could use and build on.