Creation of icepdf-core.jar and icepdf-viewer.jar from ICEPDF.

Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
Hi,

I want to extract text from PDF files: some contain watermarks, some have a two-column layout, and some combine rows and columns of data.

I tried PDFBox (0.7.3), but it was unable to extract the text from watermarked PDFs. I also tried iText, but only the latest version, 5.0.5, can extract text from watermarked PDFs, and that version in turn fails on two-column PDFs: it cannot extract the data from them properly.

My next attempt was to check out ICEpdf, which I learned about from some of the posts here.

For this, I downloaded the bundles from the ICEpdf downloads site (http://www.icepdf.org/downloads.html) and tried to generate the required jars (icepdf-core and icepdf-viewer) as described in the developer's guide (http://www.icepdf.org/docs/v4_0_0/pdf/ICEpdf_4.0.0-Developers_Guide.pdf), using Ant 1.8.

When I try to generate the jars (icepdf-core.jar and icepdf-viewer.jar) with Ant 1.8, the build fails.

I guess these jars are not available online anywhere and have to be generated via an Ant script.

If anyone could tell me how to generate the icepdf-core.jar and icepdf-viewer.jar files, send them to me, or point me to a link where these jars are available online, it would be a great help.

Thanks in advance,
Divya.
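
Editor's note: the two-column problem described above usually comes down to reading order. A minimal sketch of one way to reorder text for a two-column page follows, assuming the extraction library can hand back text fragments with page coordinates. The `Fragment` class, the `readInColumnOrder` method, and the convention that y grows downward are all hypothetical; they are not part of PDFBox, iText, or ICEpdf.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: splits fragments at the page midline, then reads each column top to bottom.
final class TwoColumnOrder {

    /** A piece of text with the position of its top-left corner (assumption: y grows downward). */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Returns the left column's text followed by the right column's text, one fragment per line. */
    static String readInColumnOrder(List<Fragment> fragments, double pageWidth) {
        double middle = pageWidth / 2.0;
        List<Fragment> left = new ArrayList<>();
        List<Fragment> right = new ArrayList<>();
        for (Fragment f : fragments) {
            if (f.x < middle) {
                left.add(f);
            } else {
                right.add(f);
            }
        }

        // Within a column, a smaller y means higher on the page.
        Comparator<Fragment> topToBottom = Comparator.comparingDouble((Fragment f) -> f.y);
        left.sort(topToBottom);
        right.sort(topToBottom);

        StringBuilder out = new StringBuilder();
        for (Fragment f : left) {
            out.append(f.text).append('\n');
        }
        for (Fragment f : right) {
            out.append(f.text).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Fragment> page = new ArrayList<>();
        page.add(new Fragment(50, 100, "left, line 1"));
        page.add(new Fragment(350, 100, "right, line 1"));
        page.add(new Fragment(50, 120, "left, line 2"));
        page.add(new Fragment(350, 120, "right, line 2"));
        System.out.print(readInColumnOrder(page, 600));
    }
}
```

A fixed midline is the simplest possible split; real pages with uneven columns, headers spanning both columns, or tables would need something more careful.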
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42292
    
You could tell us exactly how the build fails: how you're trying to build, and the specific error output.

Or you could try a current version of PDFBox; 0.7.3 is something like 5 years old.

Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
Hi Ulf,

The developer guide (http://www.icepdf.org/docs/v4_0_0/pdf/ICEpdf_4.0.0-Developers_Guide.pdf) says the following:

Building ICEpdf from Source
This page last changed on Dec 10, 2009 by ken.fyten.
ICEpdf is bundled with a PDF viewer reference implementation. If you downloaded the binary distribution of ICEpdf, the viewer application is available as prebuilt JAR file found in [install_dir]/icepdf/lib/icepdf-viewer.jar.
If you have downloaded the source code distribution of ICEpdf, it is necessary to build the ICEpdf core and viewer source code using the provided build scripts. The Ant help command can be used to see a list of available build targets. To build the ICEpdf core and viewer JARs simply execute the command ant in the [install_dir]/icepdf/ directory.
This command will build and copy the icepdf-core.jar and icepdf-viewer.jar to the [install_dir]/icepdf/lib/ directory.

So I followed these steps:

1. Went to the ICEpdf downloads site.
2. Downloaded the 4.0.0 source bundle (ICEpdf-4.0.0-src.zip).
3. Unzipped it to a directory.
4. Installed Ant 1.8 on my system.
5. Checked the Ant installation; it was successful.
6. In a command prompt, went to the path where the bundle (ICEpdf-4.0.0-src.zip) was extracted.
7. Executed the command "ant" in this directory, as per the developer guide.
8. As per the developer guide, the two jars should be generated in the lib folder of the extracted bundle, but I get a "build failed" message.

The message I got when I ran the Ant script is as follows:

D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf>ant
Buildfile: D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf\build.xml

build.all:

update.product.info:
[copy] Copying 1 file to D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf\core\src\org\icepdf\core\application

compile:
[javac] D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf\examples\etc\build-common.xml:95: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 217 source files to D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf\core\build\classes
[javac] javac: invalid target release: 1.5
[javac] Usage: javac <options> <source files>
[javac] where possible options include:
[javac] -g Generate all debugging info
[javac] -g:none Generate no debugging info
[javac] -g:{lines,vars,source} Generate only some debugging info
[javac] -nowarn Generate no warnings
[javac] -verbose Output messages about what the compiler is doing
[javac] -deprecation Output source locations where deprecated APIs are used
[javac] -classpath <path> Specify where to find user class files
[javac] -sourcepath <path> Specify where to find input source files
[javac] -bootclasspath <path> Override location of bootstrap class files
[javac] -extdirs <dirs> Override location of installed extensions
[javac] -d <directory> Specify where to place generated class files
[javac] -encoding <encoding> Specify character encoding used by source files
[javac] -source <release> Provide source compatibility with specified release
[javac] -target <release> Generate class files for specific VM version
[javac] -help Print a synopsis of standard options
[javac]

BUILD FAILED
D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf\build.xml:83: The following error occurred while executing this line:
D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf\core\build.xml:58: The following error occurred while executing this line:
D:\Documents and Settings\KambhaDX\Desktop\INTCON\ICEPdf\4.0.0\ICEpdf-4.0.0-src\icepdf\examples\etc\build-common.xml:95: Compile failed; see the compiler error output for details.

Total time: 2 seconds



Thank You,
Divya.


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42292
    
For starters, I would download the binary bundles that are available on the page you linked to. Sounds simpler than downloading the source and building it yourself. :-)

As to the error, I'd check out the compile task in the build file; it seems that the -source and -target options for the compiler are out of whack. Apparently ICEpdf runs on Java 5, so a target option of "1.5" is reasonable. Maybe the source option is wrong.
Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
Hi Ulf,

You are right; I should have tried the binary bundles first. Using the 4.1.1 binary bundle (ICEpdf-4.1.1-bin.zip), I got the two jars (icepdf-core.jar and icepdf-viewer.jar) and could run my program with ICEpdf.

But I see that even ICEpdf is not able to extract content from watermarked PDFs. Only the watermark gets extracted.

Ulf, could you please suggest some online tools or web services to which I could send my PDF and which would do the extraction for me? The service would then return the content to my application, and I would use it for further processing.

Thank You,
Divya.
Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
In addition to the above:

I also looked for other APIs that offer such facilities, for example the CAM::PDF package on CPAN. It provides PDF text extraction, but I am not sure how it would handle watermarked PDFs. I could have a Perl script do the job and program my application to invoke the script, run the extraction, and return the extracted content to the application.

So if you could give me some insight into any web services or online tools that can be accessed from an application, or other APIs/packages that provide such facilities and can be invoked from Java, it would be really helpful.

Thank You,
Divya.
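
Editor's note: invoking an external script from Java and reading back its output, as described above, can be sketched with `ProcessBuilder` from the standard library. This is a minimal sketch assuming Java 7 or later; the script name `extract_text.pl` and the file name `input.pdf` are hypothetical placeholders, and nothing here is specific to CAM::PDF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch only: runs an external command and returns whatever it prints.
final class ExternalExtractor {

    /** Runs the command and returns its combined stdout/stderr; throws if the exit code is non-zero. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " exited with code " + exitCode);
        }
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical script and input file; substitute your own.
        String text = run(Arrays.asList("perl", "extract_text.pl", "input.pdf"));
        System.out.print(text);
    }
}
```

Reading the output fully before calling `waitFor()` matters: a child process can block if its output pipe fills up and nobody is reading it.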
Patrick Corless
Greenhorn

Joined: Jan 25, 2011
Posts: 1
Sorry to hear that you weren't able to build the binaries from source. From the Ant log it looks like you were trying to compile with a JDK < 1.5; you'll need JDK 1.5 or higher.

As for extracting text, it is sometimes impossible to extract text from a PDF without using OCR. ICEpdf has a pretty active community at http://www.icefaces.org/JForum/forums/show/28.page . If you can post the file, I can take a quick look at it to let you know if there is extractable data in it or if it's just an image.
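
Editor's note: one quick way to confirm which Java version is actually in use is to print `java.specification.version` and compare it against the target from the build log. The sketch below is a hypothetical helper, not part of ICEpdf or Ant, and it reports on the JVM it runs in, which may differ from the JDK that Ant picks up (Ant normally uses the one `JAVA_HOME` points to).

```java
// Sketch only: compares the running Java version with a javac target such as "1.5".
final class TargetReleaseCheck {

    /** Converts "1.4", "1.5.0", "1.8", "9" or "17" to its feature number (4, 5, 8, 9, 17). */
    static int featureVersion(String version) {
        String v = version.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2); // legacy scheme: "1.5" means Java 5
        }
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot); // drop minor/update parts
        }
        return Integer.parseInt(v);
    }

    /** True when the given JDK version is at least as new as the requested target. */
    static boolean isAtLeast(String jdkVersion, String target) {
        return featureVersion(jdkVersion) >= featureVersion(target);
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.specification.version");
        System.out.println("Running Java " + running
                + "; new enough for target 1.5: " + isAtLeast(running, "1.5"));
    }
}
```

This only checks the lower bound. Recent JDK releases have dropped some very old target values, so a much newer compiler can also reject a target of 1.5.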
Martijn Verburg
author
Bartender

Joined: Jun 24, 2003
Posts: 3274
    

Hi Patrick, and welcome to JavaRanch!


Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
Hi Patrick,

I have sent a copy of the PDF to your email; I hope that is OK with you. I believe there is indeed extractable data in the PDF, because I am able to extract the content using iText, except that it has a problem with two-column PDFs such as the one I emailed you. Eagerly awaiting your expert comments.

Thanks in advance,
Divya.
Martijn Verburg
author
Bartender

Joined: Jun 24, 2003
Posts: 3274
    

Hi Divya,

Please UseTheForumNotEmail.
Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
Hi,

I am sorry, I am not able to attach any PDFs here. I get a message saying "Files with the extension .pdf are not allowed as attachment in the message." Please let me know where I can post this PDF.

Thank You,
Divya.
Divya Kambhatla
Greenhorn

Joined: Jan 25, 2011
Posts: 13
Hi,

As suggested, I have created a post about this on the ICEfaces forum; it can be accessed at http://www.icefaces.org/JForum/posts/list/0/18525.page

Awaiting your expert comments.

Thank You,
Divya.
 