But iText does use the existing Java API to do that. You could do the same thing if you wanted to spend the time and write an implementation of the PDF spec, but that's a non-trivial exercise. Even implementing just the part of the PDF spec that you need is non-trivial. And iText has the advantage that it has already been tested by a lot of people on a lot of different PDF files. It's a tremendous value considering that you pay absolutely nothing for it. That's a lot of "betters" in my opinion.
Here is some simple code to find the page number in a PDF, but I am getting an exception. Has anyone come across this? How do I resolve it?
Exception:
java.io.IOException: PMBOK.pdf not found as file or resource.
    at com.lowagie.text.pdf.RandomAccessFileOrArray.<init>(Unknown Source)
    at com.lowagie.text.pdf.RandomAccessFileOrArray.<init>(Unknown Source)
    at com.lowagie.text.pdf.PRTokeniser.<init>(Unknown Source)
    at com.lowagie.text.pdf.PdfReader.<init>(Unknown Source)
    at com.lowagie.text.pdf.PdfReader.<init>(Unknown Source)
    at test.CreateIndexFiles.main(CreateIndexFiles.java:37)
Go through the usual procedure for any Exception: find where it occurred, using the line number (37) and see what you are doing there. Find out what is going wrong; where is the file you are supposed to be looking for? If the file isn't there, or if you are trying to read a write-only file, or if another application has opened it, so the file is unavailable, you will get an Exception.
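To make that check concrete, here is a minimal sketch that uses only java.io.File, so it runs without iText. The class name PdfPathCheck, the method diagnose and the message wording are made up for illustration. The file name PMBOK.pdf is taken from the stack trace, and I am assuming it was passed as a relative path, which Java resolves against the working directory (the "user.dir" system property).

```java
import java.io.File;

// Hypothetical helper for illustration; not part of iText.
class PdfPathCheck {

    // Returns a short diagnosis for the given path.
    static String diagnose(String path) {
        File file = new File(path);
        // A relative path is resolved against the current working directory.
        String absolute = file.getAbsolutePath();
        if (!file.exists()) {
            return "missing: " + absolute;
        }
        if (!file.isFile()) {
            return "not a regular file: " + absolute;
        }
        if (!file.canRead()) {
            return "not readable: " + absolute;
        }
        return "ok: " + absolute + " (" + file.length() + " bytes)";
    }

    public static void main(String[] args) {
        // Defaults to the file name from the stack trace.
        String path = args.length > 0 ? args[0] : "PMBOK.pdf";
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println(diagnose(path));
    }
}
```

If this prints "missing" with an absolute path you did not expect, the program is probably being started from a different directory than you think (IDEs often use the project root). In that case, pass the full path to the file or move the file.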
You can try raising the maximum heap size with the -Xmx flag of the JVM; run "java -X" to see the available options. If that still will not allow you to open the file (which is quite understandable, the file is just so freakingly large), then you will need to find a different PDF library that does not store everything in memory but handles the file in a more stream-like way. Compare this to XML parsing using DOM (the entire document is stored in memory) versus SAX (only a small part is in memory at any given time, and events are fired for each part).
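To illustrate the SAX side of that comparison, here is a small sketch using the SAX parser that ships with the JDK. The class name SaxCount and the sample XML are made up for illustration. It takes a String only to stay self-contained; for a genuinely large document you would hand the parser a File or InputStream instead, so the content is never held in memory as a whole.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; shows event-based parsing without a document tree.
class SaxCount {

    // Counts elements with the given name as the parser streams through the input.
    static int countElements(String xml, String name) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called once per opening tag; nothing is kept afterwards.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<doc><page/><page/><note/></doc>";
        System.out.println(countElements(xml, "page"));
    }
}
```

A DOM parser would first build the whole tree and only then let you walk it. Here the handler sees each element once and keeps nothing but a counter. That is the kind of processing you would want from a PDF library for a file this size.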
But maybe you should ask yourself why a PDF file needs to be 1.5GB. I have very few files on my system that are over 1GB. Most are CD/DVD images that I've ripped for backups or faster installation, or movies. Certainly nothing like a Word or PDF file; the largest PDF file I have is a 580-page installation guide that is just 5.5MB. Not even remotely as large as your PDF file.