
TIFF, DOC, EXCEL to PDF Converter

Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
Hi,

I need to write an application that converts a TIFF image, Word document, or Excel file to a PDF document and stores it in the database.
It should be possible to later create a single TIFF image from this PDF document.

Can anyone please suggest APIs I could use to realise the above?
Can all this be achieved using the Java Advanced Imaging API (JAI)?

Thanks in advance,
Anup
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Creating a PDF that contains nothing but an image is quite easy using the iText library; its web site has an example that shows how to do that.

Converting Excel files is not hard; the Apache POI library can be used for reading the Excel file, and then again the iText library can be used for creating PDFs that contain tables.

Word can be dealt with in a similar manner (POI also supports it), but it'll be quite a bit trickier, especially if the file contains tables and images, since the POI API for handling DOC/DOCX isn't as advanced as the one handling XLS/XLSX, and of course Word files have a less regular structure than Excel files.

JAI won't be of any help with this.

There are commercial packages available that can be used from Java applications; you may want to investigate those before embarking on writing your own, especially if you need to deal with complex documents - writing your own converter that handles those and generates good quality output could easily take a couple of weeks (or a month) of your time.


Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
Hi Ulf,

Thank you for the information.
I tried the example for converting a TIFF file into a PDF.
However, the generated PDF contains only part of the TIFF image.
Are there any specific parameters I need to set in order to preserve the complete TIFF image?

Here is the code:

```java
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

public class TiffToPDFConversion {

    public static void main(String[] args) {
        System.out.println("Images");
        // step 1: create a document object
        Document document = new Document();
        try {
            // step 2: create a writer that listens to the document
            // and directs a PDF stream to a file
            PdfWriter.getInstance(document,
                    new FileOutputStream("D:\\!Anup\\Project\\iText\\Temp\\Images.pdf"));
            // step 3: open the document
            document.open();
            // step 4: add a caption and the TIFF image
            document.add(new Paragraph("iText.tif"));
            Image tiff = Image.getInstance("D:\\!Anup\\Project\\iText\\Temp\\iText.tif");
            document.add(tiff);
        } catch (DocumentException de) {
            System.err.println(de.getMessage());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        } finally {
            // step 5: close the document, but only if it was actually opened
            if (document.isOpen()) {
                document.close();
            }
        }
    }
}
```
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Is the page large enough to hold the image? If not you'll need to scale it down.

The image's getWidth, getHeight, getDpiX and getDpiY methods will be helpful in figuring this out.
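To check whether the image fits, its pixel dimensions can be converted to PDF points using the DPI: PDF measures in points (72 per inch), so an image that is `w` pixels wide at `d` DPI is `w * 72 / d` points wide at its real size. Below is a minimal sketch under that assumption, with the pixel and DPI values passed in as plain numbers (in real code they would come from the image's getWidth/getHeight and getDpiX/getDpiY). The class and method names are hypothetical, and treating a DPI of 0 as 72 is an assumption, not documented iText behavior.

```java
class ImageFitCheck {
    // PDF user space: 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Physical size in points of `pixels` pixels at `dpi` dots per inch.
    static double toPoints(int pixels, int dpi) {
        if (dpi <= 0) {
            // Assumption: an unknown DPI (0) is treated as 72 DPI, i.e. one pixel per point.
            return pixels;
        }
        return pixels * POINTS_PER_INCH / dpi;
    }

    // True if the image, drawn at its physical size, fits in an area given in points.
    static boolean fits(int widthPx, int heightPx, int dpiX, int dpiY,
                        double areaWidthPt, double areaHeightPt) {
        return toPoints(widthPx, dpiX) <= areaWidthPt
                && toPoints(heightPx, dpiY) <= areaHeightPt;
    }

    public static void main(String[] args) {
        // Pixel and DPI values reported later in this thread; 595 x 842 pt is an A4 page.
        System.out.println(fits(1632, 2309, 200, 200, 595, 842));
        // Drawn at one pixel per point, the same image is far too large for the page.
        System.out.println(fits(1632, 2309, 72, 72, 595, 842));
    }
}
```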
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
The TIFF contains a logo in the upper left-hand corner. Only this logo is written to the PDF; everything else in the image is missing.
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
Printing the image's values gives the following:

```java
System.out.println(tiff.getAbsoluteX());
System.out.println(tiff.getAbsoluteY());
System.out.println(tiff.getAlignment());
System.out.println(tiff.getDpiX());
System.out.println(tiff.getDpiY());
System.out.println(tiff.getHeight());
System.out.println(tiff.getWidth());
```

Output:

```
NaN
NaN
0
200
200
2309.0
1632.0
```

This does not make much sense to me. I suspect that while reading the TIFF, only part of the image is read, and only that part is converted to PDF.
Is there a way to influence how the image is read?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Well, *is* the image 2309x1632 pixels large? If so, then it's probably being read correctly. It's possible that iText doesn't honor the DPI setting, in which case you'd need to do the scaling yourself - play around with the Image.scalePercent method to see if that helps.
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
Thanks. Scaling by a percentage works. But how can I determine the right percentage so that it works for every TIFF image?
Is it possible to calculate it from values read from the image itself?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Well, if iText really uses 72 DPI internally no matter what (and you should make sure that's what it's doing), then you can calculate how big the image would be since you know its pixel dimensions. You also know how big the page is (based on 72 DPI), so you can perform some math with that.
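The math Ulf describes can be sketched as follows, assuming (as later confirmed in this thread) that iText draws one pixel as one point, i.e. 72 DPI. The percentage that restores the image's physical size is 72 / DPI * 100; if that is still too large for the available area, the smaller of the two "fit" percentages wins, which preserves the aspect ratio. The class name, method name, and the 72 DPI fallback for a missing DPI are illustrative assumptions; the returned value is what you would pass to a scalePercent-style call.

```java
class ScaleToPage {
    // Percentage that draws a widthPx x heightPx image (one pixel = one point)
    // at its physical size for the given DPI, shrunk further if needed so that
    // it fits inside an area of areaWidthPt x areaHeightPt points.
    static double scalePercent(double widthPx, double heightPx, int dpi,
                               double areaWidthPt, double areaHeightPt) {
        if (dpi <= 0) {
            dpi = 72; // assumption: unknown DPI means "leave at one pixel per point"
        }
        double physical = 7200.0 / dpi;                   // 72 / dpi * 100
        double fitWidth = 100.0 * areaWidthPt / widthPx;  // largest % that fits the width
        double fitHeight = 100.0 * areaHeightPt / heightPx;
        return Math.min(physical, Math.min(fitWidth, fitHeight));
    }

    public static void main(String[] args) {
        // Full A4 page (595 x 842 pt): the DPI correction alone is enough.
        System.out.println(scalePercent(1632, 2309, 200, 595, 842));        // 36.0
        // A4 minus 36 pt margins on each side (523 x 770 pt): the width limit wins.
        System.out.printf("%.2f%n", scalePercent(1632, 2309, 200, 523, 770)); // 32.05
    }
}
```

If only the fit-to-area part is needed, iText's Image.scaleToFit(float, float) scales an image to fit a given box while keeping its aspect ratio; the explicit percentage above is mainly useful when the DPI should be honored as well.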
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
Thanks Ulf!

It seems that iText does indeed use 72 DPI by default.
I checked the following site: http://www.mail-archive.com/itext-questions@lists.sourceforge.net/msg44706.html
Testing confirmed it: with a scaling percentage of 72/200 * 100 = 36, my TIFF fits the PDF perfectly
(200 being the DPI of my TIFF image).
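Written out, the arithmetic from this post, using the pixel dimensions reported earlier (width 1632, height 2309, 200 DPI):

```latex
\[
p = \frac{72}{\mathrm{DPI}} \times 100 = \frac{72}{200} \times 100 = 36
\]
\[
1632 \times 0.36 = 587.52~\mathrm{pt}\ (= 8.16~\mathrm{in}), \qquad
2309 \times 0.36 = 831.24~\mathrm{pt}\ (= 11.545~\mathrm{in})
\]
```

Both values are just under iText's A4 page size of 595 × 842 pt, which is consistent with the TIFF being a 200 DPI scan of an A4-sized page.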
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
The image conversion worked as per your suggestions.
I am however confused on how to do the same for .doc/.docx/.xls/.xlsx files.

In one of your posts you mentioned:
"I'd probably create the PDF at the same as the XLS file, using the iText API. Or, if it's not feasible to do it at the same time, use POI to open it later, and then use iText to create the PDF."
http://www.coderanch.com/t/420976/Other-Open-Source-Projects/Java-API-convert-Excel-PDF


I could not locate any method in POI that reads DOC/X or XLS/X files in one go and whose output could be fed directly to an iText method to get the PDF.
Data from a Word or Excel document can be extracted piece by piece and fed to iText for PDF creation. However, all of the formatting is lost.

Is it actually possible to use POI with iText to convert DOC/X and XLS/X files to PDF?


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
"I could not locate any method in POI that reads DOC/X or XLS/X files in one go and whose output could be fed directly to an iText method to get the PDF."

You're right, there's no such method; you'll have to code that yourself. For XLS/X documents you'd use POI to read the cell contents and formatting, and create PDF tables as appropriate using the iText API. The same goes for DOC/X, except that the range of possible inputs in a text document is much wider (text, images, tables, ...), so the code will be more complicated than for spreadsheet documents. My first post in this topic talks about this.
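One detail that comes up when mapping spreadsheet rows to PDF tables: spreadsheet rows are often ragged (POI's Row.getLastCellNum() can differ from row to row, and Row.getCell returns null for cells that were never defined), while iText's PdfPTable is created with a fixed column count and, as far as I know, only renders rows that are complete (PdfPTable.completeRow() exists for this reason). The sketch below shows only the library-independent padding step on plain strings; reading the cells with POI and adding them with PdfPTable.addCell is assumed and not shown, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RaggedRows {
    // Pads every row with empty strings so that all rows have as many
    // cells as the widest row. Returns new lists; the input is not modified.
    static List<List<String>> padToWidest(List<List<String>> rows) {
        int width = 0;
        for (List<String> row : rows) {
            width = Math.max(width, row.size());
        }
        List<List<String>> padded = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> copy = new ArrayList<>(row);
            while (copy.size() < width) {
                copy.add("");
            }
            padded.add(copy);
        }
        return padded;
    }

    public static void main(String[] args) {
        // Hypothetical cell texts, as they might come out of a sheet with ragged rows.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Item", "Qty", "Price"),
                Arrays.asList("Widget", "3"),
                Arrays.asList("Total"));
        for (List<String> row : padToWidest(rows)) {
            System.out.println(row);
        }
    }
}
```

The column count of the PdfPTable would then be the padded width, so every row can be added cell by cell without leaving an incomplete last row.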
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
This would make the entire processing really heavy.
We will be receiving thousands of documents in DOC/X and XLS/X formats to be converted to PDF.
Is there no other API I could use, such as OpenOffice/JODConverter?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
I don't understand what you mean by "really heavy".

JODConverter is certainly an option if you can require OO to be installed, and if the resulting documents properly reflect the input documents.
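A related route that also lets an office suite do the rendering, but without JODConverter's connection handling, is to call LibreOffice's own command-line converter (`soffice --headless --convert-to pdf --outdir <dir> <files>`) from Java. This is a swapped-in technique rather than what was suggested above, and it assumes LibreOffice is installed with `soffice` on the PATH; the class and method names are made up for illustration. The sketch only builds the command; the commented lines show how it could be run with ProcessBuilder.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class SofficeCommand {
    // Builds: <soffice> --headless --convert-to pdf --outdir <outDir> <input...>
    // Several input files can be passed in one call, which avoids starting
    // a new office process for every document in a large batch.
    static List<String> pdfCommand(String soffice, File outDir, File... inputs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(soffice);
        cmd.add("--headless");
        cmd.add("--convert-to");
        cmd.add("pdf");
        cmd.add("--outdir");
        cmd.add(outDir.getPath());
        for (File input : inputs) {
            cmd.add(input.getPath());
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = pdfCommand("soffice", new File("out"),
                new File("report.docx"), new File("budget.xlsx"));
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires LibreOffice to be installed):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

As with JODConverter, whether the resulting PDFs properly reflect the input documents has to be checked against real samples.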
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
The documents being scanned do not follow a common format: each Excel file can have a different structure, and the same goes for Word documents.
So will the POI-iText combination be limited to certain structures only (those I specifically code for), or can it be done generally for all documents?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Since there's no ready-made solution using POI/iText, whatever solution you come up with will support exactly those features that you care to implement (which probably will be just those features that the documents you're dealing with are using).

If you're looking for general-purpose solutions (and OO/JODConverter doesn't cut it) then you're probably better off buying a commercial package.
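Since a POI/iText converter will only handle the features you implement, one way to keep that explicit is to route each incoming file by type and send anything unhandled to a fallback (for example a commercial package or an office-suite conversion) instead of silently producing a poor PDF. A minimal sketch, assuming the file extension is a reliable indicator of the format (it may not be for files from arbitrary sources); the enum and method names are hypothetical.

```java
import java.util.Locale;

class FormatRouter {
    // Hypothetical conversion routes; each would be backed by its own converter.
    enum Route { TIFF_IMAGE, SPREADSHEET, WORD_DOCUMENT, UNSUPPORTED }

    // Picks a route from the file extension, case-insensitively.
    static Route routeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Route.UNSUPPORTED; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "tif":
            case "tiff":
                return Route.TIFF_IMAGE;
            case "xls":
            case "xlsx":
                return Route.SPREADSHEET;
            case "doc":
            case "docx":
                return Route.WORD_DOCUMENT;
            default:
                return Route.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"scan.TIF", "budget.xlsx", "memo.doc", "notes.txt"}) {
            System.out.println(name + " -> " + routeFor(name));
        }
    }
}
```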
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
Hi, could you direct me to an example where POI and iText are used to convert Word/Excel to PDF while preserving the format of the initial document?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Both the iText and POI web sites have plenty of examples on how to use them; beyond that it's a matter of searching the javadocs for methods/classes that accomplish the rest. If you're serious about using iText I strongly recommend getting the book "iText in Action"; it'll save you a lot of time figuring out stuff.
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
I tried to generate a PDF document from a Word document, but I get the following exception:

```
ExceptionConverter: java.io.IOException: No message found for the.document.has.no.pages
    at com.itextpdf.text.pdf.PdfPages.writePageTree(PdfPages.java:113)
    at com.itextpdf.text.pdf.PdfWriter.close(PdfWriter.java:1171)
    at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:780)
    at com.itextpdf.text.Document.close(Document.java:409)
    at com.abnamro.nl.scan.pdfconvert.process.MSWordToPDFConversion.convert(MSWordToPDFConversion.java:51)
    at com.abnamro.nl.scan.pdfconvert.process.MSWordToPDFConversion.main(MSWordToPDFConversion.java:61)
```

Following is a snippet of the code:


What is causing this issue? I even set the PDF writer to accept blank pages, but the issue still occurs.
The AFM file for the font being used in the DOC file is also present in the jar.




Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Please edit your post to UseCodeTags. It's unnecessarily hard to read the code as it is, making it less likely that people will bother to do so.
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
Added code tags. I hope it is readable now. Any idea why the error is thrown?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41029
Are any paragraphs being added? I take it there's no exception?

One thing I'd try is to use iText 2.1 instead of iText 5.
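"The document has no pages" is, as far as I know, what iText reports when close() is called on a document to which nothing was ever added, which is why it is worth checking whether any paragraphs actually reach the PDF. A minimal sketch of such a check, assuming the paragraph strings come from something like POI's WordExtractor.getParagraphText() for .doc files; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphCheck {
    // Returns the paragraphs that contain visible text, with surrounding
    // whitespace (including carriage returns and line feeds) trimmed off.
    static List<String> nonBlank(String[] paragraphs) {
        List<String> kept = new ArrayList<>();
        for (String p : paragraphs) {
            if (p != null && !p.trim().isEmpty()) {
                kept.add(p.trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical extractor output: two real paragraphs and two blank ones.
        String[] extracted = {"First paragraph\r\n", "   ", "\r\n", "Second paragraph\r\n"};
        List<String> paragraphs = nonBlank(extracted);
        if (paragraphs.isEmpty()) {
            // Nothing would be added, so creating and closing a PDF now would fail.
            System.out.println("No text found; skipping PDF creation");
        } else {
            System.out.println(paragraphs.size() + " paragraphs to add");
        }
    }
}
```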
Anup Bansal
Ranch Hand

Joined: Sep 12, 2006
Posts: 69
This was a configuration issue (I'm not sure exactly which setting).
I re-created the workspace and ran the code. It worked.
 