PDF writing woes

Dave T Taylor
Greenhorn

Joined: Mar 30, 2006
Posts: 4
Hi, I'm trying to write a simple program that will read and write PDF documents - however, I'm having a few problems with the code.

My program seems to read the document fine, but will not write it out again properly. Comparing the two files side by side before and after writing, it appears there's a small number of control characters missing from the output file. Any clues as to why this is happening?


import java.io.*;
import java.awt.*;
import java.applet.*;

public class readFile
{
    public static void main(String[] args)
    {
        try
        {
            String line;

            // open the input stream
            FileInputStream fis = new FileInputStream("c:/mypdf.pdf");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream data = new DataInputStream(bis);

            // open the output stream
            FileOutputStream fos = new FileOutputStream("c:/mynewpdf.pdf");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            DataOutputStream dos = new DataOutputStream(bos);

            System.out.println("Reading data...");

            // read the input file
            while ((line = data.readLine()) != null)
            {
                // write the output file
                dos.writeBytes(line);
            }

            System.out.println("OK, done...");

        }
        catch (Exception e)
        {
            System.err.println(e);
        }
    }
}


Any help is much appreciated!!

Many thanks,

Dave.
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8997
    

Welcome to the JavaRanch, Dave.
Have a look at the Java documentation and you'll see this:

public final String readLine() throws IOException

Deprecated. This method does not properly convert bytes to characters.

Java API Documentation - java.io.DataInputStream

What's more, you can't treat a binary file such as a PDF like a plain text file. A PDF file doesn't have "lines". It has some text data, but it also contains a ton of other binary data describing what to do with that text. If you try to read the binary data in as text, Java tries to convert it to Unicode characters using some character encoding. Since the binary values can fall outside what that encoding can represent, you'll lose information.
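
To see the effect on a small scale, here is a minimal sketch that runs the same readLine()/writeBytes() loop over an in-memory byte array instead of a file. The class name ReadLineLossDemo, the method roundTrip, and the sample bytes are made up for illustration. According to the documented behaviour of readLine(), the line terminators it finds (CR, LF, or CR followed by LF) are discarded, so those bytes never reach the output. That would fit the "small number of control characters missing" from the copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReadLineLossDemo {

    // Hypothetical helper: repeats the loop from the original post,
    // but on in-memory bytes so the result is easy to inspect.
    @SuppressWarnings("deprecation")
    static byte[] roundTrip(byte[] input) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(input));
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(sink);

            String line;
            // readLine() returns each "line" WITHOUT its terminator
            while ((line = in.readLine()) != null) {
                // writeBytes() writes the low byte of each char, nothing more
                out.writeBytes(line);
            }
            out.flush();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: text, CR LF, two binary bytes, LF, one more byte
        byte[] original = {'%', 'P', 'D', 'F', 0x0D, 0x0A,
                           (byte) 0xE2, (byte) 0xE3, 0x0A, 'x'};
        byte[] copy = roundTrip(original);

        System.out.println("original length: " + original.length);
        System.out.println("copy length:     " + copy.length);
    }
}
```

With this sample the copy comes out three bytes shorter than the original: the CR, and both LFs, are gone. In a real PDF those byte values can occur anywhere in the binary streams, so the output is corrupted even though most of it looks identical.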


[How To Ask Questions On JavaRanch]
Dave T Taylor
Greenhorn

Joined: Mar 30, 2006
Posts: 4
Thanks, Joe.

Yes, I've noticed that myself now. I've converted the program to read the files character by character, and while it's converting a lot more of the characters properly, there are still certain ones that are getting changed.

I'm having to go through my output files with a hex editor and a fine-tooth comb to find exactly where it's going wrong.

Thanks for the help.

Dave
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18991
    

You just want to copy the file from one place to another? Then do not read the files one character at a time. What Joe said (about binary data versus Unicode characters) still applies no matter how many characters at a time you read. To copy any file, PDF or otherwise, just read bytes (not characters) from the input and write them to the output.
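
A minimal sketch of that byte-for-byte copy might look like the following. The class name CopyBytes, the method names, and the 8 KB buffer size are arbitrary choices for illustration, and the try-with-resources syntax assumes Java 7 or later. The streams are closed at the end, so any buffered output is flushed to disk.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytes {

    // Copies every byte from in to out unchanged; no character conversion happens.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
        int n;
        // read() returns the number of bytes read, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Copies one file to another; try-with-resources closes (and flushes) both streams.
    static void copyFile(String source, String target) throws IOException {
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CopyBytes <source> <target>");
            return;
        }
        copyFile(args[0], args[1]);
        System.out.println("OK, done...");
    }
}
```

It could be run as, for example, java CopyBytes c:/mypdf.pdf c:/mynewpdf.pdf. On Java 7 and later, java.nio.file.Files.copy(Path, Path) does the same job in one call.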
Dave T Taylor
Greenhorn

Joined: Mar 30, 2006
Posts: 4
Yes, thanks. That's what I'm doing now!


Dave
Jason Moors
Ranch Hand

Joined: Dec 04, 2001
Posts: 188
Hi Dave,

There is an open source library called iText which enables you to create, manipulate and also copy PDF files. It may be overkill for what you are trying to do, but it's worth knowing about, as it lets you copy only certain pages, etc.

http://www.lowagie.com/iText/
 