PDF writing woes

Dave T Taylor

Joined: Mar 30, 2006
Posts: 4
Hi, I'm trying to write a simple program that will read and write PDF documents - however, I'm having a few problems with the code.

My program seems to read the document fine, but will not write it out again properly. Comparing the two files side by side before and after writing, it appears there's a small number of control characters missing from the output file. Any clues as to why this is happening?

import java.io.*;

public class readFile {
    public static void main(String[] args) {
        String line;

        try {
            // open the input stream
            FileInputStream fis = new FileInputStream("c:/mypdf.pdf");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream data = new DataInputStream(bis);

            // open the output stream
            FileOutputStream fos = new FileOutputStream("c:/mynewpdf.pdf");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            DataOutputStream dos = new DataOutputStream(bos);

            System.out.println("Reading data...");

            // read the input file
            while ((line = data.readLine()) != null) {
                // write the output file
            }

            System.out.println("OK, done...");
        } catch (Exception e) {
        }
    }
}

Any help is much appreciated!!

Many thanks,

Joe Ess

Joined: Oct 29, 2001
Posts: 9189

Welcome to the JavaRanch, Dave.
Have a look at the Java documentation and you'll see this:

public final String readLine() throws IOException

Deprecated. This method does not properly convert bytes to characters.

Java API Documentation -

What's more, you can't treat a binary file such as a PDF like a plain text file. A PDF file doesn't have "lines". It has some text data, but it also contains a ton of other binary data to describe what to do with that text. If you try to read the binary data in as text, Java decodes the bytes into Unicode characters using a character encoding. Since the binary values can fall outside what that encoding considers valid, you'll lose information.
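
To make the point about lost information concrete, here is a minimal sketch that is not part of the original thread. It decodes a handful of bytes as text and encodes them again, then checks whether the bytes survived. The class name RoundTripDemo, the method roundTripAsText and the sample bytes are invented for illustration, and UTF-8 is an assumed encoding (the platform default may differ).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows what happens when binary data is pushed through a String.
class RoundTripDemo {

    // Decode the bytes as UTF-8 text, then encode the text back to bytes.
    static byte[] roundTripAsText(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Plain ASCII followed by four bytes above 0x7F (sample values, chosen for illustration).
        byte[] binary = {'%', 'P', 'D', 'F', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] ascii = {'%', 'P', 'D', 'F'};

        // The high bytes are not valid UTF-8, so they are replaced during decoding.
        System.out.println("binary survives: " + Arrays.equals(binary, roundTripAsText(binary)));
        // Pure ASCII is valid UTF-8 and comes back unchanged.
        System.out.println("ascii survives:  " + Arrays.equals(ascii, roundTripAsText(ascii)));
    }
}
```

Under these assumptions the ASCII part comes back intact while the high bytes do not, which is the kind of damage a hex editor would show in the output file.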

Dave T Taylor

Joined: Mar 30, 2006
Posts: 4
Thanks, Joe.

Yes, I've noticed that myself now. I've now converted the program to read the files on a character-by-character basis, and while it's converting a lot more of the characters properly, there are still certain ones that are getting changed.

I'm having to go through my output files with a hex editor and a fine-tooth comb to find exactly where it's going wrong.

Thanks for the help.

Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

You just want to copy the file from one place to another? Then do not read the files one character at a time. What Joe said (about binary data versus Unicode characters) still applies no matter how many characters at a time you read. To copy any file, PDF or otherwise, just read bytes (not characters) from the input and write them to the output.
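
A minimal sketch of the byte-for-byte copy described above, assuming the same file paths as the original post. The class name ByteCopy, the helper method copy and the 8192-byte buffer size are illustrative choices, not anything prescribed in this thread.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class: copies raw bytes without ever converting them to characters.
class ByteCopy {

    // Reads chunks of bytes from 'in' and writes exactly those bytes to 'out'.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes (and therefore flushes) both streams.
        try (InputStream in = new BufferedInputStream(new FileInputStream("c:/mypdf.pdf"));
             OutputStream out = new BufferedOutputStream(new FileOutputStream("c:/mynewpdf.pdf"))) {
            System.out.println("Copied " + copy(in, out) + " bytes");
        }
    }
}
```

If a plain file-to-file copy is all that's needed, java.nio.file.Files.copy (Java 7 and later) does the same job in one call. The explicit loop is shown here because it mirrors the stream-based structure of the original program.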
Dave T Taylor

Joined: Mar 30, 2006
Posts: 4
Yes, thanks. That's what I'm doing now!

Jason Moors
Ranch Hand

Joined: Dec 04, 2001
Posts: 188
Hi Dave,

There is an open source library called iText which enables you to create, manipulate and also copy PDF files. It may be overkill for what you are trying to do, but it's worth knowing about as it enables you to copy only certain pages, etc.
I agree. Here's the link: