I have read the I/O chapters in both Herb Schildt's Java: A Beginner's Guide and Kathy Sierra's Head First Java, and I'm still having trouble understanding how to do file I/O in Java. I'm hoping someone in this forum will help me get started.
But before I get into that, I should explain that part of my problem is that I come from a host Cobol/Assembler background, and that I'm trying to make the transition into a distributed Java world. Anyway, on the host, file I/O is a fundamental part of any data processing program. A file is composed of sets of data called records, which are broken into fields. When a record is read, it is copied into a structure containing a breakout of all those fields. You can think of records as rows and fields as columns or, in Java terms, primitives. Things like record length and buffering are handled by the operating system, and there is no need for new-line characters to separate the records. The fields, or if you prefer, the primitives, in a file are usually in several different formats including binaries such as integer and double.
I've read that to process binary data in Java, I need to use byte streams and that I need to specify the buffer size in the program. Okay fine, I can do that. But how do I read in an entire record using byte I/O and copy it into a structure containing all the various field formats?
Suppose I have a record layout as shown below. I would really appreciate it if someone would post a reply containing an outline of the Java methods needed to read in such a record. Note the example shown below is a simplistic one. Most real world master files would have many more fields, and can be in the two gigabyte range and up in terms of overall size.
Sample record layout:
Cust Name 30 bytes (ASCII character format)
Cust Number (integer)
Last year sales volume, units (integer)
Last year sales volume, dollars (double)
YTD sales volume, units (integer)
YTD sales volume, dollars (double)
Rolling monthly sales table, units (INT)
Rolling monthly sales table, dollars (DOUBLE)
I would like to read in the above record for each customer, update the fields for the current month, and write it out to a new file. Basic data processing kind of stuff. But I need help putting all the Java I/O methods together to read in the file.
You can create a class that implements the Serializable interface, which will allow instances of that class to be saved easily to disk. For example:
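A minimal sketch of such a class might look like the following. The field names and the 12-slot monthly tables are assumptions based on the sample record layout in the question, not taken from any real system.

```java
import java.io.Serializable;

// Hypothetical Customer class; field names follow the sample record layout.
class Customer implements Serializable {
    // Version stamp checked when a saved object is read back in.
    private static final long serialVersionUID = 1L;

    String name;
    int number;
    int lastYearUnits;
    double lastYearDollars;
    int ytdUnits;
    double ytdDollars;
    // Assumption: the rolling monthly tables hold 12 entries each.
    int[] monthlyUnits = new int[12];
    double[] monthlyDollars = new double[12];

    Customer(String name, int number) {
        this.name = name;
        this.number = number;
    }
}
```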
Will enable any instances of Customer to be written to an output stream. Then to save any instances of this to disk you just need to set up a FileOutputStream (which tells you where the data will be saved) and an ObjectOutputStream (which tells you what type of data we are dealing with - in this case, Java objects):
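A sketch of that setup, assuming a cut-down Customer class (repeated here so the example compiles on its own) and a hypothetical helper named SaveCustomers:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Cut-down, hypothetical Customer so this sketch stands alone.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int number;

    Customer(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

class SaveCustomers {
    // Writes each customer to the named file as one serialized object.
    static void save(String fileName, Customer... customers) throws IOException {
        // FileOutputStream says where the bytes go; ObjectOutputStream
        // turns Java objects into those bytes.
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(fileName))) {
            for (Customer c : customers) {
                out.writeObject(c);
            }
        }
    }
}
```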
Note I have left out exception handling from this example so as not to confuse matters. Also many people don't like using serialization for lots of data, but it's a good place to start IMHO.
Serialization of objects is a neat approach. I haven't done much with serialization and almost never think of it first.
For some of the building blocks of IO you might play with plain text files a bit more. Making everything the equivalent of COBOL DISPLAY format is easiest. One option is some kind of delimiter between fields and newlines between records. That's like a mainframe variable length record, and also displays nicely if you just type or edit the file. You can use a buffered reader and do something like:
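A rough sketch of the buffered reader approach, assuming '|' as the field delimiter and a hypothetical helper class named DelimitedReader:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class DelimitedReader {
    // Reads one record per line and splits each line into fields on '|'.
    // The '|' delimiter is an assumption; use any character absent from the data.
    static List<String[]> readRecords(Reader source) throws IOException {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields.
                records.add(line.split("\\|", -1));
            }
        }
        return records;
    }
}
```

To read from disk you would pass something like new FileReader("customers.txt"), where the file name is just a placeholder.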
Another option is Random Access Files. These work best with fixed length records. Now you can use binary formats instead of "display" and jump around the file. You'll need to pad your string fields just as COBOL would and figure out a format for your binary data. Your logic turns out more like:
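Here is one way that logic could look, assuming a cut-down 42-byte record (30-byte name, one int, one double) and a hypothetical class named FixedRecordFile. Record numbers start at zero.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FixedRecordFile {
    static final int NAME_LEN = 30;
    // 30 bytes of name + 4-byte int + 8-byte double = 42 bytes per record.
    static final int RECORD_LEN = NAME_LEN + Integer.BYTES + Double.BYTES;

    // Writes (or overwrites) the record at the given zero-based index.
    static void write(RandomAccessFile file, long index,
                      String name, int number, double dollars) throws IOException {
        // Pad the name with spaces to a fixed width, as COBOL would.
        byte[] padded = new byte[NAME_LEN];
        Arrays.fill(padded, (byte) ' ');
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(raw, 0, padded, 0, Math.min(raw.length, NAME_LEN));

        file.seek(index * RECORD_LEN);   // jump straight to the record
        file.write(padded);
        file.writeInt(number);
        file.writeDouble(dollars);
    }

    static String readName(RandomAccessFile file, long index) throws IOException {
        file.seek(index * RECORD_LEN);
        byte[] padded = new byte[NAME_LEN];
        file.readFully(padded);
        return new String(padded, StandardCharsets.US_ASCII).trim();
    }

    static int readNumber(RandomAccessFile file, long index) throws IOException {
        file.seek(index * RECORD_LEN + NAME_LEN);
        return file.readInt();
    }

    static double readDollars(RandomAccessFile file, long index) throws IOException {
        file.seek(index * RECORD_LEN + NAME_LEN + Integer.BYTES);
        return file.readDouble();
    }
}
```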
For fun you could make an index in parallel to your data file with keys and offsets and invent VSAM all over again. (I'd use a database instead!)
Serialization is a tad trickier to grasp (for me anyhow) but it gets real easy to work with:
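As a sketch, a whole list of records can be written and read back in one call each. SalesRecord and SalesStore are hypothetical names and the fields are only illustrative.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record class; the fields are illustrative only.
class SalesRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int ytdUnits;

    SalesRecord(String name, int ytdUnits) {
        this.name = name;
        this.ytdUnits = ytdUnits;
    }
}

class SalesStore {
    // Writes the whole list as a single serialized object.
    static void save(File file, List<SalesRecord> records) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new ArrayList<>(records));
        }
    }

    // Reads the list back; the cast is unchecked because the stream is untyped.
    @SuppressWarnings("unchecked")
    static List<SalesRecord> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (List<SalesRecord>) in.readObject();
        }
    }
}
```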
Do you have some specific projects in mind, or just exploring? Let us know what you try and how it works!
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Joined: Mar 27, 2005
Thank you Stuart and Stan for replying to my post. I ran a little test using serialization and I was able to read and write records, or I guess I should say objects, to a file. However, apparently, serialization is only for Java programs reading Java produced files because Java inserted some additional characters into the file. The source of the master file is on the mainframe host, so it won't be in the serialization format that Java expects.
Neither of you guys said anything about data structures and neither did Kathy and Herb in their books. I assume that means that Java just doesn't support structures. Is that right? That means I'll have to take a different approach. Let me know what you think about this?
1) On the mainframe, split the file into two files. One with all convertible fields in display format and the second one with the remaining binary fields.
2) Download the first file in character format, so that the EBCDIC to ASCII conversion takes place, and download the second file in binary format, so that no EBCDIC to ASCII conversion is performed.
3) Then in a Unix Java program, read a record from the first file via the readLine method, and read the appropriate number of matching fields from the second file via the readInt and readDouble methods.
4) Save these fields in a Java class, and write the file out to DASD with the serialization methods mentioned earlier, so that the file can be processed later with other Java programs.
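Step 3 could be sketched roughly as below. SplitFileReader and Sales are hypothetical names, and the layout is cut down to one int and one double per text line. DataInputStream reads big-endian integers and IEEE 754 doubles, so this assumes the host file holds doubles in that format. If the mainframe stored them in IBM hexadecimal floating point, readDouble would not decode them correctly.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class SplitFileReader {
    // Hypothetical holder for one customer's fields.
    static class Sales {
        final String name;
        final int units;
        final double dollars;

        Sales(String name, int units, double dollars) {
            this.name = name;
            this.units = units;
            this.dollars = dollars;
        }
    }

    // Pairs each text line with the matching binary fields from the second file.
    static List<Sales> read(Reader textFile, InputStream binaryFile)
            throws IOException {
        List<Sales> result = new ArrayList<>();
        try (BufferedReader text = new BufferedReader(textFile);
             DataInputStream binary =
                 new DataInputStream(new BufferedInputStream(binaryFile))) {
            String name;
            while ((name = text.readLine()) != null) {
                int units = binary.readInt();          // 4 bytes, big-endian
                double dollars = binary.readDouble();  // 8 bytes, IEEE 754
                result.add(new Sales(name.trim(), units, dollars));
            }
        }
        return result;
    }
}
```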
I remember the transition from COBOL/Host/FixedRecords into the anarchy of free flowing data
It gets easier if you don't try to recreate the familiar structures, but instead adopt the totally different approach.
[OK smartassing can get on your nerves, sorry]
Originally posted by JR Daniel:
Neither of you guys said anything about data structures and neither did Kathy and Herb in their books.
Well in my view it's quite the contrary, since nearly everything is a data structure in Java.
You might try to build a class with your record's fields as attributes. Each record then has a number of setters to fill the attributes, a number of getters to read single attributes (these could even be unnecessary) and a dumpRecord() method that spits out the whole record as a String in the form you would expect for your line of data.
With this String you could then write a file.
Usage would be a bit like:
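The following is only a sketch of the idea. The class name CustomerRecord, its three fields, the ';' separator and the writeAll helper are all assumptions made for illustration.

```java
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical record class with setters, getters and dumpRecord().
class CustomerRecord {
    private String name = "";
    private int number;
    private double ytdDollars;

    void setName(String name) { this.name = name; }
    void setNumber(int number) { this.number = number; }
    void setYtdDollars(double ytdDollars) { this.ytdDollars = ytdDollars; }

    String getName() { return name; }
    int getNumber() { return number; }
    double getYtdDollars() { return ytdDollars; }

    // Returns the whole record as one line of text; ';' is an assumed separator.
    String dumpRecord() {
        return name + ";" + number + ";" + ytdDollars;
    }
}

class CustomerRecordUsage {
    // Writes one line per record to any Writer, for example a FileWriter.
    static void writeAll(Writer target, CustomerRecord... records) {
        PrintWriter out = new PrintWriter(target);
        for (CustomerRecord r : records) {
            out.println(r.dumpRecord());
        }
        out.flush();
    }
}
```

A caller would create a CustomerRecord, fill it through the setters, and pass it to writeAll together with a FileWriter for the output file.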
Joined: Jan 29, 2003
I clearly recall your pain. The non-IBM-mainframe world is into more nebulous structures. I still miss the power of redefines, hierarchical data and 88s in COBOL!
Delimited files are common. "Comma Separated Values" or CSV is a near-standard, with commas between fields, quotes around string fields either all the time or only if they include other commas. Then you have to escape quotes within your string. If you have Excel, create several columns with as many different data types as you can imagine - include commas and quotes and other special characters. Then "save as" CSV and see what the result looks like.
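The quoting rules can be sketched in a few lines. This assumes the common convention of quoting only when needed and doubling any quote inside a quoted field. CsvWriter is a hypothetical helper, not a library class.

```java
import java.util.List;
import java.util.StringJoiner;

class CsvWriter {
    // Quotes a field only when it contains a comma, a quote, or a line break.
    // Embedded quotes are doubled, which is the usual CSV convention (assumed here).
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the quoted fields with commas to form one CSV line.
    static String toLine(List<String> fields) {
        StringJoiner line = new StringJoiner(",");
        for (String f : fields) {
            line.add(quote(f));
        }
        return line.toString();
    }
}
```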
XML is the "lingua franca" of the day, or as I like to call it the "lowest common denominator", supported in some way by almost every language and platform. It would be good to get some experience with it as it becomes more and more common for communication between systems and even within systems.
Right before XML came along I got a lot of mileage out of sending "smart delimited" strings between PCs and mainframe COBOL. The writer picks a delimiter that's not in the data and puts it as the first character in a delimited string. You can nest these things to use one delimiter between fields, another between records. The reader gets the first character and uses that to split the rest of the string. Advantages are no fixed delimiter and no escape characters, but it takes some overhead to pick a delimiter and you can't send all 256 values for a 1-byte character because you have to keep one for the delimiter.
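A small sketch of the smart delimited idea in Java. SmartDelimited is a hypothetical class, and here the caller picks the delimiter rather than the code searching for an unused character.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SmartDelimited {
    // Builds a string whose first character is the delimiter, followed by
    // the fields separated by that same delimiter.
    static String join(char delimiter, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (field.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException(
                    "delimiter appears in data: " + field);
            }
            sb.append(delimiter).append(field);
        }
        return sb.toString();
    }

    // Takes the first character as the delimiter and splits the rest on it.
    static List<String> split(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("missing delimiter character");
        }
        String delimiter = String.valueOf(text.charAt(0));
        // Pattern.quote stops characters like '|' being read as regex syntax.
        return Arrays.asList(text.substring(1).split(Pattern.quote(delimiter), -1));
    }
}
```

Nesting works by splitting twice: a string such as "~|a|b~|c|d" splits on '~' into two records, and each record then splits on '|' into its fields.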
If you have some time to experiment, learn to read CSV files and XML. Let us know how it goes!
Joined: Mar 27, 2005
Thank you guys for all of your advice.
Stan, I do have some questions about those CSV files you transferred between PCs and mainframe Cobol. Did you let TCPIP do the ASCII/EBCDIC conversion? Did you encounter any IP ASCII/EBCDIC conversion issues? Do you know if IP ASCII/EBCDIC conversion is all or nothing, or can you specify which fields to convert, and which fields to leave alone?
Also, with CSV files being so common, does Java have any methods to help with the formatting and unformatting of them? jd
Joined: Jan 29, 2003
In the 1990-1995 timeframe I was doing PC to Mainframe communications via APPC - a proprietary IBM protocol over SNA I think. We made everything display format on the mainframe, X or 9 with no packed or binary fields, so we could use the all-at-once ASCII-EBCDIC conversion. I think there were options to mask certain binary areas or provide your own translation tables, but we never used those. Today we do the same kind of thing with MQ-Series doing the translation somewhere along the line. In fact I'm on my 4th version of the GUI program calling some of the same mainframe programs. Seems to be a carnival ride I can't get off!
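If a file is instead transferred in binary with no conversion at all, the Java side can translate field by field. This is only a sketch: EbcdicRecord is a hypothetical class, the code page IBM037 (US EBCDIC) is an assumption about the host, and the offsets passed in are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class EbcdicRecord {
    // Assumption: the host uses code page IBM037; other sites may differ.
    static final Charset EBCDIC = Charset.forName("IBM037");

    // Converts only the named byte range from EBCDIC to a Java String.
    static String textField(byte[] record, int offset, int length) {
        return new String(record, offset, length, EBCDIC).trim();
    }

    // Reads a 4-byte binary integer untouched; ByteBuffer is big-endian by default.
    static int intField(byte[] record, int offset) {
        return ByteBuffer.wrap(record, offset, Integer.BYTES).getInt();
    }
}
```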
I agree about CSV - I'd expect a really solid library somewhere. I think there are some open source packages if you Google for them, but I don't know if any one is better than another.
Do you plan to communicate from Windows or AIX to mainframes? What tools or protocols do you have in mind? I've talked to CICS almost exclusively with MQ-Series for years, plus a tiny bit of screen scraping on the side. There are many more or less kludgey products and ways to do it from database stored procs to External Call Interface and Java Connector Architecture.
We just switched from a partner system on CICS that had about 0.75 second response times to their new improved web-services version with about 20 second response time. Ah, progress.
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com