JavaRanch » Java Forums » Java » I/O and Streams
reading and parsing fixed length file

Karan Jain
Ranch Hand

Joined: May 30, 2007
Posts: 82
Hi,
I have to read a fixed-length file of 2000 records. Each record is 1800 bytes. I have the field positions of the file in an Excel document.
What are the possible ways to code the file-reading logic without hard-coding the field positions? Which API is best for this scenario, given that each record is separated by a newline?

Thanks...
[ June 15, 2007: Message edited by: Karan Jain ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Well, if you don't want to hard-code values, you probably need to read them from a file. You can try reading that Excel file using HSSF from Apache POI. Or maybe you can use Excel to export the data as a CSV file, which in turn can be easily parsed using an existing library like the one from Steve Ostermiller, or one of the alternatives he lists.
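A minimal sketch of the CSV route, assuming the Excel sheet is exported as plain text with one `name,start,length` line per field (0-based byte offsets, no header row, no quoted commas). `LayoutLoader` and `FieldSpec` are hypothetical names; a real CSV library would cope better with quoting.

```java
import java.util.ArrayList;
import java.util.List;

public class LayoutLoader {

    /** One field of a fixed-length record: name, 0-based start offset and length in bytes. */
    public static final class FieldSpec {
        public final String name;
        public final int start;
        public final int length;

        public FieldSpec(String name, int start, int length) {
            this.name = name;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Parses layout text with one "name,start,length" line per field.
     * Blank lines are skipped; the naive split assumes no quoted commas.
     */
    public static List<FieldSpec> parse(String csvText) {
        List<FieldSpec> fields = new ArrayList<FieldSpec>();
        for (String line : csvText.split("\r?\n")) {
            line = line.trim();
            if (line.length() == 0) {
                continue;
            }
            String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad layout line: " + line);
            }
            fields.add(new FieldSpec(parts[0].trim(),
                    Integer.parseInt(parts[1].trim()),
                    Integer.parseInt(parts[2].trim())));
        }
        return fields;
    }

    public static void main(String[] args) {
        String csv = "id,0,10\nname,10,20\n";
        for (FieldSpec f : parse(csv)) {
            System.out.println(f.name + ": start " + f.start + ", length " + f.length);
        }
    }
}
```

The layout text itself can be read with any Reader; keeping the parsing separate from the I/O makes it easy to test.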

Once you know how many bytes are in each column, you can read the main data file in a variety of ways. I would probably use a RandomAccessFile and use readFully() to fill a byte[] array of size 1800 (or whatever length you can determine from the Excel file). Then each part of that array can be interpreted as you wish. For example, if bytes 10-29 represent a person's name (as text), you can do something like

String name = new String(byteArray, 10, 20); // offset 10, length 20: covers bytes 10-29
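Filling in the RandomAccessFile approach, here is a minimal sketch assuming each record is exactly `recordLength` bytes followed by `separatorLength` newline bytes (1 for `\n`, 2 for `\r\n`), and that text fields are single-byte ISO-8859-1. `FixedRecordReader`, `readRecords` and `field` are hypothetical helpers, not an existing API.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class FixedRecordReader {

    /**
     * Reads records of exactly recordLength bytes, skipping separatorLength
     * bytes (the newline) after each one. A final record without a newline is accepted.
     */
    public static List<byte[]> readRecords(File file, int recordLength, int separatorLength)
            throws IOException {
        List<byte[]> records = new ArrayList<byte[]>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long fileLength = raf.length();
            while (raf.getFilePointer() + recordLength <= fileLength) {
                byte[] record = new byte[recordLength];
                raf.readFully(record);  // fills the whole array or throws EOFException
                records.add(record);
                // Skip the separator bytes (not checked here), never past end of file.
                raf.seek(Math.min(fileLength, raf.getFilePointer() + separatorLength));
            }
            if (raf.getFilePointer() != fileLength) {
                throw new EOFException("Partial record at byte " + raf.getFilePointer());
            }
        }
        return records;
    }

    /** Decodes one text field; the third argument is a length, not an end index. */
    public static String field(byte[] record, int offset, int length) {
        return new String(record, offset, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("records", ".dat");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "12345Alice\n67890Bob  \n".getBytes(StandardCharsets.ISO_8859_1));
        for (byte[] record : readRecords(tmp, 10, 1)) {
            System.out.println(field(record, 0, 5) + " / " + field(record, 5, 5).trim());
        }
    }
}
```

Holding 2000 records of 1800 bytes (about 3.6 MB) in memory is usually fine; for much larger files, process each record inside the loop instead of collecting them.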

Or perhaps, if it's not a text file, you might benefit from methods like readInt() or others - it's hard to say without knowing more about the format.
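If some fields do turn out to be binary, `RandomAccessFile.readInt()` reads four bytes as a big-endian int. When the record is already in a byte[], `java.nio.ByteBuffer` can do the same at an arbitrary offset and also handles little-endian data. The offsets and byte orders below are illustrative assumptions, and `BinaryField.intAt` is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BinaryField {

    /** Reads a 4-byte int at the given offset of a record, using the given byte order. */
    public static int intAt(byte[] record, int offset, ByteOrder order) {
        return ByteBuffer.wrap(record).order(order).getInt(offset);
    }

    public static void main(String[] args) {
        byte[] record = {0x00, 0x00, 0x01, 0x02, 0x7F};
        // Big-endian, like readInt(): 0x00000102
        System.out.println(intAt(record, 0, ByteOrder.BIG_ENDIAN));  // 258
        System.out.println(intAt(record, 1, ByteOrder.LITTLE_ENDIAN));
    }
}
```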

Note that if each record is 1800 bytes, but the records are also separated by newlines, it may be important to learn whether the newline is part of the 1800 bytes, or not.
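One way to settle the newline question empirically is to look at the bytes right after the first record and check that the file length divides evenly. This is a heuristic sketch, assuming the layout's record length is correct; `SeparatorCheck` is a hypothetical class and the file path comes from the command line.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeparatorCheck {

    /**
     * Looks at the bytes right after the first record: returns 1 for "\n",
     * 2 for "\r\n", or 0 if neither follows (newline inside the record, or a wrong length).
     */
    public static int separatorLength(byte[] head, int recordLength) {
        if (head.length > recordLength && head[recordLength] == '\n') {
            return 1;
        }
        if (head.length > recordLength + 1
                && head[recordLength] == '\r' && head[recordLength + 1] == '\n') {
            return 2;
        }
        return 0;
    }

    /** True if the file length fits whole records, with or without a final separator. */
    public static boolean lengthFits(long fileLength, int recordLength, int separatorLength) {
        long stride = recordLength + separatorLength;
        return fileLength % stride == 0 || (fileLength + separatorLength) % stride == 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java SeparatorCheck <data file>");
            return;
        }
        int recordLength = 1800;  // from the layout
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            byte[] head = new byte[(int) Math.min(raf.length(), recordLength + 2)];
            raf.readFully(head);
            int sep = separatorLength(head, recordLength);
            System.out.println("separator bytes: " + sep
                    + ", length fits: " + lengthFits(raf.length(), recordLength, sep));
        }
    }
}
```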

Since records are separated by newlines, you may want to use a BufferedReader's readLine() method, or a Scanner and nextLine(). However, this is only a good idea if the file is all text, the file encoding is known, and that encoding is a single-byte encoding (such as ISO-8859-1 or Cp1252, rather than UTF-8), so that byte positions and character positions coincide. Since your initial description focuses on bytes, I suspect it's better to avoid a Reader and instead read bytes with a RandomAccessFile.
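For the Reader route, a minimal sketch assuming the file is single-byte text: decoding with ISO-8859-1 maps every byte to exactly one char, so the byte offsets from the layout can be used directly as String indexes. `LineRecordReader` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineRecordReader {

    /**
     * Reads newline-separated records, decoding with ISO-8859-1 so that
     * one byte maps to exactly one char and byte offsets equal char offsets.
     */
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {  // strips \n, \r or \r\n
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "12345Alice\r\n67890Bob  \n".getBytes(StandardCharsets.ISO_8859_1);
        for (String line : readLines(new ByteArrayInputStream(data))) {
            // substring(begin, end): end is exclusive, unlike new String(bytes, offset, length)
            System.out.println(line.substring(0, 5) + " / " + line.substring(5, 10).trim());
        }
    }
}
```

If the file turns out to be Cp1252, decoding with that charset instead keeps the offsets aligned, since it is also single-byte.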


"I'm not back." - Bill Harding, Twister
Karan Jain
Ranch Hand

Joined: May 30, 2007
Posts: 82
Thanks for the reply Jim.
I have a fixed-length text file. Each row ends with a newline, and the newline is not part of the 1800 characters. I am not sure what will be the format as it is coming as an external data. Is there any way to find out ourselves? I need to upload the file to the server using a multipart request, then parse and validate it.
Next time I will make sure to give more detailed information.
[ June 18, 2007: Message edited by: Karan Jain ]
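Since the upload arrives as a stream, one possible first validation pass is to check every line's length against the expected 1800 characters. This sketch assumes the multipart library hands you the file as an InputStream and that the text is single-byte; `UploadValidator.validate` is hypothetical, and the caller remains responsible for closing the stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UploadValidator {

    /** Returns one message per line whose length is not expectedLength; empty if all fit. */
    public static List<String> validate(InputStream upload, int expectedLength) throws IOException {
        List<String> errors = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(upload, StandardCharsets.ISO_8859_1));
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            if (line.length() != expectedLength) {
                errors.add("Line " + lineNumber + ": expected " + expectedLength
                        + " characters, found " + line.length());
            }
        }
        return errors;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abc\nde\nfgh\n".getBytes(StandardCharsets.ISO_8859_1);
        for (String error : validate(new ByteArrayInputStream(data), 3)) {
            System.out.println(error);  // Line 2: expected 3 characters, found 2
        }
    }
}
```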
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[Karan Jain]: I am not sure what will be the format as it is coming as an external data. Is there any way to find out ourselves?

Well, I suppose there are various ways one might write a program to try to guess at a file format, based on patterns in the file itself. But it would be much simpler to figure out how to read the Excel file, I think. I'm not sure I understand the question, because I can't really imagine what else you might want to do here. I personally have no way of guessing what's in your file; you have that info.
Karan Jain
Ranch Hand

Joined: May 30, 2007
Posts: 82
I meant: is there a way to find the encoding of a text file, i.e. whether it's ISO-8859-1, Cp1252 or UTF-8? If it's one of the first two, should I use BufferedReader and readLine()?
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I don't know of any built-in way to do that. And fundamentally, the best you can do in a situation like that is guess. It may well be a well-informed guess, but it's still going to be somewhat uncertain. There are various existing libraries that attempt to do this. I've never used any of them, but after a brief bit of googling I found a few. If you use the IntelliJ IDEA IDE, there's apparently a CharsetToolkit class in one of its JAR files. There's also a class of the same name with similar functionality that's part of the Groovy project. The latter is free, and the license may allow you to extract the code for that class and use it without the rest of Groovy. There are probably other options if you spend some time googling for them. If you find a good, free one, please let us know. Good luck...
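For a rough idea of what such a detector does, here is a hypothetical heuristic (not the CharsetToolkit API): bytes that decode as strict UTF-8 are probably UTF-8; otherwise bytes in 0x80-0x9F suggest Cp1252, since they are control codes in ISO-8859-1; otherwise ISO-8859-1 and Cp1252 decode identically. `EncodingGuess` is an illustrative name, and the result is only a guess.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {

    /** Returns a best guess only; several encodings can decode the same bytes. */
    public static String guess(byte[] data) {
        boolean allAscii = true;
        boolean hasC1Range = false;
        for (byte b : data) {
            int u = b & 0xFF;
            if (u >= 0x80) {
                allAscii = false;
            }
            if (u >= 0x80 && u <= 0x9F) {
                hasC1Range = true;
            }
        }
        if (allAscii) {
            return "US-ASCII";  // identical in ISO-8859-1, Cp1252 and UTF-8
        }
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            utf8.decode(ByteBuffer.wrap(data));
            return "UTF-8";  // single-byte text rarely happens to be valid UTF-8
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume some single-byte encoding.
        }
        // 0x80-0x9F are control codes in ISO-8859-1 but printable (e.g. curly quotes) in Cp1252.
        // Without such bytes, ISO-8859-1 and Cp1252 decode identically.
        return hasC1Range ? "windows-1252" : "ISO-8859-1";
    }

    public static void main(String[] args) {
        System.out.println(guess("plain".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(guess("caf\u00e9".getBytes(StandardCharsets.UTF_8)));
        System.out.println(guess("caf\u00e9".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

In practice one would run it on a sample of the uploaded bytes, and still treat the answer as a hint rather than a fact.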
Karan Jain
Ranch Hand

Joined: May 30, 2007
Posts: 82
Sure Jim...

Thanks,
Ravi
sathish kumar
Ranch Hand

Joined: Feb 14, 2007
Posts: 47
I have used Apache's StringUtils class extensively to avoid NullPointerExceptions and other runtime exceptions.