
reading and parsing fixed length file

 
Karan Jain
Ranch Hand
Posts: 82
Hi,
I have to read a fixed-length file of 2000 records. Each record is 1800 bytes. I have the field positions for the file in an Excel document.
What are the possible ways to code the file-reading logic without hard-coding the field positions? Which API is best for this scenario, given that each record is separated by a newline?

Thanks...
[ June 15, 2007: Message edited by: Karan Jain ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Well, if you don't want to hard-code values, you probably need to read them from a file. You can try reading that Excel file using HSSF from Apache POI. Or maybe you can use Excel to export the data as a CSV file, which in turn can be easily parsed using an existing library like the one from Steve Ostermiller, or see the alternatives he lists.
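As a rough sketch of the CSV route: assume the Excel sheet is exported with one field per row in the form name,start,length, where start is a zero-based offset. That column layout and the FieldLayout and Field names are made up for illustration; the real sheet may look different. The plain split(",") also assumes no quoted commas, so a proper CSV library is the safer choice for anything less tidy.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldLayout {
    /** One field of a fixed-length record (hypothetical layout: name, zero-based start, length). */
    static final class Field {
        final String name;
        final int start;
        final int length;

        Field(String name, int start, int length) {
            this.name = name;
            this.start = start;
            this.length = length;
        }
    }

    /** Parses lines of the assumed form "name,start,length"; blank lines are skipped. */
    static List<Field> parse(List<String> csvLines) {
        List<Field> fields = new ArrayList<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected name,start,length but got: " + line);
            }
            fields.add(new Field(parts[0].trim(),
                    Integer.parseInt(parts[1].trim()),
                    Integer.parseInt(parts[2].trim())));
        }
        return fields;
    }
}
```

The lines could come from something like Files.readAllLines on the exported file, so the positions live in data rather than in the code.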

Once you know how many bytes are in each column, you can read the main data file in a variety of ways. I would probably use a RandomAccessFile and use readFully() to fill a byte[] array of size 1800 (or whatever length you can determine from the Excel file). Then each part of that array can be interpreted as you wish. For example, if bytes 10-29 represent a person's name (as text), you can do something like

String name = new String(byteArray, 10, 20); // offset 10, length 20: covers bytes 10-29

Or perhaps, if it's not a text file, you might benefit from methods like readInt() or others - it's hard to say without knowing more about the format.

Note that if each record is 1800 bytes, but the records are also separated by newlines, it may be important to learn whether the newline is part of the 1800 bytes, or not.
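Here is a minimal sketch of the readFully() approach for the case where the newline is not counted in the record length. It takes a DataInput, which RandomAccessFile implements, so the same method works on a file or on an in-memory stream. The class name FixedRecordReader and the separatorLength parameter (1 for "\n", 2 for "\r\n") are assumptions for illustration, and recordLength is expected to be at least 1.

```java
import java.io.DataInput;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class FixedRecordReader {
    /**
     * Reads records of recordLength bytes, each followed by separatorLength
     * bytes of line terminator that are not part of the record.
     */
    static List<byte[]> readRecords(DataInput in, int recordLength, int separatorLength)
            throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            byte[] record = new byte[recordLength];
            try {
                // A clean end of input can only happen at a record boundary.
                record[0] = in.readByte();
            } catch (EOFException endOfInput) {
                break;
            }
            // Throws EOFException if the last record is truncated.
            in.readFully(record, 1, recordLength - 1);
            // skipBytes does not throw at end of input, so a missing final newline is tolerated.
            in.skipBytes(separatorLength);
            records.add(record);
        }
        return records;
    }
}
```

With a real file this might be called as readRecords(new RandomAccessFile(path, "r"), 1800, 1). Holding 2000 records of 1800 bytes in memory is only about 3.6 MB; for much larger files it would be better to process each record as it is read.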

Since records are separated by newlines, you may want to use a BufferedReader's readLine() method, or a Scanner and nextLine(). However, this is only a good idea if the file is all text, the file encoding is known, and the encoding is a single-byte encoding (such as ISO-8859-1 or Cp1252, rather than UTF-8). Since your initial description focuses on bytes, I suspect it's better to avoid a Reader and instead read bytes with a RandomAccessFile.
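For the text-only case, a sketch of the readLine() route might look like the following. It assumes the encoding is known and single-byte, so that one character equals one byte and character offsets match the byte positions in the layout. LineRecordReader and its method names are hypothetical; the length check is there because a short or long line usually means the encoding or layout assumption is wrong.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

final class LineRecordReader {
    /** Reads newline-separated records and rejects any line whose length is unexpected. */
    static List<String> readLines(InputStream in, Charset charset, int expectedLength)
            throws IOException {
        List<String> records = new ArrayList<>();
        // Always pass the charset explicitly instead of relying on the platform default.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.length() != expectedLength) {
                    throw new IOException("Line " + lineNumber + " has " + line.length()
                            + " characters, expected " + expectedLength);
                }
                records.add(line);
            }
        }
        return records;
    }

    /** Extracts one field by zero-based start and length, trimming the padding. */
    static String field(String record, int start, int length) {
        return record.substring(start, start + length).trim();
    }
}
```

Because it reads from an InputStream, the same method can be fed a FileInputStream or the stream of an uploaded file.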
 
Karan Jain
Ranch Hand
Posts: 82
Thanks for the reply Jim.
I have a fixed-length text file. Each row ends with a newline, and the newline is not part of the 1800 characters. I am not sure what the format will be, as the file comes from an external source. Is there any way to find out ourselves? I need to upload the file to the server using a multipart request. Then I need to parse and validate the file.
Next time I will make sure to give more detailed information.
[ June 18, 2007: Message edited by: Karan Jain ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Karan Jain]: I am not sure what the format will be, as the file comes from an external source. Is there any way to find out ourselves?

Well, I suppose there are various ways one might write a program to try to guess at a file format, based on patterns in the file itself. But it would be much simpler to figure out how to read the Excel file, I think. I'm not sure I understand the question, because I can't really imagine what else you might want to do here. I personally have no way of guessing what's in your file; you have that info.
 
Karan Jain
Ranch Hand
Posts: 82
I meant whether there is a way to find the encoding of a text file: whether it's ISO-8859-1, Cp1252 or UTF-8. If it's one of the first two, should I use BufferedReader and readLine()?
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I don't know any built-in way to do that. And fundamentally, the best you can do is guess in a situation like that. It may well be a well-informed guess, but it's still going to be somewhat uncertain. There are various existing libraries that attempt to do this. I've never used any of them, but with a brief bit of googling I found a few. If you use the IntelliJ IDEA IDE, there's a CharsetToolkit class in one of the jarfiles, apparently. There's also a class of the same name with similar functionality that's part of the Groovy project. The latter is free, and the license may allow you to extract the code for that class and use it without the rest of Groovy. There are probably other options if you spend some time googling for them. If you find a good, free one, please let us know. Good luck...
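One guess that needs nothing beyond the JDK is to try a strict UTF-8 decode and fall back to a single-byte charset if it fails. This is only a heuristic sketch, not a real detector: it cannot tell ISO-8859-1 from Cp1252, and a pure-ASCII file is reported as UTF-8 (harmless, since ASCII decodes the same way in all three). The EncodingGuess class name and the fallback parameter are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingGuess {
    /** Returns UTF-8 if the bytes decode cleanly as UTF-8, otherwise the given fallback. */
    static Charset guess(byte[] bytes, Charset fallback) {
        // REPORT makes the decoder throw on bad input instead of silently substituting characters.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException notUtf8) {
            return fallback;
        }
    }
}
```

Text with accented characters saved in a single-byte encoding almost never happens to be valid UTF-8, which is why this simple check is often good enough to separate the two families.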
 
Karan Jain
Ranch Hand
Posts: 82
Sure Jim...

Thanks,
Ravi
 
sathish kumar
Ranch Hand
Posts: 47
I have used Apache's StringUtils class (from Commons Lang) extensively to avoid NullPointerException and other runtime exceptions.
 