CSV EOF issue

Chuck Barnes
Ranch Hand

Joined: Aug 10, 2010
Posts: 37
I have a CSV file that is autogenerated, and I have no control over how it is produced.

For some reason the EOF is not at the end of the last line of data, so the file ends with two lines that contain no data.
I'm using opencsv to parse the CSV, and readNext does not stop at the last line of data. It reads the next empty line, and I get a java.lang.ArrayIndexOutOfBoundsException when I look up a column.

I understand why it's happening, so for now I just ignore the exception and move on. That is the best workaround I can think of. I could remove the two empty lines manually beforehand, but I don't want to do that.

Any other ideas on how to 'clean up' the file before I parse it, that is, remove those last two empty lines and put the EOF where it should be?

Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3169
Before parsing the line, check if there's any data in it.
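A minimal sketch of that check, assuming the file is read line by line before anything is handed to the CSV parser. The class and method names (BlankLineSkipper, hasNoData, readDataLines) are made up for illustration and are not part of opencsv.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BlankLineSkipper {

    /** True when the line is null or contains only whitespace. */
    static boolean hasNoData(String line) {
        return line == null || line.trim().isEmpty();
    }

    /** Reads every line and keeps only those that contain data. */
    static List<String> readDataLines(BufferedReader reader) {
        List<String> dataLines = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (hasNoData(line)) {
                    continue; // skip blank lines instead of passing them to the parser
                }
                dataLines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dataLines;
    }

    public static void main(String[] args) {
        // Hypothetical sample: three lines of content followed by two empty lines.
        String csv = "id,name\n1,Ann\n2,Bob\n\n\n";
        List<String> lines = readDataLines(new BufferedReader(new StringReader(csv)));
        System.out.println(lines.size() + " data lines"); // prints "3 data lines"
    }
}
```

The same test could be applied to the String[] returned by the parser instead, but that depends on how the parser represents an empty line.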


Joanne
Chuck Barnes
Ranch Hand

Joined: Aug 10, 2010
Posts: 37
Joanne Neal wrote: Before parsing the line, check if there's any data in it.


Sounds like it would work. But would there be a performance hit? The check would run on every line, and I am talking about just over 200k records across 5 files.
As it stands, the only extra code is the two instances of the exception being caught.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3169

You'd have to do some performance testing to find the answer to that, but my bet would be that creating and catching an exception would be slower than checking the length of a string.

And I'm sure one of Campbell Ritchie's many rules will say that you should not use exceptions to control program flow.
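One way to avoid the exception altogether is to check the array length before indexing into it. This is only a sketch: SafeColumn and column are hypothetical names, and the shape of the blank row (a single empty string) is an assumption about what the parser hands back for an empty line. It says nothing about which approach is faster; that would still need measuring.

```java
import java.util.Optional;

public class SafeColumn {

    /** Returns the cell at the given index, or empty when the row is too short. */
    static Optional<String> column(String[] row, int index) {
        if (row == null || index < 0 || index >= row.length) {
            return Optional.empty(); // no exception is created or caught
        }
        return Optional.ofNullable(row[index]);
    }

    public static void main(String[] args) {
        String[] dataRow = {"1", "Ann", "London"};
        String[] blankRow = {""}; // assumed shape of a parsed empty line

        System.out.println(column(dataRow, 2).orElse("<missing>"));  // prints "London"
        System.out.println(column(blankRow, 2).orElse("<missing>")); // prints "<missing>"
    }
}
```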
Ove Lindström
Ranch Hand

Joined: Mar 10, 2008
Posts: 326

What you could do, if you don't want to bring the curse of Campbell upon you, is use some sort of tail function to read the file from the end until you find the first line that has real content.

Forget that I said that. I had to do a little test: using the "trim" solution on a 100k-row CSV file is still faster in total than trying to clean out the trailing blank rows with a "tail" function.

But the "catch exception" solution is not that much quicker than the trim.

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36501
Ove Lindström wrote:. . . the curse of Campbell upon you, . . .
Mwaaahaahaahaa!

How did you implement your tail program?
Would it be possible to read the entire file into a List<String> and iterate the List backwards, removing items until a non-empty String is found? Whether that is practical would depend on its memory footprint, I presume.
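A rough sketch of that idea, assuming the whole file fits in memory. In a real program the list could come from java.nio.file.Files.readAllLines; here an in-memory list stands in for the file so the example runs on its own. TrailingBlankTrimmer and dropTrailingBlanks are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrailingBlankTrimmer {

    /** Returns a copy of the lines with blank lines removed from the end only. */
    static List<String> dropTrailingBlanks(List<String> lines) {
        int end = lines.size();
        // Walk backwards until a line with content is found.
        while (end > 0 && lines.get(end - 1).trim().isEmpty()) {
            end--;
        }
        return new ArrayList<>(lines.subList(0, end));
    }

    public static void main(String[] args) {
        // Stand-in for the lines of the file.
        List<String> lines = Arrays.asList("id,name", "1,Ann", "2,Bob", "", "");
        System.out.println(dropTrailingBlanks(lines)); // prints "[id,name, 1,Ann, 2,Bob]"
    }
}
```

Blank lines in the middle of the file are left alone, since only the tail is inspected.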
Ove Lindström
Ranch Hand

Joined: Mar 10, 2008
Posts: 326


The tail program is a variant of Matt Fleming's tail implementation (http://mattfleming.com/node/11).

I was thinking along the same lines as you. If we have enough memory, it would be possible to read it all in and then do all the parsing. I was also thinking in the direction of a codec: a factory that receives one line at a time and can handle the case of a line without enough information.

I also did a test where I validated the string with a regexp, but that is slower than the trim-and-check-for-zero-length solution.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36501
That tail implementation seems to use a random access file. Can you convert a CSV to random access?

At this point, we are beyond the “beginning” stage, so I shall move this discussion.
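On the random access question, here is a sketch of how java.io.RandomAccessFile could be used to find where the data ends by scanning backwards from the end of the file. This is not the linked tail implementation; TailScan and endOfData are hypothetical names, and the code assumes an ASCII-compatible encoding such as UTF-8, where whitespace is always a single byte.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TailScan {

    /** Returns the length the file would have with trailing whitespace bytes removed. */
    static long endOfData(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long pos = raf.length();
            while (pos > 0) {
                raf.seek(pos - 1);
                int b = raf.read();
                if (b != '\n' && b != '\r' && b != ' ' && b != '\t') {
                    break; // found the last byte of real content
                }
                pos--;
            }
            return pos;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: 13 bytes of data plus a newline and two empty lines.
        Path tmp = Files.createTempFile("tail-scan", ".csv");
        Files.write(tmp, "id,name\n1,Ann\n\n\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(endOfData(tmp)); // prints "13"
        Files.delete(tmp);
    }
}
```

Opening the file in "rw" mode and calling setLength with the returned offset would truncate it in place, which changes the source file and may not be wanted for autogenerated input.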
Ove Lindström
Ranch Hand

Joined: Mar 10, 2008
Posts: 326


You can open any file as a random access file. I still think that this is a Parser pattern problem and should be treated as such. Create a factory that can handle a line and parse it from there. Return an object if parsing was OK, or a null implementation of that object's interface if not. Or throw an exception, but I am not that fond of using exceptions to control flow.
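A small sketch of the "object or null implementation" idea. All names here (RowParsing, Row, DataRow, EmptyRow, parse) are invented for illustration, and the naive split on commas is an assumption that ignores quoted fields; a real CSV parser would replace it.

```java
public class RowParsing {

    /** What the rest of the program works with, whether parsing succeeded or not. */
    interface Row {
        boolean isValid();
        String field(int index);
    }

    /** A successfully parsed line. */
    static final class DataRow implements Row {
        private final String[] fields;
        DataRow(String[] fields) { this.fields = fields.clone(); }
        public boolean isValid() { return true; }
        public String field(int index) { return fields[index]; }
    }

    /** Null implementation: safe to call, but carries no data. */
    static final class EmptyRow implements Row {
        static final EmptyRow INSTANCE = new EmptyRow();
        private EmptyRow() { }
        public boolean isValid() { return false; }
        public String field(int index) { return ""; }
    }

    /** Factory method: returns a DataRow for a well-formed line, EmptyRow otherwise. */
    static Row parse(String line, int expectedColumns) {
        if (line == null || line.trim().isEmpty()) {
            return EmptyRow.INSTANCE;
        }
        String[] fields = line.split(",", -1); // naive split, no quote handling
        if (fields.length != expectedColumns) {
            return EmptyRow.INSTANCE;
        }
        return new DataRow(fields);
    }

    public static void main(String[] args) {
        System.out.println(parse("1,Ann", 2).isValid()); // prints "true"
        System.out.println(parse("", 2).isValid());      // prints "false"
    }
}
```

The caller can then skip any row where isValid() is false, with no exception involved.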
Chuck Barnes
Ranch Hand

Joined: Aug 10, 2010
Posts: 37

I think I'll try that idea. Could you point me in the direction of an example of the kind of factory you are referring to? I'm not sure I understand the concept of a 'factory'.

Thanks for the help
Ove Lindström
Ranch Hand

Joined: Mar 10, 2008
Posts: 326


The factory pattern is described at http://en.wikipedia.org/wiki/Factory_method_pattern and in many other places.

The general idea is to have some logic that chooses what to create, so the caller doesn't necessarily need to know exactly what object is going to be created or how. In your case, you could have an identifier on the first line of the CSV file that tells you what format to use. Depending on that, you select the correct parser for the data and return an object that can be used by the program.
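As a sketch of that selection step, here is a simple static factory (a lighter variant than the full Factory Method pattern in the linked article). The identifiers "FORMAT-A" and "FORMAT-B", the two parser classes and the name forFormat are all hypothetical; the real file would define its own identifier and formats.

```java
import java.util.Arrays;
import java.util.List;

public class ParserFactoryDemo {

    /** The caller only ever sees this interface. */
    interface LineParser {
        List<String> parse(String line);
    }

    static final class CommaParser implements LineParser {
        public List<String> parse(String line) {
            return Arrays.asList(line.split(",", -1));
        }
    }

    static final class SemicolonParser implements LineParser {
        public List<String> parse(String line) {
            return Arrays.asList(line.split(";", -1));
        }
    }

    /** Chooses a parser from the identifier found on the first line of the file. */
    static LineParser forFormat(String identifier) {
        switch (identifier.trim()) {
            case "FORMAT-A":
                return new CommaParser();
            case "FORMAT-B":
                return new SemicolonParser();
            default:
                throw new IllegalArgumentException("Unknown format: " + identifier);
        }
    }

    public static void main(String[] args) {
        String firstLine = "FORMAT-B"; // hypothetical identifier line
        LineParser parser = forFormat(firstLine);
        System.out.println(parser.parse("1;Ann;London")); // prints "[1, Ann, London]"
    }
}
```

The calling code never names CommaParser or SemicolonParser, so another format can be added by touching only the factory.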
 