CSV EOF issue

 
Chuck Barnes
Ranch Hand
I have a CSV file that is autogenerated, and I have no control over it.

For some reason the file does not end right after the last line of data: there are 2 empty lines at the end of the file.
I'm using opencsv to parse the CSV, and readNext() does not stop at the last line of data. It reads the next, empty line, and I get a java.lang.ArrayIndexOutOfBoundsException when I access a column.

I understand why it's happening, and for now I just catch the exception and move on. That's the best workaround I can think of. I could remove the two empty lines manually beforehand, but I don't want to do that.

Any other ideas on how to 'clean up' the file before I parse it, i.e. remove those last two empty lines so the file ends right after the data?

 
Joanne Neal
Rancher
Before parsing the line, check if there's any data in it.
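A minimal sketch of that check in plain Java. It assumes opencsv hands back an empty line as a one-element array holding an empty string (which is consistent with the ArrayIndexOutOfBoundsException described above), so in a `while ((row = reader.readNext()) != null)` loop you would simply `continue` whenever `isBlankRow(row)` is true. `BlankRowFilter`, `isBlankRow` and `dataRows` are hypothetical names, not part of opencsv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlankRowFilter {

    /** True if the row is null or every field is empty or whitespace-only. */
    static boolean isBlankRow(String[] row) {
        if (row == null) {
            return true;
        }
        for (String field : row) {
            if (field != null && !field.trim().isEmpty()) {
                return false; // found real data
            }
        }
        return true;
    }

    /** Keeps rows that contain data and have at least minColumns fields. */
    static List<String[]> dataRows(List<String[]> rows, int minColumns) {
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            if (!isBlankRow(row) && row.length >= minColumns) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows shaped the way a CSV reader might hand them back; the two {""}
        // entries stand in for the trailing empty lines (assumed shape).
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice"},
                new String[] {"2", "Bob"},
                new String[] {""},
                new String[] {""});
        for (String[] row : dataRows(rows, 2)) {
            System.out.println(row[0] + " -> " + row[1]); // column 1 is now safe
        }
    }
}
```

For a data row the loop in `isBlankRow` usually returns on the first field, so the per-row cost should be small next to reading and tokenizing the line.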
 
Chuck Barnes
Ranch Hand

Joanne Neal wrote: Before parsing the line, check if there's any data in it.



That sounds like it would work. But would there be a performance hit? The check would run on every line, and I am talking about just over 200k records across 5 files.
As it stands, the only extra code is the two places where the exception is caught.
 
Joanne Neal
Rancher

Chuck Barnes wrote:

That sounds like it would work. But would there be a performance hit? The check would run on every line, and I am talking about just over 200k records across 5 files.
As it stands, the only extra code is the two places where the exception is caught.


You'd have to do some performance testing to find out, but my bet is that creating and catching an exception is slower than checking the length of a string.

And I'm sure one of Campbell Ritchie's many rules says that you should not use exceptions to control program flow.
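One rough way to check is to time both variants on data shaped like the original problem: 200k good rows plus two blank ones. In the sketch below, `GuardVsCatch` and its methods are hypothetical. A single `System.nanoTime()` run like this is strongly affected by JIT warm-up, so treat the printed numbers as indicative only; a harness such as JMH gives trustworthy measurements.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardVsCatch {

    /** Counts rows that have a column at index, checking the length first. */
    static int countWithGuard(List<String[]> rows, int index) {
        int count = 0;
        for (String[] row : rows) {
            if (row.length > index) {
                count++;
            }
        }
        return count;
    }

    /** Same result, but lets short rows throw and catches the exception. */
    static int countWithCatch(List<String[]> rows, int index) {
        int count = 0;
        for (String[] row : rows) {
            try {
                String unused = row[index]; // throws for the blank rows
                count++;
            } catch (ArrayIndexOutOfBoundsException e) {
                // exception used as flow control: skip the short row
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            rows.add(new String[] {String.valueOf(i), "value"});
        }
        rows.add(new String[] {""}); // the two trailing blank lines
        rows.add(new String[] {""});

        long t0 = System.nanoTime();
        int guarded = countWithGuard(rows, 1);
        long t1 = System.nanoTime();
        int caught = countWithCatch(rows, 1);
        long t2 = System.nanoTime();

        // One cold run each: indicative only, not a benchmark.
        System.out.printf("guard: %d rows, %d us%n", guarded, (t1 - t0) / 1_000);
        System.out.printf("catch: %d rows, %d us%n", caught, (t2 - t1) / 1_000);
    }
}
```

With only two blank rows the exception path fires just twice, so the speed difference is likely to be small either way; the stronger argument for the guard is the flow-control one.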
 
Ove Lindström
Ranch Hand
If you don't want the curse of Campbell upon you, you could use some sort of tail function that reads the file from the end until it finds the first line with real content.

Forget that I said that. I ran a quick test: on a 100k-row CSV file, the "trim" solution is still faster overall than cleaning out the trailing blank rows with a "tail" function.

But the "catch exception" solution is not that much quicker than the trim.
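For comparison, a tail-style cleanup can be sketched with `java.io.RandomAccessFile`: seek to the end, step backwards over line breaks and whitespace, and truncate the file at the last real byte. This is an independent sketch, not Ove's implementation; `TailTrimmer` is a hypothetical name, the file is modified in place (work on a copy if the original must be kept), and it assumes an ASCII-compatible encoding such as UTF-8, where CR, LF, space and tab are single bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TailTrimmer {

    /** Bytes treated as "blank": CR, LF, space and tab. */
    private static boolean isBlankByte(int b) {
        return b == '\n' || b == '\r' || b == ' ' || b == '\t';
    }

    /** Scans backwards from the end; returns the length up to the last non-blank byte. */
    static long contentLength(RandomAccessFile raf) throws IOException {
        long pos = raf.length() - 1;
        while (pos >= 0) {
            raf.seek(pos);
            if (!isBlankByte(raf.read())) {
                break; // last byte of real data found
            }
            pos--;
        }
        return pos + 1;
    }

    /** Truncates the file in place so it ends with the last line of data. */
    static void trimTrailingBlankLines(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(contentLength(raf));
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("sample", ".csv");
        Files.write(csv, "id,name\n1,Alice\n2,Bob\n\n\n".getBytes(StandardCharsets.UTF_8));
        trimTrailingBlankLines(csv);
        // The file now ends right after "2,Bob", with no trailing newline.
        System.out.println(new String(Files.readAllBytes(csv), StandardCharsets.UTF_8));
        Files.delete(csv);
    }
}
```

Note that this also drops any trailing spaces on the last data line, which may or may not matter for the last field.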

 
Campbell Ritchie
Marshal

Ove Lindström wrote: . . . the curse of Campbell upon you, . . .

Mwaaahaahaahaa!

How did you implement your tail program?
Would it be possible to read the entire file into a List<String> and iterate the List backwards, removing items until a non-empty String is found? That would depend on its memory footprint, I presume.
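Campbell's suggestion can be sketched as follows, assuming the file fits comfortably in memory and is UTF-8 encoded. `ListTailTrimmer` and its methods are hypothetical names. The trimmed list can then be parsed or written back out (for example with `Files.write`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListTailTrimmer {

    /** Removes empty or whitespace-only lines from the end of the list, in place. */
    static void removeTrailingBlankLines(List<String> lines) {
        // Walk backwards and stop at the first line that has real content.
        for (int i = lines.size() - 1; i >= 0 && lines.get(i).trim().isEmpty(); i--) {
            lines.remove(i); // removing the last element of an ArrayList is cheap
        }
    }

    /** Reads the whole file into memory, then drops the trailing blank lines. */
    static List<String> readDataLines(Path csv) throws IOException {
        // Copy into an ArrayList: the list from readAllLines is not guaranteed to be modifiable.
        List<String> lines = new ArrayList<>(Files.readAllLines(csv, StandardCharsets.UTF_8));
        removeTrailingBlankLines(lines);
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(Arrays.asList("id,name", "1,Alice", "", "  "));
        removeTrailingBlankLines(lines);
        System.out.println(lines); // [id,name, 1,Alice]
    }
}
```

Only trailing lines are touched, so blank lines inside the data are left alone. The cost is holding the whole file in memory, which is the footprint concern raised above.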
 
Ove Lindström
Ranch Hand

Campbell Ritchie wrote:

How did you implement your tail program?
Would it be possible to read the entire file into a List<String> and iterate the List backwards, removing items until a non-empty String is found? That would depend on its memory footprint, I presume.



The tail program is a variant of Matt Fleming's tail implementation (http://mattfleming.com/node/11).

I was thinking along the same lines as you. If we have enough memory, we could read everything and then do all the parsing. I was also thinking of a codec: a factory that receives one line at a time and can handle the case where there is not enough information.

I also ran a test where I validated the string with a regexp, but that is slower than the trim-and-check-for-zero-length solution.
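The two checks being compared look roughly like this (`BlankLineChecks` is a hypothetical name). They agree on ordinary spaces, tabs and line breaks, but not in every corner case: `trim()` strips every character up to U+0020, including other control characters, while `\s` matches only space, tab, newline, vertical tab, form feed and carriage return. On Java 11 and later, `String.isBlank()` is a further option.

```java
import java.util.regex.Pattern;

final class BlankLineChecks {

    // Compiled once and reused; String.matches() would recompile on every call.
    private static final Pattern WHITESPACE_ONLY = Pattern.compile("\\s*");

    /** The "trim and check for zero length" test. */
    static boolean blankByTrim(String line) {
        return line.trim().isEmpty();
    }

    /** The regular-expression test: the whole line is (possibly zero) whitespace. */
    static boolean blankByRegex(String line) {
        return WHITESPACE_ONLY.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"", "   ", "\t", "1,Alice", "  2,Bob  "};
        for (String s : samples) {
            System.out.printf("%-12s trim=%-5b regex=%b%n",
                    "\"" + s + "\"", blankByTrim(s), blankByRegex(s));
        }
    }
}
```

Even with a precompiled pattern, the regex check involves a matcher per line, which fits the observation that it is the slower of the two.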
 
Campbell Ritchie
Marshal
That tail implementation seems to use a random access file. Can you convert a CSV to random access?

At this point, we are beyond the “beginning” stage, so I shall move this discussion.
 
Ove Lindström
Ranch Hand

Campbell Ritchie wrote: That tail implementation seems to use a random access file. Can you convert a CSV to random access?

At this point, we are beyond the “beginning” stage, so I shall move this discussion.



You can open any file for random access. I still think that this is a parser-pattern problem and should be treated as such: create a factory that can handle a line and parse it from there. Return an object if parsing succeeded, or a null implementation of that object's interface if not. Or throw an exception, but I am not that fond of using exceptions to control flow.
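A minimal sketch of the parser-plus-null-object idea. `CsvRecord`, `DataRecord`, `EmptyRecord` and `RecordParser` are hypothetical, as is the two-column layout; the point is that a blank row produces a harmless placeholder instead of an ArrayIndexOutOfBoundsException. In the read loop, each `String[]` from the CSV reader would go through `parse`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical interface for one parsed CSV row. */
interface CsvRecord {
    boolean isPresent();
    String id();
    String label();
}

/** A row that parsed successfully. */
final class DataRecord implements CsvRecord {
    private final String id;
    private final String label;

    DataRecord(String id, String label) {
        this.id = id;
        this.label = label;
    }

    public boolean isPresent() { return true; }
    public String id() { return id; }
    public String label() { return label; }
}

/** Null object for blank or short rows: callers get neither null nor an exception. */
final class EmptyRecord implements CsvRecord {
    static final EmptyRecord INSTANCE = new EmptyRecord();

    private EmptyRecord() {}

    public boolean isPresent() { return false; }
    public String id() { return ""; }
    public String label() { return ""; }
}

final class RecordParser {

    /** Assumed layout: column 0 is an id, column 1 is a label. */
    static CsvRecord parse(String[] fields) {
        if (fields == null || fields.length < 2 || fields[0].trim().isEmpty()) {
            return EmptyRecord.INSTANCE;
        }
        return new DataRecord(fields[0].trim(), fields[1].trim());
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice"},
                new String[] {"2", "Bob"},
                new String[] {""},
                new String[] {""});
        List<CsvRecord> records = new ArrayList<>();
        for (String[] row : rows) {
            CsvRecord record = parse(row);
            if (record.isPresent()) { // no try/catch needed
                records.add(record);
            }
        }
        System.out.println(records.size() + " records"); // 2 records
    }
}
```

Returning `java.util.Optional<CsvRecord>` would be an alternative; the null object is shown because that is the option described in the post.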
 
Chuck Barnes
Ranch Hand

Ove Lindström wrote:

You can open any file for random access. I still think that this is a parser-pattern problem and should be treated as such: create a factory that can handle a line and parse it from there. Return an object if parsing succeeded, or a null implementation of that object's interface if not. Or throw an exception, but I am not that fond of using exceptions to control flow.



I think I'll try that idea. Could you point me to an example of the factory you're referring to? I'm not sure I understand the concept of a 'factory'.

Thanks for the help
 
Ove Lindström
Ranch Hand

Chuck Barnes wrote:

I think I'll try that idea. Could you point me to an example of the factory you're referring to? I'm not sure I understand the concept of a 'factory'.

Thanks for the help



The factory pattern is described at http://en.wikipedia.org/wiki/Factory_method_pattern and in many other places.

The general idea is to have logic that decides what to create, so the caller doesn't necessarily need to know exactly which object is created or how. In your case, you could have an identifier on the first line of the CSV file that tells you which format to use. Depending on that, you select the correct parser for the data and return an object the program can use.
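A sketch of that idea, under clearly hypothetical assumptions: the first line holds a format identifier (`FORMAT-A` or `FORMAT-B` here, both invented), and each format is simple delimited text without quoting. `RowParser`, `DelimitedParser` and `ParserFactory` are illustrative names; in real code the parser behind the factory could wrap opencsv instead of `String.split`. Strictly speaking, `forHeader` is a static factory method rather than the GoF Factory Method pattern on the Wikipedia page, but the caller gets the same benefit: it asks for "a parser for this header" without knowing which class it receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Parses one data line of a particular CSV format. */
interface RowParser {
    /** Returns the fields, or Optional.empty() if the line holds no usable data. */
    Optional<String[]> parse(String line);
}

/** Toy parser for simple delimited lines; it does not handle quoted fields. */
final class DelimitedParser implements RowParser {
    private final String delimiterRegex;
    private final int expectedColumns;

    DelimitedParser(String delimiter, int expectedColumns) {
        this.delimiterRegex = Pattern.quote(delimiter);
        this.expectedColumns = expectedColumns;
    }

    @Override
    public Optional<String[]> parse(String line) {
        if (line.trim().isEmpty()) {
            return Optional.empty(); // blank line: nothing to parse
        }
        String[] fields = line.split(delimiterRegex, -1);
        return fields.length >= expectedColumns ? Optional.of(fields) : Optional.empty();
    }
}

final class ParserFactory {

    /** Chooses a parser from the identifier on the first line (identifiers are made up). */
    static RowParser forHeader(String firstLine) {
        switch (firstLine.trim()) {
            case "FORMAT-A":
                return new DelimitedParser(",", 2);
            case "FORMAT-B":
                return new DelimitedParser(";", 3);
            default:
                throw new IllegalArgumentException("Unknown format: " + firstLine);
        }
    }

    /** Uses the first line to pick a parser, then parses the rest, skipping unusable lines. */
    static List<String[]> parseAll(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        RowParser parser = forHeader(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            parser.parse(line).ifPresent(rows::add);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("FORMAT-B", "1;Alice;NY", "2;Bob;LA", "", "");
        for (String[] row : parseAll(file)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Blank trailing lines come back as `Optional.empty()` and are dropped, so no exception is involved in skipping them.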
 