JavaRanch » Java Forums » Java » Performance
Author

Best and optimum method to delete the first line from pipe delimited file with HUGE size

Titan Spectra
Greenhorn

Joined: Jul 19, 2011
Posts: 4

Hi,

I would like to know the best and most efficient method to delete the first line (specifically) from a pipe-delimited file which is huge in size.

I have tried RandomAccessFile, FileChannel, and Buffers, but in all of these cases I have to load the entire file, perform the operation, and then rewrite the whole file.

This is a very expensive operation. I do not want to load the whole file into memory and cause memory-related problems; since the size of the file may run into gigabytes, that is not feasible.

Can anybody direct me to an efficient method to delete just the first line from the pipe-delimited file, without loading/rewriting the file?

Note: could regular expressions help me with this? The problem is that the length/format of the first line would be unknown; the only known thing would be the pipe delimiter.

Thanks in advance.
Madhan Sundararajan Devaki
Ranch Hand

Joined: Mar 18, 2011
Posts: 312

I believe that without reading through the file you cannot skip/delete the first line!


S.D. MADHAN
Not many get the right opportunity !
Titan Spectra
Greenhorn

Joined: Jul 19, 2011
Posts: 4

Yes, agreed. But my concern is loading a huge file, which may run into gigabytes, just to delete one line.

I would like to know if there is a better way of doing it; otherwise, if I do need to read the file, which method is the most efficient and would not require a large amount of memory?
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11419

"best" and "optimum" are subjective terms. Best in terms of

a) program complexity/simplicity?
b) memory consumption?
c) speed?
d) fault tolerance?
e) recoverability?

several of these are conflicting - in other words, you can't have it all. You need to define what is most important, what is least, and HOW important each is.

your second post indicates that memory may be an issue.

Can you read the file a line (or 10 lines, or 100 lines) at a time, write them to the output file, then get the next 'chunk'?


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
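Fred's chunked read-and-rewrite suggestion can be sketched in a few lines. This is a minimal illustration, not code from the thread: the file names and sample data are placeholders, it uses the java.nio.file API (Java 7; on Java 6, FileReader/FileWriter work the same way), and it holds only one line in memory at a time, so file size is not a memory concern.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SkipFirstLine {

    // Streams src to dst, dropping the first line. Only one line is ever
    // held in memory, so a multi-gigabyte file is not a problem.
    static void copyWithoutFirstLine(Path src, Path dst) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(src, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(dst, StandardCharsets.UTF_8)) {
            in.readLine(); // read and discard the first line
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo with temp files standing in for the real pipe-delimited file.
        Path src = Files.createTempFile("pipe", ".txt");
        Path dst = Files.createTempFile("pipe-out", ".txt");
        Files.write(src, "h1|h2|h3\na|b|c\nd|e|f\n".getBytes(StandardCharsets.UTF_8));
        copyWithoutFirstLine(src, dst);
        System.out.print(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8));
        // prints:
        // a|b|c
        // d|e|f
    }
}
```

Reading larger chunks (10 or 100 lines) buys little here, since BufferedReader already batches the underlying I/O into its internal buffer.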
Titan Spectra
Greenhorn

Joined: Jul 19, 2011
Posts: 4

fred rosenberger wrote:"best" and "optimum" are subjective terms. Best in terms of

a) program complexity/simplicity?
b) memory consumption?
c) speed?
d) fault tolerance?
e) recoverability?

several of these are conflicting - in other words, you can't have it all. You need to define what is most important, what is least, and HOW important each is.

your second post indicates that memory may be an issue.

Can you read the file a line (or 10 lines, or 100 lines) at a time, write them to the output file, then get the next 'chunk'?


Oops, my bad. I should have been more specific.

Well, in terms of what exactly I need:

a) Memory consumption and speed are the first priority
b) Recoverability would be the second priority

The rest follow; program complexity/simplicity is not an issue as long as my first two priorities can be met.

I could read the file in chunks and then re-write the output file.
But this is something I want to avoid, since I just want to delete the first line from the file. Loading and rewriting a 1 GB file just to delete its first line is what I don't want to do; I would like to know of any other approach to this problem (like the sed/tail commands in Linux).
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12806
You appear to be asking for a magic method to cause the first line to disappear while the rest of the file "moves up" to occupy the space previously used.

Think about it - how would an operating system store a file such that this is possible?

Reading and writing binary blocks to a new file, using a block size that fits with the operating system's internal buffers, is your best bet. Do NOT do any character conversion; stick to binary.

Bill
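A hedged sketch of that binary-block approach: scan fixed-size blocks for the first '\n' byte, then bulk-copy the remainder with FileChannel.transferTo, never decoding characters. The 64 KB block size is an assumption to tune, not a figure from this thread, and the demo file names are placeholders.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BinaryStrip {

    // Copies src to dst starting just past the first '\n' byte, reading in
    // fixed-size binary blocks -- no character decoding anywhere.
    static void stripFirstLineBinary(Path src, Path dst) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024); // block size: an assumption, tune for your OS
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long offset = 0; // bytes consumed up to and including the first '\n'
            scan:
            while (in.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    offset++;
                    if (buf.get() == (byte) '\n') {
                        break scan;
                    }
                }
                buf.clear();
            }
            // Bulk-copy everything after the first line; transferTo lets the
            // OS move the bytes without routing them all through user space.
            long pos = offset;
            long size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("big", ".txt");
        Path dst = Files.createTempFile("big-out", ".txt");
        Files.write(src, "header|row\nrow1|a\nrow2|b\n".getBytes());
        stripFirstLineBinary(src, dst);
        System.out.print(new String(Files.readAllBytes(dst)));
        // prints:
        // row1|a
        // row2|b
    }
}
```

If the file contains no newline at all, the scan runs off the end and nothing is copied, which matches treating the whole file as its first line.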

Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18662

Titan Spectra wrote:Can anybody direct me to an optimum method to just delete the first line from the pipe delimited file, without loading/rewriting the file.


There isn't any such method, let alone an "optimum" method. Like Bill says, operating systems don't support that sort of thing.

If you omit the requirements at the end of that sentence then the optimum method is to read the file one line at a time and write out a new version, not writing the first line. There is absolutely no need to read the entire file into memory.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4659

Paul Clapham wrote:There isn't any such method, let alone an "optimum" method. Like Bill says, operating systems don't support that sort of thing.

Some operating systems would let you get very close. But I don't know of any in widespread use today that support it.

VMS (the Vax OS of the 70s and 80s) stored a file not as a string of bytes but as an array of records, with a binary record descriptor at the start of each record. For normal ASCII files, each record was just a line of the file. On a Vax you did not have a \n to delimit the line; rather, the binary record descriptor at the beginning of each line/record held the number of bytes in the record, which could be padded. With this, you could make the first record disappear by simply changing the descriptor for the first record to show zero interesting bytes.

This did not, of course, actually make the file smaller. To do that, you have to read each and every byte of the file and write out the ones you like.
Doing this for files of a gigabyte or two will not take all that long, assuming you don't do a lot of buffer reallocation/garbage collection. Naturally you want to read only a buffer at a time.
Titan Spectra
Greenhorn

Joined: Jul 19, 2011
Posts: 4

I think I get it now; the posts have helped me, and I will try out the solutions offered.

A very big thank you to William, Paul, Pat, Fred, and Madhan for all your help and input on the problem. I will try out the suggestions and update the thread.
 