| Author |
Cutting file_id from file name by looping through folders (UNIX)
|
prince davies
Ranch Hand
Joined: May 08, 2009
Posts: 74
|
|
My files are on a UNIX server. I want to extract the file_id and file_name from each file and save them in a CSV file. How do I do that?
The folders are laid out as follows:
year folder -> 12 month folders inside it -> 30/31 day folders inside each month. Here is what ls shows going down into 2012:
2009 2010 2011 2012
$ cd 2012
$ ls
01 02 03 04 05 06 07 08 09
$ cd 09
$ ls
01 02 03 04 05 06 07 08 09 10 11 12 13
$ cd 13
$ ls
There is one folder for each year (2009, 2010, 2011 and 2012),
each year folder has 12 month folders (01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12),
and each month folder has up to 31 day folders (01, 02, 03, ... 29, 30, 31).
Each day folder contains the files.
The file names look like this:
sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz
sasmm_fsbc_durds_id00020513_t20120913003312.dat.trnsfr.gz
I want to cut out 20532 into the first column and put the file name up to ".dat" in the second column, e.g. sasmm_fsbc_durds_id00020532_t20100313192606.dat
The CSV file should look like this:
file_id file_name
20532 sasmm_fsbc_durds_id00020532_t20100313192606.dat
20513 sasmm_fsbc_durds_id00020513_t20120913003312.dat
The file_id has to be cut from the file name. If you look at the file names closely, it is the number after the "000" that follows "id"; in the examples above the file_ids are 20532 and 20513.
How do I loop through the 12 month folders of year 2012, and the day folders inside them, and create a CSV file with the data shown above?
I am very new to UNIX, so please help me out. If you can provide some code, that would be great. Thanks.
|
 |
prince davies
Ranch Hand
Joined: May 08, 2009
Posts: 74
|
|
To restate what I need:
loop through the folders, read each file name and cut the file_id string out of it,
create a new CSV file,
save the cut value in the file_id column of the CSV file and the file name in the file_name column.
First file, located in 2010 -> 03 -> 13:
sasmm_fsbc_durds_id00020532_t20100313192606.dat
Second file, located in 2012 -> 09 -> 13:
sasmm_fsbc_durds_id00020513_t20120913003312.dat
Output CSV file:
file_id | file_name
20532 | sasmm_fsbc_durds_id00020532_t20100313192606.dat
20513 | sasmm_fsbc_durds_id00020513_t20120913003312.dat
|
 |
Tim Holloway
Saloon Keeper
Joined: Jun 25, 2001
Posts: 14486
|
|
You can use the "find" command to iterate through the tree and get the raw filename or filename/path of each file.
To parse and format the paths, you can use any of a number of popular text-formatting tools, including sed, awk, perl or python.
Or (this being the JavaRanch), you can simply write a Java app that uses the java.io.File class and the java.util.regex package!
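A minimal sketch of the find-plus-sed route, assuming every file follows the naming pattern from the question (`..._id<zero-padded id>_t<timestamp>.dat.trnsfr.gz`) and that a comma is wanted as the column separator. The function name `build_csv` and the output file name in the usage line are made up for illustration.

```shell
#!/bin/sh
# build_csv ROOT (hypothetical helper): print "file_id,file_name" rows
# for every matching file anywhere below ROOT.
build_csv() {
    root=$1
    echo "file_id,file_name"
    # find walks the year/month/day tree; the first sed expression drops
    # the directory part, the second captures the id without its leading
    # zeros (\2) and the name up to ".dat" (\1). Names that do not match
    # the assumed pattern are silently skipped (-n ... p).
    find "$root" -type f -name '*_id*_t*.dat.trnsfr.gz' |
        sed -n \
            -e 's|.*/||' \
            -e 's|^\(.*_id0*\([0-9][0-9]*\)_t[0-9]*\.dat\)\.trnsfr\.gz$|\2,\1|p' |
        sort
}
```

Run from the directory that holds the year folders, something like `build_csv 2012 > file_ids_2012.csv` would cover all months and days of 2012 in one go, since find descends through every level on its own.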
|
 |
Anand Hariharan
Rancher
Joined: Aug 22, 2006
Posts: 252
|
|
Did you answer your own question?
Let me know if this works or if it needs tweaking (assumes a reasonably modern shell):
NB: NOT TESTED.
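One possible sketch of a plain shell loop for this, not necessarily the snippet originally posted here. It assumes the year/month/day layout and the file naming from the question, uses only POSIX parameter expansion, and writes comma-separated rows. The function name `list_ids` is made up for illustration.

```shell
#!/bin/sh
# list_ids YEAR_DIR (hypothetical helper): print "file_id,file_name" rows
# for files exactly two levels (month/day) below YEAR_DIR.
# The body runs in a subshell ( ... ) so its variables stay local.
list_ids() (
    year_dir=$1
    echo "file_id,file_name"
    for path in "$year_dir"/*/*/*_id*_t*.dat.trnsfr.gz; do
        [ -e "$path" ] || continue   # the glob matched nothing
        name=${path##*/}             # drop the directory part
        name=${name%.trnsfr.gz}      # keep the name up to ".dat"
        id=${name##*_id}             # e.g. "00020532_t20100313192606.dat"
        id=${id%%_t*}                # e.g. "00020532"
        # Strip leading zeros one at a time.
        while [ "${id#0}" != "$id" ]; do
            id=${id#0}
        done
        [ -n "$id" ] || id=0         # the id was all zeros
        echo "$id,$name"
    done
)
```

Something like `list_ids 2012 > file_ids_2012.csv` would then produce the file. The two `*` path components stand for the month and day folders, so files sitting at any other depth are not picked up.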
Hope this helps,
- Anand
[Edit: Changed search and replace expressions]
|
 |