
Cutting file_id from file names by looping through folders in UNIX

prince davies
Ranch Hand

Joined: May 08, 2009
Posts: 74


My files are located on a UNIX server. I want to extract the file_id and file_name from each file and save them in a CSV file. How do I do that?

I have folders in a UNIX environment. The directory structure is as follows:

year folder -> 12 month folders inside it -> 30/31 day folders inside each month. I ran the ls command for the year 2012 as follows:

2009 2010 2011 2012
$ cd 2012
$ ls
01 02 03 04 05 06 07 08 09
$ cd 09
$ ls
01 02 03 04 05 06 07 08 09 10 11 12 13
$ cd 13
$ ls

There are folders for each year: 2009, 2010, 2011 and 2012.
Each year folder has 12 folders, one per month: 01, 02, ..., 12.
Each month folder has up to 31 folders, one per day: 01, 02, ..., 31.


Each day folder contains files.

The file names look like this:
sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz
sasmm_fsbc_durds_id00020513_t20120913003312.dat.trnsfr.gz

I want to cut out 20532 into the first column, and put the file name without the .trnsfr.gz suffix into the second column: sasmm_fsbc_durds_id00020532_t20100313192606.dat

The CSV file will look like this:

file_id file_name
20532 sasmm_fsbc_durds_id00020532_t20100313192606.dat
20513 sasmm_fsbc_durds_id00020513_t20120913003312.dat



The file_id is cut from the file name. If you look at the file name closely, the file_id is the number that comes after the 000 following "id": in the file name examples above, the file_ids are 20532 and 20513.
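A minimal sketch of that rule in POSIX shell, using only parameter expansion. It assumes every name has the shape `<prefix>_id<digits>_t<timestamp>.dat.trnsfr.gz` and treats the file_id as the digits after `_id` with the leading zeros removed, which gives the same result as "cut after 000" for both examples. The helper names `file_id_of` and `dat_name_of` are hypothetical.

```sh
#!/bin/sh
# Sketch: split one file name into its file_id and its .dat name.
# Assumes names look like <prefix>_id<digits>_t<timestamp>.dat.trnsfr.gz and
# that file_id = the digits after "_id" with the leading zeros removed.
# file_id_of and dat_name_of are hypothetical helper names.
# The ( ... ) bodies run in a subshell, so their variables stay private.

file_id_of() (
    name=${1##*/}                # drop any directory part
    rest=${name#*_id}            # 00020532_t20100313192606.dat.trnsfr.gz
    digits=${rest%%_*}           # 00020532
    zeros=${digits%%[!0]*}       # the leading zeros: 000
    digits=${digits#"$zeros"}    # 20532
    echo "${digits:-0}"          # an all-zero id would print 0
)

dat_name_of() (
    name=${1##*/}                # drop any directory part
    echo "${name%.trnsfr.gz}"    # strip the transfer suffix
)
```

For example, `file_id_of sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz` prints `20532`, and `dat_name_of` on the same name prints `sasmm_fsbc_durds_id00020532_t20100313192606.dat`.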


How do I loop through the 2012 year folder, the 12 month folders inside it and the day folders inside each month, and create a CSV file with the data shown above?
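One way to do the loop, sketched in POSIX shell. It assumes the files sit exactly three levels down (year/month/day), as in the listings above, and all follow the `..._id<digits>_t<timestamp>.dat.trnsfr.gz` pattern. Under that assumption a single glob such as `2012/*/*/*.dat.trnsfr.gz` visits every day folder without nested loops. `build_csv` is a hypothetical name, and commas are used as the CSV separator.

```sh
#!/bin/sh
# Sketch: walk <year>/<month>/<day>/ with one glob and write a CSV file.
# Assumes the files sit exactly three levels down and all follow the
# ..._id<digits>_t<timestamp>.dat.trnsfr.gz pattern.
# build_csv is a hypothetical helper name; its ( ... ) body runs in a subshell.

build_csv() (
    year_dir=$1
    out=$2
    echo 'file_id,file_name' > "$out"            # header row
    for f in "$year_dir"/*/*/*.dat.trnsfr.gz; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        name=${f##*/}                            # file name only
        rest=${name#*_id}                        # text after "_id"
        digits=${rest%%_*}                       # e.g. 00020532
        zeros=${digits%%[!0]*}                   # its leading zeros
        id=${digits#"$zeros"}                    # e.g. 20532
        echo "${id:-0},${name%.trnsfr.gz}" >> "$out"
    done
)
```

Run it from the folder that holds the year folders, e.g. `build_csv 2012 files_2012.csv`. Because the shell sorts glob matches, the rows come out in month/day order when the folder names are zero-padded as shown.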

I am very new to UNIX, so please help me out. If you can provide some code, that would be great. Thanks.
prince davies
Ranch Hand

Joined: May 08, 2009
Posts: 74
Loop through the folders, read each file name and cut out the file_id string.
Create a new CSV file.
Save the cut value in the file_id column of the CSV file and the file name in the file_name column.


First file, located in 2010 -> 03 -> 13:
sasmm_fsbc_durds_id00020532_t20100313192606.dat

Second file, located in 2012 -> 09 -> 13:
sasmm_fsbc_durds_id00020513_t20120913003312.dat

Output CSV file:

file_id | file_name
20532 | sasmm_fsbc_durds_id00020532_t20100313192606.dat
20513 | sasmm_fsbc_durds_id00020513_t20120913003312.dat


Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 16160

You can use the "find" command to iterate through the tree and get the raw filename or filename/path of each file.
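Following the `find` suggestion, a sketch that does not depend on the exact folder depth: `find` lists every `*.dat.trnsfr.gz` file below the year folder, and a `while read` loop turns each path into a CSV row. It assumes the same file name pattern and no newlines inside paths; `csv_from_find` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: let find list every *.dat.trnsfr.gz below a folder (any depth)
# and print one file_id,file_name row per file on stdout.
# csv_from_find is a hypothetical helper name. Assumes no newlines in paths.

csv_from_find() (
    echo 'file_id,file_name'                     # header row
    find "$1" -type f -name '*.dat.trnsfr.gz' |
    while IFS= read -r f; do
        name=${f##*/}                            # file name only
        rest=${name#*_id}                        # text after "_id"
        digits=${rest%%_*}                       # e.g. 00020532
        zeros=${digits%%[!0]*}                   # its leading zeros
        id=${digits#"$zeros"}                    # e.g. 20532
        echo "${id:-0},${name%.trnsfr.gz}"
    done
)
```

Usage: `csv_from_find 2012 > files_2012.csv`. The rows come out in the order `find` visits the files, which is not guaranteed to be sorted.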

To parse and format the paths, you can use any of a number of popular text-formatting tools, including sed, awk, perl or python.
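For the text-formatting route, a sketch with `sed`: the first expression strips the directory part, and the second captures the name up to `.dat` plus the digits that follow `_id` and its leading zeros, then prints them in CSV order. Lines that do not match the assumed `..._id<digits>_t<timestamp>.dat.trnsfr.gz` shape are dropped because of `-n`. `paths_to_csv` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: turn file paths (one per line on stdin) into file_id,file_name rows.
# Assumes the ..._id<digits>_t<timestamp>.dat.trnsfr.gz naming shown above.
# paths_to_csv is a hypothetical helper name.

paths_to_csv() {
    echo 'file_id,file_name'                     # header row
    # 1st expression: remove everything up to the last "/".
    # 2nd expression: \1 = name up to ".dat", \2 = digits after "_id" and the
    #                 leading zeros; print "\2,\1" only when the line matches.
    sed -n \
        -e 's|.*/||' \
        -e 's|^\(.*_id0*\([0-9][0-9]*\)_t.*\.dat\)\.trnsfr\.gz$|\2,\1|p'
}
```

Usage: `find 2012 -type f -name '*.dat.trnsfr.gz' | paths_to_csv > files_2012.csv`.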

Or (this being the JavaRanch), you can simply write a Java app that uses the java.io.File class and the java.util.regex package!


Customer surveys are for companies who didn't pay proper attention to begin with.
Anand Hariharan
Rancher

Joined: Aug 22, 2006
Posts: 257

Did you answer your own question?

Let me know if this works or if it needs tweaking (assumes a reasonably modern shell):



NB: NOT TESTED.

Hope this helps,
- Anand

[Edit: Changed search and replace expressions]

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery