I have a requirement to load multiple files into an Oracle database. The code that reads and loads these files is written in Java, because there are a lot of business validations etc. that need to be done on the files prior to loading. A shell script FTPs/SCPs the files from a remote location to the working area (a Linux server). If a file fails for some reason, the rest of the files should continue to be loaded.
My question is: where should I loop, and why? My main concern is finding the better option when loading huge files.
Option 1 - Loop in the shell script
Option 2 - Loop in the Java program
Looping in Java is the better approach, simply because you have full control in Java itself over error handling: writing an error log record to the database, moving the file to an error folder, etc.
Moreover, if you loop in the shell script you will start a new JVM for each file, which is unnecessary.
Personally, when it comes to shell interaction, I do as little of that from my Java program as possible. Up to a certain point, it's pretty easy, but things can get really ugly really quickly. For example, will you be reading error codes from your external application? Is it important to capture STDERR? All of these tasks can be difficult to do from a Java program.
So I guess I would say that if you're doing a little bit of shell integration, then do everything from your Java program. Otherwise, you're better off creating a Java program that simply does validation, and then calling that program from a shell script.
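To make the second alternative concrete, here is a minimal sketch of how a shell script can pick up the exit status and STDERR of a validation program it calls. The function name `run_and_log`, the log file, and the jar and class names in the usage comment are all placeholders for illustration, not anything from the original setup.

```shell
#!/bin/sh
# run_and_log LOGFILE CMD [ARGS...]
# Runs CMD with its arguments, appends whatever CMD writes to stderr
# to LOGFILE, and returns CMD's exit status unchanged.
# (run_and_log is a hypothetical helper name.)
run_and_log() {
    logfile=$1
    shift
    "$@" 2>>"$logfile"
}

# Hypothetical usage (loader.jar and com.example.FileValidator are placeholders):
#   if run_and_log validate.err java -cp loader.jar com.example.FileValidator data.csv; then
#       echo "validated"
#   else
#       echo "validator exited with status $?" >&2
#   fi
```

In the shell, the exit status and the error stream come almost for free, which is the main argument for letting the script do the driving.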
P.S. If you find yourself writing a lot of Java programs that interact with some sort of shell, you may want to consider looking into Groovy. It has *significantly* better shell integration than plain-old-java, and can save you from having to write a Java program *and* a shell script.
Ultimately it's up to you, but I'd do the loop in bash and keep the Java program simpler and more reusable.
Basically, I prefer to have Java do the heavy lifting of loading the files into the Oracle database, and have the shell script drive the Java class.
This approach is useful because the loop statement is very simple.
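A minimal sketch of such a driver loop might look like the following. The function name `load_all` and the jar and class names in the usage comment are placeholders; the loader command is passed in as arguments, so the loop itself does not assume anything about the Java program beyond "exit status 0 means success".

```shell
#!/bin/sh
# load_all DIR CMD [ARGS...]
# Runs CMD once per regular file in DIR, passing the file as the last argument.
# A failing file is reported and counted, and the loop carries on with the rest.
# (load_all is a hypothetical helper name.)
load_all() {
    dir=$1
    shift
    failed=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        if ! "$@" "$f"; then
            echo "load failed, continuing: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]              # non-zero status if any file failed
}

# Hypothetical usage (jar and class names are placeholders):
#   load_all /data/incoming java -cp loader.jar com.example.FileLoader
```

Note that this does start one JVM per file; whether that matters depends on how many files there are compared with how long each load takes.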
But since it's bash, modifying the looping and the file operations becomes much easier. For example, if you only want to load files with a .db extension from the directory, simply narrow the glob pattern in the loop to *.db.
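As a sketch of that kind of change, the variant below takes the extension as a parameter and only visits matching files. The name `load_matching` and the jar and class names in the usage comment are placeholders.

```shell
#!/bin/sh
# load_matching DIR EXT CMD [ARGS...]
# Runs CMD once per regular file in DIR whose name ends in .EXT.
# (load_matching is a hypothetical helper name.)
load_matching() {
    dir=$1
    ext=$2
    shift 2
    for f in "$dir"/*."$ext"; do
        [ -f "$f" ] || continue      # also covers the case where nothing matches
        "$@" "$f" || echo "load failed, continuing: $f" >&2
    done
}

# Hypothetical usage (jar and class names are placeholders):
#   load_matching /data/incoming db java -cp loader.jar com.example.FileLoader
```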
And so on. Sometimes certain preprocessing may also be required, e.g. removing certain lines or replacing certain values. In this case, sed and awk come in handy. And if another program has already been written (even in another language, e.g. Perl or Python), it can be invoked from the 'driver' bash script prior to calling the Java class that you wrote.
Adding further activities such as backing up, aging, zipping, or unzipping files also becomes easier.
For example, if the directory contains a collection of files that need to be loaded into the database, but each of them is compressed and needs to be preprocessed, and after each load the file needs to be aged/archived somewhere, the script can easily accommodate all of that.
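A sketch of a driver along those lines, under a few assumptions: the files are gzip-compressed, the preprocessing step is simply "drop blank lines" (a stand-in for whatever sed/awk cleanup is really needed), and a file is archived only after a successful load. The function name `process_dir`, the directory layout, and the jar and class names in the usage comment are placeholders.

```shell
#!/bin/sh
# process_dir INCOMING ARCHIVE CMD [ARGS...]
# For each *.gz file in INCOMING: decompress, preprocess, run CMD on the
# result, and move the original .gz to ARCHIVE if everything succeeded.
# Failed files are reported and left in INCOMING; the loop continues.
# (process_dir is a hypothetical helper name.)
process_dir() {
    incoming=$1
    archive=$2
    shift 2
    mkdir -p "$archive"
    for gz in "$incoming"/*.gz; do
        [ -f "$gz" ] || continue
        raw=${gz%.gz}.raw            # decompressed copy
        plain=${gz%.gz}              # preprocessed copy handed to the loader
        if gunzip -c "$gz" > "$raw" &&
           sed '/^[[:space:]]*$/d' "$raw" > "$plain" &&   # example step: drop blank lines
           "$@" "$plain"; then
            mv "$gz" "$archive"/
        else
            echo "failed, leaving in place: $gz" >&2
        fi
        rm -f "$raw" "$plain"        # remove working copies either way
    done
}

# Hypothetical usage (paths, jar and class names are placeholders):
#   process_dir /data/incoming /data/archive java -cp loader.jar com.example.FileLoader
```

The steps are chained with `&&` rather than a pipe so that a failed decompression is noticed instead of being masked by the exit status of `sed`.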
(Note: code above is not tested, but you get the idea...)
So keeping the loop out of the Java code increases the modularity of the Java program; it makes it much simpler and much more reusable. [ August 24, 2008: Message edited by: Zenikko Sugiarto ]