I am trying to understand whether my approach to the following problem is workable, or anywhere close to best practice.
I am rewriting a batch file renamer to practice more advanced Java techniques. My old version uses simple loops along with a SwingWorker thread; in the new version I am trying to use more advanced threading through ExecutorService.
My code takes in a file array and a few "switches" that determine which method is used to rename the current file. The array can contain a mix of files and directories, which I sort at the beginning. I then rename the files and count the directories in the array. If there are directories, I create a thread pool with one thread per directory: the loop feeds each sub-directory to the pool, and the thread that picks it up calls RecursiveRename on it, which repeats the whole process (including creating a new pool) one level further down.
The problem I foresee is that if the program is fed a root directory with 15 levels of sub-directories, this will obviously lead to a massive total number of threads spread over many pools. Remember, this program is just a learning tool for me, so I'd like to stay with ExecutorService, but if this is completely crazy for what I am trying to do, I'll abandon this approach.
I thought about using a fixed pool size equal to the number of directories directly under the root, but this seems limiting. On the other hand, it seems simple to detect a thread that has finished its work and then assign it the next directory to process. I also wonder whether making my renamer massively parallel is simply a waste of resources. So I guess my question is this: is recursive pool creation a good idea in this context? Is it ever a good idea?
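For what it's worth, the "assign a finished thread the next directory" idea is essentially what a fixed-size ExecutorService already does: submitted tasks wait in the pool's internal queue, and each worker thread takes the next task as soon as it finishes its current one. A minimal demo, with a hypothetical class name `ThreadReuseDemo`, that runs 20 small tasks on a pool of only 2 threads:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadReuseDemo {
    /**
     * Submits taskCount tasks to a fixed pool of poolSize threads and returns
     * the names of the distinct threads that actually ran them.
     */
    public static Set<String> runTasks(int poolSize, int taskCount, AtomicInteger completed) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Each task waits in the pool's queue until a worker is free;
            // a worker that finishes one task simply takes the next one.
            pool.submit(() -> {
                threadNames.add(Thread.currentThread().getName());
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks; already-queued tasks still run
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return threadNames;
    }

    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        Set<String> names = runTasks(2, 20, completed);
        System.out.println(completed.get() + " tasks ran on " + names.size() + " thread(s): " + names);
    }
}
```

All 20 tasks complete, but at most two distinct thread names are ever reported, so there is no need to track idle threads yourself.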
Do I need to watch out for concurrency issues here? If one thread is renaming a directory farther up the tree while another thread is renaming a file lower down the tree, I think I'll have conflicts.
I feel like I am in the most dangerous zone: I know just enough to screw everything up. :-)
Thanks for your help,
The number of threads in a thread pool should be calculated from the available computational resources: the number of processors and I/O devices. A good starting point is the number of available processors (cores). If the tasks do some I/O, the optimal number of threads can be adjusted: increased for network I/O, decreased for disk I/O. But since the number of cores is relatively small, there is not much sense in decreasing the thread count for disk I/O; just expect that tasks doing disk I/O will not actually run in parallel.
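As a rough illustration of this kind of sizing, here is a sketch using the heuristic popularized by *Java Concurrency in Practice*: threads = cores × target CPU utilization × (1 + wait time / compute time). The class `PoolSizing` and its parameter names are hypothetical, and the wait/compute ratio is something you would have to estimate or measure for your own tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    /**
     * threads = cores * targetUtilization * (1 + waitToComputeRatio).
     * A ratio near 0 means CPU-bound work (about one thread per core);
     * a large ratio (e.g. waiting on the network) justifies more threads than cores.
     */
    public static int poolSize(int cores, double targetUtilization, double waitToComputeRatio) {
        long n = Math.round(cores * targetUtilization * (1 + waitToComputeRatio));
        return (int) Math.max(1, n); // never size a pool below one thread
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int size = poolSize(cores, 1.0, 0.0); // CPU-bound starting point: one thread per core
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("cores=" + cores + ", pool size=" + size);
        pool.shutdown();
    }
}
```

Note that nothing in the formula depends on how many directories there are; the pool size is fixed by the hardware and the nature of the work.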
Choosing the number of threads based on logical resources (the number of directories, in your case) makes no sense. All logical dependencies should be managed some other way, usually by submitting tasks to the thread pool in the right order and at the right time.
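One way to express "manage dependencies through task submission" for a directory tree is sketched below. It swaps in ForkJoinPool (which implements ExecutorService) with RecursiveAction: there is a single pool sized to the cores, each directory task forks tasks for its sub-directories into that same pool, and it renames its own files and then itself only after all of those sub-tasks have finished, so a parent is never renamed while something beneath it is still being processed. The names `ForkJoinRenamer`, `RenameDir`, and the prefix-based rename rule are hypothetical placeholders for your own renaming logic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

public class ForkJoinRenamer {
    // Hypothetical rename rule for illustration: prepend a fixed prefix.
    static void rename(Path p, String prefix) throws IOException {
        Files.move(p, p.resolveSibling(prefix + p.getFileName()));
    }

    static final class RenameDir extends RecursiveAction {
        private final Path dir;
        private final boolean renameSelf; // false for the root passed in by the caller
        private final String prefix;
        private final AtomicInteger renamed;

        RenameDir(Path dir, boolean renameSelf, String prefix, AtomicInteger renamed) {
            this.dir = dir;
            this.renameSelf = renameSelf;
            this.prefix = prefix;
            this.renamed = renamed;
        }

        @Override
        protected void compute() {
            List<RenameDir> subdirs = new ArrayList<>();
            List<Path> files = new ArrayList<>();
            // List the directory completely and close the stream before renaming anything in it.
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path p : entries) {
                    if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
                        subdirs.add(new RenameDir(p, true, prefix, renamed));
                    } else {
                        files.add(p);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Fork one task per sub-directory into the SAME pool and wait for all of them.
            invokeAll(subdirs);
            try {
                for (Path f : files) {
                    rename(f, prefix);
                    renamed.incrementAndGet();
                }
                if (renameSelf) { // safe now: everything below this directory is done
                    rename(dir, prefix);
                    renamed.incrementAndGet();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Renames every entry below root (but not root itself) and returns how many were renamed. */
    public static int renameTree(Path root, String prefix) {
        AtomicInteger renamed = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to availableProcessors()
        try {
            pool.invoke(new RenameDir(root, false, prefix, renamed));
        } finally {
            pool.shutdown();
        }
        return renamed.get();
    }
}
```

The tree can be 15 levels deep and there is still exactly one pool with a core-count number of threads. Given that the work is disk-bound, though, do not expect this to be faster than a single thread.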
And since the renaming is done by the OS anyway, the degree of parallelism is limited by the OS. Very likely, the OS does not parallelize file renames, so the whole idea of using a thread pool here is worthless.
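If the renames end up serialized anyway, a plain single-threaded pass may be all that is needed, and it also sidesteps the parent-versus-child conflict raised in the question. A minimal sketch, assuming Java 8+ for `Files.walk`: collect every path under the root, reverse the list so each child comes before its parent, then rename in that order, so no path is invalidated by an earlier rename of one of its ancestors. The class name `BottomUpRenamer` and the prefix rule are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BottomUpRenamer {
    /** Renames every entry below root (not root itself), deepest entries first. */
    public static int renameAll(Path root, String prefix) throws IOException {
        List<Path> paths;
        // Files.walk yields each directory before its contents; the stream must be closed.
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.collect(Collectors.toCollection(ArrayList::new));
        }
        // Reversing that order puts every entry before all of its ancestors.
        Collections.reverse(paths);
        int count = 0;
        for (Path p : paths) {
            if (p.equals(root)) {
                continue; // leave the starting directory itself alone
            }
            // Hypothetical rule: prepend a prefix. Swap in your own naming logic here.
            Files.move(p, p.resolveSibling(prefix + p.getFileName()));
            count++;
        }
        return count;
    }
}
```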
Thanks for the reply, Alexei. I knew I was missing something in terms of the big picture, and the I/O is it. What is the point of using threads for this type of disk access, then?