I took a look at your code. My feeling is that there are two issues: (1) thread scheduling and (2) logging - especially logging to files.
As for logging to a file: a typical disk has only a single write head, which means you can only write to one place at a time. When multiple threads write to the disk at once, the head has to move to file_location_1 and write a chunk, then seek to file_location_2 and write a chunk, then seek to file_location_3, and so on. These locations can be in very different parts of the disk, so the head ends up spending more of its time seeking from one spot to another than actually writing. If you condense the writes into a single thread, the disk can write larger chunks to the same area before being diverted to another location, and write efficiency increases. Try feeding all of your logging lines into a single thread: for example, have a LogLine class that pairs a file with the String to write to it; the Gobblers append LogLines to a synchronized queue, and a dedicated writer thread pops lines off the queue and writes each string to the proper file.
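A minimal sketch of that producer/consumer arrangement, using a `BlockingQueue` from `java.util.concurrent` as the synchronized queue (the class names `LogLine` and `LogWriter` are my own invention, not from your code):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Pairs a target file with the text to write to it.
class LogLine {
    final String file;
    final String text;
    LogLine(String file, String text) { this.file = file; this.text = text; }
}

// Single writer thread: drains the queue and appends each line to its file.
class LogWriter implements Runnable {
    private final BlockingQueue<LogLine> queue = new LinkedBlockingQueue<>();

    // Called by the StreamGobblers; LinkedBlockingQueue is thread-safe.
    public void append(LogLine line) {
        queue.add(line);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                LogLine line = queue.take();  // blocks until a line is available
                try (PrintWriter out = new PrintWriter(new FileWriter(line.file, true))) {
                    out.println(line.text);   // append to the proper file
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // writer was asked to stop
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

In a real version you would probably keep the writers open per file (or batch lines per file) rather than reopening on every line, but the structure - many producers, one consumer doing all the disk writes - is the point.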
The other thing to worry about is thread scheduling. It seems like you have lots of tasks - each task runs (1) an external process (the python script), (2) a thread which starts and waits for that process, and (3) a StreamGobbler which consumes the process output and logs the data. Your system can only execute a finite number of threads in parallel, so with that many tasks most of them will be put into a waiting state - waiting for processor time and for access to the file system. Because all of your tasks compete for the same limited resources, you could be wasting time on processor time-sharing and/or starving some tasks of resources - they never get a 'fair' share of the processor they need to execute. Pushing the file IO into a single thread may help here as well, since your StreamGobblers will no longer be competing with each other for file IO. But you should also tune the number of threads running at any given time to reflect the resources your system actually has: use a thread pool of perhaps 1x, 2x, or 4x threads (where x is the number of threads your processors can execute in parallel), see whether reducing the thread count increases performance, and then fine-tune the multiplier for the best result. This might be something already built in to cron4j; I don't know, I don't use it.
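The pool sizing above can be sketched with a standard `ExecutorService`; the multiplier is the knob you would tune, and the task body here is just a stand-in for launching your python script:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskRunner {
    public static void main(String[] args) throws InterruptedException {
        // x = number of hardware threads the JVM can see
        int x = Runtime.getRuntime().availableProcessors();
        int multiplier = 2;  // try 1, 2, 4 and measure which performs best
        ExecutorService pool = Executors.newFixedThreadPool(multiplier * x);

        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(() -> {
                // here you would start the external python process,
                // e.g. with ProcessBuilder, and gobble its output
                System.out.println("task " + taskId + " on "
                        + Thread.currentThread().getName());
            });
        }

        pool.shutdown();                       // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);  // let queued tasks finish
    }
}
```

The fixed pool guarantees that no matter how many tasks you submit, only `multiplier * x` of them run at once; the rest sit in the pool's queue instead of all fighting for the scheduler at the same time.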
Finally, you should not really be guessing about what is happening. Attach a profiler to your system and see which threads are in which states, how much CPU is being consumed, and so on. That will help you really nail down the problem areas, whereas the above are just generalized strategies.