Running several Python scripts in concurrent threads taking too long
Joined: Apr 26, 2012
I posted in this thread about a problem I was having, where the threads I was running never finished - that problem is solved. Now, though, I have another problem. It is described in the last post of that thread, but I'll go over it here.
The threads are taking too long to finish. When I schedule several processes via Linux's "cron", they're all done within one and a half to two minutes at most. When they run via my Java scheduler, however, they take about five or six minutes to finish. What could be the problem here? How do I solve it?
I took a look at your code. My feeling is that there are two issues: (1) thread scheduling and (2) the logging - especially logging to files.
As for the logging to files: a typical hard disk can effectively write to only one place at a time, because its heads move together on a single arm. When you write to the disk from multiple threads, the head has to move to file_location_1 and write a chunk, then seek to file_location_2 and write a chunk, then seek to file_location_3, and so on. These file locations can be in very different parts of the disk, so the head ends up spending more time seeking from one spot to another than writing. When you funnel the writes through a single thread, the disk can write larger chunks to the same area before being diverted to another location, and write efficiency increases. Try feeding all your logging lines into a single thread (maybe have a LogLine class that pairs a file with the String to write to it; the Gobblers append LogLines to a synchronized queue, and a writer thread pops lines off the queue and writes each string to the proper file).
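Here is a rough sketch of that single-writer idea, in case it helps. I haven't run it against your project, so treat it as a starting point and not as a drop-in fix. The names SingleWriterLog, LogLine, log and shutdown are made up for illustration, and I'm assuming plain UTF-8 text logs that are appended to.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: many producers enqueue lines, one thread does all file writes. */
class SingleWriterLog {

    /** Pairs a target file with the text to append to it. */
    static final class LogLine {
        final Path file;
        final String text;
        LogLine(Path file, String text) { this.file = file; this.text = text; }
    }

    // Sentinel that tells the writer thread to close its files and stop.
    private static final LogLine POISON = new LogLine(null, null);

    private final BlockingQueue<LogLine> queue = new LinkedBlockingQueue<>();
    private final Thread writer = new Thread(this::drain, "log-writer");

    SingleWriterLog() { writer.start(); }

    /** Called from any gobbler thread; only enqueues, never touches the disk. */
    void log(Path file, String text) {
        queue.add(new LogLine(file, text));
    }

    /** Runs on the writer thread: the only place where file IO happens. */
    private void drain() {
        Map<Path, BufferedWriter> open = new HashMap<>();
        try {
            for (LogLine line = queue.take(); line != POISON; line = queue.take()) {
                BufferedWriter out = open.get(line.file);
                if (out == null) {
                    // Assumption: logs are UTF-8 and should be appended to.
                    out = Files.newBufferedWriter(line.file, StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    open.put(line.file, out);
                }
                out.write(line.text);
                out.newLine();
            }
        } catch (IOException e) {
            e.printStackTrace(); // a real version should report this somewhere useful
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (BufferedWriter out : open.values()) {
                try { out.close(); } catch (IOException ignored) { }
            }
        }
    }

    /** Lets the writer finish everything already queued, then stops it. */
    void shutdown() throws InterruptedException {
        queue.add(POISON);
        writer.join();
    }
}
```

Each StreamGobbler would call log(file, line) instead of writing to the file itself, and the scheduler would call shutdown() once all tasks are done. Note that in this sketch an IOException kills the writer thread while producers keep queueing, so a real version needs better error handling.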
The other thing to worry about is thread scheduling. It seems like you have lots of tasks, and each task involves (1) an external process (the Python script), (2) a thread which starts and waits for the external process, and (3) a StreamGobbler which consumes the process output and logs the data. Your system can only execute a finite number of threads in parallel, so if you have too many tasks, many of them will be put into a waiting state - waiting for processor time and for access to the file system. Since all your tasks compete for the same limited resources, you could be wasting time on processor time-sharing and/or starving some of the tasks, so that they don't get the 'fair' share of the processor they need to execute. Pushing the file IO into a single thread may help here as well, since then your StreamGobblers won't be competing with each other for file IO. But you should also tune the number of threads running at any given time to reflect the resources your system actually has. Try a thread pool of 1x, 2x, or 4x threads (where x is the number of threads your processors can execute at once), and see if reducing the number of threads improves performance (then fine-tune the multiple to get the best result). This might already be built into cron4j; I don't know, since I don't use it.
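Something along these lines would let you experiment with the pool size. Again, this is only a sketch under my own assumptions: BoundedTaskRunner and runAll are names I invented, and I'm treating each of your tasks as a Callable, which may not match how your scheduler hands them out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: cap concurrency at a multiple of the available cores. */
class BoundedTaskRunner {

    /**
     * Runs all tasks on a fixed pool of (cores * multiple) threads and
     * returns their results in the same order as the task list.
     */
    static <T> List<T> runAll(List<Callable<T>> tasks, int multiple) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, cores * multiple));
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each Callable could start one script with ProcessBuilder and return the exit code from waitFor(). Time a full run with multiple set to 1, 2 and 4 and keep whichever finishes fastest on the actual server.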
Finally, you should not really be guessing about what is happening. Get a profiler and attach it to your system. See which threads are in which states, how much CPU is being consumed, and so on. That will help you really nail down the problem areas, whereas the above are just generalized strategies.
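Until you have a profiler attached, a crude stand-in is to count the threads in each state from inside the JVM. This is not a replacement for a real profiler, only a quick sanity check; ThreadStateSnapshot and countByState are names I made up.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: a rough count of live threads per Thread.State. */
class ThreadStateSnapshot {

    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns an entry for every live thread in this JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }
}
```

Printing countByState() every few seconds while the scheduler runs shows how your threads split between RUNNABLE, BLOCKED and the waiting states, which at least tells you whether they are mostly working or mostly stuck.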
Joined: Apr 26, 2012
Oh my god. Even before I got a chance to actually read your post, everything kinda exploded. We tried running the system (which is working - slowly, but working) on our test server to see if it was a local problem. Every freaking script decided not to work, for no apparent reason. I naturally assumed it was a library problem - it was not. I packaged everything into my runnable jar and, just to be sure, set the libraries up on our server as well. Still, no dice. Maybe it was permissions, I thought. Again, it was not. I'm seriously stumped as to what it could be. I am aware this is not a thread-specific problem, but damn, everything in this thing seems to blow up in my face.
Anywho, back to the main issue. I tried using Quartz Scheduler to see if the problem was with Cron4J. Alas, it was not: everything ran just as slowly (if not more so) with Quartz. I'll read your post more thoroughly later on and post my thoughts on it. Right now, I'm trying to solve this incompatibility issue.