I am testing my own desktop application, which processes bulk user data using multithreading, with each thread processing one user's data. The number of threads running at a time is configurable, and I have set it to the value that gave the best overall throughput. On a whim, I set up multiple instances running with the same number of threads per instance, which resulted in a steep increase in performance. Increasing the thread count beyond a certain limit had an adverse effect on performance, but combining multithreading with multiple processes improved the results. However, because the application writes some history to common folders such as the user's documents folder, running multiple instances of the application on the same machine is not recommended.
Though I can avoid storing history in a common place, I don't want the user to run multiple instances. Instead I want only a single instance (say, a parent process) running, which will ...
1. spawn multiple processes which will actually do the bulk user processing
2. have each process implement multithreading itself, with one thread per user
3. have all processes communicate with this single parent process, which provides a common user interface for all of them (such as the status of individual user activity within a process and then within a thread).
How can spawning Java processes and inter-process communication be achieved in a way that satisfies this design? Any other suggestions for making my application faster are welcome.
Of course Java offers similar solutions for inter-process communication like other programming languages (for example communication via sockets, RMI or maybe REST services).
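To make the socket option a bit more concrete, here is a minimal sketch of status reporting over a loopback socket. The class name, the method names and the "user percent" line format are all made up for illustration. To keep the example self-contained the worker side runs in a thread; in the design described above it would run in a child JVM (started with something like ProcessBuilder) and be given the port number.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StatusOverSocket {

    /** Parent side: accepts one worker connection and records each "user percent" line. */
    static void collectFromOneWorker(ServerSocket server, Map<String, Integer> status) throws IOException {
        try (Socket s = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Hypothetical wire format: user name, one space, percentage.
                String[] parts = line.split(" ");
                status.put(parts[0], Integer.parseInt(parts[1]));
            }
        }
    }

    /** Worker side: connects back to the parent and sends one line per progress update. */
    static void reportProgress(int port, String user, int... percents) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (int p : percents) {
                out.println(user + " " + p);
            }
        }
    }

    /** Runs parent and worker in one JVM; the worker thread stands in for a child process. */
    static Map<String, Integer> demo() {
        Map<String, Integer> status = new ConcurrentHashMap<>();
        // Port 0 lets the OS pick a free port; bind to loopback only.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread worker = new Thread(() -> {
                try {
                    reportProgress(port, "alice", 25, 50, 100);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            worker.start();
            collectFromOneWorker(server, status);
            worker.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return status;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Even this small version needs a wire format, connection handling and error handling, which is the extra complexity you take on with multiple processes.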
That said, I'm pretty sure there is a good reason why your application degrades when it uses more than a certain number of threads in parallel. Do you know WHY you get better results when you're running two different JVMs in contrast to one JVM with more threads? I'd rather try to fix this issue than use inter-process communication, which is probably not as efficient and makes things even more complicated.
A typical reason for such issues could be the concurrent behavior of data structures like the common collections. There are some benchmarks available on the internet which clearly show that the performance of some collections is optimal only within a specific range of parallel threads. Another thing could be lock contention because of synchronization bottlenecks (which could for example be solved by using lock-free concurrent programming models) etc.
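As a rough illustration of what a synchronization bottleneck looks like in code, the sketch below keeps two counters: one behind a single synchronized lock that every thread must queue for, and one using java.util.concurrent.atomic.LongAdder, which spreads updates across internal cells instead of one lock. The class and method names are invented for this example and it makes no claim about your actual code. It only shows that both variants give the same result, so swapping one for the other is a change you can measure.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ContentionSketch {
    private long lockedCount = 0;                  // guarded by "this"
    private final LongAdder striped = new LongAdder();

    // Every caller competes for the same monitor.
    synchronized void incrementLocked() { lockedCount++; }
    synchronized long lockedValue() { return lockedCount; }

    // No shared lock; updates are spread across internal cells.
    void incrementStriped() { striped.increment(); }
    long stripedValue() { return striped.sum(); }

    /** Runs the given number of threads and returns {lockedValue, stripedValue}. */
    static long[] run(int threads, int incrementsPerThread) {
        ContentionSketch counters = new ContentionSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counters.incrementLocked();
                    counters.incrementStriped();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { counters.lockedValue(), counters.stripedValue() };
    }

    public static void main(String[] args) {
        long[] totals = run(8, 100_000);
        System.out.println("locked=" + totals[0] + " striped=" + totals[1]);
    }
}
```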
Why do you think it's really necessary to have multiple processes? Of course there may be good reasons to do so but I can't imagine that performance issues like this necessarily require multiple processes. But perhaps I could be wrong...
Joined: Feb 12, 2008
Thank you very much for your reply. Yes, it was my first post here.
I will revisit the logic behind using multiprocessing. I did not consult any forums for this; the idea is based purely on my experience with running multiple instances of the application. I believed sockets, RMI or CORBA may be used for multiprocessing. However, I am not well versed in Java technologies, so my question was seeking a technology as well as suggestions on this or any other approach.
Originally my idea is the parent process will provide a common UI, spawns multiple processes, then divide userlist selected in UI and pass separate lists to individual processes. The child processes will do their job on that group of users. There won't be any communication among child processes. Either the parent process will collect the status or the individual processes will pass the status to the parent process. The status is just the percentage of migration completed for each individual user, so the frequency of status updates can be controlled to avoid expensive resource utilization.
This application is a migration tool and I can dedicate one machine to running it, so I have enough resources, which I want to fully utilize for better performance. This may be one reason I wanted to go with multiprocessing. Existing application is anyway doing good, however it proved it may work even better, I will give thoughts on all possible approaches. All suggestions are welcome.
I believed sockets, RMI or CORBA may be used for multiprocessing.
You're absolutely right. These technologies could be used for communication between processes (even on different machines). There are surely a lot of other possibilities, like XML-RPC for example. Almost all of these communication solutions have their own advantages and disadvantages, so which one could be a solution for you depends on your detailed requirements. RMI, for example, is one preferred way to communicate between Java applications, but it won't be the best idea if you have applications written in different programming languages.
In general, with multiple threads there's obviously no need for such technologies, because threads have direct access to shared data, which usually means less overhead. That's why I recommended improving the application to run with only one process but multiple threads, which is usually more resource friendly.
Originally my idea is the parent process will provide a common UI, spawns multiple processes, then divide userlist selected in UI and pass separate lists to individual processes.
There's basically nothing wrong with this concept and the separation of concerns! In my opinion it should be even more efficient if you can modify your application to use the same modular concept but with multiple threads instead of multiple processes. That should make the communication logic between modules much easier and the overall performance should be better.
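A minimal sketch of the same concept with threads instead of processes might look like the following. The class name, the method name and the fake "work" loop are placeholders of mine, not anything from your application. A fixed thread pool plays the role of the child processes, and a shared ConcurrentHashMap replaces the inter-process status channel, so the UI can simply read the map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadedMigration {

    /** Migrates every user on a fixed-size pool and returns the final status per user. */
    static Map<String, Integer> migrateAll(List<String> users, int threads) {
        // Shared status board: workers write to it, the UI thread could poll it.
        Map<String, Integer> status = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String user : users) {
            pool.submit(() -> {
                for (int percent = 0; percent <= 100; percent += 25) {
                    // Placeholder: the real migration step for this user would go here.
                    status.put(user, percent);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("migration did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return status;
    }

    public static void main(String[] args) {
        System.out.println(migrateAll(Arrays.asList("user1", "user2", "user3"), 2));
    }
}
```

The pool size corresponds to your configurable thread count, and no serialization or connection handling is needed to get the status back to the UI.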
It's a completely different thing if your problem domain is too difficult for a single machine to solve. If you really have to use multiple machines for performance reasons, i.e. you have to create a distributed system, then there's (almost) no other way than to use inter-process communication technologies to synchronize and communicate between different processes running even on different machines. But writing a distributed system from scratch (and getting it right!) will most probably be overkill for the average application.
Existing application is anyway doing good, however it proved it may work even better, I will give thoughts on all possible approaches.
That's for sure a good starting point if your application is already performing well. As I already said, I suspect that your only problem could be some technical details which prevent your application from scaling up to run efficiently with even more threads. I'm not an expert here, but a good, understandable example is collection classes. If you share data structures like collections between threads, some collections are better suited for concurrent access and some are not. Some scale well only up to a certain number of threads accessing them and then degrade in performance.
And that's exactly what you should try to find out. Somewhere there's a bottleneck in your application. Besides a bottleneck inside the application itself or with external resources like a database or filesystem there's no obvious reason why multiple processes (running on the same machine) should scale better than an equivalent application running with one process and multiple threads. Using a profiler to locate such a bottleneck would be the best approach to find a starting point for optimization.
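A real profiler is the better tool, but as a quick first check the JDK's own java.lang.management API can tell you how often and how long each thread has been blocked waiting for a monitor. The sketch below is only one possible way to use it, and the class and method names are mine. The idea is to call enable() once at startup and report() after the application has been running under load; threads with large blocked counts or times are candidates for a closer look.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ContentionReport {

    /** Turns on blocked-time tracking if this JVM supports it. */
    static void enable() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true);
        }
    }

    /** Returns one line per live thread with its blocked count and blocked time. */
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            // getBlockedTime() returns -1 when contention monitoring is not enabled.
            sb.append(String.format("%s blockedCount=%d blockedMs=%d%n",
                    info.getThreadName(), info.getBlockedCount(), info.getBlockedTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        enable();
        System.out.print(report());
    }
}
```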
An excellent resource for details is the set of online articles and the book by Brian Goetz. He's definitely what I would call an expert on Java concurrency. Unfortunately this is quite a difficult topic (in my opinion), but it will definitely become increasingly important for writing efficient applications on modern multicore machines.
Thanks Marco for your detailed reply and some pointers. I will have a look at the suggested articles. I will debug the code for potential issues in my thread based model and simultaneously work on a POC of a multiprocess application.