I have a distributed Java application that runs on a cluster. The application is composed of several components, and each component runs on every node of the cluster. The components exchange messages over HTTP and via RPC invocations. The cluster consists of several Linux machines and doesn't use X11. I'm running some performance tests to see if I can improve the performance of my application.
For that, I'm using the top command to see how much memory the application uses, and whether there are infinite loops or deadlocks. I'm also looking at the traffic with the tcpdump command, to see if I can reduce the number of messages exchanged between the components. I've also run static code analysis with PMD and FindBugs to find possible errors or deadlocks.
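For concreteness, the checks above can be sketched from a shell on one node. The process name, network interface, and port 8080 are assumptions here, not facts about the application; adjust them to your setup, and note that tcpdump usually needs root:

```shell
# Hypothetical: grab the pid of the Java component on this node.
PID=$(pgrep -f java | head -1)

# Whole-node snapshot, non-interactive: -b is batch mode, -n 1 takes
# exactly one sample (handy for logging over time from a cron job).
top -b -n 1 | head -15

# Memory and CPU of just our process (RES = resident memory):
# top -b -n 1 -p "$PID"

# Capture the inter-component HTTP messages (port 8080 is assumed);
# -w writes a pcap you can inspect offline with Wireshark.
# tcpdump -i eth0 -w messages.pcap 'tcp port 8080'

# Count captured packets afterwards:
# tcpdump -r messages.pcap | wc -l
```

Writing the capture to a file and analysing it offline is usually easier than reading tcpdump's live output, and it lets you compare message counts before and after a change.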
Overall, I'm trying to run every test I can to check that my application doesn't consume too many resources (CPU, memory, ...) and to see whether its performance can be improved.
1 - But I don't have much experience analysing the behaviour of an application with top or tcpdump. Can anyone give me advice on how to debug programs using top and tcpdump?
2 - Is there any good manual that explains how to debug programs?
3 - Is there any other tool besides PMD and FindBugs for static code analysis, or any other type of analysis?
It looks like you skipped the first step in the process, namely the step where you define what "performance" is. What, exactly, are you trying to improve?
For example, since your system processes messages, you might want to increase the number of messages processed per second, or reduce the average time required to process a message.
Once you have a goal, then everything you do should be in support of that goal. So: reducing the memory footprint of the system, for example. Would that help achieve that goal? Or would it be better to actually increase the memory footprint? Point being, you can go in and optimize things at random, but that may or may not achieve something useful.
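As a hedged illustration of the "messages per second" idea: if each component logs one line per processed message with a timestamp (the log format below is invented for the example), a throwaway awk one-liner can turn a log into a rate you can track across changes:

```shell
# Fake three log lines, two in the same second, then compute the
# average messages-per-second: count lines per timestamp, average
# the per-second counts.
printf '12:00:01 msg\n12:00:01 msg\n12:00:02 msg\n' |
awk '{count[$1]++} END {for (t in count) {sum += count[t]; n++} print sum/n}'
# -> 1.5
```

The point isn't this particular script; it's that once the metric is defined, a number like "1.5 messages/second" gives every later optimization a before-and-after comparison.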
I see that each component takes about 70% of the CPU on its node. I can't tell whether this is due to the data it's processing, or to something wrong in the code, like a deadlock or an infinite loop. The application takes 30 minutes to process 1 GB of data across the whole cluster.
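For what it's worth, sustained 70% CPU usually argues against a classic deadlock (deadlocked threads are blocked and burn almost no CPU); a busy loop is the more plausible suspect. A sketch of mapping a hot thread back to Java code, assuming a JDK is installed on the nodes (jstack ships with it) and using a hypothetical pid:

```shell
# Hypothetical process id -- substitute your component's actual pid.
PID=1234

# 1. List the process's threads sorted by CPU; with -H, the PID
#    column in top's output is actually the thread id (TID).
# top -H -b -n 1 -p "$PID" | head -15

# 2. Dump every Java thread's stack. jstack reports deadlocks
#    explicitly ("Found one Java-level deadlock").
# jstack "$PID" > stacks.txt

# 3. jstack shows native thread ids in hex (nid=0x...), so convert
#    the hot TID to hex and search for it in the dump:
printf 'nid=0x%x\n' "$PID"              # e.g. 1234 -> nid=0x4d2
# grep -A 20 "$(printf 'nid=0x%x' "$PID")" stacks.txt
```

A thread that shows the same stack trace across several dumps taken a few seconds apart is a good candidate for a spin loop or a genuinely hot code path.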
Is this a good point to start improving the performance of an application?
xeon costa wrote:Is this a good point to start improving the performance of an application?
If you defined "performance" as the CPU usage of the application, then yes. Your next step is to decide whether "improving" that performance would require increasing the CPU usage or decreasing it. Although I have to say, that doesn't strike me as a particularly useful performance metric.
xeon costa wrote:What if I look instead at the number of messages exchanged between the components? Is that a useful performance metric?
Well, the application is there to do something, right? So there must be people around who have opinions about how well it does that job. You would use those complaints to figure out what was useful to look at.