Design for high throughput

Edmund Yong
Ranch Hand

Joined: Nov 16, 2003
Posts: 164
Some time ago I went to a job interview, and the interviewer drew a box on a whiteboard representing a module.

The module will take in an input X, which is a queue (could be a message queue or anything) with many messages coming in. The module must be able to handle the high throughput, which could be 100 messages per second or 1000 messages per second.

The module has another input Y, which is an XML file containing business rules. This input can be read only once, when the module initializes.

Based on the business rules read from Y, the module will apply the rules to each message from X, then put the message into either output A or output B.

I was asked to design such a module on the spot. My question is: what kind of answer am I supposed to give? Was the interviewer expecting some class diagrams or sequence diagrams? Or was he expecting some architecture drawings?

And how is this going to address the high throughput problem?

Note that the interviewer said that this has nothing to do with J2EE.
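
For readers following along, here is a minimal plain Java SE sketch of the module described above: rules are parsed from Y exactly once, then applied to each message from X to pick output A or B. The XML layout (a `rules` element with a `default` attribute and `rule` children carrying `contains` and `output` attributes) and the class name `RuleRouter` are assumptions made up for illustration; the interviewer gave no rule format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical rule format (an assumption, not from the interview):
// <rules default="B"><rule contains="urgent" output="A"/></rules>
final class RuleRouter {
    enum Output { A, B }

    // One immutable rule: messages containing `keyword` go to `output`.
    private static final class Rule {
        final String keyword;
        final Output output;
        Rule(String keyword, Output output) {
            this.keyword = keyword;
            this.output = output;
        }
    }

    private final List<Rule> rules;      // never mutated after construction
    private final Output defaultOutput;  // used when no rule matches

    private RuleRouter(List<Rule> rules, Output defaultOutput) {
        this.rules = List.copyOf(rules);
        this.defaultOutput = defaultOutput;
    }

    // Parse input Y once, at initialization.
    static RuleRouter fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            NodeList nodes = root.getElementsByTagName("rule");
            List<Rule> parsed = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                parsed.add(new Rule(e.getAttribute("contains"),
                        Output.valueOf(e.getAttribute("output"))));
            }
            return new RuleRouter(parsed, Output.valueOf(root.getAttribute("default")));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read rules", e);
        }
    }

    // First matching rule wins. No locking is needed because the rules
    // are read-only, so any number of threads can call this concurrently.
    Output route(String message) {
        for (Rule r : rules) {
            if (message.contains(r.keyword)) {
                return r.output;
            }
        }
        return defaultOutput;
    }
}
```

Because the rules never change after start-up, the routing step itself needs no synchronization, which leaves the thread count and the queue hand-off as the main throughput questions.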


SCJP 1.2, SCWCD 1.4
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

I was asked to design such a module on the spot. My question is: what kind of answer am I supposed to give? Was the interviewer expecting some class diagrams or sequence diagrams? Or was he expecting some architecture drawings?

Did you ask him/her? Most questions like this in interviews are not posed looking for a complete solution but to find out how you approach the problem.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Edmund Yong
Ranch Hand

Joined: Nov 16, 2003
Posts: 164
No, I didn't ask him exactly what he wanted. He was also in a hurry to finish the interview, and wanted me to give an answer quickly. I only said that this would require multiple threads. Then he asked me how many threads would be required, and I couldn't answer that. I mean, how would I know? This would depend on factors like the CPU speed.

Do you think that my answer on the multiple threads is at least in the right direction?
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336


I only said that this would require multiple threads. Then he asked me how many threads would be required, and I couldn't answer that. I mean, how would I know? This would depend on factors like the CPU speed.

That might be the point he was hoping you would make: how could you know? I can't be 100% sure of his motives, but when I ask techie questions in interviews I'm not necessarily looking for a 100% complete and correct answer; I'm more interested in the candidate's approach.

What he could be looking for is a way to design it so the number of threads is configurable and controllable. He could also have been looking for you to show some awareness of load balancing and of what to do with unserviceable requests. He possibly wanted you to ask how the system should behave in the event of a failure and talk around the possibilities there.
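
As one possible illustration of "configurable and controllable", the sketch below wraps a standard `java.util.concurrent.ThreadPoolExecutor` with a bounded queue, so the thread count is a parameter and requests that cannot be serviced are counted instead of piling up without limit. The class name `ConfigurableWorkers` and its methods are hypothetical, and simply counting rejections is only one assumed policy among several (blocking the caller, retrying, or diverting to a dead-letter store are others).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: thread count and queue size come from configuration.
final class ConfigurableWorkers {
    private final ThreadPoolExecutor pool;
    private final AtomicInteger rejected = new AtomicInteger();

    ConfigurableWorkers(int threads, int queueCapacity) {
        // Fixed-size pool with a bounded queue; AbortPolicy throws
        // RejectedExecutionException when both are full.
        pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Returns false when the request could not be accepted.
    boolean submit(Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            rejected.incrementAndGet();
            return false;
        }
    }

    int rejectedCount() {
        return rejected.get();
    }

    // Change the thread count at runtime. The order of the two setters
    // matters because core size may never exceed maximum size.
    void resize(int threads) {
        if (threads > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(threads);
            pool.setCorePoolSize(threads);
        } else {
            pool.setCorePoolSize(threads);
            pool.setMaximumPoolSize(threads);
        }
    }

    // Stop accepting work and wait briefly for queued tasks to finish.
    void shutdown() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape the answer to "how many threads?" becomes "whatever measurement says", since the number can be tuned without touching the code.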
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 861
That's a tough interview question. Sometimes questions are simply asked to see how you answer things that really can't be known. The worst thing you can do is to act like you know something you don't.

In this case, it would be impossible to talk about throughput in any detail without knowing about possible bottlenecks. He might have been seeing whether you would mention that you would have to measure performance. I would probably answer such a question by first listing the things that may affect performance and should be accounted for, and explaining why I couldn't fully design it until I knew them. Then maybe follow up with something like, "Under the following assumptions, this is how I would design it." You could even ask whether he was looking for a specific answer or was after your approach.


http://www.jamonapi.com/ - a fast, free open source performance tuning api.
JavaRanch Performance FAQ
Peter Lawrey
Ranch Hand

Joined: Dec 21, 2008
Posts: 62
This sort of question can be a prompt to talk about relevant experience you have. You can say something like: I worked on a project like this, or I wrote this myself as an exercise and it did this, etc. You could talk about how many threads worked for you.
IMHO the question is to see what you have learnt from experience and how you apply it, rather than what questions you might know the answer to.

My answer would be something like:
In the projects I have worked on, high throughput means 10K/s to 100K/s, so I would be tempted to see if I could drop the queue and just process the messages in the source thread(s), i.e. no extra threads.
If the processing is too expensive to do this, you might use one or more threads but having more threads than you have CPUs is unlikely to be useful as it doesn't give you any more CPU power.
If the processing requires an external resource such as a database or a file to make the decision, you could cache this data to improve efficiency.

Note: One of the things I do is challenge the assumptions made in the question. High throughput doesn't mean the same thing for everyone and a queue might not be required. I can do this because I have worked on a project where I removed two queues (and two threads) to improve efficiency.
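
A rough sketch of the three points in the answer above, under stated assumptions: `process` is meant to be called directly by the source thread (no queue, no hand-off), lookups against a slow external resource are cached in a `ConcurrentHashMap`, and `workerCap` limits any extra threads to the number of CPUs the JVM reports. The class and method names are hypothetical, and the `slowLookup` function merely stands in for a database or file read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper illustrating inline processing with a cache.
final class InlineProcessor {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowLookup; // stand-in for a database or file
    private final AtomicInteger lookups = new AtomicInteger();

    InlineProcessor(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // Called by the source thread itself. The slow lookup runs at most
    // once per key; later calls are served from the cache.
    String process(String key) {
        return cache.computeIfAbsent(key, k -> {
            lookups.incrementAndGet();
            return slowLookup.apply(k);
        });
    }

    // How many times the external resource was actually consulted.
    int lookupCount() {
        return lookups.get();
    }

    // If extra threads turn out to be needed, cap them at the CPU count,
    // since more threads than CPUs adds no CPU power.
    static int workerCap(int requested) {
        int cpus = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, cpus));
    }
}
```

The CPU cap applies to CPU-bound work; if threads spend most of their time blocked on I/O, a different limit may be appropriate, which is again something to measure.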
Don Solomon
Ranch Hand

Joined: Jul 20, 2008
Posts: 48
Possibly a quick sketch of a producer/consumer pattern would have been all you needed, along with a note that the pattern should be bounded to take CPU resources into account.
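
A bounded producer/consumer of the kind mentioned here might look roughly like the following, using `java.util.concurrent.ArrayBlockingQueue`. The class name `BoundedPipeline`, the `String` message type, and the poison-pill shutdown are all assumptions chosen to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical bounded producer/consumer pipeline.
final class BoundedPipeline {
    // Sentinel compared by identity, so no real message can equal it.
    private static final String POISON = new String("POISON");

    private final BlockingQueue<String> queue;
    private final List<Thread> consumers = new ArrayList<>();

    BoundedPipeline(int capacity, int consumerCount, Consumer<String> handler) {
        queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < consumerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Take messages until this consumer receives a poison pill.
                    for (String m = queue.take(); m != POISON; m = queue.take()) {
                        handler.accept(m);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "consumer-" + i);
            t.start();
            consumers.add(t);
        }
    }

    // Blocks while the queue is full, which slows the producer down
    // instead of letting memory grow without bound.
    void publish(String message) {
        try {
            queue.put(message);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while publishing", e);
        }
    }

    // Send one poison pill per consumer, then wait for all of them to exit.
    void close() {
        try {
            for (int i = 0; i < consumers.size(); i++) {
                queue.put(POISON);
            }
            for (Thread t : consumers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while closing", e);
        }
    }
}
```

The capacity and consumer count are the two knobs: the first bounds memory, the second bounds how much CPU the module tries to use.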


Software development is an exercise in thinking not coding.
Edmund Yong
Ranch Hand

Joined: Nov 16, 2003
Posts: 164
Hi Don,

I had thought of the producer/consumer pattern during the interview. However, the messages in the incoming queue can stay there as long as they want, so there is no urgent need to grab them off the queue before they disappear. Therefore, there is only the consumer: the thread that takes the messages from the queue and then processes them.
 