JavaRanch » Java Forums » Java » Performance
Interview ques

Sri Desi
Greenhorn

Joined: Oct 31, 2011
Posts: 1
I have been asked this question in a couple of interviews: "A Java program reads records from the database and processes them. It takes about 6-7 hours to complete the process. How can you improve the performance and decrease the time to 10-15 minutes?" Any help would be greatly appreciated.
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
Perhaps use multiple threads: one thread reads the records from the database while another thread processes them - something like producer-consumer.

But let's wait for the "Expert" answers
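A minimal sketch of that producer-consumer idea, with an in-memory list standing in for the database read (the actual schema and processing aren't given, so the "work" here is just a placeholder):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    // Unique sentinel object; compared by identity so no real record can collide with it.
    private static final String POISON = new String("POISON");

    public static List<String> run(List<String> records) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        List<String> processed = new ArrayList<>(); // safe: single consumer, and we join() before returning

        // Producer: stands in for the thread reading rows from the database.
        Thread producer = new Thread(() -> {
            try {
                for (String r : records) queue.put(r);
                queue.put(POISON); // signal end of input
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: processes each record as it arrives.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String r = queue.take();
                    if (r == POISON) break;           // identity check against the sentinel
                    processed.add(r.toUpperCase());   // placeholder for real processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return processed;
    }
}
```

With one producer and one consumer, the bounded queue keeps the reader from racing far ahead of the processor; whether this actually helps depends entirely on where the real bottleneck is.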
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    

I think this is a trick question, assuming you didn't get any additional information. You don't know the cause of the performance problem; you don't even know whether there is a performance problem (you only know that someone claims the process can be made 30 or 40 times faster). In the real world, you need to find out what the bottleneck is and then offer solutions, if there are any at all.

If multithreading really were the correct answer, their box would have to be equipped with some 30 or 40 cores, probably more. That is still quite an uncommon configuration. Moreover, the database or network might not be able to serve the records fast enough to keep forty CPUs busy all the time.
Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4372
    

I think Martin's right. I'm no expert on performance (or interviews), but I suspect what a sensible interviewer would be looking for is an explanation of how you'd go about solving the problem - starting with profiling it to track down exactly where the bottleneck is.

After you've explained your approach, there wouldn't be any harm in following it with "here are some potential causes that I wouldn't be surprised to find, and if they turn out to be true, here's what might help" (e.g. finding the same query made many times with the same parameters - cache the results). Which might also help you out with a less well-informed interviewer who thinks there is a correct answer.
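That "same query made many times with the same parameters" case can be sketched with a simple memoizing cache; the loader function here is a stand-in for the real (expensive) database query:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class QueryCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stands in for the expensive database query
    int misses = 0;                      // how many times we actually hit the "database"

    public QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Return the cached result, or run the loader once and remember it.
    public V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            misses++;
            return loader.apply(k);
        });
    }
}
```

If the process really does issue the same lookup thousands of times, a cache like this (or the equivalent feature in an ORM) can remove most of the round trips; profiling is what tells you whether that pattern actually exists.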

Chris Bicnal
Ranch Hand

Joined: Aug 17, 2005
Posts: 78
    
How about adding an index (or indexes) to the data you're reading - this would improve performance!

What's interesting about this question is that they give you the time frame they want to get the process to. If they were after a general 'how would you approach this problem' type of answer I'm not sure they would bound it with numbers like they did.

I could be wrong though....

Chris
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
This is a silly question and a silly post, in my opinion. It was probably posted by a recruiter looking for answers to an interview question they may want to ask.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    

Chris Bicnal wrote:How about adding an index (or indexes) to the data you're reading - this would improve performance!


This is one approach that might work. Of course, there may already be a proper index or indexes on the data, in which case it won't help.

I can make any program run in half the time (twice as fast). But the first step is to find out where it is slow, rather than making wild guesses.
Rohan Dhapodkar
Greenhorn

Joined: Jun 27, 2011
Posts: 23
JDBC Performance:
One point many developers forget while using plain JDBC, or even Hibernate (which internally uses the JDBC APIs), is that properly setting fetchSize on a JDBC Statement can improve throughput considerably (about 10x). The default fetch size is 20, which means that in a single network call the JDBC driver will fetch only 20 records, and the next 20 records in another network call. What if I am reading millions of records in those 4-5 hours? Increasing the fetch size from 20 to 500 or even 1000 will reduce the IO/network calls by a factor of 25 or 50. This will need more memory to hold those 500 or 1000 records, but I can accept that for the throughput.
The above point is definitely worth trying. Another interesting point is fetching CLOB/BLOB columns in a ResultSet. Using CLOB/BLOB in a ResultSet will force the JDBC driver (at least Oracle's, which I know about) to fetch only one record per network call, so fetching BLOB/CLOB columns results in more network calls. If a BLOB/CLOB column is not actually used, remove it from the ResultSet and you will see performance improve.
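The round-trip arithmetic behind this point is easy to check: streaming N rows at fetch size F takes roughly ceil(N / F) driver round trips, so going from 20 to 1000 cuts the calls by a factor of 50 (the row count here is illustrative):

```java
public class RoundTrips {
    // Approximate number of driver network calls needed to stream n rows
    // when the driver fetches fetchSize rows per round trip.
    static long networkCalls(long rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000;
        System.out.println("fetchSize=20   -> " + networkCalls(rows, 20) + " calls");   // 50000
        System.out.println("fetchSize=1000 -> " + networkCalls(rows, 1000) + " calls"); // 1000
    }
}
```

Whether fewer round trips translates into real wall-clock savings depends on the per-call latency, which is exactly what Pat's measurements below put to the test.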
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    

Rohan Dhapodkar wrote: properly setting fetchSize on JDBC Statement, improves throughput considerably (about 10x).


This is an interesting idea, so I thought I'd check it out. A quick Google search shows that the setFetchSize() parameter's default value is implementation specific, and is different for MySQL, Oracle, SQL Server, etc.

It is also clear that one can't reliably make such broad generalizations, as a large value may result in the DBMS returning so much data that your program has to garbage collect constantly.

I wrote a simple test program against a fairly large MySQL database that I'm working on for my day job. A snippet of the code looks like:
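The original snippet didn't survive here; a sketch consistent with the description that follows (the table name, columns, and connection details are assumptions, not Pat's actual schema) might look like:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Random;

public class FetchSizeTrial {
    // The three settings under test: 0 means "use the driver's default".
    static final int[] SIZES = {0, 1000, 10_000};

    // Pick one of the three fetch sizes with equal probability.
    static int chooseFetchSize(Random rnd) {
        return SIZES[rnd.nextInt(SIZES.length)];
    }

    // Time one full read of the table at the given fetch size, in milliseconds.
    static long timedRun(Connection conn, int fetchSize) throws SQLException {
        long start = System.nanoTime();
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(fetchSize);
            // Hypothetical table/columns standing in for the real query.
            try (ResultSet rs = st.executeQuery("SELECT id, payload FROM records")) {
                while (rs.next()) {
                    rs.getString("payload"); // touch each row so it is actually fetched
                }
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Repeated random trials, so any DBMS caching is exercised fairly.
    static void runTrials(Connection conn, int trials) throws SQLException {
        Random rnd = new Random();
        for (int i = 0; i < trials; i++) {
            int size = chooseFetchSize(rnd);
            System.out.println("fetchSize=" + size + " took " + timedRun(conn, size) + " ms");
        }
    }
}
```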



As you can see, it loops, picks a random number, and switches between a setFetchSize() parameter of zero (use the default), 1000, or 10,000, each a third of the time. I use a random selection, repeated, so we can fairly test any caching logic that the DBMS may have.

At least for my testing with MySQL on a fast quad-core MacBook Pro, the answers are not obvious.


Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    

Oops, my earlier test had some bugs in the libraries that I was using. I have fixed that and re-run the code. The results are clear: there is no meaningful difference.


I wrote a simple test program against a fairly large MySQL database that I'm working on for my day job. The results of testing with a setFetchSize() of 0, 1000, and 40000 are:



Again, I picked a random number and did one of the three tests. You can see that I ran at least 28 trials, yet no matter what value I used, the runtime was within 7 milliseconds, with a small standard deviation. These values are so close together as to be within the measurement error.

So my tests with MySQL show that there is no advantage, none at all, to setting the fetch size.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

Pat Farrell wrote:So my tests with MySQL show that there is no advantage, none at all, to setting the fetch size.


Yeah, there are a lot of "simple solutions" out there which people freely pass around to each other. Many of which are... um... of limited utility.

However would it be fair to guess that your tests were done in an environment with MySQL running locally? And that the results might have been different if the server was running elsewhere on the network? I wouldn't bet much money on the latter, frankly, but that's just guesswork, which isn't good practice in the performance-improvements world.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    

Paul Clapham wrote:However would it be fair to guess that your tests were done in an environment with MySQL running locally? And that the results might have been different if the server was running elsewhere on the network?

Yes, and yes: there is some chance that the results would be different if there were real latency between the client and server.

My tests are good enough for me to say: (1) the proposed change is not always useful, and (2) I won't bother to test or look at this further.

As always, when there is a real performance problem, I will profile and then see if this helps. But I'm not gonna hold my breath.
Deepak Bala
Bartender

Joined: Feb 24, 2006
Posts: 6661
    

Welcome to CodeRanch, Sri.

Like the others have mentioned, the question is quite open ended. You can drill down further by asking more questions like...

1. How much time is spent on database queries?
2. What is the network latency between the database and the application server?
3. What sort of processing is done on the data?
4. etc., etc.

That can lead to an interesting discussion.


Lakshman Arun
Greenhorn

Joined: Sep 24, 2008
Posts: 19
A couple of thoughts:

1. Use connection pooling
2. Define a proper index on the table
3. Read only the required columns (in most cases we don't need to fetch all the column values)
4. Use the producer-consumer model: one set of threads reading from the db, another set of threads processing
5. Check CPU utilisation while running the Java program; if CPU utilisation is very low, increase the number of threads in the program. This should speed things up. If there is no improvement even after increasing the threads, take a heap dump of the application and check whether there is a memory leak.
6. Using profilers, check which method/part of the program takes the longest time and address it first.
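Points 4 and 5 above can be sketched with a fixed pool sized to the machine's cores; the per-record work here is a placeholder for whatever the real processing is:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelProcessor {
    // Process records on a thread pool sized to the available cores.
    public static int process(List<String> records) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (String r : records) {
            pool.submit(() -> {
                r.hashCode();            // placeholder for real per-record work
                done.incrementAndGet();  // count completed records
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }
}
```

Sizing the pool to availableProcessors() is a reasonable starting point for CPU-bound work; if the work is mostly waiting on the database, more threads than cores may help, which is exactly why checking CPU utilisation first matters.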
Lakshman Arun
Greenhorn

Joined: Sep 24, 2008
Posts: 19
7. In the producer-consumer model, we can make the consumers run in a distributed environment (using JMS worker threads as consumers). This way we can scale the application much better.
 