JavaRanch » Java Forums » Java » Threads and Synchronization

Threads and connection pool

Hanna Habashy
Ranch Hand

Joined: Aug 20, 2003
Posts: 532
hi all:
Based on my understanding of threads and connection pools, I have a theory, which I don't know is correct or not.
Assuming a single-processor machine, only one thread can be executed at a time. Threads alternate, but only one runs at any given moment. For simplicity, consider a web application with one servlet. Inside this servlet there is a single method that accesses the database.
Here is the question:
1- Would it be more efficient to create a single connection object from the connection pool during application initialization and leave it open, so all threads can access it sequentially?
2- Or is it more efficient to create new connection and statement objects inside the method, and close the connection before the method returns?
From my understanding of how threads work, case one should eliminate unnecessary creation of objects every time a thread calls this method, and overall should give better performance.
I did a test on both cases, and I was surprised that case number 2 won. The time needed to create about 5000 connections across 5 threads was much less than in case number 1.
Any help in understanding what is going on, or any ideas, would be appreciated.
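To make the two cases concrete, here is a minimal sketch of the comparison. It uses a made-up FakeConnection class whose sleep stands in for database I/O (not a real JDBC API), so it only illustrates the structure: case 1 serializes all threads on one shared connection, case 2 "opens" a fresh connection per call.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a JDBC connection; the sleep simulates waiting on the database.
// FakeConnection and query() are illustrative names, not a real API.
class FakeConnection {
    void query() throws InterruptedException {
        Thread.sleep(1); // pretend to wait on the database/disk
    }
}

public class PoolDemo {
    static final AtomicInteger done = new AtomicInteger();

    // Case 1: one shared connection - all threads must take turns on it.
    static void sharedConnection(int threads, int queriesPerThread) throws Exception {
        FakeConnection conn = new FakeConnection();
        runAll(threads, queriesPerThread, () -> {
            synchronized (conn) { // only one thread may use it at a time
                try { conn.query(); } catch (InterruptedException ignored) {}
            }
            done.incrementAndGet();
        });
    }

    // Case 2: each call "opens" its own connection - no shared lock.
    static void perCallConnection(int threads, int queriesPerThread) throws Exception {
        runAll(threads, queriesPerThread, () -> {
            FakeConnection conn = new FakeConnection();
            try { conn.query(); } catch (InterruptedException ignored) {}
            done.incrementAndGet();
        });
    }

    static void runAll(int threads, int n, Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads * n; i++) pool.submit(task);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        sharedConnection(5, 20);
        perCallConnection(5, 20);
        System.out.println(done.get()); // 200 queries completed in total
    }
}
```

In a real application, case 2 would normally borrow from a pool rather than open a raw connection each time, which is what makes it cheap in practice.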


SCJD 1.4
SCJP 1.4
-----------------------------------
"With regard to excellence, it is not enough to know, but we must try to have and use it." Aristotle
Corey McGlone
Ranch Hand

Joined: Dec 20, 2001
Posts: 3271
This is a question better suited for the Threads and Synchronization forum - I'm moving this post there.


SCJP Tipline, etc.
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
Database queries tend to have to access the disk. Since disk is slower than memory, this is likely to be the limiting factor to speed.
With only one connection, each thread has to wait until the previous disk access completes before it can start its query. With multiple connections, things like the database setting up the query plan for later queries can proceed in parallel with the earlier queries' actual disk access. In addition, the disk cache may be used more efficiently with multiple queries in parallel. That's one thing that might make multiple connections more efficient even on single processor machines.
However, with multiple connections, you may need to pay a little attention to database transaction integrity to avoid corrupting the database.
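The overlap Warren describes can be sketched with plain threads. The sleeps below stand in for blocking I/O (an assumption, not real disk access): with one worker the waits run back to back, while with several workers the waits overlap, even on a single CPU, because a sleeping thread leaves the processor free.

```java
import java.util.concurrent.*;

// Rough illustration: a thread blocked on I/O (simulated with sleep)
// does not occupy the CPU, so another thread's wait can overlap with it.
public class OverlapDemo {
    static long timeTasks(int threads, int tasks, long waitMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try { Thread.sleep(waitMs); } catch (InterruptedException ignored) {}
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    public static void main(String[] args) throws Exception {
        long oneThread  = timeTasks(1, 4, 100); // waits run sequentially: ~400 ms
        long fourThreads = timeTasks(4, 4, 100); // waits overlap: ~100 ms
        System.out.println(oneThread > fourThreads);
    }
}
```

The same reasoning does not apply to CPU-bound work, which is the crux of the disagreement later in this thread.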
Hanna Habashy
Ranch Hand

Joined: Aug 20, 2003
Posts: 532
hi Warren
Thanks for your reply. Your explanation makes sense to me. So, if we had a disk as fast as main memory, would one connection be sufficient?
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

Well, if there is only 1 CPU, sequential execution will be faster than simulated concurrent execution - especially considering the multi-threaded approach involves opening and closing a connection several times, vs. just once for the single-threaded model.
Your results could be related to your OS's thread-scheduling model. For example, say you have 100 threads active on your computer, 1 of which is your program. Your thread may get 1/100 of the overall time. But if you have 5 threads, you could get 5/100 of the CPU's time, which would make your program run faster overall because it's getting more CPU time.
This is cheating.
When I was at university, I tested lots of code for my professor. He would give me some time late at night when he would end many of the system processes on a particular computer, so my results would make more sense.
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

Originally posted by Warren Dew:

With only one connection, each thread has to wait until the previous disk access completes before it can start its query. With multiple connections, things like the database setting up the query plan for later queries can proceed in parallel with the earlier queries' actual disk access.

No, the problem was qualified as being on a single-CPU computer, so there is only simulated parallelism; nothing can actually be done in parallel.
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
Originally posted by Hanna Habashy:
Thanks for your reply. Your explanation makes sense to me. So, if we had a disk as fast as main memory, would one connection be sufficient?

If we had a disk as fast as the processor cache, yes, I would expect one connection to be as fast - or faster because we save on connection setup time. Databases are pretty complex pieces of software, though, so I could conceivably be missing something.
CL Gilbert - the thing is, the thread that's waiting on disk access doesn't need even the one CPU - it's blocked on the disk controller. At that point, the CPU is idle unless there's another thread to take advantage of it.
Warren
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

Originally posted by Warren Dew:

CL Gilbert - the thing is, the thread that's waiting on disk access doesn't need even the one CPU - it's blocked on the disk controller. At that point, the CPU is idle unless there's another thread to take advantage of it.
Warren

Right, but the other threads are trying to do the same thing, so they will all be blocked too. And I don't think you want to spawn an extra thread to take advantage of the small period of time the disk is spinning up.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[Warren]: the thing is, the thread that's waiting on disk access doesn't need even the one CPU - it's blocked on the disk controller. At that point, the CPU is idle unless there's another thread to take advantage of it.
[CLG]: Right, but the other threads are trying to do the same thing. So they will all be blocked too. And I dont think you want to spawn an extra thread to take advantage of the small period of time the disk is spinning up.

Well, for starters, the other threads will often be able to get all their non-I/O processing taken care of during the periods while all I/O processing is blocked waiting for the controller. More importantly, if we allow multiple threads to register their I/O needs with the disk controller, the controller gets a chance to optimize its movement and cache usage to satisfy all requests in the most efficient manner possible.

E.g., if I have a single truck full of widgets and I need to deliver widgets to many different customers throughout the city, I probably do not want to simply deliver each widget in the order the orders were received - that leads to a lot of cross-town trips. Instead, I'd like to look at all the orders received so far, find the ones near where I am now, and deliver them first - then move to another nearby area and deliver everything requested from that neighborhood, and so on, making sure I eventually cover the whole city while minimizing cross-town trips. A hard disk controller can do something similar, I think.
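The delivery-truck idea above is essentially the classic elevator/SCAN scheduling policy. Here is a toy sketch (plain integers stand in for track positions; this is not a real disk scheduler): serving requests in arrival order vs. sorting them into one sweep.

```java
import java.util.*;

// Toy comparison of arrival-order service vs. a single sorted sweep.
// Positions are arbitrary integers standing in for disk tracks.
public class ElevatorSketch {
    // Total head movement when requests are served in the order they arrived.
    static int fifoDistance(int head, List<Integer> requests) {
        int dist = 0;
        for (int r : requests) {
            dist += Math.abs(r - head); // travel from current position
            head = r;
        }
        return dist;
    }

    // Total head movement when the same requests are sorted into one sweep.
    static int sortedDistance(int head, List<Integer> requests) {
        List<Integer> sorted = new ArrayList<>(requests);
        Collections.sort(sorted);
        return fifoDistance(head, sorted);
    }

    public static void main(String[] args) {
        List<Integer> arrivals = List.of(90, 10, 80, 20);
        System.out.println(fifoDistance(0, arrivals));   // 300 units of movement
        System.out.println(sortedDistance(0, arrivals)); // 90 units of movement
    }
}
```

The controller only gets this optimization opportunity if several requests are pending at once - which is exactly what multiple connections provide.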


"I'm not back." - Bill Harding, Twister
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

Disagree.
There will already be an OS-level thread spawned which is going to have to control loading the data from the file into a memory image. This will already be taking place in the background while your first thread is blocking. So your other threads will be sharing CPU time with that thread, which likely has a much higher priority.
You can spawn threads for responsiveness on a single CPU system, but not for performance. If you do manage any performance gain it will be due to flaws or compromises in the OS thread scheduler.
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
Originally posted by Jim Yingst:
Well for starters, the other threads will often be able to get all their non-IO processing taken care of during the periods while all IO processing is blocked waiting for the controller.

Yes. A little arithmetic may be in order here.
Typical disk drives today spin at 5400-7200 rpm, or somewhere around 100 rotations per second. That's 10 milliseconds per rotation; on average, it takes half a rotation to bring the data under the read head, which is 5 milliseconds, or alternatively, 5,000 microseconds or 5,000,000 nanoseconds.
A typical CPU today runs at a clock speed in excess of 1 GHz (1 cycle per nanosecond) and can dispatch multiple instructions per clock cycle. Some of those instructions are speculative, but I think it's reasonable to assume that we get around one useful instruction per clock cycle. That means that in the time the disk platter spins half a rotation to bring the data under the read head - and this is ignoring actual reading - there's time for the CPU to execute 5,000,000 instructions!
Granted, one line of code typically translates to more than one instruction - maybe as many as 10 instructions in Java, since the bytecode has to be translated to machine code - but 500,000 lines of code is not insignificant, and is far more than needed to service the disk read. Why not take advantage of some of that time to service another thread?
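The arithmetic in the two paragraphs above can be written out directly, using the same rounded figures from the post (~100 rotations per second, a 1 GHz CPU retiring about one useful instruction per cycle):

```java
// Back-of-the-envelope: CPU cycles available while waiting for an
// average rotational seek (half a rotation). Figures are the post's
// rounded assumptions, not measurements.
public class SeekMath {
    static long idleCyclesPerSeek(long rotationsPerSecond, long cyclesPerSecond) {
        double msPerRotation = 1000.0 / rotationsPerSecond; // ~10 ms per rotation
        double avgLatencyMs  = msPerRotation / 2;           // half a rotation, ~5 ms
        return (long) (avgLatencyMs / 1000.0 * cyclesPerSecond);
    }

    public static void main(String[] args) {
        // ~5,000,000 cycles the CPU could spend on other threads per seek
        System.out.println(idleCyclesPerSeek(100, 1_000_000_000L));
    }
}
```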
More importantly if we allow multiple threads to register their IO needs with the disk controller, the controller gets a chance to optimize its movement and cache usage in a manner to satisfy all requests in the most efficent manner possible.

Yes. For example, it might be able to schedule reads so that it can service more than one read during a single disk rotation - which can speed things up quite a bit.
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

I don't see how this condition can ever exist. Let's take disk reading out of the equation, as a well-written program would have.
The question was why multiple threads appeared to perform better than a single thread, even when those threads are running sequentially.
I started with the assumption of well-written code. This means that the single thread should have processed all available data before it went to the resource. In the case of multiple threads, though the work is divided, there is still the same finite amount of work which can be performed before you need the resource. So perhaps you divide your job into multiple threads, but they are just doing the same amount of work the single thread had to do - without parallelism. This, as the original poster indicated, should make the multiple-thread approach take longer.
My contention is that it may not take longer, because the time your process gets on the CPU may be increased simply because you have more threads.

[ March 11, 2004: Message edited by: CL Gilbert ]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I'll just cite an example. I have a little program that downloads log files from a series of web servers. There are many, numbered like log.001, log.002, etc. When I spin the downloader class into more threads, my network use goes way up. The servers have different hardware and network connections, so I haven't tried to see if I get something linear like bytes-per-second * threads, but it's surely better than running one at a time. This code is probably ideal for threading - it spends a lot of time waiting for HTTP GET and some waiting for disk writes. A CPU-intensive program, computing pi to a zillion places, would be a very bad candidate for threads on a single CPU.
Orthogonal topic: Has anybody read about pipelining as an alternative to threading? Say you have 100 requests come into a web server. Create a command that does the first little bit of processing and queue up 100 of them. Some component does all 100 of them and then queues up a new command for the next component. It does the 100 commands and queues up another command for the next component. I read a PhD thesis online and his theory was that the processor could fetch and optimize a tiny bit of code and get a benefit from running it many many times. With threads the processor is jumping around from one bit of code to another, throwing out all of its optimizations to make room for the next. He claimed it degraded more smoothly and lost fewer requests at very high loads approaching saturated CPU. Sounded like any number of university types are pursuing it.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

I think those are good ideas, but they don't have anything to do with threading. They are at the compiler-optimization level, I would expect - especially since each CPU has its own pipeline.
 