File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes Threads and Synchronization and the fly likes Threads and DB Access Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


JavaRanch » Java Forums » Java » Threads and Synchronization
Bookmark "Threads and DB Access" Watch "Threads and DB Access" New topic
Author

Threads and DB Access

Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Hi,

I need to write a program that will process records in a batch basically it look for any records being added in a Database table e.g. Requests. If there are any records picks them up and calls a Web Service gets a response and inserts in some other table e.g Response, updates the column in Requests table so next time the same record is not picked up again. Now the issue is I may have over 8000 records that needs to be processed in a reasonable time and am trying to figure out a better solution so that all my records gets processed in a reasonable time, the Web Service may take 10-15 seconds to send the response back. I will have this program to run as a windows service all the time. Just trying to find out if it's ok to poll on the database to check for new records and in case of errors how do I make sure my program that runs as a service keep running. Or what else can be a better solution.

In my scenario my batch process run every night. From there I need to process all the records and send them to a Web Service that responds back and insert the response in a separate table. The same process can be run in a real time on demand i.e. from a UI a user can select the submit button and behind the secenes the same route is taken inserts a record in a request table and gets the response the whole cycle takes arround 10-15 seconds. My concern is in a batch mode when I have so many records that all needs to be processed within a reasonable time. So wondering what is my best way to do this. As the program will run 24x7 all the time as a Windows Service so wondering if I can make use of threads and have 10 threads at one point running now I don't know how to make use of Database Connection within threads and how often to release a connection and gets a new connection.

Any help in this is really appreciated. If using a program to poll on a table are there be any issues with the database connections and using threading if I have 8000+ records what are chances of failure etc.

Just looking for some bright ideas that anyone can share.

Thanks


MB<br />Sun Certified Programmer for Java2 Platform
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    
    8

You keep mentioning this "reasonable time" but you don't say what it is. And you don't say how long it takes to process 8,000 records. So first of all you need to know those two numbers.
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Reasonable Time is want to process all the records within 4 to 5 hours as if there are 8000 records and if each request takes 10 seconds then processing will takes 8000x10=80000 i.e. 80000 seconds means 22 hrs.

Just trying to find out how can process multiple records in Parallel and also keep track of the errors plus Database connections.

Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
It's a pretty simple matter to spread a job like this across a fixed number of threads. You can tune the actual number over time based on experience.

You definitely want the runnables that do all the work to use a ConnectionPool. Just thinking out loud .. you could cause each Thread in the pool to have its own connection. Maybe a reason to extend Thread after all?
[ December 27, 2006: Message edited by: Stan James ]

A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Thanks,

Can you please elaborate on:

You definitely want the runnables that do all the work to use a ConnectionPool. Just thinking out loud .. you could cause each Thread in the pool to have its own connection. Maybe a reason to extend Thread after all?


So if I have 10 threads should I have 10 connections. If so how will it affect on the Database side is it okay to have that many connections.

Thanks
Mukesh Mittal
Ranch Hand

Joined: Jan 13, 2006
Posts: 31
You can open 10 connection.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Yes, any database server ought to handle a fair number of concurrent connections. Deciding how many to use is part of the tuning I mentioned. As you add more threads you might saturate the CPU with Java. Or you might get enough connections and insert/update transactions going on to hit the database's limits. Try to scale the thread pool up and see what breaks first or turns into a bottleneck.

Re a connection per Thread ... the Executor lets you specify a thread factory to create new threads. You could provide a factory that creates a subclass of thread, say ConnectedThread, that happens to have a database connection. Then your Runnable could go through the ConnectedThread to get a live connection.
Tony Burleson
Greenhorn

Joined: Dec 31, 2006
Posts: 6
Another possible solution is to use batch loading in conjuction with a threaded solution. If the input data file could be "chunked", each chunk could then be batch loaded from a single thread. This might improve performance by effieciently using the open connection. But, this means that your JDBC driver supports batch loading. I have hit my head on that brick wall a few too many times.
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Hi,

Happy New Year to all.

I am trying to build a program that will run constantly 24x7x365 all the time. Following code is just a test case that I am trying to make sure
works and that I can take it as a template. Just need some feedbacks how can I improve this and what are the drawbacks or points or issues that
I may run into and how to handle those or should I take some other approach. How will this approach effect on the performance or CPU utilization
and how can I make it better.

Here is a sample code for Threads I got from http://www.informit.com/articles/article.asp?p=30483&seqNum=6&rl=1.
I modified a little for my test use. Basically it polls on a table and picks 5 records a time then create 5 threads that
calls a Web Service that takes arround 20 seconds to response back and then I update the table with the response. Once 5 threads are
completed next 5 records if any are picked-up and so on. If there are no records to pick up I keep polling but every 20 seconds.

Done.java


ThreadPool.java


TestThreadPool.java


TestWorkerThread.java


Any feedbacks are really appreciated.

Thanks
Tony Burleson
Greenhorn

Joined: Dec 31, 2006
Posts: 6
Hi Mike,
I am sorry I mis-understood your previous post. I have brain dump of questions about your situation. Essentially, I am trying to find the pressure points to push.

A message comes into the system [T0] (i.e. saved to the REQUESTS table). On the next polling cycle, the message is �gathered� up with any other new messages [T1]. Those messages are then sent off to a webservice (one at a time?) which takes about 10-15 seconds for a response [T2]. The response from the webservice is then saved to the database [T3].

T0 means time zero
T1 means time one

T1 � T0 = ? seconds
T2 � T1 = 10 to 15 seconds
T3 � T2 = ? seconds

Total Time (T3 � T0) = ? seconds (min, max, average)

What is the rate of records coming into the REQUESTS table?
How many records can the web service accommodate per request? 1, 10, 1000?
How many requests can the web service accommodate at one time? 1, 10, 1000?
At what rate are you polling the REQUESTS table?
Do you have the control or the access rights to intercept when a message is saved into the REQUESTS table?

From the behavior you were describing, the first table, the REQUESTS table is acting like a Message Queue. From your main concern seems to be the webservice time cost, not necessarily the connection and threading issues to your internal database. If we assume that it would take 0 time to do the create and update transactions internally, would the total time still be significantly close to the Total Time above? In an ideal world, how can you maximize your utilization of the webservice within the constraint of messages coming into the system? For example, could you queue the first 50 in memory and then do a webservice batch job? Would it be more efficient to process a message the moment it came into the system (before it is saved to the database)?

I hope you can correct any misinterpretations.

Thanks!
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Thanks for the feedback. Here is some more info:

1. On nightly bases every night on database side a job runs and may insert any number of records i.e. 8000+. Also users can explicitly create a record in a request table while working through UI and in a second 1000 users may put 1000 records in a request table. So at any given point there may or may not be any records.

2. In each request Web Service can only handle one record a time. I can't bunch multiple records so always have to send one record per web service call.

3. At any given point Web Service can accomodate over 100 requests.

4. I am polling the table for new records every 10 seconds. But if there are some requests being processed I wait for all the threads to finish before it polls for new records. But if there are no records in a table I still poll every 10 seconds and want to reduce this to 5 seconds.

So wondering how many threads can I have at any given point so performance wise, cpu utilization point all is acceptable. Plus in case of errors or exceptions what should be my route.

Also when making a web service call I am passing in a large xml and gets back a large xml that I have to process and save it in the database. Just wondering if connections to the database and threading is ok, is it fine to poll on the database so often and can I have that many connections open/close, how will it affect the database and is it going to increase the cpu utilization etc.

Any help is really appreciated.

Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Are you required to meet a service level expectation? That is, does any request have to be processed in n seconds? Or would it be ok to run some minutes behind in peak hours and catch up in off hours?

At a major rewrite level, see if you can use message queuing. It's designed for just this kind of thing and solves a lot of problems. If you can't use that, you'll have to solve most of those again in your own code.

Here are some major topics to think about ...

1) Schedule the database queries. Java Timer is fine for this. You can make it run n minutes after the last run completed. So if you're set to run every minute and one run takes five minutes Timer will schedule the next run one minute after the long one finishes. I'd go for the longest interval I can get away with ... make sure you know what the SLA is.

2) Query the database. You'll need to make sure that any row gets picked once and only once. Maybe query and change the status from "available" to "in progress" in a single transaction.

3) Execute the remote operations. Here's where I'd get into threads to execute some number of these in parallel. We can't guess at the number of threads that will help you and the number that will thrash and get nothing done. I hope the other load on the server is relatively steady, or you may have real trouble tuning this.

4) Update the results db and change status from "in progress" to "done". In the worker thread as the remote operation finishes.

5) Handle errors. What if the remote operation or db update at the end fails? What if the server crashes and leaves a lot of stuff in "in progress" status. What happens if you execute the remote call twice for the same row ... disaster or nothing?

Do those topics make sense? Which ones have you figured out, which still give you trouble?

PS Another option in the timing and querying stuff ... We have something that works much like this. When it runs it doesn't do just one query. It does the whole query-process-update cycle in a loop until it gets no rows. Only then does it go back to sleep for a while. That cuts down the delay if it's running behind.
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Thanks for the feedbacks,

The reason I am not using message queues is due to retry as using the current approach and resetting the flag in Request table makes that record to be processed over again. As in batch mode a sql stored procedure is called when a job runs and inserts records in a Request table where the java program polls and picks them up. If want to process any record it's easier to just reset a flag in a table and the record is processed again rather than starting all over and queuing a message.

Each record is picked-up once as when a response is updated the record in request table is also gets updated at that time so next time it's not picked-up again. Now if a web service fails or any thing goes down restarting the java service starts picking up the records again and so this will eliminate lots of records being "in progress" state.

Just trying to figure out if I keep getting some kind of errors in the threads how to detect that in the main loop so can send some kind of notification to Admin rather than sending notifications in the threads.

Thanks for any feedbacks
Tony Burleson
Greenhorn

Joined: Dec 31, 2006
Posts: 6
Hi Mike,
I think Stan has hit all the right points. Especially steps 1 - 3.

Here is what we know:
1) around 8,000+ records are created nightly (I assume in a batch)
2) users can create 1000+ records in an instant at any time of the day
3) the webservice can accommodate 100 simultaneous requests to process one record


The need is :
1) to query and update the database without running out of database connections (Connection Pooling is key here)
2) a way to get the new records and change their state in one transaction. (Stan�s point #2)
A stored procedure can do this where you can select say, 300 records that will update their state to �in progress� and also set a timestamp of the requests. This timestamp will allow for subsequent records that have �timed out� to show up to be dealt with. They can be processed again, for troubleshooting purposes, auditing purposes, etc.. You would handle the failure conditions here (along with notifications if that is desired).

3) to process 100 records (100 Threads) simultaneously and not tap out the systems memory nor the cpu (this really shouldn�t be so much of an issue due to the cheapness of hardware nowadays). I assume the webservice time of 10-15 seconds accommodates the transmission time so it will have to be in the low MB range at most (How large is the XML anyway?).

4) save the results from the webservice. this can have a trigger associated with it where it updates the request record�s state to �done� (I also assume this can be done in a batch instead of a per record basis for performance considerations).


Given what we know, here is the calculations of what should happen for maximum efficiency:
100 records or threads = 1 unit of work
(8000 records)/(100 threads) = 80 units of work.

Each unit of work time = 15 seconds for webservice + 10 seconds poll time + @10 seconds for create and update operations = about 35 seconds max

(80 units of work) * (1 unit of work time) = 2800 seconds or @ 46 minutes

This is the pressure point I was looking for. It should take 46 minutes to process 8000 records if everything runs perfectly at full speed theoretically.

Nitty Gritty:

To save on the database connections, I would sacrifice memory. I would set up the database so that when I query for the next set of records (via aforementioned procedure) that the size is up to 3 or 4 times one unit of work and save into memory. That result set would then be stuffed into a queue (specifically, I would use the Apache Commons collection SynchronizedPriorityQueue). Then, some manger, which I will call the RequestManger, will monitor the queue and the Threads. Via a configuration parameter, RequestManger will create a max of 100 active threads to send their work to the webservice. As a thread completes, a new thread is launched by the manger and the completed thread is sent to another queue with the other finished results (assuming it is feasible) until there were enough to make another database trip (say 20?). When that batch of 20 are saved, the database will update those 20 record status� to �done� via a trigger. As the queue starts to empty out, it will make another trip to the database and get another chunk of work. The process repeats until there are no more records and it just polls the database until more work comes in.
It is important that you can change the number of max running thread so you can throttle up or down the system.

It would be good to make each WorkerThread log that states. From the logs, you can then start to evaluate the number of threads your system and the webservice can handle for optimum results.

As this system matures, you can then start to add more sophisticated sleeping algorithms that match the activity of record creation. This will further save on unnecessary database connections.

I hope this helps. I am posting so much on this, because I too am working on a very similar system.
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 417
    
    2

Its always worth looking at any technology your database provider gives you, know idea which database your looking at it.

I've written a very few multithreaded, very fast database crunchers and the last time I did toy with the idea of playing with Oracle's JVM native to the database (other databse vendors do exist :-) and may have equally devious solutions ), because then in theory you can trigger your Java (no polling) and in theory you can trigger your java on record change and possibly make your request a web service as well. In theory the database would then be generating your threads\ pooling connections behind the scenes and you could ignore the problem (yippee :-) ), it would also optimise your SQL (as there isn't any :-) ) etc etc i.e. your effectively palming most of your problems off on the database in my case I could just tell the DBA to optimise it ;-) in theory it would be a lot easier to optimise. Assuming you want a remote external dabase you'ed have to look at what security model is in force i.e. I've never looked at if you can actually call out from their JVM (though it looks like you can from a quick google), their JVM is rather unique ;-)

I suspect there is a big gotcha in this somewhere which is why I shyed away and used the more traditional route (I got my system preforming faster than necessary using the basic threading approach). I know for instance you can't multithread within any individual 'process' (you can create threads but they'll run one after another) or in your case request but my understanding is each request would be in its own thread i.e. you would have threads but not ones you create which is ideal. In theory this would be a lot simpler and faster (but note the word theory) note also your tying your self to a database vendor and one that in this case ain't cheap.

Anyone tried this, I 'ed be quite interested to hear if this would work or any obvious gotchas I wouldn't at all be suprised to find one or loads :-)


"Eagles may soar but weasels don't get sucked into jet engines" SCJP 1.6, SCWCD 1.4, SCJD 1.5,SCBCD 5
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

There is no point in spawning more threads for more processing unless you have more CPUs. You can spawn threads to prevent the app from blocking while you are waiting on a resource such as the DB connection. But thats about it.

Are you on multi CPU/core system?
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Just trying to figure out if I keep getting some kind of errors in the threads how to detect that in the main loop so can send some kind of notification to Admin rather than sending notifications in the threads.


Is there some issue with having notification sent from the other threads? I wonder if you're concerned about where the code is written rather than which thread runs it. Or maybe looking for a common place to collect all the messages? You might even be able to build some intelligence to say if I get more than n errors per minute the remote partner is probably bad and I should shut down entirely.

The Runnables that are out there doing work on the various threads can all have a reference to a common error handler. When they call methods on the handler the code will run in the Runnable's thread, not the main thread, but I think that doesn't really matter. Or does it?

BTW: We're all over the place guessing what is giving you trouble right now. Help us understand what bits you are comfortable with and what is still bugging you.
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Hi thanks for the feedbacks.

I am curious if there will ever be a scenario where any thred never returns back and the program gets hangs. How can I make sure program does not hang or there is no deadlock issues. How to make sure all the threads does come back and never goes into a hang state.

How can I incorporate a common error handler in the above code.

Is there a way to get the datasource connection from connection pool even though it's a standalone program.

The xml size for the request and response is pretty big. When I get the response have to process the xml and safe it into mulitple tables.

Plus I am not on a multi CPU/core system.

Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I am curious if there will ever be a scenario where any thred never returns back and the program gets hangs. How can I make sure program does not hang or there is no deadlock issues. How to make sure all the threads does come back and never goes into a hang state.


I don't think I have an answer for this. If a task gets blocked on some non-interruptable IO or something it can be hard to kill it. If you think something has gone too long and you want to give up on it, System.exit() will get you out of the wait. And everything else.

How can I incorporate a common error handler in the above code.




Is there a way to get the datasource connection from connection pool even though it's a standalone program.


There are connection pool packages out there for standalone programs. I haven't used any of them; maybe somebody will pop up with a favorite. Or you could just allocate one connection to each thread so the thread pool is the connection pool. You can do that with normal JDBC code, no con pool.

The xml size for the request and response is pretty big. When I get the response have to process the xml and safe it into mulitple tables.

Plus I am not on a multi CPU/core system.


XML handling is mostly CPU intensive which makes it a poor candidate for multi-threading. On the other hand, your database updates and the 15 second wait for the remote system are good candidates even on a single processor. Take care that the database updates don't lock each other out or they're right back to single threading. If that's unavoidable maybe do the updates through yet another queue and thread so the threads in the pool don't wait.

See also the APIs on the thread pool executor. You can ask how many tasks are still active or get the queue and ask how many tasks are still in queue all so you don't shut down while one is running. Read up on Daemon threads; you can implement ThreadFactory to set the threads up any way you like them.

BTW: Your code samples pulled from the resultset and populated a collection, then iterated the collection to do the work. You can skip the collection and submit a task per row directly to the thread pool as suggested in the little snippet above.

This is a neat task. Are you still having fun?
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Thanks,

Regarding Database locks it will never be a case as each thread will always execute a unique record.

The only xml I'll be processing will be the response as the request xml is stored in a table as a clob and all I'll do is read the clob and pass it on to the Web Service. The response I get will be processed and staged into multiple tables.

I am bit concerned over if one thread gets hanged due to any reason then nothing proceeds until the thread dies or is done. Is there a way I out a timer that each thread has to be done within some time period else I just kills this so atleast the program continues and in the next run the record that hanged will be again picked-up. Any thoughts how else I can take care of this.

Thanks
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Any feedbacks. Thanks
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
I am bit concerned over if one thread gets hanged due to any reason then nothing proceeds until the thread dies or is done. Is there to set a time for each thread to return else forcefully stop that thread instance to continue so atleast the program continues and in the next run the record that hanged will be again picked-up.


Any feedbacks on how to handle the above case.

Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
If one thread out of a pool hangs forever, you're down one thread but the others can keep working. How bad is that? How often does this happen? What would you do if it happened in a single-threaded program?

I suppose you could keep a collection of references to worker runnables, iterate through them on a Timer, see if any have taken too long. If a worker is hung you may well have trouble killing it, so about all I can think of to do is add another thread to the pool.
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Thanks,

It does not happen that often and I haven't seen this happening but just to handle this scenario if it does happens. Right now the way it works is I create 5 threads and then wait until all these 5 are done and then create next five and so on. Currently I am thinking of using a timer that each thread need to finish within 2 minutes else I kill the thread and continue.

As I am processing 5 records a time if any record from database is not processed in time due to thread lock it will be picked-up in the next iteration.

I will have 6 DB connections opened at any given time. The connection in each thread are open and closed before return. While the connection in main stays open all the time as long as there are records in the table. My concern is if I have to process 5000+ records then this connection stays opened for probably couple of hours. I don't know if it's a good idea to have a connection opened for that long or should I close it oftenly and get a new connection. Any ideas on this.

Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
You can indeed check and see whther a runnable finishes in 2 minutes or whatever. But if you find it hasn't finished then "kill the thread" will be a problem. This has been discussed many times in this forum but often there really isn't a good way to kill a "hung" thread. If a thread is blocking on some operation that is not interuuptable, we're just outta luck. I'd mark the task failed, forget that thread and make a new one.

Your proposal seems to be:

If you're in JDK 5 or later Future would probably be better than join. Now we're down to what action to take if a task doesn't finish in time. You may or may not be able to "kill" it.

This makes a thread per database record which might turn out to be a problem amount of setup and teardown on threads. A thread pool would eliminate that problem but I'm less sure how we'd check on the completion of tasks in time and still not sure what we'd do about long tasks, if anything.
Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
I am using jdk 1.4. And as this process will run constantly 24x7 so just looking for any work arround to get out of the thread gracefully in case if it hangs. As if any thread hangs then basically the whole program will stop until the thread is done as in my main program I query the table get 5 records a time and am creating 5 threads that run in parallel, when these five threads complete I get the next 5 records and so on. So don't know what to do in the scenario where one thread gets hangs.




Each thread is basically making a call to a Web Service and waits for a response and then inserts the response in the database tables. Web Service usually takes less than a minute to come back with a response and then do the parsing of the response and insert data. I don't know if there can ever be a case where these tasks are blocked foreever that inturn makes that thread blocked and in-turn the whole program getting hanged. Any feedbacks are appreciated to handle this.

Thanks
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Threads and DB Access