I have a Spring-MVC, Hibernate web app. An admin user can send in a request to process nearly 2 lakh (200,000) records (each record collected from various tables via joins). Such an operation is requested on a weekly or monthly basis, or whenever the data reaches a limit of around 1–2 lakh records. On the database end, I am already implementing batching correctly.
PROBLEM: Such a long-running request holds up a server thread, and that causes the normal users to suffer.
REQUIREMENT: The high response time of this request is not an issue. What's required is to not make other users suffer because of this time-consuming process.
MY SOLUTION: Implement a thread pool using Spring's TaskExecutor abstraction. I can initialize my thread pool with, say, 5 or 6 threads, break the 2 lakh records into smaller chunks, say of size 1000 each, and queue up these chunks. To further allow the normal users faster DB access, maybe I can make every runnable sleep for 2 or 3 seconds between chunks.
The advantage I see in this approach: instead of executing one huge DB-interacting request in one go, we have an asynchronous design spanning a longer time, thus behaving like multiple normal user requests.
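For what it's worth, Spring's ThreadPoolTaskExecutor is essentially a thin wrapper over java.util.concurrent, so the chunk-and-queue idea can be sketched with plain JDK classes. Everything below (the ChunkedBatchDemo/runBatch names and the record counts) is made up for illustration; in the real app, processChunk would do the Hibernate work and sleep 2-3 seconds to throttle DB load:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ChunkedBatchDemo {

    // Stand-in for the real per-chunk work (Hibernate session, joins, batched writes).
    // The real task would also Thread.sleep(2000) here to give normal users DB time.
    static void processChunk(List<Integer> chunk, AtomicInteger processed) {
        processed.addAndGet(chunk.size());
    }

    // Split [0, totalRecords) into chunks and queue them on a fixed-size pool,
    // the same shape a ThreadPoolTaskExecutor with corePoolSize=threads would give you.
    static int runBatch(int totalRecords, int chunkSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger processed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int start = 0; start < totalRecords; start += chunkSize) {
            List<Integer> chunk = new ArrayList<>();
            for (int i = start; i < Math.min(start + chunkSize, totalRecords); i++) {
                chunk.add(i);
            }
            futures.add(pool.submit(() -> processChunk(chunk, processed)));
        }
        for (Future<?> f : futures) {
            f.get(); // block until every queued chunk has been processed
        }
        pool.shutdown();
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        // 10,000 records stands in for the real ~2 lakh
        System.out.println("processed=" + runBatch(10_000, 1_000, 5));
    }
}
```

The key point is that only the 5 pool threads ever touch the DB for this job, no matter how many chunks are queued, so the container's request threads stay free for normal users.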
Can some experienced people please give their opinion on this?
I have also read about implementing the same behaviour with a Message-Oriented Middleware like JMS/AMQP, or with Quartz scheduling. But frankly speaking, I think internally they are also going to do the same thing, i.e. maintain a thread pool and queue up the jobs. So why not go with Spring's TaskExecutor instead of adding completely new infrastructure to my web app just for this one feature?
Please share your views on this, and let me know if there are other, better ways to do it.
Once again: the time to completely process all the records is not a concern; what's required is that normal users accessing the web app during that time should not suffer in any way.
Some databases allow resource management, i.e. limiting the usage of system resources by various processes. If your database supports resource management, I'd look into this option first, as it is designed for exactly the kind of situation you're facing. You wouldn't need to modify the client code at all in this case.
Another question is what the processing looks like. Moving as much of the processing as possible into the database will in most cases make it faster and less resource-intensive, which might render its load less relevant to regular users, especially if your actual bottleneck is the network.
@Martin: Thanks for reply.
I don't really understand what you mean by limiting the usage of system resources by various processes. Are you saying that if multiple batch requests come from a single process (in my case, the admin request to process 2 lakh records), the database can assign some kind of priority to them, such that requests from other processes will be fulfilled first?
Resource management means that you can define a resource profile and attach it to a database session, often by assigning it to a database user. The profile can in turn limit consumption of various resources, e.g. disallow the processes in a session from consuming more than 20% of the CPU. I know this feature is available in Oracle Database.
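In Oracle this is done with the Database Resource Manager. A rough sketch of the idea, run as a DBA (the plan, group, and BATCH_USER names are made up, and the exact directive parameters, e.g. mgmt_p1 vs. the older cpu_p1, depend on the Oracle version, so check the DBMS_RESOURCE_MANAGER docs for yours):

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('BATCH_GROUP', 'low-priority batch work');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('DAYTIME_PLAN', 'cap batch jobs during the day');
  -- Batch sessions get at most ~20% of CPU when the system is busy.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'BATCH_GROUP',
    comment          => 'batch capped at 20 percent',
    mgmt_p1          => 20);
  -- Every plan must also cover OTHER_GROUPS (everyone else).
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'normal users',
    mgmt_p1          => 80);
  -- Map the (hypothetical) BATCH_USER account to the low-priority group.
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    DBMS_RESOURCE_MANAGER.ORACLE_USER, 'BATCH_USER', 'BATCH_GROUP');
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
-- Activate the plan:
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'DAYTIME_PLAN';
```

The application would then simply run the batch job through a connection that logs in as that batch user, and the database itself keeps the job from starving normal sessions.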
Unfortunately, a quick search seems to indicate this is not feasible in PostgreSQL. So if the processing cannot be optimized enough to be bearable during normal operation (as mentioned in my post above), the solution you devised seems sensible to me.