The web app has a feature called Bill of Materials (BOM), where application users can update information about the structure of a manufacturing part. There is no limit on the number of levels in a part's hierarchy. The web app has a report screen where the user can choose an "entity" (we'll assume that an entity has been mapped to many parts) and ask for the report. The business logic for generating the report for an entity involves identifying all the parts mapped to the entity and fetching the demand for each part. Fetching the demand for a part involves exploding the entire BOM hierarchy upwards and identifying the dependent demand (in simple terms, identifying each parent part's demand and how many child parts make up a parent). This has to be done recursively until the topmost parent is reached, and repeated for every part mapped to the entity.
We thought of invoking an Oracle stored procedure to explode the entire BOM tree and identify the total demand for a part. From past experience, we expect the procedure to take longer as the BOM tree gets deeper. So we have decided to develop a batch job (again an Oracle stored procedure) which will explode the BOM and populate a temporary table. The report use case will read this temp table and display the report. The problem arises when deciding when this batch job should be executed. Ideally the temp tables should be updated whenever the BOM tables are modified. Apart from BOM, there are quite a few other scenarios which affect the report data.
This is our design as of now. The business classes serving the use cases that affect the report data will spawn a new thread just before returning data to the servlet that invoked them. The new thread will invoke the batch job to update the temp tables. This way the temp tables will always reflect the latest data. If the user asks for the report, a check is made to see whether the batch job is running. Since we don't want to show inconsistent data to the user, we'll show a message stating that the data is being refreshed. Once the batch job finishes, we'll display the report to the user.
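To make the upward explosion concrete, here is a minimal in-memory sketch of the recursion. It is only an illustration, not the stored procedure itself: the class name BomDemand, its methods, and the idea that each part also carries its own "independent" demand are assumptions made for the example, and the BOM is assumed to contain no cycles.

```java
import java.util.HashMap;
import java.util.Map;

public class BomDemand {
    // child part -> (parent part -> number of child units needed per parent unit)
    private final Map<String, Map<String, Integer>> parentsOf = new HashMap<>();
    // demand placed directly on a part (assumed; 0 when absent)
    private final Map<String, Integer> independentDemand = new HashMap<>();

    public void addUsage(String child, String parent, int qtyPerParent) {
        parentsOf.computeIfAbsent(child, k -> new HashMap<>()).put(parent, qtyPerParent);
    }

    public void setDemand(String part, int qty) {
        independentDemand.put(part, qty);
    }

    // Total demand = the part's own demand
    //              + for every parent: (parent's total demand * quantity per parent).
    // The recursion stops at topmost parents, which have no entry in parentsOf.
    public long totalDemand(String part) {
        long total = independentDemand.getOrDefault(part, 0);
        Map<String, Integer> parents = parentsOf.get(part);
        if (parents != null) {
            for (Map.Entry<String, Integer> e : parents.entrySet()) {
                total += totalDemand(e.getKey()) * e.getValue();
            }
        }
        return total;
    }
}
```

For example, with a demand of 10 bikes, 2 wheels per bike, 5 spare wheels, and 32 spokes per wheel, the wheel demand is 5 + 10 * 2 = 25 and the spoke demand is 25 * 32 = 800. Each call re-walks the parents, which is why a deep tree gets expensive and caching the result in a table looks attractive.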
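A rough sketch of the "refresh in the background, check before showing the report" idea follows. Note that it swaps the hand-spawned thread for a single-threaded ExecutorService from java.util.concurrent, so refreshes run one after another; the class name ReportRefresher and its methods are made up for the example, and the Runnable stands in for the call to the Oracle batch job.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ReportRefresher {
    // One worker thread: queued refreshes never overlap each other.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Number of refreshes that are queued or currently running.
    private final AtomicInteger pending = new AtomicInteger();
    // Stand-in for invoking the stored procedure (an assumption of this sketch).
    private final Runnable batchJob;

    public ReportRefresher(Runnable batchJob) {
        this.batchJob = batchJob;
    }

    // Called by a business class after it has changed data that affects the report.
    public void requestRefresh() {
        pending.incrementAndGet();
        worker.execute(() -> {
            try {
                batchJob.run();
            } finally {
                pending.decrementAndGet(); // decrement even if the job throws
            }
        });
    }

    // The report screen shows "data is being refreshed" while this returns true.
    public boolean isRefreshing() {
        return pending.get() > 0;
    }

    // Should be called when the webapp is stopped, so the worker thread can end.
    public void shutdown() {
        worker.shutdown();
    }
}
```

The worker thread created by Executors.newSingleThreadExecutor() is a non-daemon thread by default, so shutdown() matters. The flag only covers a single JVM; with more than one server instance the "is it running" state would have to live in the database instead.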
There are certain issues that we foresee in this design.
Is it good practice to spawn new threads in a web app? Though the response will be displayed to the user and the batch job will be executed in a background thread, the spec gives no guarantee about the execution order. There is also the possibility of the background thread not relinquishing control to the main thread.
If someone keeps updating the BOM (or the other use cases which affect the report data) continuously, then under our design the report will never be displayed, because we display the report only when the batch job is not running. Is this a good design?
Are there any flaws in this design? Is there a better design?
Please excuse me for such a long problem statement. I sincerely appreciate any pointers to a better design.
My special thanks to Ben Souther for his LongRunningProcess.war, which is what got me thinking this way.
Spawning threads from within a webapp is not without its risks. I can say I've seen folks smarter than me (specifically Tomcat contributors) claim that it's a bad idea. I've also read that it's never a good idea to spawn a thread in a program at all. See: http://www.faqs.org/docs/artu/ch07s03.html#id2923889 by Eric Raymond. All that being said, shared memory and threading are a core part of Java, and it is often useful to do this from a webapp.
If you do, be sure you understand the daemon flag in the Thread class: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html#setDaemon(boolean) and how it will affect your app. There are ups and downs to either setting. One (non-daemon) will keep your app and/or container from shutting down or reloading until the thread has finished. The other (daemon) could cause your thread to terminate unexpectedly if the app is restarted.
If you have other options, such as threading or phantom processing from within the database, or even spawning a new process from the command line, you might want to explore those as well. I don't know much about it, but I think there are some new APIs in Java 1.5 that make spawning processes from a Java app easier.
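As a small illustration of where the flag goes, here is a sketch of a helper that starts a thread with the daemon flag set either way. The class name DaemonDemo, the method spawn, and the thread name are made up for the example; setDaemon, isDaemon, and start are the real java.lang.Thread methods.

```java
public class DaemonDemo {
    // Starts the task on a new thread with the requested daemon setting.
    public static Thread spawn(Runnable task, boolean daemon) {
        Thread t = new Thread(task, "batch-job");
        // Must be set before start(); calling it on a live thread throws
        // IllegalThreadStateException.
        t.setDaemon(daemon);
        t.start();
        return t;
    }
}
```

With spawn(job, false) the JVM will not exit until the job has finished. With spawn(job, true) the JVM may exit while the job is still running, in which case the job is simply abandoned.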
As to your second question: that one may do better as a separate topic in the JDBC forum. I don't know enough about Oracle's 'snapshot' or transaction support to give any advice, but I'm guessing there are standard patterns or techniques for dealing with this common problem. [ July 29, 2006: Message edited by: Ben Souther ]
I don't understand why your stored procedure takes so long. Have you tried SELECT ... START WITH (hierarchical) queries? START WITH queries definitely save a lot of time.
Joined: May 11, 2005
Thanks for your response Abhi.
There is already some discussion going on in the JDBC forum on this topic. Please share your views about stored procedures in this thread.
This is our current design with respect to the BatchJob.
BatchJobService will be invoked from the respective service class after the BOM is updated. In future, we might have more batch jobs coming our way. Since we plan to have one BatchJobService class for all the batch jobs, the run method would need an instance variable to decide which batch job to run. To avoid synchronization issues, we have decided that the BatchJobService should be told, prior to creating an instance, what type of batch job it should create.
Have you looked into the daemon flag, mentioned earlier? If so, did you intentionally decide not to switch it?
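One way to read "told what type of batch job before an instance exists" is to pass the type to the constructor and keep it in a final field, so each instance is immutable and needs no synchronization. The sketch below assumes that reading. The enum values, the procedure names in the call strings, and the Consumer that stands in for the JDBC CallableStatement are all hypothetical.

```java
import java.util.function.Consumer;

public class BatchJobService implements Runnable {
    // Hypothetical job types; each carries the JDBC call string it would execute.
    public enum JobType {
        BOM_EXPLOSION("{call explode_bom()}"),
        REPORT_REFRESH("{call refresh_report()}");

        private final String call;

        JobType(String call) {
            this.call = call;
        }

        public String call() {
            return call;
        }
    }

    private final JobType type;             // fixed at construction, never changes
    private final Consumer<String> invoker; // stand-in for the real JDBC invocation

    public BatchJobService(JobType type, Consumer<String> invoker) {
        this.type = type;
        this.invoker = invoker;
    }

    @Override
    public void run() {
        // Each instance runs exactly the job it was created for.
        invoker.accept(type.call());
    }
}
```

A service class would then do something like new Thread(new BatchJobService(BatchJobService.JobType.BOM_EXPLOSION, invoker)).start(), and two threads running different job types never share mutable state.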
Ben, Thanks for reminding me
I initially thought I'd set it to false before invoking the start() method on the thread, to make sure that the batch job is invoked once the BOM has been updated. But it seems that the server cannot be restarted until the batch job is done. That doesn't seem fair, as the batch job runs only in the Oracle server.
But if I set the flag to true, there is a chance that the batch job will sometimes not be triggered. I am confused. Please help me out.
Read the API docs in the link I sent to you and then decide: Should restarting the application and/or container be a higher priority than finishing a batch that is still in progress?
Generally, the answer is yes, but not in every case: the exceptions are when the process is the sole reason for the webapp, or when the damage done by interrupting the process is riskier than the problems that could arise if the web server couldn't be restarted right away while a batch is still running.
If reading the docs doesn't make the daemon flag clear, create a small webapp that has code for spawning a thread. In that code, have the thread sleep for a minute and a half or so. Run it and then try to reload your app or the container. Do the same with the switch set the other way. That sample app of mine could actually be a good test of this.
I read the docs and am now clear about the daemon flag.
For testing my understanding of the daemon flag, I used your LongRunningProcess web app. I set the process time to 100 seconds and tested by setting the flag to true/false.
When the flag was true, invoking Tomcat's Shutdown.bat undeployed all the applications from the container and shut down the server as expected.
When the flag was false, invoking Tomcat's Shutdown.bat undeployed all the applications from the container, but the thread was still running. Only when the thread finished execution was the server shut down.
The results were as expected, except that I thought the application would not be undeployed until the thread completed, which is not what happened.
Since our batch job is going to be highly critical, we have planned to run it in a non-daemon thread. Anyway, our production server is not going to be restarted frequently.
By the way, the server was shut down irrespective of the flag if I typed CTRL+C in the server console instead of invoking Shutdown.bat.
Can you use database triggers on your Bill of Materials (BOM) related tables that fire on any insert/update/delete to immediately update your temp tables? You could also have a "wait" flag column in your temp table to make sure that updating an item in the temp table and reading the report data for the same item do not happen concurrently, which would result in the report reading stale data. If the wait flag for a particular item is set to "true", then implement a retry strategy (retry after 30 seconds, etc.) or send a warning message indicating that the BOM is being exploded for item XYZ and to try again later.
Triggers... yeah, we thought it over. But isn't trigger processing sequential? I mean, if you have a *before update* trigger on a column and a procedure updates that column, the trigger will execute and only then will the procedure regain control. And the trigger program in our case is a time-consuming job. So we thought this would slow down the current scenario, I mean the scenario which updated the column/table.
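The retry part of this suggestion could look roughly like the sketch below on the Java side. The class ReportRetry and the method readWhenReady are names invented for the example. The BooleanSupplier stands in for a query on the "wait" flag column and the Supplier for the query that reads the report rows.

```java
import java.util.Optional;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class ReportRetry {
    // Tries up to maxAttempts times to read the report, sleeping delayMillis
    // between attempts while the wait flag is set. Returns an empty Optional if
    // the flag never clears, so the caller can show the "try later" message.
    public static <T> Optional<T> readWhenReady(BooleanSupplier waitFlagSet,
                                                Supplier<T> reader,
                                                int maxAttempts,
                                                long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!waitFlagSet.getAsBoolean()) {
                return Optional.of(reader.get());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

With the 30-second delay mentioned above, a call might be readWhenReady(flagCheck, reportQuery, 3, 30_000). There is still a small window between checking the flag and reading the rows, so in a real implementation the check and the read would need to happen in the same transaction or query.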