I am writing a program that gathers prices from several websites. From the gathered data I compute a weighted average price.
Currently, I just ask all sites for their prices, compute a weighting factor for each, and calculate the average price. Computing the weighting factors and the average takes a while, though, which is now a problem:
Some sites offer a socket through which I am notified when something changes. With every small piece of new information the whole calculation would be run again, maybe even several times in a single second. I could make the method synchronized, as I have already found out. But then it can happen that, while the new average is being calculated, new information comes in and waits for the calculation, and while it is waiting, yet another thread wants its new information included in the average.
Do you understand what I mean? I fear that a calculation takes longer than the interval between new pieces of information, which would create an endless queue at the method.
Does anyone have experience with this kind of problem?
I think I understand. You have a collection of data (prices), a calculation operating on this data that takes some time (calculating weights and average price), and multiple concurrent workers that are updating the data.
It seems to me that what you need to do is copy the data before you start a calculation and perform the calculation on the copy. You could start a new calculation every time an update arrives, but note that every calculation holds its own copy of the data, so if many updates arrive, your application can run out of memory. You have to limit the number of calculations running at the same time. I don't know how long your calculation takes, but a first step would be to run only one calculation at a time. When it finishes, you start a new one if updates have arrived in the meantime.
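The copy-then-calculate idea with only one calculation at a time could be sketched roughly as follows. This is only an illustration under assumptions: the class name SnapshotAverager, the site names, and the equal-weight weightedAverage placeholder are invented here and stand in for the real weighting logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SnapshotAverager {
    // Latest known price per site; written by the socket/update threads.
    private final Map<String, Double> prices = new ConcurrentHashMap<>();
    // A single worker thread guarantees that only one calculation runs at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // True while a recalculation is queued but has not yet taken its snapshot.
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private volatile double latestAverage = Double.NaN;

    /** Called by any thread when a site reports a new price. */
    public void update(String site, double price) {
        prices.put(site, price);
        // Many updates collapse into a single pending recalculation.
        if (scheduled.compareAndSet(false, true)) {
            worker.execute(this::recalculate);
        }
    }

    private void recalculate() {
        // Clear the flag first: updates arriving from now on schedule another run.
        scheduled.set(false);
        Map<String, Double> snapshot = new HashMap<>(prices); // work on a copy
        latestAverage = weightedAverage(snapshot);
    }

    // Hypothetical placeholder: equal weights. The real weighting goes here.
    private static double weightedAverage(Map<String, Double> snapshot) {
        double sum = 0.0;
        for (double p : snapshot.values()) {
            sum += p;
        }
        return snapshot.isEmpty() ? Double.NaN : sum / snapshot.size();
    }

    public double latestAverage() {
        return latestAverage;
    }

    /** Stops accepting work and waits for pending calculations to finish. */
    public void shutdown() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this arrangement there is at most one calculation running and one waiting, so at most one copy of the data is being worked on at any moment, no matter how fast updates arrive.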
The other option is to hold all updates of the data until a calculation is done, but this seems like a bad idea. This option implies that you have to queue incoming update information and apply the queued updates when a calculation has finished. While you're applying the updates, new updates can come in... When do you stop?
Paul Balm wrote:The other option is to hold all updates of the data until a calculation is done, but this seems like a bad idea. This option implies that you have to queue incoming update information and apply the queued updates when a calculation has finished. While you're applying the updates, new updates can come in... When do you stop?
I think this is actually a viable idea. Create a thread-safe queue (that is, one from java.util.concurrent) to hold all received updates. As soon as a calculation finishes, drain all items that have arrived in the queue in the meantime (this can be done quickly), apply these updates, and recalculate. If the queue is empty after a calculation, wait for the first update to arrive (easy) and again update and recalculate. Don't fetch or apply updates that arrive in the queue while you are applying the previous ones (this answers Paul's question "when do you stop").
This could be the preferred option if copying the data is expensive.
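A minimal sketch of the queue-draining loop described above, assuming a LinkedBlockingQueue and an equal-weight average as a stand-in for the real calculation. The names QueueDrainingAverager, PriceUpdate, and processNextBatch are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueDrainingAverager implements Runnable {
    /** One price notification from one site (hypothetical type). */
    public static final class PriceUpdate {
        final String site;
        final double price;

        PriceUpdate(String site, double price) {
            this.site = site;
            this.price = price;
        }
    }

    private final BlockingQueue<PriceUpdate> queue = new LinkedBlockingQueue<>();
    // Only the calculating thread touches this map, so no copy or lock is needed.
    private final Map<String, Double> prices = new HashMap<>();
    private volatile double latestAverage = Double.NaN;

    /** Called by the socket threads; returns immediately. */
    public void offer(String site, double price) {
        queue.add(new PriceUpdate(site, price));
    }

    /**
     * Waits for the first update, drains whatever else is queued right now,
     * applies the batch and recalculates once.
     * Returns the batch size, or -1 if the thread was interrupted.
     */
    public int processNextBatch() {
        List<PriceUpdate> batch = new ArrayList<>();
        try {
            batch.add(queue.take()); // blocks while the queue is empty
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        queue.drainTo(batch); // later arrivals wait for the next batch
        for (PriceUpdate u : batch) {
            prices.put(u.site, u.price);
        }
        // Placeholder for the real weighted calculation: equal weights.
        double sum = 0.0;
        for (double p : prices.values()) {
            sum += p;
        }
        latestAverage = sum / prices.size();
        return batch.size();
    }

    @Override
    public void run() {
        // Run on a background thread; stops when that thread is interrupted.
        while (processNextBatch() >= 0) {
            // keep processing batches
        }
    }

    public double latestAverage() {
        return latestAverage;
    }
}
```

Because each batch contains only what was in the queue at drain time, the loop always terminates a batch and performs one recalculation per batch, however many updates the batch holds.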
The calculations and updates would of course run in background threads. The calculated results would be handed to another thread (the EDT in a Swing application, for example) to be displayed to the user. Therefore, the application will remain responsive even if updates are being processed constantly. This has to be done for the other solution (the copy-data one) suggested by Paul as well, of course.
Also keep in mind that, given the unpredictability of GC, it is generally not possible to guarantee response times of a Java application in the sub-second range.
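Handing a result from the background thread to the display thread might look something like this. The ResultPublisher class and its parameters are invented for illustration; the only assumption is that the display side can be expressed as a java.util.concurrent.Executor.

```java
import java.util.concurrent.Executor;
import java.util.function.DoubleConsumer;

public class ResultPublisher {
    private final Executor displayExecutor; // runs tasks on the display thread
    private final DoubleConsumer display;   // what to do with a new average

    public ResultPublisher(Executor displayExecutor, DoubleConsumer display) {
        this.displayExecutor = displayExecutor;
        this.display = display;
    }

    /** Called from the background calculation thread; does not touch the UI directly. */
    public void publish(double average) {
        displayExecutor.execute(() -> display.accept(average));
    }
}
```

In a Swing application the executor could be SwingUtilities::invokeLater, for example new ResultPublisher(SwingUtilities::invokeLater, avg -> label.setText(String.valueOf(avg))), where label is a hypothetical JLabel. The calculation thread then never blocks on the user interface.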
I think more details are needed to decide between the different options:
How much data is there? (This tells us how much memory multiple copies would require.)
Approximately how long does a calculation take?
How often do new updates come in?
If you have an idea of these figures, I think it will be clear which of the solutions is best.