Hello everybody,
I have a problem concerning "loops in SQL queries" and would like to explain it using a short, simplified example.
There is one table containing website clicks with the following columns (Microsoft SQL Server):
- UserID for every user
- Kind of resource (information, ...)
- Click-Time (exact timestamp of the click, in seconds)
I am interested in clickstreams per user AND per kind of resource.
I want to perform an SQL query for every combination of UserID AND kind of resource. First, for every click I want to identify whether it is the beginning (code 1), the inside (code 2) or the end (code 3) of a clickstream. A clickstream begins if the user didn't click (in that specific kind of resource) for more than 5 minutes.
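The rule above only ever compares a click with its direct neighbours in time, so once the clicks of one user and kind of resource are sorted, a single pass is enough. A minimal sketch in Java, assuming the timestamps are in seconds and sorted ascending, and reading "more than 5 minutes" as a gap strictly greater than 300 seconds. The class and method names are hypothetical, and labeling a one-click stream as 1 (beginning) is an assumption, since the three codes don't cover that case.

```java
import java.util.Arrays;

// Minimal sketch: classify the clicks of ONE (UserID, kind of resource) group.
// Assumes the timestamps are sorted ascending and given in seconds.
class ClickstreamCodes {
    // "More than 5 minutes" without a click starts a new stream.
    static final long GAP_SECONDS = 5 * 60;

    // Returns 1 = beginning, 2 = inside, 3 = end for each click.
    // Each click is compared only with its previous and next click.
    // Assumption: a stream with a single click is labeled 1.
    static int[] classify(long[] times) {
        int n = times.length;
        int[] codes = new int[n];
        for (int i = 0; i < n; i++) {
            boolean begins = i == 0 || times[i] - times[i - 1] > GAP_SECONDS;
            boolean ends = i == n - 1 || times[i + 1] - times[i] > GAP_SECONDS;
            if (begins) {
                codes[i] = 1;
            } else if (ends) {
                codes[i] = 3;
            } else {
                codes[i] = 2;
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        long[] times = {0, 60, 120, 1000, 1100};
        System.out.println(Arrays.toString(classify(times))); // [1, 2, 3, 1, 3]
    }
}
```

The gap of 880 seconds between 120 and 1000 ends the first stream and starts the second one.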
Second, I want to assign a unique number to every clickstream.
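Numbering fits into the same kind of pass: a counter that increases at every click that begins a stream. A minimal sketch, again for one sorted group with timestamps in seconds; the hypothetical `nextId` parameter is the first unused stream number, so passing the counter on from group to group keeps the numbers unique across all users and kinds.

```java
import java.util.Arrays;

// Minimal sketch: number the clickstreams of ONE sorted (UserID, kind) group.
// nextId is the first unused stream number (hypothetical parameter).
class ClickstreamNumbers {
    static final long GAP_SECONDS = 5 * 60;

    static int[] number(long[] times, int nextId) {
        int[] ids = new int[times.length];
        int current = nextId - 1;
        for (int i = 0; i < times.length; i++) {
            // A new stream starts at the first click or after a gap of more than 5 minutes.
            if (i == 0 || times[i] - times[i - 1] > GAP_SECONDS) {
                current++;
            }
            ids[i] = current;
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] first = number(new long[]{0, 60, 1000}, 1);
        int[] second = number(new long[]{10, 20}, first[first.length - 1] + 1);
        System.out.println(Arrays.toString(first));  // [1, 1, 2]
        System.out.println(Arrays.toString(second)); // [3, 3]
    }
}
```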
The final table should look like this:
The pseudocode probably looks like this:
I wrote the code in Java with two loops. The algorithm inside the two loops is not that simple and needs a lot of comparisons, but the program works as it should. BUT because the table has more than 120,000 rows, it took about 8 hours to perform the task. This is much more time than I will have for such an operation in the future. When I run queries directly in MS SQL Server, even complicated operations take only seconds.
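Eight hours for 120,000 rows suggests that each row is compared with many other rows. A common alternative, sketched here under the assumption that all rows fit in memory as objects of a hypothetical `Click` class, is to sort once by (UserID, kind of resource, click time) and then compare each row only with its neighbours. That yields the codes and globally unique stream numbers with one O(n log n) sort plus one linear pass. This is not the original program, whose loop bodies aren't shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch: one sort plus one pass instead of two nested loops.
class ClickstreamPass {
    static final long GAP_SECONDS = 5 * 60;

    // Hypothetical in-memory row: UserID, kind of resource, click time in seconds.
    static final class Click {
        final int userId;
        final String kind;
        final long clickTime;
        int code;     // 1 = beginning, 2 = inside, 3 = end
        int streamId; // unique across all users and kinds

        Click(int userId, String kind, long clickTime) {
            this.userId = userId;
            this.kind = kind;
            this.clickTime = clickTime;
        }
    }

    // True if b directly follows a within the same (user, kind) stream.
    static boolean sameStream(Click a, Click b) {
        return a.userId == b.userId
                && a.kind.equals(b.kind)
                && b.clickTime - a.clickTime <= GAP_SECONDS;
    }

    static void label(List<Click> clicks) {
        // One sort puts each (user, kind) group together, in time order.
        clicks.sort(Comparator.comparingInt((Click c) -> c.userId)
                .thenComparing(c -> c.kind)
                .thenComparingLong(c -> c.clickTime));
        int streamId = 0;
        int n = clicks.size();
        for (int i = 0; i < n; i++) {
            Click c = clicks.get(i);
            boolean begins = i == 0 || !sameStream(clicks.get(i - 1), c);
            boolean ends = i == n - 1 || !sameStream(c, clicks.get(i + 1));
            if (begins) {
                streamId++;
            }
            c.streamId = streamId;
            // Assumption: a single-click stream gets code 1.
            c.code = begins ? 1 : (ends ? 3 : 2);
        }
    }

    public static void main(String[] args) {
        List<Click> clicks = new ArrayList<>();
        clicks.add(new Click(1, "information", 120));
        clicks.add(new Click(1, "information", 0));
        clicks.add(new Click(2, "information", 30));
        clicks.add(new Click(1, "information", 900));
        label(clicks);
        for (Click c : clicks) {
            System.out.println(c.userId + " " + c.kind + " " + c.clickTime
                    + " code=" + c.code + " stream=" + c.streamId);
        }
    }
}
```

The same neighbour comparison can also be expressed set-based inside SQL Server 2012 and later with the window functions LAG and LEAD (partitioned by UserID and kind of resource, ordered by click time), plus a running SUM over a "begins" flag for the stream numbers, which avoids moving the rows into Java at all.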
So my questions are:
1. How should such an operation be performed?
2. Is there any way to do it directly in MS SQL Server (two loops, one nested in the other)?
3. Are there any tools for this?
I can't imagine that my solution is the fastest one (there are millions, or rather billions, of comparisons).
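To put "billions of comparisons" in perspective, assume (the loop bodies aren't shown) that the nested loops compare every pair of the n = 120,000 rows, and compare that with a single comparison sort followed by one linear pass:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \frac{120\,000 \cdot 119\,999}{2} \approx 7.2 \times 10^{9}
\qquad\text{vs.}\qquad
n \log_2 n \approx 120\,000 \cdot 16.9 \approx 2.0 \times 10^{6}
```

Under that assumption, sorting first needs more than three thousand times fewer comparisons.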
(The example is not exactly like my project, so please don't ask about the logic of the table structure...)
It would be very nice if someone could help me...
Thanks a lot
kilia