I am fetching 40 million rows from a database table or a flat file. I am processing each row for Groovy
evaluation by creating one worker per row (so in this case I am creating 40 million workers).
I am using Akka's round-robin pool for this. Is this approach correct? If not, what is the best way to do it?
I'm assuming this is just some hacky code to test it, since I would expect the setup to be in a config file somewhere (e.g. the number of actors for your round robin).
But other than that I'm not sure what we can really say.
With Akka you're really looking at identifying bottlenecks, and that can only be done by actually running the thing and using the various tools to spot issues.
However, I'm not sure what would be cured by using something other than a round robin.
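For context, the pattern under discussion looks roughly like this in classic Akka (Scala). This is a sketch, not the poster's actual code: the `Worker` class, the pool size of 4, and the `rows.txt` source are all illustrative assumptions.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

// One small, fixed pool of actors; each row becomes a *message* to the pool,
// not a new actor.
class Worker extends Actor {
  def receive = {
    case row: String =>
      // evaluate the Groovy expression against this row here
  }
}

object Main extends App {
  val system = ActorSystem("rows")

  // 4 routees handle the work; the round-robin router distributes messages
  // across their mailboxes.
  val pool = system.actorOf(RoundRobinPool(4).props(Props[Worker]), "workers")

  // Stream rows from the source rather than materializing all 40M in memory.
  scala.io.Source.fromFile("rows.txt").getLines().foreach(pool ! _)
}
```

The important point is that the 40 million rows become messages queued in mailboxes, while only a handful of worker actors ever exist.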
Hi Dave. Though it is a sample program, the logic I am going to use will be the same.
I just wanted to know whether this approach is advisable, i.e. using a round-robin pool with 40 million workers. Please help me decide.
Well, you won't have 40 million workers.
You'll have 4 workers (I would still put this in a config rather than in code).
Well, 4 active ones anyway.
The rest will be queued up.
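Moving the pool size into configuration, as suggested, might look like this with Akka's `FromConfig` router deployment (the actor path `/workers` is illustrative):

```
akka.actor.deployment {
  /workers {
    router = round-robin-pool
    nr-of-instances = 4
  }
}
```

In code you would then create the router with `FromConfig.props(Props[Worker])` instead of hard-coding `RoundRobinPool(4)`, so the pool size can be tuned without recompiling.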
Assuming your Work class isn't huge (and that will be based on the data in a row), you have a reasonable box to run it on (again, this will take a bit of monitoring on your part), and the work for each row is pretty quick (again, monitoring etc.)...
It's the sort of thing Akka was built for.
How many cores does the target machine have?
I'm guessing 4?
I know the recommendation is to match actors to cores, but you might want to try doubling it and seeing what happens. This all depends on the workload for each actor, but I have seen it improve throughput. Again, it comes down to actual numbers, and that means monitoring.
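One way to experiment with that doubling without hard-coding a number is to size the pool from the machine's core count. A sketch, assuming a row-processing `Worker` actor like the one implied by the question; the factor of 2 is just a starting point for measurement:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

object Tuned extends App {
  class Worker extends Actor { // stand-in for your row-processing actor
    def receive = { case row: String => /* Groovy evaluation */ }
  }

  val cores = Runtime.getRuntime.availableProcessors()
  // Start at one routee per core, then try 2x and compare measured throughput.
  val pool = ActorSystem("rows").actorOf(
    RoundRobinPool(2 * cores).props(Props[Worker]), "workers")
}
```

Whether 1x, 2x, or something else wins depends on how CPU-bound each row's evaluation is, which is why the measuring matters.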