I'm trying to build a MySQL table-based queue. Basically imagine there are a dozen threads all waiting (aka polling the table) to do a job (one record in the table = one job) as soon as they find a record "WHERE available = 1". Each job can take minutes to complete, so a thread can't lock the entire table or all the other threads would block.
What's the best way in MySQL to retrieve a row and simultaneously set a value in the row? For example, as soon as a thread got a job it would atomically set available = 0 so that no other threads could get the job. Is there a way to do this in SQL? I have a couple ideas, but I'm curious what you all think would be the best approach.
What you need is Optimistic locking of the row. You should have a version number column on the row that gets updated whenever you update the row. Also, everytime you update the row, you first get the row again to see if someone else has updated the version number. If they have that means you had stale data. [Optimistic Locking comes with hibernate if you are using Hibernate
I have very similar design in my system, but that one is on Oracle. I have several separate processes polling table for jobs, and in each process there is just one thread to pick up jobs which are then distributed to worker threads (this obviously reduces the number of concurrent polls). I need to coordinate the separate servers and I do it via the database, by explicitly locking the table to grab a new job. The polling interval is one second, and since the job-grab operation is much shorter I don't mind the exclusive lock.
(I've got more than one DB call per poll query. Given the polling interval and the average job duration it does not seem a big deal for me, but your situation is perhaps different. Oracle actually has returning clause, which might allow me doing this in one call, but MySQL does not seem to have similar mechanism.)
You might also want to check message queues for MySQL, apparently some exist (such as Q4M)
Actually, if you don't have multiple processes like Martin has, you can just have a single thread that is responsible for polling the database that submits the job to an Executor. You really don't need to lock the record since you will always have a single thread handling the record.
ETA: Or to put it in another way, your "queue" is still in memory. MySQL is just acting as a persistent storage for it., and your publisher thread is responsible for synchronizing the persistent state with the real state of the queue.
As I said in my original post, there are multiple threads involved. A single-threaded solution is trivial since it doesn't require checking anything. I'd rather keep it multi-threaded than creating anything resembling a bottle-neck, even if it is a light-weight message proxy that creates Executors as needed.
Lock up on querying a table? Yes that is a valid concern, but then if all your threads were accessing the database directly they would lock up too. You already have a point of failure which is the database table that contains the queue. If you want to eliminate the point off failure, you need to eliminate that table.
What you need is a Persistent Queue. A normal Queue that is backed by persistent storage.