I am wondering if you can give us some recommendations or best practices related to seeking in a Kafka queue.
Assume we have a system that reads messages from the queue and then independently transports them to a container. If the process fails in between, after we have already committed the read from the queue, what would be the best way for the queue reader to pick up where it left off? Currently we store the partition and offset of the messages that have already been transported to the container, and when the reader starts, we always seek to the last known partition/offset combination. Is there a better or more optimal way of doing this?
When you say Queue, I'm going to assume you are referring to a Kafka topic.
The process you describe is the exact way you keep track of the last record you consumed.
I don't know what version of Kafka you are running, but you can also store the committed offsets in Kafka itself.
With a KafkaConsumer, you can enable auto-commit and specify the commit interval via configs (enable.auto.commit and auto.commit.interval.ms). After the specified time elapses, the consumer will persist the last read offset into Kafka (an internal topic named __consumer_offsets).
Then, when you restart, the consumer will communicate with the Kafka broker and pick up where it left off, with no need to do a manual seek yourself.
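As a rough illustration of the auto-commit setup described above, here is a minimal sketch that only builds the consumer settings with java.util.Properties. The class name AutoCommitConsumerConfig, the broker address, the group id, and the 5000 ms interval are all made up for the example; the property keys are standard Kafka consumer configs. In a real application the resulting Properties would be passed to the KafkaConsumer constructor.

```java
import java.util.Properties;

public class AutoCommitConsumerConfig {

    // Builds consumer settings that let the client commit offsets to Kafka
    // periodically. All argument values are supplied by the caller.
    public static Properties build(String bootstrapServers, String groupId, int commitIntervalMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Committed offsets are stored per consumer group, so a restarted
        // consumer must use the same group.id to resume from them.
        props.setProperty("group.id", groupId);
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", Integer.toString(commitIntervalMs));
        // String deserializers are an assumption; use whatever matches the topic.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        Properties props = build("localhost:9092", "container-transport", 5000);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keep in mind that auto-commit only records what the consumer has read, not what any downstream step has finished doing with it.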
We already commit to Kafka in several ways (by commit interval as well as by count of retrieved messages), but that is actually not enough for our specific needs. The Kafka consumer doesn't have control over the final destination of the messages, so if the process breaks in between, then some of the messages that have been read (and committed) by the consumer have not yet been delivered to the specific container.
But I am glad to know that the option we chose is the right one.
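For anyone reading later, the bookkeeping discussed in this thread (remember the partition and offset of the last message that actually reached the container, then seek from there on startup) could be sketched roughly as follows. This is not the poster's code: the class DeliveredOffsetTracker and its methods are hypothetical, the map lives in memory only (a real system would persist it somewhere durable), and it assumes messages within a partition are delivered in order. The returned position is the value one would hand to KafkaConsumer.seek(TopicPartition, long), which expects the offset of the next record to fetch, hence the + 1.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

public class DeliveredOffsetTracker {

    // Key: partition number; value: offset of the last message confirmed
    // as delivered to the container. In-memory for illustration only.
    private final Map<Integer, Long> lastDelivered = new ConcurrentHashMap<>();

    // Call this only after the container has acknowledged the message.
    public void markDelivered(int partition, long offset) {
        // Keep the highest offset seen, so a late duplicate cannot move us back.
        lastDelivered.merge(partition, offset, Math::max);
    }

    // Offset to seek to on restart: one past the last delivered message.
    // Empty if nothing has been delivered from this partition yet.
    public OptionalLong seekPosition(int partition) {
        Long last = lastDelivered.get(partition);
        return last == null ? OptionalLong.empty() : OptionalLong.of(last + 1);
    }

    public static void main(String[] args) {
        DeliveredOffsetTracker tracker = new DeliveredOffsetTracker();
        tracker.markDelivered(0, 41L);
        tracker.markDelivered(1, 7L);
        System.out.println("partition 0 -> " + tracker.seekPosition(0));
        System.out.println("partition 2 -> " + tracker.seekPosition(2));
    }
}
```

If the transport step can complete messages out of order within a partition, tracking only the highest offset would skip undelivered messages after a crash, so that case would need something more careful than this sketch.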