Kafka Idempotent Producer!!

Kafka idempotent producer this is just the term but what exactly mean bu idempotent producer.

Let us first try to understand what is mean by an idempotent.

“Denoting an element of a set which is unchanged in value when multiplied or otherwise operated on by itself”. — Google dictionary 

Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.

Now we know basically in HTTP verbs get is the Idempotent operation.

As seen in the above picture you can imagine, a producer will be publishing data over the broker and some times due to network error you will see a duplicate message in Kafka. When the producer sends the message to kafka topic, you can introducer duplicate message due to network error. 

kafka producer

Good request flow: Producer publish the message over kafka and kafka say I got the message and I committed and this the ack and producer listening to the ack.

In Failure case: Producer publishes the message over kafka and kafka says I got the message and I committed and this the ack and due to network error producer unable to get the ack and here is the problem. Then in the above case producer say I am going to try because I haven’t received ack from kafka. The producer publishes the same message and this makes the data duplication. 

  1. If the producer resends the message it creates duplicate data.
  2. If the producer doesn’t resend the message then message lost.

How to solve it?

Kafka provides “at least once” delivery semantics. This means that a message that is sent may be delivered one or more times. In kafka ≥0.11 released in 2017, you can configure “idempotent producer”, which won’t introducer duplicate data. To stop processing a message multiple times, it must be persisted to Kafka topic only once. During initialization, unique ID gets assigned to the producer which is called producer ID or PID.

In this flow after network failure also kafka doesn’t make duplicate data, even though the producer publishes a message multiple times till he receives the ack, because of Producer ID or PID.

using PID

To achieve this as a programmer we don’t have to do anything.

producer = Producer({'bootstrap.servers': ‘localhost:9092’,          'message.send.max.retries': 10000000,          'enable.idempotence': True})

Enable the idempotence is true and kafka producer will take care of everything for you.

message.send.max.retries= Integer.MAX_VALUE #which is really huge number

Just consider this in my mind how times you want to retry. The early Idempotent Producer was forcing max.in.flight.requests.per.connection to 1 but in the latest releases it can now be used with max.in.flight.requests.per.connection set to up to 5 and still keep its guarantees.

Idempotent delivery ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.

Reference document:- 

Apache Kafka
You’re viewing documentation for an older version of Kafka – check out our current documentation here. Here is a…kafka.apache.org