Friday, 30 January 2015

Partition and Replication in Kafka

1. Partition
In Kafka,message partitioning strategy is used at the Kafka broker end. The decision about how the message is partitioned is taken by the producer, and the broker stores the messages in the same order as they arrive. The number of partitions can be configured for each topic within the Kafka broker.


2. Replication

In replication, each partition of a message has n replicas and can afford n-1 failures to guarantee message delivery. Out of the n replicas, one replica acts as the lead replica for the rest of the replicas. ZooKeeper keeps the information about the lead replica and the current in-sync follower replica (lead replica maintains the list of all in-sync follower replicas).
Each replica stores its part of the message in local logs and offsets, and is periodically synced to the disk. This process also ensures that either a message is written to all the replicas or to none of them.
If the lead replica fails, either while writing the message partition to its local log or before sending the acknowledgement to the message producer, a message partition is resent by the producer to the new lead broker.
The process of choosing the new lead replica is that all followers' In-sync Replicas (ISRs)register themselves with ZooKeeper. The very first registered replica becomes the new lead replica, and the rest of the registered replicas become the followers.
Kafka supports the following replication modes:
  1. Synchronous Replication: After all replicas are complete and responds to lead replica, replica responds to the producer. 
  2. Asynchronous Replication: Lead replica responds to the lead replica once writes the message to its local log. The downside is this mode doesn't ensure the message deliver in broker failure.
  3. All the pulling of messages is done from the lead replica.


No comments:

Post a Comment