1. New producer
At a high level, the primary difference in this producer is that it removes the distinction between the “sync” and “async” producer. All requests are sent asynchronously, but each send returns a future response object that yields the offset, as well as any error that occurred, once the request completes.
The batching that today happens only in the async producer now happens whenever possible. This means that the sync usage pattern, under load, can get performance as good as the async producer. This works similarly to group commit in databases, but applied to the actual network transmission: any messages that arrive while a send is in progress are batched together.
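As a rough illustration, here is a minimal sketch of sending with the new producer API (org.apache.kafka.clients.producer); the broker address, topic name, key, and value are placeholders:

```java
import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class NewProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // send() does not block on the network; it hands the record to the
        // background I/O thread and returns a Future immediately. Records that
        // arrive while a request is in flight are batched together.
        Future<RecordMetadata> future =
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));

        // Blocking on the future gives the old "sync" behavior: once the
        // request completes it yields the offset, or throws if the send failed.
        RecordMetadata metadata = future.get();
        System.out.printf("partition=%d offset=%d%n",
                metadata.partition(), metadata.offset());

        producer.close();
    }
}
```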
2. Delete topic
3. Offset management
One of the expensive operations a messaging system has to perform is keeping track of which messages have been consumed. Kafka does this using “offsets”: a marker it stores for the consumer’s position in the log of messages. Previously this marker was stored by the JVM client in ZooKeeper. The problem with that approach is that it can become a bottleneck for consumers that want to update their position frequently, because ZooKeeper writes are expensive and can’t be scaled horizontally.
Internally, the implementation of the offset storage is just a compacted Kafka topic (__consumer_offsets) keyed on the consumer’s group, topic, and partition. The offset commit request writes the offset to this compacted topic using the highest level of durability guarantee that Kafka provides (acks=-1), so that offsets are never lost in the presence of uncorrelated failures. Kafka maintains an in-memory view of the latest offset per <consumer group, topic, partition> triplet, so offset fetch requests can be served quickly without requiring a full scan of the compacted offsets topic.
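For the existing high-level consumer, switching from ZooKeeper-based to Kafka-based offset storage is a configuration change. The sketch below assumes the offsets.storage and dual.commit.enabled consumer settings; the connection strings and group name are placeholders:

```java
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class KafkaOffsetStorageSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "my-group");                // hypothetical group name

        // Commit offsets to the compacted __consumer_offsets topic
        // instead of writing them to ZooKeeper.
        props.put("offsets.storage", "kafka");
        // While migrating, also keep writing offsets to ZooKeeper so older
        // consumers in the group still see committed positions.
        props.put("dual.commit.enabled", "true");

        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // ... create message streams and consume as usual ...

        // With offsets.storage=kafka this sends an offset commit request to
        // the broker rather than writing a ZooKeeper znode.
        consumer.commitOffsets();
        consumer.shutdown();
    }
}
```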
4. Automated leader rebalancing
In Kafka, the leader replica for a partition handles the reading and writing of messages. For efficient load distribution, it is important that leader replicas are evenly distributed amongst the brokers in a cluster.
When brokers are bounced or fail, leadership automatically fails over to a different live replica. Over time, this can lead to an uneven distribution of leaders in the cluster, with some brokers serving more data than others.
Kafka now automatically detects such leader imbalance and periodically triggers leader re-election, moving leadership back to the preferred replica (when it is alive) to maintain an even distribution of leaders.
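This behavior is driven by broker configuration. To the best of my knowledge, the relevant server.properties settings are the ones below (a sketch only; names and values should be verified against the broker documentation for your version):

```properties
# Enable the periodic preferred-replica (leader) rebalance check
auto.leader.rebalance.enable=true

# How often the controller checks for leader imbalance
leader.imbalance.check.interval.seconds=300

# Trigger re-election when a broker's leader imbalance exceeds this percentage
leader.imbalance.per.broker.percentage=10
```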
http://blog.confluent.io/2014/12/02/whats-coming-in-apache-kafka-0-8-2/