Wednesday, 3 May 2017

Kafka Consumer Challenges

Consumer Retry
1. One option when you encounter a retriable error is to commit the last record you processed successfully, then store the records that still need to be processed in a buffer and use the consumer pause() method to ensure that additional polls won't return data. Keep retrying the buffered records while you continue to poll. If the retries succeed, or if you have retried enough times and decide to give up (logging an error), call resume() to unpause the consumer, and the next poll will return new records to process. A sketch of this pattern follows the list.

2. Another option when encountering a retriable error is to write the failing record to a separate retry topic and continue. A separate consumer group can then handle retries from the retry topic.
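
Below is a minimal sketch of the first pattern, assuming a plain Java KafkaConsumer with auto-commit disabled. The broker address, group id, topic name, retry limit, and the process()/retryAll() helpers are hypothetical placeholders for application logic.

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.util.*;

public class PauseAndRetryConsumer {
    private static final int MAX_ATTEMPTS = 5;              // hypothetical retry limit

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "retry-demo");                 // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("orders"));   // hypothetical topic

        List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
        int attempts = 0;

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                try {
                    process(record);
                    // commit the last record processed successfully
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                } catch (Exception retriable) {
                    buffer.add(record);                      // keep the record for later retries
                    consumer.pause(consumer.assignment());   // further polls return no data
                }
            }

            if (!buffer.isEmpty()) {
                attempts++;
                boolean done = retryAll(buffer);             // try the buffered records again
                if (done || attempts >= MAX_ATTEMPTS) {
                    if (!done) {
                        System.err.println("Giving up on " + buffer.size() + " records");
                    }
                    buffer.clear();
                    attempts = 0;
                    consumer.resume(consumer.assignment());  // next poll returns new records
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // application-specific processing; may throw a retriable error
    }

    private static boolean retryAll(List<ConsumerRecord<String, String>> buffer) {
        // reprocess every buffered record; return true only when all of them succeed
        return true;
    }
}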

Long Processing Times

Processing records can take a long time, but you can't stop polling for more than a few seconds: you must continue polling so the client can send heartbeats to the broker and a rebalance will not be triggered. A common pattern is to hand off the data to a thread pool for processing, when possible with multiple threads. You can then pause the consumer and keep polling without fetching additional data until the worker threads finish.
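
A minimal sketch of this hand-off pattern, assuming a fixed-size thread pool; the pool size, topic name, broker address, and the process() helper are hypothetical, and per-partition offset tracking is left out for brevity.

import org.apache.kafka.clients.consumer.*;
import java.util.*;
import java.util.concurrent.*;

public class ThreadPoolConsumer {
    public static void main(String[] args) {
        KafkaConsumer<String, String> consumer = createConsumer();
        consumer.subscribe(Collections.singletonList("orders"));     // hypothetical topic

        ExecutorService workers = Executors.newFixedThreadPool(4);   // hypothetical pool size
        List<Future<?>> inFlight = new ArrayList<>();

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);

            if (!records.isEmpty()) {
                // hand the batch to worker threads, then stop fetching new data
                for (ConsumerRecord<String, String> record : records) {
                    inFlight.add(workers.submit(() -> process(record)));
                }
                consumer.pause(consumer.assignment());
            }

            // keep calling poll() so the consumer stays in the group;
            // paused partitions return no records
            if (!inFlight.isEmpty() && inFlight.stream().allMatch(Future::isDone)) {
                consumer.commitSync();                     // commit once the whole batch is processed
                inFlight.clear();
                consumer.resume(consumer.assignment());    // start fetching data again
            }
        }
    }

    private static KafkaConsumer<String, String> createConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "worker-pool-demo");         // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");
        return new KafkaConsumer<>(props);
    }

    private static void process(ConsumerRecord<String, String> record) {
        // application-specific, potentially slow processing
    }
}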

Exactly Once Delivery

1. Write results to a system that has some support for unique keys. Either the record itself contains a unique key, or you can create one from the topic, partition, and offset combination (for example, "orders-3-1047"), which uniquely identifies a Kafka record. When the data store receives a duplicate, it simply overrides the existing record. This pattern is called idempotent writes.

2. Write results to a system that has transactions. Write the records and their offsets in the same transaction so they stay in sync. When starting up, retrieve the offsets of the latest records written to the external store, and then use consumer.seek() to start consuming again from those offsets. A sketch of this approach follows.
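
A minimal sketch of the second approach, assuming results and offsets are stored together in an external database. The createConsumer() factory and the storeResultInDb()/storeOffsetInDb()/commitDbTransaction()/getOffsetFromStore() helpers are hypothetical stand-ins for real database code.

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.util.*;

public class ExternalOffsetsConsumer {
    public static void main(String[] args) {
        final KafkaConsumer<String, String> consumer = createConsumer();

        consumer.subscribe(Collections.singletonList("orders"),   // hypothetical topic
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        commitDbTransaction();   // persist results and offsets written so far
                    }

                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // resume from the offsets stored alongside the results in the external store
                        for (TopicPartition tp : partitions) {
                            consumer.seek(tp, getOffsetFromStore(tp));
                        }
                    }
                });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                // write the result and its offset in the same database transaction
                storeResultInDb(record);
                storeOffsetInDb(record.topic(), record.partition(), record.offset() + 1);
            }
            commitDbTransaction();   // results and offsets become visible atomically
        }
    }

    private static KafkaConsumer<String, String> createConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "db-offsets-demo");            // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");             // offsets live in the database, not in Kafka
        return new KafkaConsumer<>(props);
    }

    // hypothetical stand-ins for real database code
    private static void storeResultInDb(ConsumerRecord<String, String> record) { }
    private static void storeOffsetInDb(String topic, int partition, long offset) { }
    private static void commitDbTransaction() { }
    private static long getOffsetFromStore(TopicPartition tp) { return 0L; }
}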
