Alvin's Big Data Notebook : Kafka Streams

Monday, 10 April 2017

Kafka Streams

Lambda vs Kappa Architecture

Lambda architecture: a scalable, high-latency batch system that can process historical data, combined with a low-latency stream processing system that can't reprocess results.
Kappa architecture: use Kafka to retain the full log of the data you want to be able to reprocess. Retaining large amounts of data in Kafka is a perfectly natural and economical thing to do and won't hurt performance.
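
To make the reprocessing idea concrete, here is a minimal plain-Python sketch. It is not Kafka code: the list named log stands in for a retained topic, and the thresholds and the process helper are made up for illustration. The point is only that changing the logic means replaying the same log from offset 0.

```python
# A retained log: every event is kept, so it can be read again from the start.
log = [("sensor-1", 20), ("sensor-2", 25), ("sensor-1", 22)]

def process(log, is_hot, from_offset=0):
    """Run a job over the log starting at the given offset."""
    return [(key, is_hot(value)) for key, value in log[from_offset:]]

# Version 1 of the job flags readings above 21 as hot.
v1 = process(log, lambda value: value > 21)

# Version 2 changes the threshold; reprocessing is just a replay from offset 0.
v2 = process(log, lambda value: value > 24)
```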

The Kafka Streams DSL is built on two main abstractions: the KStream and KTable interfaces.

A KStream is a record stream where each key-value pair is an independent record. Later records in the stream don’t replace earlier records with matching keys.
A KTable, on the other hand, is a “changelog” stream, meaning later records are considered updates to earlier records with the same key.
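
The difference is easiest to see by reading the same records both ways. The sketch below is a plain-Python model of the two interpretations, not the Kafka Streams API; the record values and function names are invented for the example.

```python
# The same three records, interpreted two ways.
records = [("alice", 1), ("bob", 5), ("alice", 3)]

def as_record_stream(records):
    """KStream view: every record is an independent fact, so all are kept."""
    return list(records)

def as_changelog(records):
    """KTable view: a later record overwrites the earlier one with the same key."""
    table = {}
    for key, value in records:
        table[key] = value
    return table

stream_view = as_record_stream(records)
table_view = as_changelog(records)

# The stream view sees both of alice's records; the table view only the latest.
stream_sum_alice = sum(value for key, value in stream_view if key == "alice")
table_value_alice = table_view["alice"]
```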

Local State Store:

Kafka Streams provides an efficient way to model the application state.
The state store is partitioned the same way as the application's key space. As a result, all the data required to serve the queries that arrive at a particular application instance is available locally in that instance's state store shards. Kafka Streams makes this local state store fault tolerant by transparently logging every update to a highly available and durable Kafka topic.
In effect, Kafka Streams uses Kafka as a commit log for its local, embedded database.
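
A toy model of that commit-log idea, in plain Python. The ChangelogBackedStore class is hypothetical, and a Python list stands in for the durable changelog topic. It only shows the shape of the mechanism: log each update, then rebuild local state by replaying the log.

```python
class ChangelogBackedStore:
    """Toy key-value store that appends every update to a changelog list."""

    def __init__(self, changelog):
        self.changelog = changelog  # stands in for a durable Kafka topic
        self.data = {}              # stands in for the local embedded store

    def put(self, key, value):
        self.changelog.append((key, value))  # log the update, then apply it
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state on a fresh instance by replaying the changelog."""
        store = cls(changelog)
        for key, value in changelog:
            store.data[key] = value
        return store

changelog_topic = []
store = ChangelogBackedStore(changelog_topic)
store.put("page-1", 10)
store.put("page-2", 4)
store.put("page-1", 11)

# Simulate losing the instance: only the changelog survives.
recovered = ChangelogBackedStore.restore(changelog_topic)
```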

The aggregation is first done on each instance using its local state. Then the results are written to a new topic with a single partition, which is read by a single application instance. This type of multi-phase processing will be very familiar to anyone who has written MapReduce code.
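
The two phases can be sketched in plain Python as follows. This is an illustration under made-up data, with lists standing in for the partitions and for the single-partition topic; it is not how the library is invoked.

```python
from collections import Counter

# Each inner list stands in for the events one application instance sees.
partitions = [
    ["click", "view", "click"],
    ["view", "view"],
    ["click"],
]

def local_aggregate(events):
    """Phase 1: each instance counts its own events in local state."""
    return Counter(events)

def combine(partials):
    """Phase 2: a single reader merges the partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# The list of partial results plays the role of the single-partition topic.
partials = [local_aggregate(events) for events in partitions]
totals = combine(partials)
```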

Interactive Queries allow us to treat the stream processing layer as a lightweight, embedded database and directly query the state of our stream processing application, without needing to materialize that state to external databases or storage.

Kafka Streams simplifies the stream processing architecture by eliminating the need for a separate cluster. These embedded databases act as materialized views of logs.
Materialized views provide better application isolation because they are part of an application's state, and they also provide better performance.

Interactive Queries enable faster and more efficient use of the application state. There is no duplication of data between the store doing aggregation for stream processing and the store answering queries.

Discovering any instance's stores

In Kafka Streams, each instance may expose its endpoint information as metadata to other instances of the same application. The Interactive Queries (IQ) APIs allow a developer to obtain the metadata for a given store name and key, across all the instances of an application. By examining that metadata, we can discover which instance hosts the store that holds a particular key.
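
The lookup can be pictured as key, then partition, then hosting instance. The sketch below is a plain-Python model with hypothetical endpoints and a CRC32-based partitioner chosen only because it is deterministic; Kafka's real partitioner and the real IQ metadata objects are different.

```python
import zlib

NUM_PARTITIONS = 4

# Hypothetical metadata: which instance endpoint hosts which store partitions.
store_metadata = {
    "host-a:8080": {0, 1},
    "host-b:8080": {2, 3},
}

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Deterministic stand-in for a partitioner (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def host_for_key(key, metadata=store_metadata):
    """Return the endpoint of the instance whose store holds this key."""
    partition = partition_for(key)
    for endpoint, hosted in metadata.items():
        if partition in hosted:
            return endpoint
    raise KeyError(f"no instance hosts partition {partition}")

owner = host_for_key("user-42")
```

A caller would then send its query to the endpoint in owner, or answer locally if that endpoint is itself.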

Event streams are ordered, while records in a table are always considered unordered.
Events, once they have occurred, can never be modified. Instead, an additional event is written to the stream, recording the cancellation of the previous transaction.
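
As a small illustration with made-up event fields, a cancellation is just one more event appended to the ordered stream; the earlier purchase is never edited.

```python
# An ordered, append-only stream of events; nothing is edited in place.
events = [
    {"id": 1, "type": "purchase", "amount": 30},
    {"id": 2, "type": "purchase", "amount": 20},
]

# Cancelling purchase 1 appends a compensating event instead of changing it.
events.append({"id": 3, "type": "cancel", "cancels": 1, "amount": -30})

net_total = sum(event["amount"] for event in events)
```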

A stream contains a sequence of events, each of which caused a change. A table contains the current state of the world, a state that is the result of many changes.

If we can capture all the changes that happen to the database table in a stream of events, we can have our stream processing job listen to this stream and update the cache based on database change events.
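
A minimal sketch of that idea in plain Python. The event format with op, key and row fields is an assumption made for the example, not a real change-data-capture schema.

```python
# Hypothetical change events captured from a database table.
change_events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "London"}},
    {"op": "insert", "key": 2, "row": {"name": "Linus", "city": "Helsinki"}},
    {"op": "update", "key": 2, "row": {"name": "Linus", "city": "Portland"}},
    {"op": "delete", "key": 1, "row": None},
]

def apply_change(cache, event):
    """Apply one change event to the local cache (the table view)."""
    if event["op"] == "delete":
        cache.pop(event["key"], None)
    else:  # insert and update both set the latest row for the key
        cache[event["key"]] = event["row"]
    return cache

cache = {}
for event in change_events:
    apply_change(cache, event)
```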

Stream-table join: one of the inputs is a stream that represents changes to a locally cached table. Each event in the other stream is joined with that table to enrich it with information from the table. This is similar to joining a fact table with a dimension table.
Stream-stream join: events from two streams are joined when they have the same key and occur within the same time window, which is why it is also called a windowed join. Both streams are partitioned on the same keys, which are also the join keys.
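
Both kinds of join can be modelled in a few lines of plain Python. The data, the tuple layouts and the function names are invented for illustration, and a real windowed join keeps bounded state rather than comparing every pair.

```python
# Stream-table join: enrich each click with the user's country from a table.
profiles = {"u1": {"country": "SE"}, "u2": {"country": "BR"}}  # table (dimension)
clicks = [("u1", "/home"), ("u3", "/cart"), ("u2", "/pay")]    # stream (facts)

def stream_table_join(stream, table):
    """Inner join: events whose key is missing from the table are dropped."""
    return [(key, page, table[key]["country"]) for key, page in stream if key in table]

# Stream-stream join: pair events with the same key whose timestamps
# are at most `window` time units apart.
searches = [("u1", 100, "shoes"), ("u2", 105, "hats")]  # (key, timestamp, value)
purchases = [("u1", 103, "sneakers"), ("u2", 130, "cap")]

def windowed_join(left, right, window):
    joined = []
    for left_key, left_ts, left_value in left:
        for right_key, right_ts, right_value in right:
            if left_key == right_key and abs(left_ts - right_ts) <= window:
                joined.append((left_key, left_value, right_value))
    return joined

enriched = stream_table_join(clicks, profiles)
matched = windowed_join(searches, purchases, window=10)
```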

Kafka Streams scales by allowing multiple threads of execution within one instance of the application and by supporting load balancing between distributed instances of the application. The number of tasks is determined by the Streams engine and depends on the number of partitions in the topics. Each task is responsible for a subset of the partitions. The developer of the application can choose the number of threads each application instance will execute. Each task independently processes events from its partitions and maintains its own local state with the relevant aggregates.
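
A rough way to picture how tasks spread over threads and instances, in plain Python. The one-task-per-partition rule and the round-robin placement are simplifying assumptions for this sketch; the real assignment logic is more involved.

```python
def assign_tasks(num_partitions, instances):
    """Spread one task per partition round-robin over all threads.

    `instances` maps an instance name to its configured thread count.
    """
    threads = [
        f"{name}-thread-{index}"
        for name, count in instances.items()
        for index in range(count)
    ]
    assignment = {thread: [] for thread in threads}
    for task_id in range(num_partitions):
        assignment[threads[task_id % len(threads)]].append(task_id)
    return assignment

# Six partitions, two instances: one with two threads, one with a single thread.
assignment = assign_tasks(6, {"app-1": 2, "app-2": 1})
```

Adding an instance or raising a thread count changes only the list of threads; the number of tasks stays tied to the number of partitions.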

Kafka Streams assigns all the partitions needed for one join to the same task, so that task can consume from all the relevant partitions and perform the join independently. Kafka Streams requires that all topics that participate in a join operation have the same number of partitions and be partitioned on the join key. Instead of shuffling, Kafka Streams repartitions by writing the events to a new topic with the new keys and partitions. This reduces dependencies between different parts of a pipeline.
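
Repartitioning through a topic can be modelled like this in plain Python. Lists stand in for the partitions of the new topic, and the CRC32-based partitioner is an assumption chosen for determinism, not Kafka's own.

```python
import zlib

def partition_for(key, num_partitions):
    """Deterministic stand-in for a partitioner (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def repartition(events, new_key_fn, num_partitions):
    """Write each event to the partition chosen by its new key."""
    topic = [[] for _ in range(num_partitions)]
    for event in events:
        new_key = new_key_fn(event)
        topic[partition_for(new_key, num_partitions)].append((new_key, event))
    return topic

# Orders arrive keyed by order id; the join needs them keyed by customer.
orders = [
    {"order_id": "o1", "customer": "c1"},
    {"order_id": "o2", "customer": "c2"},
    {"order_id": "o3", "customer": "c1"},
]
by_customer = repartition(orders, lambda order: order["customer"], num_partitions=3)

# Record which partitions each new key landed in.
locations = {}
for index, partition in enumerate(by_customer):
    for key, _ in partition:
        locations.setdefault(key, set()).add(index)
```

After the rewrite, every event for a given customer sits in a single partition, so one task can join on that key without coordinating with other tasks.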

A GlobalKTable is more of a service concept, while a KTable is more of a streaming concept. A GlobalKTable:

Has all partitions replicated to all nodes.
Supports N-way joins.
Blocks until initialized.
Doesn't trigger processing (reference resources).

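By default Kafka Streams backs up each state store to a changelog topic in Kafka. It can also keep a standby copy of the store on a second instance when num.standby.replicas is set above its default of 0.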
