1. Tuple
A tuple is simply a list of named values (key-value pairs).
2. Stream
A stream is an unbounded sequence of tuples and is the core abstraction in Storm.
• A stream is defined with a schema that names the fields in the tuple
• Each field value must be serializable
• Every stream has an id
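A minimal sketch of how a stream's schema and id show up in code (assuming the org.apache.storm packages; the StreamSchemaBolt class and field names are illustrative, not from the original post): a component declares the named fields of each stream it emits, and can declare additional streams with explicit ids.

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Illustrative bolt used only to show stream schemas; names are assumptions.
public class StreamSchemaBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Emit on the default stream; the values (String, Integer) are serializable.
        collector.emit(new Values(input.getString(0), 1));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Default stream (id "default") with a two-field schema.
        declarer.declare(new Fields("word", "count"));
        // A second stream declared with an explicit stream id.
        declarer.declareStream("errors", new Fields("word", "reason"));
    }
}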
3. Workers, Executors, and Tasks
- Workers (JVMs): These are independent JVM processes running on a node. Each node is configured to run one or more workers. A topology may request one or more workers be assigned to it.
- Executors (threads): These are Java threads running within a worker JVM process. Multiple tasks can be assigned to a single executor. Unless explicitly overridden, Storm will assign one task for each executor.
- Tasks (bolt/spout instances): Tasks are instances of spouts and bolts whose nextTuple() and execute() methods are called by executor threads.
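A minimal sketch of how these three levels are configured when building a topology (reusing the illustrative SentenceSpout and SplitSentenceBolt classes sketched in the Spout and Bolt sections below; the component ids and numbers are assumptions): the parallelism hint sets the number of executors, setNumTasks overrides the default of one task per executor, and setNumWorkers requests worker JVMs for the topology.

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

// Illustrative parallelism settings; the component classes are sketched later in this post.
public class ParallelismExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // 2 executors (threads) for the spout; by default, one task per executor.
        builder.setSpout("sentence-spout", new SentenceSpout(), 2);

        // 4 executors for the bolt, but 8 tasks in total, i.e. 2 tasks per executor.
        builder.setBolt("split-bolt", new SplitSentenceBolt(), 4)
               .setNumTasks(8)
               .shuffleGrouping("sentence-spout");

        Config conf = new Config();
        // Ask Storm to assign 2 worker JVM processes to this topology.
        conf.setNumWorkers(2);

        // Run locally for illustration; a real deployment would go through nimbus (see below).
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("parallelism-example", conf, builder.createTopology());
    }
}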
4. Spout
Spouts act as adapters that connect to a source of data, transform the data into tuples, and emit the tuples as a stream.
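A minimal sketch of such an adapter (assuming Storm 2.x signatures; the SentenceSpout class and the hard-coded sentences stand in for a real data source and are not from the original post):

import java.util.Map;
import java.util.Random;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Illustrative spout: pretends a fixed array of sentences is the external data source.
public class SentenceSpout extends BaseRichSpout {
    private static final String[] SENTENCES = {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away"
    };
    private SpoutOutputCollector collector;
    private Random random;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        // Called once per task; connect to the real data source here.
        this.collector = collector;
        this.random = new Random();
    }

    @Override
    public void nextTuple() {
        // Called repeatedly by the executor thread; transform source data into a tuple and emit it.
        String sentence = SENTENCES[random.nextInt(SENTENCES.length)];
        collector.emit(new Values(sentence));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Schema of the emitted stream: a single field named "sentence".
        declarer.declare(new Fields("sentence"));
    }
}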
5. Bolt
Bolts can be thought of as the operators or functions of your computation. They take as input any number of streams, process the data, and optionally emit one or more streams. Bolts may subscribe to streams emitted by spouts or other bolts.
Typical functions performed by bolts include:
- Filtering tuples
- Joins and aggregations
- Calculations
- Database reads/writes
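A minimal sketch of a bolt that filters and splits the sentence stream from the spout above (the SplitSentenceBolt class and the word-length filter are illustrative assumptions):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Illustrative bolt: splits incoming sentences into words and filters out very short ones.
public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String sentence = input.getStringByField("sentence");
        for (String word : sentence.split("\\s+")) {
            if (word.length() > 2) {                     // simple filtering
                collector.emit(input, new Values(word)); // anchor the new tuple to the input
            }
        }
        collector.ack(input); // acknowledge the input once it has been fully processed
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}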
6. Nimbus
The nimbus daemon's primary responsibility is to manage, coordinate, and monitor topologies running on a cluster, including topology deployment, task assignment, and task reassignment in the event of a failure.
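As a sketch of the deployment step that nimbus coordinates (reusing the illustrative classes above; the topology name and parallelism numbers are assumptions), a client submits the packaged topology, and nimbus then assigns its tasks to supervisors:

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

// Illustrative submission: the jar containing these classes is uploaded to nimbus,
// which schedules the topology's tasks across the cluster.
public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentence-spout", new SentenceSpout(), 2);
        builder.setBolt("split-bolt", new SplitSentenceBolt(), 4)
               .shuffleGrouping("sentence-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);

        StormSubmitter.submitTopology("word-split-topology", conf, builder.createTopology());
    }
}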
7. Supervisor
The supervisor daemon waits for task assignments from nimbus and spawns and monitors workers (JVM processes) to execute tasks. Both the supervisor daemon and the workers it spawns are separate JVM processes.
8. Failover
Nimbus is not a single point of failure in the strictest sense. Nimbus does not take part in topology data processing; rather, it merely manages the initial deployment, task assignment, and monitoring of a topology.
In fact, if the nimbus daemon dies while a topology is running, the topology will continue to process data as long as the supervisors and workers that were assigned tasks remain healthy. The main caveat is that if a supervisor fails while nimbus is down, data processing will fail, since there is no nimbus daemon to reassign the failed supervisor's tasks to another node.
If a worker process spawned by a supervisor exits unexpectedly due to an error, the supervisor daemon will attempt to respawn the worker process.
If a worker or even an entire supervisor node fails, how does Storm guarantee the delivery of the tuples that were in process at the time of failure?
The answer lies in Storm's tuple anchoring and acknowledgement mechanism. When reliable delivery is enabled, tuples routed to tasks on the failed node are never acknowledged, so the original tuple eventually times out and is replayed by the spout. This process repeats until the topology has recovered and normal processing has resumed.
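A minimal sketch of the spout side of this mechanism (the bolt side, anchoring and ack(), was shown in the SplitSentenceBolt above; the in-memory pending map here is an illustrative assumption): emitting each tuple with a message id lets Storm call ack() or fail() on the spout once the tuple tree completes, fails, or times out.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Illustrative reliable spout: tracks pending sentences so failed or timed-out tuples can be replayed.
public class ReliableSentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Map<Object, String> pending; // msgId -> sentence, awaiting acknowledgement

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<>();
    }

    @Override
    public void nextTuple() {
        String sentence = "the cow jumped over the moon"; // stand-in for reading from a real source
        String msgId = UUID.randomUUID().toString();
        pending.put(msgId, sentence);
        // Emitting with a message id enables tracking of this tuple's tree.
        collector.emit(new Values(sentence), msgId);
    }

    @Override
    public void ack(Object msgId) {
        // Every tuple anchored to this one was processed; safe to forget it.
        pending.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        // The tuple tree failed or timed out (e.g. a worker died); replay the original tuple.
        collector.emit(new Values(pending.get(msgId)), msgId);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}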