Trident is a high-level abstraction on top of Storm that makes it easier to build topologies.
Trident supports stateful stream processing, while pure Storm is a stateless processing framework.
- It is similar to high-level batch-processing tools such as Pig and Cascading on Hadoop.
- The stream is processed as a series of batches. It's a near real-time process.
- Stream tuples are partitioned among the nodes in the cluster by repartitioning operations.
- Every message that enters the topology is processed exactly once, meaning its effect on state is applied only once even if a batch has to be replayed.
- During the construction of a topology, operations are performed on the tuple, which will either add new fields to the tuple or replace the tuple with a new set of fields.
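To make the last point concrete, here is a minimal sketch in plain Java, not the Trident API. It assumes a tuple can be modeled as an ordered map from field name to value; the class `TupleFieldsSketch` and its methods `appendLength`, `replaceWith`, and `processBatch` are hypothetical names chosen for illustration. One step appends a field to each tuple, the other replaces the tuple with a new set of fields, and both are applied to a whole batch at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Trident tuples: an ordered map of field name to value.
class TupleFieldsSketch {

    // Appending step: keeps the input fields and adds a new "length" field.
    static Map<String, Object> appendLength(Map<String, Object> tuple) {
        Map<String, Object> out = new LinkedHashMap<>(tuple);
        out.put("length", ((String) tuple.get("word")).length());
        return out;
    }

    // Replacing step: the output tuple contains only the named fields.
    static Map<String, Object> replaceWith(Map<String, Object> tuple, String... keep) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : keep) {
            out.put(field, tuple.get(field));
        }
        return out;
    }

    // Runs every tuple of one batch through the appending step.
    static List<Map<String, Object>> processBatch(List<String> words) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (String w : words) {
            Map<String, Object> tuple = new LinkedHashMap<>();
            tuple.put("word", w);
            result.add(appendLength(tuple));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> batch = processBatch(List.of("storm", "trident"));
        System.out.println(batch);                                // fields: word, length
        System.out.println(replaceWith(batch.get(0), "length"));  // fields: length only
    }
}
```

In a real topology these steps would be expressed with Trident's own operations on a stream; the sketch only shows how the set of fields changes from one step to the next.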
1. Usages
Trident is useful for use cases that require exactly-once processing.
Trident is not a good fit for use cases that need maximum performance, because it adds complexity on top of Storm and has to manage state.
2. Operations in Trident
- Operations that apply locally to each partition and cause no network transfer
- Repartitioning operations that don't change the contents
- Aggregation operations that do network transfer
- Operations on grouped streams
- Merges and Joins
3. Transactional topology
A transactional spout guarantees what's in each batch.
- Each batch is assigned a unique transactional ID (txid). In the case of failure, the entire batch is replayed. Hence, replays of the failed batch will contain the same set of tuples as the first time the batch was emitted. The txid (transactional ID) of the failed batch remains the same as the first time.
- Tuples of one batch are not mixed with tuples of another batch. Hence, overlaps of tuples between batches are not allowed.
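The two guarantees above are what make exactly-once state updates possible. The following is a minimal sketch in plain Java, not Trident's state API: the class `TxidStateSketch` and its methods are hypothetical. It assumes state updates are applied in txid order, and it stores the txid of the last applied batch next to the value, so that a replayed batch (same txid, same tuples) can be recognized and skipped.

```java
import java.util.List;

// Hypothetical counter that remembers which batch last updated it.
class TxidStateSketch {
    private long count = 0;      // the stored value
    private long lastTxid = -1;  // txid of the last batch applied to the value

    // Applies the batch once. Returns false when the batch is a replay
    // of the most recently applied txid and is therefore skipped.
    // Assumption: batches reach the state in txid order.
    boolean applyBatch(long txid, List<String> batch) {
        if (txid == lastTxid) {
            return false; // already counted; a replay holds the same tuples
        }
        count += batch.size();
        lastTxid = txid;
        return true;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        TxidStateSketch state = new TxidStateSketch();
        state.applyBatch(1, List.of("a", "b", "c")); // first attempt of batch 1
        state.applyBatch(1, List.of("a", "b", "c")); // replay of batch 1, skipped
        state.applyBatch(2, List.of("d"));           // next batch
        System.out.println(state.count());           // prints 4
    }
}
```

Skipping on a matching txid is only safe because a replay is guaranteed to contain the same tuples as the original emission; without that guarantee the stored value could not be trusted.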