Tuesday, 14 October 2014

Introduction to Flume

Flume is designed to push events in near real time, in settings where the stream of data is continuous and its volume reasonably large.

An agent is a Java application that receives or generates data and buffers it until it is written to the next agent or to a storage or indexing system. It is built from three kinds of components:
  • Sources can listen on one or more network ports to receive data, or can read data from the local file system. Each source must be connected to at least one channel.
  • Channels are, in general, passive components that buffer data that has been received by the agent but not yet written out to another agent or to a storage system.
  • Sinks take events from a channel and push them to the next hop (in the case of RPC sinks) or to the final destination.
A source writes events to one or more channels.
A channel is the holding area as events are passed from a source to a sink.
A sink receives events from one channel only.
An agent can have many sources, channels, and sinks.
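
The roles above can be sketched in plain Java. This is a minimal, single-threaded model under stated assumptions, not Flume's API: `SketchChannel`, `SketchSource`, and `SketchSink` are hypothetical names, a bounded in-memory queue stands in for a channel, and an in-memory list stands in for the final destination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical, simplified stand-ins for Flume components; not the Flume API.

/** A channel: a passive, bounded buffer between sources and sinks. */
class SketchChannel {
    private final BlockingQueue<String> buffer;

    SketchChannel(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false (instead of blocking) when the buffer is full. */
    boolean put(String event) {
        return buffer.offer(event);
    }

    /** Returns null when the buffer is empty. */
    String take() {
        return buffer.poll();
    }

    int size() {
        return buffer.size();
    }
}

/** A source writes every event it receives to each of its channels (at least one). */
class SketchSource {
    private final List<SketchChannel> channels;

    SketchSource(List<SketchChannel> channels) {
        if (channels.isEmpty()) {
            throw new IllegalArgumentException("a source needs at least one channel");
        }
        this.channels = channels;
    }

    void receive(String event) {
        for (SketchChannel channel : channels) {
            // No transactions here: a full channel can leave the event in only some channels.
            if (!channel.put(event)) {
                throw new IllegalStateException("channel full");
            }
        }
    }
}

/** A sink drains exactly one channel into a destination (here, an in-memory list). */
class SketchSink {
    private final SketchChannel channel;
    final List<String> destination = new ArrayList<>();

    SketchSink(SketchChannel channel) {
        this.channel = channel;
    }

    /** Moves up to batchSize events from the channel; returns how many were moved. */
    int process(int batchSize) {
        int moved = 0;
        String event;
        while (moved < batchSize && (event = channel.take()) != null) {
            destination.add(event);
            moved++;
        }
        return moved;
    }
}

class AgentSketch {
    public static void main(String[] args) {
        SketchChannel toStorage = new SketchChannel(100);
        SketchChannel toNextAgent = new SketchChannel(100);
        SketchSource source = new SketchSource(List.of(toStorage, toNextAgent));
        SketchSink storageSink = new SketchSink(toStorage);
        SketchSink rpcSink = new SketchSink(toNextAgent);

        source.receive("event-1");
        source.receive("event-2");
        storageSink.process(10);
        rpcSink.process(1);
        // prints [event-1, event-2] [event-1] 1
        System.out.println(storageSink.destination + " " + rpcSink.destination + " " + toNextAgent.size());
    }
}
```

Flume itself wraps channel puts and takes in transactions; this sketch omits them to keep the source, channel, and sink roles visible.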


A source does not write to its channels directly; its events travel the processor-interceptor-selector route, which is also how one source can write to multiple channels. The source first hands its events to its channel processor.
  • The channel processor then passes these events to the interceptors (zero or more) configured for the source.
  • An interceptor is a piece of code that can read an event and modify or drop it based on some processing it does.
  • Finally, the channel selector decides which of the channels attached to the source each event must be written to.
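
The route above can be sketched as a tiny channel processor. Everything here is hypothetical and simplified: `SketchInterceptor` returns `null` to drop an event, and `HeaderSelector` picks channels from the value of one header with a fallback list, in the spirit of Flume's multiplexing channel selector (Flume's default replicating selector instead writes every event to all of the source's channels). Channels are modeled as named in-memory lists.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; this models the processor-interceptor-selector route, not Flume's classes.

/** An event with string headers and a byte[] body. */
class RoutedEvent {
    final Map<String, String> headers = new HashMap<>();
    final byte[] body;

    RoutedEvent(byte[] body) {
        this.body = body;
    }
}

/** Returns the (possibly modified) event, or null to drop it. */
interface SketchInterceptor {
    RoutedEvent intercept(RoutedEvent event);
}

/** Chooses channels by the value of one header, falling back to default channels. */
class HeaderSelector {
    private final String header;
    private final Map<String, List<String>> mapping;
    private final List<String> defaultChannels;

    HeaderSelector(String header, Map<String, List<String>> mapping, List<String> defaultChannels) {
        this.header = header;
        this.mapping = mapping;
        this.defaultChannels = defaultChannels;
    }

    List<String> select(RoutedEvent event) {
        String value = event.headers.get(header);
        List<String> chosen = (value == null) ? null : mapping.get(value);
        return (chosen != null) ? chosen : defaultChannels;
    }
}

/** Runs the interceptors in order, then writes surviving events to the selected channels. */
class SketchChannelProcessor {
    private final List<SketchInterceptor> interceptors;
    private final HeaderSelector selector;
    // Channel name -> events written to it (in-memory stand-in for real channels).
    final Map<String, List<RoutedEvent>> channels = new HashMap<>();

    SketchChannelProcessor(List<SketchInterceptor> interceptors, HeaderSelector selector) {
        this.interceptors = interceptors;
        this.selector = selector;
    }

    void processEvent(RoutedEvent event) {
        for (SketchInterceptor interceptor : interceptors) {
            event = interceptor.intercept(event);
            if (event == null) {
                return; // an interceptor dropped the event
            }
        }
        for (String channel : selector.select(event)) {
            channels.computeIfAbsent(channel, name -> new ArrayList<>()).add(event);
        }
    }
}

class ProcessorSketch {
    public static void main(String[] args) {
        // Interceptor 1: drop events with an empty body.
        SketchInterceptor dropEmpty = e -> e.body.length == 0 ? null : e;
        // Interceptor 2: tag each event with a "type" header derived from its body.
        SketchInterceptor tagType = e -> {
            String text = new String(e.body, StandardCharsets.UTF_8);
            e.headers.put("type", text.startsWith("ERROR") ? "error" : "info");
            return e;
        };
        Map<String, List<String>> mapping = new HashMap<>();
        mapping.put("error", List.of("alerts", "archive"));
        HeaderSelector selector = new HeaderSelector("type", mapping, List.of("archive"));
        SketchChannelProcessor processor = new SketchChannelProcessor(List.of(dropEmpty, tagType), selector);

        processor.processEvent(new RoutedEvent("ERROR disk full".getBytes(StandardCharsets.UTF_8)));
        processor.processEvent(new RoutedEvent("INFO started".getBytes(StandardCharsets.UTF_8)));
        processor.processEvent(new RoutedEvent(new byte[0]));
        // prints 1 2
        System.out.println(processor.channels.get("alerts").size() + " " + processor.channels.get("archive").size());
    }
}
```

The error event lands in both `alerts` and `archive`, the info event falls back to `archive`, and the empty event never reaches the selector because the first interceptor drops it.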


Sink runners run a sink group, which may contain one or more sinks. Each sink group has a sink processor that selects one of the sinks in the group to process the next set of events. Each sink can take data from exactly one channel.
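
A sink group and its sink processor can be sketched as follows. This is an assumption-laden model, not Flume's implementation: `GroupSink`, `FakeSink`, and `RoundRobinSinkProcessor` are hypothetical names, and the processor uses plain round-robin selection that skips a failing sink, loosely resembling Flume's load-balancing sink processor without its backoff. The `main` loop plays the role of the sink runner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract: process() delivers the next batch of events or throws on failure.
interface GroupSink {
    String name();

    void process() throws Exception;
}

/** Test double standing in for a real sink; it can be switched "down". */
class FakeSink implements GroupSink {
    private final String name;
    boolean down = false;
    int batches = 0;

    FakeSink(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    public void process() throws Exception {
        if (down) {
            throw new Exception(name + " is down");
        }
        batches++;
    }
}

/** Picks one sink per call in round-robin order, skipping sinks that fail. */
class RoundRobinSinkProcessor {
    private final List<GroupSink> sinks;
    private int next = 0;

    RoundRobinSinkProcessor(List<GroupSink> sinks) {
        if (sinks.isEmpty()) {
            throw new IllegalArgumentException("a sink group needs at least one sink");
        }
        this.sinks = new ArrayList<>(sinks);
    }

    /** Returns the name of the sink that handled the batch; throws if every sink failed. */
    String process() {
        for (int attempt = 0; attempt < sinks.size(); attempt++) {
            GroupSink sink = sinks.get(next);
            next = (next + 1) % sinks.size();
            try {
                sink.process();
                return sink.name();
            } catch (Exception failure) {
                // This sink failed; try the next one in the group.
            }
        }
        throw new IllegalStateException("every sink in the group failed");
    }
}

/** Plays the role of a sink runner: repeatedly asks the group's processor to handle a batch. */
class SinkRunnerSketch {
    public static void main(String[] args) {
        FakeSink a = new FakeSink("a");
        FakeSink b = new FakeSink("b");
        List<GroupSink> group = List.of(a, b);
        RoundRobinSinkProcessor processor = new RoundRobinSinkProcessor(group);
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            order.append(processor.process());
        }
        b.down = true; // simulate sink b losing its downstream
        order.append(' ');
        for (int i = 0; i < 2; i++) {
            order.append(processor.process());
        }
        System.out.println(order); // prints "abab aa"
    }
}
```

Once `b` fails, every batch goes to `a`. A failover-style processor, such as Flume's failover sink processor, would instead prefer the highest-priority healthy sink; which policy fits depends on whether the goal is spreading load or keeping a standby.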

An event is composed of zero or more headers and a body. The headers are key/value pairs that can be used to make routing decisions or to carry other structured information (such as the timestamp of the event or the hostname of the server where the event originated). The body is an array of bytes that contains the actual payload.
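
The event structure can be sketched as a small value class. Flume's own `org.apache.flume.Event` interface exposes the headers and body through `getHeaders()` and `getBody()`; the `SimpleEvent` class below is a hypothetical, self-contained stand-in for illustration, not that interface, and the header names and values in the demo are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Flume event: string headers plus an opaque byte[] body. */
final class SimpleEvent {
    private final Map<String, String> headers;
    private final byte[] body;

    SimpleEvent(Map<String, String> headers, byte[] body) {
        // Defensive copies so later changes by the caller do not alter the event.
        this.headers = new HashMap<>(headers);
        this.body = body.clone();
    }

    /** Zero or more key/value pairs, exposed read-only. */
    Map<String, String> getHeaders() {
        return Collections.unmodifiableMap(headers);
    }

    /** The payload; Flume itself never interprets these bytes. */
    byte[] getBody() {
        return body.clone();
    }
}

class EventDemo {
    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", "1413244800000"); // illustrative epoch milliseconds
        headers.put("host", "web01.example.com");  // illustrative originating host
        SimpleEvent event = new SimpleEvent(headers,
                "GET /index.html 200".getBytes(StandardCharsets.UTF_8));
        // prints web01.example.com 19
        System.out.println(event.getHeaders().get("host") + " " + event.getBody().length);
    }
}
```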

An interceptor is a point in your data flow where you can inspect and alter Flume events. Interceptors are configured on a source: you can chain zero or more of them, and they run after the source creates an event and before the event is written to a channel.
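
Chaining can be sketched as each interceptor consuming the batch produced by the one before it. The names are hypothetical; the contract (return the event, possibly modified, or `null` to drop it, plus a batch variant) is modeled loosely on Flume's `Interceptor` interface, and `TimestampStamp` and `DropMatching` only imitate the idea of Flume's timestamp and regex-filtering interceptors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.regex.Pattern;

// Hypothetical names throughout; not Flume's classes.

class LogEvent {
    final Map<String, String> headers = new HashMap<>();
    final String body;

    LogEvent(String body) {
        this.body = body;
    }
}

/** Return the (possibly modified) event, or null to drop it. */
interface EventInterceptor {
    LogEvent intercept(LogEvent event);

    /** Batch form: applies intercept() to each event and keeps the survivors, in order. */
    default List<LogEvent> intercept(List<LogEvent> batch) {
        List<LogEvent> out = new ArrayList<>();
        for (LogEvent event : batch) {
            LogEvent result = intercept(event);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }
}

/** Adds a "timestamp" header from the given clock, keeping any existing value. */
class TimestampStamp implements EventInterceptor {
    private final LongSupplier clock;

    TimestampStamp(LongSupplier clock) {
        this.clock = clock;
    }

    public LogEvent intercept(LogEvent event) {
        event.headers.putIfAbsent("timestamp", Long.toString(clock.getAsLong()));
        return event;
    }
}

/** Drops events whose body matches the regular expression. */
class DropMatching implements EventInterceptor {
    private final Pattern pattern;

    DropMatching(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    public LogEvent intercept(LogEvent event) {
        return pattern.matcher(event.body).find() ? null : event;
    }
}

/** Feeds the output batch of each interceptor into the next one. */
class InterceptorChain {
    private final List<EventInterceptor> chain;

    InterceptorChain(List<EventInterceptor> chain) {
        this.chain = chain;
    }

    List<LogEvent> apply(List<LogEvent> batch) {
        List<LogEvent> current = batch;
        for (EventInterceptor interceptor : chain) {
            current = interceptor.intercept(current);
        }
        return current;
    }
}

class ChainDemo {
    public static void main(String[] args) {
        List<EventInterceptor> interceptors = List.of(new DropMatching("^DEBUG"), new TimestampStamp(() -> 42L));
        InterceptorChain chain = new InterceptorChain(interceptors);
        List<LogEvent> out = chain.apply(List.of(new LogEvent("DEBUG cache miss"), new LogEvent("WARN slow query")));
        // prints 1 WARN slow query 42
        System.out.println(out.size() + " " + out.get(0).body + " " + out.get(0).headers.get("timestamp"));
    }
}
```

Putting the filter first means dropped events are never stamped; with an empty chain, `apply` simply returns the batch unchanged.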


Reference:
"Streaming Data Using Apache Flume"

