Accumulators are shared variables that provide a simple syntax for aggregating values from worker nodes back to the driver program. One of the most common uses of accumulators is to count events that occur during job execution, for debugging purposes.
- Accumulators are variables that can only be “added” to through an associative operation, which makes it possible to implement counters and sums efficiently in parallel.
- Spark natively supports accumulators of numeric value types and standard mutable collections, and programmers can add support for new types (a sketch follows the basic example below).
- Only the driver program can read an accumulator’s value, via its value method; tasks cannot.
```scala
// Create an accumulator initialized to 0 on the driver.
val accum = sc.accumulator(0)

// Tasks running on the workers can only add to it.
sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)

// Only the driver can read the result: 10.
accum.value
```
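As a minimal sketch of the extension point mentioned in the list above, the same pre-2.0 Scala API used in the example lets you define an AccumulatorParam for a custom element type. The PairAccumulatorParam name and the pair-of-counters type here are illustrative assumptions, not part of the original post:

```scala
import org.apache.spark.AccumulatorParam

// Hypothetical custom element type: a pair of Longs
// updated together, e.g. (count, sum).
object PairAccumulatorParam extends AccumulatorParam[(Long, Long)] {
  // Identity element for the merge.
  def zero(initialValue: (Long, Long)): (Long, Long) = (0L, 0L)
  // Associative merge of two partial results.
  def addInPlace(a: (Long, Long), b: (Long, Long)): (Long, Long) =
    (a._1 + b._1, a._2 + b._2)
}

val pairAccum = sc.accumulator((0L, 0L))(PairAccumulatorParam)
sc.parallelize(Array(1, 2, 3, 4)).foreach { x =>
  pairAccum += ((1L, x.toLong))  // double parens: add a single tuple
}
pairAccum.value  // (4, 10): four elements counted, summing to 10
```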
Note that tasks on worker nodes cannot access the accumulator’s value; from the point of view of these tasks, accumulators are write-only variables. The kind of counting shown here becomes especially handy when there are multiple values to keep track of, or when the same value needs to be incremented at multiple places in the parallel program.
Reference:
http://spark.apache.org/docs/latest/programming-guide.html#accumulators