1. Download Gradle.
2. Bootstrap the Gradle wrapper by running the following in the project folder; after this, the regular gradlew commands are available:
./bin/gradle -b bootstrap.gradle
3. Generate the Eclipse project and classpath files:
./gradlew eclipse
4. Build and test the datafu-pig subproject:
./gradlew :datafu-pig:build
./gradlew :datafu-pig:test
1. Hourglass is a framework to incrementally process data with Hadoop MapReduce.
The input data is partitioned by time, and the range of input data to process is adjusted as new data arrives. When a previous output already exists, Hourglass reuses it: instead of reprocessing the whole range, it only needs to consume the previous output and the new day of input, and merges the two.
It supports both fixed-length use cases (a sliding window, such as the last 30 days) and fixed-start use cases (everything from a fixed start date up to the present).
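
The reuse idea can be illustrated with a small sketch. This is plain Python, not the Hourglass API: the function names (`count_day`, `fixed_start_update`, `fixed_length_update`) and the per-key counting job are assumptions made purely for illustration. It also assumes an aggregation such as counting, where one day's contribution can be added to and subtracted from an existing result.

```python
from collections import Counter

def count_day(events):
    """Aggregate one day's partition of input into per-key counts."""
    return Counter(events)

def fixed_start_update(previous_output, new_day_events):
    """Fixed-start: the window only grows, so merge the new day into the previous output."""
    return previous_output + count_day(new_day_events)

def fixed_length_update(previous_output, new_day_events, oldest_day_events):
    """Fixed-length: add the newest day and remove the day that left the window."""
    updated = previous_output + count_day(new_day_events)
    updated.subtract(count_day(oldest_day_events))
    # Drop keys whose count fell to zero so the result matches a full recomputation.
    return Counter({key: n for key, n in updated.items() if n > 0})

# Three daily partitions of hypothetical member IDs.
day1 = ["a", "b", "a"]
day2 = ["b", "c"]
day3 = ["a", "c", "c"]

# Fixed-start: the output over day1..day2 is extended with day3 only.
out_1_2 = count_day(day1) + count_day(day2)
out_1_3 = fixed_start_update(out_1_2, day3)

# Fixed-length (2-day window): slide from day1..day2 to day2..day3.
out_2_3 = fixed_length_update(out_1_2, day3, day1)
```

In both cases the update reads only the previous output plus one or two daily partitions, rather than every day in the window, which is the saving the incremental approach is after.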
2. List of DataFu analysis UDFs.
DataFu 1.2.0
Package | Description |
---|---|
datafu.pig.bags | A collection of general purpose UDFs for operating on bags. |
datafu.pig.geo | UDFs for geographic computations. |
datafu.pig.hash | UDFs for computing hashes from data. |
datafu.pig.linkanalysis | UDFs for performing link analysis, such as PageRank. |
datafu.pig.random | UDFs dealing with randomness. |
datafu.pig.sampling | Sampling UDFs, including weighted sample, reservoir sampling, sampling by key, etc. |
datafu.pig.sessions | UDFs for sessionizing web log data. |
datafu.pig.sets | UDFs for set operations such as intersect and union. |
datafu.pig.stats | Statistics UDFs for computing median, quantiles, variance, confidence intervals, etc. |
datafu.pig.urls | UDFs for processing URLs. |
datafu.pig.util | Other useful utilities. |
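
As an illustration of what "sessionizing" in the table above means, here is a minimal Python sketch. It is not DataFu's implementation: the `sessionize` function, its `timeout_seconds` parameter, and the 10-minute default are assumptions chosen for the example. Events belonging to one user are grouped into sessions, and a new session starts whenever the gap since the previous event exceeds the timeout.

```python
def sessionize(timestamps, timeout_seconds=600):
    """Pair each timestamp (in seconds) with a session index.

    A new session starts whenever the gap since the previous event
    exceeds timeout_seconds.
    """
    sessions = []
    session_id = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is not None and ts - previous > timeout_seconds:
            session_id += 1
        sessions.append((ts, session_id))
        previous = ts
    return sessions

# Hypothetical event times for a single user, in seconds.
events = [0, 120, 500, 2000, 2100]
result = sessionize(events)
```

Here the first three events fall into session 0, and the 1500-second gap before the fourth event starts session 1.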
3. The Bacon project contains a URL-parsing UDF. It requires Java 7.
https://github.com/aaronbinns/bacon
References:
https://github.com/apache/incubator-datafu
http://datafu.incubator.apache.org/docs/datafu/1.2.0/
http://datafu.incubator.apache.org/docs/hourglass/getting-started.html
http://datafu.incubator.apache.org/blog/2013/10/03/datafus-hourglass-incremental-data-processing-in-hadoop.html