Monday, 27 October 2014

FirstTupleFromBag

To filter records based on the first tuple of a bag, use the DataFu UDF FirstTupleFromBag(bag, defaultValue). It returns the first tuple in the bag, or defaultValue if the bag is empty.


define FirstTupleFromBag datafu.pig.bags.FirstTupleFromBag();

session = Filter session by FirstTupleFromBag(categories,null) !=('Invalid1') and
FirstTupleFromBag(categories,null) !=('Invalid2');
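
Depending on the Pig version, comparing a whole tuple against a string literal may not type-check. A minimal sketch of a variant that extracts the first tuple once and compares its first field is below. The jar path, the 'sessions' input, its schema, and the aliases sessions, with_first, first_cat and valid are all assumptions made for illustration, not part of the original script.

```pig
-- Assumed location of the DataFu jar; adjust to your installation.
REGISTER datafu-1.2.0.jar;

DEFINE FirstTupleFromBag datafu.pig.bags.FirstTupleFromBag();

-- Hypothetical input: one row per session with a bag of category names.
sessions = LOAD 'sessions' AS (id:chararray, categories:bag{t:tuple(name:chararray)});

-- Pull out the first tuple once so the UDF is not called twice per row.
with_first = FOREACH sessions GENERATE id, categories,
             FirstTupleFromBag(categories, null) AS first_cat;

-- Compare the first field of that tuple, by position, against the unwanted values.
valid = FILTER with_first BY first_cat.$0 != 'Invalid1' AND first_cat.$0 != 'Invalid2';
```

Note that when the bag is empty and the default is null, first_cat.$0 is null, so both comparisons evaluate to null and FILTER drops the row. If sessions with empty bags should be kept, add an explicit "first_cat is null OR (...)" condition.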


Another Example:
 -- input:
 -- ({(a,1)})
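 -- "input" and "output" are reserved words in Pig Latin, so the aliases are named data and first_tuples
 data = LOAD 'input' AS (B: bag {T: tuple(alpha:CHARARRAY, numeric:INT)});

 first_tuples = FOREACH data GENERATE FirstTupleFromBag(B, null);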

 -- output:
 -- (a,1)


Reference:
http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/FirstTupleFromBag.html
