To filter records based on the first tuple of a bag, you can use the DataFu UDF
FirstTupleFromBag(bag, defaultValue). It returns the first tuple of the bag, or
defaultValue if the bag is empty.
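define FirstTupleFromBag datafu.pig.bags.FirstTupleFromBag();
-- The UDF returns a tuple, so project its first field ($0) before comparing;
-- this assumes the category name is the first field of each tuple in categories.
session = FILTER session BY FirstTupleFromBag(categories, null).$0 != 'Invalid1'
    AND FirstTupleFromBag(categories, null).$0 != 'Invalid2';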
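To make clear which records survive such a filter, here is a plain-Python model of its logic. It is a sketch, not DataFu's Java implementation: a Pig bag is modelled as a list of tuples, Pig's null as None, the category name is assumed to be the first field of each tuple, and the session records are hypothetical.

```python
# Plain-Python model of the FILTER on the first tuple of a bag
# (a sketch, not DataFu's Java code). Bags are lists of tuples; null is None.

INVALID = {"Invalid1", "Invalid2"}


def first_tuple_from_bag(bag, default):
    """Return the bag's first tuple, or `default` if the bag is empty."""
    return bag[0] if bag else default


def keep(session):
    first = first_tuple_from_bag(session["categories"], None)
    if first is None:
        # In Pig, a comparison against null yields null, and FILTER drops the row.
        return False
    # Assumption: the category name is the first field of each tuple.
    return first[0] not in INVALID


# Hypothetical sessions, for illustration only.
sessions = [
    {"id": 1, "categories": [("Books",), ("Invalid1",)]},  # first tuple is valid
    {"id": 2, "categories": [("Invalid1",), ("Books",)]},  # first tuple is invalid
    {"id": 3, "categories": [("Invalid2",)]},
    {"id": 4, "categories": []},                           # empty bag
]

kept = [s["id"] for s in sessions if keep(s)]
print(kept)  # [1]
```

Session 1 is kept even though its bag contains 'Invalid1', because only the first tuple is inspected.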
Another Example:
-- input:
-- ({(a,1)})
data = LOAD 'input' AS (B: bag {T: tuple(alpha:CHARARRAY, numeric:INT)});
result = FOREACH data GENERATE FirstTupleFromBag(B, null);
-- output:
-- (a,1)
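The same GENERATE can be modelled row by row in plain Python to show where defaultValue comes in. This is a sketch, not DataFu's code; the second and third rows are hypothetical additions to the documented ({(a,1)}) input, and None stands in for Pig's null.

```python
# Plain-Python model of: FOREACH data GENERATE FirstTupleFromBag(B, null);
# Each row holds one bag B, modelled as a list of (alpha, numeric) tuples.


def first_tuple_from_bag(bag, default):
    # Empty bag -> default value; otherwise the bag's first tuple.
    return bag[0] if bag else default


rows = [
    [("a", 1)],            # the ({(a,1)}) row from the example
    [("b", 2), ("c", 3)],  # hypothetical: only the first tuple is returned
    [],                    # hypothetical: empty bag -> default (None)
]

result = [first_tuple_from_bag(b, None) for b in rows]
print(result)  # [('a', 1), ('b', 2), None]
```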
Reference:
http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/FirstTupleFromBag.html