Thursday 12 June 2014

Pig UDF Output Format

If your UDF returns a scalar or a map, no extra work is required. However, if it returns a tuple or a bag (of tuples), the UDF needs to help Pig figure out the structure of the tuple.
If a UDF returns a tuple or a bag and schema information is not provided, Pig assumes that the tuple contains a single field of type bytearray. If that is not the case, not specifying the schema can cause failures.
Below is an example outputSchema method that declares a bag of tuples, each tuple holding a single chararray field.
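import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

// Goes inside the UDF class (a subclass of EvalFunc).
@Override
public Schema outputSchema(Schema input) {
    try {
        // Read from the inside out:
        //   innermost: the tuple's single chararray field
        //   middle:    an unnamed tuple wrapping that field
        //   outermost: an unnamed bag wrapping the tuple
        return new Schema(
            new FieldSchema(null,
                new Schema(
                    new FieldSchema(null,
                        new Schema(
                            new FieldSchema("passivedataxml", DataType.CHARARRAY)),
                        DataType.TUPLE)),
                DataType.BAG));
    } catch (Exception e) {
        // If the schema cannot be built, no schema is reported to Pig.
        return null;
    }
}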
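
The nesting in the method above is easier to follow when it is printed out. Here is a small toy model of it. It does not use Pig: SchemaSketch, Field, Type, bagOfTuples and describe are hypothetical stand-ins made up for this illustration, and the printed notation only imitates the style of Pig's schema output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for illustration only; these are NOT Pig's Schema/FieldSchema classes.
class SchemaSketch {
    enum Type { CHARARRAY, BYTEARRAY, TUPLE, BAG }

    // A field has an optional alias, a type, and (for tuples and bags) nested fields.
    static final class Field {
        final String alias;
        final Type type;
        final List<Field> inner;

        Field(String alias, Type type, List<Field> inner) {
            this.alias = alias;
            this.type = type;
            this.inner = inner;
        }
    }

    // Wraps the given fields in a container (tuple or bag) field.
    static Field container(String alias, Type type, Field... fields) {
        List<Field> inner = new ArrayList<>();
        for (Field f : fields) {
            inner.add(f);
        }
        return new Field(alias, type, inner);
    }

    // Same shape as the outputSchema example: bag -> tuple -> one chararray field.
    static Field bagOfTuples(String alias) {
        Field leaf = new Field(alias, Type.CHARARRAY, new ArrayList<Field>());
        Field tuple = container(null, Type.TUPLE, leaf);
        return container(null, Type.BAG, tuple);
    }

    // The shape assumed when no schema is given: a tuple with one bytearray field.
    static Field defaultTuple() {
        Field leaf = new Field(null, Type.BYTEARRAY, new ArrayList<Field>());
        return container(null, Type.TUPLE, leaf);
    }

    // Renders a field: tuples in (), bags in {}, scalars as "alias: type".
    static String describe(Field f) {
        String prefix = (f.alias == null) ? "" : f.alias + ": ";
        if (f.type == Type.TUPLE) {
            return prefix + "(" + joinInner(f) + ")";
        }
        if (f.type == Type.BAG) {
            return prefix + "{" + joinInner(f) + "}";
        }
        return prefix + f.type.name().toLowerCase(Locale.ROOT);
    }

    static String joinInner(Field f) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < f.inner.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(describe(f.inner.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(bagOfTuples("passivedataxml"))); // {(passivedataxml: chararray)}
        System.out.println(describe(defaultTuple()));                // (bytearray)
    }
}
```

The first line printed is the shape the outputSchema example declares. The second is the single-bytearray tuple that, as described above, is assumed when no schema is provided.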

Reference:
1. http://pig.apache.org/docs/r0.9.1/udf.html#udf-java
