We had a question about whether Pig and Avro data with the same output schema are converted into the same Parquet schema.
I recently researched this question by reading the source code of the Parquet converters for Pig and Avro.
The two comparison tables below show how each Avro type and each Pig type maps to a Parquet type.
Avro type | Parquet type |
---|---|
null | no type (the field is not encoded in Parquet), unless it is part of a union with null |
boolean | boolean |
int | int32 |
long | int64 |
float | float |
double | double |
bytes | binary |
string | binary (with original type UTF8) |
record | group containing nested fields |
enum | binary (with original type ENUM) |
array | group (with original type LIST) containing one repeated group field |
map | group (with original type MAP) containing one repeated group field (with original type MAP_KEY_VALUE) of (key, value) |
fixed | fixed_len_byte_array |
union | an optional type if it is a union with null; otherwise not supported |

Pig type | Parquet type |
---|---|
null | no type (the field is not encoded in Parquet) |
boolean | boolean |
int | int32 |
long | int64 |
float | float |
double | double |
bytearray | binary |
chararray | binary (with original type UTF8) |
tuple | an optional group containing one repeated group field |
bag | an optional group containing one repeated group field, to preserve the distinction between an empty bag and null |
map | an optional group containing one repeated group field of (key, value) |
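Based on these tables, it seems that Pig and Avro data with the same output schema map to the same Parquet primitive types after conversion, while the nested types (record or tuple, array or bag, map) are described differently in the two tables and may not produce an identical Parquet schema.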
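The primitive rows of the two tables can be compared mechanically. The following is a minimal Python sketch, not code from the Parquet converters: the dictionaries are transcribed by hand from the tables, and the names `AVRO_TO_PARQUET`, `PIG_TO_PARQUET`, `AVRO_TO_PIG`, and `compare_primitives` are hypothetical. The pairing of Avro `bytes` with Pig `bytearray` and Avro `string` with Pig `chararray` is an assumption about which types play the same role.

```python
# Primitive-type rows transcribed by hand from the two tables.
AVRO_TO_PARQUET = {
    "boolean": "boolean",
    "int": "int32",
    "long": "int64",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "binary (UTF8)",
}

PIG_TO_PARQUET = {
    "boolean": "boolean",
    "int": "int32",
    "long": "int64",
    "float": "float",
    "double": "double",
    "bytearray": "binary",        # Pig's binary type
    "chararray": "binary (UTF8)",
}

# Assumption: these Avro and Pig type names describe the same kind of value.
# Types not listed here are assumed to share the same name in both systems.
AVRO_TO_PIG = {"bytes": "bytearray", "string": "chararray"}


def compare_primitives():
    """Return {avro_type: (pig_type, avro_parquet, pig_parquet, same)}."""
    result = {}
    for avro_type, avro_parquet in AVRO_TO_PARQUET.items():
        pig_type = AVRO_TO_PIG.get(avro_type, avro_type)
        pig_parquet = PIG_TO_PARQUET[pig_type]
        result[avro_type] = (
            pig_type,
            avro_parquet,
            pig_parquet,
            avro_parquet == pig_parquet,
        )
    return result


if __name__ == "__main__":
    for avro_type, (pig_type, a, p, same) in compare_primitives().items():
        print(f"{avro_type:8} / {pig_type:10} -> {a:14} vs {p:14} same={same}")
```

The sketch only covers primitive types. The nested types are left out on purpose, because the tables describe their group layout in prose and a string comparison would not be meaningful there.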