A question came up: if Pig and Avro write data with the same output schema, will both be converted into the same Parquet schema?
I looked into this recently by reading the source code of the Parquet schema converters for Pig and Avro.
The type mappings I found are summarized in the two tables below, first for Avro and then for Pig.

| Avro type | Parquet type |
|---|---|
| null | no type (the field is not encoded in Parquet), unless a null union |
| boolean | boolean |
| int | int32 |
| long | int64 |
| float | float |
| double | double |
| bytes | binary |
| string | binary (with original type UTF8) |
| record | group containing nested fields |
| enum | binary (with original type ENUM) |
| array | group (with original type LIST) containing one repeated group field |
| map | group (with original type MAP) containing one repeated group field (with original type MAP_KEY_VALUE) of (key, value) |
| fixed | fixed_len_byte_array |
| union | an optional type, in the case of a null union, otherwise not supported |
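
As a minimal sketch of how the Avro table reads as code, the Python function below maps a few Avro type names to the Parquet type listed in the table. It treats a union of `null` plus exactly one other type as optional and rejects every other union, as the table describes. The name `avro_to_parquet` and the plain-string representation of types are hypothetical, chosen for illustration; this is not the parquet-avro API, and it does not handle records, arrays, or maps.

```python
# Avro type name -> Parquet type, transcribed from the Avro table above.
# Hypothetical simplification: enum and fixed are given as bare names here,
# whereas a real Avro schema declares them as named JSON objects.
AVRO_TO_PARQUET = {
    "boolean": "boolean",
    "int": "int32",
    "long": "int64",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "binary (UTF8)",
    "enum": "binary (ENUM)",
    "fixed": "fixed_len_byte_array",
}


def avro_to_parquet(avro_type):
    """Map an Avro type name, or a union given as a list, to a Parquet type.

    Returns None for a bare null, because null is not encoded in Parquet.
    """
    if isinstance(avro_type, list):
        branches = [t for t in avro_type if t != "null"]
        # Only a union of null with exactly one other type becomes optional.
        if "null" not in avro_type or len(branches) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        return "optional " + avro_to_parquet(branches[0])
    if avro_type == "null":
        return None
    if avro_type not in AVRO_TO_PARQUET:
        raise ValueError(f"not handled in this sketch: {avro_type}")
    return AVRO_TO_PARQUET[avro_type]


print(avro_to_parquet(["null", "string"]))  # -> optional binary (UTF8)
```

A union such as `["int", "string"]` raises `ValueError`, matching the "otherwise not supported" case in the table.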

| Pig type | Parquet type |
|---|---|
| null | no type (the field is not encoded in Parquet) |
| boolean | boolean |
| int | int32 |
| long | int64 |
| float | float |
| double | double |
| bytearray | binary |
| chararray | binary (with original type UTF8) |
| tuple | an optional group containing one repeated group field |
| bag | an optional group containing one repeated group field, which preserves the distinction between an empty bag and null |
| map | an optional group containing one repeated group field of (key, value) |
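
To compare the two tables mechanically, the sketch below stores an abbreviated form of each Parquet description, pairs every Avro type with the Pig type I assume is its counterpart (`bytes` with `bytearray`, `string` with `chararray`, `record` with `tuple`, `array` with `bag`), and lists the pairs whose descriptions differ. The dictionaries and `differing_pairs` are hypothetical names for illustration. This only compares the table text; confirming it against real output would mean running the actual converters (`AvroSchemaConverter` in parquet-avro and `PigSchemaConverter` in parquet-pig) on a concrete schema.

```python
# Parquet encodings, abbreviated from the Avro and Pig tables above.
AVRO = {
    "boolean": "boolean", "int": "int32", "long": "int64",
    "float": "float", "double": "double", "bytes": "binary",
    "string": "binary (UTF8)",
    "record": "group of nested fields",
    "array": "group (LIST) of one repeated group",
    "map": "group (MAP) of one repeated group (MAP_KEY_VALUE) of (key, value)",
}
PIG = {
    "boolean": "boolean", "int": "int32", "long": "int64",
    "float": "float", "double": "double", "bytearray": "binary",
    "chararray": "binary (UTF8)",
    "tuple": "optional group of one repeated group",
    "bag": "optional group of one repeated group",
    "map": "optional group of one repeated group of (key, value)",
}

# Assumed pairing of each Avro type with its closest Pig counterpart.
PAIRS = [
    ("boolean", "boolean"), ("int", "int"), ("long", "long"),
    ("float", "float"), ("double", "double"), ("bytes", "bytearray"),
    ("string", "chararray"), ("record", "tuple"), ("array", "bag"),
    ("map", "map"),
]


def differing_pairs():
    """Return the (avro, pig) pairs whose Parquet descriptions differ."""
    return [(a, p) for a, p in PAIRS if AVRO[a] != PIG[p]]


print(differing_pairs())
# -> [('record', 'tuple'), ('array', 'bag'), ('map', 'map')]
```

Every primitive pair agrees, and only the three complex pairs show up as different.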
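
Comparing the two tables, the primitive types map to the same Parquet types on both sides, but the complex types (Avro record, array, and map versus Pig tuple, bag, and map) are described with different group structures. So it seems that Pig and Avro data with the same output schema will not necessarily end up with the same Parquet schema.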