If a schema evolves in a backward-compatible way, we can always use the latest schema to query all the data uniformly. For example, removing a field is a backward-compatible change: when we encounter records written with the old schema that still contain the field, we can simply ignore it. Adding a field with a default value is also backward compatible, because the default fills in for older records that lack the field.
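The two rules above can be sketched in plain Python. This is a simplified model of schema resolution, not Avro's actual implementation: records are plain dicts, a schema is a list of field dicts, and the names `resolve`, `latest_fields`, and the sample fields `id`, `fax`, and `country` are hypothetical.

```python
# Simplified model of reading old records with the latest schema.
# Not the Avro library: records are plain dicts, and a schema is a list of
# field dicts shaped like Avro's {"name": ..., "default": ...}.

def resolve(record, reader_fields):
    """Project a record written with an older schema onto the latest schema."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            # Field exists in both schemas: keep the written value.
            resolved[name] = record[name]
        elif "default" in field:
            # Field was added later: fall back to its declared default.
            resolved[name] = field["default"]
        else:
            # Field was added without a default: old data cannot be read.
            raise ValueError(f"no value and no default for field {name!r}")
    # Fields present only in the old record are never copied, i.e. ignored.
    return resolved


# Hypothetical latest schema: "fax" was removed, "country" was added.
latest_fields = [
    {"name": "id", "type": "int"},
    {"name": "country", "type": "string", "default": "unknown"},
]

old_record = {"id": 7, "fax": "555-0100"}
new_record = {"id": 8, "country": "NZ"}

print(resolve(old_record, latest_fields))  # {'id': 7, 'country': 'unknown'}
print(resolve(new_record, latest_fields))  # {'id': 8, 'country': 'NZ'}
```

The `ValueError` branch shows why a newly added field without a default is not backward compatible: there is nothing to substitute when reading old records.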
Let's say we have two versions of the Employee schema, as below.
Schema v1:
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "email", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
Schema v2:
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "email", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "yrs", "type": "int", "aliases": ["age"]},
    {"name": "gender", "type": ["null", "string"], "default": null}
  ]
}
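In v2 the field "age" is renamed to "yrs", and the "aliases" entry is what lets a reader using v2 find the old "age" values. A minimal Python sketch of that lookup follows. It uses plain dicts rather than the Avro library, and the helper names `alias_map` and `rename_with_aliases` and the sample record are hypothetical.

```python
import json

# Schema v2 from above, parsed as JSON.
SCHEMA_V2 = json.loads("""
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "email", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "yrs", "type": "int", "aliases": ["age"]},
    {"name": "gender", "type": ["null", "string"], "default": null}
  ]
}
""")


def alias_map(schema):
    """Map each old field name (alias) to its current name in the schema."""
    mapping = {}
    for field in schema["fields"]:
        for alias in field.get("aliases", []):
            mapping[alias] = field["name"]
    return mapping


def rename_with_aliases(record, schema):
    """Rename the keys of an old record to the names used by the newer schema."""
    mapping = alias_map(schema)
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical record written with schema v1.
v1_record = {"email": "ann@example.com", "name": "Ann", "age": 30}

print(alias_map(SCHEMA_V2))  # {'age': 'yrs'}
print(rename_with_aliases(v1_record, SCHEMA_V2))
# {'email': 'ann@example.com', 'name': 'Ann', 'yrs': 30}
```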
We will use the latest schema (v2) to create a Hive table and load data written with either version of the schema.
Please note:
1. The record "name" must be the same in both schemas. Otherwise the data can still be loaded into the Hive table, but it cannot be retrieved, and queries fail with an error like:
Failed with exception java.io.IOException: org.apache.avro.AvroTypeException:
Found Employee, expecting Employee
2. A default value is required for the optional fields in the latest schema. Specifying null as the default of a union only works if "null" is the first type in the union.
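Both notes can be checked before the table is created. The following Python sketch is a hypothetical pre-flight check: `check_evolution` is not part of Avro or Hive, it works on schemas represented as dicts, and it covers only the two rules above. The `bad` schema is invented to trigger both problems.

```python
def check_evolution(old_schema, new_schema):
    """Return a list of problems found for the two rules noted above."""
    problems = []

    # Rule 1: the record "name" must be identical in both schemas.
    if old_schema["name"] != new_schema["name"]:
        problems.append(
            f'record name differs: {old_schema["name"]!r} vs {new_schema["name"]!r}'
        )

    old_names = {f["name"] for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        # A field counts as known if its name or any alias exists in the old schema.
        known = {field["name"], *field.get("aliases", [])}
        is_new = not (known & old_names)

        # Rule 2a: a field that old data lacks needs a default.
        if is_new and "default" not in field:
            problems.append(f'field {field["name"]!r} needs a default')

        # Rule 2b: a null default requires "null" as the first union branch.
        ftype = field["type"]
        if "default" in field and field["default"] is None and isinstance(ftype, list):
            if ftype[0] != "null":
                problems.append(
                    f'field {field["name"]!r}: "null" must come first in the union'
                )
    return problems


v1 = {"type": "record", "name": "Employee", "fields": [
    {"name": "email", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
]}
v2 = {"type": "record", "name": "Employee", "fields": [
    {"name": "email", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "yrs", "type": "int", "aliases": ["age"]},
    {"name": "gender", "type": ["null", "string"], "default": None},
]}
# Invented counter-example: wrong record name and "null" not first in the union.
bad = {"type": "record", "name": "Employee2", "fields": [
    {"name": "gender", "type": ["string", "null"], "default": None},
]}

print(check_evolution(v1, v2))  # []
for problem in check_evolution(v1, bad):
    print(problem)
```

For the v1 and v2 schemas in this post the check returns an empty list, which matches the claim that v2 can read data written with v1.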
CREATE TABLE Avro_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
  'avro.schema.url'='file:///root/avro_schema/Employee2.avsc')
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
Reference:
https://issues.apache.org/jira/browse/AVRO-1118
http://apache-avro.679487.n3.nabble.com/Does-Avro-Serde-support-schema-evolution-td4028398.html