Wednesday, 26 November 2014

Avro Schema Evolution

Sometimes we want to update the schema of existing Avro files.
As long as each change is compatible with the previous version of the schema, older data can still be read with the current schema.

1. Change field type
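The Avro specification's schema resolution rules allow a reader to promote a written value to a wider type: int to long, float, or double; long to float or double; float to double; and string to bytes or bytes to string. Other primitive type changes are not resolvable. The sketch below encodes that table in plain Python to make the rule concrete. It is not the Avro library, and the names PROMOTIONS and can_read are made up for this illustration.

```python
# Primitive type promotions allowed by Avro schema resolution:
# writer type -> set of reader types it may be promoted to.
# PROMOTIONS and can_read are illustrative names, not part of any Avro API.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_read(writer_type, reader_type):
    """Return True if a value written as writer_type can be read as reader_type."""
    if writer_type == reader_type:
        return True
    return reader_type in PROMOTIONS.get(writer_type, set())


print(can_read("int", "long"))   # widening a field from int to long
print(can_read("long", "int"))   # narrowing a field from long to int
```

The first call prints True and the second prints False: widening a field is a compatible change, narrowing it is not.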

2. Remove fields from a dataset schema
When you remove fields from a dataset schema, the data already written remains unchanged. The fields you remove are not required when records are written going forward.
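
As a rough illustration of the write side, the sketch below removes a field from a schema held as a Python dict and checks which fields a new record would still have to supply. The Movie schema, its field names, and the helper functions are assumptions made up for this example; they are not taken from the movies dataset or from the Kite API.

```python
import json

# Hypothetical schema for illustration only.
OLD_SCHEMA = json.loads("""
{"type": "record", "name": "Movie", "fields": [
  {"name": "id", "type": "long"},
  {"name": "title", "type": "string"},
  {"name": "rating", "type": "int"}
]}
""")


def remove_field(schema, field_name):
    """Return a copy of the schema without the named field."""
    fields = [f for f in schema["fields"] if f["name"] != field_name]
    return {**schema, "fields": fields}


def missing_fields(record, schema):
    """List fields the schema requires (no default) that the record lacks."""
    return [f["name"] for f in schema["fields"]
            if f["name"] not in record and "default" not in f]


NEW_SCHEMA = remove_field(OLD_SCHEMA, "rating")
new_record = {"id": 2, "title": "Heat"}

print(missing_fields(new_record, OLD_SCHEMA))  # ['rating']
print(missing_fields(new_record, NEW_SCHEMA))  # []
```

Under the old schema the record is incomplete; once the field is removed, the same record can be written as is, while files written earlier are left alone.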

3. Add Fields to a Dataset
You must define a default value for the fields you add to the dataset schema. Records that do not include the field are populated with the default you provide.
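
The default rule can be sketched in a few lines of plain Python. This is a simplified stand-in for what an Avro reader does, not the Avro library itself; the Movie schema, the genre field, and read_record are hypothetical names chosen for the example.

```python
# Hypothetical reader schema: "genre" was added later, with a default.
NEW_SCHEMA = {
    "type": "record",
    "name": "Movie",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "title", "type": "string"},
        {"name": "genre", "type": "string", "default": "unknown"},
    ],
}


def read_record(written, reader_schema):
    """Resolve a written record against the reader schema, filling defaults."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            out[name] = written[name]
        elif "default" in field:
            # Field is absent from the old data: use the schema default.
            out[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return out


old_record = {"id": 1, "title": "Alien"}  # written before "genre" existed
print(read_record(old_record, NEW_SCHEMA))
```

The old record comes back with genre set to "unknown". Without the default, the same read would fail, which is why the default is mandatory for added fields.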

4. Read with Different Schemas
You can have a schema that reads fewer fields than are defined by the schema used to write a dataset, provided that the field definitions in the reader schema are compatible with the chosen fields in the writer schema.
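
A minimal sketch of reading with a narrower schema follows. For simplicity it treats "compatible" as "same field name and identical type", which is stricter than Avro's actual resolution rules. The schemas and the helpers is_projection and project are hypothetical, written only for this example.

```python
# Hypothetical writer schema (used when the data was written).
WRITER_SCHEMA = {
    "type": "record",
    "name": "Movie",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "title", "type": "string"},
        {"name": "rating", "type": "int"},
    ],
}

# Hypothetical reader schema that only asks for two of the three fields.
READER_SCHEMA = {
    "type": "record",
    "name": "Movie",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "title", "type": "string"},
    ],
}


def is_projection(reader, writer):
    """True if every reader field exists in the writer with the same type."""
    writer_types = {f["name"]: f["type"] for f in writer["fields"]}
    return all(writer_types.get(f["name"]) == f["type"]
               for f in reader["fields"])


def project(record, reader):
    """Keep only the fields the reader schema asks for."""
    return {f["name"]: record[f["name"]] for f in reader["fields"]}


written = {"id": 1, "title": "Alien", "rating": 5}
if is_projection(READER_SCHEMA, WRITER_SCHEMA):
    print(project(written, READER_SCHEMA))
```

The rating value is still stored in the file; the reader simply skips it.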

5. Generate an Avro schema file (movies.avsc) from movies.csv.
$ kite-dataset csv-schema movies.csv --class movies -o movies.avsc

6. Update the schema of the movies dataset with a revised schema file (movies2.avsc).
$ kite-dataset update movies --schema movies2.avsc


References:
http://kitesdk.org/docs/current/guide/Schema-Evolution/
http://avro.apache.org/docs/current/spec.html#Schema+Resolution
