Wednesday, 2 July 2014

PigStorage Schema


As of Pig 0.10, PigStorage accepts the argument ‘-schema’ when storing data. This creates a ‘.pig_schema’ file in the output directory, which is a JSON file containing the schema.

store B into 'output' using PigStorage('\t', '-schema');
So the next time you load ‘output’, you only need to give LOAD its location; the schema is picked up from the ‘.pig_schema’ file. The rules are:
      • PigStorage always tries to load the .pig_schema file, unless you explicitly say -noschema.
      • If you don’t specify anything at all, PigStorage will try to load a schema, and silently fall back to the old schema-less behaviour if the file is not present or unreadable.
      • If you specify -schema during loading, PigStorage will fail if a schema is not present.
      • If you specify -noschema during loading, PigStorage will ignore the .pig_schema file.
      • PigStorage will only *store* the schema if you specify -schema.
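
The loading rules above can be sketched as three variants of the same LOAD. This is only an illustration: it assumes ‘output’ was already written by an earlier run of the store statement shown above, and the aliases C, D and E are made up for the example.

```pig
-- Assumes 'output' already exists and contains a .pig_schema file
-- (written by an earlier: store B into 'output' using PigStorage('\t', '-schema');)

-- No option: the schema is used if .pig_schema is present and readable,
-- otherwise the relation is loaded without a schema.
C = load 'output' using PigStorage('\t');
describe C;

-- '-schema': loading fails if no schema file is present.
D = load 'output' using PigStorage('\t', '-schema');
describe D;

-- '-noschema': the .pig_schema file is ignored even if it exists.
E = load 'output' using PigStorage('\t', '-noschema');
describe E;
```

Running describe on each alias is a quick way to check which of the rules applied, since it shows whether a schema was attached to the relation.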

