The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data.
Data stored in ORCFile can be read or written through HCatalog, so any Pig or Map/Reduce process can play along seamlessly.
Hive 0.12 optimizes filtered reads by allowing predicates to be pushed down and evaluated in the storage layer itself. This is controlled by the setting hive.optimize.ppd=true, which should be true by default. The ORCFile reader then returns only the rows that actually match the WHERE predicates; in a query that filters customers by state, for example, it skips customers residing in any other state.
You can force Hive to sort on a column by using the SORTED BY clause (part of CLUSTERED BY ... INTO n BUCKETS) when creating the table and setting hive.enforce.sorting to true before inserting into the table.
An ORC table with Snappy compression is created like this:
CREATE TABLE mytable (
...
) STORED AS orc tblproperties ("orc.compress"="SNAPPY");
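To tie the sorting and predicate pushdown settings together, here is a minimal HiveQL sketch for Hive 0.12. The table names customers_orc and customers_staging, the columns, and the bucket count are all hypothetical and chosen only for illustration. The sketch also assumes hive.enforce.bucketing is enabled, since SORTED BY is declared as part of a bucketed table.

```sql
-- Hypothetical table: bucketed by id, sorted by state within each bucket.
CREATE TABLE customers_orc (
  id BIGINT,
  name STRING,
  state STRING
)
CLUSTERED BY (id) SORTED BY (state) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Make Hive honor the bucketing and sort order on insert.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- customers_staging is an assumed, pre-existing source table.
INSERT OVERWRITE TABLE customers_orc
SELECT id, name, state FROM customers_staging;

-- Let the WHERE predicate be pushed down to the storage layer.
SET hive.optimize.ppd=true;

SELECT id, name
FROM customers_orc
WHERE state = 'CA';
```

Sorting on the filtered column keeps rows with the same state close together, so the reader has more data it can skip when evaluating the predicate.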
Reference:
http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/orcfile.html