Friday, 25 July 2014

Parquet OutOfMemory Error in CDH Cluster

Parquet files have a default block size of 1 GB, which means any Parquet file larger than 1 GB is split into 1 GB blocks. It is therefore suggested that the MapReduce child maximum heap size be set to at least 1 GB. Since this property is set to approximately 700 MB in your environment, it might be what is causing the OutOfMemory errors. Since you have plenty of free memory on the slave nodes, we suggest increasing this value to 1.5 GB to start with and raising it further as needed. Below are the steps to implement this change and have it take effect (an illustrative view of the resulting client configuration follows the steps):

1. Under the MapReduce-1 service configuration, search for "child" in the search box on the left side.
2. In the search results, locate "MapReduce Child Java Maximum Heap Size" under the Gateway/Resource Management category.
3. Change this value from 692 MB to 1.5 GB.
4. Click the Actions button in the top right corner.
5. Select Deploy Client Configuration. This pushes the change to the client (gateway) systems, which are the systems from which you run the Hive and Pig jobs.
6. Select Restart to restart the MapReduce service.
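For reference, once the client configuration is deployed, the change should show up in mapred-site.xml on the gateway hosts. The snippet below is a minimal sketch, assuming the standard MRv1 property mapred.child.java.opts and the usual client configuration path /etc/hadoop/conf; the exact file layout Cloudera Manager generates in your cluster may differ.

    <!-- /etc/hadoop/conf/mapred-site.xml on a gateway host (illustrative only) -->
    <property>
      <name>mapred.child.java.opts</name>
      <!-- 1.5 GB maximum heap for each map/reduce child JVM -->
      <value>-Xmx1536m</value>
    </property>

If the OutOfMemory errors persist at 1.5 GB, the same setting can be raised further, keeping it within the free memory available on the slave nodes.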
