Sunday, 21 September 2014

MRv2 and Yarn Memory Configuration

The final calculation determines the amount of RAM per container:

RAM-per-Container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / Containers)
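As a quick sanity check, the formula can be written in a couple of lines of Python (the 2048 MB minimum and the 30720 MB usable-RAM figure below are illustrative inputs, not prescriptions):

```python
# Minimum container size; 2048 MB is the value typically suggested for nodes
# in the 24-72 GB RAM range (an assumed example input here).
MIN_CONTAINER_SIZE = 2048  # MB

def ram_per_container(total_available_ram_mb, containers):
    """RAM-per-Container = max(MIN_CONTAINER_SIZE, Total Available RAM / Containers)."""
    return max(MIN_CONTAINER_SIZE, total_available_ram_mb // containers)

print(ram_per_container(30720, 9))   # 30720 // 9 = 3413 -> 3413
print(ram_per_container(30720, 20))  # 30720 // 20 = 1536 -> floor of 2048 applies
```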

With these calculations, the YARN and MapReduce configurations can be set:

Configuration File     Configuration Setting                   Value Calculation
yarn-site.xml          yarn.nodemanager.resource.memory-mb     = Containers * RAM-per-Container
yarn-site.xml          yarn.scheduler.minimum-allocation-mb    = RAM-per-Container
yarn-site.xml          yarn.scheduler.maximum-allocation-mb    = Containers * RAM-per-Container
mapred-site.xml        mapreduce.map.memory.mb                 = RAM-per-Container
mapred-site.xml        mapreduce.reduce.memory.mb              = 2 * RAM-per-Container
mapred-site.xml        mapreduce.map.java.opts                 = 0.8 * RAM-per-Container
mapred-site.xml        mapreduce.reduce.java.opts              = 0.8 * 2 * RAM-per-Container
mapred-site.xml        yarn.app.mapreduce.am.resource.mb       = 2 * RAM-per-Container
mapred-site.xml        yarn.app.mapreduce.am.command-opts      = 0.8 * 2 * RAM-per-Container
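The table can be condensed into a small helper that derives every setting from the two computed inputs. The property names are the real Hadoop keys; the helper function itself is just an illustrative sketch:

```python
def yarn_mr_settings(containers, ram_per_container_mb):
    """Derive the YARN/MapReduce memory settings from the table above."""
    heap = "-Xmx%dm" % int(0.8 * ram_per_container_mb)          # 80% of one container
    am_heap = "-Xmx%dm" % int(0.8 * 2 * ram_per_container_mb)   # 80% of two containers
    return {
        "yarn.nodemanager.resource.memory-mb":  containers * ram_per_container_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container_mb,
        "mapreduce.map.memory.mb":              ram_per_container_mb,
        "mapreduce.reduce.memory.mb":           2 * ram_per_container_mb,
        "mapreduce.map.java.opts":              heap,
        "mapreduce.reduce.java.opts":           am_heap,
        "yarn.app.mapreduce.am.resource.mb":    2 * ram_per_container_mb,
        "yarn.app.mapreduce.am.command-opts":   am_heap,
    }

for name, value in yarn_mr_settings(9, 3072).items():
    print(name, "=", value)
```

With containers=9 and 3072 MB per container this reproduces the 27648 MB node total and the -Xmx2457m heap used later in the post.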



Using the calculator script provided by Hortonworks, we get the recommended memory
settings for YARN:

 Using cores=16 memory=32GB disks=5 hbase=False
 Profile: cores=16 memory=31744MB reserved=1GB usableMem=31GB disks=5
 Num Container=9
 Container Ram=3072MB
 Used Ram=27GB
 Unused Ram=1GB
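The arithmetic behind that output can be approximated as follows. This is a sketch of the Hortonworks logic (container count bounded by cores, disks, and memory; container size rounded down to a 512 MB multiple) rather than the exact script, whose reserved-memory tables vary by version:

```python
import math

def containers_and_ram(cores, usable_mem_mb, disks, min_container_mb=2048):
    # Containers = min(2 * CORES, 1.8 * DISKS, usable RAM / MIN_CONTAINER_SIZE)
    containers = int(min(2 * cores,
                         math.ceil(1.8 * disks),
                         usable_mem_mb // min_container_mb))
    # RAM per container: at least the minimum, rounded down to a 512 MB multiple
    ram = max(min_container_mb, usable_mem_mb // containers)
    ram = (ram // 512) * 512
    return containers, ram

# 31 GB usable after the 1 GB OS reservation from the profile above
containers, ram = containers_and_ram(cores=16, usable_mem_mb=30720, disks=5)
print(containers, ram)  # 9 3072 -- matches Num Container and Container Ram above
```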

Set the following properties in /etc/hadoop/conf/yarn-site.xml and mapred-site.xml.
If you configure through Cloudera Manager instead, go to Configuration, search for each property, and set its value in the Cloudera page rather than in the XML files. Then click 'Save Changes', choose Actions -> Deploy Client Configuration, and restart the service.

//set the minimum unit of RAM to allocate for a Container
 yarn.scheduler.minimum-allocation-mb=3072
 yarn.scheduler.maximum-allocation-mb=27648

//set the maximum memory Yarn can utilize on each node
 yarn.nodemanager.resource.memory-mb=27648

//since each map or reduce task runs in a separate container, its memory should be at least as large as one container
 mapreduce.map.memory.mb=3072
 mapreduce.reduce.memory.mb=3072

//each container runs a JVM for its map or reduce task, so the JVM heap size should be 75-80% of the map/reduce memory settings above
 mapreduce.map.java.opts=-Xmx2457m
 mapreduce.reduce.java.opts=-Xmx2457m

 yarn.app.mapreduce.am.resource.mb=3072
 yarn.app.mapreduce.am.command-opts=-Xmx2457m
 mapreduce.task.io.sort.mb=1228
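If you are editing the XML files directly, each of these settings uses the standard Hadoop property form; for example, the minimum allocation in yarn-site.xml (the same pattern applies to the rest):

```xml
<!-- yarn-site.xml: the smallest unit of RAM YARN will allocate to a container -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
```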

The java.opts parameters set the amount of memory that will be used for the map/reduce child tasks, whereas the memory.mb values are the amount of memory that is carved out for the container. You therefore want to ensure there is enough overhead for the MR2 framework to run within the container alongside the user code. We recommend making the container size larger than the Java opts, with the heap at 75-80% of the container size.


One other thing: the "Client Java Heap Size in Bytes" is the Java client heap size, i.e. the heap defined on the node where the job is submitted. For example, if the customer runs the job from an edge node, the client Java heap size sets the amount of Java heap that node gets when it submits the job. Typically the client Java heap size doesn't need to be larger than the default, and rarely larger than 1-2 GB.


yarn.nodemanager.resource.cpu-vcores=8 (the number of CPU cores on each node available to all containers)

Reducing the maximum number of vcores helps limit the number of containers created on a node.
Containers are limited by a combination of the cpu-vcores setting and memory: if you allocate only 1 vcore, you'll get only 1 container, no matter how much RAM is available.

You are also limited by the number of vcores a given NodeManager has.  By default each task uses 1 vcore, so if your NodeManager has 24 vcores, it can never have more than 24 containers regardless of how much memory it has.
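Putting the two limits together, the effective container count per node is the smaller of the memory-based and vcore-based bounds. A minimal sketch, assuming the default of 1 vcore per task and the 3072 MB container size from above:

```python
def max_containers(node_mem_mb, node_vcores, task_mem_mb=3072, task_vcores=1):
    """A node runs only as many containers as BOTH its memory and its vcores allow."""
    return min(node_mem_mb // task_mem_mb, node_vcores // task_vcores)

print(max_containers(27648, 8))   # memory allows 9, vcores allow 8 -> 8
print(max_containers(27648, 1))   # a single vcore -> 1 container regardless of RAM
```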

Disk latency became the bottleneck at the higher vcores settings (spiking to 500 ms).

You should also consider using CM Role Groups so that you can set some of your nodes to 16 and the others to 24.  You can find out more about role groups here: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manager-Managing-Clusters/cmmc_role_grps.html


Reference:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
