The final calculation is to determine the amount of RAM per container:
RAM-per-Container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / Containers)
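As a quick sanity check, here is a minimal Python sketch of this step (the 2048 MB default for MIN_CONTAINER_SIZE is an assumption, taken from the Hortonworks guidance for nodes with more than 24 GB of RAM):

# RAM-per-Container = max(MIN_CONTAINER_SIZE, Total Available RAM / Containers)
def ram_per_container(total_available_ram_mb, containers, min_container_size_mb=2048):
    return max(min_container_size_mb, total_available_ram_mb // containers)

# Example with the node profile used later in this post:
# ~30 GB of usable RAM split across 9 containers.
print(ram_per_container(30720, 9))  # 3413; the Hortonworks script rounds this
                                    # down further, to the 3072 MB shown below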
With these calculations, the YARN and MapReduce configurations can be set:
Configuration File    | Configuration Setting                 | Value Calculation
yarn-site.xml         | yarn.nodemanager.resource.memory-mb   | = Containers * RAM-per-Container
yarn-site.xml         | yarn.scheduler.minimum-allocation-mb  | = RAM-per-Container
yarn-site.xml         | yarn.scheduler.maximum-allocation-mb  | = Containers * RAM-per-Container
mapred-site.xml       | mapreduce.map.memory.mb               | = RAM-per-Container
mapred-site.xml       | mapreduce.reduce.memory.mb            | = 2 * RAM-per-Container
mapred-site.xml       | mapreduce.map.java.opts               | = 0.8 * RAM-per-Container
mapred-site.xml       | mapreduce.reduce.java.opts            | = 0.8 * 2 * RAM-per-Container
yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb     | = 2 * RAM-per-Container
yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts    | = 0.8 * 2 * RAM-per-Container
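To make the table concrete, the following Python sketch derives each setting from Containers and RAM-per-Container; the values plugged in match the example node profile below:

containers = 9
ram = 3072  # RAM-per-Container, in MB

settings = {
    "yarn.nodemanager.resource.memory-mb":  containers * ram,                 # 27648
    "yarn.scheduler.minimum-allocation-mb": ram,                              # 3072
    "yarn.scheduler.maximum-allocation-mb": containers * ram,                 # 27648
    "mapreduce.map.memory.mb":              ram,                              # 3072
    "mapreduce.reduce.memory.mb":           2 * ram,                          # 6144
    "mapreduce.map.java.opts":              "-Xmx%dm" % int(0.8 * ram),       # -Xmx2457m
    "mapreduce.reduce.java.opts":           "-Xmx%dm" % int(0.8 * 2 * ram),   # -Xmx4915m
    "yarn.app.mapreduce.am.resource.mb":    2 * ram,                          # 6144
    "yarn.app.mapreduce.am.command-opts":   "-Xmx%dm" % int(0.8 * 2 * ram),   # -Xmx4915m
}
for name, value in settings.items():
    print("%s=%s" % (name, value))

Note that the sample script output below sets the reduce and ApplicationMaster values to a single RAM-per-Container rather than the table's 2x factor; both conventions appear in practice, so treat the 2x values as the upper-end recommendation.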
By using the calculation script provided by Hortonworks, we get the recommended memory settings for YARN.
Using cores=16 memory=32GB disks=5 hbase=False
Profile: cores=16 memory=31744MB reserved=1GB usableMem=31GB disks=5
Num Container=9
Container Ram=3072MB
Used Ram=27GB
Unused Ram=1GB
Set the following properties in /etc/hadoop/conf/yarn-site.xml and mapred-site.xml
If you configure through Cloudera Manager instead, go to Configuration, search for each property, and set its value in the Cloudera Manager page rather than in the XML files. After that, click 'Save Changes', then Actions -> Deploy Client Configuration, and restart the affected services.
//set the minimum unit of RAM to allocate for a Container
yarn.scheduler.minimum-allocation-mb=3072
yarn.scheduler.maximum-allocation-mb=27648
//set the maximum memory Yarn can utilize on each node
yarn.nodemanager.resource.memory-mb=27648
//since each map or reduce task runs in a separate container, its memory should be at least the size of one container
mapreduce.map.memory.mb=3072
mapreduce.reduce.memory.mb=3072
//each container runs a JVM for its map or reduce task, so the JVM heap size should be set below the map/reduce memory above, at 75-80% of it
mapreduce.map.java.opts=-Xmx2457m
mapreduce.reduce.java.opts=-Xmx2457m
yarn.app.mapreduce.am.resource.mb=3072
yarn.app.mapreduce.am.command-opts=-Xmx2457m
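//buffer for sorting map output; here roughly 40% of mapreduce.map.memory.mb (an observation from these numbers, not an official formula)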
mapreduce.task.io.sort.mb=1228
The java.opts parameters set the amount of memory that will be used for the map/reduce child tasks, whereas the memory.mb values set the amount of memory that is carved out as a container. You therefore want to ensure enough overhead for the MR2 framework to run within the container along with the user code. We recommend making the container size larger than the Java opts, with the heap at 75-80% of the container size.
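In numbers, with the 3072 MB container used above (a worked example, not new configuration):

container_mb = 3072                   # mapreduce.map.memory.mb: the container carve-out
heap_mb = int(0.8 * container_mb)     # 2457 -> mapreduce.map.java.opts=-Xmx2457m
overhead_mb = container_mb - heap_mb  # ~615 MB headroom for non-heap JVM/MR2 overhead
print("-Xmx%dm leaves %d MB of overhead" % (heap_mb, overhead_mb))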
One other thing: the "Client Java Heap Size in Bytes" setting is the Java client heap size, i.e. the heap defined on the node where the job is submitted. For example, if a customer runs the job from an edge node, the client Java heap size sets the amount of Java heap that this particular node gets when it submits the job. Typically the client Java heap size doesn't need to be larger than the default, and rarely larger than 1-2 GB.
yarn.nodemanager.resource.cpu-vcores=8 (CPU cores available to all containers on each node)
Reducing the maximum number of vcores helps to limit the number of containers being created on a node.
Containers are limited by a combination of the cpu-vcores setting and memory. If you only allocate 1 vcore, you'll only get 1 container, no matter how much RAM.
You are also limited by the number of vcores a given NodeManager has. By default each task uses 1 vcore, so if your NodeManager has 24 vcores, it can never have more than 24 containers regardless of how much memory it has.
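As a rough sketch (an illustration of the limits above, not the scheduler's literal algorithm), the container count per node is the minimum of the memory-based and vcore-based limits:

def max_containers(node_mem_mb, container_mem_mb, node_vcores, vcores_per_task=1):
    by_memory = node_mem_mb // container_mem_mb   # how many containers fit in RAM
    by_vcores = node_vcores // vcores_per_task    # how many fit in the vcore budget
    return min(by_memory, by_vcores)

# With the settings above (27648 MB for containers, 8 vcores),
# vcores are the binding limit:
print(max_containers(27648, 3072, 8))   # 8, not the 9 that memory alone allows
print(max_containers(27648, 3072, 24))  # 9: now memory is the limit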
Disk latency became the bottleneck at the higher vcores settings (spiking to 500 ms).
You should also consider using CM Role Groups so that you can set some of your nodes to 16 and the others to 24. You can find out more about role groups here: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manager-Managing-Clusters/cmmc_role_grps.html
Reference:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/