Sunday 19 October 2014

YARN Container Configuration


Setting Container Memory

Container memory is controlled through three important values in the yarn-site.xml file (a sample snippet follows this list):
- yarn.nodemanager.resource.memory-mb is the amount of memory the NodeManager can use for containers.
- yarn.scheduler.minimum-allocation-mb is the smallest container allowed by the ResourceManager. A requested container smaller than this value results in an allocated container of this size (default 1024 MB).
- yarn.scheduler.maximum-allocation-mb is the largest container allowed by the ResourceManager (default 8192 MB).
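As a concrete sketch, these properties might be set in yarn-site.xml as follows; the values are illustrative (roughly an 8 GB container budget per node with the default 1 GB/8 GB request bounds), not recommendations:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
  <!-- memory (in MB) this NodeManager offers to containers; illustrative value -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
  <!-- smallest container the ResourceManager will allocate -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
  <!-- largest container the ResourceManager will allocate -->
</property>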

Setting Container Cores

The number of cores for containers can be set with the following properties in yarn-site.xml (see the sketch after this list):
- yarn.scheduler.minimum-allocation-vcores is the minimum number of cores a container can be requested to have.
- yarn.scheduler.maximum-allocation-vcores is the maximum number of cores a container can be requested to have.
- yarn.nodemanager.resource.cpu-vcores is the number of cores that containers can request from this node.
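For example, a yarn-site.xml fragment along these lines (vcore counts are illustrative and should reflect the actual hardware) advertises 8 vcores per node and limits a single container to between 1 and 4 vcores:

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
  <!-- vcores this node offers to containers; illustrative value -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
  <!-- smallest vcore request the scheduler will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
  <!-- largest vcore request the scheduler will grant -->
</property>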

Setting MapReduce Properties

Since MapReduce now runs as a YARN application, it may be necessary to adjust some of the mapred-site.xml properties as they relate to the map and reduce containers. The following properties set the Java options and memory size for the map and reduce containers (a sample snippet follows this list):
- mapred.child.java.opts provides a larger or smaller heap size for child task JVMs (e.g., -Xmx2048m); the map- and reduce-specific options below take precedence when set.
- mapreduce.map.memory.mb provides a larger or smaller resource limit for maps (default = 1536 MB).
- mapreduce.reduce.memory.mb provides a larger or smaller resource limit for reduces (default = 3072 MB).
- mapreduce.reduce.java.opts provides a larger or smaller heap size for child reducers.
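A minimal mapred-site.xml sketch using these properties might look like the following; the sizes are examples only, with each heap (-Xmx) kept below its corresponding memory.mb limit:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
  <!-- container size for a map task (example value) -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
  <!-- map JVM heap, kept below mapreduce.map.memory.mb -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
  <!-- container size for a reduce task (example value) -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2048m</value>
  <!-- reduce JVM heap, kept below mapreduce.reduce.memory.mb -->
</property>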

Calculating the Capacity of a Node

Since YARN has now removed the hard-partitioned mapper and reducer slots of Hadoop version 1, new capacity calculations are required. There are eight important parameters for calculating a node’s capacity; they are found in the mapred-site.xml and yarn-site.xml files.
In mapred-site.xml:
- mapreduce.map.memory.mb and mapreduce.reduce.memory.mb
  The hard memory limit Hadoop enforces on the map or reduce task (i.e., the container size).
- mapreduce.map.java.opts and mapreduce.reduce.java.opts
  The JVM heap size (-Xmx) for the map or reduce task. Remember to leave room for the JVM permanent generation and native libraries, so this value should always be smaller than mapreduce.[map|reduce].memory.mb (roughly 75%-80% of it).

In yarn-site.xml:
- yarn.scheduler.minimum-allocation-mb
  The smallest container YARN will allow.
- yarn.scheduler.maximum-allocation-mb
  The largest container YARN will allow.
- yarn.nodemanager.resource.memory-mb
  The amount of physical memory (RAM) on the compute node that is available for containers. It is important that this value is not the total RAM on the node, as other Hadoop services also require RAM.
- yarn.nodemanager.vmem-pmem-ratio
  The amount of virtual memory each container is allowed, calculated as containerMemoryRequest * vmem-pmem-ratio.
As an example, consider a configuration with the settings in the table below. With these settings, each map and reduce task is given a generous 512 MB of overhead within its container, as seen in the difference between mapreduce.[map|reduce].memory.mb and mapreduce.[map|reduce].java.opts.

Property                                Value
mapreduce.map.memory.mb                 1536 MB
mapreduce.map.java.opts                 -Xmx1024m
mapreduce.reduce.memory.mb              2560 MB
mapreduce.reduce.java.opts              -Xmx2048m
yarn.scheduler.minimum-allocation-mb    512 MB
yarn.scheduler.maximum-allocation-mb    4096 MB
yarn.nodemanager.resource.memory-mb     36864 MB
yarn.nodemanager.vmem-pmem-ratio        2.1

Next, YARN has been configured to allow containers no smaller than 512 MB and no larger than 4 GB; the compute nodes have 36 GB of RAM available for containers. With a virtual memory ratio of 2.1 (the default value), each map task is allowed up to 3225.6 MB of virtual memory and each reduce task up to 5376 MB. Thus a compute node configured with 36 GB of container space can support up to 24 maps or 14 reducers, or any combination of mappers and reducers allowed by the available resources on the node.
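Expressed as configuration, the example settings above would be split across the two files roughly as follows (property names as discussed in this section; values taken from the example):

mapred-site.xml:
<property><name>mapreduce.map.memory.mb</name><value>1536</value></property>
<property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>2560</value></property>
<property><name>mapreduce.reduce.java.opts</name><value>-Xmx2048m</value></property>

yarn-site.xml:
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>512</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>4096</value></property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>36864</value></property>
<property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value></property>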

Some notes about mapper and reducer memory in YARN
1. A job can have any number ('x') of mappers and any number ('y') of reducers.
2. Depending on data locality and resource availability, a given node may run more than one map and/or reduce task to get the work done.
3. A map or reduce task runs in a single container, and the memory needed by the task must fit within that container's memory.
4. Unless container reuse is explicitly set up, containers are not shared across tasks.


Another good example of how to configure YARN
