Setting Container Memory
Controlling container memory takes place through three important values in the yarn-site.xml file:

yarn.nodemanager.resource.memory-mb is the amount of memory the NodeManager can use for containers.

yarn.scheduler.minimum-allocation-mb is the smallest container allowed by the ResourceManager. A requested container smaller than this value will result in an allocated container of this size (default 1024 MB).

yarn.scheduler.maximum-allocation-mb is the largest container allowed by the ResourceManager (default 8192 MB).
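For example, the corresponding yarn-site.xml entries might look like the sketch below; the 36864 MB figure is only an assumed value for a node dedicating 36 GB to containers, while the other two values are the defaults noted above:

<!-- yarn-site.xml (sketch; resource.memory-mb is an assumed value) -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>36864</value>  <!-- assumed: 36 GB of this node's RAM offered to containers -->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>   <!-- default minimum container size -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>   <!-- default maximum container size -->
  </property>
</configuration>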
Setting Container Cores
It is possible to set the number of cores for containers using the following properties in the yarn-site.xml file:

yarn.scheduler.minimum-allocation-vcores is the minimum number of cores that can be requested for a container.

yarn.scheduler.maximum-allocation-vcores is the maximum number of cores that can be requested for a container.

yarn.nodemanager.resource.cpu-vcores is the number of cores that containers can request from this node.
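A sketch of the matching yarn-site.xml entries follows; the vcore counts here are illustrative assumptions, not defaults:

<!-- yarn-site.xml (sketch; vcore counts are assumptions) -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>  <!-- assumed: cores this node offers to containers -->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>  <!-- assumed: smallest vcore request allowed -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>  <!-- assumed: largest vcore request allowed -->
  </property>
</configuration>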
Setting MapReduce Properties
Since MapReduce now runs as a YARN application, it may be necessary to adjust some of the mapred-site.xml properties as they relate to the map and reduce containers. The following properties are used to set Java arguments and memory sizes for both the map and reduce containers:

mapred.child.java.opts provides a larger or smaller heap size for the child JVMs of maps (e.g., -Xmx2048m).

mapreduce.map.memory.mb provides a larger or smaller resource limit for maps (default = 1536 MB).

mapreduce.reduce.memory.mb provides a larger or smaller resource limit for reduces (default = 3072 MB).

mapreduce.reduce.java.opts provides a larger or smaller heap size for the child JVMs of reducers.
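As a sketch, the mapred-site.xml entries might look like the following; the memory limits are the defaults quoted above, and the -Xmx heap sizes are illustrative choices kept below those limits:

<!-- mapred-site.xml (sketch; heap sizes are assumptions) -->
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>       <!-- default container size for map tasks -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>  <!-- assumed heap for map JVMs, below 1536 MB -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>       <!-- default container size for reduce tasks -->
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560m</value>  <!-- assumed heap for reduce JVMs, below 3072 MB -->
  </property>
</configuration>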
Calculating the Capacity of a Node
Since YARN has now removed the hard-partitioned mapper and reducer slots of Hadoop version 1, new capacity calculations are required. There are eight important parameters for calculating a node's capacity; they are found in the mapred-site.xml and yarn-site.xml files.

mapred-site.xml:

mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
The hard limit enforced by Hadoop on the mapper or reducer task.

mapreduce.map.java.opts
mapreduce.reduce.java.opts
The JVM heap size (-Xmx) for the mapper or reducer task. Remember to leave room for the JVM permanent generation and native libraries; this value should always be smaller than mapreduce.[map|reduce].memory.mb (typically 75%-80% of it).

yarn-site.xml:
yarn.scheduler.minimum-allocation-mb
The smallest container YARN will allow.
yarn.scheduler.maximum-allocation-mb
The largest container YARN will allow.
yarn.nodemanager.resource.memory-mb
The amount of physical memory (RAM) on the compute node that is available for containers. It is important that this value is not the total RAM on the node, as other Hadoop services also require RAM.
yarn.nodemanager.vmem-pmem-ratio
The amount of virtual memory each container is allowed. It is calculated by the following formula: containerMemoryRequest * vmem-pmem-ratio.
As an example, consider a configuration with the settings shown below.
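The settings assumed for this example are sketched here; the values are reconstructed from the figures used in the discussion that follows, so treat them as illustrative:

<!-- mapred-site.xml (assumed example values) -->
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2560</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
</configuration>

<!-- yarn-site.xml (assumed example values) -->
<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>36864</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>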
Using these settings, we have given each map and reduce task a generous 512 MB of overhead for the container, as seen in the difference between mapreduce.[map|reduce].memory.mb and mapreduce.[map|reduce].java.opts.
Next, YARN has been configured to allow a container no smaller than 512 MB and no larger than 4 GB; the compute nodes have 36 GB of RAM available for containers. With a virtual memory ratio of 2.1 (the default value), each map can have as much as 3225.6 MB of virtual memory and each reducer can have 5376 MB. Thus our compute node, configured for 36 GB of container space, can support up to 24 maps or 14 reducers, or any combination of mappers and reducers allowed by the available resources on the node.
Some notes about mapper and reducer memory in YARN
1. A job can have 'x' mappers and 'y' reducers.
2. Depending on data locality and resource availability, any given node may run more than one mapper and/or reducer task to get the work done.
3. A map or reduce task runs in one container; the memory needed for such a task must fit within the container's memory.
4. Unless container reuse is set up explicitly, containers are not shared across tasks.