【Posted】: 2016-05-31 22:35:45
【Problem description】:
Input file size: 75 GB
Number of mappers: 2273
Number of reducers: 1 (as shown in the web UI)
Number of splits: 2273
Number of input files: 867
Cluster: Apache Hadoop 2.4.0
5-node cluster, 1 TB per node.
1 master node and 4 data nodes.
The job has been running for 4 hours, and the map phase is still only 12% complete. I'm wondering whether my cluster configuration makes sense, or whether something is misconfigured.
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>The hostname of the RM.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>32</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
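As a sanity check on the settings above, here is a back-of-envelope capacity calculation. The node and scheduler values come from the yarn-site.xml; the per-container memory request is an assumption (it would be set by `mapreduce.map.memory.mb` in mapred-site.xml, which is not shown in the question):

```python
# Back-of-envelope container capacity for the yarn-site.xml above.
nodes = 4                 # data nodes running NodeManagers
node_mem_mb = 8192        # yarn.nodemanager.resource.memory-mb
node_vcores = 4           # yarn.nodemanager.resource.cpu-vcores

cluster_mem_mb = nodes * node_mem_mb      # 32768 MB -> the 32 GB shown in the RM web UI
cluster_vcores = nodes * node_vcores      # 16 vcores total

# At the 1024 MB scheduler minimum, memory would allow 32 concurrent
# containers, capped at 16 by vcores. Only 8 containers are observed,
# which would be consistent with ~4096 MB per container -- this figure
# is a guess; check mapreduce.map.memory.mb in mapred-site.xml.
per_container_mb = 4096   # hypothetical mapreduce.map.memory.mb
concurrent = min(cluster_mem_mb // per_container_mb, cluster_vcores)
print(cluster_mem_mb, concurrent)
```

If the per-container request really is that large, halving it would roughly double map parallelism on this cluster.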
My Map-Reduce job uses multiple outputs, so the reducer emits multiple files. Each machine has 15 GB of RAM. The number of running containers is 8. The total memory available in the RM web UI is 32 GB.
Any guidance is appreciated. Thanks in advance.
【Discussion】:
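A rough projection from the observed numbers suggests why progress looks so slow. This is only a sketch: it assumes the map rate stays constant and ignores scheduling overhead:

```python
# Rough runtime projection from the observed progress.
total_maps = 2273
done_fraction = 0.12      # 12% of maps done
elapsed_h = 4.0           # after 4 hours

maps_done = total_maps * done_fraction        # ~273 maps finished
maps_per_hour = maps_done / elapsed_h         # ~68 maps/hour across 8 containers
est_total_h = total_maps / maps_per_hour      # projected total map-phase time
print(round(est_total_h, 1))                  # ~33.3 hours at the current rate
```

With 2273 splits and only 8 containers running, the job needs ~284 waves of maps, so increasing container parallelism (or reducing the number of splits by combining the 867 input files) is where the time would be won.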
-
Can you provide information about the kind of job you are running? Also, how much RAM is available on each machine? Can you log in to the ResourceManager UI and check the total memory available to the cluster and the number of containers running in parallel? I suspect the job may not be fully utilizing the resources.
-
@shivanand pawar: My Map-Reduce job uses multiple outputs, so it will emit multiple files. Each machine has 15 GB of RAM. There are 8 running containers. Total available memory is 32 GB.
Tags: apache hadoop mapreduce cluster-computing hadoop-yarn