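【Posted】:2017-02-10 13:40:16
【Problem description】:
We are running a Spark Streaming job with YARN as the resource manager, and we noticed that these two directories fill up on the data nodes. After running for only a few minutes, we run out of disk space: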
/tmp/hadoop/data/nm-local-dir/filecache
/tmp/hadoop/data/nm-local-dir/filecache
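These directories are not being cleaned up automatically. From my research, the property yarn.nodemanager.localizer.cache.cleanup.interval-ms needs to be set.
Even after setting it, nothing is cleaned up automatically. Any help would be greatly appreciated.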
<configuration>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hdfs-name-node</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>6</value>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
  </property>

  <property>
    <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
    <value>3000</value>
  </property>

  <!-- Needs to be explicitly set as part of a workaround for YARN-367.
   | If changing this property, you must also change the
   | hadoop.tmp.dir property in hdfs-site.xml. This location must always
   | be a subdirectory of the location specified in hadoop.tmp.dir. This
   | affects all versions of Yarn 2.0.0 through 2.7.3+. -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///tmp/hadoop/data/nm-local-dir</value>
  </property>

</configuration>
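A possibly relevant detail: as far as I understand, the cleanup interval only controls how often the NodeManager checks the localizer cache. Files are deleted only once the cache grows past yarn.nodemanager.localizer.cache.target-size-mb (default 10240 MB), and only resources no longer in use by a running container are eligible. A minimal sketch of lowering that threshold follows; the 1024 value is an illustrative assumption, not something taken from the setup above, and should be sized to the actual disk.

```xml
<!-- Sketch only: goes inside the existing <configuration> element. -->
<property>
  <!-- Target size of the localizer cache; cleanup starts deleting
       unused resources once the cache exceeds this many megabytes. -->
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <!-- Assumed value for illustration; the default is 10240. -->
  <value>1024</value>
</property>
```

The NodeManagers would need to be restarted for a change like this to take effect.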
【Discussion】:
Tags: hadoop spark-streaming hadoop-yarn