【Question Title】: Hadoop MapReduce job never finishes
【Posted】: 2017-11-01 09:13:05
【Question Description】:

I have a Hadoop cluster, and I am trying to run the WordCount job from my Java code, which runs on another machine and uses the REST API. This is how I run the job:

    // Client-side configuration pointing at the remote cluster
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.address", resourceManagerAddress);
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("fs.default.name", fsDefaultName);

    // Standard WordCount job setup
    Job job = Job.getInstance(conf, "Rest WC job2");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(inputPath));
    FileOutputFormat.setOutputPath(job, new Path(outputPath));
    job.submit();
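A common cause of a remotely submitted job getting stuck in ACCEPTED is that the client only sets `yarn.resourcemanager.address` (the application-submission port) and silently falls back to `0.0.0.0` defaults for the scheduler and resource-tracker addresses, so the ApplicationMaster can never register with the ResourceManager. A minimal sketch of the extra client-side properties that are often needed when submitting from outside the cluster (this is a config fragment; the hostname `master` and the port numbers are illustrative placeholders, not values from the question, so adjust them to your cluster):

```java
// Sketch only: extra client-side settings often required for remote submission.
// "master" and the ports below are assumed placeholders.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://master:9000");          // preferred over the deprecated fs.default.name
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "master");     // lets default ports resolve to the right host
conf.set("yarn.resourcemanager.address", "master:8032");
conf.set("yarn.resourcemanager.scheduler.address", "master:8030");        // AM <-> RM scheduling
conf.set("yarn.resourcemanager.resource-tracker.address", "master:8031"); // NM heartbeats
```

Also consider `job.waitForCompletion(true)` instead of `job.submit()`: it blocks and prints progress to the client log, which makes this kind of stall much easier to observe.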

The job is submitted to the cluster and I can see it in the Hadoop UI console, but when looking at the slave logs I can see the following:

2017-11-01 09:03:21,669 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1509459563039_0017_000001 (auth:SIMPLE)
2017-11-01 09:03:21,676 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1509459563039_0017_01_000001 by user root
2017-11-01 09:03:21,677 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1509459563039_0017
2017-11-01 09:03:21,677 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=10.56.0.93   OPERATION=Start Container Request       TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1509459563039_0017    CONTAINERID=container_1509459563039_0017_01_000001
2017-11-01 09:03:21,677 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1509459563039_0017 transitioned from NEW to INITING
2017-11-01 09:03:21,677 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1509459563039_0017_01_000001 to application application_1509459563039_0017
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1509459563039_0017 transitioned from INITING to RUNNING
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1509459563039_0017_01_000001 transitioned from NEW to LOCALIZING
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1509459563039_0017
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.jar transitioned from INIT to DOWNLOADING
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.splitmetainfo transitioned from INIT to DOWNLOADING
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.split transitioned from INIT to DOWNLOADING
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.xml transitioned from INIT to DOWNLOADING
2017-11-01 09:03:21,678 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1509459563039_0017_01_000001
2017-11-01 09:03:21,680 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-root/nm-local-dir/nmPrivate/container_1509459563039_0017_01_000001.tokens. Credentials list: 
2017-11-01 09:03:21,689 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user root
2017-11-01 09:03:21,690 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-root/nm-local-dir/nmPrivate/container_1509459563039_0017_01_000001.tokens to /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017/container_1509459563039_0017_01_000001.tokens
2017-11-01 09:03:21,690 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017 = file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017
2017-11-01 09:03:22,055 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.jar(->/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017/filecache/10/job.jar) transitioned from DOWNLOADING to LOCALIZED
2017-11-01 09:03:22,073 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.splitmetainfo(->/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017/filecache/11/job.splitmetainfo) transitioned from DOWNLOADING to LOCALIZED
2017-11-01 09:03:22,092 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.split(->/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017/filecache/12/job.split) transitioned from DOWNLOADING to LOCALIZED
2017-11-01 09:03:22,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://<master_ip_address>:9000/tmp/hadoop-yarn/staging/root/.staging/job_1509459563039_0017/job.xml(->/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017/filecache/13/job.xml) transitioned from DOWNLOADING to LOCALIZED
2017-11-01 09:03:22,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1509459563039_0017_01_000001 transitioned from LOCALIZING to LOCALIZED
2017-11-01 09:03:22,131 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1509459563039_0017_01_000001 transitioned from LOCALIZED to RUNNING
2017-11-01 09:03:22,135 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1509459563039_0017/container_1509459563039_0017_01_000001/default_container_executor.sh]
2017-11-01 09:03:23,755 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1509459563039_0017_01_000001
2017-11-01 09:03:23,768 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 135.8 MB of 2 GB physical memory used; 1.6 GB of 4.2 GB virtual memory used
2017-11-01 09:03:26,770 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 232.3 MB of 2 GB physical memory used; 1.6 GB of 4.2 GB virtual memory used
2017-11-01 09:03:29,772 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used
2017-11-01 09:03:32,773 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used
2017-11-01 09:03:35,775 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used
2017-11-01 09:03:38,777 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used
2017-11-01 09:03:41,778 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used
2017-11-01 09:03:44,780 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used
2017-11-01 09:03:47,781 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used
2017-11-01 09:03:50,784 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4957 for container-id container_1509459563039_0017_01_000001: 296.7 MB of 2 GB physical memory used; 1.7 GB of 4.2 GB virtual memory used

Note the last few lines. The numbers keep growing and the job never finishes.

In the Hadoop UI I can see:

YarnApplicationState:   ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

The system is stuck in this state.

I can run the WordCount job from the Hadoop master by running the `hadoop jar ...` command, and it completes correctly, so the cluster is configured and working.

What could be the problem?

Thanks

UPD。主节点上 yarn--resourcemanager 的最后几行

2017-11-01 11:49:04,630 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1509459563039_0020_000001 State change from ALLOCATED to LAUNCHED
2017-11-01 11:49:05,620 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1509459563039_0020_01_000001 Container Transitioned from ACQUIRED to RUNNING

【Question Discussion】:

    Tags: java hadoop mapreduce hadoop-yarn


    【Solution 1】:

    The job has not even started executing. There are no free containers available in your YARN cluster, so the job cannot start.

    This is not an error, but a normal YARN application state transition.
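    Whether free containers exist can be checked directly rather than inferred. A hedged sketch using the `YarnClient` API (requires `hadoop-yarn-client` on the classpath and a reachable ResourceManager; the hostname `master` is an assumed placeholder) to print per-node used vs. total capacity:

```java
// Sketch: list RUNNING NodeManagers and their resource usage, to confirm
// whether a container for the ApplicationMaster can actually be allocated.
// Needs hadoop-yarn-client and a live cluster; "master" is a placeholder.
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterCapacityCheck {
    public static void main(String[] args) throws Exception {
        YarnConfiguration yarnConf = new YarnConfiguration();
        yarnConf.set("yarn.resourcemanager.hostname", "master"); // placeholder
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(yarnConf);
        yarnClient.start();
        for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId()
                + " used=" + node.getUsed()            // resources currently allocated
                + " capability=" + node.getCapability()); // total resources on the node
        }
        yarnClient.stop();
    }
}
```

    If `used` equals `capability` on every node, the ACCEPTED state is simply the scheduler waiting for capacity to free up.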

    【Discussion】:

    • I can run the same job from the master node using the `hadoop jar` command, so containers are available
    • So are you saying the jar just fails to execute from a data node or client node? Please clarify
    • I have a master node and slave nodes. In addition, I have another node in the same network. I can run `hadoop jar` from this node without any problem, but when using the Java client from that same node, the submitted job never finishes.