【Question Title】: Spark job fails in yarn-cluster mode
【Posted】: 2023-04-08 10:23:01
【Question Description】:

My Spark job runs perfectly in yarn-client mode but fails in yarn-cluster mode with the error "File does not exist: hdfs://192.xxx.x.x:port/user/hduser/.sparkStaging/application_1442810383301_0016/pyspark.zip", even though the log shows that the file was uploaded to that very directory! What could be the reason?

Here is the full error log:

 Application application_1449548654695_0003 failed 2 times due to AM Container for appattempt_1449548654695_0003_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://server1:8088/cluster/app/application_1449548654695_0003 Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://192.168.0.109:54310/user/hduser/.sparkStaging/application_1449548654695_0003/pyspark.zip
java.io.FileNotFoundException: File does not exist: hdfs://192.168.0.109:54310/user/hduser/.sparkStaging/application_1449548654695_0003/pyspark.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.

【Question Discussion】:

    Tags: apache-spark


    【Solution 1】:

    Could you provide the full error log?

    Did you also set the application's master to "yarn-cluster"? For Python you can do it like this:

    conf = SparkConf().setAppName(appName).setMaster("yarn-cluster")
    sc = SparkContext(conf=conf)
    

    【Discussion】:

    • I have added the log file as requested. Please take a look. @Ton Torres
    • Have you tried my suggestion yet? How are you submitting the Spark job?
    • I use the spark-submit script: ./spark-submit --master yarn-cluster code.py
    • Yes, but in the actual code that initializes the Spark context you must set the master to yarn-cluster.
    • OK. Thanks for the information, I will try doing that in my code.
    【Solution 2】:

    I solved it as follows:

    // i.e. remove the .setMaster("yarn-cluster")
    SparkConf conf = new SparkConf().setAppName("hello-spark");
    

    and passed the master as a command-line argument:

    ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]
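
Since the original question uses PySpark, the equivalent submit command for a Python script can be sketched with a small, hypothetical helper (the `code.py` file name comes from the discussion above; the helper itself is only for illustration):

```python
# Hypothetical helper that assembles the equivalent spark-submit invocation
# for a PySpark script. The master and deploy mode are passed on the command
# line instead of being hard-coded in SparkConf.
def build_submit_cmd(app, deploy_mode="cluster", options=()):
    return ["./bin/spark-submit",
            "--master", "yarn",
            "--deploy-mode", deploy_mode,
            *options, app]

print(build_submit_cmd("code.py"))
# → ['./bin/spark-submit', '--master', 'yarn', '--deploy-mode', 'cluster', 'code.py']
```

The point is that `--master yarn --deploy-mode cluster` replaces the older `--master yarn-cluster` form, and the application code itself stays master-agnostic.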
    

    【Discussion】:

      【Solution 3】:

      For me, setting the Hadoop configuration variables did not help: HADOOP_CONF_DIR=/etc/hadoop and YARN_CONF_DIR=/etc/hadoop. Instead, the key was that spark.hadoop.fs.defaultFS must be set in the SparkConf inside Python. Below is my code. Before running it, I set environment variables for the resource manager host and the HDFS filesystem.

      import os  # needed for os.environ below
      from pyspark import SparkConf, SparkContext
      
      def test():
          print('Hello world')
      
      if __name__ == '__main__':
          _app_name = "DemoApp"
      
          # I define these environment variables before calling
          # e.g., HADOOP_RM_HOST='myhost.edu'
          _rm_host = os.environ['HADOOP_RM_HOST']
          _fs_host = os.environ['HADOOP_FS_HOST']
      
          # The docs say these environment variables should be set, but they had no effect for my Python job
          # Adding core-site.xml, yarn-site.xml etc. to the Python path didn't help either
          # HADOOP_CONF_DIR=/etc/hadoop
          # YARN_CONF_DIR=/etc/hadoop
      
          # Run without Yarn, max threads
          local_conf = SparkConf().setAppName(_app_name) \
              .setMaster("local[*]")
      
          # If you have bad substitution error: https://medium.com/@o20021106/run-pyspark-on-yarn-c7cd04b87d81
          # There must be a hdfs://user/ID directory for the ID that this runs under (owned by ID)
          # https://www.youtube.com/watch?v=dN60fkxABZs
          # spark.hadoop.fs.defaultFS is required so that the files will be copied to the cluster
          # If the cluster doesn't dynamically allocate executors, then .set("spark.executor.instances", "4")
          yarn_conf = SparkConf().setAppName(_app_name) \
                          .setMaster("yarn") \
                          .set("spark.executor.memory", "4g") \
                          .set("spark.hadoop.fs.defaultFS", "hdfs://{}:8020".format(_fs_host)) \
                          .set("spark.hadoop.yarn.resourcemanager.hostname", _rm_host)\
                          .set("spark.hadoop.yarn.resourcemanager.address", "{}:8050".format(_rm_host))
      
          sc = SparkContext(conf=yarn_conf)
      
          test()
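
As an aside, the same `spark.hadoop.*` properties can also be supplied at submit time with `--conf` flags rather than hard-coding them in the script. A sketch, with placeholder hostnames standing in for the HADOOP_RM_HOST / HADOOP_FS_HOST values used above:

```shell
# Placeholder hostnames; substitute your resource manager and HDFS hosts
./bin/spark-submit \
  --master yarn \
  --conf spark.hadoop.fs.defaultFS=hdfs://myfs.example.edu:8020 \
  --conf spark.hadoop.yarn.resourcemanager.hostname=myhost.example.edu \
  --conf spark.hadoop.yarn.resourcemanager.address=myhost.example.edu:8050 \
  demo_app.py
```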
      

      【Discussion】:
