【问题标题】:Spark job cannot delete its temporary folders at the endSpark 作业最后无法删除其临时文件夹
【发布时间】:2019-05-21 14:37:11
【问题描述】:

我在 Azure HDInsight 中运行了一个 spark 作业,它对数据(位于 ADLS 中)进行一些转换,最后将分区数据写回 Azure Data Lake Store。在处理 spark 作业时,创建一个包含许多名为“_temporary”的子文件夹的文件夹,我想同时计算结果。最后,火花作业删除了这个临时文件夹。在某些情况下,这种删除会失败。

当它失败时会出现以下异常:

 

    ERROR FileFormatWriter: Aborting job null.
    com.microsoft.azure.datalake.store.ADLException: Error deleting directory tree /data/datalake/processed/raw/_temporary
    Operation DELETE failed with exception java.net.SocketTimeoutException : Read timed out
    Last encountered exception thrown after 5 tries. [java.net.SocketTimeoutException,java.net.SocketTimeoutException,java.net.SocketTimeoutException,java.net.SocketTimeoutException,java.net.SocketTimeoutException]
     [ServerRequestId:null]
        at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1194)
        at com.microsoft.azure.datalake.store.ADLStoreClient.deleteRecursive(ADLStoreClient.java:614)
        at org.apache.hadoop.fs.adl.AdlFileSystem.delete(AdlFileSystem.java:574)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:510)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:403)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:364)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:47)
        at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:166)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:213)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:154)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
        at com.bosch.ic.spark.dataprocessor.DataProcessor.transformParquetContent(DataProcessor.scala:53)
        at com.bosch.ic.spark.dataprocessor.Application$.main(Application.scala:15)
        at com.bosch.ic.spark.dataprocessor.Application.main(Application.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
    Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at org.wildfly.openssl.OpenSSLSocket.read(OpenSSLSocket.java:423)
        at org.wildfly.openssl.OpenSSLInputStream.read(OpenSSLInputStream.java:41)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
        at com.microsoft.azure.datalake.store.HttpTransport.makeSingleCall(HttpTransport.java:307)
        at com.microsoft.azure.datalake.store.HttpTransport.makeCall(HttpTransport.java:90)
        at com.microsoft.azure.datalake.store.Core.delete(Core.java:311)
        at com.microsoft.azure.datalake.store.ADLStoreClient.deleteRecursive(ADLStoreClient.java:612)
        ... 34 more 

因此,内部使用的 ADL 客户端似乎无法通过套接字超时异常删除临时文件夹。

这种情况有时会发生,但并非在所有情况下都会发生。有人遇到过同样的问题吗?

你知道如何解决这个问题吗?

谢谢。

【问题讨论】:

    标签: azure apache-spark azure-data-lake azure-hdinsight


    【解决方案1】:

    问题已解决。问题出在 Azure 方面。由于 Azure 中的高网络流量,服务之间的通信出现了问题。

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2017-07-30
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2016-01-04
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多