[Title]: Error while saving PySpark Model to Blob storage
[Posted]: 2020-10-08 23:20:03
[Description]:

Here is the code I am using to save a PySpark model to Azure Blob Storage. I am able to connect to the blob from Spark and see the existing files in it, but the save call below fails with the stack trace that follows:

model.write().overwrite().save("wasbs://containername@blobname.blob.core.windows.net/model.model")

Caused by: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Uploads to public accounts using anonymous access is prohibited.
    at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1587)
    at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:2692)
    at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1754)
    at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1561)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:230)
    at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
    at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
    at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
    at org.apache.spark.scheduler.Task.run(Task.scala:113)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)

[Comments]:

  • Please provide the complete error stack trace.
  • Are you authenticating with a SAS token or with credentials when performing operations against the blob storage?
  • I tried both the access key and a SAS token (see the SAS sketch after this list). I am also able to write a Spark DataFrame to the blob as CSV without hitting the error above, but the resulting CSV file in the blob is empty.
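
For reference, a minimal sketch of how a SAS token can be set in the session conf for the wasbs:// driver, analogous to the account-key setting in Solution 1 below (the container and account names are placeholders, and the SAS query string is assumed to grant write permission on the container):

spark.conf.set(
  "fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net",
  "<complete-sas-query-string>")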

Tags: apache-spark pyspark databricks azure-databricks


[Solution 1]:

Set the Blob Storage account access key in the notebook session conf by adding the following code above yours:

spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")

[Discussion]:

  • @jeffinjacob did you get a chance to check this?