【Question Title】: Accessing Azure Data Lake Storage gen2 from Scala
【Posted】: 2019-09-10 12:05:37
【Question】:

I can connect to ADLS gen2 from a notebook running on Azure Databricks, but not from a job that runs a jar. Apart from the use of dbutils, I used the same settings in the job as in the notebook.

In the Scala code I applied the same Spark conf settings that work in the notebook.

Notebook:

spark.conf.set(
  "fs.azure.account.key.xxxx.dfs.core.windows.net",
  dbutils.secrets.get(scope = "kv-secrets", key = "xxxxxx"))

spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")

spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

val rdd = sqlContext.read.format("csv")
  .option("header", "true")
  .load("abfss://catalogs@xxxx.dfs.core.windows.net/test/sample.csv")
// read already returns a DataFrame, so toDF() is effectively a no-op here
val df: DataFrame = rdd.toDF()
// Write file to parquet
df.write.parquet("abfss://catalogs@xxxx.dfs.core.windows.net/test/Sales.parquet")

Scala code:

val sc = SparkContext.getOrCreate()
val spark = SparkSession.builder().getOrCreate()
sc.getConf.setAppName("Test")

sc.getConf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")

sc.getConf.set("fs.azure.account.auth.type", "OAuth")

sc.getConf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")

sc.getConf.set("fs.azure.account.oauth2.client.id", "<app id>")

sc.getConf.set("fs.azure.account.oauth2.client.secret", "<app password>")

sc.getConf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/<tenant id>/oauth2/token")

sc.getConf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

val sqlContext = spark.sqlContext
val rdd = sqlContext.read.format("csv")
  .option("header", "true")
  .load("abfss://catalogs@xxxx.dfs.core.windows.net/test/sample.csv")
// read already returns a DataFrame, so toDF() is effectively a no-op here
val df: DataFrame = rdd.toDF()
println(df.count())
// Write file to parquet
df.write.parquet("abfss://catalogs@xxxx.dfs.core.windows.net/test/Sales.parquet")

I expect the parquet file to be written. Instead, I get the following error:

19/04/20 13:58:40 ERROR Uncaught throwable from user code: Configuration property xxxx.dfs.core.windows.net not found.
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:385)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:802)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:133)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:103)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)

【Question Discussion】:

    Tags: scala apache-spark azure-data-lake azure-databricks


    【Solution 1】:

    Never mind, silly mistake. SparkContext.getConf returns a copy of the configuration, so set calls on it never reach the running context; the settings have to go through spark.conf.set. It should be:

    val sc = SparkContext.getOrCreate()
    val spark = SparkSession.builder().getOrCreate()
    sc.getConf.setAppName("Test")

    spark.conf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")

    spark.conf.set("fs.azure.account.auth.type", "OAuth")

    spark.conf.set("fs.azure.account.oauth.provider.type",
      "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")

    spark.conf.set("fs.azure.account.oauth2.client.id", "<app id>")

    spark.conf.set("fs.azure.account.oauth2.client.secret", "<app password>")

    spark.conf.set("fs.azure.account.oauth2.client.endpoint",
      "https://login.microsoftonline.com/<tenant id>/oauth2/token")

    spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
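    The failure mode can be reproduced outside Spark: a getter that hands back a defensive copy makes mutation a silent no-op. Below is a minimal pure-Scala sketch of that copy semantics; MockSparkConf and MockSparkContext are hypothetical stand-ins for illustration, not real Spark classes.

    ```scala
    import scala.collection.mutable

    // Hypothetical stand-ins (NOT real Spark classes) mimicking the copy
    // semantics of SparkContext.getConf: the getter returns a clone, so
    // set() on the returned object never reaches the live configuration.
    class MockSparkConf(private val settings: mutable.Map[String, String] = mutable.Map.empty) {
      def set(key: String, value: String): MockSparkConf = { settings(key) = value; this }
      def get(key: String): Option[String] = settings.get(key)
      def copy(): MockSparkConf = new MockSparkConf(settings.clone())
    }

    class MockSparkContext {
      private val liveConf = new MockSparkConf()
      def getConf: MockSparkConf = liveConf.copy() // defensive copy
      def hasSetting(key: String): Boolean = liveConf.get(key).isDefined
    }

    object CopySemanticsDemo {
      def main(args: Array[String]): Unit = {
        val sc = new MockSparkContext
        sc.getConf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")
        // The live context never saw the setting, hence
        // "Configuration property ... not found" at read time:
        println(sc.hasSetting("fs.azure.account.key.xxxx.dfs.core.windows.net")) // prints false
      }
    }
    ```

    spark.conf.set, by contrast, writes to the session's live runtime configuration, which is why the fix above works.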
    

    【Discussion】:
