【发布时间】:2021-10-04 09:05:29
【问题描述】:
我在 azure 中使用突触。我在无服务器 sql 池中有数据。我想将该数据导入数据块中的数据框。
我收到以下错误:
Py4JJavaError: An error occurred while calling o568.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.sqldw. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:656)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:195)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:168)
at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.sqldw.DefaultSource
...
...
...
我使用的pyspark代码是:
spark.conf.set(
"fs.azure.account.key.adlsAcct.blob.core.windows.net",
"GVk3234fds2JX/fahOcjig3gNy198yasdhfkjasdyf87HWmDVlx1wLRmu7asdfaP3g==")
sc._jsc.hadoopConfiguration().set(
"fs.azure.account.key.adlsAcct.blob.core.windows.net",
"GVk3234fds2JX/fahOcjig3gNy198yasdhfkjasdyf87HWmDVlx1wLRmu7asdfaP3g==")
df = spark.read \
.format("com.databricks.spark.sqldw") \
.option("url","jdbc:sqlserver://synapse-myworkspace-ondemand.sql.azuresynapse.net:1433;database=myDB;user=myUser;password=userPass123;encrypt=false;trustServerCertificate=true;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;") \
.option("tempdir", "wasbs://projects@adlsAcct.dfs.core.windows.net/Lakehouse/tempDir") \
.option("forwardSparkAzureStorageCredentials","true") \
.option("dbtble","tbl_sampledata") \
.load()
我可以确认:
- 已配置允许 Azure 服务连接的防火墙设置。
- 用户有权访问 sql serverless pool 数据库。
- 我尝试过集成身份验证,得到了相同的结果。
在我看来,错误看起来像 databricks 找不到格式 com.databricks.spark.sqldw,但这可能是一个红鲱鱼。
感谢任何建议和专业知识
【问题讨论】:
标签: azure-databricks azure-synapse