【Posted】: 2022-01-01 22:00:58
【Problem description】:
I enabled diagnostic settings to send Databricks logs to a storage account, and now I need to read those logs with Azure Databricks for advanced analytics. Mounting the path works, but reading does not.
Step 1:
containerName = "insights-logs-jobs"
storageAccountName = "smk"
config = "fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net"
sas = "sp=r&st=2021-12-07T08:07:08Z&se=2021-12-07T16:07:08Z&spr=https&sv=2020-08-04&sr=b&sig=3skdlskdlkf5tt3FiR%2FLM%3D"
spark.conf.set(config,sas)
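For reference, the per-container SAS key that the wasbs driver looks up follows the pattern `fs.azure.sas.<container>.<account>.blob.core.windows.net`, where the account name must match the account addressed in the read path. A minimal sketch of building that key (the names below are placeholders, not the poster's real values):

```python
# Sketch: building the per-container SAS config key for the wasbs driver.
# containerName / storageAccountName are placeholder values.
containerName = "insights-logs-jobs"
storageAccountName = "mystorageacct"

# Key pattern: fs.azure.sas.<container>.<account>.blob.core.windows.net
config_key = (
    "fs.azure.sas."
    + containerName
    + "."
    + storageAccountName
    + ".blob.core.windows.net"
)
print(config_key)
# On a Databricks cluster you would then register the token with:
#   spark.conf.set(config_key, "<SAS token>")
```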
Step 2:
df = spark.read.json("wasbs://insights-logs-jobs.gtoollogging.blob.core.windows.net/resourceId=/SUBSCRIPTIONS/xxxBD-3070-4AFD-A44C-3489956CE077/RESOURCEGROUPS/xxxx-xxx-RG/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/xxx-ADB/y=2021/m=12/d=07/h=00/m=00/*.json")
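For comparison, the wasbs scheme conventionally addresses storage as `wasbs://<container>@<account>.blob.core.windows.net/<path>`, with the container and account separated by `@` rather than `.`. A small sketch assembling such a URI (placeholder names, and the resource path is elided):

```python
# Sketch: the conventional wasbs URI layout is container@account, with the
# blob path appended after the host. Placeholder names, not real values.
container = "insights-logs-jobs"
account = "mystorageacct"
path = "resourceId=/SUBSCRIPTIONS/<sub-id>/y=2021/m=12/d=07/h=00/m=00/*.json"

uri = f"wasbs://{container}@{account}.blob.core.windows.net/{path}"
print(uri)
# In Databricks this URI would be passed to spark.read.json(uri).
```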
Getting the error below:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container $root in account insights-logs-jobs.gtjjjng.blob.core.windows.net using anonymous credentials, and no credentials found for them in the configuration.
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:796)
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorage.
I have tried many approaches, but all of them fail with the error above.
【Discussion】:
-
Please confirm the format of the data in the storage account. Most cluster logs are stored in parquet format.
-
No, it generates JSON files in a yy/mm/dd/hh folder layout. This is the path: resourceId=/SUBSCRIPTIONS/dklgd-3070-4AFD-A44C-3489956CE077/RESOURCEGROUPS/xyz-PROD-RG/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/xyz-PROCESS-PROD-ADB/y=2021/m=10/d=07/h=10/m=00/PT1H.JSON
Tags: azure logging pyspark azure-blob-storage azure-databricks