【Posted】: 2020-01-09 02:16:38
【Question】:
I have the following AWS Athena CREATE TABLE statement:
CREATE EXTERNAL TABLE IF NOT EXISTS s2cs3dataset.s2c_storage (
`MessageHeader` string,
`TimeToProcess` float,
`KeyCreated` string,
`KeyLastTouch` string,
`CreatedDateTime` string,
`TableReference` array<struct<`BusinessObject`: string,
`TransactionType`: string,
`ReferenceKeyId`: float,
`ReferencePrimaryKey`: string,
`IncludedTables`: array<string>>>,
`SAPStoreReference` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
)
LOCATION 's3://api-dev-dpstorage-s3/S2C_INPUT/storage/'
TBLPROPERTIES ('has_encrypted_data'='false');
From this table, I want to select the following columns with this query:
SELECT MessageHeader,
TimeToProcess,
KeyCreated,
KeyLastTouch,
CreatedDateTime,
tr.BusinessObject,
tr.TransactionType,
tr.ReferencePrimaryKey,
it.IncludedTables,
SAPStoreReference
FROM s2c_storage
cross join UNNEST(s2c_storage.tablereference) as p(tr)
cross join UNNEST(tr.IncludedTables) as p(it)
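However, I get the following error:
SYNTAX_ERROR: line 9:1: Expression "it" is not of type ROW
If I remove the last cross join and the column that references it, the query works fine, so I am doing something wrong when unnesting the array of strings inside the array of structs in the JSON data. Any tips?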
【Comments】:
-
What is the output of SELECT typeof(tr.IncludedTables), tr.IncludedTables FROM s2c_storage CROSS JOIN UNNEST(s2c_storage.tablereference) AS p(tr) LIMIT 1?
-
_col0 IncludedTables array(varchar) [PLU]
-
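If tr.IncludedTables is array(varchar), then after UNNEST(tr.IncludedTables) as p(it), the column it is a varchar, not a ROW, so it has no fields to dereference. Replace it.IncludedTables with it in your query. Does that help?
-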
That worked, Piotr. Can you help me understand why it works?
-
I have added some explanation as an answer.
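For reference, the fix suggested in the comments can be sketched as the query below. It is a minimal sketch, not the answer's exact text. It assumes the engine behaviour shown in this thread, where unnesting the array of structs yields a single ROW-typed column (tr) and unnesting the inner array(varchar) yields a plain varchar column (it). The relation aliases t and u and the output name IncludedTable are illustrative choices, not part of the original query.

```sql
-- Sketch of the corrected query: "it" is already a varchar,
-- so it is selected directly instead of as it.IncludedTables.
SELECT MessageHeader,
       TimeToProcess,
       KeyCreated,
       KeyLastTouch,
       CreatedDateTime,
       tr.BusinessObject,
       tr.TransactionType,
       tr.ReferencePrimaryKey,
       it AS IncludedTable,   -- one output row per element of tr.IncludedTables
       SAPStoreReference
FROM s2c_storage
-- tr is a ROW (struct), so tr.BusinessObject etc. are valid field references
CROSS JOIN UNNEST(s2c_storage.tablereference) AS t(tr)
-- it is a scalar varchar taken from the inner array of strings
CROSS JOIN UNNEST(tr.IncludedTables) AS u(it)
```

Field access with a dot only applies to ROW values such as tr. Because each element of tr.IncludedTables is a string, the second UNNEST produces a scalar column, which is why it.IncludedTables raised the "not of type ROW" error.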
Tags: json presto amazon-athena