【Posted】: 2020-08-16 23:41:25
【Problem description】:
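I am trying to unnest several nested arrays in BigQuery, filter them, and reassemble the filtered arrays. My problem is that I end up with duplicate values.
Sample data:
The image shows two sample rows; the "vendor" array contains two arrays, "topic" and "categories".
I want to keep only the entries where vendor.topic.score >= 0.8 and vendor.categories.score >= 0.8, and drop the vendor.topic.position column.
The result should look like this:
First I tried unnesting every array in a single query, but that leaves duplicate values in the newly built arrays: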
SELECT
  id,
  ARRAY_AGG(STRUCT(vendor_topics.label AS topics_label,
      vendor_topics.score AS topics_score)),
  ARRAY_AGG(STRUCT(vendor_categories.label AS category_label,
      vendor_categories.score AS category_score))
FROM
  `source_table`,
  UNNEST(vendor.topics) vendor_topics,
  UNNEST(vendor.categories) vendor_categories
WHERE
  vendor_categories.score >= 0.8
  AND vendor_topics.score >= 0.8
GROUP BY
  1
LIMIT
  10
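The duplicates most likely come from the comma join: unnesting both arrays in the same FROM clause pairs every topic with every category, so each topic appears once per category and vice versa. The sketch below reproduces this on inline data. The row in the `WITH` clause is hypothetical, and it assumes `vendor` is a STRUCT holding the two arrays, since `UNNEST(vendor.topics)` would not work on a repeated field.

```sql
-- Hypothetical inline row: two topics and two categories, all scoring >= 0.8.
WITH source_table AS (
  SELECT
    1 AS id,
    STRUCT(
      [STRUCT('t1' AS label, 0.9 AS score),
       STRUCT('t2' AS label, 0.85 AS score)] AS topics,
      [STRUCT('c1' AS label, 0.9 AS score),
       STRUCT('c2' AS label, 0.95 AS score)] AS categories
    ) AS vendor
)
SELECT
  id,
  COUNT(*) AS joined_rows,  -- one row per (topic, category) pair
  ARRAY_AGG(vendor_topics.label) AS topic_labels,
  ARRAY_AGG(vendor_categories.label) AS category_labels
FROM
  source_table,
  UNNEST(vendor.topics) AS vendor_topics,
  UNNEST(vendor.categories) AS vendor_categories
GROUP BY
  id
```

With two topics and two categories the join yields four rows, so each aggregated array should come back with four elements instead of two.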
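Next I tried subqueries, which failed with "API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table."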
SELECT
  id,
  (
    SELECT
      ARRAY_AGG(STRUCT(vendor_topics.label AS topics_label,
          vendor_topics.score AS topics_score))
    FROM
      `source_table` articles,
      UNNEST(vendor.topics) vendor_topics
    WHERE
      vendor_topics.score >= 0.8),
  (
    SELECT
      ARRAY_AGG(STRUCT(vendor_categories.label AS category_label,
          vendor_categories.score AS category_score))
    FROM
      `source_table`,
      UNNEST(vendor.categories) vendor_categories
    WHERE
      vendor_categories.score >= 0.8)
FROM
  `source_table`
GROUP BY
  1
I am out of ideas now and hope someone can help me solve this.
【Discussion】:
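One possible approach, offered as a sketch rather than a confirmed answer: filter each array in its own `ARRAY(SELECT AS STRUCT ...)` subquery that unnests only the current row's array. The two arrays are then never joined to each other, and no subquery scans `source_table` again, which is probably why the second attempt built arrays large enough to hit the row-size limit. The inline data and the output names `topics` and `categories` are hypothetical, and `vendor` is again assumed to be a STRUCT.

```sql
-- Hypothetical inline row; one topic and one category fall below 0.8.
WITH source_table AS (
  SELECT
    1 AS id,
    STRUCT(
      [STRUCT('t1' AS label, 0.9 AS score, 1 AS position),
       STRUCT('t2' AS label, 0.5 AS score, 2 AS position)] AS topics,
      [STRUCT('c1' AS label, 0.95 AS score),
       STRUCT('c2' AS label, 0.3 AS score)] AS categories
    ) AS vendor
)
SELECT
  id,
  -- Correlated subquery: reads only this row's topics, omits position.
  ARRAY(
    SELECT AS STRUCT
      t.label AS topics_label,
      t.score AS topics_score
    FROM UNNEST(vendor.topics) AS t
    WHERE t.score >= 0.8
  ) AS topics,
  -- Same pattern for categories, filtered independently of topics.
  ARRAY(
    SELECT AS STRUCT
      c.label AS category_label,
      c.score AS category_score
    FROM UNNEST(vendor.categories) AS c
    WHERE c.score >= 0.8
  ) AS categories
FROM
  source_table
```

Because each array is filtered on its own, no GROUP BY is needed, and a row whose topics all score below 0.8 still appears in the result with an empty `topics` array instead of being dropped.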
Tags: arrays google-bigquery unnest