【Question title】: Unnest multiple arrays in BigQuery and aggregate again
【Posted】: 2020-08-16 23:41:25
【Question】:

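I'm trying to unnest multiple nested arrays in BigQuery, filter them, and put the new arrays back together. My problem is that I end up with duplicated values.

Sample data:

The image shows two sample rows; the array "vendor" contains two arrays, "topic" and "categories".

I want to filter on vendor.topic.score >= 0.8 and vendor.categories.score >= 0.8, and drop the vendor.topic.position column.

The result should look like this:

First, I tried to solve it with one UNNEST per array, but that gives me duplicated values in the newly created arrays: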

SELECT
  id,
  ARRAY_AGG(STRUCT(vendor_topics.label AS topics_label,
      vendor_topics.score AS topics_score)),
  ARRAY_AGG(STRUCT(vendor_categories.label AS category_label,
      vendor_categories.score AS category_score))
FROM
  `source_table`,
  UNNEST(vendor.topics) vendor_topics,
  UNNEST(vendor.categories) vendor_categories
WHERE
  vendor_categories.score >= 0.8
  AND vendor_topics.score >= 0.8
GROUP BY
  1
LIMIT
  10

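Next I tried subqueries, which resulted in "API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table."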

SELECT
  id,
  (
  SELECT
    ARRAY_AGG(STRUCT(vendor_topics.label AS topics_label,
        vendor_topics.score AS topics_score))
  FROM
    `source_table` articles,
    UNNEST(vendor.topics) vendor_topics
  WHERE
    vendor_topics.score >= 0.8),
  (
  SELECT
    ARRAY_AGG(STRUCT(vendor_categories.label AS category_label,
        vendor_categories.score AS category_score))
  FROM
    `source_table`,
    UNNEST(vendor.categories) vendor_categories
  WHERE
    vendor_categories.score >= 0.8)
FROM
  `source_table`
GROUP BY
  1

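I'm out of ideas now and hope someone can help me solve this.

【Question discussion】:

    Tags: arrays google-bigquery unnest


    【Solution 1】:

    I rebuilt your sample data in two ways, because I wasn't sure whether vendor is an array or not. That may be where your complications come from.

    First example: vendor is an array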

    #standardSQL
    WITH `yourTable` AS (
      select 111 as id, (select 
              array(select 
                       struct(array(select struct('A' as label, 0.1 as score,2 as position) 
                             union all select struct('B' as label, 0.9 as score,5 as position)
                             union all select struct('C' as label, 0.9 as score,8 as position)
                                  ) as topic,
                                  array(select struct('Cat1' as label, 0.8 as score) 
                             union all select struct('Cat2' as label, 0.3 as score)
                                  ) as categories 
                              )
                      )) as vendor
        union all 
        select 222 as id, (select 
              array(select 
                       struct(array(select struct('X' as label, 0.3 as score,2 as position) 
                             union all select struct('Y' as label, 0.9 as score,3 as position)
                                  ) as topic,
                                  array(select struct('Cat33' as label, 0.9 as score) 
                             union all select struct('Cat99' as label, 0.4 as score)
                             union all select struct('Cat44' as label, 0.85 as score)
                                  ) as categories 
                              )
                      )) as vendor
    )
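    -- Filter each nested array per row with correlated subqueries.
    select
      id,
      array(
        select struct(
          -- topics of this row with score >= 0.8
          (select array_agg(t) as topic from unnest(vendor), unnest(topic) t where t.score >= 0.8) as topic,
          -- categories of this row with score >= 0.8
          (select array_agg(t) as categories from unnest(vendor), unnest(categories) t where t.score >= 0.8) as categories
        )
      ) as vendor2
    from yourTable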
    

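    This returns:

    Basically, this is how to read the query:
    - You are selecting rows with id and vendor2.
    - vendor2 is essentially an array (the second example skips this step).
    - Inside it you need two keys as a struct: topic and categories.
    - topic and categories are each an array of structs.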
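
For contrast, the duplicates in the question come from the two comma-joined UNNEST calls, which pair every topic of a row with every category of that row before aggregating. A minimal sketch of that effect on a single made-up row (the table name sample, the aliases t and c, and the output column names are illustrative only, not taken from the question):

```sql
#standardSQL
-- Hypothetical one-row input: 2 topics and 3 categories.
WITH sample AS (
  SELECT 1 AS id,
         STRUCT(['A', 'B'] AS topic, ['x', 'y', 'z'] AS categories) AS vendor
)
SELECT
  id,
  COUNT(*) AS n_pairs,              -- 2 topics * 3 categories = 6 joined rows
  ARRAY_AGG(t) AS topics_repeated   -- every topic appears once per category
FROM sample, UNNEST(vendor.topic) AS t, UNNEST(vendor.categories) AS c
GROUP BY id
```

Filtering each array in its own subquery over the current row, as in the query above, avoids this pairing and also avoids scanning the table a second time inside the subquery.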

    Second example (where vendor is not an array):

    #standardSQL
    WITH `yourTable` AS (
      select 111 as id, (select 
                       struct(array(select struct('A' as label, 0.1 as score,2 as position) 
                             union all select struct('B' as label, 0.9 as score,5 as position)
                             union all select struct('C' as label, 0.9 as score,8 as position)
                                  ) as topic,
                                  array(select struct('Cat1' as label, 0.8 as score) 
                             union all select struct('Cat2' as label, 0.3 as score)
                                  ) as categories 
                              )
                      ) as vendor
        union all 
        select 222 as id, (select 
                       struct(array(select struct('X' as label, 0.3 as score,2 as position) 
                             union all select struct('Y' as label, 0.9 as score,3 as position)
                                  ) as topic,
                                  array(select struct('Cat33' as label, 0.9 as score) 
                             union all select struct('Cat99' as label, 0.4 as score)
                             union all select struct('Cat44' as label, 0.85 as score)
                                  ) as categories 
                              )
                      ) as vendor
    )
    
    
    -- vendor is a single struct here, so its arrays can be unnested directly.
    select
      id,
      struct(
        (select array_agg(t) as topic from unnest(vendor.topic) t where t.score >= 0.8) as topic,
        (select array_agg(t) as categories from unnest(vendor.categories) t where t.score >= 0.8) as categories
      ) as vendor2
    from yourTable
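
The queries above keep every field of each topic element, including position, which the question asked to drop. One possible adjustment, assuming vendor is a single struct as in the second example, is to select only label and score when rebuilding each array. This is a sketch on a shortened, made-up copy of the sample data, not a tested drop-in for the real table:

```sql
#standardSQL
-- Hypothetical one-row input shaped like the second example (vendor is a STRUCT).
WITH yourTable AS (
  SELECT 111 AS id,
         STRUCT(
           [STRUCT('A' AS label, 0.1 AS score, 2 AS position),
            STRUCT('B' AS label, 0.9 AS score, 5 AS position)] AS topic,
           [STRUCT('Cat1' AS label, 0.8 AS score),
            STRUCT('Cat2' AS label, 0.3 AS score)] AS categories
         ) AS vendor
)
SELECT
  id,
  STRUCT(
    -- Keep only label and score, so position is not part of the output.
    ARRAY(SELECT AS STRUCT t.label, t.score
          FROM UNNEST(vendor.topic) AS t
          WHERE t.score >= 0.8) AS topic,
    ARRAY(SELECT AS STRUCT c.label, c.score
          FROM UNNEST(vendor.categories) AS c
          WHERE c.score >= 0.8) AS categories
  ) AS vendor2
FROM yourTable
```

This sketch uses ARRAY(SELECT AS STRUCT ...) instead of ARRAY_AGG; an ARRAY subquery produces an empty array when no element passes the filter, which may or may not be what you want.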
    

    【Discussion】:
