【Question title】: Unnesting Multiple Columns in BigQuery
【Posted】: 2020-05-13 18:49:02
【Question】:

I'm having trouble with an UNNEST query. Below is an example of the query, the results I'm currently getting, and the results I'd like to get.

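A bit of context: in the data I'm currently uploading, the IDs are separated by an A instead of a comma. This forces the column to be a string rather than a number, since there are multiple IDs in the same cell. Price is separated by commas. Example of the data being uploaded: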

    Name    |    Date    |   Item_ID   |  Price
    John    |  4/17/2020 | 123A456A678 | 19.99,21.99,30.00
    Joe     |  4/17/2020 | 555A777A888 | 8.99,10.00,15.99
    Jake    |  4/18/2020 |   444A333   | 15.99,9.00
    John    |  4/18/2020 |     432     | 75.99
    Megan   |  4/18/2020 | 12A890A23A99| 5.99,6.99,9.99,10.00

This is an example of the data in the table before attempting the UNNEST. Below are the current UNNEST query and an example of its output.

    WITH data AS (
      SELECT
        Name,
        Date,
        SPLIT(Item_ID, 'A') AS Item_ID_Split,
        SPLIT(Price, ',') AS Price_Split
      FROM
        Example.Table
    )
    SELECT
      Name,
      Date,
      Item_ID_Split,
      Price_Split
    FROM data,
    UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1,
    UNNEST(Price_Split) Price_Split WITH OFFSET pos2

The current output looks like this:

    Name   |    Date   |  Item_ID_Split | Price_Split
    John   | 4/17/2020 |      123       |   19.99
    John   | 4/17/2020 |      456       |   19.99
    John   | 4/17/2020 |      678       |   19.99
    John   | 4/17/2020 |      123       |   21.99
    John   | 4/17/2020 |      456       |   21.99
    John   | 4/17/2020 |      678       |   21.99
    John   | 4/17/2020 |      123       |   30.00
    John   | 4/17/2020 |      456       |   30.00
    John   | 4/17/2020 |      678       |   30.00
    Joe    | 4/17/2020 |      555       |   8.99
    Joe    | 4/17/2020 |      777       |   8.99
    Joe    | 4/17/2020 |      888       |   8.99
    Joe    | 4/17/2020 |      555       |   10.00
    Joe    | 4/17/2020 |      777       |   10.00
    Joe    | 4/17/2020 |      888       |   10.00
    Joe    | 4/17/2020 |      555       |   15.99
    Joe    | 4/17/2020 |      777       |   15.99
    Joe    | 4/17/2020 |      888       |   15.99
    Jake   | 4/18/2020 |      444       |   15.99
    Jake   | 4/18/2020 |      333       |   15.99
    Jake   | 4/18/2020 |      444       |   9.00
    Jake   | 4/18/2020 |      333       |   9.00
    John   | 4/18/2020 |      432       |   75.99
    Megan  | 4/18/2020 |      12        |   5.99
    Megan  | 4/18/2020 |      890       |   5.99
    Megan  | 4/18/2020 |      23        |   5.99
    Megan  | 4/18/2020 |      99        |   5.99
    Megan  | 4/18/2020 |      12        |   6.99
    Megan  | 4/18/2020 |      890       |   6.99
    Megan  | 4/18/2020 |      23        |   6.99
    Megan  | 4/18/2020 |      99        |   6.99
    Megan  | 4/18/2020 |      12        |   9.99
    Megan  | 4/18/2020 |      890       |   9.99
    Megan  | 4/18/2020 |      23        |   9.99
    Megan  | 4/18/2020 |      99        |   9.99
    Megan  | 4/18/2020 |      12        |   10.00
    Megan  | 4/18/2020 |      890       |   10.00
    Megan  | 4/18/2020 |      23        |   10.00
    Megan  | 4/18/2020 |      99        |   10.00

This is the current output of the query above. As you can see, the Item_IDs/Prices are duplicated. The result I want is:

    Name   |    Date   |  Item_ID_Split | Price_Split
    John   | 4/17/2020 |      123       |   19.99
    John   | 4/17/2020 |      456       |   21.99
    John   | 4/17/2020 |      678       |   30.00
    Joe    | 4/17/2020 |      555       |   8.99
    Joe    | 4/17/2020 |      777       |   10.00
    Joe    | 4/17/2020 |      888       |   15.99
    Jake   | 4/18/2020 |      444       |   15.99
    Jake   | 4/18/2020 |      333       |   9.00
    John   | 4/18/2020 |      432       |   75.99
    Megan  | 4/18/2020 |      12        |   5.99
    Megan  | 4/18/2020 |      890       |   6.99
    Megan  | 4/18/2020 |      23        |   9.99
    Megan  | 4/18/2020 |      99        |   10.00

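This is the result I'm looking for, with no duplication at all between Item_ID_Split and Price_Split. I tried putting the SPLIT functions inside the UNNEST, but I got the same output. I'm not entirely sure how to do this, so any help would be greatly appreciated!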

Thank you in advance!
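Why the query in the question multiplies rows: in BigQuery Standard SQL, a comma between items in the FROM clause is an implicit CROSS JOIN. The offsets pos1 and pos2 are computed but never compared, so each input row produces (number of IDs) × (number of prices) rows. The minimal sketch below shows the effect on John's first row. It uses literal strings in place of Example.Table.

```sql
-- The two UNNESTs are joined with a comma (an implicit CROSS JOIN),
-- so every ID is paired with every price: 3 IDs x 3 prices = 9 rows.
SELECT item_id, price
FROM UNNEST(SPLIT('123A456A678', 'A')) AS item_id,
     UNNEST(SPLIT('19.99,21.99,30.00', ',')) AS price;
```

Comparing the two offsets, for example by adding WHERE pos1 = pos2, reduces this back to one row per position. Both answers below rely on that idea.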

【Question comments】:

    Tags: sql google-bigquery


    【Solution 1】:

    You can use WITH OFFSET and join the two unnested arrays on their positions:

    SELECT Name, Date, Item_ID_Split, Price_Split
    FROM data LEFT JOIN
         UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1
         ON 1=1 LEFT JOIN
         UNNEST(Price_Split) Price_Split WITH OFFSET pos2
         ON pos1 = pos2;
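A related alternative exists, though it is a different technique from the double join above and not the answerer's code. You unnest only the ID array WITH OFFSET, then look up the price at the same position with SAFE_OFFSET. SAFE_OFFSET returns NULL instead of raising an error when the price array is shorter. In the minimal sketch below, two sample rows stand in for the question's data CTE.

```sql
WITH data AS (
  -- Sample rows standing in for the question's data CTE.
  SELECT 'John' AS Name, '4/17/2020' AS Date,
         SPLIT('123A456A678', 'A') AS Item_ID_Split,
         SPLIT('19.99,21.99,30.00', ',') AS Price_Split
  UNION ALL
  SELECT 'Jake', '4/18/2020', SPLIT('444A333', 'A'), SPLIT('15.99,9.00', ',')
)
SELECT
  Name,
  Date,
  item_id,
  -- Price at the same position as the ID; NULL if that position does not exist.
  Price_Split[SAFE_OFFSET(pos)] AS price
FROM data,
  UNNEST(Item_ID_Split) AS item_id WITH OFFSET AS pos
ORDER BY Name, Date, pos;
```

With only one UNNEST there is no join condition to get wrong. The trade-off is that any prices beyond the last ID are ignored.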
    

    【Comments】:

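    • That did it!!! I added a LEFT JOIN after the ON 1=1, before the second UNNEST, since I assumed that's what you meant. This solved the problem perfectly, thank you so much! I really appreciate it!
    【Solution 2】: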

    Below is for BigQuery Standard SQL:

    #standardSQL
    SELECT Name, Day, Splits.*
    FROM (
      SELECT Name, Day, 
        ARRAY(
          SELECT AS STRUCT Item_ID_Split, Price_Split
          FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
          JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
          USING(OFFSET)
        ) AS arr
      FROM `project.dataset.table`
    ), UNNEST(arr) Splits   
    

    Applied to the sample data from your question, as in the example below:

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT 'John' Name, '4/17/2020' Day, '123A456A678' Item_ID,'19.99,21.99,30.00' Price UNION ALL
      SELECT 'Joe', '4/17/2020', '555A777A888','8.99,10.00,15.99' UNION ALL
      SELECT 'Jake', '4/18/2020', '444A333','15.99,9.00' UNION ALL
      SELECT 'John', '4/18/2020', '432','75.99' UNION ALL
      SELECT 'Megan', '4/18/2020', '12A890A23A99','5.99,6.99,9.99,10.00' 
    )
    SELECT Name, Day, Splits.*
    FROM (
      SELECT Name, Day, 
        ARRAY(
          SELECT AS STRUCT Item_ID_Split, Price_Split
          FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
          JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
          USING(OFFSET)
        ) AS arr
      FROM `project.dataset.table`
    ), UNNEST(arr) Splits   
    

    the output is:

    Row Name    Day         Item_ID_Split   Price_Split  
    1   John    4/17/2020   123             19.99    
    2   John    4/17/2020   456             21.99    
    3   John    4/17/2020   678             30.00    
    4   Joe     4/17/2020   555             8.99     
    5   Joe     4/17/2020   777             10.00    
    6   Joe     4/17/2020   888             15.99    
    7   Jake    4/18/2020   444             15.99    
    8   Jake    4/18/2020   333             9.00     
    9   John    4/18/2020   432             75.99    
    10  Megan   4/18/2020   12              5.99     
    11  Megan   4/18/2020   890             6.99     
    12  Megan   4/18/2020   23              9.99     
    13  Megan   4/18/2020   99              10.00   
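One caveat applies to this answer. The inner JOIN ... USING(OFFSET) keeps only positions that exist in both arrays. If a row ever has more IDs than prices (or the reverse), the extra elements are silently dropped. You may prefer to keep them with a NULL partner; that is an assumption about the desired behavior and not part of either answer. One possible sketch generates every position up to the length of the longer array and reads both arrays with SAFE_OFFSET. The 'Test' row below is hypothetical.

```sql
WITH `project.dataset.table` AS (
  -- Hypothetical row with three IDs but only two prices.
  SELECT 'Test' AS Name, '4/19/2020' AS Day, '1A2A3' AS Item_ID, '1.00,2.00' AS Price
)
SELECT
  Name,
  Day,
  ids[SAFE_OFFSET(pos)] AS Item_ID_Split,
  prices[SAFE_OFFSET(pos)] AS Price_Split  -- NULL where the array is shorter
FROM (
  SELECT Name, Day, SPLIT(Item_ID, 'A') AS ids, SPLIT(Price, ',') AS prices
  FROM `project.dataset.table`
),
  -- One position for every element of the longer array.
  UNNEST(GENERATE_ARRAY(0, GREATEST(ARRAY_LENGTH(ids), ARRAY_LENGTH(prices)) - 1)) AS pos
ORDER BY Name, Day, pos;
```

For the hypothetical row, this yields three rows, with Price_Split NULL for ID 3.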
    

    【Comments】:
