【Question title】: Big Query: Join single latest row from second table
【Posted】: 2020-08-10 20:51:53
【Question】:

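I have two tables. One is a list of Orders and the other is a list of Events.

For each Order, I want to join the last Event (by clicked_at) that occurred before the order's created_at.

I have tried a number of ways to get this working, including several other answers on Stack Overflow, but I am struggling to return the correct data.

The pseudo logic I have in mind for the subquery is something like this: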

SELECT campaign, user_id, clicked_at
FROM `Events`
WHERE order.user_id = user_id AND clicked_at < order.created_at
ORDER BY clicked_at DESC
LIMIT 1

Please see the example data below:

# Orders

| order_id | user_id | created_at |
-----------------------------------
| 123      | abc     | 2020-07-04 |
| 456      | abc     | 2020-05-01 |


# Events

| campaign | keyword  | user_id | clicked_at |
----------------------------------------------
| facebook | shoes    | abc     | 2020-07-03 |
| google   | hair     | abc     | 2020-07-01 |

My desired result:

# Orders with campaign attribution

| order_id | user_id | created_at | campaign | keyword  |
---------------------------------------------------------
| 123      | abc     | 2020-07-04 | facebook | shoes    |
| 456      | abc     | 2020-05-01 | null     | null     |

Thanks! Alex

【Comments on the question】:

    Tags: mysql google-bigquery


    【Solution 1】:
    with orders as (
      select 123 as order_id, 'abc' as user_id, cast('2020-07-04' as date) as created_at union all
      select 456, 'abc', '2020-05-01'
    ),
    events as (
      select 'facebook' as campaign, 'shoes' as keyword, 'abc' as user_id, cast('2020-07-03' as date) as clicked_at union all
      select 'google', 'hair', 'abc', '2020-07-01'
    ),
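    logic as (
      -- Pair each order with every earlier event of the same user and rank those
      -- events from newest to oldest; an order with no match keeps one row of nulls.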
      select
        orders.order_id, 
        orders.user_id, 
        orders.created_at, 
        events.clicked_at,
        events.campaign, 
        events.keyword, 
        row_number() over (partition by orders.order_id order by events.clicked_at desc) as rn
      from orders
      left join events 
      on orders.user_id = events.user_id and events.clicked_at < orders.created_at
    )
    -- Keep only the top-ranked (most recent) event per order.
    select * except(rn)
    from logic 
    where rn = 1
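
The query above ranks each order's earlier events by clicked_at and keeps rank 1. As a minimal sketch of the same idea, the ranking can also be filtered directly with BigQuery's QUALIFY clause, which removes the need for the intermediate `logic` CTE and the `rn` column. The inline sample data is copied from the question, and the aliases `o` and `e` are only illustrative.

```sql
-- Sample data from the question, inlined so the query runs on its own.
WITH orders AS (
  SELECT 123 AS order_id, 'abc' AS user_id, DATE '2020-07-04' AS created_at UNION ALL
  SELECT 456, 'abc', DATE '2020-05-01'
),
events AS (
  SELECT 'facebook' AS campaign, 'shoes' AS keyword, 'abc' AS user_id, DATE '2020-07-03' AS clicked_at UNION ALL
  SELECT 'google', 'hair', 'abc', DATE '2020-07-01'
)
SELECT
  o.order_id,
  o.user_id,
  o.created_at,
  e.campaign,
  e.keyword
FROM orders AS o
-- LEFT JOIN keeps orders that have no earlier event (their event columns are null).
LEFT JOIN events AS e
  ON o.user_id = e.user_id
  AND e.clicked_at < o.created_at
WHERE TRUE
-- Keep only the most recent matching event per order.
QUALIFY ROW_NUMBER() OVER (PARTITION BY o.order_id ORDER BY e.clicked_at DESC) = 1
ORDER BY o.order_id
```

If two events for the same user share the same clicked_at, either variant picks one of them arbitrarily; adding a tie-breaker column to the window's ORDER BY would make the choice deterministic.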
    

    【Comments】:

      【Solution 2】:

      Below is for BigQuery Standard SQL

      #standardSQL
      SELECT a.*, campaign, keyword
      FROM  `project.dataset.orders` a
      LEFT JOIN (
        SELECT  
          ANY_VALUE(o).*, 
          ARRAY_AGG(STRUCT(campaign, keyword) ORDER BY clicked_at DESC LIMIT 1)[OFFSET(0)].*
        FROM `project.dataset.orders` o
        JOIN `project.dataset.events` e
        ON o.user_id = e.user_id
        AND clicked_at < created_at
        GROUP BY FORMAT('%t', o)
      )
      USING(order_id)   
      

      If applied to the sample data from the question, the result is

      Row order_id    user_id created_at  campaign    keyword  
      1   123         abc     2020-07-04  facebook    shoes    
      2   456         abc     2020-05-01  null        null     
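
The key building block in this answer is `ARRAY_AGG(... ORDER BY ... LIMIT 1)[OFFSET(0)].*`, which picks one "latest" row per group without a window function. A reduced sketch of just that step, using the question's Events sample data inlined as a CTE and grouping by user_id purely for illustration, might look like this:

```sql
-- Events sample data from the question, inlined so the query runs on its own.
WITH events AS (
  SELECT 'facebook' AS campaign, 'shoes' AS keyword, 'abc' AS user_id, DATE '2020-07-03' AS clicked_at UNION ALL
  SELECT 'google', 'hair', 'abc', DATE '2020-07-01'
)
SELECT
  user_id,
  -- Sort each group's events newest-first, keep a single struct,
  -- take it out of the one-element array, and expand it into columns.
  ARRAY_AGG(STRUCT(campaign, keyword) ORDER BY clicked_at DESC LIMIT 1)[OFFSET(0)].*
FROM events
GROUP BY user_id
```

For user abc this returns the facebook / shoes event, the one with the latest clicked_at. In the full answer the aggregation sits behind an inner JOIN, so orders with no earlier event drop out of the subquery; the outer LEFT JOIN back to the orders table restores them with null campaign and keyword.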
      

      【Comments】:
