【Question title】: Convert SQL LAG over PARTITION BY query to pandas
【Posted】: 2021-04-11 03:17:05
【Question description】:

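I have a table containing some customer purchase data. I want to know each customer's entry and exit times in the store, and I wrote the SQL query below for that. How can I convert it to Python pandas?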

SELECT MyTable.*,
       LAG(EventTypeID, 1, 0)
           OVER (PARTITION BY ID, Name
                 ORDER BY Time) AS LastEvent,
       LEAD(EventTypeID, 1, 0)
           OVER (PARTITION BY ID, Name
                 ORDER BY Time) AS NextEvent
FROM DL.dbo.DataTable MyTable

Input:

+-------------+--------+--------+-------+
| EventTypeID |   ID   |  Name  | Time  |
+-------------+--------+--------+-------+
|           1 | QWERTY | Joseph | 10.20 |
|           1 | QWERTY | Joseph | 10.25 |
+-------------+--------+--------+-------+

Desired result:

+-------------+--------+--------+-------+-----------+-----------+
| EventTypeID |   ID   |  Name  | Time  | LastEvent | NextEvent |
+-------------+--------+--------+-------+-----------+-----------+
|      1      | QWERTY | Joseph | 10.20 |         0 |         1 |
|      1      | QWERTY | Joseph | 10.25 |         1 |         0 |
+-------------+--------+--------+-------+-----------+-----------+

【Comments on the question】:

Tags: python sql pandas lag partition-by


【Solution 1】:
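# Sort once by Time; the shifted results align back to df by index (index must be unique).
events = df.sort_values(by='Time').groupby(['ID', 'Name'])['EventTypeID']

# LAG(EventTypeID, 1, 0): previous event in the partition, 0 when there is none.
df['LastEvent'] = events.shift(1, fill_value=0)

# LEAD(EventTypeID, 1, 0): next event in the partition, 0 when there is none.
df['NextEvent'] = events.shift(-1, fill_value=0)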

Thanks to Lev Gelman for the guidance. The code above solves the problem!
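
For anyone who wants to check this against the sample rows from the question, here is a minimal self-contained sketch. The helper name `add_lag_lead` and the hand-built DataFrame are my own, for illustration only. It assumes the DataFrame has a unique index (the shifted values are assigned back by index) and a pandas version whose groupby `shift` accepts `fill_value`, which stands in for the third argument of the SQL `LAG`/`LEAD`.

```python
import pandas as pd


def add_lag_lead(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with LastEvent / NextEvent columns (hypothetical helper)."""
    out = df.copy()
    # Sort by Time so that shift() follows ORDER BY Time inside each partition.
    grouped = out.sort_values("Time").groupby(["ID", "Name"])["EventTypeID"]
    # shift(1) ~ LAG, shift(-1) ~ LEAD; fill_value=0 plays the role of the SQL default.
    out["LastEvent"] = grouped.shift(1, fill_value=0)
    out["NextEvent"] = grouped.shift(-1, fill_value=0)
    return out


# Sample rows from the question.
df = pd.DataFrame(
    {
        "EventTypeID": [1, 1],
        "ID": ["QWERTY", "QWERTY"],
        "Name": ["Joseph", "Joseph"],
        "Time": [10.20, 10.25],
    }
)

result = add_lag_lead(df)
print(result)
```

Without `fill_value`, the first and last row of each group would come back as NaN instead of 0, and the column would be upcast to float, so that argument matters if the output has to match the SQL result exactly.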

【Comments】:
