【Question Title】: Pandas groupby over fixed time period
【Posted】: 2020-12-19 15:35:48
【Question Description】:

I have a list of customers, dates, and scores:

import pandas as pd
import datetime as dt

# Build the frame from a list of rows so pandas infers proper dtypes
# (datetime for Date, float for Score) rather than one object array.
data = pd.DataFrame(
    [
        ["A", dt.datetime(2017, 12, 10), 10.0],
        ["A", dt.datetime(2018, 1, 10), 10.0],
        ["A", dt.datetime(2018, 1, 15), 11.0],
        ["A", dt.datetime(2018, 1, 16), 12.0],
        ["A", dt.datetime(2018, 1, 16), 13.0],
        ["B", dt.datetime(2018, 1, 16), 10.0],
        ["A", dt.datetime(2018, 3, 1), 10.0],
    ],
    # Only three columns exist at this point; "Result" is what we want to compute.
    columns=["Customer", "Date", "Score"],
)

    Customer    Date    Score
0   A   2017-12-10 00:00:00 10
1   A   2018-01-10 00:00:00 10
2   A   2018-01-15 00:00:00 11
3   A   2018-01-16 00:00:00 12
4   A   2018-01-16 00:00:00 13
5   B   2018-01-16 00:00:00 10
6   A   2018-03-01 00:00:00 10

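For each customer, I want to compute the average score over the past 14 days (including the current day). The result should look like this: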

    Customer    Date    Score   Result
0   A   2017-12-10 00:00:00 10  10
1   A   2018-01-10 00:00:00 10  10
2   A   2018-01-15 00:00:00 11  10.5
3   A   2018-01-16 00:00:00 12  11.5
4   A   2018-01-16 00:00:00 13  11.5
5   B   2018-01-16 00:00:00 10  10
6   A   2018-03-01 00:00:00 10  10

Thanks!!

Tags: python pandas dataframe datetime


【Solution 1】:

Use DataFrame.groupby on Customer and compute a rolling mean of Score with a 14-day window, then use DataFrame.merge to merge this rolling average back into the data frame data:

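# Rolling 14-day mean of Score per customer; the result is indexed by (Customer, Date)
avg = data.set_index('Date').groupby('Customer').rolling('14d')['Score'].mean()
# Same-day rows share one (Customer, Date) key; keep only the last entry for each key
avg = avg[~avg.index.duplicated(keep='last')]

# Attach the averages to the original rows by matching Customer and Date
df = data.merge(avg.rename('Result'), left_on=['Customer', 'Date'], right_index=True)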

Result:

print(df)
  Customer       Date Score  Result
0        A 2017-12-10    10    10.0
1        A 2018-01-10    10    10.0
2        A 2018-01-15    11    10.5
3        A 2018-01-16    12    11.5
4        A 2018-01-16    13    11.5
5        B 2018-01-16    10    10.0
6        A 2018-03-01    10    10.0
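
For anyone wondering why the duplicated(keep='last') step is there, here is a minimal end-to-end sketch of the approach above. It rebuilds the sample frame from a plain list of rows (an assumption, so that Date and Score get datetime and float dtypes) and keeps the intermediate series under the hypothetical name rolled. As far as I understand, a time-based rolling window ends at the current row, so the first of two same-day rows does not see the second one; keeping the last entry per (Customer, Date) picks the value that covers the whole day.

```python
import datetime as dt

import pandas as pd

# Sample data from the question, built from a list of rows so that
# Date becomes a datetime column and Score a float column.
data = pd.DataFrame(
    [
        ["A", dt.datetime(2017, 12, 10), 10.0],
        ["A", dt.datetime(2018, 1, 10), 10.0],
        ["A", dt.datetime(2018, 1, 15), 11.0],
        ["A", dt.datetime(2018, 1, 16), 12.0],
        ["A", dt.datetime(2018, 1, 16), 13.0],
        ["B", dt.datetime(2018, 1, 16), 10.0],
        ["A", dt.datetime(2018, 3, 1), 10.0],
    ],
    columns=["Customer", "Date", "Score"],
)

# One rolling mean per input row, indexed by (Customer, Date).
# "rolled" is an illustrative name, not part of the original answer.
rolled = (
    data.set_index("Date")
    .groupby("Customer")
    .rolling("14d")["Score"]
    .mean()
)

# Two rows for customer A share 2018-01-16, so that key appears twice.
# Keep the last entry per key so the merge below matches exactly one value.
avg = rolled[~rolled.index.duplicated(keep="last")]

# Attach the per-(Customer, Date) average back to every original row.
result = data.merge(
    avg.rename("Result"),
    left_on=["Customer", "Date"],
    right_index=True,
)

print(rolled)
print(result)
```

Printing rolled before the de-duplication makes it easy to compare the two 2018-01-16 entries for customer A. If the rows are not already in date order, sorting by Date first is probably needed, since time-based windows expect a monotonic index.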
