Question title: Python, Pandas DataFrame: column that holds the running sum of the rows before it and starts over for new projects in df
Posted: 2026-02-03 14:50:01
Question:

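I have this DF, and I'm trying to get, for each row, the running total of `Books#` up to and including that row, counting only rows with the same `workDate` and `ID`. If `workDate` or `ID` changes, the total should start over, as in my example.

I sorted my DF by `ID` and then by `workDate`.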

|   | workDate   | ID    | Books# | Seller |
|---|------------|-------|--------|--------|
| 0 | 2020-01-09 | 13702 | 10.0   | Ted    |
| 1 | 2020-01-09 | 13702 | 20.5   | Sam    |
| 2 | 2020-01-10 | 13702 | 22.0   | Lili   |
| 3 | 2020-01-10 | 13702 | 10.0   | Ted    |
| 4 | 2020-01-10 | 13702 | 30.0   | John   |
| 5 | 2020-01-10 | 23703 | 20.0   | Fadi   |
| 6 | 2020-01-10 | 23703 | 15.0   | Mo     |
| 7 | 2020-01-10 | 23703 | 8.0    | Samer  |

Desired output:

|   | workDate   | ID    | Books# | totalBooks | Seller |
|---|------------|-------|--------|------------|--------|
| 0 | 2020-01-09 | 13702 | 10.0   | 10.0       | Ted    |
| 1 | 2020-01-09 | 13702 | 20.5   | 30.5       | Sam    |
| 2 | 2020-01-10 | 13702 | 22.0   | 22.0       | Lili   |
| 3 | 2020-01-10 | 13702 | 10.0   | 32.0       | Ted    |
| 4 | 2020-01-10 | 13702 | 30.0   | 62.0       | John   |
| 5 | 2020-01-10 | 23703 | 20.0   | 20.0       | Fadi   |
| 6 | 2020-01-10 | 23703 | 15.0   | 35.0       | Mo     |
| 7 | 2020-01-10 | 23703 | 8.0    | 43.0       | Samer  |

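I have tried several `groupby` approaches but can't get the desired output. I can get a column with the total over all values, but that isn't what I want.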

Question comments:

Tags: python pandas dataframe pandas-groupby


Answer 1:

We can use `groupby` with `cumsum` to get the running (cumulative) sum of `Books#` for each (`workDate`, `ID`) pair:

    df['totalBooks'] = df.groupby(['workDate', 'ID'])['Books#'].cumsum()
    

df:

         workDate     ID  Books# Seller  totalBooks
    0  2020-01-09  13702    10.0    Ted        10.0
    1  2020-01-09  13702    20.5    Sam        30.5
    2  2020-01-10  13702    22.0   Lili        22.0
    3  2020-01-10  13702    10.0    Ted        32.0
    4  2020-01-10  13702    30.0   John        62.0
    5  2020-01-10  23703    20.0   Fadi        20.0
    6  2020-01-10  23703    15.0     Mo        35.0
    7  2020-01-10  23703     8.0  Samer        43.0
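
To make the semantics of `groupby(...).cumsum()` concrete, here is a plain-Python illustration (not how pandas implements it internally). It keeps one running total per (`workDate`, `ID`) key and records the updated total at each row; `grouped_running_sum` is a hypothetical helper name, and dates are plain strings here for simplicity.

```python
# Hypothetical helper: mimics what df.groupby(['workDate', 'ID'])['Books#'].cumsum() computes
def grouped_running_sum(keys, values):
    totals = {}  # running total for each (workDate, ID) key seen so far
    result = []
    for key, value in zip(keys, values):
        # add this row's value to its own key's total, then record that total
        totals[key] = totals.get(key, 0.0) + value
        result.append(totals[key])
    return result


keys = ([('2020-01-09', 13702)] * 2
        + [('2020-01-10', 13702)] * 3
        + [('2020-01-10', 23703)] * 3)
values = [10.0, 20.5, 22.0, 10.0, 30.0, 20.0, 15.0, 8.0]
print(grouped_running_sum(keys, values))
# [10.0, 30.5, 22.0, 32.0, 62.0, 20.0, 35.0, 43.0]
```

Because each key has its own total, the sum "starts over" whenever a new (`workDate`, `ID`) pair begins, which reproduces the `totalBooks` column above.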
    

Or, to put the new column in a specific position, use `insert` instead:

    df.insert(3, 'totalBooks', df.groupby(['workDate', 'ID'])['Books#'].cumsum())
    

df:

        workDate     ID  Books#  totalBooks Seller
    0 2020-01-09  13702    10.0        10.0    Ted
    1 2020-01-09  13702    20.5        30.5    Sam
    2 2020-01-10  13702    22.0        22.0   Lili
    3 2020-01-10  13702    10.0        32.0    Ted
    4 2020-01-10  13702    30.0        62.0   John
    5 2020-01-10  23703    20.0        20.0   Fadi
    6 2020-01-10  23703    15.0        35.0     Mo
    7 2020-01-10  23703     8.0        43.0  Samer
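
If the column order might change, the position passed to `insert` can be looked up by name instead of hard-coding `3`. A minimal sketch, assuming the goal is "just before `Seller`" and using only the first three rows of the sample data for brevity:

```python
import pandas as pd

# Small subset of the question's data (first three rows), for illustration only
df = pd.DataFrame({
    'workDate': pd.to_datetime(['2020-01-09', '2020-01-09', '2020-01-10']),
    'ID': [13702, 13702, 13702],
    'Books#': [10.0, 20.5, 22.0],
    'Seller': ['Ted', 'Sam', 'Lili'],
})

totals = df.groupby(['workDate', 'ID'])['Books#'].cumsum()
# Look up the integer position of 'Seller' so the new column lands just before it
df.insert(df.columns.get_loc('Seller'), 'totalBooks', totals)

print(df.columns.tolist())
# ['workDate', 'ID', 'Books#', 'totalBooks', 'Seller']
print(df['totalBooks'].tolist())
# [10.0, 30.5, 22.0]
```

Note that `insert` raises a `ValueError` if `totalBooks` already exists (unless `allow_duplicates=True`), so use it instead of, not after, the plain assignment.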
    

Sample DataFrame:

    import pandas as pd
    
    df = pd.DataFrame({
        'workDate': pd.to_datetime(['2020-01-09', '2020-01-09', '2020-01-10',
                                    '2020-01-10', '2020-01-10', '2020-01-10',
                                    '2020-01-10', '2020-01-10']),
        'ID': [13702, 13702, 13702, 13702, 13702, 23703, 23703, 23703],
        'Books#': [10.0, 20.5, 22.0, 10.0, 30.0, 20.0, 15.0, 8.0],
        'Seller': ['Ted', 'Sam', 'Lili', 'Ted', 'John', 'Fadi', 'Mo', 'Samer']
    })
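
One caveat: `groupby(['workDate', 'ID']).cumsum()` keeps a separate total per pair regardless of row order, so it matches the desired output here because the frame is sorted. If the same pair could reappear later, after other rows, its earlier total would continue rather than restart. A sketch of a stricter "restart whenever `workDate` or `ID` differs from the previous row" variant, which labels consecutive runs with `shift`/`ne`/`cumsum` and groups by that label; `run_total` is a hypothetical helper, not a pandas API:

```python
import pandas as pd


def run_total(df, key_cols, value_col):
    """Running sum of value_col that restarts whenever any key column
    differs from the previous row (hypothetical helper)."""
    keys = df[key_cols]
    # True on the first row (compared with NaN from shift) and wherever a key changes
    changed = keys.ne(keys.shift()).any(axis=1)
    # Cumulative count of changes gives each consecutive run of equal keys its own label
    run_id = changed.cumsum()
    return df.groupby(run_id)[value_col].cumsum()


# Toy data where ID 1 appears again after a different ID
demo = pd.DataFrame({'ID': [1, 2, 1], 'Books#': [1.0, 2.0, 3.0]})

# Grouping by the key continues ID 1's earlier total: [1.0, 2.0, 4.0]
print(demo.groupby('ID')['Books#'].cumsum().tolist())
# Grouping by consecutive runs restarts it: [1.0, 2.0, 3.0]
print(run_total(demo, ['ID'], 'Books#').tolist())
```

On the question's sorted data, `run_total(df, ['workDate', 'ID'], 'Books#')` gives the same totals as the answer's one-liner; the two only differ when a key pair is split across non-adjacent rows.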
    

