【Question title】: Create a column based on a computation of another column
【Posted】: 2020-04-27 18:35:07
【Question】:

I want to create another column based on the previous week's sales. Here is the sample input:

df = pd.DataFrame({'Week':[1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5],
                   'Category':['Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White'],
                   'id':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
                   'Sales':[100,200,300,400,100,200,300,400,100,200,100,200,300,400,100,200,300,400,100,200],
                  'Sales_others':[10,20,30,40,10,20,30,40,10,20,10,20,30,40,10,20,30,40,10,20]})
print(df)

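Based on this, I want to create another column that simply holds the sales of the previous week. Here is a sample of the desired output: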

df_output = pd.DataFrame({'Week':[1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5],
                          'Category':['Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White'],
                          'id':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
                          'Sales':[100,200,300,400,100,200,300,400,100,200,100,200,300,400,100,200,300,400,100,200],
                          'Sales_others':[10,20,30,40,10,20,30,40,10,20,10,20,30,40,10,20,30,40,10,20],
                          'Sales_previous_week':[0,0,100,200,300,400,100,200,300,400,0,0,100,200,300,400,100,200,300,400]})
print(df_output)

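I am finding it difficult to do this with a self-join. The previous-week column should be derived only from the Sales field, and I should be able to retain the "Sales_others" column.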

--Edit: adding the original code

CR_UK_NL_Weeklevel['PREVIOUS_WEEK'] = CR_UK_NL_Weeklevel.groupby(['RETAIL_SITE_ID','CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0)
print(CR_UK_NL_Weeklevel)

Renaming the columns:

CR_UK_NL_Weeklevel.columns.values[4] = 'CURRENT_WEEK'
CR_UK_NL_Weeklevel.columns.values[3] = 'LAST_YEAR_WEEK'
CR_UK_NL_Weeklevel.columns.values

Trying to implement the solution:

CR_UK_NL_Weeklevel['PREVIOUS_WEEK'] = CR_UK_NL_Weeklevel.groupby(['RETAIL_SITE_ID','CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0)
print(CR_UK_NL_Weeklevel)

--Error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
----> 1 CR_UK_NL_Weeklevel['PREVIOUS_WEEK'] = CR_UK_NL_Weeklevel.groupby(['RETAIL_SITE_ID','CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0)
      2 print(CR_UK_NL_Weeklevel)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\base.py in __getitem__(self, key)
    273         else:
    274             if key not in self.obj:
--> 275                 raise KeyError("Column not found: {key}".format(key=key))
    276
    277

KeyError: 'Column not found: CURRENT_WEEK'

【Comments】:

    Tags: python pandas self-join


    【Solution 1】:

    If every week always contains the same categories and the weeks are consecutive, use DataFrameGroupBy.shift, grouping by the Category column:

    df['Sales_PREVIOUS'] = df.groupby('Category')['Sales'].shift(fill_value=0)
    print (df)
       Week Category  Sales  Sales_PREVIOUS
    0     1      Red    100               0
    1     1    White    200               0
    2     2      Red    300             100
    3     2    White    400             200
    4     3      Red    100             300
    5     3    White    200             400
    6     4      Red    300             100
    7     4    White    400             200
    8     5      Red    100             300
    9     5    White    200             400
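
Note that shift works on row order within each group, not on the Week values themselves, so the result above relies on the rows already being sorted by week. A minimal sketch of the same idea for unsorted input follows; the `df_demo` frame and its shuffled row order are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical sample: a subset of the question's data with rows deliberately out of order.
df_demo = pd.DataFrame({
    'Week': [2, 1, 1, 2, 3, 3],
    'Category': ['Red', 'Red', 'White', 'White', 'White', 'Red'],
    'Sales': [300, 100, 200, 400, 200, 100],
})

# Sort by Week inside each Category so "previous row in the group" means "previous week".
df_demo = df_demo.sort_values(['Category', 'Week']).reset_index(drop=True)

# Shift Sales down by one row per Category; the first week of each group gets 0.
df_demo['Sales_PREVIOUS'] = df_demo.groupby('Category')['Sales'].shift(fill_value=0)
print(df_demo)
```

This still assumes no week is missing inside a group; with gaps, the shifted value would come from the last available week rather than the calendar-previous one.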
    

    Another idea is to pivot with DataFrame.pivot, then use DataFrame.shift and DataFrame.stack to get a Series, and finally add the new column with DataFrame.join:

    s = df.pivot(index='Week', columns='Category', values='Sales').shift(fill_value=0).stack()
    df = df.join(s.rename('Sales_PREVIOUS WEEK'), on=['Week','Category'])
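
To make the pivot approach easier to follow, here is a step-by-step sketch that names each intermediate result. It assumes one row per (Week, Category) pair, which DataFrame.pivot requires; `df_demo`, `wide`, `prev` and the column name `Sales_PREVIOUS_WEEK` are illustrative names, not from the original answer.

```python
import pandas as pd

# Hypothetical sample: the first three weeks of the question's data, without the id column.
df_demo = pd.DataFrame({
    'Week': [1, 1, 2, 2, 3, 3],
    'Category': ['Red', 'White', 'Red', 'White', 'Red', 'White'],
    'Sales': [100, 200, 300, 400, 100, 200],
})

# Wide table: one row per Week, one column per Category.
wide = df_demo.pivot(index='Week', columns='Category', values='Sales')

# Move every row down by one week; the first week has no predecessor and gets 0.
prev = wide.shift(fill_value=0)

# Back to long form: a Series indexed by (Week, Category).
s = prev.stack().rename('Sales_PREVIOUS_WEEK')

# Look up each original row's (Week, Category) pair in the Series.
result = df_demo.join(s, on=['Week', 'Category'])
print(result)
```

Because the join is keyed on Week and Category rather than on row position, this variant does not depend on the row order of the original frame.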
    

    Edit:

    With the new data, add the id column to the grouping:

    df['Sales_PREVIOUS'] = df.groupby(['id','Category'])['Sales'].shift(fill_value=0)
    

    Second solution:

    s = df.set_index(['Week','id','Category'])['Sales'].unstack([1,2]).shift(fill_value=0).unstack()
    df = df.join(s.rename('Sales_PREVIOUS WEEK'), on=['id','Category','Week'])
    print (df)
        Week Category  id  Sales  Sales_others  Sales_PREVIOUS WEEK
    0      1      Red   1    100            10                    0
    1      1    White   1    200            20                    0
    2      2      Red   1    300            30                  100
    3      2    White   1    400            40                  200
    4      3      Red   1    100            10                  300
    5      3    White   1    200            20                  400
    6      4      Red   1    300            30                  100
    7      4    White   1    400            40                  200
    8      5      Red   1    100            10                  300
    9      5    White   1    200            20                  400
    10     1      Red   2    100            10                    0
    11     1    White   2    200            20                    0
    12     2      Red   2    300            30                  100
    13     2    White   2    400            40                  200
    14     3      Red   2    100            10                  300
    15     3    White   2    200            20                  400
    16     4      Red   2    300            30                  100
    17     4    White   2    400            40                  200
    18     5      Red   2    100            10                  300
    19     5    White   2    200            20                  400
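
Since the question asks about a self-join, the same column can also be built with DataFrame.merge. This is an alternative sketch, not the method from the answer above: it assumes that Week is an integer and that "previous week" means exactly Week - 1. The sample frame `df_demo` (with week 3 missing on purpose) and the names `prev` and `out` are hypothetical.

```python
import pandas as pd

# Hypothetical sample: week 3 is missing on purpose to show how gaps are handled.
df_demo = pd.DataFrame({
    'Week': [1, 1, 2, 2, 4, 4],
    'Category': ['Red', 'White'] * 3,
    'id': [1] * 6,
    'Sales': [100, 200, 300, 400, 300, 400],
    'Sales_others': [10, 20, 30, 40, 30, 40],
})

# Right-hand side of the self-join: each row's Sales, relabelled as belonging to the following week.
prev = df_demo[['id', 'Category', 'Week', 'Sales']].copy()
prev['Week'] = prev['Week'] + 1
prev = prev.rename(columns={'Sales': 'Sales_previous_week'})

# The left join keeps every original row and column, including Sales_others.
out = df_demo.merge(prev, on=['id', 'Category', 'Week'], how='left')

# Rows without a previous week (the first week, or the week after a gap) get 0.
out['Sales_previous_week'] = out['Sales_previous_week'].fillna(0).astype(int)
print(out)
```

Unlike the shift-based versions, week 4 here gets 0 rather than the week 2 value, because no row for week 3 exists.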
    

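    Edit:

    The problem is the column names: assigning into columns.values does not reliably rename the columns, so the lookup of 'CURRENT_WEEK' fails. Assign a new list of names instead: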

    cols = CR_UK_NL_Weeklevel.columns.tolist()
    cols[4] = 'CURRENT_WEEK'
    cols[3] = 'LAST_YEAR_WEEK'
    CR_UK_NL_Weeklevel.columns = cols
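
DataFrame.rename with a columns mapping is another way to rename by name rather than by position. The sketch below uses a made-up frame; the placeholder names `col3` and `col4` are assumptions, since the original column names are not shown in the question.

```python
import pandas as pd

# Hypothetical frame; 'col3' and 'col4' stand in for the unknown original column names.
df_demo = pd.DataFrame(
    [[1, 'A', 1, 5, 10], [1, 'A', 2, 6, 20], [1, 'A', 3, 7, 30]],
    columns=['RETAIL_SITE_ID', 'CATEGORY_NAME', 'WEEK', 'col3', 'col4'],
)

# rename returns a new frame with the mapped columns renamed.
df_demo = df_demo.rename(columns={'col3': 'LAST_YEAR_WEEK', 'col4': 'CURRENT_WEEK'})

# The grouped shift can now find the CURRENT_WEEK column.
df_demo['PREVIOUS_WEEK'] = (
    df_demo.groupby(['RETAIL_SITE_ID', 'CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0)
)
print(df_demo)
```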
    

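    【Discussion】:

    • This works great, thanks again Jezrael. But can I also retain a column from the dataset? Say I have a column called ID in the dataset that has nothing to do with the current exercise; it would be great if I could keep it.
    • The order doesn't matter.
    • When I add one more column, the script doesn't work. I have edited the question.
    • @vevekseetharaman - I think the problem is this; I have edited the answer with a solution, see the last paragraph.
    • This worked! You are awesome! I didn't know I couldn't rename columns like that.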