【Question title】: Pandas calculating a new column under condition
【Posted】: 2020-11-21 11:31:18
【Question】:

This is my df:

df = pd.DataFrame({'date':['2020-01-01 12:00:00','2020-01-01 15:00:00','2020-01-06 07:00:00','2020-01-15 13:00:00','2020-01-22 12:00:00'],
                    'numb_total':[8,25,11,14,8]})
df['date'] = pd.to_datetime(df['date'])

which gives me:

                   date numb_total
0   2020-01-01 12:00:00          8
1   2020-01-01 15:00:00         25
2   2020-01-06 07:00:00         11
3   2020-01-15 13:00:00         14
4   2020-01-22 12:00:00          8

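Now I want to add a new column that gives me numb_total * x (x=5) under a special condition, and numb_total * y (y=10) otherwise.

Condition: if the date falls on a Monday or Tuesday and its time is between 08:00 and 14:00: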

df['numb_new'] = df['numb_total']*x

Otherwise:

df['numb_new'] = df['numb_total']*y

To get the day name and the time, I did this:

df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time

How can I create this new column df['numb_new'] efficiently?

【Question discussion】:

    Tags: python python-3.x pandas dataframe apply


    【Solution 1】:

    IIUC, you can do this:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
                                '2020-01-15 13:00:00', '2020-01-22 12:00:00'],
                       'numb_total': [8, 25, 11, 14, 8]})
    df['date'] = pd.to_datetime(df['date'])
    
    # True for every timestamp whose hour is 8..14, i.e. 08:00:00 through 14:59:59
    hour_mask = (8 <= df['date'].dt.hour) & (df['date'].dt.hour <= 14)
    
    # dt.weekday numbers Monday as 0 and Tuesday as 1
    day_mask = np.isin(df['date'].dt.weekday, [0, 1])
    
    df['numb_new'] = df['numb_total'] * np.where(hour_mask & day_mask,  5, 10)
    
    print(df)
    

    Output

                     date  numb_total  numb_new
    0 2020-01-01 12:00:00           8        80
    1 2020-01-01 15:00:00          25       250
    2 2020-01-06 07:00:00          11       110
    3 2020-01-15 13:00:00          14       140
    4 2020-01-22 12:00:00           8        80
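
A note on the boundary: comparing `dt.hour` against 8 and 14 also accepts times such as 14:30, because only the hour is checked. If "between 08:00 and 14:00" is meant strictly, one possible variant compares the time of day itself. The sketch below assumes both ends are inclusive (the question does not say) and uses made-up rows that include a Monday and a Tuesday, since every row in the sample data fails the condition.

```python
import numpy as np
import pandas as pd

# Illustrative rows, not the question's data: Mon 2020-01-06, Tue 2020-01-07, Wed 2020-01-08.
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-06 07:00:00', '2020-01-06 08:00:00',
                            '2020-01-07 14:00:00', '2020-01-07 14:30:00',
                            '2020-01-08 12:00:00']),
    'numb_total': [11, 8, 25, 14, 8],
})

# Day condition by name, as in the question's own df['day'] column.
day_mask = df['date'].dt.day_name().isin(['Monday', 'Tuesday'])

# Time of day as a timedelta since midnight, so minutes and seconds count too.
time_of_day = df['date'] - df['date'].dt.normalize()
# Assumption: 08:00:00 and 14:00:00 are both inside the window.
time_mask = time_of_day.between(pd.Timedelta(hours=8), pd.Timedelta(hours=14))

# Factor 5 where both conditions hold, factor 10 everywhere else.
df['numb_new'] = df['numb_total'] * np.where(day_mask & time_mask, 5, 10)
print(df)
```

With the hour-based mask the 14:30 row would be multiplied by 5; here it is multiplied by 10.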
    

    【Discussion】:

    • That's it! Thanks :)
    【Solution 2】:
    import numpy as np
    import pandas as pd
    import pandas.api.types as ptypes

    def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
        # make sure the date column really has a datetime dtype
        assert ptypes.is_datetime64_any_dtype(date_col)
        # hour condition: hour of day between min_hour and max_hour, inclusive
        hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
        # day condition: weekday number (Monday is 0) is one of `days`
        day_mask = np.isin(date_col.dt.weekday, days)
        # multiply by x where both conditions hold, by y everywhere else
        return value_col * np.where(hour_cond & day_mask, x, y)

    df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00',
                                '2020-01-06 07:00:00', '2020-01-15 13:00:00',
                                '2020-01-22 12:00:00'],
                       'numb_total': [8, 25, 11, 14, 8]})
    df['date'] = pd.to_datetime(df['date'])
    df["numb_new"] = create_col(df['date'], df['numb_total'], 5, 10, 8, 14, [0, 1])
    print(df)
    

    This is a more general answer: you can change the output by changing the arguments you pass.
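
To make the effect of the parameters concrete, here is a small sketch that calls the same function with two different sets of arguments. The function is repeated so the snippet runs on its own. The `demo` frame, the column names `weekday_rule` and `weekend_rule`, and the second rule (weekend evenings) are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd
import pandas.api.types as ptypes


def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
    # Same logic as the answer's function, repeated so this snippet is self-contained.
    assert ptypes.is_datetime64_any_dtype(date_col)
    hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
    day_mask = np.isin(date_col.dt.weekday, days)
    return value_col * np.where(hour_cond & day_mask, x, y)


# Illustrative data: Monday 09:00, Wednesday 09:00, Saturday 20:00.
demo = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-06 09:00:00', '2020-01-08 09:00:00',
                            '2020-01-11 20:00:00']),
    'numb_total': [1, 2, 3],
})

# The question's rule: Monday/Tuesday (0, 1), hours 8..14, factor 5, otherwise 10.
demo['weekday_rule'] = create_col(demo['date'], demo['numb_total'], 5, 10, 8, 14, [0, 1])

# A hypothetical second rule: Saturday/Sunday (5, 6), hours 18..23, factor 2, otherwise 1.
demo['weekend_rule'] = create_col(demo['date'], demo['numb_total'], 2, 1, 18, 23, [5, 6])

print(demo)
```

Only the Monday row meets the first rule and only the Saturday row meets the second, so each call multiplies exactly one row by its `x` factor.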
