Pandas: calculating a new column under a condition (answers)

【Question title】: Pandas calculating a new column under condition
【Posted】: 2020-11-21 11:31:18
【Question description】:

This is my df:

df = pd.DataFrame({'date':['2020-01-01 12:00:00','2020-01-01 15:00:00','2020-01-06 07:00:00','2020-01-15 13:00:00','2020-01-22 12:00:00'],
                    'numb_total':[8,25,11,14,8]})
df['date'] = pd.to_datetime(df['date'])

which gives me:

                   date numb_total
0   2020-01-01 12:00:00          8
1   2020-01-01 15:00:00         25
2   2020-01-06 07:00:00         11
3   2020-01-15 13:00:00         14
4   2020-01-22 12:00:00          8

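Now I want to add a new column that gives me numb_total * x (x=5) under a special condition, and numb_total * y (y=10) otherwise.

Condition: if the date falls on a Monday or Tuesday and the time of day is between 08:00 and 14:00: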

df['numb_new'] = df['numb_total']*x

Otherwise:

df['numb_new'] = df['numb_total']*y

To get the day name and the time, I did this:

df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time

How can I create this new column df['numb_new'] efficiently?

【Question comments】:

Tags: python python-3.x pandas dataframe apply

【Solution 1】:

IIUC, you could do:

import pandas as pd
import numpy as np

df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
                            '2020-01-15 13:00:00', '2020-01-22 12:00:00'],
                   'numb_total': [8, 25, 11, 14, 8]})
df['date'] = pd.to_datetime(df['date'])

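# True where the hour is 8 through 14; note this also admits 14:01-14:59
hour_mask = (8 <= df['date'].dt.hour) & (df['date'].dt.hour <= 14)

# dt.weekday numbers Monday as 0 and Tuesday as 1
day_mask = np.isin(df['date'].dt.weekday, [0, 1])

# multiply by 5 where both conditions hold, otherwise by 10
df['numb_new'] = df['numb_total'] * np.where(hour_mask & day_mask, 5, 10)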

print(df)

Output

                 date  numb_total  numb_new
0 2020-01-01 12:00:00           8        80
1 2020-01-01 15:00:00          25       250
2 2020-01-06 07:00:00          11       110
3 2020-01-15 13:00:00          14       140
4 2020-01-22 12:00:00           8        80
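
Every row in this output is multiplied by 10 because no row in the sample satisfies both conditions at once: four of the dates are Wednesdays, and the only Monday row falls at 07:00. A quick check, as a sketch: the last row (`2020-01-07 09:30:00`, value 3) is a hypothetical addition that is not in the question's data, included only so that one row takes the ×5 branch. The names `check` and `matches` are also made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime([
    '2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
    '2020-01-15 13:00:00', '2020-01-22 12:00:00',
    '2020-01-07 09:30:00',  # hypothetical extra row (a Tuesday morning)
])
check = pd.DataFrame({'date': dates, 'numb_total': [8, 25, 11, 14, 8, 3]})

# Same masks as in the answer above
hour_mask = (8 <= check['date'].dt.hour) & (check['date'].dt.hour <= 14)
day_mask = np.isin(check['date'].dt.weekday, [0, 1])

# Show the weekday and the combined condition next to the result
check['day'] = check['date'].dt.day_name()
check['matches'] = hour_mask & day_mask
check['numb_new'] = check['numb_total'] * np.where(check['matches'], 5, 10)
print(check)
```

Only the hypothetical Tuesday row has `matches` set to True, so it is the only one multiplied by 5.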

【Comments】:

That's it! Thanks :)
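
One caveat on the hour-based mask: `dt.hour <= 14` also accepts times such as 14:30, which lie outside 08:00 to 14:00. If the window should end at exactly 14:00, one possible sketch is to compare the time of day as a Timedelta instead. The variable `time_of_day` and the two Monday sample rows below are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the same Monday at 14:00 and at 14:30
df = pd.DataFrame({'date': pd.to_datetime(['2020-01-06 14:00:00',
                                           '2020-01-06 14:30:00']),
                   'numb_total': [8, 25]})

# Time elapsed since midnight for each timestamp
time_of_day = df['date'] - df['date'].dt.normalize()

# between() includes both endpoints by default
time_mask = time_of_day.between(pd.Timedelta(hours=8), pd.Timedelta(hours=14))
day_mask = df['date'].dt.weekday.isin([0, 1])

df['numb_new'] = df['numb_total'] * np.where(time_mask & day_mask, 5, 10)
print(df)
```

With this mask the 14:00 row is multiplied by 5 and the 14:30 row by 10, whereas the hour-based mask would multiply both by 5.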

【Solution 2】:

import numpy as np
import pandas as pd
import pandas.api.types as ptypes

def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
    # make sure the date column really has a datetime dtype
    assert ptypes.is_datetime64_any_dtype(date_col)
    # True where the hour lies in [min_hour, max_hour]
    hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
    # True where the weekday (Monday=0 ... Sunday=6) is one of `days`
    day_mask = np.isin(date_col.dt.weekday, days)
    # multiply by x where both conditions hold, otherwise by y
    return value_col * np.where(hour_cond & day_mask, x, y)

df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
                            '2020-01-15 13:00:00', '2020-01-22 12:00:00'],
                   'numb_total': [8, 25, 11, 14, 8]})
df['date'] = pd.to_datetime(df['date'])
df['numb_new'] = create_col(df['date'], df['numb_total'], 5, 10, 8, 14, [0, 1])
print(df)

This is a more general answer: you can change the output through the arguments you pass.
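
For instance, the same function could be reused with other multipliers, hours, or weekdays. The sketch below repeats the function so that it runs on its own; the `demo` data, the weekend days (5 and 6), the multipliers 2 and 3, and the 9 to 12 hour range are all made-up values for illustration.

```python
import numpy as np
import pandas as pd
import pandas.api.types as ptypes

def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
    # Same logic as the function above, repeated so this snippet is self-contained
    assert ptypes.is_datetime64_any_dtype(date_col)
    hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
    day_mask = np.isin(date_col.dt.weekday, days)
    return value_col * np.where(hour_cond & day_mask, x, y)

# Hypothetical data: a Saturday morning, a Sunday evening, and a Monday morning
demo = pd.DataFrame({'date': pd.to_datetime(['2020-01-04 10:00:00',
                                             '2020-01-05 20:00:00',
                                             '2020-01-06 10:00:00']),
                     'numb_total': [1, 2, 3]})

# Multiply by 2 on Saturday/Sunday when the hour is 9 through 12, otherwise by 3
demo['numb_new'] = create_col(demo['date'], demo['numb_total'], 2, 3, 9, 12, [5, 6])
print(demo)
```

Only the Saturday morning row meets both conditions here, so it is the only one multiplied by 2.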

【Comments】: