[Posted]: 2021-10-05 15:24:09
[Question]:
I have two dataframes that I need to merge.
My input:
DataFrame 1:
Date |Employee Name |Non-Billable |Billable |Utilization
0 30-04-2021 |John |92.82 |NaN |0.9282
1 30-04-2021 |Michael |66.66 |26.20 |0.9286
2 31-05-2021 |Peter |98.20 |NaN |0.9820
3 30-06-2021 |James |15.93 |88.72 |1.0465
4 30-04-2021 |Stephen |116.09 |NaN |1.1609
DataFrame 2:
Employee Name |Date |Amount
James |2021-04-30 |120000.000000
John |2021-04-30 |32967.032967
|2021-05-31 |34065.934066
|2021-06-30 |32967.032967
Peter |2021-04-30 |266626.080000
Code:
df1 = df1.set_index(['Date','Employee Name']).unstack('Employee Name').resample('M').sum(min_count=1).stack('Employee Name',dropna=False).reset_index()
df1['Date'] = pd.to_datetime(df1['Date'], format='%y/%m/%d %H:%M:%S').dt.strftime('%d-%m-%Y')
print("DF1 : ", df1.head())
df2.rename(columns={'Start Date':'Date'},inplace=True)
df2['Date'] = pd.to_datetime(df2['Date'].dt.strftime("%m-%d-%Y"))
df2 = df2.set_index('Date').groupby('Employee Name')["Amount"].resample('M').sum(min_count=1)
print("DF2 : ", df2.head())
# Merge the dataframes
#df3 = pd.merge(df1, df2[['Employee Name', 'Amount']], on ='Employee Name', how='left').groupby(["Date", "Employee Name"], as_index=False).max()
df3 = pd.merge(df1, df2, on='Employee Name', how='left')
df3 = df3.set_index(['Date', 'Employee Name', 'Utilization'])
df3['Billable_hr'] = df3['Amount'].div(df3['Billable']).round(2)
sum1 = df3[["Non-Billable", "Billable"]].sum(axis=1, min_count=1)
df3['Employee_hr'] = df3['Amount'].div(sum1).round(2)
My output:
My expected output:
How can I solve this?
[Comments]:
-
You should merge df1 and df2 on ['Date', 'Employee Name']; at the moment you are only merging on the name.
-
@PermanentPon If I include the Date column, I get an error.
-
What is the error? Try resetting the index on df2 before merging on the date and the name.
-
Thank you very much, PermanentPon, it's working now.
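
The fix suggested in the comments can be sketched roughly as follows. This is a minimal illustration, not the asker's exact code: the sample values are a cut-down, hypothetical copy of the tables in the question, and `df2_flat` is a name introduced only for this example. It assumes df2 is the Series produced by groupby(...).resample(...).sum(), so 'Employee Name' and 'Date' live in its index and have to be turned back into columns with reset_index(). It also assumes the error may have come from the 'Date' keys having different dtypes (strings in df1 after strftime, datetimes in df2), which pandas refuses to merge, so both sides are converted to datetime first.

```python
import pandas as pd

# Hypothetical sample modelled on DataFrame 1; Date is a dd-mm-YYYY string,
# as it would be after the strftime call in the question.
df1 = pd.DataFrame({
    "Date": ["30-04-2021", "30-04-2021", "31-05-2021", "30-06-2021"],
    "Employee Name": ["John", "Michael", "Peter", "James"],
    "Non-Billable": [92.82, 66.66, 98.20, 15.93],
    "Billable": [float("nan"), 26.20, float("nan"), 88.72],
})

# Hypothetical sample modelled on DataFrame 2: a Series whose index holds
# (Employee Name, Date), which is what groupby + resample returns.
idx = pd.MultiIndex.from_tuples(
    [
        ("James", pd.Timestamp("2021-04-30")),
        ("John", pd.Timestamp("2021-04-30")),
        ("John", pd.Timestamp("2021-05-31")),
        ("Peter", pd.Timestamp("2021-04-30")),
    ],
    names=["Employee Name", "Date"],
)
df2 = pd.Series([120000.0, 32967.03, 34065.93, 266626.08], index=idx, name="Amount")

# Step 1: move the index levels back into ordinary columns.
df2_flat = df2.reset_index()

# Step 2: give the Date key the same dtype on both sides.
df1["Date"] = pd.to_datetime(df1["Date"], format="%d-%m-%Y")
df2_flat["Date"] = pd.to_datetime(df2_flat["Date"])

# Step 3: merge on both keys so each row only picks up the Amount
# for its own employee and its own month.
df3 = pd.merge(df1, df2_flat, on=["Date", "Employee Name"], how="left")
print(df3)
```

With a left merge, rows of df1 that have no matching (Date, Employee Name) pair in df2 keep a NaN in 'Amount' instead of being duplicated once per month, which is what happens when merging on the name alone.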
Tags: python pandas dataframe numpy merge