【Posted】:2020-06-19 02:29:35
【Question】:
I have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'Firm': ['Firm1','Firm1','Firm1','Firm1','Firm1','Firm1','Firm2','Firm2','Firm2','Firm2','Firm2','Firm2'],'Location' : ['Country1', 'Country1', 'Country1', 'Country2', 'Country2', 'Country2','Country1', 'Country1', 'Country1', 'Country2', 'Country2', 'Country2'], 'Currency' : ['Curr1', 'Curr2', 'Curr3', 'Curr1', 'Curr2', 'Curr3','Curr1', 'Curr2', 'Curr3', 'Curr1', 'Curr2', 'Curr3'], 'Value' : [100, 105, 110, 100, 95, 120, 95, 110, 115, 105, 120, 90] })
It looks like this:
df:
Firm Location Currency Value
0 Firm1 Country1 Curr1 100
1 Firm1 Country1 Curr2 105
2 Firm1 Country1 Curr3 110
3 Firm1 Country2 Curr1 100
4 Firm1 Country2 Curr2 95
5 Firm1 Country2 Curr3 120
6 Firm2 Country1 Curr1 95
7 Firm2 Country1 Curr2 110
8 Firm2 Country1 Curr3 115
9 Firm2 Country2 Curr1 105
10 Firm2 Country2 Curr2 120
11 Firm2 Country2 Curr3 90
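Now I want to compute, for each Firm-Location group, the difference between the Curr3 and Curr2 entries of the Value column, and replace the Curr3 value with that result. The resulting df should look like this: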
Firm Location Currency Value
0 Firm1 Country1 Curr1 100
1 Firm1 Country1 Curr2 105
2 Firm1 Country1 Curr3 5
3 Firm1 Country2 Curr1 100
4 Firm1 Country2 Curr2 95
5 Firm1 Country2 Curr3 25
6 Firm2 Country1 Curr1 95
7 Firm2 Country1 Curr2 110
8 Firm2 Country1 Curr3 5
9 Firm2 Country2 Curr1 105
10 Firm2 Country2 Curr2 120
11 Firm2 Country2 Curr3 -30
I tried .groupby with .apply, which gives me the right numbers, but I want the transformation applied to the original DataFrame.
df2 = df.groupby(['Firm','Location']).apply(lambda g: g[g.Currency == 'Curr3'].Value.values[0] - g[g.Currency == 'Curr2'].Value.values[0])
df2:
Firm Location 0
Firm1 Country1 5
Firm1 Country2 25
Firm2 Country1 5
Firm2 Country2 -30
I can't figure out how to do this in place in the original df. I also tried the same thing with .transform, but that raises an error:
df2 = df.groupby(['Firm','Location']).transform(lambda g: g[g.Currency == 'Curr3'].Value.values[0] - g[g.Currency == 'Curr2'].Value.values[0])
AttributeError: ("'Series' object has no attribute 'Currency'", 'occurred at index Currency')
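The error message suggests that .transform hands the lambda one column at a time (a Series) rather than the whole group, so g.Currency is not available inside it. A minimal sketch to observe this, assuming a small made-up frame and a hypothetical helper named record; the exact call pattern may differ between pandas versions:

```python
import pandas as pd

# Small illustrative frame (an assumption, not the full data from the question)
df = pd.DataFrame({
    'Firm': ['Firm1', 'Firm1', 'Firm2', 'Firm2'],
    'Currency': ['Curr2', 'Curr3', 'Curr2', 'Curr3'],
    'Value': [105, 110, 110, 115],
})

seen = []

def record(x):
    # Hypothetical helper: note what transform passes in, then return it unchanged
    seen.append((type(x).__name__, getattr(x, 'name', None)))
    return x

result = df.groupby('Firm').transform(record)
print(seen)
```

If the list contains entries such as a Series named Currency and a Series named Value, the function cannot look up other columns from inside .transform, which matches the AttributeError above.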
---- Update, based on Erfan's solution:
newvals = (
df.where(df['Currency'].isin(['Curr2', 'Curr3']))
.groupby(['Firm', 'Location'])['Value'].diff()
)
df['Value'] = newvals.fillna(df['Value'])
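If df looks like this instead (currencies not sorted), the solution no longer works, because diff() only computes the difference from the previous row within the group: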
Firm Location Currency Value
0 Firm1 Country1 Curr2 100
1 Firm1 Country1 Curr1 105
2 Firm1 Country1 Curr3 110
3 Firm1 Country2 Curr3 100
4 Firm1 Country2 Curr2 95
5 Firm1 Country2 Curr1 120
6 Firm2 Country1 Curr1 95
7 Firm2 Country1 Curr2 110
8 Firm2 Country1 Curr3 115
9 Firm2 Country2 Curr2 105
10 Firm2 Country2 Curr3 120
11 Firm2 Country2 Curr1 90
-> Result:
Firm Location Currency Value
0 Firm1 Country1 Curr2 100.0
1 Firm1 Country1 Curr1 105.0
2 Firm1 Country1 Curr3 10.0
3 Firm1 Country2 Curr3 100.0
4 Firm1 Country2 Curr2 -5.0
5 Firm1 Country2 Curr1 120.0
6 Firm2 Country1 Curr1 95.0
7 Firm2 Country1 Curr2 110.0
8 Firm2 Country1 Curr3 5.0
9 Firm2 Country2 Curr2 105.0
10 Firm2 Country2 Curr3 15.0
11 Firm2 Country2 Curr1 90.0
Now the result is no longer always the Curr3 minus Curr2 difference, and it is not always the Curr3 value that gets replaced.
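One possible order-independent approach, sketched here as an assumption rather than as the accepted answer, is to reshape the values so each currency becomes a column, take Curr3 minus Curr2 per Firm-Location group, and write the result back only to the Curr3 rows. The names keys, wide, diff, aligned and is_curr3 are illustrative, and the sketch assumes each group has exactly one Curr2 row and one Curr3 row:

```python
import pandas as pd

# The unsorted frame from the question
df = pd.DataFrame({
    'Firm': ['Firm1'] * 6 + ['Firm2'] * 6,
    'Location': (['Country1'] * 3 + ['Country2'] * 3) * 2,
    'Currency': ['Curr2', 'Curr1', 'Curr3', 'Curr3', 'Curr2', 'Curr1',
                 'Curr1', 'Curr2', 'Curr3', 'Curr2', 'Curr3', 'Curr1'],
    'Value': [100, 105, 110, 100, 95, 120, 95, 110, 115, 105, 120, 90],
})

keys = ['Firm', 'Location']

# One row per (Firm, Location) group, one column per currency
wide = df.pivot_table(index=keys, columns='Currency', values='Value', aggfunc='first')

# Per-group difference, independent of the row order in df
diff = (wide['Curr3'] - wide['Curr2']).rename('Diff')

# Align the per-group difference with the original rows (df's index is kept)
aligned = df.join(diff, on=keys)['Diff']

# Overwrite only the Curr3 rows
is_curr3 = df['Currency'].eq('Curr3')
df.loc[is_curr3, 'Value'] = aligned[is_curr3]

print(df)
```

Because the difference is looked up by column name rather than by row position, the Curr1 and Curr2 rows stay untouched regardless of how the rows are ordered.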
【Comments】:
- I saw your edit; I'll update the answer when I get home.
Tags: python pandas group-by transform apply