【Posted】: 2021-07-02 00:06:23
【Question】:
Given the DataFrame:
>>> from pandas import DataFrame
>>> df = DataFrame([
...     ['2021-03-31', 'A0019', '990RT', 'OFFSET', '0.10'],
...     ['2021-03-31', 'A1019', '990CT', 'MARK', '0.10'],
...     ['2021-03-31', 'A0019', '990RT', 'MARK', '100'],
...     ['2021-03-31', 'A0019', '990RT', 'OFFSET', '0.70'],
...     ['2021-03-31', 'A0029', '990CT', 'OFFSET', '1.10'],
...     ['2021-03-31', 'A0029', '990CT', 'MARK', '9.10'],
...     ['2021-03-31', 'A0019', '990QT', 'MARK', '99.10'],
...     ['2021-03-31', 'C0019', '990QT', 'OFFSET', '1'],
...     ['2021-03-31', 'C0019', '990QT', 'GHTC', '5'],
...     ['2021-03-31', 'C0019', '990QT', 'OFFSET', '15'],
... ], columns=['DATE', 'A_ID', 'R_ID', 'TYPE', 'I_VAL'])
>>> df
         DATE   A_ID   R_ID    TYPE  I_VAL
0  2021-03-31  A0019  990RT  OFFSET   0.10
1  2021-03-31  A1019  990CT    MARK   0.10
2  2021-03-31  A0019  990RT    MARK    100
3  2021-03-31  A0019  990RT  OFFSET   0.70
4  2021-03-31  A0029  990CT  OFFSET   1.10
5  2021-03-31  A0029  990CT    MARK   9.10
6  2021-03-31  A0019  990QT    MARK  99.10
7  2021-03-31  C0019  990QT  OFFSET      1
8  2021-03-31  C0019  990QT    GHTC      5
9  2021-03-31  C0019  990QT  OFFSET     15
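Each non-OFFSET row (e.g. MARK, GHTC) is uniquely matched to zero or more OFFSET rows by the combination of DATE, A_ID, R_ID. In other words, there is a one-to-many relationship between a non-OFFSET row (e.g. MARK) and its OFFSET rows.
I need to perform an operation in two steps:
- If the values of DATE, A_ID, R_ID are the same, aggregate (sum) the I_VAL values of those rows and store the aggregated value as I_VAL in the non-OFFSET row.
- Delete the rows with TYPE OFFSET.
The resulting DataFrame is: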
# The rows with TYPE OFFSET are removed from the resulting df.
# They are kept here only to explain the aggregation.
# 0, 1, 2, 3, etc. are the indexes (row numbers) of the rows.
         DATE   A_ID   R_ID    TYPE   I_VAL
0  2021-03-31  A0019  990RT  OFFSET    0.10
1  2021-03-31  A1019  990CT    MARK    0.10  # no update, condition not met
2  2021-03-31  A0019  990RT    MARK  100.80  # updated with sum of 0, self, and 3
3  2021-03-31  A0019  990RT  OFFSET    0.70
4  2021-03-31  A0029  990CT  OFFSET    1.10
5  2021-03-31  A0029  990CT    MARK   10.20  # updated with sum of own value and 4
6  2021-03-31  A0019  990QT    MARK   99.10  # no update, condition not met
7  2021-03-31  C0019  990QT  OFFSET       1
8  2021-03-31  C0019  990QT    GHTC      21  # updated with sum of self, 7, and 9
9  2021-03-31  C0019  990QT  OFFSET      15
For step 2, I can do:
filtered_df = df[df.TYPE != 'OFFSET']
However, I do not know how to aggregate the values. This post discusses a similar problem, but I could not adapt it to my requirements.
【Discussion】:
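One possible approach to the two steps described in the question, offered as a minimal sketch rather than a definitive answer: convert I_VAL to numbers, compute the per-group total of the OFFSET rows with groupby(...).transform('sum'), add that total to each non-OFFSET row, then drop the OFFSET rows. The names keys, work, OFFSET_SUM and result are illustrative and not from the question. The final round(2) is an assumption made only to tidy floating-point noise for two-decimal values.

```python
import pandas as pd

# Sample data copied from the question; I_VAL is stored as strings there.
df = pd.DataFrame(
    [
        ['2021-03-31', 'A0019', '990RT', 'OFFSET', '0.10'],
        ['2021-03-31', 'A1019', '990CT', 'MARK', '0.10'],
        ['2021-03-31', 'A0019', '990RT', 'MARK', '100'],
        ['2021-03-31', 'A0019', '990RT', 'OFFSET', '0.70'],
        ['2021-03-31', 'A0029', '990CT', 'OFFSET', '1.10'],
        ['2021-03-31', 'A0029', '990CT', 'MARK', '9.10'],
        ['2021-03-31', 'A0019', '990QT', 'MARK', '99.10'],
        ['2021-03-31', 'C0019', '990QT', 'OFFSET', '1'],
        ['2021-03-31', 'C0019', '990QT', 'GHTC', '5'],
        ['2021-03-31', 'C0019', '990QT', 'OFFSET', '15'],
    ],
    columns=['DATE', 'A_ID', 'R_ID', 'TYPE', 'I_VAL'],
)

keys = ['DATE', 'A_ID', 'R_ID']  # columns that define a matching group

# Work on a copy with numeric I_VAL so the values can be summed.
work = df.assign(I_VAL=pd.to_numeric(df['I_VAL']))

# Step 1a: keep I_VAL on OFFSET rows, use 0 elsewhere (helper column, illustrative name).
work['OFFSET_SUM'] = work['I_VAL'].where(work['TYPE'].eq('OFFSET'), 0.0)

# Step 1b: broadcast each group's OFFSET total onto every row of that group.
work['OFFSET_SUM'] = work.groupby(keys)['OFFSET_SUM'].transform('sum')

# Step 2: drop the OFFSET rows, then add the group's OFFSET total to each remaining row.
result = work.loc[work['TYPE'].ne('OFFSET')].copy()
result['I_VAL'] = (result['I_VAL'] + result['OFFSET_SUM']).round(2)
result = result.drop(columns='OFFSET_SUM')

print(result)
```

For the sample data this should leave the rows with index 1, 2, 5, 6 and 8, with I_VAL equal to 0.10, 100.80, 10.20, 99.10 and 21 respectively. Summing only the OFFSET values, instead of the whole group, means that a group containing more than one non-OFFSET row would not have those rows added into each other; the question states this case should not occur, so that is a defensive choice rather than a requirement.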