I think you need:
print (df.groupby('StationID')['rainfall'].apply(intensity))
But it is better to use diff, replace the NaN values with 0 using fillna, and then cast to int if necessary:
print (df.groupby('StationID')['rainfall'].diff().fillna(0))
Sample:
import pandas as pd

df = pd.DataFrame({'rainfall': [0, 0, 0, 1, 5, 6, 6, 8, 8, 15, 0, 1, 14, 14, 14, 15, 18, 18, 18, 20],
                   'StationID': ['station X'] * 10 + ['station Y'] * 10})
print (df)
    StationID  rainfall
0   station X         0
1   station X         0
2   station X         0
3   station X         1
4   station X         5
5   station X         6
6   station X         6
7   station X         8
8   station X         8
9   station X        15
10  station Y         0
11  station Y         1
12  station Y        14
13  station Y        14
14  station Y        14
15  station Y        15
16  station Y        18
17  station Y        18
18  station Y        18
19  station Y        20
def intensity(ts):
    # Difference between consecutive readings; the first reading has no predecessor, so it gets 0
    ts = ts.tolist()
    ts2 = [0]
    for i in range(len(ts) - 1):
        ts2.append(ts[i + 1] - ts[i])
    return pd.Series(ts2)
df['diff1'] = df.groupby('StationID')['rainfall'].apply(intensity).reset_index(drop=True)
df['diff2'] = df.groupby('StationID')['rainfall'].diff().fillna(0).astype(int)
print (df)
    StationID  rainfall  diff1  diff2
0   station X         0      0      0
1   station X         0      0      0
2   station X         0      0      0
3   station X         1      1      1
4   station X         5      4      4
5   station X         6      1      1
6   station X         6      0      0
7   station X         8      2      2
8   station X         8      0      0
9   station X        15      7      7
10  station Y         0      0      0
11  station Y         1      1      1
12  station Y        14     13     13
13  station Y        14      0      0
14  station Y        14      0      0
15  station Y        15      1      1
16  station Y        18      3      3
17  station Y        18      0      0
18  station Y        18      0      0
19  station Y        20      2      2
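
One more reason to prefer diff, as far as I can tell: the apply + reset_index(drop=True) version only lines up with the original rows when the frame is already sorted by station, while diff keeps the original index. Below is a minimal sketch with made-up, interleaved readings (the data and the column name 'diff' are only for illustration, not from the question):

```python
import pandas as pd

# Hypothetical data: readings from the two stations are interleaved, not sorted by station
df = pd.DataFrame({
    'StationID': ['station X', 'station Y', 'station X', 'station Y', 'station X'],
    'rainfall': [0, 0, 1, 14, 5],
})

# diff is computed within each group, but the result keeps the original row index,
# so it can be assigned straight back; the first row of each group is NaN -> 0
df['diff'] = df.groupby('StationID')['rainfall'].diff().fillna(0).astype(int)
print(df)
```

Each row is compared with the previous row of the same station, so the last station X reading gives 5 - 1 = 4 even though a station Y row sits between them.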