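First, find the consecutive groups by comparing each value with the previous one (Series.ne against Series.shift) and taking the cumulative sum of the result. Then use Series.duplicated with keep=False to keep every row whose group id occurs more than once, and finally count with GroupBy.size: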
df1 = (df[df.B.ne(df.B.shift()).cumsum().duplicated(keep=False)]
       .groupby(df['B'].rename('value'))
       .size()
       .reset_index(name='count'))
print(df1)
   value  count
0     11      5
Details:
print(df[df.B.ne(df.B.shift()).cumsum().duplicated(keep=False)])
    A   B
4   8  11
5  11  11
6   1  11
7  15  11
8  20  11
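The mask is easier to follow when it is built one step at a time. Below is a minimal sketch on a small made-up Series; the sample values and the names `s`, `changed`, `run_id` and `in_run` are illustrative assumptions, not part of the original data:

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the steps.
s = pd.Series([1, 3, 11, 11, 11, 15, 16, 16], name='B')

# True wherever a value differs from the previous one (the first row
# is always True because it is compared with NaN).
changed = s.ne(s.shift())

# The cumulative sum gives every run of equal values its own id.
run_id = changed.cumsum()

# keep=False marks every member of an id that occurs more than once,
# so runs of length 1 are dropped.
in_run = run_id.duplicated(keep=False)

print(pd.DataFrame({'B': s, 'changed': changed,
                    'run_id': run_id, 'in_run': in_run}))
```

In this sample `run_id` comes out as 1, 2, 3, 3, 3, 4, 5, 5, so only the rows holding 11 and 16 pass the filter.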
Or use Series.value_counts for the counting:
df2 = (df.loc[df.B.ne(df.B.shift()).cumsum().duplicated(keep=False), 'B']
       .value_counts()
       .rename_axis('value')
       .reset_index(name='count'))
print(df2)
   value  count
0     11      5
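The two variants can order their rows differently once more than one value qualifies: GroupBy.size sorts by the group key by default, while Series.value_counts sorts by count in descending order. A small sketch with made-up data (the frame and the names `mask`, `by_group` and `by_counts` are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data: a run of two 11s and a run of three 16s.
df = pd.DataFrame({'B': [11, 11, 5, 16, 16, 16]})

# Rows that belong to a run of length > 1.
mask = df['B'].ne(df['B'].shift()).cumsum().duplicated(keep=False)

# groupby + size: rows ordered by value (11, then 16).
by_group = (df[mask]
            .groupby(df['B'].rename('value'))
            .size()
            .reset_index(name='count'))

# value_counts: rows ordered by count, largest first (16, then 11).
by_counts = (df.loc[mask, 'B']
             .value_counts()
             .rename_axis('value')
             .reset_index(name='count'))

print(by_group)
print(by_counts)
```

If a fixed order matters, calling sort_values('value') on either result makes the two match.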
After the edit the input data appears to have changed, so the trailing 16 values now form a new group:
import pandas as pd

df = pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
                   'B':[1,3,4,6,11,11,11,11,11,15,16,16]})
df1 = (df[df.B.ne(df.B.shift()).cumsum().duplicated(keep=False)]
       .groupby(df['B'].rename('value'))
       .size()
       .reset_index(name='count'))
print(df1)
   value  count
0     11      5
1     16      2
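One thing to keep in mind: the final groupby uses the value of B, so two separate runs of the same value are added together. If each run should be counted on its own, one possible variation (not part of the original answer) is to group by the run id as well. A minimal sketch, assuming made-up data and the hypothetical names `run_id`, `mask`, `by_value` and `per_run`:

```python
import pandas as pd

# Hypothetical data: 11 appears in two separate runs (lengths 2 and 3).
df = pd.DataFrame({'B': [11, 11, 3, 11, 11, 11, 16]})

# Id of each run of equal consecutive values.
run_id = df['B'].ne(df['B'].shift()).cumsum()

# Rows that belong to a run of length > 1.
mask = run_id.duplicated(keep=False)

# Grouping by value only: both runs of 11 are merged into one row.
by_value = (df[mask]
            .groupby(df.loc[mask, 'B'].rename('value'))
            .size()
            .reset_index(name='count'))

# Grouping by run id and value: one row per consecutive run.
per_run = (df[mask]
           .groupby([run_id[mask].rename('run'),
                     df.loc[mask, 'B'].rename('value')])
           .size()
           .reset_index(name='count'))

print(by_value)   # a single row: value 11, count 5
print(per_run)    # two rows: value 11 with counts 2 and 3
```

Which of the two results is wanted depends on whether "consecutive duplicates" means per value or per run; the data shown in this answer has no repeated runs, so both give the same counts there.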