Counting consecutive elements in a DataFrame and storing them in a new column

【Question title】: Counting consecutive elements in a DataFrame and storing them in a new column
【Posted】: 2021-08-04 19:26:49
【Question description】:

I have this code:

import pandas as pd
id_1=[0,0,0,0,0,0,2,0,4,5,6,7,1,0,5,3]
exp_1=[1,2,3,4,5,6,1,7,1,1,1,1,1,8,2,1]

df = pd.DataFrame(list(zip(id_1, exp_1)), columns=['Patch', 'Exploit'])

df = (
    df.groupby((df.Patch != df.Patch.shift(1)).cumsum())
    .agg({"Patch": ("first", "count")})
    .reset_index(drop=True)
)
print(df)

The output is:

   Patch      
   first count
0      0     6
1      2     1
2      0     1
3      4     1
4      5     1
5      6     1
6      7     1
7      1     1
8      0     1
9      5     1
10     3     1

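I want to create a DataFrame with a new column called count in which I can store the number of consecutive appearances of each patch (id_1). However, the code above creates a dictionary of Patch, and I don't know how to operate on just the values stored in the column called count.

Say I want to remove all 0s from id_1 and then count the consecutive appearances. Or what if I need to find the mean of the count column?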

【Question comments】:

Tags: python dataframe duplicates pandas-groupby

【Solution 1】:

If you want to remove all 0s from the Patch column, you can filter the DataFrame before the .groupby. For example:

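# df is the original two-column frame here, not the aggregated result from the question
df = (
    df[df.Patch != 0]  # drop the rows where Patch is 0
    # run labels are computed on the unfiltered frame and aligned by index
    .groupby((df.Patch != df.Patch.shift(1)).cumsum())
    .agg({"Patch": ("first", "count")})
    .reset_index(drop=True)
)
print(df)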

Prints:

  Patch      
  first count
0     2     1
1     4     1
2     5     1
3     6     1
4     7     1
5     1     1
6     5     1
7     3     1
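The two-level header in the output above comes from passing a dict to .agg. If flat column names are easier to work with, one option is named aggregation (this assumes pandas 0.25 or later). The following is a minimal sketch rather than the answerer's own code: the names run_id and runs are made up for illustration, and the run labels are still computed before filtering, so zeros keep separating runs just as in the snippet above.

```python
import pandas as pd

id_1 = [0, 0, 0, 0, 0, 0, 2, 0, 4, 5, 6, 7, 1, 0, 5, 3]
exp_1 = [1, 2, 3, 4, 5, 6, 1, 7, 1, 1, 1, 1, 1, 8, 2, 1]
df = pd.DataFrame({"Patch": id_1, "Exploit": exp_1})

# Label each run of equal consecutive Patch values.
# "run_id" is an illustrative name, not something from the original post.
run_id = (df["Patch"] != df["Patch"].shift()).cumsum().rename("run_id")

runs = (
    df[df["Patch"] != 0]            # drop the zeros first
    .groupby(run_id)                # run labels are aligned by index
    .agg(Patch=("Patch", "first"), count=("Patch", "count"))
    .reset_index(drop=True)
)
print(runs)

# "count" is now an ordinary column. Use brackets, because runs.count
# would refer to the DataFrame.count method instead.
print(runs["count"].mean())
```

With flat columns, runs["count"] behaves like any other Series, so .mean(), .sum(), or comparisons apply directly.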

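【Comments】:

This still doesn't help me understand how to get just the mean of the 'count' column, or how to do any operation on the count column.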
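On the point raised in this comment: the aggregated result is a DataFrame whose columns form a two-level MultiIndex, so the count column can be selected with the tuple ("Patch", "count"). The sketch below is hedged and uses the unfiltered data from the question; the names run_id, result, counts, and flat are illustrative only. It also shows one possible way to store the run length on every original row with transform, which may be closer to "a new column called count".

```python
import pandas as pd

id_1 = [0, 0, 0, 0, 0, 0, 2, 0, 4, 5, 6, 7, 1, 0, 5, 3]
df = pd.DataFrame({"Patch": id_1})

# Label each run of equal consecutive values ("run_id" is an illustrative name).
run_id = (df["Patch"] != df["Patch"].shift()).cumsum().rename("run_id")

result = (
    df.groupby(run_id)
    .agg({"Patch": ["first", "count"]})
    .reset_index(drop=True)
)

# Select one column of the two-level header with a tuple; this is a plain Series.
counts = result[("Patch", "count")]
print(counts.mean())

# Optionally flatten the header to single-level names.
flat = result.copy()
flat.columns = ["_".join(col) for col in flat.columns]
print(flat.columns.tolist())

# Alternative: keep one row per original element and attach its run length.
df["count"] = df.groupby(run_id)["Patch"].transform("size")
print(df)
```

Which form is preferable depends on whether one row per run (result) or one row per original element (df with the extra count column) is wanted.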