How to eliminate rows with consecutive values using python/pandas

[Question title]: How to eliminate rows with consecutive values in a column using python/pandas
[Posted]: 2018-11-14 16:52:17
[Question description]:

I have a DataFrame like this, with consecutive zeros in col1:

col1    col2    col3
  1       2       3
  0       4       5
  0       1       4
  2       7       8
  0       1       2
  4       4       4
  0       1       3
  0       4       2
  0       1       9
  4       6       2

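I want to drop the rows where col1 is zero at least 2 times in a row. A single isolated zero should be kept.

For example, the output should look like this: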

col1    col2    col3
  1       2       3
  2       7       8
  0       1       2
  4       4       4
  4       6       2
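
For anyone who wants to reproduce the problem, the frames above can be built roughly as follows. This is only a sketch: the name `expected` is introduced here for illustration and is not part of the original question, and a default integer index is assumed.

```python
import pandas as pd

# Input frame from the question (default RangeIndex 0..9 assumed)
df = pd.DataFrame({
    "col1": [1, 0, 0, 2, 0, 4, 0, 0, 0, 4],
    "col2": [2, 4, 1, 7, 1, 4, 1, 4, 1, 6],
    "col3": [3, 5, 4, 8, 2, 4, 3, 2, 9, 2],
})

# Desired result: rows 1-2 and 6-8 (runs of two or more zeros) are removed,
# while the single zero in row 4 is kept.
expected = df.loc[[0, 3, 4, 5, 9]]
print(expected)
```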

[Question comments]:

Tags: python pandas dataframe pandas-groupby

[Solution 1]:

Use:

m = df['col1'].ne(0)
s = m.cumsum() * (~m)
df = df[s.groupby(s).transform('size').lt(2) | m]

Or:

df = df[s.map(s.value_counts()).lt(2) | m]

print (df)
   col1  col2  col3
0     1     2     3
3     2     7     8
4     0     1     2
5     4     4     4
9     4     6     2
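
The snippet above can be wrapped into a small self-contained script. The function name `drop_zero_runs` and the parameters `column` and `min_run` are hypothetical, added here only to show how the threshold of 2 might be generalized. Like the snippet it wraps, this sketch gives leading zeros the same label (0) as the non-zero rows, so it assumes the column starts with a non-zero value.

```python
import pandas as pd

def drop_zero_runs(df, column="col1", min_run=2):
    """Drop rows that belong to a run of at least `min_run` consecutive zeros.

    Hypothetical helper wrapping the answer's approach; assumes the column
    does not start with a zero.
    """
    m = df[column].ne(0)                       # True where the value is non-zero
    s = m.cumsum() * (~m)                      # run label on zero rows, 0 elsewhere
    run_size = s.groupby(s).transform("size")  # size of the group each row is in
    return df[run_size.lt(min_run) | m]        # keep short zero runs and non-zeros

df = pd.DataFrame({
    "col1": [1, 0, 0, 2, 0, 4, 0, 0, 0, 4],
    "col2": [2, 4, 1, 7, 1, 4, 1, 4, 1, 6],
    "col3": [3, 5, 4, 8, 2, 4, 3, 2, 9, 2],
})

print(drop_zero_runs(df))              # drops runs of 2 or more zeros
print(drop_zero_runs(df, min_run=3))   # drops only runs of 3 or more zeros
```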

Explanation:

First, compare for "not equal to 0" with Series.ne:

print (df['col1'].ne(0))
0     True
1    False
2    False
3     True
4    False
5     True
6    False
7    False
8    False
9     True
Name: col1, dtype: bool

Then use cumsum to build groups. Each zero gets the same group number as the non-zero value that precedes it:

print (m.cumsum())
0    1
1    1
2    1
3    2
4    2
5    3
6    3
7    3
8    3
9    4
Name: col1, dtype: int32

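Multiply by the inverted boolean mask so that non-zero rows are reset to 0 and only the zero rows keep their group number: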

print (m.cumsum() * (~m))
0    0
1    1
2    1
3    0
4    2
5    0
6    3
7    3
8    3
9    0
Name: col1, dtype: int32

Then get the size of each group with GroupBy.transform:

print (s.groupby(s).transform('size'))
0    4
1    2
2    2
3    4
4    1
5    4
6    3
7    3
8    3
9    4
Name: col1, dtype: int64
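
As a quick check, the group sizes can also be obtained with the `map` plus `value_counts` variant from the second one-liner. The sketch below compares the two on the question's col1 values; the variable names `by_transform` and `by_map` are made up for this illustration.

```python
import pandas as pd

col1 = pd.Series([1, 0, 0, 2, 0, 4, 0, 0, 0, 4], name="col1")
m = col1.ne(0)
s = m.cumsum() * (~m)

# Size of the group each row belongs to, broadcast back to the original shape
by_transform = s.groupby(s).transform("size")

# Same idea: count how often each label occurs, then look each label up
by_map = s.map(s.value_counts())

print(by_transform.tolist())
print(by_map.tolist())
```

Both print the same list of sizes, which is why either expression can be used in the final filter.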

Then compare with lt (<):

print (s.groupby(s).transform('size').lt(2))
0    False
1    False
2    False
3    False
4     True
5    False
6    False
7    False
8    False
9    False
Name: col1, dtype: bool

Then chain the original mask m with | (bitwise OR), so that every non-zero row is kept as well:

print (s.groupby(s).transform('size').lt(2) | m)
0     True
1    False
2    False
3     True
4     True
5     True
6    False
7    False
8    False
9     True
Name: col1, dtype: bool

Finally, filter by boolean indexing:

print (df[s.groupby(s).transform('size').lt(2) | m])

   col1  col2  col3
0     1     2     3
3     2     7     8
4     0     1     2
5     4     4     4
9     4     6     2
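
One caveat with the labels built from `m.cumsum() * (~m)`: zeros at the very start of the column get label 0, the same label as all non-zero rows, so a single leading zero would be dropped too. A possible alternative, not part of the original answer, is to label runs by detecting where the zero/non-zero state changes. The function name `drop_zero_runs_by_run_id` and its parameters are hypothetical.

```python
import pandas as pd

def drop_zero_runs_by_run_id(df, column="col1", min_run=2):
    """Drop runs of at least `min_run` consecutive zeros, including leading runs."""
    is_zero = df[column].eq(0)
    # A new run starts wherever is_zero differs from the previous row
    run_id = is_zero.ne(is_zero.shift()).cumsum()
    run_len = run_id.groupby(run_id).transform("size")
    # Drop only rows that are zero AND sit in a long enough run
    return df[~(is_zero & run_len.ge(min_run))]

# Question's data: same result as the original approach
df = pd.DataFrame({
    "col1": [1, 0, 0, 2, 0, 4, 0, 0, 0, 4],
    "col2": [2, 4, 1, 7, 1, 4, 1, 4, 1, 6],
    "col3": [3, 5, 4, 8, 2, 4, 3, 2, 9, 2],
})
print(drop_zero_runs_by_run_id(df))

# Made-up edge case: a single leading zero should survive
edge = pd.DataFrame({"col1": [0, 1, 0, 0, 2]})
print(drop_zero_runs_by_run_id(edge))
```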

[Comments]: