Identifying consecutive NaN's with pandas part 2: answers

【Question Title】: Identifying consecutive NaN's with pandas part 2
【Posted】: 2021-04-25 20:09:28
【Question Description】:

I have a question related to an earlier one: Identifying consecutive NaN's with pandas

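I'm new to Stack Overflow, so I can't add comments yet, but I'd like to know how to keep (at least part of) the DataFrame's original index when counting the number of consecutive NaNs.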

So instead of:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan, np.nan, np.nan, 6,7,8,9,10,np.nan,np.nan,13,14]})
df
Out[38]:
     a
0    1
1    2
2  NaN
3  NaN
4  NaN
5    6
6    7
7    8
8    9
9   10
10 NaN
11 NaN
12  13
13  14

I'd like to get the following:

【Discussion】:

You should change the question title to "How to keep original indexes after grouping". Also, take a look at this question.
Does this answer your question? How to keep original index of a DataFrame after groupby 2 columns?
@ScottBoston I'll edit the question to clarify this.
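For context, a minimal sketch of the counting step that loses the index. It assumes the earlier question's approach is the `groupby`/`cumsum` expression that also appears as `df1` in the answer below: each non-NaN value opens a new group, and the NaNs that follow it are counted in that group, so the result is indexed by group number rather than by the row labels of `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})

# Each non-NaN value increments the counter, so it starts a new group;
# the NaNs that follow it share that group's id.
group_id = df['a'].notna().astype(int).cumsum()

# NaN count per group: the index is the group id (1..9), not the row labels 0..13.
nan_counts = df['a'].isna().astype(int).groupby(group_id).sum()
print(nan_counts)
```

Group `2` holds the first NaN run (3 NaNs) and group `7` the second (2 NaNs), while in `df` those runs start at rows `2` and `10`.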

Tags: python pandas dataframe

【Solution 1】:

I found a workaround. It's ugly, but it does the trick. I hope you don't have a huge amount of data, because it may not perform very well:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan, np.nan, np.nan, 6,7,8,9,10,np.nan,np.nan,13,14]})
df1 = df.a.isnull().astype(int).groupby(df.a.notnull().astype(int).cumsum()).sum()

# Determine the different groups of NaNs. We only want to keep the 1st. The 0's are non-NaN values, the 1's are the first in a group of NaNs. 
b = df.isna()
df2 = b.cumsum() - b.cumsum().where(~b).ffill().fillna(0).astype(int)
df2 = df2.loc[df2['a'] <= 1]

# Set index from the non-zero 'NaN-count' to the index of the first NaN
df3 = df1.loc[df1 != 0]
df3.index = df2.loc[df2['a'] == 1].index

# Update the values from df3 (which has the right values, and the right index), to df2 
df2.update(df3)

The NaN-group trick was inspired by this answer.
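A possibly more compact alternative, sketched under the assumption that the desired output is what the workaround above produces: the length of each NaN run reported at the run's first row, 0 for non-NaN rows, the other NaN rows dropped, and the original index kept. It swaps the `update` step for `groupby(...).transform('sum')`, which returns a result aligned to the original index; the helper name `nan_run_lengths` is hypothetical.

```python
import numpy as np
import pandas as pd


def nan_run_lengths(s: pd.Series) -> pd.Series:
    """Hypothetical helper: length of each NaN run, reported at the run's first row.

    Non-NaN rows get 0, the remaining NaN rows of each run are dropped,
    and the original index labels are kept.
    """
    is_na = s.isna()
    # A new run starts wherever the NaN/non-NaN state changes
    # (the leading NaN produced by diff() also counts as a change).
    run_id = is_na.astype(int).diff().ne(0).cumsum()
    # Summing the mask per run: run length for NaN runs, 0 for value runs.
    # transform() keeps the original index instead of the run ids.
    run_len = is_na.astype(int).groupby(run_id).transform('sum')
    # True only on the first NaN of each run.
    first_nan = is_na & ~is_na.shift(fill_value=False)
    keep = ~is_na | first_nan
    return run_len.where(first_nan, 0)[keep]


df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})
result = nan_run_lengths(df['a'])
print(result)
```

On the example data this keeps rows `0, 1, 2, 5, ..., 10, 12, 13`, with `3` at row `2` and `2` at row `10`, and avoids the manual index reassignment.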

【Discussion】:

Wow, I'm surprised this isn't easier to do. But it doesn't really seem to use pandas; it could be done with NumPy. +1
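Following up on the comment that this could be done with NumPy, a minimal sketch under the same assumption about the desired output. It locates the start and end of every NaN run from `np.diff` of a boolean mask padded with `False` on both sides; pandas is only used to wrap the result back into a Series with the original index. The helper name `nan_run_lengths_np` is hypothetical.

```python
import numpy as np
import pandas as pd


def nan_run_lengths_np(s: pd.Series) -> pd.Series:
    """Hypothetical NumPy-based helper: NaN-run length at each run's first row."""
    is_na = np.isnan(s.to_numpy(dtype=float))
    # Pad with False so every NaN run has both a rising (+1) and a falling (-1) edge.
    padded = np.concatenate(([False], is_na, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # position of the first NaN in each run
    ends = np.flatnonzero(edges == -1)    # position one past the last NaN in each run
    counts = np.zeros(len(is_na), dtype=int)
    counts[starts] = ends - starts
    # Keep every non-NaN row plus the first row of each NaN run.
    keep = ~is_na
    keep[starts] = True
    return pd.Series(counts[keep], index=s.index[keep], name=s.name)


df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})
result_np = nan_run_lengths_np(df['a'])
print(result_np)
```

Because `edges` is computed on the padded mask, position `i` in `edges` maps directly to row `i` of the original data, so no offset correction is needed.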