Pandas: Remove NaN only at beginning and end of dataframe

【Question title】: Pandas: Remove NaN only at beginning and end of dataframe
【Posted】: 2015-10-09 05:09:01
【Question】:

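I have a pandas DataFrame that looks like this:

I want to strip the NaNs only at the beginning and at the end (i.e. keep only the rows from 1950 to 1954, including the NaN in between). I have already tried .isnull() and dropna(), but I could not find a suitable solution. Can anyone help?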

【Comments】:

Tags: python pandas time-series dataframe nan

【Solution 1】:

Here is one way to do it.

import pandas as pd

# your data
# ==============================
df

      sum
1948  NaN
1949  NaN
1950    5
1951    3
1952  NaN
1953    4
1954    8
1955  NaN

# processing
# ===============================
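# forward-fill, then drop what is still NaN: only the leading NaNs remain unfilled
idx = df.ffill().dropna().index
# back-fill on that subset, then drop again: only the trailing NaNs remain unfilled
res_idx = df.loc[idx].bfill().dropna().index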
df.loc[res_idx]

      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
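
For anyone who wants to run this approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the printout above, so the integer year index and the float column are assumptions rather than the asker's actual data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printout above
# (assumed: integer year index, a single float column named "sum").
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

# Forward-fill cannot fill the leading NaNs, so dropna() removes exactly those.
idx = df.ffill().dropna().index

# Back-fill on the remaining rows cannot fill the trailing NaNs,
# so dropna() removes exactly those.
res_idx = df.loc[idx].bfill().dropna().index

# Select from the original frame so the interior NaN (1952) is preserved.
trimmed = df.loc[res_idx]
print(trimmed)
```

The filled frames are used only to work out which index labels to keep; the final selection is made on the untouched `df`.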

【Discussion】:

【Solution 2】:

Use the built-in first_valid_index and last_valid_index, which are designed for exactly this, and slice your df with them:

In [5]:

first_idx = df.first_valid_index()
last_idx = df.last_valid_index()
print(first_idx, last_idx)
df.loc[first_idx:last_idx]
1950 1954
Out[5]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
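
The same idea can be wrapped in a small function. This is only a sketch: `trim_nan_edges` is a hypothetical name, the sample data is rebuilt from the printout in Solution 1, and returning an empty frame when nothing is valid is my own choice (both index methods return None in that case). It also assumes a unique index, since `.loc` slices by label and includes both endpoints.

```python
import numpy as np
import pandas as pd


def trim_nan_edges(frame):
    """Hypothetical helper: drop the all-NaN rows before the first
    and after the last row that holds a valid value."""
    first = frame.first_valid_index()
    last = frame.last_valid_index()
    if first is None:
        # No valid value anywhere: return an empty frame (assumed behaviour).
        return frame.iloc[0:0]
    # Label-based slicing with .loc is inclusive on both ends.
    return frame.loc[first:last]


# Sample data rebuilt from the printout in Solution 1 (an assumption).
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

trimmed = trim_nan_edges(df)
all_nan = trim_nan_edges(pd.DataFrame({"sum": [np.nan, np.nan]}))
print(trimmed)
```

On a frame with several columns, a row counts as valid as soon as any column has a value in it, so only rows that are NaN across the board get trimmed.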

【Discussion】:

【Solution 3】:

Here is a NumPy approach:

import numpy as np
import pandas as pd

x    = np.logical_not(pd.isnull(df))
mask = np.logical_and(np.cumsum(x)!=0, np.cumsum(x[::-1])[::-1]!=0)

In [313]: df.loc[mask['sum'].tolist()]

Out[313]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
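
The cumulative-sum trick may be easier to follow on a plain boolean array. The sketch below is a single-column variant of the approach above, not the answer's exact code; the sample data is again rebuilt from the printout in Solution 1, and the names `valid`, `seen_before` and `seen_after` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printout in Solution 1 (an assumption).
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

# True where the column holds a real value.
valid = df["sum"].notna().to_numpy()

# Running count of valid values from the top: non-zero from the first valid row on.
seen_before = np.cumsum(valid) != 0

# Same count taken from the bottom: non-zero up to the last valid row.
seen_after = np.cumsum(valid[::-1])[::-1] != 0

# Keep the rows that lie between the first and the last valid value.
mask = seen_before & seen_after
trimmed = df[mask]
print(trimmed)
```

A row survives only if at least one valid value has been seen at or before it and at least one at or after it, which is why the interior NaN for 1952 stays.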

【Discussion】: