Separate complete and incomplete rows from a dataset in pandas and Python

【Question title】: Separate complete and incomplete rows from dataset in pandas and python
【Posted】: 2018-12-14 12:06:47
【Question description】:

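How can I separate the complete rows from the incomplete rows of a dataset in pandas and Python? (I need to split them to get test and training sets for an imputation model.) After imputation, how do I put each imputed row back at its original index?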

【Question discussion】:

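What is your definition of "complete" and "incomplete" rows? Do you mean rows with missing values?
Yes, I need to separate the rows that contain missing values from the rows that do not.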

Tags: pandas data-analysis data-cleaning imputation

【Solution 1】:

You can use the functions isnull() and dropna() for this:

    import numpy as np
    import pandas as pd

    # create a dummy dataset
    s = [1, 2, 3, 4, np.nan, 5]
    s1 = [1, 2, np.nan, np.nan, 3, 4]
    s2 = [1, 2, 3, np.nan, np.nan, np.nan]
    df = pd.DataFrame({'r1': s, 'r2': s1, 'r3': s2})
    # reset_index() adds an 'index' column that records the original
    # row position, used later to restore the order after concatenation
    df = df.reset_index()

    # rows without any null values
    not_nulls = df.dropna()

    # rows with at least one null value
    nulls = df[df.isnull().any(axis=1)]

    # fill the null values using the required logic; here I'm just filling with zero
    nulls = nulls.fillna(0)

    # combine the complete rows and the filled rows
    combined = pd.concat([nulls, not_nulls])
    # sort by the saved 'index' column to get back the original order
    combined = combined.sort_values(by='index')
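
As a variation on the answer above, the split can also be done with a single boolean mask, relying on the DataFrame's own index to restore the order instead of an extra 'index' column. This is only a minimal sketch: the names incomplete_mask, complete, incomplete, imputed and restored are made up for illustration, and filling with the column means of the complete rows is a placeholder assumption standing in for whatever imputation model you actually train.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame({
    'r1': [1, 2, 3, 4, np.nan, 5],
    'r2': [1, 2, np.nan, np.nan, 3, 4],
    'r3': [1, 2, 3, np.nan, np.nan, np.nan],
})

# Boolean mask: True for rows that have at least one missing value
incomplete_mask = df.isna().any(axis=1)

complete = df[~incomplete_mask]    # rows with no missing values (training data)
incomplete = df[incomplete_mask]   # rows that need imputation

# Placeholder imputation (assumption): fill each column with the
# mean of that column over the complete rows
imputed = incomplete.fillna(complete.mean())

# Both pieces kept their original index labels, so sorting by index
# puts every row back in its original position
restored = pd.concat([complete, imputed]).sort_index()
print(restored)
```

Because boolean indexing and fillna() preserve index labels, no reset_index() step is needed here. If the original index is not unique, though, saving the row position in a separate column as in the answer above is the safer choice.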

【Discussion】: