How to remove a specific row with a null value: answers

【Question title】: How to remove a specific row with a null value
【Posted】: 2017-06-25 06:44:55
【Question description】:

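This is a subset of my DataFrame. Every row that has a value in the sentence column is followed by two rows that repeat its B, C, D and R values but have no value in the sentence column. How can I remove the second of those rows with a null sentence? I need to keep the first row with a null sentence.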

      A    B   C  D             R  sentence               ADR
    112  135  21  EffexorXR.21  1  lack of good feeling.  good
    113  135  21  EffexorXR.21  1                         1
    114  135  21  EffexorXR.21  1
    115  136  21  EffexorXR.21  2  Feel disconnected      disconnected
    116  136  21  EffexorXR.21  2
    117  136  21  EffexorXR.21  2
    118  142  22  EffexorXR.22  1  Weight gain            gain
    119  142  22  EffexorXR.22  1                         1
    120  142  22  EffexorXR.22  1

The desired output looks like this:

      A    B   C  D             R  sentence               ADR
    112  135  21  EffexorXR.21  1  lack of good feeling.  good
    113  135  21  EffexorXR.21  1                         1
    115  136  21  EffexorXR.21  2  Feel disconnected      disconnected
    116  136  21  EffexorXR.21  2
    118  142  22  EffexorXR.22  1  Weight gain            gain
    119  142  22  EffexorXR.22  1                         1

If I use the following code, it removes both of the rows with null values:

df = df[pd.notnull(df['sentence'])]

Any suggestions?

The following solution did not work:

df.set_index('A').drop_duplicates().reset_index()
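The two attempts described in the question can be reproduced with a small sketch. The sample frame below is rebuilt from the table in the question, and it assumes the blank cells are NaN (they could equally be empty strings in the real data, in which case `pd.notnull` would not flag them).

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "A": list(range(112, 121)),
    "B": [135] * 3 + [136] * 3 + [142] * 3,
    "C": [21] * 6 + [22] * 3,
    "D": ["EffexorXR.21"] * 6 + ["EffexorXR.22"] * 3,
    "R": [1] * 3 + [2] * 3 + [1] * 3,
    "sentence": ["lack of good feeling.", np.nan, np.nan,
                 "Feel disconnected", np.nan, np.nan,
                 "Weight gain", np.nan, np.nan],
    "ADR": ["good", "1", np.nan, "disconnected", np.nan, np.nan,
            "gain", "1", np.nan],
})

# Keeping only non-null sentences drops every empty row, not just the second one.
only_filled = df[pd.notnull(df["sentence"])]
print(only_filled["A"].tolist())  # [112, 115, 118]

# Full-row de-duplication also compares ADR, so rows 113 and 114 differ
# ("1" vs NaN) and both survive; only row 117 is removed.
deduped = df.set_index("A").drop_duplicates().reset_index()
print(deduped["A"].tolist())  # [112, 113, 114, 115, 116, 118, 119, 120]
```

On this sample, neither attempt yields the six rows shown in the desired output, which is likely why a narrower comparison key is needed.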

【Question comments】:

Tags: python pandas

【Solution 1】:

You could flag the duplicates across the combined key columns and use the result to mask the original DataFrame:

new_df = df[~df[['B','C','D', 'R', 'sentence']].duplicated()]
print(new_df)

Output:

     A    B   C             D  R               sentence           ADR
0  112  135  21  EffexorXR.21  1  lack of good feeling.          good
1  113  135  21  EffexorXR.21  1                                    1
3  115  136  21  EffexorXR.21  2      Feel disconnected  disconnected
4  116  136  21  EffexorXR.21  2                                     
6  118  142  22  EffexorXR.22  1            Weight gain          gain
7  119  142  22  EffexorXR.22  1                                    1
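For readers who want to run the masking approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question and assumes the blank cells are NaN; the names `key_cols` and `is_repeat` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "A": list(range(112, 121)),
    "B": [135] * 3 + [136] * 3 + [142] * 3,
    "C": [21] * 6 + [22] * 3,
    "D": ["EffexorXR.21"] * 6 + ["EffexorXR.22"] * 3,
    "R": [1] * 3 + [2] * 3 + [1] * 3,
    "sentence": ["lack of good feeling.", np.nan, np.nan,
                 "Feel disconnected", np.nan, np.nan,
                 "Weight gain", np.nan, np.nan],
    "ADR": ["good", "1", np.nan, "disconnected", np.nan, np.nan,
            "gain", "1", np.nan],
})

# Compare only the key columns; A (unique id) and ADR are left out on purpose.
key_cols = ["B", "C", "D", "R", "sentence"]

# True for every row whose key columns repeat an earlier row.
is_repeat = df[key_cols].duplicated()

# Invert the mask to keep the first occurrence of each key combination.
new_df = df[~is_repeat]
print(new_df["A"].tolist())  # [112, 113, 115, 116, 118, 119]
```

The original row labels (0, 1, 3, 4, 6, 7) are preserved because the mask filters rows without resetting the index.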

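【Comments】:

【Solution 2】:

You can use drop_duplicates. Column A is unique, so we set it as the index. The remaining columns are then used to check for duplicates, and any duplicates found are dropped. Finally, reset_index brings column A back.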

df.set_index('A').drop_duplicates().reset_index()
Out[847]: 
     A    B   C             D  R               sentence
0  112  135  21  EffexorXR.21  1  lack of good feeling.
1  113  135  21  EffexorXR.21  1                       
2  115  136  21  EffexorXR.21  2      Feel disconnected
3  116  136  21  EffexorXR.21  2                       
4  118  142  22  EffexorXR.22  1            Weight gain
5  119  142  22  EffexorXR.22  1

Updated answer: use only a subset of the columns as the key for detecting duplicates.

df.drop_duplicates(subset=['B','C','D','sentence'])
Out[866]: 
     A    B   C             D  R               sentence           ADR
0  112  135  21  EffexorXR.21  1  lack of good feeling.          good
1  113  135  21  EffexorXR.21  1                                    1
3  115  136  21  EffexorXR.21  2      Feel disconnected  disconnected
4  116  136  21  EffexorXR.21  2                                  nan
6  118  142  22  EffexorXR.22  1            Weight gain          gain
7  119  142  22  EffexorXR.22  1                                    1
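The subset form works for two reasons that a tiny example may make clearer: pandas treats missing values in the key columns as equal to each other when looking for duplicates, and `keep="first"` (the default) retains the earliest row of each key. The three-row frame `tiny` below is a hypothetical cut-down of the first group in the question, with blank cells assumed to be NaN.

```python
import numpy as np
import pandas as pd

# One group from the question, reduced to the columns that matter here.
tiny = pd.DataFrame({
    "B": [135, 135, 135],
    "sentence": ["lack of good feeling.", np.nan, np.nan],
    "ADR": ["good", "1", np.nan],
})

# NaN in "sentence" matches NaN, so the third row repeats the second row's key.
flags = tiny.duplicated(subset=["B", "sentence"])
print(flags.tolist())  # [False, False, True]

# ADR is not part of the key, so its differing values do not block the drop.
kept = tiny.drop_duplicates(subset=["B", "sentence"], keep="first")
print(kept["ADR"].tolist())  # ['good', '1']
```

Leaving ADR out of `subset` is what lets the second empty row be removed even though its ADR value differs from the first empty row.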

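【Comments】:

Thanks, but I can't rely on duplicates, because elsewhere in the data there are groups whose first row already has a null value in the sentence column. Also, in the second row the other columns may contain some values. Do you have a suggestion for the case where the two rows are not duplicates?
@Mary, I have updated the answer based on your update. If column D can be used to uniquely identify a group, just use ['D','sentence'] as the key.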
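The condition in the last comment matters: the key only works if it really identifies each group. A rough sketch on the rows shown in the question (blank cells assumed to be NaN, columns trimmed for brevity) suggests that D alone is not enough for that particular sample, because `EffexorXR.21` covers both the B=135 and B=136 groups. Using B as the group key is an assumption based on the sample, not something the answer states.

```python
import numpy as np
import pandas as pd

# Rows modelled on the question; only the columns needed for the comparison.
df = pd.DataFrame({
    "A": list(range(112, 121)),
    "B": [135] * 3 + [136] * 3 + [142] * 3,
    "D": ["EffexorXR.21"] * 6 + ["EffexorXR.22"] * 3,
    "sentence": ["lack of good feeling.", np.nan, np.nan,
                 "Feel disconnected", np.nan, np.nan,
                 "Weight gain", np.nan, np.nan],
})

# D spans two groups here, so the empty row 116 looks like a repeat of 113.
by_d = df.drop_duplicates(subset=["D", "sentence"])
print(by_d["A"].tolist())  # [112, 113, 115, 118, 119]

# B changes with every group in this sample, so each group keeps one empty row.
by_b = df.drop_duplicates(subset=["B", "sentence"])
print(by_b["A"].tolist())  # [112, 113, 115, 116, 118, 119]
```

Whichever columns are chosen, it is worth checking that they take a distinct combination of values for every group before relying on them as the key.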