How do I remove rows from a numpy array based on multiple conditions? Answers

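【Question title】: How do I remove rows from a numpy array based on multiple conditions?
【Posted】: 2014-08-19 09:42:57
【Question】:

I have a file with three columns and several thousand rows. I want to remove the rows whose first-column value falls within certain ranges. For example, if the data in my file looks like this: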

18  6.215   0.025
19  6.203   0.025
20  6.200   0.025
21  6.205   0.025
22  6.201   0.026
23  6.197   0.026
24  6.188   0.024
25  6.187   0.023
26  6.189   0.021
27  6.188   0.020
28  6.192   0.019
29  6.185   0.020
30  6.189   0.019
31  6.191   0.018
32  6.188   0.019
33  6.187   0.019
34  6.194   0.021
35  6.192   0.024
36  6.193   0.024
37  6.187   0.026
38  6.184   0.026
39  6.183   0.027
40  6.189   0.027

I want to remove the rows whose first item is between 20 and 25 or between 30 and 35 (inclusive). The expected output is therefore:

18  6.215   0.025
19  6.203   0.025
26  6.189   0.021
27  6.188   0.020
28  6.192   0.019
29  6.185   0.020
36  6.193   0.024
37  6.187   0.026
38  6.184   0.026
39  6.183   0.027
40  6.189   0.027

How can I do this?

【Comments】:

Tags: python arrays numpy

【Answer 1】:

If you want to stay with numpy, the solution is not hard.

# Inclusive bounds, so that 20, 25, 30 and 35 are removed as in the expected output
data = data[np.logical_not(np.logical_and(data[:,0] >= 20, data[:,0] <= 25))]
data = data[np.logical_not(np.logical_and(data[:,0] >= 30, data[:,0] <= 35))]

Or, if you want to combine everything into a single statement,

data = data[
    np.logical_not(np.logical_or(
        np.logical_and(data[:,0] >= 20, data[:,0] <= 25),
        np.logical_and(data[:,0] >= 30, data[:,0] <= 35)
    ))
]

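To explain: a conditional such as data[:,0] < 25 creates a boolean array that records, element by element, where the condition is true or false. In this case, it tells you where the first column of data is less than 25.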

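You can also index numpy arrays with these boolean arrays. A statement like data[data[:,0] > 30] extracts all rows where data[:,0] > 30 is true, that is, all rows whose first element is greater than 30. This kind of conditional indexing is how you extract the rows (or columns, or elements) you want.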
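A minimal sketch of these two steps, using a small made-up array rather than the asker's file:

```python
import numpy as np

# Small illustrative array; the first column plays the role of the row label.
data = np.array([[29, 6.185, 0.020],
                 [30, 6.189, 0.019],
                 [31, 6.191, 0.018],
                 [32, 6.188, 0.019]])

# Comparing a column with a scalar gives one boolean per row.
mask = data[:, 0] > 30
print(mask.tolist())            # [False, False, True, True]

# Indexing with the boolean array keeps only the rows where it is True.
selected = data[mask]
print(selected[:, 0].tolist())  # [31.0, 32.0]
```

The mask has one entry per row, so indexing with it selects whole rows.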

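Finally, we need logical tools that combine boolean arrays element by element. The regular and, or and not keywords do not work, because they try to treat each boolean array as a single truth value. Fortunately, numpy provides such tools in the form of np.logical_and, np.logical_or and np.logical_not. With these, we can combine boolean arrays element by element to find the rows that satisfy more complex conditions.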
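As a sketch of the same idea with operators: for boolean arrays, numpy also accepts `&`, `|` and `~` as element-wise counterparts of np.logical_and, np.logical_or and np.logical_not. Each comparison needs its own parentheses because these operators bind more tightly than comparisons. The inclusive bounds below are chosen to match the expected output in the question, and only the first column (the integers 18 to 40) is modelled.

```python
import numpy as np

# First column of the question's data: the integers 18..40.
col0 = np.arange(18, 41)

# Element-wise combination with the named functions ...
drop_func = np.logical_or(
    np.logical_and(col0 >= 20, col0 <= 25),
    np.logical_and(col0 >= 30, col0 <= 35),
)

# ... and with the equivalent operators.
drop_op = ((col0 >= 20) & (col0 <= 25)) | ((col0 >= 30) & (col0 <= 35))

same = np.array_equal(drop_func, drop_op)
kept = col0[~drop_op]
print(same)           # True
print(kept.tolist())  # [18, 19, 26, 27, 28, 29, 36, 37, 38, 39, 40]

# Python's plain `and` asks for the truth value of a whole array,
# which numpy refuses to guess for arrays with more than one element.
try:
    (col0 >= 20) and (col0 <= 25)
    plain_and_failed = False
except ValueError:
    plain_and_failed = True
print(plain_and_failed)  # True
```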

【Comments】:

Or you can do col0 = a[:,0] and then a[~(((col0>=20) & (col0<=25)) | ((col0>=30) & (col0<=35)))]

【Answer 2】:

Below is my solution to the problem of removing specific rows from a numpy array. The solution is a one-liner:

#  Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)

and relies only on numpy functions (np.bitwise_and, np.where, np.delete).

import numpy as np

A = np.array( [   [ 18, 6.215, 0.025 ],
    [ 19, 6.203, 0.025 ],
    [ 20, 6.200, 0.025 ],
    [ 21, 6.205, 0.025 ],
    [ 22, 6.201, 0.026 ],
    [ 23, 6.197, 0.026 ],
    [ 24, 6.188, 0.024 ],
    [ 25, 6.187, 0.023 ],
    [ 26, 6.189, 0.021 ],
    [ 27, 6.188, 0.020 ],
    [ 28, 6.192, 0.019 ],
    [ 29, 6.185, 0.020 ],
    [ 30, 6.189, 0.019 ],
    [ 31, 6.191, 0.018 ],
    [ 32, 6.188, 0.019 ],
    [ 33, 6.187, 0.019 ],
    [ 34, 6.194, 0.021 ],
    [ 35, 6.192, 0.024 ],
    [ 36, 6.193, 0.024 ],
    [ 37, 6.187, 0.026 ],
    [ 38, 6.184, 0.026 ],
    [ 39, 6.183, 0.027 ],
    [ 40, 6.189, 0.027 ] ] )

#  Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)

# Remove the rows whose first item is between 30 and 35
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=30), (A[:,0]<=35) ) )[0], 0)

>>> A
array([[  1.80000000e+01,   6.21500000e+00,   2.50000000e-02],
       [  1.90000000e+01,   6.20300000e+00,   2.50000000e-02],
       [  2.60000000e+01,   6.18900000e+00,   2.10000000e-02],
       [  2.70000000e+01,   6.18800000e+00,   2.00000000e-02],
       [  2.80000000e+01,   6.19200000e+00,   1.90000000e-02],
       [  2.90000000e+01,   6.18500000e+00,   2.00000000e-02],
       [  3.60000000e+01,   6.19300000e+00,   2.40000000e-02],
       [  3.70000000e+01,   6.18700000e+00,   2.60000000e-02],
       [  3.80000000e+01,   6.18400000e+00,   2.60000000e-02],
       [  3.90000000e+01,   6.18300000e+00,   2.70000000e-02],
       [  4.00000000e+01,   6.18900000e+00,   2.70000000e-02]])

【Comments】:

A = np.delete(A, np.nonzero( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0) We can replace np.where with np.nonzero and get the same result. The documentation recommends np.nonzero when np.where is not given all three arguments.
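A small check of that equivalence. As one possible variation, it also handles both intervals in a single np.delete call by combining the masks first. The array here is a reduced stand-in for A: only the first column carries real values, and the other two are dummy zeros.

```python
import numpy as np

# Stand-in for A: first column 18..40, two dummy columns of zeros.
A = np.column_stack([np.arange(18, 41), np.zeros(23), np.zeros(23)])

in_first = (A[:, 0] >= 20) & (A[:, 0] <= 25)

# With only a condition, np.where returns the same indices as np.nonzero.
idx_where = np.where(in_first)[0]
idx_nonzero = np.nonzero(in_first)[0]
same_indices = np.array_equal(idx_where, idx_nonzero)

# Combine both intervals, then delete the matching rows in one pass.
in_second = (A[:, 0] >= 30) & (A[:, 0] <= 35)
rows_to_drop = np.nonzero(in_first | in_second)[0]
B = np.delete(A, rows_to_drop, axis=0)

print(same_indices)      # True
print(B[:, 0].tolist())  # [18.0, 19.0, 26.0, 27.0, 28.0, 29.0, 36.0, 37.0, 38.0, 39.0, 40.0]
```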

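【Answer 3】:

In the special but frequent case where the selection criterion is whether a value falls within an interval, I use the abs() of the difference from the interval's midpoint, especially if midInterval has a physical meaning: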

data = data[abs(data[:,0] - midInterval) < deviation] # '<' for keeping the interval

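If the data type is integer and the midpoint is not (as in Jun's question), you can double the values instead of converting to float (for large integers the rounding error becomes > 1):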

data = data[abs(2*data[:,0] - sumOfLimits) > deltaOfLimits]

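Repeat to remove both intervals. With the limits from Jun's question (sumOfLimits = 20 + 25 and 30 + 35, deltaOfLimits = 5):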

data = data[abs(2*data[:,0] - 45) > 5]
data = data[abs(2*data[:,0] - 65) > 5]
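To see why doubling works: for integer limits lo and hi, the test lo <= x <= hi is equivalent to |2x - (lo + hi)| <= hi - lo, so every quantity stays an integer. For the closed interval from 20 to 25 the threshold is therefore 5; a threshold of 3 would leave the endpoints 20 and 25 in place. The sketch below checks the doubled form against a direct range test. The helper name `outside_interval` is hypothetical, and only the first column is modelled.

```python
import numpy as np

def outside_interval(x, lo, hi):
    """True where x lies outside the closed interval [lo, hi], using integer arithmetic only."""
    return np.abs(2 * x - (lo + hi)) > (hi - lo)

col0 = np.arange(18, 41)  # integer first column, as in the question

keep = outside_interval(col0, 20, 25) & outside_interval(col0, 30, 35)

# Direct range test for comparison.
keep_direct = ~(((col0 >= 20) & (col0 <= 25)) | ((col0 >= 30) & (col0 <= 35)))

print(np.array_equal(keep, keep_direct))  # True
print(col0[keep].tolist())                # [18, 19, 26, 27, 28, 29, 36, 37, 38, 39, 40]
```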

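【Comments】:

【Answer 4】:

You don't need the added complexity of numpy for this. I assume you are reading the file into a list of lists (each line is a list within the overall data list, like this: ((18, 6.215, 0.025), (19, 6.203, 0.025), ...)). In that case, use the following rule: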

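# Build a new list instead of calling data.remove() inside the loop,
# which would skip the row that follows each removed one.
data = [row for row in data
        if not (20 <= row[0] <= 25 or 30 <= row[0] <= 35)]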
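One caveat when filtering plain lists: calling remove() on a list while a for loop is iterating over it makes the loop skip the element that follows each removal. A sketch of the effect next to a list-comprehension alternative, on made-up rows:

```python
rows = [[19, 0.1], [20, 0.2], [21, 0.3], [22, 0.4], [26, 0.5]]

# Removing while iterating: after each removal the remaining items shift left,
# so the loop steps over the next one without testing it.
buggy = [row[:] for row in rows]
for row in buggy:
    if 20 <= row[0] <= 25:
        buggy.remove(row)
print([r[0] for r in buggy])  # [19, 21, 26]  (21 was never tested)

# Building a new list tests every row exactly once.
clean = [row for row in rows
         if not (20 <= row[0] <= 25 or 30 <= row[0] <= 35)]
print([r[0] for r in clean])  # [19, 26]
```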

【Comments】:

I use loadtxt() to read the data from the file. If I read it with readlines() instead, every item becomes a string; how should I handle that?
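One way to handle the readlines() case, sketched under the assumption that the columns are whitespace-separated as in the sample data: split each line and convert the fields with float(). An in-memory io.StringIO stands in for the real file so that the example is self-contained.

```python
import io
import numpy as np

# Stand-in for the file contents (a few rows from the question).
text = "18  6.215   0.025\n19  6.203   0.025\n20  6.200   0.025\n26  6.189   0.021\n"

# np.loadtxt parses whitespace-separated numbers into a float array directly.
data = np.loadtxt(io.StringIO(text))

# readlines() yields strings; split each line and convert every field to float.
lines = io.StringIO(text).readlines()
rows = [[float(field) for field in line.split()] for line in lines if line.strip()]

filtered = [row for row in rows
            if not (20 <= row[0] <= 25 or 30 <= row[0] <= 35)]
print(data.shape)                    # (4, 3)
print([row[0] for row in filtered])  # [18.0, 19.0, 26.0]
```

Once the fields are floats, either the list-based filter or np.array(rows) followed by boolean indexing can be applied.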