Identify similar numbers from several lists

【Question title】: Identify similar numbers from several lists
【Posted】: 2020-08-29 23:59:25
【Question description】:

I have 3 lists:

r=[0.611695403733703, 0.833193902333201, 1.09120811998494]
g=[0.300675698437847, 0.612539072191236, 1.18046695352626]
b=[0.00668849762984564, 0.611946522017357, 1.16778502636141]

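I want to compute the average of the most similar numbers. In the example above, r[0], g[1] and b[1] are very similar (approximately 0.61...). How can I identify this pattern?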

【Question comments】:

There is no numpy in this question .. why is it tagged with it, or should it be?
If numpy offers a more concise solution, converting is just x=np.array(x)

Tags: python-3.x list numpy pattern-matching

【Solution 1】:

Brute force using list comprehensions:

r=[0.611695403733703, 0.833193902333201, 1.09120811998494]
g=[0.300675698437847, 0.612539072191236, 1.18046695352626]
b=[0.00668849762984564, 0.611946522017357, 1.16778502636141]


# Compare every element of one list with every element of another and keep
# (index, index, value, value) when the two values differ by less than 0.001.
rg = [(idx_r, idx_g, rr, gg) if abs(rr - gg) < 0.001 else None
      for idx_r, rr in enumerate(r)
      for idx_g, gg in enumerate(g)]

rb = [(idx_r, idx_b, rr, bb) if abs(rr - bb) < 0.001 else None
      for idx_r, rr in enumerate(r)
      for idx_b, bb in enumerate(b)]

gb = [(idx_g, idx_b, gg, bb) if abs(gg - bb) < 0.001 else None
      for idx_g, gg in enumerate(g)
      for idx_b, bb in enumerate(b)]

# filter() is lazy in Python 3, so wrap it in list() to print the matches
print(list(filter(None, rg + rb + gb)))

Output:

[(0, 1, 0.611695403733703, 0.612539072191236),
 (0, 1, 0.611695403733703, 0.611946522017357),
 (1, 1, 0.612539072191236, 0.611946522017357)]

Each tuple holds the index in the first list of the pair, the index in the second list, and the two matching values. The pairs are listed in the order r/g, r/b, g/b.
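
The question also asks for the average of the most similar numbers, which the pairwise matches above do not compute. The following is only a minimal sketch of one way to do that. It assumes that "most similar" means picking one value from each list so that the spread (max minus min) is smallest. The helper name closest_triple is hypothetical and not part of the original answer.

```python
from itertools import product
from statistics import mean

r = [0.611695403733703, 0.833193902333201, 1.09120811998494]
g = [0.300675698437847, 0.612539072191236, 1.18046695352626]
b = [0.00668849762984564, 0.611946522017357, 1.16778502636141]


def closest_triple(*lists):
    """Hypothetical helper: return one value per list with the smallest max-min spread."""
    # product() enumerates every combination that takes one element from each list
    return min(product(*lists), key=lambda combo: max(combo) - min(combo))


best = closest_triple(r, g, b)
average = mean(best)

print(best)     # the three values near 0.61
print(average)  # roughly 0.61206
```

This checks every combination, so the cost grows with the product of the list lengths. That is fine for short lists like these but may be too slow for long ones.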

【Discussion】:

【Solution 2】:

You are computing the distances between every pair of points. The best tool for this is scipy.spatial.distance.cdist:

from scipy.spatial.distance import cdist
import numpy as np

r=[0.611695403733703, 0.833193902333201, 1.09120811998494]
g=[0.300675698437847, 0.612539072191236, 1.18046695352626]
b=[0.00668849762984564, 0.611946522017357, 1.16778502636141]

arr = np.array([r,g,b])
# need 2d set of points
arr_flat = arr.ravel()[:, np.newaxis]

# computes distance between every point, pairwise
dists = cdist(arr_flat, arr_flat)
# (1,2) is the same as (2,1), so only consider each pair once
# ie. use upper triangle
dists = np.triu(dists)
# set 0 values to inf so we don't consider them
# (the diagonal and the zeroed lower triangle)
dists[dists == 0] = np.inf

# get all pairs that are below this threshold level
thold = 0.01
coords = np.nonzero(dists < thold)

labels = 'rgb'
print(f'Pairs of points closer than {thold}:')
for i, j in zip(*coords):
    print(labels[i//3] + f'[{i%3}]', labels[j//3] + f'[{j%3}]')

>>> Pairs of points closer than 0.01:
    r[0] g[1]
    r[0] b[1]
    g[1] b[1]

# can easily count the number of points as
np.count_nonzero(dists<thold)
>>> 3
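
To go from the close pairs to the average the question asks for, one option is to collect every element that has a close partner in a different list and take the mean of those elements. This is only a sketch under assumptions. It uses plain NumPy broadcasting instead of cdist, reuses the 0.01 threshold, and assumes only one group of similar values exists (several groups would be averaged together). The names list_id, members and cluster_mean are illustrative.

```python
import numpy as np

r = [0.611695403733703, 0.833193902333201, 1.09120811998494]
g = [0.300675698437847, 0.612539072191236, 1.18046695352626]
b = [0.00668849762984564, 0.611946522017357, 1.16778502636141]

arr = np.array([r, g, b])
flat = arr.ravel()
thold = 0.01

# absolute difference between every pair of flattened elements
diffs = np.abs(flat[:, None] - flat[None, :])

# which list each flattened element came from (0 = r, 1 = g, 2 = b)
list_id = np.repeat(np.arange(arr.shape[0]), arr.shape[1])

# a pair counts only if it is below the threshold and spans two different lists
close = (diffs < thold) & (list_id[:, None] != list_id[None, :])

# flat indices of elements that have at least one close partner
members = np.flatnonzero(close.any(axis=1))
cluster_mean = flat[members].mean()

print(members)       # flat indices; index // 3 is the list, index % 3 the position
print(cluster_mean)  # roughly 0.61206
```

Requiring the two elements to come from different lists also excludes the diagonal, so no extra step is needed to ignore an element's zero distance to itself.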

【Discussion】: