【Question title】: Vectorized interpretation of distance matrix
【Posted】: 2016-05-10 22:17:06
【Question】:

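I have several points and want to determine whether any of them lie within a specific distance of each other. If so, I want to merge them into a single point. I built a search tree and obtained a distance matrix from it. Is there an elegant way (without slow loops, if possible) to determine which points are within a given distance of each other, without resorting to a complex clustering algorithm (k-means, hierarchical, etc.)?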

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors import radius_neighbors_graph

RADIUS = 0.025
points = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.229309674, 52.3840231],
    [13.489018081, 52.4180538095],
    [13.3209738098, 52.6375963146],
    [13.0160362703, 52.4187139243],
    [13.0448485, 52.4143229343],
    [13.32478977, 52.5090253],
    [13.35514839, 52.5219323867],
    [13.1982523828, 52.3592620828]
])

tree = NearestNeighbors(n_neighbors=2, radius=RADIUS, leaf_size=30, algorithm="auto", n_jobs=1).fit(points)
nnGraph = radius_neighbors_graph(tree, RADIUS, mode='distance', include_self=False)

print(nnGraph)

(0, 9)        0.0233960536484
(1, 6)        0.0172420289306
(6, 1)        0.0172420289306
(9, 0)        0.0233960536484

【Comments】:

    Tags: arrays numpy vectorization distance


    【Solution 1】:

    You can use pdist and squareform from scipy.spatial.distance for a vectorized solution, like so:

    from scipy.spatial.distance import pdist, squareform
    
    # Get pairwise euclidean distances              
    dists = squareform(pdist(points))
    
    # Get valid distances mask and the corresponding indices
    mask = dists < RADIUS
    np.fill_diagonal(mask,0)
    idx = np.argwhere(mask)
    
    # Present indices and corresponding distances as zipped output
    # (list() is needed on Python 3, where zip returns a lazy iterator)
    out = list(zip(map(tuple, idx), dists[idx[:,0], idx[:,1]]))
    

    Sample run:

    In [91]: RADIUS
    Out[91]: 0.025
    
    In [92]: points
    Out[92]: 
    array([[ 13.2043373 ,  52.38185299],
           [ 13.05306928,  52.39916687],
           [ 13.22930967,  52.3840231 ],
           [ 13.48901808,  52.41805381],
           [ 13.32097381,  52.63759631],
           [ 13.01603627,  52.41871392],
           [ 13.0448485 ,  52.41432293],
           [ 13.32478977,  52.5090253 ],
           [ 13.35514839,  52.52193239],
           [ 13.19825238,  52.35926208]])
    
    In [93]: out
    Out[93]: 
    [((0, 9), 0.023396053648436933),
     ((1, 6), 0.017242028930573985),
     ((6, 1), 0.017242028930573985),
     ((9, 0), 0.023396053648436933)]
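
    The output above lists every close pair twice, once as (i, j) and once as (j, i), and the full N x N matrix grows quadratically with the number of points. As a rough sketch of an alternative, assuming SciPy's cKDTree is acceptable here, query_pairs reports each pair once with i < j and avoids building the dense matrix. Note that query_pairs includes pairs at exactly the given distance, whereas the mask above uses a strict comparison.

```python
import numpy as np
from scipy.spatial import cKDTree

RADIUS = 0.025
points = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.229309674, 52.3840231],
    [13.489018081, 52.4180538095],
    [13.3209738098, 52.6375963146],
    [13.0160362703, 52.4187139243],
    [13.0448485, 52.4143229343],
    [13.32478977, 52.5090253],
    [13.35514839, 52.5219323867],
    [13.1982523828, 52.3592620828],
])

tree = cKDTree(points)

# Each pair (i, j) with i < j and distance <= RADIUS, as an (n_pairs, 2) array
pairs = tree.query_pairs(r=RADIUS, output_type="ndarray")

# Sort rows by (i, j) so the order is predictable
pairs = pairs[np.lexsort((pairs[:, 1], pairs[:, 0]))]

# Distances for the reported pairs only
pair_dists = np.linalg.norm(points[pairs[:, 0]] - points[pairs[:, 1]], axis=1)

for (i, j), d in zip(pairs, pair_dists):
    print((int(i), int(j)), d)
```

    For the ten points from the question this reports the pairs (0, 9) and (1, 6), matching the unique entries of the sample run.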
    

    【Comments】:

    【Solution 2】:

    For a small number of points, the approach from "Efficiently Calculating a Euclidean Distance Matrix Using Numpy" can be used:

    pointsCmplx = np.array([points[...,0] + 1j * points[...,1]])
    dists = abs(pointsCmplx.T - pointsCmplx)
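
    The trick works because, for 2-D points encoded as complex numbers z = x + iy, the modulus |z_i - z_j| equals the Euclidean distance between points i and j. It does not extend to more than two dimensions. A small check, written as a sketch with an illustrative subset of the points from the question, compares it against squareform(pdist(...)):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Illustrative subset of the points from the question
pts = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.0448485, 52.4143229343],
    [13.1982523828, 52.3592620828],
])

# Encode each 2-D point as one complex number: x + i*y
z = pts[:, 0] + 1j * pts[:, 1]

# Broadcasting a column against a row gives all pairwise differences;
# the modulus of each difference is the Euclidean distance
dists_complex = np.abs(z[:, None] - z[None, :])

# Reference result from SciPy
dists_pdist = squareform(pdist(pts))

print(np.allclose(dists_complex, dists_pdist))
```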
    

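    My goal is to obtain points that do not overlap with respect to the radius. I took your code and removed the lower triangular part of the matrix, and in the end I simply delete the second point of each pair. The points are sorted by a specific observation: a lower index means higher importance. Are there any other suggestions for efficiently merging nearby clusters instead of deleting points? I am looking for a very fast solution and do not want to use a complex clustering algorithm.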

    # overlapping points
    points = np.array([
        [13.2043373032, 52.3818529896],
        [13.0530692845, 52.3991668707],
        [13.229309674, 52.3840231],
        [13.489018081, 52.4180538095],
        [13.3209738098, 52.6375963146],
        [13.0160362703, 52.4187139243],
        [13.0448485, 52.4143229343],
        [13.32478977, 52.5090253],
        [13.35514839, 52.5219323867],
        [13.1982523828, 52.3592620828],
        [13.1982523828, 52.3592620830]       # nearly identical
    ])
    
    dists = squareform(pdist(points))
    mask = dists < RADIUS
    np.fill_diagonal(mask,0)
    
    # delete lower triangular matrix
    mask = np.triu(mask)
    idx = np.argwhere(mask)
    
    # delete the target ids
    idx = idx[:,1]   
    points = np.delete(points, idx, 0)
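
    One possible way to merge instead of delete, offered only as a sketch and not as the approach used above: treat the boolean mask as an adjacency matrix, label its connected components with scipy.sparse.csgraph.connected_components, and replace each component by the mean of its members. The helper name merge_close_points is hypothetical. Note that merging is transitive here: if A is close to B and B is close to C, all three are merged even when A and C are farther apart than RADIUS. Averaging also ignores the "lower index is more important" ordering.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

RADIUS = 0.025
points = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.229309674, 52.3840231],
    [13.489018081, 52.4180538095],
    [13.3209738098, 52.6375963146],
    [13.0160362703, 52.4187139243],
    [13.0448485, 52.4143229343],
    [13.32478977, 52.5090253],
    [13.35514839, 52.5219323867],
    [13.1982523828, 52.3592620828],
    [13.1982523828, 52.3592620830],  # nearly identical
])


def merge_close_points(points, radius):
    """Hypothetical helper: merge points linked by distances < radius."""
    # Boolean adjacency: True where two distinct points are closer than radius
    mask = squareform(pdist(points)) < radius
    np.fill_diagonal(mask, False)

    # Label each point with the id of its connected component
    n_groups, labels = connected_components(csr_matrix(mask), directed=False)

    # Per-group mean of every coordinate via weighted bincount (no Python loop over points)
    counts = np.bincount(labels, minlength=n_groups)
    merged = np.column_stack([
        np.bincount(labels, weights=points[:, k], minlength=n_groups) / counts
        for k in range(points.shape[1])
    ])
    return merged, labels


merged, labels = merge_close_points(points, RADIUS)
print(merged.shape)
print(labels)
```

    With the eleven points above, points 0, 9 and 10 end up in one group and points 1 and 6 in another, so eight merged points remain.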
    

    【Comments】:
