在numpy中查找两个多边形之间的距离答案

【问题标题】：Finding the distance between two polygons in numpy在numpy中查找两个多边形之间的距离
【发布时间】：2020-02-15 07:43:45
【问题描述】：

我有两个多边形，P 和 Q，其中多边形的外部线性环由两个封闭的点集定义，存储为 numpy 数组，以逆时针方向连接。 P 和 Q 的格式如下：

P['x_coords'] = [299398.56 299402.16 299410.25 299419.7  299434.97 299443.75 299454.1 299465.3  299477.   299488.25 299496.8  299499.5  299501.28 299504. 299511.62 299520.62 299527.8  299530.06 299530.06 299525.12 299520.2 299513.88 299508.5  299500.84 299487.34 299474.78 299458.6  299444.66 299429.8  299415.4  299404.84 299399.47 299398.56 299398.56] 
P['y_coords'] = [822975.2  822989.56 823001.25 823005.3  823006.7  823005.06 823001.06 822993.4  822977.2  822961.   822943.94 822933.6  822925.06 822919.7 822916.94 822912.94 822906.6  822897.6  822886.8  822869.75 822860.75 822855.8  822855.4  822857.2  822863.44 822866.6  822870.6  822876.94 822886.8  822903.   822920.3  822937.44 822954.94 822975.2]

Q['x_coords'] = [292316.94 292317.94 292319.44 292322.47 292327.47 292337.72 292345.75 292350.   292352.75 292353.5  292352.25 292348.75 292345.75 292342.5 292338.97 292335.97 292333.22 292331.22 292329.72 292324.72 292319.44 292317.2  292316.2  292316.94]
Q['y_coords'] = [663781.   663788.25 663794.   663798.06 663800.06 663799.3  663796.56 663792.75 663788.5  663782.   663773.25 663766.   663762.   663758.25 663756.5  663756.25 663757.5  663761.   663763.75 663767.5  663769.5 663772.25 663777.5  663781.  ]

## SIMPLIFIED AND FORMATTED FOR EASY TESTING:
import numpy as np

px_coords = np.array([299398,299402,299410.25,299419.7,299398])
py_coords = np.array([822975.2,822920.3,822937.44,822954.94,822975.2])

qx_coords = np.array([292316,292331.22,292329.72,292324.72,292319.44,292317.2,292316])
qy_coords = np.array([663781,663788.25,663794,663798.06,663800.06,663799.3,663781])

P的外环由P['x_coords'][0], P['y_coords'][0] -> P['x_coords'][1], P['y_coords'][1]等连接而成。每个数组的最后一个坐标与第一个相同，表示形状是拓扑闭合的。

是否可以使用 numpy 以几何方式计算 P 和 Q 的外环之间的简单最小距离？我在 SO 上进行了高低搜索，但没有找到任何明确的内容，所以我怀疑这可能是对一个非常复杂的问题的过度简化。我知道距离计算可以使用 GDAL 或 Shapely 等开箱即用的空间库来完成，但我很想通过在 numpy 中从头开始构建一些东西来了解这些是如何工作的。

我考虑过或尝试过的一些事情：

计算两个阵列中每个点之间的距离。这不起作用，因为 P 和 Q 之间的最近点可以是边-顶点对。使用每个形状的凸包，使用scipy.spatial计算有同样的问题。
一种低效的蛮力方法计算每对点之间的距离，以及边缘点对的每个组合

有没有更好的方法来解决这个问题？

【问题讨论】：

有很多方法可以做到这一点，但首先你需要确定你所说的“距离”是什么意思。可能最常见的方法是计算每个多边形的质心在哪里，然后计算质心之间的距离或找到质心之间的向量，然后计算边缘与该向量相交的位置。
好点 - 我正在寻找两组外环之间的最小距离。我已经更新了我的问题以反映这一点
哈！看到您的编辑后，我才更新了我的评论
我认为你需要点点距离和线点距离，见stackoverflow.com/a/56634290/7916438的中间部分（水平线下方）
有没有关于凸性和交集的假设？

标签： python numpy geometry

【解决方案1】：

在 k-d 树上有 many variations 用于存储具有范围的对象，例如多边形的边 .我最熟悉（但没有链接）的方法涉及将轴对齐的边界框与每个节点相关联；叶子对应于对象，内部节点的盒子是最小的包围其两个孩子的盒子（通常重叠）。通常的中值切割方法应用于对象框的中点（对于线段，这是它们的中点）。

为每个多边形构建了这些后，以下双重递归会找到最接近的方法：

def closest(k1,k2,true_dist):
  return _closest(k1,0,k2,0,true_dist,float("inf"))

def _closest(k1,i1,k2,i2,true_dist,lim):
  b1=k1.bbox[i1]
  b2=k2.bbox[i2]
  # Call leaves their own single children:
  cc1=k1.child[i1] or (i1,)
  cc2=k2.child[i2] or (i2,)
  if len(cc1)==1 and len(cc2)==1:
    return min(lim,true_dist(i1,i2))
  # Consider 2 or 4 pairs of children, possibly-closest first:
  for md,c1,c2 in sorted((min_dist(k1.bbox[c1],k2.bbox[c2]),c1,c2)
                         for c1 in cc1 for c2 in cc2):
    if md>=lim: break
    lim=min(lim,_closest(k1,c1,k2,c2,true_dist,lim)
  return lim

注意事项：

两条不相交的线段之间的最接近true_dist 必须至少包含一个端点。
点和线段之间的距离可以大于点和包含线段的线之间的距离。
无需点对点检查：将通过相邻边找到这样的一对（四次）。
min_dist 的边界框参数可能重叠，在这种情况下它必须返回 0。

【讨论】：

谢谢，明天会努力理解/实施
嗨@Davis Herring，我今天早上一直在尝试在我的解决方案中实现一个kd-tree。我试图通过你的双重递归来提供帮助，但我有几个问题。 1) kd-tree 是仅建立在“目标”多边形上，然后由另一个多边形中的每个点查询，还是两个多边形都被索引？ 2) kd-trees 似乎仅用于点-您提到线段的中点可以用作节点。这向我表明 kd-tree 是建立在中点上的，但我认为我不能正确理解，因为在某些情况下这可能会导致错误的结果？
@bm13563：每个多边形都有一棵树。通过递归分析一棵树来回答一些查询（例如“从多边形到点的最小距离”）；该算法同时在两者上递归。中点用于帮助划分（边界框的）边，但边界框是存储在树中的（这种特殊设计！），它们保证了完整的分析。跨度>
感谢您的帮助。不幸的是，我真的无法掌握构建 kdtree - 我尝试调整此实现以接受每个段 github.com/Vectorized/Python-KD-Tree/blob/master/kdtree.py 的两个点，但刚刚结束了一堆可怕的列表。我想当我更好地掌握递归时，我需要回到这个答案
很抱歉不断回复您。我编写了自己的脚本来构建/查询 kd 树的点，比上面链接的更简单，这有助于我更好地理解 kd 树和递归。现在我的分数已经下降了，我又回到了线段。节点是仅建立在边缘中点上，还是两个端点也用于划分树？如果仅使用中点来划分树，我很难看到您如何参考端点（距离计算可能需要这些端点）

【解决方案2】：

感谢 Davis Herring 的回答 - 他不是我最终使用的解决方案（因为我对递归不是很熟悉），但我使用了他概述的原则来开发解决方案。正如戴维斯所建议的那样，我计划在此解决方案中建立一个索引，以帮助处理非常大的多边形。

我最后使用了一种蛮力方法，该方法比较了两个多边形的每条边之间的距离，计算了距离，并跟踪了最小距离。我改编了这个问题中提供的答案：Shortest distance between two line segments。这种方法循环很重，运行速度很慢，所以我将其调整为在 cython 中运行以提高效率。

pure python
shape a edges: 33
shape b edges: 15
total loops: 1000
total time = 6.889256715774536
average time per loop = 0.006896152868643179
max time per loop = 0.022176027297973633
min time per loop = 0.0

cython loop
shape a edges: 33
shape b edges: 15
total loops: 1000
total time = 0.046829938888549805
average time per loop = 4.687681570425406e-05
max time per loop = 0.015621423721313477
min time per loop = 0.0

为了清楚起见，我附上了下面代码的纯python版本，如果需要，可以提供cython版本。

import numpy as np
import time
import math


def segments_distance(x11, y11, x12, y12, x21, y21, x22, y22):
    if segments_intersect(x11, y11, x12, y12, x21, y21, x22, y22): return 0
    distances = []
    distances.append(point_segment_distance(x11, y11, x21, y21, x22, y22))
    distances.append(point_segment_distance(x12, y12, x21, y21, x22, y22))
    distances.append(point_segment_distance(x21, y21, x11, y11, x12, y12))
    distances.append(point_segment_distance(x22, y22, x11, y11, x12, y12))
    return min(distances)


def segments_intersect(x11, y11, x12, y12, x21, y21, x22, y22):
    dx1 = x12 - x11
    dy1 = y12 - y11
    dx2 = x22 - x21
    dy2 = y22 - y21
    delta = dx2 * dy1 - dy2 * dx1
    if delta == 0: return False  # parallel segments
    s = (dx1 * (y21 - y11) + dy1 * (x11 - x21)) / delta
    t = (dx2 * (y11 - y21) + dy2 * (x21 - x11)) / (-delta)
    return (0 <= s <= 1) and (0 <= t <= 1)


def point_segment_distance(px, py, x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    if dx == dy == 0:  # the segment's just a point
        return math.hypot(px - x1, py - y1)

    # Calculate the t that minimizes the distance.
    t = ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)

    # See if this represents one of the segment's
    # end points or a point in the middle.
    if t < 0:
        dx = px - x1
        dy = py - y1
    elif t > 1:
        dx = px - x2
        dy = py - y2
    else:
        near_x = x1 + t * dx
        near_y = y1 + t * dy
        dx = px - near_x
        dy = py - near_y

    return math.hypot(dx, dy)

px_coords=np.array([299398.56,299402.16,299410.25,299419.7,299434.97,299443.75,299454.1,299465.3,299477.,299488.25,299496.8,299499.5,299501.28,299504.,299511.62,299520.62,299527.8,299530.06,299530.06,299525.12,299520.2,299513.88,299508.5,299500.84,299487.34,299474.78,299458.6,299444.66,299429.8,299415.4,299404.84,299399.47,299398.56,299398.56])
py_coords=np.array([822975.2,822989.56,823001.25,823005.3,823006.7,823005.06,823001.06,822993.4,822977.2,822961.,822943.94,822933.6,822925.06,822919.7,822916.94,822912.94,822906.6,822897.6,822886.8,822869.75,822860.75,822855.8,822855.4,822857.2,822863.44,822866.6,822870.6,822876.94,822886.8,822903.,822920.3,822937.44,822954.94,822975.2])
qx_coords=np.array([384072.1,384073.2,384078.9,384085.7,384092.47,384095.3,384097.12,384097.12,384093.9,384088.9,384082.47,384078.9,384076.03,384074.97,384073.53,384072.1])
qy_coords=np.array([780996.8,781001.1,781003.6,781003.6,780998.25,780993.25,780987.9,780981.8,780977.5,780974.7,780974.7,780977.2,780982.2,780988.25,780992.5,780996.8])

px_edges = np.stack((px_coords, np.roll(px_coords, -1)),1)
py_edges = np.stack((py_coords, np.roll(py_coords, -1)),1)
p_edges = np.stack((px_edges, py_edges), axis=-1)[:-1]

qx_edges = np.stack((qx_coords, np.roll(qx_coords, -1)),1)
qy_edges = np.stack((qy_coords, np.roll(qy_coords, -1)),1)
q_edges = np.stack((qx_edges, qy_edges), axis=-1)[:-1]

timings = []
for i in range(1,1000):
    start = time.time()

    edge_distances = [segments_distance(p_edges[n][0][0],p_edges[n][0][1],p_edges[n][1][0],p_edges[n][1][1],q_edges[m][0][0],q_edges[m][0][1],q_edges[m][1][0],q_edges[m][1][1]) for m in range(0,len(q_edges)) for n in range(0,len(p_edges))]

    end = time.time() - start
    timings.append(end)

print(f'shape a edges: {len(px_coords)}')
print(f'shape b edges: {len(qy_coords)}')
print(f'total loops: {i+1}')
print(f'total time = {sum(timings)}')
print(f'average time per loop = {sum(timings)/len(timings)}')
print(f'max time per loop = {max(timings)}')
print(f'min time per loop = {min(timings)}')

【讨论】：

但是请注意，对于较大的输入，k-d 树算法比您使用的蛮力方法更有效。
啊，我明白你的意思了——我以为它们只能在多边形之间使用，但你当然是对的，如果多边形足够大，kd 树会有所帮助。我现在将编辑我的答案