【问题标题】:Python: Minimum Average DistancePython:最小平均距离
【发布时间】:2014-10-19 21:49:00
【问题描述】:

我有一组用户的纬度和经度以及一组办公室位置纬度经度。

我必须找到与所有用户的平均距离最短的办公地点。

在 python 中执行此操作的有效方法是什么?我有 3k 用户和 40k 办公地点...

例如:

输入: 用户 1 (x1, y1)
用户 2 (x2,y2)
办公室 1 (x3,y3)
办公室 2 (x4,y4)

然后我必须找出与所有用户的平均距离最短的办公室位置。

办公室 1 距用户 1 200m,距用户 2 400m。与所有用户的平均距离 = 300m

办公室 2 距用户 1 100m,距用户 2 200m。与所有用户的平均距离 = 150m

Office 2 是首选地点。

【问题讨论】:

  • 您的意思是您正在为所有用户寻找一个办公地点吗?还是每个用户一个办公地点?
  • 为所有用户提供一个办公地点。将在上面澄清问题。
  • 找到用户的质心,然后是最近的办公室
  • 质心的想法是可以的。我想补充一点,你所有的客户都住在这里,你所有的办公室都放在一个曲面上。这个事实意味着加拿大的 1 度经度是 60 公里,墨西哥的 1 度经度是 90 公里。对于相对较小的区域,您可以使用地图投影并使用地图的直角坐标定义质心和距离,并且具有足够好的近似值,但如果您的区域很大(大陆比例,世界比例),您应该仔细检查您的结果。
  • @PeterWood 我们错了,考虑 P1=(0,0), P2=(0,0), P3=(12,0)。质心是 C=(4,0),最小总距离点(又名几何中位数)是 M=(0,0),可以通过检查来验证。文章 en.wikipedia.org/wiki/Geometric_median 解释了我们陷入的陷阱,勾勒出一种合理的计算方法,并提供了对已发表材料的参考。质心可能是一个很好的近似值,但绝对不是解决方案......

标签: python numpy distance


【解决方案1】:

这是一个使用 django 的 geodjango 部分的示例。您可以使用shapelypyproj 执行相同的操作。 (这些安装起来可能有点麻烦,但是一旦你完成了所有设置,这种工作就很简单了。

from django.contrib.gis.geos import Point, MultiPoint

WGS84_SRID = 4326
office1 = Point(x1, y1, srid=WGS84_SRID )
office2 = Point(x2, y1, srid=WGS84_SRID )

# load user locations
user_locations = []
with open("user_locations.csv", "r") as in_f:
    # assuming wgs84 decimal degrees 
    # one location per line in format, 'lon, lat'
    for line in in_f:
        x, y = [float(i.strip()) for i in line.split(",")]
        user_locations.append(Point(x, y, srid=WGS84_SRID ))

# get points in a meters projection
GOOGLE_MAPS_SRID = 3857
office1_meters = office1.transform(GOOGLE_MAPS_SRID, clone=True)
office2_meters = office2.transform(GOOGLE_MAPS_SRID, clone=True)
user_locations_meters = [user_loc.transform(GOOGLE_MAPS_SRID, clone=True) for user_loc in user_locations]

# centroid method
mp = MultiPoint(user_locations, srid=4326)
centroid_distance_from_office1 = mp.centroid.distance(office1_meters)
centroid_distance_from_office2 = mp.centroid.distance(office1_meters)

print "Centroid Location: {}".format(mp.centroid.ewkt)
print("centroid_distance_from_office1: {}m".format(centroid_distance_from_office1)
print("centroid_distance_from_office2: {}m".format(centroid_distance_from_office2)

# average distance method
total_user_locations = float(len(user_locations))
office1_user_avg_distance = sum( user_loc.distance(office1_meters) for user_loc in user_locations_meters)/total_user_locations 
office2_user_avg_distance = sum( user_loc.distance(office2_meters) for user_loc in user_locations_meters)/total_user_locations 

print "avg user distance OFFICE-1: {}".format(office1_user_avg_distance)
print "avg user distance OFFICE-2: {}".format(office2_user_avg_distance)

【讨论】:

    【解决方案2】:

    主要是代码,在http://en.wikipedia.org/wiki/Geometric_median#Computation中实现算法 并给出了一个基于一组随机点的使用示例。

    注意:这是针对平面中的点,因为我无法决定如何将两个球坐标相加...因此您必须事先用平面投影映射球坐标,但这个点已经在之前的回答中提到过。

    代码

    from math import sqrt
    from random import seed, uniform
    from operator import add
    seed(100)
    
    class Point():
        """Very basic point class, supporting "+", scalar "/" and distances."""
        def __init__(self, x, y):
            self.x = x
            self.y = y
        def __repr__(self):
            return "("+repr(self.x)+","+repr(self.y)+")"
        def __add__(self, P):
            return Point(self.x+P.x, self.y+P.y)
        def __div__(self, scalar):
            return Point(self.x/float(scalar), self.y/float(scalar))
        def delta(self, P):
            dx = self.x - P.x
            dy = self.y - P.y
            return sqrt(dx*dx+dy*dy)
    
    def iterate(GM,points):
        "Simple implementation of http://en.wikipedia.org/wiki/Geometric_median#Computation"
        # distances from the tentative GM
        distances = [GM.delta(p) for p in points]
        normalized_positions = [p/d for p,d in zip(points,distances)]
        normalization_factor = sum(1.0/d for d in distances)
        new_median = reduce(add, normalized_positions)/normalization_factor
        return new_median
    
    # The "clients"
    nclients = 10
    points = [Point(uniform(-3,3),uniform(-3,3)) for i in range(nclients)]
    
    # Centroid of clients and total of distances
    centroid = reduce(add,points)/nclients
    print "Position of centroid:",centroid
    print "Sum of distances from centroid:",
    print reduce(add,[centroid.delta(p) for p in points])
    
    
    print
    print "Compute the Geometric Median using random starting points:"
    nstart = 10
    for P0 in [Point(uniform(-5,5),uniform(-5,5)) for i in range(nstart)]:
        p0 = P0
        for i in range(10):
            P0 = iterate(P0, points)
        print p0,"-->",P0
    
    print
    print "Sum of distances from last Geometric Median:",
    print reduce(add,[P0.delta(p) for p in points])
    

    输出

    Position of centroid: (-0.4647467432024398,0.08675910209912471)
    Sum of distances from centroid: 22.846445119
    
    Compute the Geometric Median using random starting points:
    (1.2632163919279735,4.633157837008632) --> (-0.8739691868669638,-0.019827884361901298)
    (-2.8916600791314986,4.561006461166512) --> (-0.8929310891388812,-0.025857080003665663)
    (0.5539966580106901,4.011520429873922) --> (-0.8764828849474395,-0.020607834485528134)
    (3.1801819335743033,-3.395781900250662) --> (-0.8550062003820846,-0.014134334529992666)
    (1.48542908120573,-3.7590671941155627) --> (-0.8687797019011291,-0.018241177226221747)
    (-4.943549141082007,-1.044838193982506) --> (-0.9066276248482427,-0.030440865315529194)
    (2.73500702168781,0.6615770729288597) --> (-0.8231318436739281,-0.005320464433689587)
    (-3.073593440129266,3.411747144619733) --> (-0.8952513352350909,-0.026600471220747438)
    (4.137768422492282,-2.6277493707729596) --> (-0.8471586848200597,-0.011875801531868494)
    (-0.5180751681772549,1.377998063140823) --> (-0.8849056106235963,-0.02326386487180884)
    
    Sum of distances from last Geometric Median: 22.7019120091
    

    我自己的评论

    在这种情况下,位置(质心与 GM)完全不同,但结果相似。当您在某个点(城市)周围进行某种聚类时,或者关于某些特征(例如一条线(一条道路)等)时,我预计位置和平均距离会有显着差异

    最终,使用numpy 可以加快速度,由于时间资源有限,我避免使用numpy :)

    【讨论】:

      猜你喜欢
      • 2014-03-31
      • 1970-01-01
      • 1970-01-01
      • 2014-07-10
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2011-11-12
      相关资源
      最近更新 更多