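【Question title】: How to find the 2 nearest points?
【Posted】: 2019-11-04 04:49:26
【Question】:

I am new to Python and don't know how to approach this task: I have 2 dataframes, and for each point in the points dataframe I need to find the 2 nearest points in the trajectory dataframe.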

Trajectory dataframe:

datetime                 lon_deg              lat_deg
2501    28.03.2018 11:58    13.35994653 48.59990204
2502    28.03.2018 11:58    13.35880586 48.60004335
2503    28.03.2018 11:59    13.35766636 48.600205100000004
2504    28.03.2018 11:59    13.35653218 48.60039648
2505    28.03.2018 12:00    13.35539451 48.60058775
2506    28.03.2018 12:00    13.35426064 48.60079647
2507    28.03.2018 12:01    13.3531299  48.60096096
2508    28.03.2018 12:01    13.352004   48.60099219

Points dataframe:

datetime    lon_deg                        lat_deg
2018-01-29 08:08:59.000 13.359284659333333  48.600108882
29.01.2018 8:09 13.358371081166666  48.60023545666667
2018-01-29 08:09:19.000 13.358347605833334  48.600238692333335
29.01.2018 8:09 13.358324105166666  48.600241913333335
2018-01-29 08:09:20.000 13.358300611666667  48.600245154666666
29.01.2018 8:09 13.358277134    48.600248416
2018-01-29 08:09:21.000 13.358253648166666  48.60025165216667
2018-01-29 08:09:54.000 13.356701967    48.60046564733333
29.01.2018 8:09 13.356678427    48.6004688765
2018-01-29 08:09:55.000 13.356654635    48.6004718285
29.01.2018 8:09 13.356443313166666  48.600502414833336
2018-01-29 08:10:00.000 13.356419901333334  48.60050610933333
29.01.2018 8:10 13.356396262666667  48.600509612
2018-01-29 08:10:09.000 13.355999669    48.6005754975
29.01.2018 8:10 13.355976287333334  48.600579365
2018-01-29 08:10:10.000 13.355952748166667  48.60058305983333
29.01.2018 8:10 13.355929286666667  48.600586781666664
2018-01-29 08:10:11.000 13.355905869    48.6005904815
29.01.2018 8:10 13.355882745166667  48.60059446966667
2018-01-29 08:10:12.000 13.355859396333333  48.600598258666665
29.01.2018 8:10 13.3558361535   48.600602143
2018-01-29 08:10:13.000 13.355812639    48.600605769
29.01.2018 8:10 13.355789295666666  48.60060949333333
2018-01-29 08:10:14.000 13.355765727833333  48.60061298866667
29.01.2018 8:10 13.355742236833333  48.60061659483333
2018-01-29 08:10:15.000 13.3557187615   48.60062014216667
29.01.2018 8:10 13.355695496166666  48.60062391466667
2018-01-29 08:10:16.000 13.35567225 48.600627667833336
29.01.2018 8:10 13.355649023166666  48.600631406
2018-01-29 08:10:17.000 13.355625505    48.60063494533333
29.01.2018 8:10 13.3556019655   48.60063844983333
2018-01-29 08:10:18.000 13.355578551333334  48.60064199316667
29.01.2018 8:10 13.355461117166668  48.60065928433333
2018-01-29 08:10:21.000 13.355437626833334  48.600662660333334
2018-01-29 08:10:24.000 13.3552968655   48.600682845166666
29.01.2018 8:10 13.3552734295   48.600686212333336
2018-01-29 08:10:25.000 13.355249975    48.600689552333336
2018-01-29 08:10:29.000 13.355062269    48.6007157075
29.01.2018 8:10 13.355038871833333  48.60071868083333
2018-01-29 08:10:30.000 13.355015400166666  48.6007218995
29.01.2018 8:10 13.354991943833333  48.60072502533333
2018-01-29 08:10:31.000 13.354968547333334  48.60072815216667
29.01.2018 8:10 13.353912527    48.60085315883333
2018-01-29 08:10:54.000 13.353889066666667  48.60085595533333
2018-01-29 08:11:00.000 13.353607144333333  48.60088610016667

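I would appreciate any help!

【Comments】:

  • The timestamps in the two datasets (the leftmost column, named datetime) appear to be very different. Are they relevant in some way, or can they be ignored? Second, what Earth model do you want to use: a spherical approximation of the Earth's shape, or the more realistic geodetic ellipsoid representation (the ellipsoid is more complicated)?
  • In my case, datetime works like an id for the point, so I know which point is meant. I was thinking of converting the coordinates to the UTM coordinate system; is that right?
  • It doesn't really matter which coordinates you use; what matters more is whether you have the right way to measure distance. Personally, I think converting to UTM is expensive, laborious and unnecessary. You just need to use the metric tensor of the sphere or the ellipsoid (whichever Earth model you use) to measure distances between points. For each point whose distance to the reference trajectory you want to check, just use Cartesian coordinates and a linear correction of the distance, coming from the metric tensor evaluated at that point.
  • So I can use geocentric (Cartesian) coordinates x, y? In that case, which formula can I use to calculate the distance?
  • No, geocentric Cartesian coordinates are three-dimensional. I mean two-dimensional coordinates, which can be thought of as coordinates on the plane tangent to the Earth's surface at a given point.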
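
To make the tangent-plane idea from the comments concrete, here is a minimal sketch in Python. It assumes a spherical Earth with mean radius 6371000 m; the function names `tangent_plane_offset` and `local_distance_m` are made up for this illustration and do not come from the thread.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumption: spherical Earth, mean radius in metres


def tangent_plane_offset(lon0_deg, lat0_deg, lon_deg, lat_deg):
    """Return (east, north) offsets in metres of a point relative to (lon0, lat0)."""
    deg = math.pi / 180.0
    # Metric of the sphere evaluated at lat0:
    #   ds^2 = R^2 cos^2(lat0) dlon^2 + R^2 dlat^2
    east = EARTH_RADIUS_M * math.cos(lat0_deg * deg) * (lon_deg - lon0_deg) * deg
    north = EARTH_RADIUS_M * (lat_deg - lat0_deg) * deg
    return east, north


def local_distance_m(lon0_deg, lat0_deg, lon_deg, lat_deg):
    """Approximate distance in metres, valid for points close to (lon0, lat0)."""
    east, north = tangent_plane_offset(lon0_deg, lat0_deg, lon_deg, lat_deg)
    return math.hypot(east, north)


# First row of the points dataframe against the first trajectory row
print(local_distance_m(13.359284659333333, 48.600108882, 13.35994653, 48.59990204))
```

The approximation is linear around the reference point, so it should only be trusted for points within a few kilometres of it.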

Tags: python math geo


【Solution 1】:

I guess this depends heavily on the size of your data.

A brute-force approach would be something like:

import numpy as np

# Random stand-ins for the two dataframes: one (x, y) pair per row
points_dataframe = np.random.rand(20, 2)
trajectory_dataframe = np.random.rand(5, 2)

print('points_dataframe:')
print(points_dataframe)
print('\n\ntrajectory_dataframe:')
print(trajectory_dataframe)
print('\n\n')

for index_points, (x1, y1) in enumerate(points_dataframe):

    # Euclidean distance from this point to every trajectory point
    distances = np.sqrt((trajectory_dataframe[:, 0] - x1)**2
                        + (trajectory_dataframe[:, 1] - y1)**2)

    # Indices of the two smallest distances, nearest first
    point0, point1 = np.argsort(distances)[:2]

    print('+' * 81)
    print(f'for element {index_points} in the points_dataframe the two closest points are:')
    print(f'element {point0} from the trajectory_dataframe')
    print(f'element {point1} from the trajectory_dataframe')

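But when the dataset is larger, or when you have to repeat the computation often, you should perhaps consider keeping the data in a geocoded database.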
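
Before reaching for a database, a cheaper step may be to avoid sorting every distance list. The sketch below is one possible variant, not the answer's own method: it uses `heapq.nsmallest` from the standard library to pick the two nearest trajectory rows in a single pass, and scales longitude differences by cos(latitude) so both axes are comparable. The function name `two_nearest` is hypothetical; the coordinates are taken from the question.

```python
import heapq
import math


def two_nearest(point, trajectory):
    """Return the indices of the two trajectory points closest to `point`.

    point      -- (lon_deg, lat_deg)
    trajectory -- sequence of (lon_deg, lat_deg)
    """
    lon0, lat0 = point
    # Shrink longitude differences by cos(latitude) so that both axes are in
    # comparable units; adequate for points a few kilometres apart.
    k = math.cos(math.radians(lat0))

    def squared_distance(i):
        lon, lat = trajectory[i]
        return ((lon - lon0) * k) ** 2 + (lat - lat0) ** 2

    # nsmallest scans the candidates once instead of sorting all distances
    return heapq.nsmallest(2, range(len(trajectory)), key=squared_distance)


trajectory = [
    (13.35994653, 48.59990204),
    (13.35880586, 48.60004335),
    (13.35766636, 48.600205100000004),
    (13.35653218, 48.60039648),
]
points = [
    (13.359284659333333, 48.600108882),
    (13.356701967, 48.60046564733333),
]
for p in points:
    print(p, "->", two_nearest(p, trajectory))
```

The returned values are positions in the `trajectory` list, nearest first; mapping them back to dataframe index labels is left to the caller.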

【Comments】:

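    【Solution 2】:

    Here is some code written in Matlab that might help. If it is useful, you will have to convert it to Python. The approach is brute force and not the most elegant. However, I have tried to include an approximate coordinate conversion that accounts for the Earth's shape as an ellipsoid. If you assume the Earth is a sphere, things simplify a bit. Alternatively, for more precision (although the gain is most likely negligible), you can locally approximate the ellipsoid's surface by the surface of a sphere (the sphere that best fits the ellipsoid at the given point) and use spherical instead of Euclidean geometry.

    There may be some typos or mistakes, but perhaps you can get the idea of the coordinates, the conversions and the method.

    With the following two functions you can:

    1. convert geodetic (i.e. longitude-latitude) coordinates near the point long_lat0 = [long0, lat0] to Euclidean coordinates, which are a first-order linear approximation of the true geodetic coordinates on the WGS84 Earth ellipsoid

    2. conversely, convert the Euclidean coordinates back to geodetic longitude-latitude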

    long_lat0 = [long0, lat0]; % a point from dataset 2
    long_lat % the n x 2 matrix of points from dataset 1 (or a chunk of it) 
    
    %center of approximate Euclidean coordinate system is point long_lat0 
    % with long_lat coordinates and the scaling coefficient 
    % a of longitude and b of latitude, 
    % which equalizes longitude and latitude distance at point long_lat0, is
    
    function  [x, a, b] = convert_to_local_Eucl(long_lat, long_lat0) 
    
       % long_lat0 = [long_0, lat_0] is the origin of the local coordinate system 
       % long_lat  = [long_1, lat_1;
       %              long_2, lat_2;
       %              ............
       %              long_n, lat_n]  is an n x 2 array of points in lat and long coordinates 
       %  on the Earth's ellipsoid
       %  x = [x_1, y_1;
       %       x_2, y_2;
       %      ..........
       %       x_n, y_n] 
       % is the n x 2 matrix of Euclidean coordinates with origin the point long_lat0 
       % a is a number, correction factor of longitude coordinate
       % b is a number, correction factor of latitude
    
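       R = 6378137.0;                              % WGS84 semi-major axis, in meters
       e_2 = ( R^2 - (6356752.314245)^2 ) / R^2;   % first eccentricity squared
       w = 1 - e_2 * sind(long_lat0(2))^2;
       a = (pi/180) * R * cosd(long_lat0(2)) / w^(1/2); % dlong: meters per degree of longitude
       b = (pi/180) * R * (1-e_2) / w^(3/2);            % dlat: meters per degree of latitude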
       % a and b are correcting/rescaling coefficients 
       % that correct the longitude-latitude coordinates of all points 
       % near point long_lat0 in geodetic coordinates of WGS84.
    
       x = long_lat - long_lat0; % subtract the long_lat0 from the coordinates of every point 
       % from the list long_lat, i.e. for each  j = 1...n
       %  x(j, 1) = long_lat(j, 1) - long_lat0(1); 
       %  x(j, 2) = long_lat(j, 2) - long_lat0(2); 
    
       x = [ a * x(:,1),  b * x(:, 2)]; 
       % multiply the first column of coordinates by the scaling factor a and 
       % multiply the second column of coordinates by the scaling factor b 
       % these coordinates are first order linear Euclidean approximation 
       % of the real geodetic coordinates of WGS84. 
       % Near the point long_lat0 
       % the error is negligible, especially within a couple of kilometers. 
       % The farther you go from that point, the error slowly increases, 
       % but then it doesn't matter since such points are not the closest anyway.    
    
    end
    
    function  long_lat = convert_to_long_lat(x, long_lat0, a, b) 
    
       % from Euclidean coordinates x = [x(1), x(2)] of a point near long_lat0 go back to 
       % long_lat = [long, lat] coordinates of that points. a and b are the scaling
       % coefficients at point long_lat0
    
       long_lat = [long_lat0(1) + x(1)/a,  long_lat0(2) + x(2)/b];
    
    end
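
Since the question asks for Python, here is a rough port of the two conversions above. Treat it as a sketch under stated assumptions: the scale factors are the standard WGS84 expressions for meters per degree of longitude and latitude, inputs are in degrees, and the names `scale_factors`, `to_local` and `to_lon_lat` are invented for this example.

```python
import math

WGS84_A = 6378137.0        # semi-major axis, metres
WGS84_B = 6356752.314245   # semi-minor axis, metres
E2 = (WGS84_A**2 - WGS84_B**2) / WGS84_A**2  # first eccentricity squared


def scale_factors(lat0_deg):
    """Metres per degree of longitude (a) and of latitude (b) at latitude lat0."""
    lat0 = math.radians(lat0_deg)
    w = 1.0 - E2 * math.sin(lat0) ** 2
    deg = math.pi / 180.0
    a = deg * WGS84_A * math.cos(lat0) / math.sqrt(w)
    b = deg * WGS84_A * (1.0 - E2) / w ** 1.5
    return a, b


def to_local(lon_lat, lon_lat0):
    """Convert a list of (lon, lat) pairs to local (x, y) metres around lon_lat0."""
    a, b = scale_factors(lon_lat0[1])
    return [((lon - lon_lat0[0]) * a, (lat - lon_lat0[1]) * b)
            for lon, lat in lon_lat]


def to_lon_lat(xy, lon_lat0):
    """Convert one local (x, y) pair back to (lon, lat) in degrees."""
    a, b = scale_factors(lon_lat0[1])
    return (lon_lat0[0] + xy[0] / a, lon_lat0[1] + xy[1] / b)


origin = (13.359284659333333, 48.600108882)       # a point from dataset 2
local = to_local([(13.35994653, 48.59990204)], origin)
print(local, to_lon_lat(local[0], origin))
```

Recomputing the scale factors in `to_lon_lat` keeps the two functions independent; for many points around the same origin, computing them once and passing them in would be the obvious optimisation.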
    

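    For each point long_lat0 = [long0, lat0] from dataset 2, first convert the whole list long_lat (or a chunk of it), taken from the second and third columns of dataset 1, from geodetic longitude-latitude to approximate Euclidean coordinates centered at long_lat0:

    [x, a, b] = convert_to_local_Eucl(long_lat, long_lat0);
    

    Then compute the magnitudes (i.e. lengths) of all the two-dimensional row vectors x(j,:) = [x(j,1), x(j,2)] from the array x:

    magnitudes = sqrt(sum(x.^2, 2)); % row-wise Euclidean norms; norm(x) would return a single matrix norm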
    

    Then find the smallest magnitude and the index of that element:

    [min_dist, j] = min(magnitudes); % min returns the value first, then the index
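
In Python, the row-wise lengths and the index of the smallest one could be computed along these lines. This is only a sketch using the standard library; the helper name `nearest_vertex` is hypothetical, and `xy` stands for the local coordinates obtained in the previous step.

```python
import math


def nearest_vertex(xy):
    """Return (index, distance) of the row of xy closest to the origin.

    xy -- list of (x, y) local coordinates whose origin is the query point
    """
    magnitudes = [math.hypot(x, y) for x, y in xy]  # length of each row vector
    j = min(range(len(xy)), key=magnitudes.__getitem__)
    return j, magnitudes[j]


print(nearest_vertex([(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]))
```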
    

    Then, for each of the two pairs x1 = x(j,:), x2 = x(j+1,:) and x1 = x(j,:), x2 = x(j-1,:), use the function below to compute the closest point:

    
    function [dist, long_lat] = dist_point_to_reference(x1, x2, long_lat0, a, b)
        % calculates the shortest distance dist from the point long_lat0 
        % to the closest point on the segment between x1 and x2 
        % and then obtain the long_lat coordinates of this closest point
    
       dist = dot(x1, x1) * dot(x2 - x1, x2 - x1) - dot(x1, x2 - x1)^2 ; % dot is dot product 
       dist = sqrt( dist / dot(x2 - x1, x2 - x1) );
       % dist is the distance from the point at the origin [0, 0] 
       % to the straight Euclidean interval between
       % the points x1 = [x1(1), x1(2)] and  x2 = [x2(1), x2(2)] 
    
       if dot(x1, x2 - x1) > 0 % if the height of the triangle is outside, on the side of x1 
          dist = sqrt( dot(x1, x1) );
          long_lat = x1;
       elseif dot(x2, x1 - x2) > 0 % if the height of the triangle is outside, on the side of x2  
          dist = sqrt( dot(x2, x2) );
          long_lat = x2;
       else
          long_lat(1) =  - x2(2) + x1(2); 
          long_lat(2) = x2(1) - x1(1);
          long_lat = long_lat / sqrt(dot(long_lat, long_lat));
          long_lat = dot(x1, long_lat) * long_lat; % foot of the perpendicular; despite the name, these are Euclidean coordinates
       end
    
       long_lat = convert_to_long_lat(long_lat, long_lat0, a, b); % finally, geodetic coordinates
    
    end
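
A Python counterpart of the point-to-segment step might look like the sketch below. Note that it swaps the three-way branch for a clamped projection parameter, which covers the same three cases (closest point at x1, at x2, or in the interior). It works purely in the local Euclidean coordinates, with the query point at the origin, and leaves the conversion back to longitude-latitude to the caller. The name `closest_point_on_segment` is an assumption of this example.

```python
import math


def closest_point_on_segment(x1, x2):
    """Closest point to the origin (0, 0) on the segment x1-x2.

    Returns (distance, (px, py)) in the same units as the inputs.
    """
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Parameter of the orthogonal projection of the origin onto the
        # line x1 + t * (x2 - x1)
        t = -(x1[0] * dx + x1[1] * dy) / seg_len2
        t = max(0.0, min(1.0, t))  # clamp so the point stays on the segment
    px, py = x1[0] + t * dx, x1[1] + t * dy
    return math.hypot(px, py), (px, py)


print(closest_point_on_segment((-1.0, 1.0), (1.0, 1.0)))
```

Calling it once for the segment before the nearest vertex and once for the segment after it, then keeping the smaller distance, mirrors the procedure described above.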
    

    【Comments】:
