【Question Title】:Finding the Distance Between Several Points - Lat/Long
【Posted】:2024-01-07 09:19:02
【Question Description】:

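I have a large table with pickup and dropoff latitude/longitude data. It has several hundred thousand records, and I want to find the distance between each pickup and its dropoff.

Can this be done in BigQuery using SQL?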

【Question Comments】:

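  • Distance from a single base point, sequential distances, or a traveling salesman problem?
  • Not from a single base point. It relates to a driver and his trips over a day, so each trip has its own pickup and dropoff GPS coordinates, and I also need the distance between one dropoff and the next pickup.
  • In theory you can do a lot with BigQuery! Whether it is practical depends on the specifics. Give more details about your challenge and we will do our best to help.
  • Each record has a car number, pickup lat, pickup long, dropoff lat and dropoff long. I want the distance between the pickup lat/long and the dropoff lat/long within each record, and the distance between the dropoff lat/long of one record and the pickup lat/long of the next record.
  • Yes, this can be done in BigQuery. Provide some sample data (just a few rows).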
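
The pairing described in the comments above (each trip matched with the same car's next trip) is what a window function such as `LEAD(...) OVER (PARTITION BY car ORDER BY pickup time)` provides in SQL. A minimal in-memory JavaScript sketch of that pairing, assuming each trip has a car number and a pickup timestamp to order by; `pairWithNextTrip` and the field names are illustrative, not from the question:

```javascript
// Illustrative trip shape: { car, puTime (Date), puLat, puLon, doLat, doLon }.
// Attaches the pickup coordinates of the same car's next trip to every trip
// (null for a car's last trip), like
// LEAD(pu_lat) OVER (PARTITION BY car ORDER BY pu_time).
function pairWithNextTrip(trips) {
  // Group trips by car number.
  const byCar = new Map();
  for (const t of trips) {
    if (!byCar.has(t.car)) byCar.set(t.car, []);
    byCar.get(t.car).push(t);
  }
  const out = [];
  for (const carTrips of byCar.values()) {
    // Order each car's trips by pickup time (sort a copy, not the input).
    const sorted = [...carTrips].sort((a, b) => a.puTime - b.puTime);
    sorted.forEach((t, i) => {
      const next = sorted[i + 1];
      out.push({
        ...t,
        nextPuLat: next ? next.puLat : null,
        nextPuLon: next ? next.puLon : null,
      });
    });
  }
  return out;
}
```

Each returned trip then has everything needed for both distances: its own pickup and dropoff, plus `nextPuLat`/`nextPuLon` (or `null` when the car has no later trip).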

Tags: sql google-bigquery distance latitude-longitude


【Solution 1】:

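Try the query below (it uses BigQuery legacy SQL's inline JavaScript UDF syntax, JS(...)); it should be a good starting point for you.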

SELECT 
  orderid, 
  car_number, 
  ROUND(distance) AS distance, 
  ROUND(next_distance) AS next_distance
FROM JS(
  (
  // input table
  SELECT
    orderid,
    car_number,
    pickup_lon, 
    pickup_lat, 
    dropoff_lon, 
    dropoff_lat,
    LEAD(pickup_lon) OVER(PARTITION BY car_number ORDER BY orderid) AS next_pickup_lon,
    LEAD(pickup_lat) OVER(PARTITION BY car_number ORDER BY orderid) AS next_pickup_lat
  FROM 
    (SELECT 1 AS orderid, 1 AS car_number, -121.23200000000001 AS pickup_lon, 38.1964 AS pickup_lat, -117.48 AS dropoff_lon, 34.5894 AS dropoff_lat ),
    (SELECT 2 AS orderid, 1 AS car_number, -118.76 AS pickup_lon, 34.1445 AS pickup_lat, -122.26 AS dropoff_lon, 37.7606 AS dropoff_lat), 
    (SELECT 3 AS orderid, 2 AS car_number, -117.736 AS pickup_lon, 33.5761 AS pickup_lat, -117.19333333333333 AS dropoff_lon, 34.47484444444444 AS dropoff_lat)      
  ) ,
// input columns
orderid, car_number, pickup_lon, pickup_lat, dropoff_lon, dropoff_lat, next_pickup_lon, next_pickup_lat,
// output schema
"[{name: 'orderid', type: 'integer'},
{name: 'car_number', type: 'integer'},
{name: 'distance', type: 'float'},
{name: 'next_distance', type: 'float'}]",
// function
"function(r, emit){

  emit({
    orderid: r.orderid, car_number: r.car_number, 
    distance: dist(r.pickup_lon, r.pickup_lat, r.dropoff_lon, r.dropoff_lat),
    next_distance: dist(r.dropoff_lon, r.dropoff_lat, r.next_pickup_lon, r.next_pickup_lat)
    });

  function deg2rad(deg) {
    return deg * (Math.PI/180)
  }

  function dist(pickup_lon, pickup_lat, dropoff_lon, dropoff_lat) {
    var R = 3959; // Radius of the earth in miles
    var dLat = deg2rad(dropoff_lat-pickup_lat);  
    var dLon = deg2rad(dropoff_lon-pickup_lon); 
    var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
      Math.cos(deg2rad(pickup_lat)) * Math.cos(deg2rad(dropoff_lat)) * 
      Math.sin(dLon/2) * Math.sin(dLon/2); 
    var c = 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); 
    return c;
  }

}"
)
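
The `dist` helper inside the UDF is the haversine great-circle formula. With pickup and dropoff latitudes φ₁, φ₂ and longitudes λ₁, λ₂ converted to radians by `deg2rad`, and R = 3959 miles, it computes the straight-line distance over a spherical Earth (not road distance):

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right)
```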

Result (distances in miles, rounded):

orderid car_number  distance    next_distance    
1       1           325.0       79.0     
2       1           317.0       NaN  
3       2            69.0       NaN  
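
A small JavaScript sketch that recomputes this table from the three sample rows, using the same formula and R = 3959 miles as the UDF. One deliberate difference: when a car has no next trip, `nextDistance` is `null` rather than `NaN`. The rows are assumed to be sorted by car number and order id already, and `tripDistances` is an illustrative name:

```javascript
// Haversine distance in miles, same formula as the UDF's dist() helper.
function dist(lon1, lat1, lon2, lat2) {
  const R = 3959; // Earth radius in miles, as in the query
  const deg2rad = (deg) => deg * (Math.PI / 180);
  const dLat = deg2rad(lat2 - lat1);
  const dLon = deg2rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Sample rows from the query, already ordered by car number, then order id.
const rows = [
  { orderid: 1, car: 1, puLon: -121.23200000000001, puLat: 38.1964, doLon: -117.48, doLat: 34.5894 },
  { orderid: 2, car: 1, puLon: -118.76, puLat: 34.1445, doLon: -122.26, doLat: 37.7606 },
  { orderid: 3, car: 2, puLon: -117.736, puLat: 33.5761, doLon: -117.19333333333333, doLat: 34.47484444444444 },
];

function tripDistances(sortedRows) {
  return sortedRows.map((r, i) => {
    const next = sortedRows[i + 1];
    const sameCar = next !== undefined && next.car === r.car;
    return {
      orderid: r.orderid,
      // Pickup to dropoff of this trip.
      distance: dist(r.puLon, r.puLat, r.doLon, r.doLat),
      // This dropoff to the same car's next pickup, if there is one.
      nextDistance: sameCar ? dist(r.doLon, r.doLat, next.puLon, next.puLat) : null,
    };
  });
}

for (const d of tripDistances(rows)) {
  const next = d.nextDistance === null ? null : Math.round(d.nextDistance);
  console.log(d.orderid, Math.round(d.distance), next);
}
```

Run with Node.js, it prints `1 325 79`, `2 317 null` and `3 69 null`, matching the rounded values above.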

From the comments, the table schema is:

car_number     (string)
pu_datetime    (timestamp)
do_datetime    (timestamp)
pu_lat         (float) 
pu_long        (float) 
do_lat         (float) 
do_long        (float)   

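Adapted to match your table schema. The three inline SELECTs after FROM are only sample rows; to process your whole table, replace them with a reference to the table. Because trips are ordered by pu_datetime within each car_number, no unique identifier is needed. Note that your schema names the longitude columns pu_long and do_long, so use those names in the inner SELECT (including inside LEAD), for example pu_long AS pu_lon and LEAD(pu_long) OVER(PARTITION BY car_number ORDER BY pu_datetime) AS next_pu_lon: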

SELECT 
  pu_datetime, 
  car_number, 
  ROUND(distance) AS distance, 
  ROUND(next_distance) AS next_distance
FROM JS(
  (
  // input table
  SELECT
    pu_datetime,
    car_number,
    pu_lon, 
    pu_lat, 
    do_lon, 
    do_lat,
    LEAD(pu_lon) OVER(PARTITION BY car_number ORDER BY pu_datetime) AS next_pu_lon,
    LEAD(pu_lat) OVER(PARTITION BY car_number ORDER BY pu_datetime) AS next_pu_lat
  FROM 
    (SELECT timestamp('2016-07-01 13:00:00') AS pu_datetime, '1' AS car_number, -121.23200000000001 AS pu_lon, 38.1964 AS pu_lat, -117.48 AS do_lon, 34.5894 AS do_lat ),
    (SELECT timestamp('2016-07-02 10:00:00') AS pu_datetime, '1' AS car_number, -118.76 AS pu_lon, 34.1445 AS pu_lat, -122.26 AS do_lon, 37.7606 AS do_lat), 
    (SELECT timestamp('2016-07-03 11:00:00') AS pu_datetime, '2' AS car_number, -117.736 AS pu_lon, 33.5761 AS pu_lat, -117.19333333333333 AS do_lon, 34.47484444444444 AS do_lat)      
  ) ,
// input columns
pu_datetime, car_number, pu_lon, pu_lat, do_lon, do_lat, next_pu_lon, next_pu_lat,
// output schema
"[{name: 'pu_datetime', type: 'timestamp'},
{name: 'car_number', type: 'string'},
{name: 'distance', type: 'float'},
{name: 'next_distance', type: 'float'}]",
// function
"function(r, emit){

  emit({
    pu_datetime: r.pu_datetime, car_number: r.car_number, 
    distance: dist(r.pu_lon, r.pu_lat, r.do_lon, r.do_lat),
    next_distance: dist(r.do_lon, r.do_lat, r.next_pu_lon, r.next_pu_lat)
    });

  function deg2rad(deg) {
    return deg * (Math.PI/180)
  }

  function dist(pu_lon, pu_lat, do_lon, do_lat) {
    var R = 3959; // Radius of the earth in miles
    var dLat = deg2rad(do_lat-pu_lat);  
    var dLon = deg2rad(do_lon-pu_lon); 
    var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
      Math.cos(deg2rad(pu_lat)) * Math.cos(deg2rad(do_lat)) * 
      Math.sin(dLon/2) * Math.sin(dLon/2); 
    var c = 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); 
    return c;
  }

}"
)

【Discussion】:

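  • Is it possible to do this without hard-coding the exact lat/long values in the SQL, and reference the whole table instead? I have a very large table.
  • I tried it, but it doesn't work in BigQuery: BigQuery has no "output schema" feature available.
  • OK, running it as-is works. It seems to fail when I reference a table stored in BigQuery. Also, the table I'm querying has no unique identifier.
  • So either adapt the code to match your table's schema, or post your schema and I'll help when time allows!
  • My table schema is: car_number (string), pu_datetime (timestamp), do_datetime (timestamp), pu_lat (float), pu_long (float), do_lat (float), do_long (float)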

【Solution 2】:

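This function calculates the straight-line distance in miles between two points. It is not the driving distance; for that you would have to use a Google API, and I believe that is limited to 2,500 requests per day.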

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE Function [dbo].[udf-Geo-CalcMiles] (@Long1 float,@Lat1 float,@Long2 float,@Lat2 Float)  
Returns Float as  
Begin 
   Declare @Miles Float
   Set @Miles = 0
   Set @Miles = (Sin(Radians(@Lat1)) * Sin(Radians(@Lat2))) + (Cos(Radians(@Lat1)) * Cos(Radians(@Lat2)) * Cos(Radians(@Long2) - Radians(@Long1)))
   Return Case When @Miles =0 then 0 else abs((3958.75 * Atan(Sqrt(1 - power(@Miles, 2)) / @Miles))) end
End
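
The function above uses the spherical law of cosines rather than the haversine formula: `@Miles` holds the cosine x of the central angle, and `Atan(Sqrt(1 - x²) / x)` equals arccos(x) for 0 < x ≤ 1, scaled by an Earth radius of 3958.75 miles. A JavaScript sketch of the same calculation (`calcMiles` is an illustrative name) that calls `Math.acos` directly, clamping x to [-1, 1] against rounding error. Unlike the original, it also handles x ≤ 0, which only occurs for points roughly a quarter of the Earth's circumference (about 6,200 miles) or more apart:

```javascript
// Spherical law of cosines, mirroring [dbo].[udf-Geo-CalcMiles]:
// x = cos(central angle), distance = R * acos(x), with R = 3958.75 miles.
function calcMiles(lon1, lat1, lon2, lat2) {
  const R = 3958.75;
  const rad = (deg) => (deg * Math.PI) / 180;
  const x =
    Math.sin(rad(lat1)) * Math.sin(rad(lat2)) +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.cos(rad(lon2) - rad(lon1));
  // Floating-point error can push x slightly past +/-1, where acos returns NaN.
  const clamped = Math.min(1, Math.max(-1, x));
  return R * Math.acos(clamped);
}

// First row of the example output below (Box_Nr 1 to Box_Nr 2).
console.log(calcMiles(-71.1394048, 41.6947535, -71.1341529, 41.6980289));
```

It prints roughly 0.353, in line with the first Distance value in the output below.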

For example:

;with cteBase as (
   Select Top 10 
          Box_Nr
         ,From_Lat=Box_Lat
         ,From_Lng=Box_Lng
         ,To_Lat = Lead(Box_Lat,1) over (Order By Box_Nr)
         ,To_Lng = Lead(Box_Lng,1) over (Order By Box_Nr)
    from [dbo].[USPS-Collection-Point] Order by Box_Nr
)
Select * 
      ,Distance = [dbo].[udf-Geo-CalcMiles] (From_Lng,From_Lat,To_Lng,To_Lat)  
 From cteBase

Returns (Distance in miles):

Box_Nr  From_Lat    From_Lng    To_Lat      To_Lng      Distance
1       41.6947535  -71.1394048 41.6980289  -71.1341529 0.353027299635122
2       41.6980289  -71.1341529 41.682761   -71.124812  1.15978643547294
3       41.682761   -71.124812  41.7051854  -71.1477813 1.95061187457874
4       41.7051854  -71.1477813 41.7080851  -71.1441519 0.274205428604983
5       41.7080851  -71.1441519 41.702242   -71.1282279 0.915266219941071
6       41.702242   -71.1282279 41.711085   -71.142123  0.941821767229312
7       41.711085   -71.142123  41.7055301  -71.1516977 0.625473329587972
8       41.7055301  -71.1516977 41.7071749  -71.1513423 0.115113681673717
9       41.7071749  -71.1513423 41.7079604  -71.1543306 0.163416766676813
10      41.7079604  -71.1543306 41.667808   -71.154372  2.77425950023261

【Discussion】: