【Question title】: PostGIS nearest neighbours query
【Posted】: 2016-04-19 14:09:01
【Question】:

I want to retrieve all points within a given range of another set of points. For example, find all shops within 500 m of any subway station.

I wrote this query, but it is slow and I would like to optimize it:

SELECT DISTINCT ON (locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind_id = 3  -- subway (the plan below filters on poi_kind_id = 3)
AND ST_DWithin(locations.coordinates, pois.coordinates, 500, false);

I'm running the latest versions of Postgres and PostGIS (Postgres 9.5, PostGIS 2.2.1).

Here is the table metadata:

                                         Table "public.locations"
       Column       |            Type             |                       Modifiers
--------------------+-----------------------------+--------------------------------------------------------
 id                 | integer                     | not null default nextval('locations_id_seq'::regclass)
 coordinates        | geometry                    |
Indexes:
    "locations_coordinates_index" gist (coordinates)


                                      Table "public.pois"
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+---------------------------------------------------
 id          | integer                     | not null default nextval('pois_id_seq'::regclass)
 coordinates | geometry                    |
 poi_kind_id | integer                     |
Indexes:
    "pois_pkey" PRIMARY KEY, btree (id)
    "pois_coordinates_index" gist (coordinates)
    "pois_poi_kind_id_index" btree (poi_kind_id)
Foreign-key constraints:
    "pois_poi_kind_id_fkey" FOREIGN KEY (poi_kind_id) REFERENCES poi_kinds(id)

Here is the output of EXPLAIN (ANALYZE, BUFFERS):

Unique  (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.080..3338.252 rows=918 loops=1)
Buffers: shared hit=559
->  Sort  (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.079..3338.145 rows=963 loops=1)
      Sort Key: locations.id
      Sort Method: quicksort  Memory: 70kB
      Buffers: shared hit=559
      ->  Nested Loop  (cost=0.00..2407390.71 rows=2 width=4) (actual time=2.466..3337.835 rows=963 loops=1)
            Join Filter: (((pois.coordinates)::geography && _st_expand((locations.coordinates)::geography, 500::double precision)) AND ((locations.coordinates)::geography && _st_expand((pois.coordinates)::geography, 500::double precision)) AND _st_dwithin((pois.coordinates)::geography, (locations.coordinates)::geography, 500::double precision, false))
            Rows Removed by Join Filter: 4531356
            Buffers: shared hit=559
            ->  Seq Scan on locations  (cost=0.00..791.68 rows=24168 width=36) (actual time=0.005..3.100 rows=24237 loops=1)
                  Buffers: shared hit=550
            ->  Materialize  (cost=0.00..10.47 rows=187 width=32) (actual time=0.000..0.009 rows=187 loops=24237)
                  Buffers: shared hit=6
                  ->  Seq Scan on pois  (cost=0.00..9.54 rows=187 width=32) (actual time=0.015..0.053 rows=187 loops=1)
                        Filter: (poi_kind_id = 3)
                        Rows Removed by Filter: 96
                        Buffers: shared hit=6
Planning time: 0.184 ms
Execution time: 3338.304 ms
(20 rows)
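Reading the plan: the Join Filter combines two bounding-box overlap tests (`&&` against `_st_expand(..., 500)`) with the exact `_st_dwithin` check, all applied to `::geography` casts of the geometry columns. Because the join is a Nested Loop over a Seq Scan and a Materialize, that filter runs on every pair: 24,237 × 187 = 4,532,319, which is exactly the 4,531,356 rows removed plus the 963 rows kept. The Python sketch below mimics that evaluation in memory. It assumes planar coordinates in metres (the real query works on geography), and `Point`, `bbox_overlaps`, `within` and `nested_loop_join` are hypothetical names, not PostGIS functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    id: int
    x: float  # assumed planar coordinates in metres
    y: float


def bbox_overlaps(a: Point, b: Point, radius: float) -> bool:
    """Cheap test, analogous to `b && ST_Expand(a, radius)` for points:
    is b inside a's bounding box grown by `radius` on every side?"""
    return abs(a.x - b.x) <= radius and abs(a.y - b.y) <= radius


def within(a: Point, b: Point, radius: float) -> bool:
    """Exact test, analogous to _st_dwithin: distance(a, b) <= radius."""
    dx, dy = a.x - b.x, a.y - b.y
    return dx * dx + dy * dy <= radius * radius


def nested_loop_join(locations, pois, radius):
    """Like the plan: evaluate the join filter on every (location, poi) pair.
    Returns (sorted distinct matching location ids, number of pairs examined)."""
    matched, pairs = set(), 0
    for loc in locations:      # Seq Scan on locations
        for poi in pois:       # Materialize: every poi is revisited per location
            pairs += 1
            # Python's `and` short-circuits: the exact test only runs if the box test passes
            if bbox_overlaps(loc, poi, radius) and within(loc, poi, radius):
                matched.add(loc.id)
    return sorted(matched), pairs


locations = [Point(1, 0.0, 0.0), Point(2, 400.0, 300.0), Point(3, 2000.0, 0.0)]
pois = [Point(10, 0.0, 0.0), Point(11, 450.0, 450.0)]
ids, pairs = nested_loop_join(locations, pois, 500.0)
print(ids, pairs)  # [1, 2] 6
```

The set plus `sorted` plays the role of the Sort and Unique nodes; the pair count grows as locations × POIs no matter how few rows actually match.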

【Question discussion】:

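  • Are they geometry or geography?
  • @FrancescoD'Alesio Geometry.
  • Are you using a metric coordinate system? Are the results slow but correct?
  • @FrancescoD'Alesio Yes, it's a metric system. And yes, the current results are correct, but too slow (about 3 seconds to match 100,000 shops against 200 subway stations).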

Tags: sql postgresql postgis query-performance nearest-neighbor


【Solution 1】:

I eventually concluded that I couldn't do this in real time.

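So I precompute everything: every time a location or a POI is created or updated, I store the minimum distance between each location and each kind of POI, so that I can answer the question "which locations are closer than X metres to this kind of POI?".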
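A minimal sketch of the precomputation idea, assuming lon/lat coordinates and a spherical Earth: for each location, keep only the nearest POI of each kind within a cap (5000 m in the module's `max_distance`), so "closer than X metres" becomes a lookup instead of a spatial join. The dicts stand in for the `poi_location_distance` table; `haversine_m`, `nearest_by_kind`, `precompute`, `locations_closer_than` and the radius constant are hypothetical, and the haversine result is close to, but not guaranteed identical to, `ST_Distance_Sphere`.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius; PostGIS may use a slightly different sphere


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest_by_kind(lon, lat, pois, max_distance):
    """For one location: {kind: (poi_id, metres)}, keeping only the nearest
    POI of each kind, and only if it lies within max_distance."""
    best = {}
    for poi_id, kind, plon, plat in pois:
        d = haversine_m(lon, lat, plon, plat)
        if d <= max_distance and (kind not in best or d < best[kind][1]):
            best[kind] = (poi_id, d)
    return best


def precompute(locations, pois, max_distance):
    """Hypothetical in-memory table {location_id: {kind: (poi_id, metres)}};
    a location's entry would be recomputed whenever it (or a POI) changes."""
    return {lid: nearest_by_kind(lon, lat, pois, max_distance) for lid, lon, lat in locations}


def locations_closer_than(table, kind, metres):
    """'Which locations are closer than X metres to this kind of POI?'"""
    return sorted(lid for lid, row in table.items() if kind in row and row[kind][1] < metres)


# (poi_id, poi_kind_id, lon, lat) and (location_id, lon, lat); made-up sample data
pois = [(10, 3, 2.3522, 48.8566), (11, 3, 2.3600, 48.8566), (12, 7, 2.3522, 48.8600)]
locations = [(1, 2.3522, 48.8566), (2, 2.4522, 48.8566)]
table = precompute(locations, pois, 5000)
print(locations_closer_than(table, 3, 500))  # [1]
```

Only POIs within the cap are stored, so a lookup is only complete for X up to that cap; the trade-off is extra write work on every create/update in exchange for cheap reads.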

Here is the module I wrote for this purpose (it's in Elixir, but the main part is raw SQL):

defmodule My.POILocationDistanceService do

  alias Ecto.Adapters.SQL
  alias My.Repo

  def delete_distance_for_location(location_id) do
    run_query!("DELETE FROM poi_location_distance WHERE location_id = $1::integer", [location_id])
  end

  def delete_distance_for_poi_kind(poi_kind_id) do
    run_query!("DELETE FROM poi_location_distance WHERE poi_kind_id = $1::integer", [poi_kind_id])
  end

  # Stores, for one location, the nearest POI of each kind within max_distance() metres.
  def insert_distance_for_location(location_id) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON (p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM locations l, pois p
    WHERE
      l.id = $1
      AND ST_DWithin(l.coordinates, p.coordinates, $2, FALSE)
    GROUP BY p.poi_kind_id, p.id, l.id
    -- DISTINCT ON keeps the first row per kind, i.e. the nearest POI
    ORDER BY p.poi_kind_id, distance;
    """

    run_query!(sql, [location_id, max_distance()])
  end

  # Stores, for one POI kind and a batch of locations (OFFSET/LIMIT), the nearest POI of that kind.
  def insert_distance_for_poi_kind(poi_kind_id, offset \\ 0, limit \\ 10_000_000) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON(l.id, p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM pois p, (SELECT * FROM locations ORDER BY id OFFSET $1 LIMIT $2) as l
    WHERE
      p.poi_kind_id = $3
      AND ST_DWithin(l.coordinates, p.coordinates, $4, FALSE)
    GROUP BY l.id, p.poi_kind_id, p.id
    -- without this ORDER BY, DISTINCT ON would keep an arbitrary POI, not the nearest
    ORDER BY l.id, p.poi_kind_id, distance;
    """

    run_query!(sql, [offset, limit, poi_kind_id, max_distance()])
  end

  defp run_query!(query, params) do
    SQL.query!(Repo, query, params)
  end

  # Search radius in metres (ST_DWithin is called in its geography form above)
  def max_distance, do: 5000

end
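In both queries, `GROUP BY` includes `p.id`, so (assuming ids are unique) each group holds a single location/POI pair and `MIN(...)` is just that pair's distance; it is `DISTINCT ON` that reduces the result to one POI per group. PostgreSQL keeps the first row of each `DISTINCT ON` group in `ORDER BY` order, and which row that is stays unpredictable unless the `ORDER BY` ranks by distance after the `DISTINCT ON` columns. The Python sketch below emulates those semantics; `distinct_on` is a hypothetical helper, not a PostgreSQL API.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order):
    """Emulate SELECT DISTINCT ON (key) ... ORDER BY key, order:
    sort by (key, order), then keep only the first row of each key group."""
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


# Rows shaped like the query output: (location_id, poi_kind_id, poi_id, distance)
rows = [
    (1, 3, 10, 420.0),
    (1, 3, 11, 120.0),
    (1, 7, 12, 900.0),
    (2, 3, 10, 50.0),
]
nearest = distinct_on(rows, key=itemgetter(0, 1), order=itemgetter(3))
print(nearest)  # [(1, 3, 11, 120.0), (1, 7, 12, 900.0), (2, 3, 10, 50.0)]
```

If the sort ignores distance (for example `order=lambda r: 0`), this sketch simply keeps whichever row came first in the input, `(1, 3, 10, 420.0)` for location 1; PostgreSQL gives no such guarantee at all.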

【Discussion】:

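【Solution 2】:

I think you are using the geography version of ST_DWithin because of the fourth argument: the plan shows both columns cast to ::geography, so the GiST indexes on the geometry columns cannot be used.

Try changing your query to this:

SELECT DISTINCT ON (locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind_id = 3
AND ST_DWithin(locations.coordinates, pois.coordinates, 500);

Note that this three-argument geometry form measures distance in the units of the columns' spatial reference system, so 500 means 500 metres only if the coordinates are stored in a metre-based projection.

If that doesn't solve it, please post the EXPLAIN ANALYZE output again.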
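What a usable index buys is that each location is compared only with nearby candidates instead of with all 187 POIs. As a rough in-memory analogy (a uniform grid, which is not how a GiST R-tree works internally), the sketch below buckets the POIs into square cells whose side equals the search radius, so any POI within range of a location must sit in the location's own cell or one of its 8 neighbours. Planar metre coordinates and the names `build_grid`, `near_any` and `locations_near_any` are assumptions for illustration.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket (id, x, y) points by the integer grid cell they fall in."""
    grid = defaultdict(list)
    for pid, x, y in points:
        grid[(math.floor(x / cell), math.floor(y / cell))].append((pid, x, y))
    return grid


def near_any(grid, x, y, radius):
    """True if some bucketed point lies within `radius` of (x, y).
    With cell size == radius, such a point must be in (x, y)'s cell or one
    of its 8 neighbours, so only those 9 buckets are scanned."""
    cx, cy = math.floor(x / radius), math.floor(y / radius)
    r2 = radius * radius
    return any(
        (px - x) ** 2 + (py - y) ** 2 <= r2
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for _, px, py in grid.get((cx + dx, cy + dy), ())
    )


def locations_near_any(locations, pois, radius):
    """Sorted ids of locations within `radius` of at least one POI."""
    grid = build_grid(pois, radius)  # built once, playing the role of an index on pois
    return sorted(lid for lid, x, y in locations if near_any(grid, x, y, radius))


locations = [(1, 0.0, 0.0), (2, 400.0, 300.0), (3, 2000.0, 0.0)]
pois = [(10, 0.0, 0.0), (11, 450.0, 450.0)]
print(locations_near_any(locations, pois, 500.0))  # [1, 2]
```

Location 3 never touches a single POI here because its neighbouring cells are empty; that skipped work is what the sequential nested loop in the plan cannot avoid.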

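【Discussion】:

【Solution 3】:

I think you should switch to a different solution. PostGIS still runs queries inside a structured database; it is powerful, but not fast for specialized requirements like this. You probably need Elasticsearch.

Elasticsearch is good at geolocation search but not at geographic data processing; I think you need both.

https://www.elastic.co/blog/geo-location-and-search

【Discussion】:

  • Thanks for your answer, but I'd really like to stick with Postgres. If there's no way to meet my needs, I'll consider adding another storage engine.
  • PostGIS can be very fast if tuned accordingly. It is a specialized GIS extension for PostgreSQL.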