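I struggled with this problem too, until I realized I had been searching with the wrong keyword. If you want every resulting set to contain the same number of points, you are doing grouping, not clustering. I was finally able to solve it with a simple Python script and PostGIS queries.
For example, say you have a table named tb_points holding 4000 coordinate points, and you want to split it into 10 equal-sized groups of 400 points each. Here is an example of the table structure: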
CREATE TABLE tb_points (
    id SERIAL PRIMARY KEY,
    outlet_id INTEGER,
    longitude FLOAT,
    latitude FLOAT,
    group_id INTEGER
);
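Then what you need to do is:
- Find the first coordinate to use as the starting point (in the script below, one endpoint of the farthest-apart pair of ungrouped points)
- Find the coordinates nearest to that starting point, ordered by distance ascending, and limit the result to your preferred group size (400 in this example)
- Mark those rows by updating their group_id column
- Repeat the three steps above 10 times on the remaining rows, i.e. those whose group_id is still NULL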
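To make the steps concrete without a database, here is a minimal in-memory sketch of the same greedy procedure. It is only an illustration under assumptions: the names haversine_m and group_points are hypothetical, and the haversine formula on a sphere stands in for PostGIS's geography ST_Distance, so distances will differ slightly from what the database computes.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres on a sphere (assumed radius 6371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def group_points(points, group_size):
    """points: dict of id -> (lon, lat). Returns dict of id -> group number."""
    remaining = dict(points)
    groups = {}
    group_id = 1
    while remaining:
        ids = list(remaining)
        # Step 1: seed = one endpoint of the farthest-apart ungrouped pair.
        seed, best = ids[0], -1.0
        for a in ids:
            for b in ids:
                if a != b:
                    d = haversine_m(*remaining[a], *remaining[b])
                    if d > best:
                        best, seed = d, a
        # Step 2: the group_size ungrouped points nearest to the seed
        # (the seed itself is included, at distance 0).
        nearest = sorted(
            ids, key=lambda i: haversine_m(*remaining[seed], *remaining[i])
        )[:group_size]
        # Step 3: assign the group and drop those points from the pool.
        for i in nearest:
            groups[i] = group_id
            del remaining[i]
        group_id += 1  # Step 4: repeat on whatever is left.
    return groups
```

The pairwise seed search is quadratic in the number of ungrouped points, which mirrors the CROSS JOIN in the database version, so expect it to be slow on large inputs.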
Here is the implementation in Python, using psycopg2 against the PostGIS database:
import psycopg2

# Fill in your own connection details.
dbhost = ''
dbuser = ''
dbpass = ''
dbname = ''
dbport = 5432

conn = psycopg2.connect(host=dbhost,
                        user=dbuser,
                        password=dbpass,
                        dbname=dbname,
                        port=dbport)

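def fetch(sql):
    """Run a SELECT and return all rows, or the string 'error' on failure."""
    cursor = conn.cursor()
    rs = None
    try:
        cursor.execute(sql)
        rs = cursor.fetchall()
    except psycopg2.Error as e:
        print(e.pgerror)
        rs = 'error'
        # Leave the connection usable after a failed statement.
        conn.rollback()
    cursor.close()
    return rs
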
def execScalar(sql):
    """Run a write statement, commit, and return the affected row count (-1 on error)."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        conn.commit()
        rowsaffected = cursor.rowcount
    except psycopg2.Error as e:
        print(e.pgerror)
        rowsaffected = -1
        conn.rollback()
    cursor.close()
    return rowsaffected

def select_first_cluster_id():
    # Returns the farthest-apart pair of ungrouped points; the first column
    # (ori_id) is used as the starting point of the next group.
    # Note: the CROSS JOIN compares every ungrouped pair, so it is slow on large tables.
    sql = """ SELECT a.outlet_id AS ori_id, a.longitude AS ori_lon, a.latitude AS ori_lat,
                     b.outlet_id AS dest_id, b.longitude AS dest_lon, b.latitude AS dest_lat,
                     ST_Distance(CAST(ST_SetSRID(ST_Point(a.longitude, a.latitude), 4326) AS geography),
                                 CAST(ST_SetSRID(ST_Point(b.longitude, b.latitude), 4326) AS geography))
                         AS air_distance
              FROM tb_points a CROSS JOIN tb_points b
              WHERE a.outlet_id != b.outlet_id
                AND a.group_id IS NULL
                AND b.group_id IS NULL
              ORDER BY air_distance DESC
              LIMIT 1 """
    return sql

def update_group_id(group_id, ori_id, limit_constraint):
    # Assigns group_id to the limit_constraint ungrouped points nearest to ori_id
    # (ori_id itself is included, at distance 0).
    # The values are cast to int before being formatted into the SQL text.
    sql = """ UPDATE tb_points
              SET group_id = %d
              WHERE outlet_id IN
                  (SELECT b.outlet_id
                   FROM tb_points a,
                        tb_points b
                   WHERE a.outlet_id = %d
                     AND a.group_id IS NULL
                     AND b.group_id IS NULL
                   ORDER BY ST_Distance(CAST(ST_SetSRID(ST_Point(a.longitude, a.latitude), 4326) AS geography),
                                        CAST(ST_SetSRID(ST_Point(b.longitude, b.latitude), 4326) AS geography)) ASC
                   LIMIT %d)
          """ % (int(group_id), int(ori_id), int(limit_constraint))
    return sql

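def clustering():
    group_size = 400   # points per group (4000 points / 10 groups in this example)
    group_count = 10
    for n in range(1, group_count + 1):
        sql = select_first_cluster_id()
        res = fetch(sql)
        # Stop if the query failed or fewer than two ungrouped points remain.
        if res == 'error' or not res:
            break
        ori_id = res[0][0]
        sql = update_group_id(n, ori_id, group_size)
        print(sql)
        execScalar(sql)

clustering()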
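As an optional variation, the UPDATE can be sent with bound parameters instead of formatting the values into the SQL string. The sketch below is only an assumption about how one might restructure it: build_update is a hypothetical helper, and it relies on psycopg2's cursor.execute(sql, params) form, which accepts %s placeholders and a tuple of values.

```python
def build_update(group_id, ori_id, limit_constraint):
    """Return (sql, params) for use as cursor.execute(sql, params)."""
    sql = """
        UPDATE tb_points
        SET group_id = %s
        WHERE outlet_id IN
            (SELECT b.outlet_id
             FROM tb_points a, tb_points b
             WHERE a.outlet_id = %s
               AND a.group_id IS NULL
               AND b.group_id IS NULL
             ORDER BY ST_Distance(
                 CAST(ST_SetSRID(ST_Point(a.longitude, a.latitude), 4326) AS geography),
                 CAST(ST_SetSRID(ST_Point(b.longitude, b.latitude), 4326) AS geography)) ASC
             LIMIT %s)
    """
    # The driver substitutes these values for the %s placeholders, in order.
    params = (group_id, ori_id, limit_constraint)
    return sql, params

# Hypothetical usage with an open psycopg2 connection named conn:
#     sql, params = build_update(1, 42, 400)
#     with conn.cursor() as cur:
#         cur.execute(sql, params)
#     conn.commit()
```

Letting the driver bind the values avoids quoting mistakes and keeps the SQL text identical across iterations.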
Hope this helps.