【Posted】: 2011-12-26 15:14:59
【Question】:
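I have a query generated by Django's ORM that takes hours to run.
The report_rank table (50 million rows) is in a one-to-many relationship with report_profile (100k rows). I'm trying to retrieve the latest report_rank for each report_profile.
I'm running Postgres 9.1 on an extra-large Amazon EC2 server with plenty of available RAM (2GB of 15GB in use). Disk I/O is, of course, pretty bad.
I have indexes on report_rank.created and on all foreign key fields.
What can I do to speed this query up? I'd be happy to try a different approach to the query if it performs well, or to tune any database configuration parameters as needed.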
EXPLAIN
SELECT "report_rank"."id", "report_rank"."keyword_id", "report_rank"."site_id"
, "report_rank"."rank", "report_rank"."url", "report_rank"."competition"
, "report_rank"."source", "report_rank"."country", "report_rank"."created"
, MAX(T7."created") AS "max"
FROM "report_rank"
LEFT OUTER JOIN "report_site"
ON ("report_rank"."site_id" = "report_site"."id")
INNER JOIN "report_profile"
ON ("report_site"."id" = "report_profile"."site_id")
INNER JOIN "crm_client"
ON ("report_profile"."client_id" = "crm_client"."id")
INNER JOIN "auth_user"
ON ("crm_client"."user_id" = "auth_user"."id")
LEFT OUTER JOIN "report_rank" T7
ON ("report_site"."id" = T7."site_id")
WHERE ("auth_user"."is_active" = True AND "crm_client"."is_deleted" = False )
GROUP BY "report_rank"."id", "report_rank"."keyword_id", "report_rank"."site_id"
, "report_rank"."rank", "report_rank"."url", "report_rank"."competition"
, "report_rank"."source", "report_rank"."country", "report_rank"."created"
HAVING MAX(T7."created") = "report_rank"."created";
Output of EXPLAIN:
GroupAggregate (cost=1136244292.46..1276589375.47 rows=48133327 width=72)
Filter: (max(t7.created) = report_rank.created)
-> Sort (cost=1136244292.46..1147889577.16 rows=4658113881 width=72)
Sort Key: report_rank.id, report_rank.keyword_id, report_rank.site_id, report_rank.rank, report_rank.url, report_rank.competition, report_rank.source, report_rank.country, report_rank.created
-> Hash Join (cost=1323766.36..6107863.59 rows=4658113881 width=72)
Hash Cond: (report_rank.site_id = report_site.id)
-> Seq Scan on report_rank (cost=0.00..1076119.27 rows=48133327 width=64)
-> Hash (cost=1312601.51..1312601.51 rows=893188 width=16)
-> Hash Right Join (cost=47050.38..1312601.51 rows=893188 width=16)
Hash Cond: (t7.site_id = report_site.id)
-> Seq Scan on report_rank t7 (cost=0.00..1076119.27 rows=48133327 width=12)
-> Hash (cost=46692.28..46692.28 rows=28648 width=8)
-> Nested Loop (cost=2201.98..46692.28 rows=28648 width=8)
-> Hash Join (cost=2201.98..5733.23 rows=28648 width=4)
Hash Cond: (crm_client.user_id = auth_user.id)
-> Hash Join (cost=2040.73..5006.71 rows=44606 width=8)
Hash Cond: (report_profile.client_id = crm_client.id)
-> Seq Scan on report_profile (cost=0.00..1706.09 rows=93009 width=8)
-> Hash (cost=1761.98..1761.98 rows=22300 width=8)
-> Seq Scan on crm_client (cost=0.00..1761.98 rows=22300 width=8)
Filter: (NOT is_deleted)
-> Hash (cost=126.85..126.85 rows=2752 width=4)
-> Seq Scan on auth_user (cost=0.00..126.85 rows=2752 width=4)
Filter: is_active
-> Index Scan using report_site_pkey on report_site (cost=0.00..1.42 rows=1 width=4)
Index Cond: (id = report_profile.site_id)
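To make the intent concrete, here is a minimal sketch of the "latest row per group" result the query is after, run on a toy dataset. It is an illustration under stated assumptions, not a tested fix: it uses Python's built-in sqlite3 rather than Postgres, a cut-down hypothetical report_rank table with only four columns, and made-up rows. The helper names `build_demo_db` and `latest_rank_per_site` are invented for this example. Unlike the generated query, it computes MAX(created) once per site_id in a subquery instead of self-joining every rank row to every other rank row of the same site, which is where the 4,658,113,881-row estimate in the plan above comes from.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a cut-down, hypothetical report_rank table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE report_rank ('
        'id INTEGER PRIMARY KEY, site_id INTEGER, "rank" INTEGER, created TEXT)'
    )
    # Made-up sample rows: two sites with several rank snapshots each.
    conn.executemany(
        'INSERT INTO report_rank (site_id, "rank", created) VALUES (?, ?, ?)',
        [
            (1, 5, "2011-12-01"),
            (1, 3, "2011-12-20"),
            (2, 9, "2011-11-15"),
            (2, 7, "2011-12-25"),
            (2, 8, "2011-12-10"),
        ],
    )
    return conn


def latest_rank_per_site(conn):
    """Return (site_id, rank, created) for the newest row of each site."""
    sql = """
        SELECT r.site_id, r."rank", r.created
        FROM report_rank AS r
        JOIN (
            -- One aggregate row per site instead of a full self-join.
            SELECT site_id, MAX(created) AS max_created
            FROM report_rank
            GROUP BY site_id
        ) AS latest
          ON latest.site_id = r.site_id
         AND latest.max_created = r.created
        ORDER BY r.site_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in latest_rank_per_site(build_demo_db()):
        print(row)
```

Whether this shape is actually faster on the real 50-million-row table would have to be checked with EXPLAIN ANALYZE. The joins to report_profile, crm_client and auth_user, along with the is_active / is_deleted filters, are left out here and would still need to be applied.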
【Discussion】:
Tags: sql django performance postgresql aggregate-functions