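[Posted]: 2021-05-22 23:59:16
[Question]:
I have an ActiveRecord query that chains 2 queries together with the OR operator. The results come back correctly, but the combined query runs more than 10 times slower than either of the 2 queries on its own.
We have an Event model and an Invitation model. A User can be invited to an Event either through an invitation filter or individually, by having an Invitation record.
So, when determining how many users are invited to a particular event, we have to look at all users with Invitations as well as all users matching the filter criteria. We do that here: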
@invited_count = @invited_by_individual.or(@invited_by_filter).distinct.count(:id)
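To make the counting semantics explicit, here is a minimal plain-Ruby sketch with made-up rows (no ActiveRecord and no database; the User/Invitation structs, the sample data, and both method names are hypothetical). It mirrors the slow combined SQL in this post: COUNT(DISTINCT users.id) over users LEFT OUTER JOIN invitations filtered with event_id OR organization_id gives the same number as the size of the union of the two separate id sets.

```ruby
require "set"

# Hypothetical in-memory stand-ins for the users and invitations tables.
User = Struct.new(:id, :organization_id)
Invitation = Struct.new(:user_id, :event_id)

USERS = [
  User.new(1, 13), User.new(2, 13), User.new(3, 99), User.new(4, 99), User.new(5, 7)
].freeze
INVITATIONS = [
  Invitation.new(2, 732), Invitation.new(3, 732), Invitation.new(3, 800), Invitation.new(5, 1)
].freeze

# Emulates users LEFT OUTER JOIN invitations: each user is paired with each of
# its invitations, or with nil when it has none.
def left_join(users, invitations)
  users.flat_map do |u|
    matches = invitations.select { |i| i.user_id == u.id }
    matches.empty? ? [[u, nil]] : matches.map { |i| [u, i] }
  end
end

# Combined query: COUNT(DISTINCT users.id) ... WHERE event_id = ? OR organization_id = ?
def combined_count(event_id, organization_id)
  left_join(USERS, INVITATIONS)
    .select { |u, i| (i && i.event_id == event_id) || u.organization_id == organization_id }
    .map { |u, _| u.id }
    .uniq
    .size
end

# Same count built from the two separate lookups, merged as a set union.
def union_count(event_id, organization_id)
  invited_individually = INVITATIONS.select { |i| i.event_id == event_id }.map(&:user_id).to_set
  invited_by_filter = USERS.select { |u| u.organization_id == organization_id }.map(&:id).to_set
  (invited_individually | invited_by_filter).size
end

puts combined_count(732, 13) # => 3
puts union_count(732, 13)    # => 3
```

Because the two formulations agree, the combined count can in principle be computed from two independent lookups instead of one OR filter applied across the join.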
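Note that both the @invited_by_individual and @invited_by_filter relations contain references and includes statements.
The problem is that this combined query takes about 1200 ms. If we run the queries separately, each takes only about 80 ms: @invited_by_filter.distinct.count and @invited_by_individual.distinct.count both return in about 80 ms, but the combined query does not.
Is there any way to speed up the query with the OR operator? And why does this happen?
Here is the SQL generated by the ActiveRecord queries:
Fast, single query: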
(79.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE "invitations"."event_id" = $1 [["event_id", 732]]
Slow, combined query:
(1220.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE ("invitations"."event_id" = $1 OR "users"."organization_id" = $2) [["event_id", 732], ["organization_id", 13]]
Update: here is the EXPLAIN:
(1418.2ms) SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
=>
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=121781.56..121781.57 rows=1 width=8)
  -> Hash Right Join (cost=113248.88..121778.64 rows=1165 width=8)
        Hash Cond: (invitations.user_id = users.id)
        Filter: ((users.root_organization_id = '-1'::integer) OR (invitations.event_id = 749))
        -> Seq Scan on invitations (cost=0.00..1299.70 rows=63470 width=8)
        -> Hash (cost=93513.28..93513.28 rows=1135328 width=12)
              -> Seq Scan on users (cost=0.00..93513.28 rows=1135328 width=12)
(7 rows)
Update 2: here is the EXPLAIN for each query run individually; these do use indexes:
(91.5ms) SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
=>
EXPLAIN for: SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
#=> QUERY PLAN
Aggregate (cost=19.05..19.06 rows=1 width=8)
  -> Nested Loop (cost=0.72..19.05 rows=1 width=0)
        -> Index Scan using index_users_on_root_organization_id on users (cost=0.43..4.45 rows=1 width=8)
              Index Cond: (root_organization_id = '-1'::integer)
        -> Index Only Scan using index_invitations_on_user_id on invitations (cost=0.29..14.57 rows=3 width=4)
              Index Cond: (user_id = users.id)
(6 rows)
and
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "invitations"."event_id" = $1 [["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=536.34..536.35 rows=1 width=8)
  -> Nested Loop (cost=0.72..536.19 rows=62 width=8)
        -> Index Scan using index_invitations_on_event_id on invitations (cost=0.29..11.98 rows=62 width=4)
              Index Cond: (event_id = 749)
        -> Index Only Scan using users_pkey on users (cost=0.43..8.45 rows=1 width=8)
              Index Cond: (id = invitations.user_id)
(6 rows)
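The EXPLAIN outputs above hint at the cause: in the combined plan the OR condition appears as a Filter on the Hash Right Join, so it can only be evaluated after both tables have been read with sequential scans (about 1.1 million estimated rows from users), whereas each individual plan starts from an index scan (index_users_on_root_organization_id or index_invitations_on_event_id). A rewrite often suggested for an OR that spans both sides of a join (not part of the original post, and untested against this schema) is to count over a UNION of the two lookups, so each branch can use its own index. The helper below, invited_count_sql, is a hypothetical sketch that only builds that SQL string; UNION (rather than UNION ALL) removes duplicate ids, which plays the role of COUNT(DISTINCT ...).

```ruby
# Hypothetical helper: builds a COUNT over a UNION of the two index-friendly
# lookups instead of one OR filter applied after a LEFT OUTER JOIN.
# Integer() raises on non-integer input, so interpolating the coerced values
# does not open an SQL-injection hole.
def invited_count_sql(event_id, root_organization_id)
  event_id = Integer(event_id)
  root_organization_id = Integer(root_organization_id)
  <<~SQL
    SELECT COUNT(*) FROM (
      -- Branch 1: users invited individually (can start from index_invitations_on_event_id).
      SELECT "users"."id"
      FROM "users"
      INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id"
      WHERE "invitations"."event_id" = #{event_id}
      UNION
      -- Branch 2: users matched by the filter (can use index_users_on_root_organization_id).
      SELECT "users"."id"
      FROM "users"
      WHERE "users"."root_organization_id" = #{root_organization_id}
    ) AS invited_ids
  SQL
end

puts invited_count_sql(749, -1)
```

In a Rails app the resulting string could be passed to ActiveRecord::Base.connection.select_value to fetch the single count; whether PostgreSQL actually chooses index scans for both branches should be confirmed with EXPLAIN on real data.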
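[Discussion]:
-
What do your indexes look like? What is the EXPLAIN output for that query?
-
I just updated the original post with the EXPLAIN for the slow query. Re: indexes, I've made sure every column used in the query has an index.
-
That is the EXPLAIN for something else (probably the query using includes), not the count(distinct ...) query you are asking about.
-
Sorry, that was the EXPLAIN for the query without the count. I've updated the post with the correct EXPLAIN. To clarify, the @invited_by_individual and @invited_by_filter ActiveRecord relations both have an includes on the invitations table.
-
I have many indexes on users and invitations, including invitations.user_id and invitations.event_id. When I run EXPLAIN on each query individually, I can see that it uses those indexes, but the query that combines them with the or operator does not. I've updated the post with the individual EXPLAIN outputs.
Tags: sql ruby-on-rails database postgresql activerecord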