How can I speed up this slow query with the right indexes?

【Question title】: How can I speed up this slow query with the right indexes?
【Posted】: 2014-02-27 01:44:51
【Question】:

SELECT "items".* FROM "items" 
INNER JOIN item_mods ON item_mods.item_id = items.id 
INNER JOIN mods ON mods.id = item_mods.mod_id 
AND item_mods.mod_id = 3 
WHERE (items.player_id = '1') 
GROUP BY items.id, item_mods.primary_value 
ORDER BY item_mods.primary_value DESC NULLS LAST, items.created_at DESC LIMIT 100

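This query currently takes about 7 seconds. The items table has about 550k rows, item_mods has about 2.5 million rows, and mods has about 800 rows. I have a lot of indexes, but I am not sure I have the right ones.

If you were optimizing this query, what would you recommend?

Here is the EXPLAIN ANALYZE output.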

http://explain.depesz.com/s/aiYH

"Limit  (cost=107274.88..107275.13 rows=100 width=554) (actual time=6648.872..6648.888 rows=100 loops=1)"
"  ->  Sort  (cost=107274.88..107419.24 rows=57745 width=554) (actual time=6648.870..6648.879 rows=100 loops=1)"
"        Sort Key: item_mods.primary_value, items.created_at"
"        Sort Method: top-N heapsort  Memory: 103kB"
"        ->  Group  (cost=104634.82..105067.91 rows=57745 width=554) (actual time=6358.348..6529.342 rows=57498 loops=1)"
"              ->  Sort  (cost=104634.82..104779.18 rows=57745 width=554) (actual time=6358.344..6423.184 rows=57498 loops=1)"
"                    Sort Key: items.id, item_mods.primary_value"
"                    Sort Method: external sort  Disk: 25624kB"
"                    ->  Nested Loop  (cost=23182.35..71248.94 rows=57745 width=554) (actual time=3339.625..6127.659 rows=57498 loops=1)"
"                          ->  Index Scan using mods_pkey on mods  (cost=0.00..8.27 rows=1 width=4) (actual time=0.323..0.324 rows=1 loops=1)"
"                                Index Cond: (id = 3)"
"                          ->  Merge Join  (cost=23182.35..70663.22 rows=57745 width=558) (actual time=3339.298..6108.202 rows=57498 loops=1)"
"                                Merge Cond: (items.id = item_mods.item_id)"
"                                ->  Index Scan using items_pkey on items  (cost=0.00..45112.64 rows=543004 width=550) (actual time=3.190..2575.715 rows=543024 loops=1)"
"                                      Filter: (player_id = 1)"
"                                ->  Materialize  (cost=23182.33..23471.20 rows=57774 width=12) (actual time=3336.099..3388.810 rows=57547 loops=1)"
"                                      ->  Sort  (cost=23182.33..23326.76 rows=57774 width=12) (actual time=3336.095..3370.179 rows=57547 loops=1)"
"                                            Sort Key: item_mods.item_id"
"                                            Sort Method: external sort  Disk: 1240kB"
"                                            ->  Bitmap Heap Scan on item_mods  (cost=1084.27..17622.45 rows=57774 width=12) (actual time=31.728..3263.762 rows=57547 loops=1)"
"                                                  Recheck Cond: (mod_id = 3)"
"                                                  ->  Bitmap Index Scan on primary_value_mod_id_desc  (cost=0.00..1069.83 rows=57774 width=0) (actual time=29.565..29.565 rows=57547 loops=1)"
"                                                        Index Cond: (mod_id = 3)"
"Total runtime: 6652.100 ms"

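Update

I have modified the query as suggested. I was using GROUP BY to select only one row per item id, but I suppose DISTINCT works too. Here is the new query and its plan; it still takes too long. The idea of the query is to find all items owned by player '1' that have item mod '3', ordered by that mod's primary value, highest first.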

SELECT DISTINCT("items".id), "item_mods".primary_value, "items".created_at 
FROM "items" INNER JOIN item_mods ON item_mods.item_id = items.id 
INNER JOIN mods ON mods.id = item_mods.mod_id AND item_mods.mod_id = 3 
WHERE (items.player_id = '1') 
ORDER BY item_mods.primary_value DESC NULLS LAST, items.created_at DESC LIMIT 100

EXPLAIN: http://explain.depesz.com/s/t4Zq

"Limit  (cost=73737.59..73738.59 rows=100 width=16) (actual time=6450.253..6450.344 rows=100 loops=1)"
"  ->  Unique  (cost=73737.59..74315.04 rows=57745 width=16) (actual time=6450.248..6450.316 rows=100 loops=1)"
"        ->  Sort  (cost=73737.59..73881.95 rows=57745 width=16) (actual time=6450.242..6450.272 rows=100 loops=1)"
"              Sort Key: item_mods.primary_value, items.created_at, items.id"
"              Sort Method: external merge  Disk: 1456kB"
"              ->  Hash Join  (cost=46944.77..68183.71 rows=57745 width=16) (actual time=3018.769..6342.109 rows=57498 loops=1)"
"                    Hash Cond: (item_mods.item_id = items.id)"
"                    ->  Nested Loop  (cost=1084.27..18208.45 rows=57774 width=8) (actual time=15.911..3219.086 rows=57547 loops=1)"
"                          ->  Index Scan using mods_pkey on mods  (cost=0.00..8.27 rows=1 width=4) (actual time=0.486..0.489 rows=1 loops=1)"
"                                Index Cond: (id = 3)"
"                          ->  Bitmap Heap Scan on item_mods  (cost=1084.27..17622.45 rows=57774 width=12) (actual time=15.416..3197.257 rows=57547 loops=1)"
"                                Recheck Cond: (mod_id = 3)"
"                                ->  Bitmap Index Scan on primary_value_mod_id_desc  (cost=0.00..1069.83 rows=57774 width=0) (actual time=13.517..13.517 rows=57547 loops=1)"
"                                      Index Cond: (mod_id = 3)"
"                    ->  Hash  (cost=36420.95..36420.95 rows=543004 width=12) (actual time=2987.089..2987.089 rows=543024 loops=1)"
"                          Buckets: 4096  Batches: 32  Memory Usage: 811kB"
"                          ->  Seq Scan on items  (cost=0.00..36420.95 rows=543004 width=12) (actual time=0.012..2825.650 rows=543024 loops=1)"
"                                Filter: (player_id = 1)"
"Total runtime: 6457.586 ms"

Update 2

OK, I think I am almost there. This query takes 6 seconds and produces the results I want:

SELECT "items".id, item_mods.primary_value
FROM "items" 
INNER JOIN item_mods ON item_mods.item_id = items.id AND item_mods.mod_id = 36 
WHERE (items.player_id = '1') 
ORDER BY item_mods.primary_value DESC, item_mods.id DESC
LIMIT 100

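But the following query takes 9 ms! Note the difference in the ORDER BY. However, I need the rows ordered most recent first. I have an index on (item_mods.primary_value DESC, item_mods.id DESC), but it does not seem to be used?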

SELECT "items".id, item_mods.primary_value
FROM "items" 
INNER JOIN item_mods ON item_mods.item_id = items.id AND item_mods.mod_id = 36 
WHERE (items.player_id = '1') 
ORDER BY item_mods.primary_value DESC
LIMIT 100

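【Comments】:

Only use * when you really need to select all columns.
Also, make sure items.player_id, item_mods.item_id, and item_mods.mod_id each have their own index.
How can you select * when you have a GROUP BY?
If items.id is unique, first drop the GROUP BY items.id. Second, why use GROUP BY without an aggregate function?
external merge Disk: 1456kB. SET work_mem = '50MB' and try again. Also, is there an index on items(player_id)?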
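
The work_mem suggestion in the last comment can be tried for a single session without changing the server configuration. A minimal sketch, assuming a psql session against the asker's schema; 50MB is the commenter's figure, not a tuned value:

```sql
-- Raise the memory available to each sort/hash operation, for this session only.
SET work_mem = '50MB';

-- Re-run the plan. If the sorts now fit in memory, "Sort Method" should no
-- longer report "external sort" or "external merge" with a Disk figure.
EXPLAIN ANALYZE
SELECT DISTINCT items.id, item_mods.primary_value, items.created_at
FROM items
INNER JOIN item_mods ON item_mods.item_id = items.id
INNER JOIN mods ON mods.id = item_mods.mod_id AND item_mods.mod_id = 3
WHERE items.player_id = 1
ORDER BY item_mods.primary_value DESC NULLS LAST, items.created_at DESC
LIMIT 100;

-- Return to the configured default afterwards.
RESET work_mem;
```

This only removes the spill to disk. The plans in the question spend most of their time scanning items and item_mods, so a larger work_mem alone may not be enough.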

Tags: sql postgresql

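【Solution 1】:

I assume you are relying on the Postgres "feature" that lets you group by a table's primary key and then select all columns from that table. Otherwise, select * makes no sense in an aggregation query.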

SELECT "items".*
FROM "items"  INNER JOIN
     item_mods
     ON item_mods.item_id = items.id INNER JOIN
     mods
     ON mods.id = item_mods.mod_id AND item_mods.mod_id = 3 
WHERE (items.player_id = '1') 
GROUP BY items.id, item_mods.primary_value 
ORDER BY item_mods.primary_value DESC NULLS LAST, items.created_at DESC
LIMIT 100;

The following indexes should help this query:

items(player_id, id)
item_mods(item_id, mod_id);
mods(id);
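
Written out as DDL, these suggestions might look like the sketch below. The index names are hypothetical. No statement is given for mods(id): PostgreSQL creates a unique index automatically for a primary key, and the plans in the question already show mods_pkey in use.

```sql
-- Hypothetical index names; the columns follow the list above.
CREATE INDEX items_player_id_id_idx ON items (player_id, id);
CREATE INDEX item_mods_item_id_mod_id_idx ON item_mods (item_id, mod_id);
-- mods(id): already covered by the primary key index (mods_pkey in the plans).
```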

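【Comments】:

I have these indexes. I assume mods(id) is already indexed because it is the primary key?

【Solution 2】:

I fixed it by adding an index on (mod_id, primary_value desc, id desc) to the item_mods table. The query now runs in 10-15 ms.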
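
As a sketch, the index described above could be created as follows. The index name is hypothetical; the columns and sort directions are the ones stated in the answer, and the query is the asker's final one.

```sql
-- Hypothetical index name; columns and directions as stated in the answer.
CREATE INDEX item_mods_mod_id_primary_value_id_idx
    ON item_mods (mod_id, primary_value DESC, id DESC);

-- The asker's final query, whose ORDER BY matches the index's trailing columns.
SELECT items.id, item_mods.primary_value
FROM items
INNER JOIN item_mods ON item_mods.item_id = items.id AND item_mods.mod_id = 36
WHERE items.player_id = 1
ORDER BY item_mods.primary_value DESC, item_mods.id DESC
LIMIT 100;
```

A likely reason this helps: with mod_id as the leading column and an equality condition on it, the index entries for that mod are already stored in (primary_value DESC, id DESC) order. The planner can read them in order and stop once 100 rows survive the join, instead of fetching and sorting every row for the mod. The earlier index on (primary_value DESC, id DESC) alone does not start with mod_id, so it could not serve both the filter and the ordering.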

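【Comments】:

【Solution 3】:

Use a composite index.

What is an index?

An index is a special block of information about the data in a table. It has to be updated every time the table it belongs to is updated, so if you constantly update an indexed table, maintaining the index can hurt performance.

The upside is reduced search, sort, and group time.

What is a composite index? A composite index is a special block of information that can be thought of as a sorted array whose rows contain the concatenated values of all the columns that make up the key. A composite key consists only of columns from a single table (in MySQL; I am not sure about other databases!), and it can speed up many kinds of queries against that table.

What are potential candidates (columns) for an index? Those used for searching (selection), grouping, and sorting (ordering).

Is there a way to force or ignore the use of an index? Yes. (MySQL!)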
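
For illustration, MySQL's index hints look like the sketch below; idx_player is a hypothetical index on items(player_id). PostgreSQL, which the question uses, has no index hints. The closest equivalent is a session-level planner setting, which is meant for diagnosis rather than production use.

```sql
-- MySQL only: index hints follow the table name.
-- idx_player is a hypothetical index on items(player_id).
SELECT id FROM items FORCE INDEX (idx_player) WHERE player_id = 1;
SELECT id FROM items IGNORE INDEX (idx_player) WHERE player_id = 1;

-- PostgreSQL only: discourage sequential scans for the current session
-- to see whether the planner can use an index at all, then restore the default.
SET enable_seqscan = off;
EXPLAIN SELECT id FROM items WHERE player_id = 1;
RESET enable_seqscan;
```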

How do you find potential index candidates? Look at the columns used in the queries whose performance is considered slow.

【Comments】: