【Posted】: 2016-04-29 01:03:02
【Question】:
I'm running on an AWS m4.large (2 vCPUs, 8 GB RAM) and I'm seeing some slightly surprising behavior with MySQL and GROUP BY. I have this test database:
CREATE TABLE demo (
time INT,
word VARCHAR(30),
count INT
);
CREATE INDEX timeword_idx ON demo(time, word);
I inserted 4,000,000 records with (uniformly) random words "t%s" % random.randint(0, 30000) and times random.randint(0, 86400).
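For anyone who wants to reproduce the data set, the description above could be sketched roughly as follows. The `make_rows` helper, the seed, and the range used for `count` are assumptions of this sketch; the question only specifies how `word` and `time` were drawn.

```python
import random


def make_rows(n, seed=None):
    """Yield n (time, word, count) tuples shaped like the demo table rows."""
    rng = random.Random(seed)
    for _ in range(n):
        time = rng.randint(0, 86400)          # uniform, both ends inclusive
        word = "t%s" % rng.randint(0, 30000)  # uniform word id, as in the question
        # ASSUMPTION: the question does not say how count was filled in.
        count = rng.randint(1, 100)
        yield (time, word, count)


# Small sample for inspection; the question used n = 4,000,000.
rows = list(make_rows(1000, seed=42))
```

The tuples can then be bulk-inserted with whatever MySQL client is at hand, in batches rather than one row per statement.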
SELECT word, time, sum(count) FROM demo GROUP BY time, word;
3996922 rows in set (1 min 28.29 sec)
EXPLAIN SELECT word, time, sum(count) FROM demo GROUP BY time, word;
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------+
| 1 | SIMPLE | demo | index | NULL | timeword_idx | 38 | NULL | 4002267 | |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------+
Then I ran it without the index:
SELECT word, time, sum(count) FROM demo IGNORE INDEX (timeword_idx) GROUP BY time, word;
3996922 rows in set (34.75 sec)
EXPLAIN SELECT word, time, sum(count) FROM demo IGNORE INDEX (timeword_idx) GROUP BY time, word;
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | demo | ALL | NULL | NULL | NULL | NULL | 4002267 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
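As you can see, the query that uses the index takes about 2.5 times as long (1 min 28.29 sec versus 34.75 sec). I'm not that surprised: by using the index, the query can probably avoid reading the time and word columns from the table, but because the index is so sparse it gains little from that. Instead, fetching count for each index entry turns what would be a sequential scan into a random access pattern.
I just want to confirm that this is the reason, and I'm wondering whether there is a rule of thumb for when an index ends up giving worse performance when it is used for GROUP BY.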
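To put a number on how sparse the grouping is, here is a back-of-the-envelope sketch, assuming the words and times were drawn independently and uniformly as described. The `expected_distinct` helper is introduced here for illustration only.

```python
import math


def expected_distinct(n_rows, n_keys):
    """Expected number of distinct keys after n_rows uniform draws from n_keys."""
    # Equals n_keys * (1 - (1 - 1/n_keys) ** n_rows); log1p/expm1 keep the
    # floating-point arithmetic accurate when 1/n_keys is tiny.
    return -n_keys * math.expm1(n_rows * math.log1p(-1.0 / n_keys))


n_rows = 4_000_000
n_keys = 86_401 * 30_001  # possible (time, word) pairs; both randint ranges are inclusive
groups = expected_distinct(n_rows, n_keys)
rows_per_group = n_rows / groups

print(round(groups), rows_per_group)
```

The estimate comes out at roughly 3.997 million groups, in line with the 3,996,922 rows both queries returned. With barely more than one row per group, the GROUP BY collapses almost nothing, so walking the index in order still means one table lookup per row to fetch count.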
Edit:
I followed Gordon Linoff's answer and used:
CREATE INDEX timeword_idx ON demo(time, word, count);
The "covering index" computes the result about 10 times faster than the full scan:
SELECT word, time, sum(count) FROM demo GROUP BY time, word;
3996922 rows in set (3.36 sec)
EXPLAIN SELECT word, time, sum(count) FROM demo GROUP BY time, word;
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------------+
| 1 | SIMPLE | demo | index | NULL | timeword_idx | 43 | NULL | 4002267 | Using index |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------------+
Very impressive!
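As a sanity check that the new index really includes count, the key_len change from 38 to 43 in the EXPLAIN output can be reproduced with a little arithmetic. This is a sketch that assumes all three columns are nullable (the CREATE TABLE has no NOT NULL) and that word uses a single-byte character set such as latin1; the question does not state the character set, but it is consistent with the reported 38. The helper names are made up for this example.

```python
def int_key_bytes(nullable=True):
    """Bytes an INT column contributes to key_len: 4, plus 1 NULL flag if nullable."""
    return 4 + (1 if nullable else 0)


def varchar_key_bytes(max_chars, bytes_per_char=1, nullable=True):
    """Bytes a VARCHAR(max_chars) column contributes to key_len."""
    # 2 bytes store the length; bytes_per_char=1 is the ASSUMED single-byte charset.
    return max_chars * bytes_per_char + 2 + (1 if nullable else 0)


time_bytes = int_key_bytes()         # time INT
word_bytes = varchar_key_bytes(30)   # word VARCHAR(30)
count_bytes = int_key_bytes()        # count INT

original_key_len = time_bytes + word_bytes          # index on (time, word)
covering_key_len = original_key_len + count_bytes   # index on (time, word, count)

print(original_key_len, covering_key_len)
```

The extra 5 bytes are the count column, which is why the last plan shows "Using index": the query is answered from the index alone, with no lookups into the table.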
【Comments】:
Tags: mysql database group-by database-indexes