【Posted】: 2010-11-26 04:27:13
【Question】:
This question is a more specific version of a previous question I asked.
Table
CREATE TABLE Test4_ClusterMatches
(
`match_index` INT UNSIGNED,
`cluster_index` INT UNSIGNED,
`id` INT NOT NULL AUTO_INCREMENT,
`tfidf` FLOAT,
PRIMARY KEY (`cluster_index`,`match_index`,`id`)
);
The query I want to run
mysql> explain SELECT `match_index`, SUM(`tfidf`) AS total
FROM Test4_ClusterMatches WHERE `cluster_index` IN (1,2,3 ... 3000)
GROUP BY `match_index`;
The problem with the query
It uses a temporary table and a filesort, so it is slow
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
| 1 | SIMPLE | Test4_ClusterMatches | range | PRIMARY | PRIMARY | 4 | NULL | 51540 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------------+
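With the current index, the query would have to sort by cluster_index first to eliminate the temporary table and the filesort, but doing so gives wrong results for SUM(tfidf).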
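In MySQL 5.x, GROUP BY also sorts its output implicitly, and that sort can show up as Using filesort on its own. A minimal sketch, assuming a 5.x server and using a shortened placeholder IN list instead of the real 3,000 values, is to suppress the implicit sort with ORDER BY NULL and compare the EXPLAIN output. This should only affect the filesort; the temporary table used for grouping is likely to remain.

```sql
-- Sketch only: the original query with the implicit GROUP BY sort suppressed.
-- The IN list is a shortened placeholder, not the real list of values.
EXPLAIN
SELECT `match_index`, SUM(`tfidf`) AS total
FROM Test4_ClusterMatches
WHERE `cluster_index` IN (1, 2, 3)
GROUP BY `match_index`
ORDER BY NULL;  -- result order is then unspecified
```

Because the rows come back in no particular order, this only fits cases where the caller does not depend on the output being sorted by match_index.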
Changing the primary key to
PRIMARY KEY (`match_index`,`cluster_index`,`id`)
avoids the filesort and the temporary table, but it examines 14,932,441 rows, so it is also slow
+----+-------------+----------------------+-------+---------------+---------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------+---------+------+----------+--------------------------+
| 1 | SIMPLE | Test5_ClusterMatches | index | NULL | PRIMARY | 16 | NULL | 14932441 | Using where; Using index |
+----+-------------+----------------------+-------+---------------+---------+---------+------+----------+--------------------------+
Tight index scan
Using a tight index scan, by running the search for just one cluster_index value
mysql> explain SELECT match_index, SUM(tfidf) AS total
FROM Test4_ClusterMatches WHERE cluster_index =3000
GROUP BY match_index;
eliminates the temporary table and the filesort.
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+--------------------------+
| 1 | SIMPLE | Test4_ClusterMatches | ref | PRIMARY | PRIMARY | 4 | const | 27 | Using where; Using index |
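+----+-------------+----------------------+------+---------------+---------+---------+-------+------+--------------------------+
I'm not sure whether this can be exploited with some magical sql-fu that I haven't come across yet?
Question
How can I change my query so that it handles all 3,000 cluster_index values and avoids the temporary table and the filesort, without having to examine 14,932,441 rows?
Update
Using the table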
CREATE TABLE Test6_ClusterMatches
(
match_index INT UNSIGNED,
cluster_index INT UNSIGNED,
id INT NOT NULL AUTO_INCREMENT,
tfidf FLOAT,
PRIMARY KEY (id),
UNIQUE KEY(cluster_index,match_index)
);
the following query then returns 10 rows in set (0.41 sec) :)
SELECT `match_index`, SUM(`tfidf`) AS total FROM Test6_ClusterMatches WHERE
`cluster_index` IN (.....)
GROUP BY `match_index` ORDER BY total DESC LIMIT 0,10;
but it uses a temporary table and a filesort
+----+-------------+----------------------+-------+---------------+---------------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | Test6_ClusterMatches | range | cluster_index | cluster_index | 5 | NULL | 78663 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------+-------+---------------+---------------+---------+------+-------+----------------------------------------------+
I'm wondering whether there is a way to make it faster by eliminating Using temporary and Using filesort?
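How much Using temporary costs depends largely on whether the internal temporary table stays in memory or is written to disk. One rough way to check, assuming a MySQL 5.x session, is to compare the session counters before and after running the query. The statements below are a sketch; the IN list is a shortened placeholder for the real one.

```sql
-- Counters before the query (Created_tmp_tables, Created_tmp_disk_tables).
SHOW SESSION STATUS LIKE 'Created_tmp%';

-- The query under test; the IN list here is a shortened placeholder.
SELECT `match_index`, SUM(`tfidf`) AS total
FROM Test6_ClusterMatches
WHERE `cluster_index` IN (1, 2, 3)
GROUP BY `match_index`
ORDER BY total DESC
LIMIT 0, 10;

-- Counters after the query. If Created_tmp_disk_tables did not grow,
-- the temporary table stayed in memory. SHOW STATUS may itself bump
-- Created_tmp_tables, so the disk counter is the one to watch.
SHOW SESSION STATUS LIKE 'Created_tmp%';

-- Size limits that decide when an in-memory temporary table spills to disk.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
```

In this plan the filesort most likely comes from ORDER BY total DESC, which has to sort the grouped rows by an aggregate and so cannot be served from an index.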
【Comments】:
-
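@Ben - you don't need cluster_index,match_index as the primary key; id is already defined as auto_increment, so make it the primary key on its own. Create a unique key on cluster_index,match_index, then rerun the query and see whether anything improves.
-
To take advantage of the innodb CLUSTERED INDEX or PRIMARY KEY (dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html) you should have the following primary key (cluster_id, match_id,...) - then give it a try :P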
-
@ajreal Yes, it improved performance by about 20 orders of magnitude (and gave the correct results), but it still shows Using temporary and Using filesort
-
@Ben - for the filesort, see mysqlperformanceblog.com/2009/03/05/… . As for the temporary table, see dev.mysql.com/doc/refman/5.0/en/internal-temporary-tables.html and mysqlperformanceblog.com/2007/08/16/… ; if it is an in-memory table, it won't be too bad
-
@f00 Would having it as the primary key rather than a unique key make a difference?
Tags: optimization mysql query-optimization