How can a 'WHERE column LIKE "%expression%"' perform better than a MATCH(column) AGAINST("expression") in MySQL?

【Question title】: How can a 'WHERE column LIKE "%expression%"' perform better than a MATCH(column) AGAINST("expression") in MySQL?
【Posted】: 2011-09-28 16:15:29
【Question】:

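I have run into a serious MySQL performance bottleneck that I can neither understand nor resolve. Here are the table structures, indexes, and record counts (bear with me, there are only two tables):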

mysql> desc elggobjects_entity;
+-------------+---------------------+------+-----+---------+-------+
| Field       | Type                | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| guid        | bigint(20) unsigned | NO   | PRI | NULL    |       |
| title       | text                | NO   | MUL | NULL    |       |
| description | text                | NO   |     | NULL    |       |
+-------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> show index from elggobjects_entity;
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table              | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| elggobjects_entity |          0 | PRIMARY  |            1 | guid        | A         |      613637 |     NULL | NULL   |      | BTREE      |         |
| elggobjects_entity |          1 | title    |            1 | title       | NULL      |         131 |     NULL | NULL   |      | FULLTEXT   |         |
| elggobjects_entity |          1 | title    |            2 | description | NULL      |         131 |     NULL | NULL   |      | FULLTEXT   |         |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)

mysql> select count(*) from elggobjects_entity;
+----------+
| count(*) |
+----------+
|   613637 |
+----------+
1 row in set (0.00 sec)

mysql> desc elggentity_relationships;
+--------------+---------------------+------+-----+---------+----------------+
| Field        | Type                | Null | Key | Default | Extra          |
+--------------+---------------------+------+-----+---------+----------------+
| id           | int(11)             | NO   | PRI | NULL    | auto_increment |
| guid_one     | bigint(20) unsigned | NO   | MUL | NULL    |                |
| relationship | varchar(50)         | NO   | MUL | NULL    |                |
| guid_two     | bigint(20) unsigned | NO   | MUL | NULL    |                |
| time_created | int(11)             | NO   |     | NULL    |                |
+--------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> show index from elggentity_relationships;
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                    | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| elggentity_relationships |          0 | PRIMARY      |            1 | id           | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            1 | guid_one     | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            2 | relationship | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            3 | guid_two     | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          1 | relationship |            1 | relationship | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          1 | guid_two     |            1 | guid_two     | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)

mysql> select count(*) from elggentity_relationships;
+----------+
| count(*) |
+----------+
| 11408236 |
+----------+
1 row in set (0.00 sec)

Now I want to INNER JOIN these two tables and run a full-text search.

The query:

SELECT
        count(DISTINCT o.guid) as total
FROM
        elggobjects_entity o
INNER JOIN
        elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
        ((MATCH (o.title, o.description) AGAINST ('scelerisque' )))

This gives me a response time of 6 minutes (!).

On the other hand, this one

SELECT
        count(DISTINCT o.guid) as total
FROM
        elggobjects_entity o
INNER JOIN
        elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
        ((o.title like "%scelerisque%") OR (o.description like "%scelerisque%"))

returns the same count in 0.02 seconds.

How is this possible? What am I missing here? (MySQL info: mysql Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (x86_64) using readline 6.1)

EDIT

EXPLAIN for the first query (using MATCH .. AGAINST) gives:

+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type     | possible_keys         | key          | key_len | ref   | rows | Extra       |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
|  1 | SIMPLE      | r     | ref      | guid_one,relationship | relationship | 152     | const | 6145 | Using where |
|  1 | SIMPLE      | o     | fulltext | PRIMARY,title         | title        | 0       |       |    1 | Using where |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+

And for the second query (using LIKE "%..%"):

+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| id | select_type | table | type   | possible_keys         | key          | key_len | ref                 | rows | Extra       |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | r     | ref    | guid_one,relationship | relationship | 152     | const               | 6145 | Using where |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY               | PRIMARY      | 8       | elgg1710.r.guid_one |    1 | Using where |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+

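【Comments】:

Have you tried running EXPLAIN on each query?
@James Anderson I have added the EXPLAIN output to the question.
Did you run the tests several times to rule out the effect of data caching?
@Tim Yes, several times. The results are roughly the same: the LIKE clause always responds in 0.01 to 0.03 seconds, while the MATCH .. AGAINST clause returns after several minutes (between 4 and 7). I must be overlooking something obvious; these results just don't seem possible.
Would splitting the match into 2 separate expressions speed things up? (MATCH (o.title) AGAINST ('scelerisque' )) OR (MATCH (o.description) AGAINST ('scelerisque'))

Tags: mysql performance full-text-search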

【Solution 1】:

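Based on your experience and the EXPLAIN output, the full-text index does not appear to be as useful as you expected in this particular case. That depends on the specific data in the database, the database structure, and/or the specific query.

Usually a database engine uses no more than one index per table when executing a query. So when a table has several indexes, the query optimizer tries to pick the best one. But the optimizer is not always smart enough.

The EXPLAIN output shows that the query optimizer decided to use the indexes relationship and title. The relationship filter reduces the table elggentity_relationships to 6145 rows, and the title filter reduces the table elggobjects_entity to 72697 rows. MySQL then has to join these tables without using any index (6145 x 72697 = 446723065 filtering operations), because the indexes have already been used for filtering. In this case that is probably too much. MySQL may even decide to keep intermediate results on disk in order to keep enough memory free.

Now let's look at the other query. It uses relationship and the PRIMARY KEY (of table elggobjects_entity) as its indexes. The relationship filter reduces elggentity_relationships to 6145 rows. Joining the tables on the PRIMARY KEY index yields only 3957 rows. That is not many rows for the last filter (i.e. LIKE "%scelerisque%") to process, even though no index is used for it at all.

As you can see, speed depends heavily on which indexes are chosen for the query. So in this particular case the PRIMARY KEY index is more useful than the full-text title index, because the PRIMARY KEY reduces the result set more than title does.

MySQL is not always smart about choosing the right index. We can do this manually, using clauses such as IGNORE INDEX (index_name), FORCE INDEX (index_name), etc.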
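As a rough sketch of the hint syntax (not something taken from the question): an index hint is written directly after the table name or alias in the FROM clause. The statement below reuses the table and index names shown in the question; whether the optimizer actually ends up with a better plan is an assumption that has to be checked with EXPLAIN.

```sql
-- Sketch only: ask the optimizer not to consider the `relationship` index
-- on the relationships table, then inspect the plan it picks instead.
EXPLAIN
SELECT COUNT(DISTINCT o.guid) AS total
FROM elggobjects_entity o
INNER JOIN elggentity_relationships r IGNORE INDEX (relationship)
        ON r.relationship = 'image' AND r.guid_one = o.guid
WHERE MATCH (o.title, o.description) AGAINST ('scelerisque');
```

FORCE INDEX (index_name) and USE INDEX (index_name) go in the same position, after the alias.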

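But in your case the problem is that a query using MATCH() AGAINST() requires the full-text index, because MATCH() AGAINST() as written here (a natural-language search) does not work at all without one. So that is the main reason why MySQL chose the wrong index for the query.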

UPDATE

OK, I did some investigation.

First, you can try forcing MySQL to use the guid_one index instead of relationship on table elggentity_relationships: USE INDEX (guid_one).
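A minimal sketch of what that could look like, applied to the slow query from the question. The hint placement follows MySQL's index hint syntax; the resulting plan and timing are not known here and should be checked with EXPLAIN before relying on it.

```sql
-- Sketch only: restrict r to the `guid_one` index, so the lookup into
-- elggentity_relationships can go through guid_one for each matching object.
EXPLAIN
SELECT COUNT(DISTINCT o.guid) AS total
FROM elggobjects_entity o
INNER JOIN elggentity_relationships r USE INDEX (guid_one)
        ON r.relationship = 'image' AND r.guid_one = o.guid
WHERE MATCH (o.title, o.description) AGAINST ('scelerisque');
```

If the plan looks reasonable, the same statement without the EXPLAIN keyword can be timed against the LIKE version.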

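But for better performance, I think you could try creating an index on the combination of the two columns (guid_one, relationship). The existing guid_one index is very similar, but it covers 3 columns rather than 2, and this query uses only 2 of them. In my opinion MySQL should pick the right index automatically once it has been created. If it does not, force MySQL to use it.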
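A sketch of that suggestion follows. The index name guid_one_relationship is made up for illustration; only the table and column names come from the question. Building an index on a table with about 11 million rows may take a while.

```sql
-- Hypothetical index name; the column pair matches the join condition
-- (r.guid_one = o.guid AND r.relationship = 'image').
CREATE INDEX guid_one_relationship
    ON elggentity_relationships (guid_one, relationship);

-- Re-check the plan without any index hint to see whether the optimizer
-- picks the new index on its own.
EXPLAIN
SELECT COUNT(DISTINCT o.guid) AS total
FROM elggobjects_entity o
INNER JOIN elggentity_relationships r
        ON r.relationship = 'image' AND r.guid_one = o.guid
WHERE MATCH (o.title, o.description) AGAINST ('scelerisque');
```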

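Note: after creating the index, don't forget to remove the old USE INDEX directive from the query, because it may prevent the query from using the newly created index. :)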

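【Comments】:

Thanks for digging into this! So you are telling me that MySQL tries to run match .. against() over the whole elggobjects_entity table, rather than over the set of rows filtered by the INNER JOIN clause? If so, is there a way to tell MySQL in which order it should execute the join and the where clause? Something like STRAIGHT_JOIN, with which I can specify the join order, but which does not help in this case.
@András Yes, MySQL tries to run match...against() over the whole elggobjects_entity table. But that is not the problem; thanks to the index, that part is fast. The problem is that the table uses one index for match...against(), so it cannot use another index for the join. That means the join is performed without any index.
OK, now I see. Does this mean there is no way to restructure the above query (using match...against) to reach reasonable performance? Is sticking with plain old LIKE "%..%" really my only solution?
About the same as what I can achieve with the LIKE "%..%" clause: a few hundred milliseconds. This query is called by a search page that many users use frequently. I can't have the web interface hang for minutes.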