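The first approach, which uses a join, will be much faster. With the second, the subquery is executed once for every row. Some databases, however, optimize nested queries into joins.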
Join vs. sub-query
Article: MySQL performance: INNER JOIN vs. sub-select
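To make the difference between the join form and the per-row subquery form concrete, here is a minimal sketch in MySQL syntax. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical and exist only for illustration; they do not come from the original question.

```sql
-- Hypothetical schema for illustration only (not from the original post).
CREATE TABLE customers (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT NOT NULL,
    total       DECIMAL(10, 2) NOT NULL,
    INDEX idx_orders_customer (customer_id)
);

INSERT INTO customers (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders (id, customer_id, total) VALUES
    (10, 1, 25.00), (11, 1, 40.00), (12, 2, 15.00);

-- Join form: both tables are combined by a single join operation.
SELECT o.id, o.total, c.name
FROM orders AS o
INNER JOIN customers AS c ON c.id = o.customer_id;

-- Subquery form: a correlated scalar subquery. Unless the optimizer
-- rewrites it, it is evaluated once for each row of orders.
SELECT o.id,
       o.total,
       (SELECT c.name FROM customers AS c WHERE c.id = o.customer_id) AS name
FROM orders AS o;
```

Both statements return the same three rows for this sample data. Running `EXPLAIN` on each is a reasonable way to check whether a given server version treats them differently.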
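On my tables, I found that using a "virtual table" (a derived table) instead of a ROW subquery was much faster. It seems the row subquery is not optimized, whereas the join against the "virtual table" is.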
For educational purposes, here are the queries and the EXPLAIN output each one returns.
-- Query using a ROW subquery
EXPLAIN
SELECT
    *
FROM
    region
WHERE
    ROW(PDB, CHAIN) IN (
        SELECT
            region.PDB,
            region.CHAIN
        FROM
            region LEFT JOIN split_domain USING (SUNID)
        WHERE
            split_domain.SUNID IS NULL
        GROUP BY
            PDB, CHAIN
        HAVING
            COUNT(*) > 1
    )
LIMIT
    10
;
+----+--------------------+--------------+--------+---------------+---------+---------+------------------------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+--------+---------------+---------+---------+------------------------+-------+--------------------------------------+
| 1 | PRIMARY | region | ALL | NULL | NULL | NULL | NULL | 57362 | Using where |
| 2 | DEPENDENT SUBQUERY | region | ALL | NULL | NULL | NULL | NULL | 57362 | Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | split_domain | eq_ref | PRIMARY | PRIMARY | 3 | scop_1_65.region.SUNID | 1 | Using where; Using index; Not exists |
+----+--------------------+--------------+--------+---------------+---------+---------+------------------------+-------+--------------------------------------+
3 rows in set (0.04 sec)
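I could not get any results from the query above (it took too long). Perhaps the LIMIT clause is not taking effect? The plan marks the inner SELECT as a DEPENDENT SUBQUERY, which suggests it is re-evaluated for rows of the outer scan rather than computed once.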
-- Query using a joined virtual table
EXPLAIN
SELECT
    *
FROM
    region
INNER JOIN (
    SELECT
        region.PDB,
        region.CHAIN
    FROM
        region LEFT JOIN split_domain USING (SUNID)
    WHERE
        split_domain.SUNID IS NULL
    GROUP BY
        PDB, CHAIN
    HAVING
        COUNT(*) > 1
) AS x
ON
    region.PDB = x.PDB
AND
    region.CHAIN = x.CHAIN
LIMIT
    10
;
+----+-------------+--------------+--------+---------------------+-----------+---------+------------------------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------+-----------+---------+------------------------+-------+--------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 8624 | |
| 1 | PRIMARY | region | ref | PDB,CHAIN,pdb_chain | pdb_chain | 5 | x.PDB,x.CHAIN | 1 | |
| 2 | DERIVED | region | ALL | NULL | NULL | NULL | NULL | 57362 | Using temporary; Using filesort |
| 2 | DERIVED | split_domain | eq_ref | PRIMARY | PRIMARY | 3 | scop_1_65.region.SUNID | 1 | Using where; Using index; Not exists |
+----+-------------+--------------+--------+---------------------+-----------+---------+------------------------+-------+--------------------------------------+
4 rows in set (1.02 sec)
The above returns:
10 results in about 1 second
100 results in about 1 second
1000 results in about 1.5 seconds
all results (20437) in about 2 seconds
The previous query (the ROW subquery) did not return within 5 minutes, even when limited to 10 rows.
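I hope this is useful to anyone designing (or trying to optimize) complex subqueries. The precise details of the data should not be needed to convey the results shown here.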
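For readers who want to try the derived-table form on a small scale, here is a minimal sketch. The column types, the non-composite index definitions, and the sample rows are assumptions loosely inferred from the index names and key lengths in the EXPLAIN output; they are not the author's actual schema or data.

```sql
-- Assumed, simplified schema: types and sample rows are guesses for illustration.
CREATE TABLE split_domain (
    SUNID MEDIUMINT UNSIGNED NOT NULL,
    PRIMARY KEY (SUNID)
);

CREATE TABLE region (
    SUNID MEDIUMINT UNSIGNED NOT NULL,
    PDB   CHAR(4) NOT NULL,
    CHAIN CHAR(1) NOT NULL,
    INDEX PDB (PDB),
    INDEX CHAIN (CHAIN),
    INDEX pdb_chain (PDB, CHAIN)
);

INSERT INTO split_domain (SUNID) VALUES (4);
INSERT INTO region (SUNID, PDB, CHAIN) VALUES
    (1, '1abc', 'A'),
    (2, '1abc', 'A'),
    (3, '1abc', 'B'),
    (4, '2xyz', 'A'),
    (5, '2xyz', 'A');

-- Derived-table form: the grouped subquery is materialized once as x,
-- then joined back to region on (PDB, CHAIN).
SELECT
    region.*
FROM
    region
INNER JOIN (
    SELECT
        region.PDB,
        region.CHAIN
    FROM
        region LEFT JOIN split_domain USING (SUNID)
    WHERE
        split_domain.SUNID IS NULL   -- keep regions with no split_domain row
    GROUP BY
        PDB, CHAIN
    HAVING
        COUNT(*) > 1                 -- keep (PDB, CHAIN) pairs seen more than once
) AS x
ON
    region.PDB = x.PDB
AND
    region.CHAIN = x.CHAIN
;
```

With this sample data, only the ('1abc', 'A') pair survives the HAVING filter, so the outer query returns the two region rows with SUNID 1 and 2. A table this small says nothing about timing; prefixing the statement with `EXPLAIN` on a realistically sized table is the way to compare plans.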