Query taking too long, while splitting it into two queries takes 0.2 sec

【Question title】: Query taking too long, while splitting it into two queries takes 0.2 sec
【Posted】: 2018-03-10 06:33:16
【Question description】:

I currently have this query:

select m.id, ms.severity, ms.risk_score, count(distinct si.id), boarding_date_tbl.boarding_date
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true
join (select m.id merchant_id, min(s_for_boarding.scan_date) boarding_date
    from merchant m
    left join merchant_has_scan ms on m.id = ms.merchant_id
    left join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
    group by m.id) boarding_date_tbl on boarding_date_tbl.merchant_id = m.id
group by m.id
limit 100;

When I run it on a large schema (about 2 million merchants), it takes more than 20 seconds. But if I split it into:

select m.legal_name, m.unique_id, m.merchant_status, s_for_boarding.scan_date
from merchant m
join merchant_has_scan ms on m.id = ms.merchant_id
join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
group by m.id
limit 100;

and

select m.id, ms.severity, ms.risk_score, count(distinct si.id)
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true

group by m.id
limit 100;

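both take about 0.1 seconds. The reason is clear: the low limit means the server does not have to do much work to return the first 100 rows. It is also clear that the inner select is what makes the first query run over the whole table. My question is: is there a way to run the inner select only for the relevant merchants rather than for the entire table?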

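Update

Using a left join instead of a join before the inner query brought it down to 6 seconds, but that is still far more than what I get by running the 2 queries separately.

Update 2

Create table for merchant: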

CREATE TABLE `merchant` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `last_scan_completed_id` bigint(20) DEFAULT NULL,
  `last_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  CONSTRAINT `FK_9lhkm7tb4bt87qy4j3fjayec5` FOREIGN KEY (`last_scan_completed_id`) REFERENCES `merchant_has_scan` (`id`)
)

merchant_has_scan:

CREATE TABLE `merchant_has_scan` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `merchant_id` bigint(20) NOT NULL,
  `risk_score` int(11) DEFAULT NULL,
  `scan_id` bigint(20) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_merchant_id` (`scan_id`,`merchant_id`),
  CONSTRAINT `FK_3d8f81ts5wj2u99ddhinfc1jp` FOREIGN KEY (`scan_id`) REFERENCES `scan` (`id`),
  CONSTRAINT `FK_e7fhioqt9b9rp9uhvcjnk31qe` FOREIGN KEY (`merchant_id`) REFERENCES `merchant` (`id`)
)

scan_item:

CREATE TABLE `scan_item` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `is_registered` bit(1) NOT NULL,
  `merchant_has_scan_id` bigint(20) NOT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `FK_avcc5q3hkehgreivwhoc5h7rb` FOREIGN KEY (`merchant_has_scan_id`) REFERENCES `merchant_has_scan` (`id`)
)

scan:

CREATE TABLE `scan` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `scan_date` datetime DEFAULT NULL,
  `scan_type` int(11) NOT NULL,
  PRIMARY KEY (`id`)
)

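And the EXPLAIN:

【Comments】:

You had better post the table schema.
@tvelykyy It is a bit big and I am not sure it will help, but I can do it if it does.
@tvelykyy Added the EXPLAIN and the create statements.

Tags: mysql performance join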

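【Solution 1】:

You don't have the latest version of MySQL, which can create an index for a derived table. (What version are you running?)
A "derived table" (a subquery) will be the first table in the EXPLAIN because it has to be.
merchant_has_scan is a many:many table, but without the optimization tips here. Fixing this may be the biggest factor in speeding it up. Caveat: the tips suggest getting rid of id, but you seem to have a use for id, so keep it.
The COUNT(DISTINCT si.id) and JOIN si... can be replaced by ( SELECT COUNT(*) FROM scan_item WHERE ...), thereby eliminating one of the JOINs and possibly reducing the explode-implode effect (the join multiplies rows, then GROUP BY collapses them again).
LEFT JOIN: are you expecting to sometimes get NULL for boarding_date? If not, use JOIN, not LEFT JOIN. (It is better to state your intention than to leave the query open to multiple interpretations.)
If you can remove the LEFTs, then since m.id and merchant_id are specified to be equal, why list both of them in the SELECT? (This is a source of confusion, not a speed issue.)
You say you split it in two, but you didn't. When you pulled the inner query out, you added LIMIT 100 to it. If you need that, add it to the derived table, too. Then you may be able to remove GROUP BY m.id LIMIT 100 from the outer query.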
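To make the suggestions above concrete, here is a rough sketch of how the query might look with both aggregates moved into correlated subqueries, so that the boarding date and the item count should only be computed for the 100 merchants that are actually returned. It is untested against the real data and rests on a few assumptions: the aliases `registered_items`, `boarding_date` and `ms2` and the index name `idx_merchant_scan` are made up for illustration; `ORDER BY m.id` was added so the LIMIT picks a well-defined set of rows; and `ms.severity` is taken from the question's query, since it does not appear in the trimmed CREATE TABLE. Note the semantics differ slightly from the original: a merchant with no registered scan items shows up with a count of 0 instead of being dropped, and `boarding_date` is NULL when there is no scan of type 1.

```sql
-- Optional, hypothetical index name: the "other direction" index for the
-- many:many table, so lookups by merchant_id can also read scan_id from the index.
ALTER TABLE merchant_has_scan
    ADD INDEX idx_merchant_scan (merchant_id, scan_id);

-- Sketch: no derived table and no GROUP BY; each aggregate is a correlated
-- subquery evaluated per output row.
SELECT  m.id,
        ms.severity,
        ms.risk_score,
        ( SELECT COUNT(*)                      -- replaces COUNT(DISTINCT si.id) + JOIN si
            FROM scan_item si
           WHERE si.merchant_has_scan_id = ms.id
             AND si.is_registered = true
        ) AS registered_items,
        ( SELECT MIN(s.scan_date)              -- replaces the boarding_date_tbl derived table
            FROM merchant_has_scan ms2
            JOIN scan s ON s.id = ms2.scan_id
           WHERE ms2.merchant_id = m.id
             AND s.scan_type = 1
        ) AS boarding_date
FROM merchant m
JOIN merchant_has_scan ms ON ms.id = m.last_scan_completed_id
ORDER BY m.id                                  -- assumption: the original had no ORDER BY
LIMIT 100;
```

If merchants without registered items must be excluded as in the original inner join, adding `HAVING registered_items > 0` is one option, though it may force more rows to be examined before the limit is reached. Comparing the EXPLAIN output before and after is the only reliable way to tell whether the plan actually improves on a given data set.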

【Discussion】:

Added the EXPLAIN and the create statements. Using MySQL 5.7.