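【Posted】: 2021-04-18 18:42:07
【Question】:
I am building a reporting tool for a legacy application (MySQL 5.6). It uses a single reports table with an index on every relevant column, and the table has about 120'000 rows.
My query looks like this; the GROUP BY column and the WHERE clauses can change depending on how the user configures the report: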
SELECT
nationality AS groupValue,
( SELECT COUNT( request_id ) FROM reports WHERE create_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59' AND nationality = groupValue ) AS totalRequests,
( SELECT SUM(effective_amount) FROM reports WHERE pay_off_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59' AND nationality = groupValue ) AS totalSum,
( SELECT COUNT( request_id ) FROM reports WHERE pay_off_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59' AND current_status = 5 AND nationality = groupValue ) AS totalPaid,
( SELECT COUNT( request_id ) FROM reports WHERE failed_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59' AND current_status = 3 AND nationality = groupValue ) AS totalRefused,
( SELECT COUNT( request_id ) FROM reports WHERE failed_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59' AND current_status = 4 AND nationality = groupValue ) AS totalRenounced,
( SELECT COUNT( request_id ) FROM reports WHERE failed_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59' AND current_status = 2 AND nationality = groupValue ) AS totalInProgress
FROM
reports
GROUP BY
nationality;
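With a GROUP BY column that has about 40 distinct values, performance is fine, but if I pick a column with about 200 distinct values (for example nationality, see below), the run time becomes unknown (I gave up waiting).
My feeling is that this has to do with some quadratic complexity, since the GROUP BY column is used in both the main query and the subqueries. I need that because the WHERE clauses differ for each SELECT value. But my SQL-fu is too rusty...
My questions:
- What causes the performance drop?
- How can this query be improved?
Update 1:
Here is the reports table definition (simplified):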
CREATE TABLE `reports` (
`request_id` int(11) NOT NULL,
`employee_id` int(11) NOT NULL,
`create_date` datetime NOT NULL,
`failed_date` datetime DEFAULT NULL,
`pay_off_date` datetime DEFAULT NULL,
`nationality` varchar(30) DEFAULT NULL,
`effective_amount` decimal(13,2) DEFAULT NULL,
`current_status` tinyint(2) DEFAULT NULL,
PRIMARY KEY (`request_id`,`customer_id`,`employee_id`),
UNIQUE KEY `request_id` (`request_id`),
KEY `fk_request_id` (`request_id`),
KEY `fk_employee_id` (`employee_id`),
KEY `create_date_index` (`create_date`),
KEY `failed_date_index` (`failed_date`),
KEY `pay_off_date_index` (`pay_off_date`),
KEY `nationality_index` (`nationality`(2))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
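Note that nationality_index covers only the first 2 characters of nationality. A quick check, offered as a sketch rather than a diagnosis, is to compare how many distinct full values and how many distinct 2-character prefixes the column holds. The column aliases are made up for illustration.

```sql
-- Compare the number of distinct nationalities with the number of
-- distinct 2-character prefixes that nationality_index can tell apart.
-- fullValues and prefixValues are illustrative alias names.
SELECT
  COUNT(DISTINCT nationality)          AS fullValues,
  COUNT(DISTINCT LEFT(nationality, 2)) AS prefixValues
FROM reports;
```

If prefixValues is much smaller than fullValues, a lookup through the prefix index also reads rows of other nationalities that share the same prefix and has to filter them out afterwards.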
Here is the EXPLAIN output for a query that performs well:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | reports | NULL | ALL | NULL | NULL | NULL | NULL | 118923 | 100.00 | Using temporary; Using filesort |
| 7 | DEPENDENT SUBQUERY | reports | NULL | ref | failed_date_index,nationality_index | nationality_index | 5 | func | 720 | 0.12 | Using where |
| 6 | DEPENDENT SUBQUERY | reports | NULL | ref | failed_date_index,nationality_index | nationality_index | 5 | func | 720 | 0.12 | Using where |
| 5 | DEPENDENT SUBQUERY | reports | NULL | ref | failed_date_index,nationality_index | nationality_index | 5 | func | 720 | 0.12 | Using where |
| 4 | DEPENDENT SUBQUERY | reports | NULL | ALL | pay_off_date_index,nationality_index | NULL | NULL | NULL | 118923 | 1.00 | Range checked for each record (index map: 0x8080) |
| 3 | DEPENDENT SUBQUERY | reports | NULL | ALL | pay_off_date_index,nationality_index | NULL | NULL | NULL | 118923 | 10.00 | Range checked for each record (index map: 0x8080) |
| 2 | DEPENDENT SUBQUERY | reports | NULL | ref | create_date_index,nationality_index | nationality_index | 5 | func | 720 | 2.10 | Using where |
Here is the EXPLAIN output for the slow query grouped by nationality:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | reports | NULL | ALL | NULL | NULL | NULL | NULL | 118923 | 100.00 | Using temporary; Using filesort |
| 7 | DEPENDENT SUBQUERY | reports | NULL | ref | failed_date_index,nationality_index | nationality_index | 5 | func | 720 | 0.12 | Using where |
| 6 | DEPENDENT SUBQUERY | reports | NULL | ref | failed_date_index,nationality_index | nationality_index | 5 | func | 720 | 0.12 | Using where |
| 5 | DEPENDENT SUBQUERY | reports | NULL | ref | failed_date_index,nationality_index | nationality_index | 5 | func | 720 | 0.12 | Using where |
| 4 | DEPENDENT SUBQUERY | reports | NULL | ALL | pay_off_date_index,nationality_index | NULL | NULL | NULL | 118923 | 1.00 | Range checked for each record (index map: 0x8080) |
| 3 | DEPENDENT SUBQUERY | reports | NULL | ALL | pay_off_date_index,nationality_index | NULL | NULL | NULL | 118923 | 10.00 | Range checked for each record (index map: 0x8080) |
| 2 | DEPENDENT SUBQUERY | reports | NULL | ref | create_date_index,nationality_index | nationality_index | 5 | func | 720 | 2.10 | Using where |
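Update 2:
As the commenters below pointed out, this is not a kosher use of GROUP BY. I am not sure exactly how it works, but it does require the nationality = groupValue expression in each subquery's WHERE clause, which seems to make the aggregates in the subqueries follow the GROUP BY groups of the main query (explanations welcome!).
I do not recommend using subqueries this way. In my case the queries generally work, but once the number of distinct values being grouped gets high, performance seems to degrade badly (or they go into an infinite loop?).
Instead, use CASE, as the answers demonstrate perfectly.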
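The answers themselves are not reproduced here, so the following is only a sketch of what a CASE-based rewrite (conditional aggregation) of the query above might look like, assuming the same date range and status codes. It reads reports once and lets GROUP BY do the grouping instead of running six correlated subqueries.

```sql
-- Sketch: conditional aggregation in a single pass over reports.
-- A CASE without ELSE yields NULL, and COUNT/SUM skip NULLs, so each
-- aggregate only sees the rows that match its own condition.
SELECT
  nationality AS groupValue,
  COUNT(CASE WHEN create_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59'
             THEN request_id END) AS totalRequests,
  SUM(CASE WHEN pay_off_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59'
           THEN effective_amount END) AS totalSum,
  COUNT(CASE WHEN pay_off_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59'
              AND current_status = 5
             THEN request_id END) AS totalPaid,
  COUNT(CASE WHEN failed_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59'
              AND current_status = 3
             THEN request_id END) AS totalRefused,
  COUNT(CASE WHEN failed_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59'
              AND current_status = 4
             THEN request_id END) AS totalRenounced,
  COUNT(CASE WHEN failed_date BETWEEN '2020-10-01 00:00:00' AND '2021-12-31 23:59:59'
              AND current_status = 2
             THEN request_id END) AS totalInProgress
FROM reports
GROUP BY nationality;
```

One behavioral difference to be aware of: rows with a NULL nationality are counted in the NULL group here, whereas the nationality = groupValue comparison in the subquery version never matches NULL.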
【Comments】:
-
Please show the output of show create table reports and explain SELECT ... for both the nationality query and the query that performs well.
-
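I'm afraid this query is nonsensical. A GROUP BY clause is never appropriate without an aggregate function (at the level of the GROUP BY clause).
-
@Strawberry The subqueries act as the aggregate operations; see the groupValue variable referenced in the subqueries.
-
Yes. You can have an aggregate function without a GROUP BY clause. You cannot have a GROUP BY clause without an aggregate function, which is the case here.
-
It can execute correctly and still be a nonsensical or incorrect query.
Tags: mysql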