【Question title】: GROUP BY is really slow when joining the temporary table in MySQL
【Posted】: 2016-12-22 04:03:37
【Question】:

The table structure is simple:

CREATE TABLE `trade` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `account` int(11) NOT NULL,
  `date` date NOT NULL,
  `amount` double DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `all_idx` (`date`,`account`,`amount`) USING BTREE
) ENGINE=InnoDB;

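The table has about 5M rows.

The requirements are:

  • A date range is given
  • For each account, find the FIRST MAXIMUM trade amount within the date range
  • Find the MINIMUM trade amount AFTER it (on or after that date)
  • Compute the DIFFERENCE between these two amounts (it may be 0)

This is how I wrote the SQL: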

-- step 1: find the max amount, took about 0.6s
select account, max(amount) max_amount
from trade
where date between '20160101' and '20161220'
group by account;

-- step 2: find the first date, took about 1s
drop temporary table if exists tmp_max_amount;
create temporary table tmp_max_amount
select t1.account, min(t1.date) date, t1.amount
from trade t1, (
    select account, max(amount) max_amount
    from trade
    where date between '20160101' and '20161220'
    group by account
) t2
where t1.account = t2.account and t1.amount = t2.max_amount
group by t1.account, t1.amount;

-- step 3: find the min amount, took about 50s
drop temporary table if exists tmp_min_amount;
create temporary table tmp_min_amount
select t1.account, min(t1.amount) min_amount
from trade t1, tmp_max_amount t2
where t1.account = t2.account and t1.date >= t2.date
group by t1.account;

-- step 4: calculate the difference, took about 0.8s
select x.account, (x.amount - n.min_amount) diff
from tmp_max_amount x, tmp_min_amount n
where x.account = n.account;

The SQL in step 3 takes about 50 seconds. Is there any way to speed it up?

Sample data:

    id | account | date     | amount
 ------|---------|----------|---------
     1 |    1000 | 20151001 |   1000 <- not in range
     2 |    3000 | 20151002 |    100 <- not in range
     3 |    1000 | 20160105 |    800 <- max of 1000
     4 |    2000 | 20160110 |    200 <- max of 2000
     5 |    2000 | 20160115 |    100 <- min of 2000
     6 |    3000 | 20160201 |   1200
....
 10000 |    2000 | 20161210 |    200 <- not the first max
 10001 |    3000 | 20161210 |    500
 10002 |    3000 | 20161212 |   1500 <- max & min of 3000
 10003 |    1000 | 20161213 |    300 <- min of 1000

Expected result:

account | diff
--------|------
   1000 |  500 <- (800 - 300)
   2000 |  100 <- (200 - 100)
   3000 |    0 <- (1500 - 1500)
...

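【Question discussion】:

  • Maybe the temporary tables can be avoided! Can you post some sample data and the expected output?
  • @e4c5 Thanks for your reply, I have just added sample data as requested.
  • Try adding an index on tmp_max_amount, as you would on any other table. Also, be sure to run EXPLAIN on the third query to check index usage on both the trade table and the temporary table.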
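
A minimal sketch of what the last comment suggests, assuming tmp_max_amount has already been created by step 2. The index name idx_account_date is hypothetical and not from the original post.

```sql
-- Add an index to the temporary table after it has been created.
-- idx_account_date is a hypothetical name chosen for this example.
ALTER TABLE tmp_max_amount ADD INDEX idx_account_date (account, date);

-- Ask MySQL how it plans to execute the SELECT from step 3.
EXPLAIN
SELECT t1.account, MIN(t1.amount) AS min_amount
FROM trade t1, tmp_max_amount t2
WHERE t1.account = t2.account AND t1.date >= t2.date
GROUP BY t1.account;
```

In the EXPLAIN output, the `key` column shows which index (if any) is used for each table. A row with `type` = ALL and `key` = NULL for tmp_max_amount would indicate that the temporary table is being scanned in full for the join.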

Tags: mysql performance group-by


【Solution 1】:

Please use the JOIN...ON syntax.
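
For illustration, step 3 from the question might look like this with an explicit JOIN...ON. This is only a sketch of the syntax change; on its own it should return the same rows as the comma-join version and is not expected to change the timing.

```sql
-- Step 3 rewritten with JOIN ... ON instead of a comma join plus WHERE.
DROP TEMPORARY TABLE IF EXISTS tmp_min_amount;
CREATE TEMPORARY TABLE tmp_min_amount
SELECT t1.account, MIN(t1.amount) AS min_amount
FROM trade AS t1
JOIN tmp_max_amount AS t2
  ON  t2.account = t1.account   -- same account
  AND t1.date >= t2.date        -- on or after the date of the first maximum
GROUP BY t1.account;
```

Keeping the join conditions in the ON clause makes it easier to see which predicates relate the two tables and which ones merely filter rows.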

Step 2 needs INDEX(account, amount) on the trade table.
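
One way to add that index, as a sketch. The index name account_amount_idx is hypothetical, and building an index on a table of about 5M rows will itself take some time.

```sql
-- Lets step 2 look up trade rows by (account, amount) when joining
-- against the per-account maximum.
-- account_amount_idx is a hypothetical name chosen for this example.
ALTER TABLE trade ADD INDEX account_amount_idx (account, amount);
```

The existing all_idx starts with `date`, so it cannot be used to seek directly on `account`. Whether the optimizer actually picks the new index is best confirmed with EXPLAIN.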

Step 3 needs an index that is most easily created in step 2:

create temporary table tmp_max_amount
    ( INDEX(account, date) )   -- This was added
SELECT ..;

(This may not be optimal, but it should help.)
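
Spelled out in full, step 2 with the index definition might look like the following. This is a sketch that combines the query from the question with the two suggestions in this answer; it joins on `max_amount`, which is the column name the derived table actually exposes.

```sql
DROP TEMPORARY TABLE IF EXISTS tmp_max_amount;
CREATE TEMPORARY TABLE tmp_max_amount
    ( INDEX (account, date) )   -- index built as part of table creation
SELECT t1.account, MIN(t1.date) AS date, t1.amount
FROM trade AS t1
JOIN (
    -- per-account maximum within the date range (step 1)
    SELECT account, MAX(amount) AS max_amount
    FROM trade
    WHERE date BETWEEN '20160101' AND '20161220'
    GROUP BY account
) AS t2
  ON  t2.account = t1.account
  AND t2.max_amount = t1.amount
GROUP BY t1.account, t1.amount;
```

MySQL allows index definitions in parentheses before the SELECT in CREATE TABLE ... SELECT; the indexed columns are taken from the select list, so the `date` alias has to match the name used in the INDEX clause.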

【Discussion】:

  • The index on the temporary table helped!