How to optimize MySQL queries that use count(distinct) with a join

【Question title】: How to optimize when MySQL uses count(distinct) and join
【Posted】: 2012-10-20 17:33:38
【Question description】:

I have two tables with the following structure:

CREATE TABLE  `metaservice`.`user` (
  `id` bigint(18) NOT NULL AUTO_INCREMENT,
  `userId` bigint(18) NOT NULL,
  `name` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `userId` (`userId`) USING BTREE,
  KEY `nameIndex` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;

CREATE TABLE  `metaservice`.`tweet` (
  `id` bigint(18) NOT NULL AUTO_INCREMENT,
  `tweetId` bigint(18) NOT NULL,
  `reqId` int(8) NOT NULL DEFAULT '0',
  `postedTime` datetime NOT NULL,
  `body` text NOT NULL,
  `userId` bigint(18) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `FK69A46713BA64537` (`userId`),
  KEY `reqId` (`reqId`),
  CONSTRAINT `FK69A46713BA64537` FOREIGN KEY (`userId`) REFERENCES `user` (`userId`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;

This is the SQL query I am running:

select
    count(distinct user.name) as c
from
    tweet as tweet
inner join
    user as user
        on tweet.userId = user.userId
        and tweet.reqId in (
            327774,
            215173,
            104302,
            239188,
            317122,
            972632,
            424187,
            644254,
            946792,
            543258)

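The query is too slow when the tweet table has 6W (about 60,000) rows and the user table has more than 60,000 rows. It returned the result 60594 in 10.45 sec.

【Discussion】:

The question is presumably 'Why' + 'is MySQL slow when using count(distinct) and join'.
Have you checked that the ID columns in both tables are properly indexed?
Does moving the IN clause out of the JOIN and into WHERE make any difference?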
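
The last suggestion above can be sketched as follows. This is only an illustration of the proposed rewrite, using the tables from the question. For an inner join the two forms are logically equivalent, so the optimizer may well produce the same plan; comparing the EXPLAIN output of both is the way to find out.

```sql
-- Same query as in the question, with the reqId filter moved from ON to WHERE.
-- For an INNER JOIN this does not change the result set.
SELECT
    COUNT(DISTINCT user.name) AS c
FROM
    tweet AS tweet
INNER JOIN
    user AS user
        ON tweet.userId = user.userId
WHERE
    tweet.reqId IN (
        327774, 215173, 104302, 239188, 317122,
        972632, 424187, 644254, 946792, 543258);
```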

Tags: mysql join distinct

【Solution 1】:

I suggest you use EXPLAIN as follows:

EXPLAIN select
    count(distinct user.name) as c
from
    tweet as tweet
inner join
    user as user
        on tweet.userId = user.userId
        and tweet.reqId in (
            327774,
            215173,
            104302,
            239188,
            317122,
            972632,
            424187,
            644254,
            946792,
            543258)

Then analyze the output that EXPLAIN gives you. For more information on MySQL EXPLAIN, see the following sources:

MySQL Explain Syntax

Optimize Queries with Explain

Using MySQL Explain

MySQL Explain Reference

After reviewing the results, you need to decide which columns should be indexed.
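
As one possible outcome of that analysis, a composite index on the filter column and the join column of tweet could be tried. This is a sketch, not part of the original answer: the index name idx_tweet_reqId_userId is hypothetical, and whether the column order (reqId, userId) helps depends on what EXPLAIN reports for the actual data.

```sql
-- Hypothetical composite index: reqId first (used by the IN filter),
-- then userId (used by the join), so both can be read from the index.
ALTER TABLE tweet
    ADD INDEX idx_tweet_reqId_userId (reqId, userId);

-- Re-run EXPLAIN and check the `key` and `rows` columns for the tweet row
-- to see whether the optimizer picks the new index.
EXPLAIN SELECT
    COUNT(DISTINCT user.name) AS c
FROM
    tweet AS tweet
INNER JOIN
    user AS user
        ON tweet.userId = user.userId
        AND tweet.reqId IN (
            327774, 215173, 104302, 239188, 317122,
            972632, 424187, 644254, 946792, 543258);
```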

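【Discussion】:

Thanks, I tried that, but it still looks slow. I created an index on name in the user table and an index on reqId in the tweet table, but MySQL's optimizer does not use the index on tweet.
Did you index userId?
It may be the size of the dataset.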
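
If the optimizer ignores the reqId index, one way to test whether the index would help at all is to refresh the table statistics and then force the index for a single run. This is a diagnostic sketch under the assumption that the index is named reqId, as in the CREATE TABLE statement in the question. If the ten reqId values match most of the rows in tweet, which the returned count of 60594 may indicate, a full scan can legitimately be cheaper and the hint may make the query slower.

```sql
-- Refresh index statistics so the optimizer's estimates are current.
ANALYZE TABLE tweet;

-- Force the existing reqId index for comparison against the unhinted query.
SELECT
    COUNT(DISTINCT user.name) AS c
FROM
    tweet AS tweet FORCE INDEX (reqId)
INNER JOIN
    user AS user
        ON tweet.userId = user.userId
WHERE
    tweet.reqId IN (
        327774, 215173, 104302, 239188, 317122,
        972632, 424187, 644254, 946792, 543258);
```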

【Solution 2】:

Try this:

select count(*) as c from (
    select
        user.name
    from
        tweet as tweet
    inner join
        user as user
            on tweet.userId = user.userId
            and tweet.reqId in (
                327774,
                215173,
                104302,
                239188,
                317122,
                972632,
                424187,
                644254,
                946792,
                543258)
    group by user.name) a
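
A related rewrite, which is not from the answer above, replaces the join with an EXISTS subquery so that each user row is examined once instead of once per matching tweet. It is a sketch under the assumption that only the distinct names of users with at least one matching tweet are needed, which is what the original query counts; whether it is faster should be checked with EXPLAIN.

```sql
-- Count distinct names of users that have at least one tweet
-- with one of the requested reqId values (semi-join via EXISTS).
SELECT
    COUNT(DISTINCT u.name) AS c
FROM
    user AS u
WHERE EXISTS (
    SELECT 1
    FROM tweet AS t
    WHERE t.userId = u.userId
      AND t.reqId IN (
          327774, 215173, 104302, 239188, 317122,
          972632, 424187, 644254, 946792, 543258)
);
```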

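【Discussion】:

【Solution 3】:

I suggest you index your keys. Indexes are not only for primary or unique keys. If you search on any column of a table, you should almost always index that column.

See also: Link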
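
Before adding indexes, it can help to list the ones that already exist. According to the CREATE TABLE statements in the question, userId and reqId on tweet and userId and name on user are already indexed, so the statements below should show those entries:

```sql
-- List the indexes currently defined on each table,
-- including the column order and cardinality estimates.
SHOW INDEX FROM tweet;
SHOW INDEX FROM user;
```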

【Discussion】:

【Solution 4】:

Try this:

SELECT
  COUNT(DISTINCT user.name) AS c
FROM
  tweet AS tweet
INNER JOIN user AS user
  ON tweet.userId = user.userId
INNER JOIN (
    SELECT 327774 AS reqId UNION
    SELECT 215173 UNION
    SELECT 104302 UNION
    SELECT 239188 UNION
    SELECT 317122 UNION
    SELECT 972632 UNION
    SELECT 424187 UNION
    SELECT 644254 UNION
    SELECT 946792 UNION
    SELECT 543258
  ) t
  ON tweet.reqId  = t.reqId;

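【Discussion】:

Thanks, I tried that, but it still looks slow. I created an index on name in the user table and an index on reqId in the tweet table, but MySQL's optimizer does not use the index on tweet.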