【Question title】: MySql GROUP BY using filesort - query optimization
【Posted】: 2017-09-09 12:08:26
【Question】:

I have a table like this:

CREATE TABLE `purchase` (
  `fact_purchase_id` binary(16) NOT NULL,
  `purchase_id` int(10) unsigned NOT NULL,
  `purchase_id_primary` int(10) unsigned DEFAULT NULL,
  `person_id` int(10) unsigned NOT NULL,
  `person_id_owner` int(10) unsigned NOT NULL,
  `service_id` int(10) unsigned NOT NULL,
  `fact_count` int(10) unsigned NOT NULL DEFAULT '0',
  `fact_type` tinyint(3) unsigned NOT NULL,
  `date_fact` date NOT NULL,
  `purchase_name` varchar(255) DEFAULT NULL,
  `activation_price` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `activation_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `renew_price` decimal(7,2) unsigned DEFAULT '0.00',
  `renew_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `activation_cost` decimal(7,2) unsigned DEFAULT '0.00',
  `activation_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `renew_cost` decimal(7,2) unsigned DEFAULT '0.00',
  `renew_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`fact_purchase_id`),
  KEY `purchase_id_idx` (`purchase_id`),
  KEY `person_id_idx` (`person_id`),
  KEY `person_id_owner_idx` (`person_id_owner`),
  KEY `service_id_idx` (`service_id`),
  KEY `fact_type_idx` (`fact_type`),
  KEY `renew_price_idx` (`renew_price`),
  KEY `renew_cost_idx` (`renew_cost`),
  KEY `renew_price_year_idx` (`renew_price_year`),
  KEY `renew_cost_year_idx` (`renew_cost_year`),
  KEY `date_created_idx` (`date_created`),
  KEY `purchase_id_primary_idx` (`purchase_id_primary`),
  KEY `fact_count` (`fact_count`),
  KEY `renew_price_year_total_idx` (`renew_price_total`),
  KEY `renew_cost_year_total_idx` (`renew_cost_total`),
  KEY `date_fact` (`date_fact`) USING BTREE,
  CONSTRAINT `purchase_person_fk` FOREIGN KEY (`person_id`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `purchase_person_owner_fk` FOREIGN KEY (`person_id_owner`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `purchase_service_fk` FOREIGN KEY (`service_id`) REFERENCES `service` (`service_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

I am running this query:

SELECT 
    purchase.date_fact,
    UNIX_TIMESTAMP(purchase.date_fact),
    COUNT(DISTINCT purchase.purchase_id) AS Num
FROM
    purchase
WHERE
    purchase.date_fact >= '2017-01-01'
    AND purchase.date_fact <= '2017-01-31'
    AND purchase.fact_type = 3
    AND purchase.purchase_id_primary IS NULL
GROUP BY purchase.date_fact

The table contains 5,629,670 rows in total, and running EXPLAIN on the query gives the following result:

  • rows = 2,814,835
  • possible_keys = fact_type_idx,purchase_id_primary_idx,date_fact
  • key = fact_type_idx
  • key_len = 1
  • ref = const
  • filtered = 25.00
  • Extra = Using index condition;Using where;Using filesort

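The query takes 30-35 seconds to execute, which is far too long to wait.

The problem is that the GROUP BY causes a filesort. Adding ORDER BY NULL to the query does not change anything.

I could use a covering index, but this query only needs date_fact: which columns should I include?

How can I avoid the filesort on GROUP BY? How can I optimize the query to make it faster?

I use this table for statistical purposes (OLAP). Perhaps there is a better DBMS for this purpose?

I am running MySQL Server 5.7.17.

Thanks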

【Question discussion】:

    Tags: mysql sql group-by query-performance filesort


    【Solution 1】:

    For this query:

    SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
           COUNT(DISTINCT p.purchase_id) AS Num
    FROM purchase p
    WHERE p.date_fact >= '2017-01-01' AND
          p.date_fact <= '2017-01-31' AND
          p.fact_type = 3 AND
          p.purchase_id_primary IS NULL
    GROUP BY p.date_fact;
    

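    I would recommend a composite index on (fact_type, purchase_id_primary, date_fact, purchase_id). The first two keys have equality conditions in the WHERE clause. The third has a range (inequality) condition, and the fourth lets the index "cover" the query (all columns used by the query are in the index).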
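
    As a sketch of that suggestion, the index could be created and then checked with EXPLAIN as shown here. The index name fact_type_primary_date_purchase_idx is made up for illustration, and the plan described in the comments is an expectation to verify on your own data, not a measured result. Because the first two index columns are pinned by the WHERE clause, the matching index entries are already ordered by date_fact, which should let MySQL group them without a separate sort.

```sql
-- Hypothetical index name; the column order follows the recommendation:
-- equality columns first, then the range column, then the counted column.
ALTER TABLE purchase
  ADD INDEX fact_type_primary_date_purchase_idx
    (fact_type, purchase_id_primary, date_fact, purchase_id);

-- Re-check the plan. If the optimizer picks the new index, Extra should
-- show "Using index" and should no longer show "Using filesort".
EXPLAIN
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(DISTINCT p.purchase_id) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```

    If the optimizer still prefers fact_type_idx, comparing the plan with FORCE INDEX on the new index is one way to see whether the index itself helps.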

    I would also add: do not use COUNT(DISTINCT) if you do not need it. purchase_id might already be unique in purchase.
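
    If purchase_id really is unique among the rows that match the filter (an assumption that only your data can confirm), the query could be written with a plain count, as in this sketch:

```sql
-- Assumes every matching row has a distinct purchase_id; the column is
-- declared NOT NULL, so COUNT(*) then equals COUNT(DISTINCT purchase_id).
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(*) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```

    Before switching, it is worth running both COUNT(*) and COUNT(DISTINCT p.purchase_id) over the same filter and confirming that the numbers match for every date.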

    【Discussion】:
