【Question title】: Why doesn't the MySQL optimizer use all columns of the index?
【Posted】: 2018-08-11 12:35:00
【Question】:

Percona MySQL 5.7

Table schema:

CREATE TABLE Developer.Rate (
  ID bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  TIME datetime NOT NULL,
  BASE varchar(3) NOT NULL,
  QUOTE varchar(3) NOT NULL,
  BID double NOT NULL,
  ASK double NOT NULL,
  PRIMARY KEY (ID),
  INDEX IDX_TIME (TIME),
  UNIQUE INDEX IDX_UK (BASE, QUOTE, TIME)
)
ENGINE = INNODB
ROW_FORMAT = COMPRESSED;

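I am trying to fetch the latest row before a chosen point in time. The optimizer uses only part of the unique key: 2 of its 3 columns.

If I run the query in the plain way: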

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1
;

EXPLAIN shows that only the first 2 columns of the index are used: BASE and QUOTE.

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10231052.40"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "ref",
        "possible_keys": [
          "IDX_UK",
          "IDX_TIME"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE"
        ],
        "key_length": "22",
        "ref": [
          "const",
          "const"
        ],
        "rows_examined_per_scan": 45966462,
        "rows_produced_per_join": 22983231,
        "filtered": "50.00",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "4596646.20",
          "prefix_cost": "10231052.40",
          "data_read_per_join": "1G"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ],
        "attached_condition": "((`Developer`.`Rate`.`BASE` <=> 'EUR') and (`Developer`.`Rate`.`QUOTE` <=> 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))"
      }
    }
  }
}

But if I force the optimizer to use IDX_UK, MySQL uses all 3 columns of the index:

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate FORCE INDEX(IDX_UK)
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10231052.40"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "range",
        "possible_keys": [
          "IDX_UK"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE",
          "TIME"
        ],
        "key_length": "27",
        "rows_examined_per_scan": 45966462,
        "rows_produced_per_join": 15320621,
        "filtered": "100.00",
        "index_condition": "((`Developer`.`Rate`.`BASE` = 'EUR') and (`Developer`.`Rate`.`QUOTE` = 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "3064124.31",
          "prefix_cost": "10231052.40",
          "data_read_per_join": "818M"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ]
      }
    }
  }
}
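
The two key_length values above can be checked by hand. This is a rough sketch, assuming the string columns use the 3-byte utf8 character set (the Rate DDL does not state it, so the first query is there to confirm it) and that a DATETIME without fractional seconds occupies 5 bytes in 5.7:

```sql
-- Confirm the character set and byte width of the indexed string columns.
SELECT COLUMN_NAME, CHARACTER_SET_NAME,
       CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'Developer'
  AND TABLE_NAME = 'Rate'
  AND COLUMN_NAME IN ('BASE', 'QUOTE');

-- key_length arithmetic under the assumptions above:
--   VARCHAR(3) NOT NULL, utf8: 3 chars * 3 bytes + 2 length bytes = 11
--   BASE + QUOTE:              11 + 11 = 22   (the ref plan)
--   DATETIME NOT NULL:         5 bytes
--   BASE + QUOTE + TIME:       22 + 5  = 27   (the range plan)
SELECT (3 * 3 + 2) * 2     AS key_length_base_quote,
       (3 * 3 + 2) * 2 + 5 AS key_length_all_three;
```

Read this way, 22 means the index lookup stops after QUOTE and the TIME condition is applied as a filter on the rows that are read, while 27 means TIME also bounds the range inside the index.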

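Why doesn't the optimizer use all 3 columns unless the index is named explicitly?

Added:

Did I understand correctly that I should use a query like this?

Example query: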

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  BASE DESC, QUOTE DESC, TIME DESC
LIMIT 1

If I understand correctly, the EXPLAIN output is no better. Still only 2 columns are used, without TIME.

EXPLAIN output:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10384642.20"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "ref",
        "possible_keys": [
          "IDX_UK",
          "IDX_TIME"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE"
        ],
        "key_length": "22",
        "ref": [
          "const",
          "const"
        ],
        "rows_examined_per_scan": 46734411,
        "rows_produced_per_join": 23367205,
        "filtered": "50.00",
        "index_condition": "((Developer.Rate.BASE <=> 'EUR') and (Developer.Rate.QUOTE <=> 'USD') and (Developer.Rate.TIME <= ((now() - interval 1 month))))",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "4673441.10",
          "prefix_cost": "10384642.20",
          "data_read_per_join": "1G"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ]
      }
    }
  }
}


Added 2:

I ran these 4 queries:

Query 1:

FLUSH STATUS;
SELECT
  BID
FROM
  Rate
WHERE
  BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

Query 2:

FLUSH STATUS;
SELECT
  BID
FROM
  Rate FORCE INDEX (IDX_UK)
WHERE
  BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

Query 3:

FLUSH STATUS;
SELECT
  BID
FROM
  Rate
WHERE
  BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
  `TIME` DESC
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

Query 4:

FLUSH STATUS;
SELECT
  BID
FROM
  Rate FORCE INDEX (IDX_UK)
WHERE
  BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
  `TIME` DESC
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

The session status output is the same for all queries except query 3. For query 3: Handler_read_prev = 486474; for all the other queries: Handler_read_prev = 0.
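
For reading those counters: Handler_read_prev counts requests to read the previous row in key order, so a value of 486474 is consistent with the server starting at the newest EUR/USD entry and stepping backward one index entry at a time until TIME falls below the cutoff. A narrower version of the measurement, as a sketch that shows only the row-access counters:

```sql
-- Reset the session counters, run the query under test, then read only
-- the Handler_read_* counters.
FLUSH STATUS;

SELECT BID
FROM Developer.Rate
WHERE BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY `TIME` DESC
LIMIT 1;

SHOW SESSION STATUS LIKE 'Handler_read%';
-- Handler_read_key:  index lookups by key value
-- Handler_read_prev: steps backward in key order
-- Handler_read_next: steps forward in key order
```

With the range plan I would expect the backward walk to start at the cutoff instead, which would match Handler_read_prev = 0 for query 4, but that is an interpretation of the numbers above rather than something verified here.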

Added 3:

I copied the table, dropped the ID column, and promoted the UNIQUE key to PRIMARY.

Schema:

CREATE TABLE Developer.Rate2 (
  TIME datetime NOT NULL,
  BASE varchar(3) NOT NULL,
  QUOTE varchar(3) NOT NULL,
  BID double NOT NULL,
  ASK double NOT NULL,
  PRIMARY KEY (BASE, QUOTE, TIME),
  INDEX IDX_BID_ASK (BID, ASK)
)
ENGINE = INNODB
AVG_ROW_LENGTH = 26
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = COMPRESSED;

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "9673452.20"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate2",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "BASE",
          "QUOTE",
          "TIME"
        ],
        "key_length": "27",
        "rows_examined_per_scan": 48023345,
        "rows_produced_per_join": 16006180,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "68783.20",
          "eval_cost": "3201236.12",
          "prefix_cost": "9673452.20",
          "data_read_per_join": "732M"
        },
        "used_columns": [
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ],
        "attached_condition": "((`Developer`.`Rate2`.`BASE` = 'EUR') and (`Developer`.`Rate2`.`QUOTE` = 'USD') and (`Developer`.`Rate2`.`TIME` <= <cache>((now() - interval 1 month))))"
      }
    }
  }
}

Now the query behaves as expected: EXPLAIN shows that all 3 columns are used. This variant works.

【Discussion】:

    Tags: mysql sql database optimization percona


    【Answer 1】:

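    Get rid of ID; it is useless. Promote your UNIQUE key to PRIMARY. Now, magically, the query will be faster, and the question you asked becomes moot. (You may also need the DESC trick that lorraine suggested.)

    Here is another technique for comparing performance: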

    FLUSH STATUS;
    SELECT ...;
    SHOW SESSION STATUS LIKE 'Handler%';
    

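    I would be interested in seeing the SHOW output with and without the DESC trick, and with and without the FORCE INDEX you mentioned.

    Why is it faster? Your query used a secondary index, but it needed BID, which is not "covered" by that index. To get BID, the server has to drill down through the PRIMARY KEY into the "data". By changing the table so that the PK is used, this extra drill-down is avoided.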
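
If dropping ID is not an option, the same drill-down could in principle be avoided by letting the secondary index carry BID as well. A minimal sketch, assuming the extra index size is acceptable; IDX_COVER is an illustrative name, not something from the question, and building it on a table of this size will take a while:

```sql
-- Hypothetical alternative: a secondary index that also contains BID,
-- so the query can be answered from the index alone.
ALTER TABLE Developer.Rate
  ADD INDEX IDX_COVER (BASE, QUOTE, `TIME`, BID);

EXPLAIN FORMAT=JSON
SELECT BID
FROM Developer.Rate
WHERE BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY `TIME` DESC
LIMIT 1;
-- If the index covers the query, the plan should report "using_index": true.
```

Whether the optimizer then picks range access on all three key columns without a hint is a separate matter and would need to be checked on the real data.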

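    【Comments】:

    • Thanks for the advice. I tried creating a copy of the table without ID and did as suggested. "I would be interested in seeing the SHOW output with and without the DESC trick, and with and without the FORCE INDEX you mentioned." I added 4 queries to my post; see "Added 2:". The session status output is the same for all queries except query 3. For query 3: Handler_read_prev = 486474; for all the other queries: Handler_read_prev = 0.
    • Cases 1 and 2 are uninteresting because they have no ORDER BY, which (I think) is mandatory for what you are looking for. Cases 3 and 4 puzzle me. More head-scratching is needed.
    • I copied the table, dropped the ID column, and promoted the UNIQUE key to PRIMARY. Please see the schema under "Added 3:" in the post. Now the query behaves as expected, and EXPLAIN shows that all 3 columns are used. This variant works.
    • Rick, I created the index IDX_BID_ASK (BID, ASK), expecting that the query could be served from this index alone. But that does not happen. I was expecting to see "using_index = true", but it is not there. Do you know why? As far as I know, every B-tree index actually contains the PRIMARY KEY, so the index IDX_BID_ASK actually contains (BASE, QUOTE, TIME, BID, ASK). In that case the index fully covers the query. So why does MySQL use only the PRIMARY KEY?
    • @mr_blond - What is the SELECT for which you don't see "Using index"? The PRIMARY KEY is clustered with the data, so if the PK is used for the query, it is effectively "Using index", but it won't say so.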
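
On the IDX_BID_ASK question: an InnoDB secondary index stores its own columns first and the primary key columns after them, so IDX_BID_ASK is ordered by (BID, ASK) and only then by (BASE, QUOTE, TIME). It contains all five columns, but it cannot be searched by BASE and QUOTE. A sketch of the difference; the BID bounds are made-up values:

```sql
-- Filters on the leading column of IDX_BID_ASK and reads only indexed
-- columns, so the secondary index alone can answer it.
EXPLAIN FORMAT=JSON
SELECT BID, ASK
FROM Developer.Rate2
WHERE BID >= 1.10 AND BID < 1.11;

-- Filters on BASE and QUOTE, which are not a leftmost prefix of
-- IDX_BID_ASK; the clustered PRIMARY key is the natural access path.
EXPLAIN FORMAT=JSON
SELECT BID
FROM Developer.Rate2
WHERE BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY `TIME` DESC
LIMIT 1;
```

I would expect the first plan to report "using_index": true and the second to stay on PRIMARY, though neither has been run against this data.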
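    【Answer 2】:

    The behavior you describe (ref access instead of range access on more columns) reminds me of Bug#81341 and Bug#87613. These bugs were fixed in MySQL 5.7.17 and 5.7.21, respectively. Which version are you using?

    【Comments】: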

    • I am using MySQL 5.7.20-19. I will update MySQL and let you know whether the behavior is the same.
    • Updated to 5.7.21; the behavior is the same.
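
Since the upgrade did not change the plan, the optimizer trace may show why ref access is preferred over range access. A minimal sketch using the trace facility available in 5.7; it only records the decision, it does not change it:

```sql
-- Confirm which server version the session is actually talking to.
SELECT VERSION();

-- Record the optimizer's reasoning for the next statement in this session.
SET optimizer_trace = 'enabled=on';

SELECT BID
FROM Developer.Rate
WHERE BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY `TIME` DESC
LIMIT 1;

-- Inspect the trace; the range_analysis and considered_execution_plans
-- sections list the candidate access paths and their estimated costs.
SELECT TRACE
FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

Comparing the cost recorded for the ref path on IDX_UK with the cost of the range alternative should indicate which estimate tips the choice.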