MySQL insert with sub select slow

【Question title】: MySQL insert with sub select slow
【Posted】: 2016-06-17 10:47:50
【Question description】:

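We are using Python and LOAD DATA INFILE to load data from CSV files into our staging database. From staging, we have SQL scripts that move the data into our actual production database.

LOAD DATA INFILE is lightning fast compared to selecting rows from staging and inserting them into production.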

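We are on MySQL 5.7, using InnoDB, and we have applied the following configuration to optimise our inserts:

Set innodb_autoinc_lock_mode to 2
Set the InnoDB buffer pool size to half of memory (16 GB)
Set the log buffer size to 4 GB
We are using TRANSACTIONS
Using SET autocommit=0;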
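
Before comparing timings, it can help to confirm that the server is actually running with the values listed above. The following is a minimal sketch that only reads the settings. It assumes the unnamed "log buffer size" refers to `innodb_log_buffer_size`, which the question does not state explicitly.

```sql
-- Read back the settings named in the list above.
-- Assumption: "log buffer size" means innodb_log_buffer_size.
SHOW VARIABLES WHERE Variable_name IN (
    'innodb_autoinc_lock_mode',
    'innodb_buffer_pool_size',
    'innodb_log_buffer_size',
    'autocommit'
);
```

Note that `innodb_autoinc_lock_mode` can only be set at server startup, so a value changed in the configuration file takes effect only after a restart.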

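The inserts from one table to another are still significantly slower than LOAD DATA INFILE.

When I look at IO writes, LOAD DATA INFILE peaks at 30 MB/s, while the normal inserts max out at 500 KB/s.

Is there any way we can improve this performance, or do we need to completely rethink our approach? I can think of using OUTFILE for the subqueries and loading the result back with INFILE, but that doesn't sound like the right approach.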
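
For reference, the OUTFILE/INFILE round trip mentioned above might look roughly like the sketch below. It is only an illustration: the file path is hypothetical (the server's `secure_file_priv` setting restricts where files can be written and read), and only two of the eight target columns are shown, so a real version would carry the full select list.

```sql
-- Step 1: dump the rows to a tab-separated file on the server host.
-- The path is a placeholder and must be allowed by secure_file_priv.
SELECT csv.`dId`, csv.`Signature Hash`
INTO OUTFILE '/var/lib/mysql-files/documentkey_batch.tsv'
FROM `mydb_stage`.`csvDocumentKey` csv
WHERE csv.`dId` IS NOT NULL AND csv.threadId = @thread;

-- Step 2: bulk-load the file into the production table.
-- The default field and line terminators match those written by INTO OUTFILE.
LOAD DATA INFILE '/var/lib/mysql-files/documentkey_batch.tsv'
INTO TABLE documentkey (dId, signature);
```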

Here is the statement:

INSERT INTO documentkey (documentClassCode,dId,fileTypeCode,internet,pathId,creationTime,signature,CSVimportId) 
SELECT case when csv.`Document Class` is null
                then (select classCode from mydb.class where classDesc = 'Empty'
                    And LookupId = (select LookupId from mydb.Lookup where LookupGroupCode = 'C' and EntityLookedup = 'documentkey')
                    )
                else (select classCode from mydb.class where    classDesc = csv.`Document Class`
                    And LookupId = (select LookupId from mydb.Lookup where LookupGroupCode = 'C' and EntityLookedup = 'documentkey')
                    )
        end,
        csv.`dId`,
        (select typeCode from mydb.type
                Where typeDesc = csv.`File Type`
                And LookupId = (select LookupId from mydb.Lookup where LookupGroupCode = 'T' and EntityLookedup = 'documentkey')
        ),
        case    when csv.`message ID` is null
                then (select messageIncrId from message where internetdesc = 'Empty')
                else case   when    exists (select internetMessageIncrId from internetMessage where internetdesc = csv.`Internet Message ID`)
                            then    (select internetMessageIncrId from internetMessage where internetdesc = csv.`Internet Message ID`)
                            else    0
                    end
        end,
        case    when exists (select pathId from Path where pathDesc = csv.`path`)
                then    (select pathId from Path where pathDesc = csv.`path`)
                else 0
        end,
        case when csv.`Creation Time` <> '' then STR_TO_DATE(csv.`Creation Time`, '%d/%m/%Y  %H:%i:%s') else '2016-06-16 10:00:00' end,
        #STR_TO_DATE(csv.`Creation Time`, '%Y-%m-%d %H:%i:%s'),
        csv.`Signature Hash`,
        1
        #csv.`CSV import id`
FROM `mydb_stage`.`csvDocumentKey` csv
where csv.`dId` is not null and csv.threadId = @thread;

The select part of the query takes only a fraction of a second.

Explain:

'1', 'PRIMARY', 'csv', NULL, 'ALL', NULL, NULL, NULL, NULL, '1', '100.00', 'Using where'
'12', 'DEPENDENT SUBQUERY', 'path', NULL, 'eq_ref', 'pathDesc_UNIQUE', 'pathDesc_UNIQUE', '1026', 'func', '1', '100.00', 'Using where; Using index'
'11', 'DEPENDENT SUBQUERY', 'path', NULL, 'eq_ref', 'pathDesc_UNIQUE', 'pathDesc_UNIQUE', '1026', 'func', '1', '100.00', 'Using where; Using index'
'10', 'SUBQUERY', 'message', NULL, 'const', 'messageDesc_UNIQUE', 'messageDesc_UNIQUE', '2050', 'const', '1', '100.00', 'Using index'
'9', 'DEPENDENT SUBQUERY', 'message', NULL, 'eq_ref', 'messageDesc_UNIQUE', 'messageDesc_UNIQUE', '2050', 'func', '1', '100.00', 'Using where; Using index'
'8', 'DEPENDENT SUBQUERY', 'message', NULL, 'eq_ref', 'messageDesc_UNIQUE', 'messageDesc_UNIQUE', '2050', 'func', '1', '100.00', 'Using where; Using index'
'6', 'DEPENDENT SUBQUERY', 'type', NULL, 'eq_ref', 'typeDesc_UNIQUE', 'typeDesc_UNIQUE', '1026', 'func', '1', '100.00', 'Using index condition; Using where'
'7', 'SUBQUERY', 'Lookup', NULL, 'ref', 'PRIMARY', 'PRIMARY', '6', 'const', '3', '10.00', 'Using where'
'4', 'SUBQUERY', 'class', NULL, 'const', 'classDesc_UNIQUE', 'classDesc_UNIQUE', '1026', 'const', '1', '100.00', NULL
'5', 'SUBQUERY', 'Lookup', NULL, 'ref', 'PRIMARY', 'PRIMARY', '6', 'const', '2', '10.00', 'Using where'
'2', 'DEPENDENT SUBQUERY', 'class', NULL, 'eq_ref', 'classDesc_UNIQUE', 'classDesc_UNIQUE', '1026', 'func', '1', '20.00', 'Using index condition; Using where'
'3', 'SUBQUERY', 'Lookup', NULL, 'ref', 'PRIMARY', 'PRIMARY', '6', 'const', '2', '10.00', 'Using where'
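
The plan above contains several DEPENDENT SUBQUERY rows, which are evaluated once per row of `csv`. One common alternative, not proposed anywhere in this thread, is to express those lookups as LEFT JOINs. The sketch below covers only the `Path` and `type` lookups and assumes that `pathId` is never NULL and that the unique indexes shown in the plan guarantee at most one match per row. The variable `@typeLookupId` is introduced here for illustration. Whether this is faster for this workload would need to be measured.

```sql
-- Resolve the constant lookup id once (same subquery as in the original statement).
SET @typeLookupId = (SELECT LookupId FROM mydb.Lookup
                     WHERE LookupGroupCode = 'T'
                       AND EntityLookedup = 'documentkey');

SELECT csv.`dId`,
       t.typeCode            AS fileTypeCode,  -- NULL when no type matches, as before
       COALESCE(p.pathId, 0) AS pathId         -- 0 when no path matches, as before
FROM `mydb_stage`.`csvDocumentKey` csv
LEFT JOIN mydb.type t
       ON t.typeDesc = csv.`File Type`
      AND t.LookupId = @typeLookupId
LEFT JOIN Path p
       ON p.pathDesc = csv.`path`
WHERE csv.`dId` IS NOT NULL AND csv.threadId = @thread;
```

Running EXPLAIN on the rewritten SELECT shows whether the dependent subqueries have been replaced by joins on the same unique indexes.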

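【Comments】:

One reason LOAD DATA is fast is that it doesn't really do any database work, whereas INSERT does.
@TimBiegeleisen I assume it runs with certain configuration settings that are transparent to the user, and I imagine you could apply a similar configuration to get the same performance from INSERT. It's just a question of how.
How do you select and insert the data? Can you show us the query?
@BerndBuffen I have added the statement.
@L4zl0w - Thanks. Can you show us the EXPLAIN of the SELECT statement (without the INSERT ...): EXPLAIN SELECT case when csv.`Document Class` is null then (select classCode from mydb.class where classDesc = 'Empty' ....... ; It looks like the query that selects the data from your tables is very slow.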

Tags: mysql performance innodb

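【Solution 1】:

You didn't mention why you changed approach, especially when your main goal is performance.
Inserting from a SELECT cannot be as fast as dumping a file into a table, and the MySQL documentation states this clearly.
From insert-speed:

When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements. See Section 14.2.6, "LOAD DATA INFILE Syntax".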

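【Discussion】:

Thanks. We need to load 1 billion records into staging and then into production within 12 hours. Getting the records into staging is not a problem, but going from staging to production takes very long. What I don't understand is why the INSERT from the select is slow if the SELECT statement itself is fast. I would think the result of the select statement is in memory and should be quick to insert. But I am obviously missing something, because that is not what I am experiencing.
INSERT is slow because MySQL is ACID compliant and uses 1 disk I/O to perform each insert. Grouping inserts in a transaction lets your disk use 1 I/O for the whole group, but with more bandwidth. LOAD DATA INFILE bypasses the SQL layer and uses the disk efficiently. The reason you are experiencing slowness is your hard drive. Buy solid state drives for your database server and pay attention to their IOPS count. You won't solve everything by writing optimal code: this is also about hardware, and there is only so much you can do with code alone.
I understand, but this statement is much slower than other, simpler insert statements. I'm talking about ~20 times slower. This statement only writes 150-400 KB to disk. This is not an IO bottleneck.
@L4zl0w: INSERT INTO FROM SELECT is one transaction. So, besides dumping the data into the table, a lot of other things have to be done.
How scattered are the rows you are inserting? It may be that the blocks being inserted into are scattered, and that reads from disk are the bottleneck.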
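
The comment above about grouping inserts in a transaction can be illustrated with a small sketch. The values are placeholders, the column types are assumed to accept strings, and only two of the target columns are shown. The point is only the shape: several multi-row INSERTs followed by a single COMMIT.

```sql
-- Group many rows into one transaction so the commit cost is paid once.
-- All values below are hypothetical placeholders.
START TRANSACTION;

INSERT INTO documentkey (dId, signature) VALUES
    ('d-0001', 'hash-0001'),
    ('d-0002', 'hash-0002'),
    ('d-0003', 'hash-0003');

-- ... further multi-row INSERT statements for the same batch ...

COMMIT;
```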