【Posted】:2014-07-16 06:30:49
【Question】:
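I am trying to delete a large number of rows (more than 10 million, about 1/3 of all records in the table) from an InnoDB MySQL table that has a primary/clustered index.
The id field is the primary/clustered index; it is sequential with no gaps. At least it should be: I never delete records from the middle. It is possible, though, that some INSERT queries failed and InnoDB allocated ids that were never used (I am not sure whether that is true). I only delete old records that are no longer needed. The table contains varchar columns, so rows do not have a fixed size.
My first attempt: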
DELETE FROM `table` WHERE id<=10000000
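It failed after a huge amount of I/O. It looks like MySQL killed the query and rolled back all changes. The query ran for about 6 hours, and the rollback took about as long. My big mistake was leaving the transaction log size at the default 5 MB; keep that in mind, it has to be enlarged.
The second attempt deleted records in batches of 10,000, like this: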
DELETE FROM `table` WHERE id<=10000;
COMMIT;
DELETE FROM `table` WHERE id<=20000;
COMMIT;
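And so on. At the start, each query took about 10 seconds (on a laptop). The execution time grew steadily, and after 6 hours of running each query took about 300 seconds.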
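The fixed-step sequence from the second attempt can be generated instead of written by hand. This is only a minimal sketch: `buildBatchDeletes` is a hypothetical helper that is not part of the original code, and it assumes the table name comes from a trusted source, since it is interpolated directly into the SQL.

```php
<?php
/**
 * Hypothetical helper (not in the original post): builds the cumulative
 * "id<=N" DELETE statements used in the second attempt.
 */
function buildBatchDeletes(string $table, int $maxId, int $step): array
{
    if ($step < 1) {
        throw new InvalidArgumentException('step must be at least 1');
    }

    $statements = [];
    // One statement per full step below the final id.
    for ($upper = $step; $upper < $maxId; $upper += $step) {
        $statements[] = sprintf('DELETE FROM `%s` WHERE id<=%d', $table, $upper);
    }
    // The last statement is clamped to the requested maximum id.
    $statements[] = sprintf('DELETE FROM `%s` WHERE id<=%d', $table, $maxId);

    return $statements;
}
```

Each returned statement would be executed and followed by a COMMIT, as in the hand-written sequence.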
The third attempt issues queries that take less than 1 second each on average. PHP code:
protected function deleteById($table, $id) {
    $MinId = $this->getMinFromTable($table, 'id');
    $PackDeleteCount = $this->PackDeleteCount;
    $timerTotal = new Timer();
    $delCountTotal = 0;
    $delCountReport = 0;
    $delInfo = array();
    // Acceptable window around the target query time.
    $PackMinTime = round($this->PackDeleteTime - $this->PackDeleteTime*$this->PackDeleteDiv, 3);
    $PackMaxTime = round($this->PackDeleteTime + $this->PackDeleteTime*$this->PackDeleteDiv, 3);
    $this->LogString(sprintf('Del `%s`, PackMinTime: %s; PackMaxTime: %s', $table, $PackMinTime, $PackMaxTime));
    while ($MinId < $id) {
        // Advance the upper bound by the current batch size, clamped to the target id.
        $MinId += $PackDeleteCount;
        $delCountReport += $PackDeleteCount;
        if ($MinId > $id) {
            $MinId = $id;
        }
        // Delete one batch, commit it, and record how long it took.
        $timer = new Timer();
        $sql = sprintf('DELETE FROM `%s` WHERE id<=%s', $table, $MinId);
        $this->s->Query($sql, __FILE__, __LINE__);
        $delCount = $this->s->AffectedRows();
        $this->s->CommitT();
        $RoundTime = round($timer->end(), 3);
        $delInfo[] = array(
            'time' => $RoundTime,
            'rows' => $PackDeleteCount,
        );
        $delCountTotal += $delCount;
        // Log aggregated statistics every PackDeleteReport records.
        if ($delCountReport >= $this->PackDeleteReport) {
            $delCountReport = 0;
            $delSqlCount = count($delInfo);
            $EvTime = 0;
            $PackTime = 0;
            $EvCount = 0;
            $PackCount = 0;
            foreach ($delInfo as $v) {
                $PackTime += $v['time'];
                $PackCount += $v['rows'];
            }
            $EvTime = round($PackTime/$delSqlCount, 2);
            $PackTime = round($PackTime, 2);
            $EvCount = round($PackCount/$delSqlCount);
            $TotalTime = $this->readableTime(intval($timerTotal->end()));
            $this->LogString(sprintf('Del `%s`, Sql query count: %d; Time: %s; Count: %d; Average Time %s; Average count per delete: %d; Del total: %s; Del Total Time: %s; id <= %s', $table, $delSqlCount, $PackTime, $PackCount, $EvTime, $EvCount, $delCountTotal, $TotalTime, $MinId));
            $delInfo = array();
        }
        // Grow the batch if the query was too fast, shrink it if it was too slow.
        $PackDeleteCountOld = $PackDeleteCount;
        if ($RoundTime < $PackMinTime) {
            $PackDeleteCount = intval($PackDeleteCount + $PackDeleteCount*(1 - $RoundTime/$this->PackDeleteTime));
        } elseif ($RoundTime > $PackMaxTime) {
            $PackDeleteCount = intval($PackDeleteCount - $PackDeleteCount*(1 - $this->PackDeleteTime/$RoundTime));
        }
        //$this->LogString(sprintf('Del `%s`, round time: %s; row count old: %d; row count new: %d', $table, $RoundTime, $PackDeleteCountOld, $PackDeleteCount));
    }
    $this->LogString(sprintf('Finished del `%s`: time: %s', $table, round($timerTotal->end(), 2)));
}
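It has some dependencies, but they are self-explanatory and can easily be swapped for standard equivalents. I will only explain the input variables used here: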
$table - the target table from which rows need to be deleted
$id - all records up to and including this id should be deleted
$MinId - the minimal id in the target table
$this->PackDeleteCount - the initial number of records per query. After each query, the number of rows to delete is recalculated.
$this->PackDeleteTime - the desired average query execution time, in seconds. I used 0.5
$this->PackDeleteDiv - the acceptable deviation from $this->PackDeleteTime, as a fraction. I used 0.3
$this->PackDeleteReport - statistics about the deletion are logged every N records
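The batch-size recalculation is the core of this variant, so here it is isolated as a pure function. This is a minimal sketch: the function name `nextBatchSize` and its parameter names are hypothetical, and the final `max(1, ...)` guard is an addition that is not in the original code (without it, a very slow query could shrink the batch to 0 and the loop would stop advancing).

```php
<?php
/**
 * Hypothetical helper (not in the original post): computes the next batch
 * size from the duration of the last DELETE, using the same formulas as
 * deleteById().
 *
 * $targetTime corresponds to PackDeleteTime, $tolerance to PackDeleteDiv.
 */
function nextBatchSize(int $batchSize, float $elapsed, float $targetTime, float $tolerance): int
{
    $minTime = $targetTime - $targetTime * $tolerance;
    $maxTime = $targetTime + $targetTime * $tolerance;

    if ($elapsed < $minTime) {
        // Faster than the acceptable window: grow the batch.
        $batchSize = intval($batchSize + $batchSize * (1 - $elapsed / $targetTime));
    } elseif ($elapsed > $maxTime) {
        // Slower than the acceptable window: shrink the batch.
        $batchSize = intval($batchSize - $batchSize * (1 - $targetTime / $elapsed));
    }

    // Assumption added here: never return less than 1 row per batch.
    return max(1, $batchSize);
}
```

With a target of 0.5 s and a tolerance of 0.3, a 1000-row batch that finishes in 0.25 s grows to 1500 rows, and one that takes 1.0 s shrinks to 500 rows; anything between 0.35 s and 0.65 s leaves the size unchanged.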
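This variant performs consistently.
The reason for the poor performance is that the database engine has to physically rework all record data in the affected leaves. That is my understanding; if you know more, you are welcome to add a detailed description of what actually happens. It may suggest some new ideas.
Question: is it possible to work out how rows are distributed across leaves and delete whole leaves, or even branches, so that the database engine does not have to process the data? Or perhaps you have other ideas for optimizing performance in this case.
【Comments】: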
- Maybe try dropping the indexes before the delete and then re-creating them
Tags: php mysql innodb bigdata clustered-index