[Posted]: 2021-08-03 17:43:49
[Question]:
I have a big table that I want to shrink. It has ~230 million rows.
Both columns are indexed. The structure is:
+--------------+------------+
| id_my_value | id_ref |
+--------------+------------+
| YYYY | XXXX |
+--------------+------------+
I have to delete the rows that have specific `id_ref` values. I tried the following:
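# cursor/mydb: an open MySQL cursor and connection. Rows are read by column
# name, so the cursor is assumed to return dicts (e.g. cursor(dictionary=True)).
sql = "SELECT id_ref FROM REFS"
cursor.execute(sql)
refs = cursor.fetchall()
limit = 1000
for current in refs:
    id_ref = current["id_ref"]
    # Bind the values as parameters instead of formatting them into the SQL
    sql = "DELETE FROM MY_VALUES WHERE id_ref = %s LIMIT %s"
    # Delete at most `limit` rows per statement until none are left
    while True:
        cursor.execute(sql, (id_ref, limit))
        mydb.commit()
        if cursor.rowcount == 0:
            break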
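The batching loop above can be factored into a small reusable function. This is a sketch, not the poster's code: the name `delete_in_batches` is hypothetical, and it assumes a DB-API driver whose `cursor.rowcount` reports the rows removed by a DELETE (mysql-connector-python and PyMySQL both do). One design change is deliberate: it stops as soon as a batch comes back short, which saves one extra DELETE per `id_ref` that would only report 0 rows.

```python
from typing import Any, Sequence


def delete_in_batches(conn: Any, cursor: Any, sql: str,
                      params: Sequence[Any], batch_size: int) -> int:
    """Run a `DELETE ... LIMIT %s` statement repeatedly, committing after
    every batch, and return the total number of rows deleted.

    `sql` must end with a LIMIT placeholder; `batch_size` is appended to
    `params` to fill it.
    """
    total = 0
    while True:
        cursor.execute(sql, (*params, batch_size))
        conn.commit()  # keep each transaction (and its undo log) small
        deleted = cursor.rowcount
        total += deleted
        # A short batch means no matching rows are left (assuming no
        # concurrent inserts), so skip the final "0 rows" round trip.
        if deleted < batch_size:
            return total
```

In the question's loop it would be called once per reference, e.g. `delete_in_batches(mydb, cursor, "DELETE FROM MY_VALUES WHERE id_ref = %s LIMIT %s", (current["id_ref"],), 1000)`.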
No matter what value I set for `limit`, the query is extremely slow:
DELETE FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10;
I also tried the opposite approach: select the id_value rows associated with a specific id_ref, then delete them by id:
SELECT id_value FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10
DELETE FROM MY_VALUES WHERE id_value = YYYY
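The select-then-delete-by-primary-key approach can be sketched as a loop. To keep the sketch runnable anywhere it uses the standard-library `sqlite3` module with an in-memory table standing in for MY_VALUES; the lowercase table name `my_values` and the function name are hypothetical, and it uses `id_my_value` (the primary key in the SHOW CREATE TABLE output) rather than `id_value`. With a MySQL driver such as mysql-connector-python or PyMySQL the placeholders would be `%s` instead of `?`.

```python
import sqlite3


def delete_by_primary_key_batches(conn: sqlite3.Connection, id_ref: int,
                                  batch_size: int) -> int:
    """Select up to `batch_size` primary keys for one id_ref, delete exactly
    those rows by primary key, commit, and repeat until none are left."""
    total = 0
    cur = conn.cursor()
    while True:
        cur.execute(
            "SELECT id_my_value FROM my_values WHERE id_ref = ? LIMIT ?",
            (id_ref, batch_size),
        )
        ids = [row[0] for row in cur.fetchall()]
        if not ids:
            return total
        # One placeholder per key, so the DELETE only does primary-key lookups
        placeholders = ", ".join("?" for _ in ids)
        cur.execute(
            f"DELETE FROM my_values WHERE id_my_value IN ({placeholders})", ids
        )
        conn.commit()
        total += len(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_values (id_my_value INTEGER PRIMARY KEY, id_ref INTEGER)")
conn.execute("CREATE INDEX id_ref ON my_values (id_ref)")
conn.executemany("INSERT INTO my_values (id_ref) VALUES (?)", [(i % 3,) for i in range(10)])
print(delete_by_primary_key_batches(conn, 1, 2))  # prints 3
```

Deleting by an explicit list of primary keys keeps each DELETE short and predictable; the cost moves to the SELECT, which can walk the `id_ref` index.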
Here is my EXPLAIN:
EXPLAIN DELETE FROM MY_VALUES WHERE id_ref = YYYY LIMIT 1000;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+------------+---------+-------+----------+----------+-------------+
| 1 | DELETE | MY_VALUES | NULL | range | id_ref | id_ref | 5 | const | 20647922 | 100.00 | Using where |
It does use the right index.
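The same EXPLAIN check can be run from Python, which also makes the cost of the batch size visible. This is a hypothetical helper, assuming a dictionary cursor so the EXPLAIN columns `key` and `rows` can be read by name, and treating `rows` (an optimizer estimate, not an exact count) as the number of rows matching one `id_ref`.

```python
import math
from typing import Any, Tuple


def explain_delete(cursor: Any, id_ref: int, limit: int) -> Tuple[str, int, int]:
    """Run EXPLAIN on the batched DELETE and return
    (index used, estimated matching rows, estimated number of batches)."""
    cursor.execute(
        "EXPLAIN DELETE FROM MY_VALUES WHERE id_ref = %s LIMIT %s",
        (id_ref, limit),
    )
    row = cursor.fetchall()[0]
    estimated_rows = int(row["rows"])
    # `rows` is an estimate, so the batch count is an estimate too
    batches = math.ceil(estimated_rows / limit)
    return row["key"], estimated_rows, batches
```

With the estimate shown above (20,647,922 rows), LIMIT 1000 means roughly 20,648 separate DELETE statements for that single `id_ref`, and LIMIT 10 roughly 2,064,793, each with its own commit.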
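It would be no problem to let this operation run on the server for several days.
- What is the right way to do this kind of "cleanup"?
EDIT
Here is the output of SHOW CREATE TABLE MY_VALUES: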
MY_VALUES | CREATE TABLE `MY_VALUES` (
`id_my_value` int NOT NULL AUTO_INCREMENT,
`id_document` int NOT NULL,
`id_ref` int DEFAULT NULL,
`value` mediumtext CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`weigth` int DEFAULT NULL,
`id_analysis` int DEFAULT NULL,
`url` text CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`domain` varchar(64) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`filetype` varchar(16) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`id_domain` int DEFAULT NULL,
`id_city` int DEFAULT NULL,
`city_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`is_hidden` tinyint NOT NULL DEFAULT '0',
`id_company` int DEFAULT NULL,
`is_hidden_by_user` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_my_value`),
KEY `id_ref` (`id_ref`),
KEY `id_document` (`id_document`),
KEY `id_analysis` (`id_analysis`),
KEY `weigth` (`weigth`),
KEY `id_domain` (`id_domain`),
KEY `id_city` (`id_city`),
KEY `id_company` (`id_company`),
KEY `value` (`value`(15))
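One reason each deleted row is expensive here: a row has to be removed from the clustered index (PRIMARY KEY) and from every secondary index, so with this schema a single DELETE touches nine index trees per row (in InnoDB part of the secondary-index work is deferred through delete-marking and purge, but it still has to happen). The sketch below only counts those indexes from the index lines of the DDL above; the helper name is hypothetical.

```python
import re
from typing import List

# Index definitions copied from the SHOW CREATE TABLE output above.
DDL_INDEXES = """
  PRIMARY KEY (`id_my_value`),
  KEY `id_ref` (`id_ref`),
  KEY `id_document` (`id_document`),
  KEY `id_analysis` (`id_analysis`),
  KEY `weigth` (`weigth`),
  KEY `id_domain` (`id_domain`),
  KEY `id_city` (`id_city`),
  KEY `id_company` (`id_company`),
  KEY `value` (`value`(15))
"""


def secondary_index_names(ddl: str) -> List[str]:
    """Names of plain secondary indexes, i.e. lines of the form KEY `name` (...)."""
    return re.findall(r"^\s*KEY\s+`([^`]+)`", ddl, flags=re.MULTILINE)


names = secondary_index_names(DDL_INDEXES)
# Clustered index (PRIMARY KEY) plus one B-tree per secondary index
print(len(names), "secondary indexes;", 1 + len(names), "index trees per deleted row")
# prints: 8 secondary indexes; 9 index trees per deleted row
```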
UPDATE
I just tried to delete a single record:
DELETE FROM MY_VALUES WHERE id_MY_VALUE = 8
The operation takes "forever". To avoid timeouts I followed this SO question, so I set:
show variables like 'innodb_lock_wait_timeout';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| innodb_lock_wait_timeout | 100000 |
+--------------------------+--------+
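A single-row DELETE by primary key that never finishes usually points to a lock wait rather than slow I/O, and raising `innodb_lock_wait_timeout` only makes the statement wait longer. A hedged way to check, assuming MySQL 5.7+ or 8.0 with the `sys` schema installed, a user allowed to read it, and a dictionary cursor on a second connection opened while the DELETE hangs (the function name is hypothetical):

```python
from typing import Any, Dict, List

# sys.innodb_lock_waits pairs each waiting InnoDB transaction with its blocker.
LOCK_WAITS_SQL = """
SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age
FROM sys.innodb_lock_waits
"""


def find_lock_waits(cursor: Any) -> List[Dict[str, Any]]:
    """Return one dict per waiting transaction and the session blocking it."""
    cursor.execute(LOCK_WAITS_SQL)
    return list(cursor.fetchall())
```

If a row comes back, `blocking_pid` is the connection holding the lock, for example an earlier batch session that never committed.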
[Discussion]:
-
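Your query won't run, because VALUES is a reserved word. And if you don't care whether it takes days, why does it bother you that the delete query is extremely slow, whatever that means?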
-
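Show us the output of
SHOW CREATE TABLE tablename. Please edit your question.
-
@nbk, I've edited the question to avoid that confusion. The column isn't actually named "VALUES"; I only used that name for illustration.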
-
OK, that was bothering me. As for your question: check whether your query uses the index on id_ref; run EXPLAIN to see whether it is used.
-
EXPLAIN DELETE FROM MY_VALUES WHERE id_ref = YYYY LIMIT 1000;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+------------+---------+-------+----------+----------+-------------+
| 1 | DELETE | MY_VALUES | NULL | range | id_ref | id_ref | 5 | const | 20647922 | 100.00 | Using where |
It does use the right index.
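On the first comment's point about reserved words: in MySQL an identifier such as VALUES can still be used if it is quoted with backticks, with any embedded backtick doubled. Identifiers cannot be bound as query parameters (only values can), so a small quoting helper is a common pattern; the helper below is a hypothetical sketch, not part of any driver's API.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded
    backtick, so reserved words such as VALUES can be used as names."""
    return "`" + name.replace("`", "``") + "`"


sql = (f"DELETE FROM {quote_identifier('MY_VALUES')} "
       f"WHERE {quote_identifier('id_ref')} = %s LIMIT %s")
print(sql)  # DELETE FROM `MY_VALUES` WHERE `id_ref` = %s LIMIT %s
```

The values (`id_ref` and the limit) are still passed separately as parameters to `cursor.execute`.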
Tags: python mysql query-optimization