【问题标题】:Long running mysql "cleaning up" transaction长时间运行的mysql“清理”事务
【发布时间】:2015-08-19 14:03:16
【问题描述】:

我一直在尝试调试 MySQL (AWS RDS) v5.6.19a 中的“超出锁定等待超时”错误,当我尝试使用主 ID 选择一行进行更新时偶尔会抛出该错误,即:

SELECT primary_id FROM tbl_widgets WHERE primary_id = 5 FOR UPDATE

经过几个小时的调试,我排除了我的应用程序的另一部分“直接”锁定同一行(这是明显的罪魁祸首)。因此,我开始深入挖掘 mysql 锁定的兔子洞,并注意到引发的“超过锁定等待超时”错误与以下相关性:

SHOW ENGINE INNODB STATUS;

cleaning up 状态下似乎有一个长时间运行的 TRANSACTION,它正在锁定缓慢增加的行数长达约 10 分钟,这里是来自 10 个手动 INNODB 的该事务的相关行状态查询:

2015-08-19 13:29:04
---TRANSACTION 25861246681, ACTIVE 158 sec
10 lock struct(s), heap size 1184, 21 row lock(s), undo log entries 20
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7146839061 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:29:42
---TRANSACTION 25861246681, ACTIVE 196 sec
13 lock struct(s), heap size 2936, 28 row lock(s), undo log entries 27
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147149416 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:30:10
---TRANSACTION 25861246681, ACTIVE 224 sec
13 lock struct(s), heap size 2936, 31 row lock(s), undo log entries 30
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147321023 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:30:41
---TRANSACTION 25861246681, ACTIVE 255 sec
13 lock struct(s), heap size 2936, 35 row lock(s), undo log entries 34
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147511090 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:12
---TRANSACTION 25861246681, ACTIVE 286 sec
15 lock struct(s), heap size 2936, 38 row lock(s), undo log entries 37
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147604774 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:30
---TRANSACTION 25861246681, ACTIVE 304 sec
21 lock struct(s), heap size 2936, 42 row lock(s), undo log entries 39
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147789789 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:57
---TRANSACTION 25861246681, ACTIVE 331 sec
21 lock struct(s), heap size 2936, 46 row lock(s), undo log entries 43
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147837536 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:32:28
---TRANSACTION 25861246681, ACTIVE 362 sec
22 lock struct(s), heap size 2936, 51 row lock(s), undo log entries 48
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147905807 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:33:16
---TRANSACTION 25861246681, ACTIVE 410 sec
23 lock struct(s), heap size 2936, 58 row lock(s), undo log entries 55
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148317478 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:33:49
---TRANSACTION 25861246681, ACTIVE 443 sec
24 lock struct(s), heap size 2936, 64 row lock(s), undo log entries 61
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148471519 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

我看到了以下博客文章 (http://databaseblog.myname.nl/2014/10/when-your-query-is-blocked-but-there-is_26.html),它提供了一个潜在的解决方案,可以帮助确定这个长期运行的事务中发生了什么,特别是设置:

set GLOBAL innodb_status_output_locks=ON;

很遗憾,由于权限受限,无法在 RDS 上执行此操作。

我想请求一些调试帮助,以了解如何解决这个清理事务中发生的情况,以及如何一起避免这一切。

编辑添加:MySQL 实例的平均 CPU 使用率为 20%

【问题讨论】:

    标签: mysql amazon-web-services transactions locking amazon-rds


    【解决方案1】:

    在我的情况下,我的“清理”锁在我杀死运行调试器的 JVM 后消失了。显然,它们是我在清理事务之前中断的早期调试运行的残余。

    这可能对您没有帮助,但这里有一些在这种情况下调试的建议。

    1. 您确实有一条信息,即锁的数量。使用断点,您可以在不同的地方暂停应用程序,以尝试准确定位计数何时上升。 (或者可能只有在日志中出现某些错误后才会上升;或者只有在某些用户操作后才会上升。)

    2. 1234563您也许可以将它散布在您的代码周围,可能还带有额外的日志记录,以查明阻塞开始的位置。
    3. 考虑针对本地安装的 MySQL 数据库临时调试应用程序。这可以安装在本地服务器上,也可以安装在您的开发机器上。这可能设置起来很麻烦,但可以带来许多其他好处(例如,用于 db 脚本的测试台;能够在离线时在笔记本电脑上工作。)

    所有这些都假定锁定是由您自己的代码引起的,而不是由其他工作引起的。 (在您的日志中,清理用户是“mfuser”。)这使您可以按需重现问题。

    【讨论】:

    • 我想起了另一个调试技巧;作为第二项插入。我还添加了最后一段,说明了我所做的假设。
    猜你喜欢
    • 1970-01-01
    • 2018-05-28
    • 2023-04-05
    • 1970-01-01
    • 2011-02-26
    • 1970-01-01
    • 2011-07-11
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多