Answers: postgres deadlock on a simple update query

【Question title】: deadlock in postgres on simple update query
【Posted】: 2013-08-18 09:01:37
【Question description】:

I'm using postgres 9.1 and getting a deadlock exception when a simple update method is executed very frequently.

According to the logs, the deadlock occurs because two identical updates are executed at the same time.

update public.vm_action_info set last_on_demand_task_id=$1, version=version+1

How can two identical simple updates deadlock each other?

The error I'm getting in the log:

2013-08-18 11:00:24 IDT HINT:  See server log for query details.
2013-08-18 11:00:24 IDT STATEMENT:  update public.vm_action_info set last_on_demand_task_id=$1, version=version+1 where id=$2
2013-08-18 11:00:25 IDT ERROR:  deadlock detected
2013-08-18 11:00:25 IDT DETAIL:  Process 31533 waits for ShareLock on transaction 4228275; blocked by process 31530.
        Process 31530 waits for ExclusiveLock on tuple (0,68) of relation 70337 of database 69205; blocked by process 31533.
        Process 31533: update public.vm_action_info set last_on_demand_task_id=$1, version=version+1 where id=$2
        Process 31530: update public.vm_action_info set last_on_demand_task_id=$1, version=version+1 where id=$2
2013-08-18 11:00:25 IDT HINT:  See server log for query details.
2013-08-18 11:00:25 IDT STATEMENT:  update public.vm_action_info set last_on_demand_task_id=$1, version=version+1 where id=$2
2013-08-18 11:00:25 IDT ERROR:  deadlock detected
2013-08-18 11:00:25 IDT DETAIL:  Process 31530 waits for ExclusiveLock on tuple (0,68) of relation 70337 of database 69205; blocked by process 31876.
        Process 31876 waits for ShareLock on transaction 4228275; blocked by process 31530.
        Process 31530: update public.vm_action_info set last_on_demand_task_id=$1, version=version+1 where id=$2
        Process 31876: update public.vm_action_info set last_on_demand_task_id=$1, version=version+1 where id=$2

The schema is:

CREATE TABLE vm_action_info(
  id integer NOT NULL,
  version integer NOT NULL DEFAULT 0,
  vm_info_id integer NOT NULL,
  last_exit_code integer,
  bundle_action_id integer NOT NULL,
  last_result_change_time numeric NOT NULL,
  last_completed_vm_task_id integer,
  last_on_demand_task_id bigint,
  CONSTRAINT vm_action_info_pkey PRIMARY KEY (id ),
  CONSTRAINT vm_action_info_bundle_action_id_fk FOREIGN KEY (bundle_action_id)
      REFERENCES bundle_action (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT vm_discovery_info_fk FOREIGN KEY (vm_info_id)
      REFERENCES vm_info (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT vm_task_last_on_demand_task_fk FOREIGN KEY (last_on_demand_task_id)
      REFERENCES vm_task (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,

  CONSTRAINT vm_task_last_task_fk FOREIGN KEY (last_completed_vm_task_id)
      REFERENCES vm_task (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (OIDS=FALSE);

ALTER TABLE vm_action_info
  OWNER TO vadm;

-- Index: vm_action_info_vm_info_id_index

-- DROP INDEX vm_action_info_vm_info_id_index;

CREATE INDEX vm_action_info_vm_info_id_index
  ON vm_action_info
  USING btree (vm_info_id );

CREATE TABLE vm_task
(
  id integer NOT NULL,
  version integer NOT NULL DEFAULT 0,
  vm_action_info_id integer NOT NULL,
  creation_time numeric NOT NULL DEFAULT 0,
  task_state text NOT NULL,
  triggered_by text NOT NULL,
  bundle_param_revision bigint NOT NULL DEFAULT 0,
  execution_time bigint,
  expiration_time bigint,
  username text,
  completion_time bigint,
  completion_status text,
  completion_error text,
  CONSTRAINT vm_task_pkey PRIMARY KEY (id ),
  CONSTRAINT vm_action_info_fk FOREIGN KEY (vm_action_info_id)
  REFERENCES vm_action_info (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
)
 WITH (
OIDS=FALSE
);
ALTER TABLE vm_task
  OWNER TO vadm;

-- Index: vm_task_creation_time_index

-- DROP INDEX vm_task_creation_time_index;

CREATE INDEX vm_task_creation_time_index
  ON vm_task
  USING btree
 (creation_time );

【Question discussion】:

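They are not that simple. There is an FK constraint on the field (which means an index has to be updated). Maybe try DEFERRABLE INITIALLY DEFERRED? (I don't think it makes a difference, though.)
I'd rather not change the FK constraint, since I'm not entirely sure how it would affect the system in general. Adding a restriction in the code so that only a single query can execute at any given time would solve the problem, but I don't understand how a query can deadlock with itself. All the locks are acquired in the same order, so this definitely shouldn't happen. Is it possible that postgres detects a deadlock that doesn't actually exist?
You wrote "all lock are acquired at the same order". Does that mean it is not just a simple update, and that the whole transaction contains more locking commands than this single update? If so, please show us the whole code.
The transaction does the following: 1. Adds a new entry to the task table. 2. Updates the corresponding entry in vm_action_info for the new vm_task row. But the error in the log mentions only vm_action_info.
How can two identical simple updates deadlock each other?: see "postgres deadlock without explicit locking"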

Tags: postgresql postgresql-9.1

【Solution 1】:

It may simply be that your system is unusually busy. You say you only see this under "excessive execution" of the query.

The situation appears to be this:

pid=31530 wants to lock tuple (0,68) on rel 70337 (vm_action_info I suspect) for update
    it is waiting behind pid=31533, pid=31876
pid=31533 is waiting behind transaction 4228275
pid=31876 is waiting behind transaction 4228275

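So it seems we have four transactions updating this row at the same time. Transaction 4228275 has not yet committed or rolled back and is blocking the others. Two of them have been waiting for at least deadlock_timeout, otherwise we wouldn't see the timeout. The timeout expires, the deadlock detector takes a look, sees a bunch of intertwined transactions and cancels one of them. Strictly speaking it may not be a deadlock, but I'm not sure the detector is smart enough to work that out.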

Try one of the following:

1. Reduce the update rate
2. Get a faster server
3. Increase deadlock_timeout

#3 is probably the easiest :-) You may also want to set log_lock_waits so you can see if/when the system is under this kind of pressure.
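As a rough sketch of suggestion #3 and the logging hint, the two settings could be changed for a single session as shown below. This assumes a superuser connection, and the `5s` value is purely illustrative. Note that deadlock_timeout only controls how long a backend waits before the deadlock check runs, so raising it would not remove a genuine deadlock.

```sql
-- Show the current values (the default deadlock_timeout is 1s).
SHOW deadlock_timeout;
SHOW log_lock_waits;

-- Assumption: run as a superuser; both settings are superuser-only.
-- Log a message whenever a session waits longer than deadlock_timeout for a lock.
SET log_lock_waits = on;

-- Illustrative value only: wait 5 seconds before running the deadlock check.
SET deadlock_timeout = '5s';
```

To make the change permanent, the same two parameters can be set in postgresql.conf and the configuration reloaded.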

【Discussion】:

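This doesn't happen when the updates are slower. Regarding suggestion #3: according to the postgres documentation, the deadlock_timeout parameter only defines how long to wait before the deadlock detection mechanism runs, and does not affect whether a deadlock is declared. From the docs: "This is the amount of time, in milliseconds, to wait on a lock before checking to see if there is a deadlock condition. The check for deadlock is relatively expensive, so the server doesn't run it every time it waits for a lock."
Upgrading to version 9.2 may also help; it has several improvements in locking behavior and overall speed.
The condition is a deadlock unless one of the transactions is aborted.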

【Solution 2】:

My guess is that the source of the problem is the circular foreign key reference between your tables.

Table vm_action_info
==> foreign key (last_completed_vm_task_id) references vm_task (id)

Table vm_task
==> foreign key (vm_action_info_id) references vm_action_info (id)
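One way to confirm the circular reference on a live database is to read the foreign keys from the system catalog. The query below is a minimal sketch that assumes both tables are visible on the current search_path; it uses only the standard pg_constraint catalog.

```sql
-- List every foreign key between the two tables, in both directions.
SELECT conname             AS constraint_name,
       conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
FROM   pg_constraint
WHERE  contype = 'f'
AND    conrelid  IN ('vm_action_info'::regclass, 'vm_task'::regclass)
AND    confrelid IN ('vm_action_info'::regclass, 'vm_task'::regclass)
ORDER  BY conname;
```

If rows come back for both directions (vm_action_info to vm_task and vm_task to vm_action_info), the two tables reference each other.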

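The transaction consists of two steps:

Add a new entry to the task table.

Update the corresponding entry in vm_action_info so that it points to the new vm_task row.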

When two transactions try to update the same record in the vm_action_info table at the same time, this ends in a deadlock.

Look at this simple test case:

CREATE TABLE vm_task
(
  id integer NOT NULL,
  version integer NOT NULL DEFAULT 0,
  vm_action_info_id integer NOT NULL,
  CONSTRAINT vm_task_pkey PRIMARY KEY (id )
)
 WITH ( OIDS=FALSE );

 insert into vm_task values 
 ( 0, 0, 0 ), ( 1, 1, 1 ), ( 2, 2, 2 );

CREATE TABLE vm_action_info(
  id integer NOT NULL,
  version integer NOT NULL DEFAULT 0,
  last_on_demand_task_id bigint,
  CONSTRAINT vm_action_info_pkey PRIMARY KEY (id )
)
WITH (OIDS=FALSE);
insert into vm_action_info values 
 ( 0, 0, 0 ), ( 1, 1, 1 ), ( 2, 2, 2 );

alter table vm_task
add  CONSTRAINT vm_action_info_fk FOREIGN KEY (vm_action_info_id)
  REFERENCES vm_action_info (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
  ;
Alter table vm_action_info
 add CONSTRAINT vm_task_last_on_demand_task_fk FOREIGN KEY (last_on_demand_task_id)
      REFERENCES vm_task (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
      ;

In session 1 we add a record to vm_task that references id=2 in vm_action_info:

session1=> begin;
BEGIN
session1=> insert into vm_task values( 100, 0, 2 );
INSERT 0 1
session1=>

Meanwhile, another transaction begins in session 2:

session2=> begin;
BEGIN
session2=> insert into vm_task values( 200, 0, 2 );
INSERT 0 1
session2=>

Then the first transaction performs the update:

session1=> update vm_action_info set last_on_demand_task_id=100, version=version+1
session1-> where id=2;

But this command hangs, waiting for a lock...

Then the second session performs the update:

session2=> update vm_action_info set last_on_demand_task_id=200, version=version+1 where id=2;
BŁĄD:  wykryto zakleszczenie
SZCZEGÓŁY:  Proces 9384 oczekuje na ExclusiveLock na krotka (0,5) relacji 33083 bazy danych 16393; zablokowany przez 3808.
Proces 3808 oczekuje na ShareLock na transakcja 976; zablokowany przez 9384.
PODPOWIEDŹ:  Przejrzyj dziennik serwera by znaleźć szczegóły zapytania.
session2=>

Deadlock detected! (The Polish-locale message above reads "ERROR: deadlock detected".)

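This happens because, due to the foreign key reference, both INSERTs into vm_task place a shared lock on the row with id=2 in the vm_action_info table. The first update then tries to take a write lock on that row and hangs, because the row is locked by the other (second) transaction. The second update then tries to lock the same record for writing, but it is locked in shared mode by the first transaction. The result is a deadlock.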
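To watch this happen, the pending lock requests can be inspected from a third session while the first UPDATE is still hanging. This is only a sketch using the standard pg_locks view. Row-level locks are recorded in the rows themselves, so they mostly do not appear in pg_locks; what the view does show is a waiting session, typically as an ungranted transactionid or tuple lock.

```sql
-- Run from a third session while session 1's UPDATE is still hanging.
-- Lists lock requests that have not been granted yet.
SELECT pid,
       locktype,
       mode,
       relation::regclass AS relation,
       page,
       tuple,
       transactionid
FROM   pg_locks
WHERE  NOT granted
ORDER  BY pid;
```

The pid values can be matched against the process numbers reported in the deadlock message.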

I think this can be avoided if you take a write lock on the record in vm_action_info first. The whole transaction would then consist of 5 steps:

 begin;
 select * from vm_action_info where id=2 for update;
 insert into vm_task values( 100, 0, 2 );
 update vm_action_info set last_on_demand_task_id=100, 
         version=version+1 where id=2;
 commit;
