【Posted】: 2021-08-19 19:28:19
【Question】:
When I run pg_dump against a database that contains a large number of blobs, PostgreSQL crashes while executing this query:
pg_dump: reading large objects
pg_dump: error: query failed: SSL SYSCALL error: EOF detected
pg_dump: error: query was: SELECT l.oid, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = l.lomowner) AS rolname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS lomacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rlomacl, NULL AS initlomacl, NULL AS initrlomacl FROM pg_largeobject_metadata l LEFT JOIN pg_init_privs pip ON (l.oid = pip.objoid AND pip.classoid = 'pg_largeobject'::regclass AND pip.objsubid = 0)
I have experimented with the query, and the two array_agg() columns, lomacl and rlomacl, appear to be the culprits.
This is AWS Aurora PostgreSQL 11:
SELECT version();
version
-------------------------------------------------------------------------------------------------
PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit
Logs:
2021-08-19 19:47:46 UTC::@:[46753]:LOG: server process (PID 21837) was terminated by signal 9: Killed
2021-08-19 19:47:46 UTC::@:[46753]:DETAIL: Failed process was running: SELECT l.oid, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = l.lomowner) AS rolname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS lomacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rlomacl, NULL AS initlomacl, NULL AS initrlomacl FROM pg_largeobject_metadata l LEFT JOIN pg_init_privs pip ON (l.oid = pip.objoid AND pip.classoid = 'pg_largeobject'::regclass AND pip.objsubid = 0)
2021-08-19 19:47:46 UTC::@:[46753]:LOG: terminating any other active server processes
2021-08-19 19:47:46 UTC::@:[46753]:FATAL: Can't handle storage runtime process crash
2021-08-19 19:47:46 UTC::@:[46753]:LOG: database system is shut down
Any troubleshooting steps or suggestions?
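【Comments】:
- What is your exact Postgres version (select version() will tell you), and which operating system are you using?
- signal 9: Killed seems to indicate that it crashed hard?
- It was probably killed by the OOM killer. Check /var/log/kern.log
- Try pg_dump with --no-blobs so that the blobs are not dumped, just to confirm that they are the problem.
- Yes, I did a dump without blobs and it completed without problems.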
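The comparison suggested in the comments (one dump with large objects, one without) could be set up roughly as follows. This is only a sketch: the host, database, and user names are placeholder assumptions, not values from the question, and the helper function dump_cmd is hypothetical. It prints the two pg_dump command lines for review instead of running them.

```shell
#!/bin/sh
# Placeholder connection settings (assumptions); replace with real values.
DB_HOST="${DB_HOST:-example-cluster.local}"
DB_NAME="${DB_NAME:-mydb}"
DB_USER="${DB_USER:-postgres}"

# Hypothetical helper: print a pg_dump command line.
# $1 = "no-blobs" to skip large objects, anything else to include them.
dump_cmd() {
    blob_flag="--blobs"
    if [ "$1" = "no-blobs" ]; then
        blob_flag="--no-blobs"
    fi
    printf 'pg_dump --host=%s --username=%s --dbname=%s --format=custom %s --file=%s.dump\n' \
        "$DB_HOST" "$DB_USER" "$DB_NAME" "$blob_flag" "$1"
}

# Print both variants so they can be reviewed before being run by hand.
dump_cmd no-blobs
dump_cmd with-blobs
```

If the --no-blobs variant completes and the --blobs variant fails at "reading large objects", as reported above, that points at the large-object metadata query and not at the table data.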
Tags: postgresql