【Question title】:Setting up PostgreSQL monitoring in Zabbix, error
【Posted】:2025-11-29 09:45:01
【Question】:

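I am setting up PostgreSQL monitoring with Zabbix 4.2, using the standard built-in PostgreSQL template. All data is displayed correctly except the metrics produced by the pgsql.query.time.sql query, which do not show up at all.

I tried running the query manually and got an error: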

psql -qtAX -h "$1" -p "$2" -U "$3" -d "$4" -v tmax=$5 -f "/var/lib/zabbix/postgresql/pgsql.query.time.sql"
psql:/var/lib/zabbix/postgresql/pgsql.query.time.sql:31: ERROR: syntax error (near position: ")")
STRING 22: ...'epoch' FROM (clock_timestamp() - query_start)) > )::integer...

Here is the query itself, from /var/lib/zabbix/postgresql/pgsql.query.time.sql:

WITH T AS
        (SELECT db.datname,
                        coalesce(T.query_time_max, 0) query_time_max,
                        coalesce(T.tx_time_max, 0) tx_time_max,
                        coalesce(T.mro_time_max, 0) mro_time_max,
                        coalesce(T.query_time_sum, 0) query_time_sum,
                        coalesce(T.tx_time_sum, 0) tx_time_sum,
                        coalesce(T.mro_time_sum, 0) mro_time_sum,
                        coalesce(T.query_slow_count, 0) query_slow_count,
                        coalesce(T.tx_slow_count, 0) tx_slow_count,
                        coalesce(T.mro_slow_count, 0) mro_slow_count
        FROM pg_database db NATURAL
        LEFT JOIN (
                SELECT datname,
                        extract(epoch FROM now())::integer ts,
                        coalesce(max(extract('epoch' FROM (clock_timestamp() - query_start))::integer * (state NOT IN ('idle', 'idle in transaction', 'idle in transaction (aborted)') AND query !~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) query_time_max,
                        coalesce(max(extract('epoch' FROM (clock_timestamp() - query_start))::integer * (state NOT IN ('idle') AND query !~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) tx_time_max,
                        coalesce(max(extract('epoch' FROM (clock_timestamp() - query_start))::integer * (state NOT IN ('idle') AND query ~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) mro_time_max,
                        coalesce(sum(extract('epoch' FROM (clock_timestamp() - query_start))::integer * (state NOT IN ('idle', 'idle in transaction', 'idle in transaction (aborted)') AND query !~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) query_time_sum,
                        coalesce(sum(extract('epoch' FROM (clock_timestamp() - query_start))::integer * (state NOT IN ('idle') AND query !~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) tx_time_sum,
                        coalesce(sum(extract('epoch' FROM (clock_timestamp() - query_start))::integer * (state NOT IN ('idle') AND query ~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) mro_time_sum,

                        coalesce(sum((extract('epoch' FROM (clock_timestamp() - query_start)) > :tmax)::integer * (state NOT IN ('idle', 'idle in transaction', 'idle in transaction (aborted)') AND query !~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) query_slow_count,
                        coalesce(sum((extract('epoch' FROM (clock_timestamp() - query_start)) > :tmax)::integer * (state NOT IN ('idle') AND query !~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) tx_slow_count,
                        coalesce(sum((extract('epoch' FROM (clock_timestamp() - query_start)) > :tmax)::integer * (state NOT IN ('idle') AND query ~* E'^(\\s*(--[^\\n]*\\n|/\\*.*\\*/|\\n))*(autovacuum|VACUUM|ANALYZE|REINDEX|CLUSTER|CREATE|ALTER|TRUNCATE|DROP)')::integer), 0) mro_slow_count
                FROM pg_stat_activity
                WHERE pid <> pg_backend_pid()
                GROUP BY 1) T
        WHERE NOT db.datistemplate )
SELECT json_object_agg(datname, row_to_json(T))
FROM T

【Comments】:

    Tags: postgresql zabbix


    【Answer 1】:

    The documentation says:

    -c command

    [...]

    command must be either a command string that is completely parsable by the server (i.e., it contains no psql-specific features), or a single backslash command.

    So you cannot use a variable that way.

    You can use a here-document:

    psql <<EOF
    \set x $5
    SELECT :x
    EOF
    

    Or you can use the shell variable directly in the statement:

    psql -c "SELECT $5"
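
To see what psql would actually receive from the here-document, the sketch below substitutes `cat` for `psql`. The function name `render_script` is hypothetical and exists only for illustration. Because the `EOF` delimiter is unquoted, the shell expands the positional parameter before the text is passed on, while `\set` and `:x` are left for psql to interpret.

```shell
#!/bin/sh
# Hypothetical helper: print the text that psql would read on stdin.
# `cat` stands in for `psql` so the expansion can be inspected without a server.
render_script() {
    # Unquoted delimiter: the shell replaces $1, but leaves \set and :x untouched.
    cat <<EOF
\set x $1
SELECT :x
EOF
}

# Show the rendered script for a sample value.
render_script 30
```

Quoting the delimiter (`<<'EOF'`) would disable the expansion, and psql would then see the literal text `$1`.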
    

    【Comments】:

      【Answer 2】:

      The problem here is that you are passing an empty value to tmax=, as the error indicates:

      psql:/var/lib/zabbix/postgresql/pgsql.query.time.sql:31: ERROR: syntax error (near position: ")")
      STRING 22: ...'epoch' FROM (clock_timestamp() - query_start)) > )::integer...
                                                                     ^-- missing value here
      

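      I believe the {$PG.SLOW_QUERIES.MAX.WARN} macro is the culprit here. Make sure that it:

      1. exists in the template macros;
      2. is not overridden with an empty value by a host macro.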
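
When the fifth argument is empty, `-v tmax=` sets the psql variable to an empty string, so `:tmax` expands to nothing and the server sees `> )`. A minimal sketch of a guard follows, assuming the command is run from a wrapper script; the function name `check_tmax` is hypothetical, and the built-in template calls psql directly rather than through such a wrapper.

```shell
#!/bin/sh
# Hypothetical guard: accept the threshold only if it is a non-negative integer.
# Prints the value on success; prints a message to stderr and fails otherwise.
check_tmax() {
    case "$1" in
        ''|*[!0-9]*)
            echo "tmax is empty or not a number: '$1'" >&2
            return 1
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Possible use inside a wrapper (psql options copied from the question):
#   tmax=$(check_tmax "$5") || exit 1
#   psql -qtAX -h "$1" -p "$2" -U "$3" -d "$4" -v tmax="$tmax" \
#        -f "/var/lib/zabbix/postgresql/pgsql.query.time.sql"
```

Failing early this way turns the confusing SQL syntax error into a message that names the missing parameter.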

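      That said, this may not be the actual reason monitoring is not working. To determine what exactly is going wrong, we need to look at the Zabbix agent log, which you can find at /var/log/zabbix/zabbix_agentd.log. You may need to raise DebugLevel in the agent configuration to 3 or even 4 (be warned that level 4 produces a lot of output).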
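
Before raising the level, it can help to confirm which DebugLevel line is actually active. The sketch below prints the uncommented setting from an agent configuration file. The function name `show_debug_level` is hypothetical, and the default path /etc/zabbix/zabbix_agentd.conf is an assumption that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) DebugLevel line.
# The default path is an assumption; pass the real config path as $1.
show_debug_level() {
    conf="${1:-/etc/zabbix/zabbix_agentd.conf}"
    # Lines starting with '#' are skipped because the pattern anchors on the key.
    grep -E '^[[:space:]]*DebugLevel[[:space:]]*=' "$conf"
}
```

If nothing is printed, the agent is running with its default level. After editing the file, the agent has to be restarted for the change to take effect.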

      【Comments】:

        【Answer 3】:

        I set DebugLevel=4, and in the Zabbix agent log I now see the correct command:

        psql -qtAX -h "127.0.0.1" -p "5432" -U "zbx_monitor" -d "test2" -v tmax=30 -f "/var/lib/zabbix/postgresql/pgsql.query.time.sql"
        

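        But in the command's output every metric is 0 for every database, even though I ran a concurrent query against the test2 database that takes 14 seconds (I ran this query several times). Why is that?

        Here is the output of the command: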

        psql -qtAX -h "127.0.0.1" -p "5432" -U "zbx_monitor" -d "test2" -v tmax=30 -f "/var/lib/zabbix/postgresql/pgsql.query.time.sql"
        { "postgres" : {"datname":"postgres","query_time_max":0,"tx_time_max":0,"mro_time_max":0,"query_time_sum":0,"tx_time_sum":0,"mro_time_sum":0,"query_slow_count":0,"tx_slow_count":0,"mro_slow_count":0}, "test" : {"datname":"test","query_time_max":0,"tx_time_max":0,"mro_time_max":0,"query_time_sum":0,"tx_time_sum":0,"mro_time_sum":0,"query_slow_count":0,"tx_slow_count":0,"mro_slow_count":0}, "test2" : {"datname":"test2","query_time_max":0,"tx_time_max":0,"mro_time_max":0,"query_time_sum":0,"tx_time_sum":0,"mro_time_sum":0,"query_slow_count":0,"tx_slow_count":0,"mro_slow_count":0} }
        

        【Comments】: