【Question Title】: Postgres slow query (slow index scan)
【Posted】: 2014-03-24 23:37:22
【Question】:

I have a table with 3 million rows, 1.3 GB in size, running Postgres 9.3 on a laptop with 4 GB of RAM.

explain analyze
select act_owner_id from cnt_contacts where act_owner_id = 2

My btree index on cnt_contacts.act_owner_id is defined as:

CREATE INDEX cnt_contacts_idx_act_owner_id 
   ON public.cnt_contacts USING btree (act_owner_id, status_id);

The query takes about 5 seconds to run:

 Bitmap Heap Scan on cnt_contacts  (cost=2598.79..86290.73 rows=6208 width=4) (actual time=5865.617..5875.302 rows=5444 loops=1)
   Recheck Cond: (act_owner_id = 2)
   ->  Bitmap Index Scan on cnt_contacts_idx_act_owner_id  (cost=0.00..2597.24 rows=6208 width=0) (actual time=5865.407..5865.407 rows=5444 loops=1)
         Index Cond: (act_owner_id = 2)
 Total runtime: 5875.684 ms

Why does it take so long?
work_mem = 1024MB; 
shared_buffers = 128MB;
effective_cache_size = 1024MB
seq_page_cost = 1.0         # measured on an arbitrary scale
random_page_cost = 15.0         # same scale as above
cpu_tuple_cost = 3.0
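
For comparison, the documented defaults for these planner settings are far lower than the values above. A hedged sketch of restoring them per-session (default values taken from the PostgreSQL documentation; tune from there rather than treating these as the right answer):

    SET random_page_cost = 4.0;   -- default; 15.0 tells the planner random I/O is ruinously slow
    SET cpu_tuple_cost  = 0.01;   -- default; 3.0 makes every row processed look expensive
    SET work_mem = '4MB';         -- default; 1 GB per sort/hash node can exhaust 4 GB of RAM

These SET commands affect only the current session; to make them permanent, set them in postgresql.conf and reload.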

【Comments】:

  • What is the definition of the cnt_contacts_idx_act_owner_id index?
  • CREATE INDEX cnt_contacts_idx_act_owner_id ON public.cnt_contacts USING btree (act_owner_id, status_id);
  • You should create another index on act_owner_id alone.
  • Why did you raise random_page_cost so much? (The default is 4.0, if I remember correctly.) You are telling Postgres you have an incredibly slow disk with very high latency. cpu_tuple_cost also looks odd (the default is 0.01). Even on my fairly old, slow desktop, lowering random_page_cost to 2.5 improved the execution plans Postgres produced.
  • And work_mem = 1GB is absurd as well.
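
Following the comment above, a single-column index on just the filter column could be created like this (the index name here is illustrative, not from the original post):

    -- Hypothetical single-column index covering only the filtered column
    CREATE INDEX cnt_contacts_act_owner_id_only
        ON public.cnt_contacts USING btree (act_owner_id);
    ANALYZE public.cnt_contacts;  -- refresh planner statistics afterwards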

Tags: sql postgresql postgresql-9.2


【Solution 1】:

OK, you have a large table, an index, and long execution times in PG. Let's think about how to improve your plan and reduce the time. You write and delete rows; PG writes and deletes tuples, so both the table and its indexes can bloat. For fast searches, PG loads indexes into shared buffers, so you want to keep your indexes as clean as possible so they fit there; a SELECT then reads from shared buffers instead of hitting disk. Try tuning buffer memory, reducing index and table bloat, and keeping the database clean.

Tasks and ideas:

1) Check for duplicate indexes and verify that your indexes have good selectivity:

 WITH table_scans as (
    SELECT relid,
        tables.idx_scan + tables.seq_scan as all_scans,
        ( tables.n_tup_ins + tables.n_tup_upd + tables.n_tup_del ) as writes,
                pg_relation_size(relid) as table_size
        FROM pg_stat_user_tables as tables
),
all_writes as (
    SELECT sum(writes) as total_writes
    FROM table_scans
),
indexes as (
    SELECT idx_stat.relid, idx_stat.indexrelid,
        idx_stat.schemaname, idx_stat.relname as tablename,
        idx_stat.indexrelname as indexname,
        idx_stat.idx_scan,
        pg_relation_size(idx_stat.indexrelid) as index_bytes,
        indexdef ~* 'USING btree' AS idx_is_btree
    FROM pg_stat_user_indexes as idx_stat
        JOIN pg_index
            USING (indexrelid)
        JOIN pg_indexes as indexes
            ON idx_stat.schemaname = indexes.schemaname
                AND idx_stat.relname = indexes.tablename
                AND idx_stat.indexrelname = indexes.indexname
    WHERE pg_index.indisunique = FALSE
),
index_ratios AS (
SELECT schemaname, tablename, indexname,
    idx_scan, all_scans,
    round(( CASE WHEN all_scans = 0 THEN 0.0::NUMERIC
        ELSE idx_scan::NUMERIC/all_scans * 100 END),2) as index_scan_pct,
    writes,
    round((CASE WHEN writes = 0 THEN idx_scan::NUMERIC ELSE idx_scan::NUMERIC/writes END),2)
        as scans_per_write,
    pg_size_pretty(index_bytes) as index_size,
    pg_size_pretty(table_size) as table_size,
    idx_is_btree, index_bytes
    FROM indexes
    JOIN table_scans
    USING (relid)
),
index_groups AS (
SELECT 'Never Used Indexes' as reason, *, 1 as grp
FROM index_ratios
WHERE
    idx_scan = 0
    and idx_is_btree
UNION ALL
SELECT 'Low Scans, High Writes' as reason, *, 2 as grp
FROM index_ratios
WHERE
    scans_per_write <= 1
    and index_scan_pct < 10
    and idx_scan > 0
    and writes > 100
    and idx_is_btree
UNION ALL
SELECT 'Seldom Used Large Indexes' as reason, *, 3 as grp
FROM index_ratios
WHERE
    index_scan_pct < 5
    and scans_per_write > 1
    and idx_scan > 0
    and idx_is_btree
    and index_bytes > 100000000
UNION ALL
SELECT 'High-Write Large Non-Btree' as reason, index_ratios.*, 4 as grp 
FROM index_ratios, all_writes
WHERE
    ( writes::NUMERIC / ( total_writes + 1 ) ) > 0.02
    AND NOT idx_is_btree
    AND index_bytes > 100000000
ORDER BY grp, index_bytes DESC )
SELECT reason, schemaname, tablename, indexname,
    index_scan_pct, scans_per_write, index_size, table_size
FROM index_groups;

2) Check whether your tables and indexes are bloated:

     SELECT
        current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
        ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
        CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
      iname, /*ituples::bigint, ipages::bigint, iotta,*/
      ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
      CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
    FROM (
      SELECT
        schemaname, tablename, cc.reltuples, cc.relpages, bs,
        CEIL((cc.reltuples*((datahdr+ma-
          (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
        COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
        COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS iotta -- very rough approximation, assumes all cols
      FROM (
        SELECT
          ma,bs,schemaname,tablename,
          (datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
          (maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
        FROM (
          SELECT
            schemaname, tablename, hdr, ma, bs,
            SUM((1-null_frac)*avg_width) AS datawidth,
            MAX(null_frac) AS maxfracsum,
            hdr+(
              SELECT 1+COUNT(*)/8
              FROM pg_stats s2
              WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
            ) AS nullhdr
          FROM pg_stats s, (
            SELECT
              (SELECT current_setting('block_size')::NUMERIC) AS bs,
              CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
              CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
            FROM (SELECT version() AS v) AS foo
          ) AS constants
          GROUP BY 1,2,3,4,5
        ) AS foo
      ) AS rs
      JOIN pg_class cc ON cc.relname = rs.tablename
      JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
      LEFT JOIN pg_index i ON indrelid = cc.oid
      LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
    ) AS sml
    ORDER BY wastedbytes DESC

3) Have dead tuples been purged from disk? Is it time to VACUUM?

SELECT 
    relname AS TableName
    ,n_live_tup AS LiveTuples
    ,n_dead_tup AS DeadTuples
FROM pg_stat_user_tables;
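
If that query shows many dead tuples relative to live ones, a manual cleanup pass might look like the sketch below. Note that VACUUM FULL rewrites the table under an exclusive lock, so use it with care:

    VACUUM (VERBOSE, ANALYZE) cnt_contacts;       -- reclaim dead tuples, update stats
    -- To fully rewrite and shrink the table on disk (takes an exclusive lock):
    -- VACUUM FULL cnt_contacts;
    REINDEX INDEX cnt_contacts_idx_act_owner_id;  -- rebuild a bloated index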

4) Think about this: if your table has 10 records and 8 of them have id = 2, the index has poor selectivity for that value, and PG will end up scanning all 8 records anyway. A query for id != 2, by contrast, would use the index well. Try to design indexes with good selectivity.
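
One way to work around a dominant, low-selectivity value is a partial index. This is a sketch, assuming (as in the example above) that id = 2 is the common value you rarely benefit from indexing:

    -- Hypothetical partial index: only the rarer values are indexed, so the
    -- index stays small and lookups for those values stay cheap
    CREATE INDEX cnt_contacts_idx_owner_partial
        ON public.cnt_contacts (act_owner_id)
        WHERE act_owner_id <> 2;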

5) Use the right column types for your data. If a column can use a smaller type, convert it.
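
For example, if act_owner_id values never exceed a few thousand, a narrower integer type would shrink both the table and its indexes. This is illustrative only; verify the actual value range and any foreign keys first, and note the rewrite locks the table:

    -- Assumes all values fit in 2 bytes (smallint: -32768 to 32767)
    ALTER TABLE public.cnt_contacts
        ALTER COLUMN act_owner_id TYPE smallint;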

6) Review your database and your query conditions. Check whether your tables hold unused data, keep your indexes clean, and verify index selectivity. Try a BRIN index for your data, or try recreating the indexes.
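
A BRIN index, as suggested above, would look like the sketch below. Caveat: BRIN indexes require Postgres 9.5 or later (the question is on 9.3), and they only help when column values correlate with the physical row order on disk:

    -- BRIN stores one summary per block range, so the index is tiny
    -- compared to a btree over the same 3M-row table
    CREATE INDEX cnt_contacts_brin_owner
        ON public.cnt_contacts USING brin (act_owner_id);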

【Comments】:

    【Solution 2】:

    You are selecting 5,444 records scattered across a 1.3 GB table on a laptop. How long did you expect that to take?

    It looks like your index is not cached, either because it cannot stay resident in the cache or because this is the first time you have touched that part of it. What happens if you run exactly the same query repeatedly? The same query with different constants?

    Running the query under "explain (analyze, buffers)" will help gather more information, especially if you turn on track_io_timing first.
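
    Concretely, the suggested diagnostics could be run like this (setting track_io_timing may require superuser privileges):

        SET track_io_timing = on;
        EXPLAIN (ANALYZE, BUFFERS)
        SELECT act_owner_id FROM cnt_contacts WHERE act_owner_id = 2;
        -- "Buffers: shared hit=... read=..." distinguishes cache hits from
        -- disk reads, and the I/O Timings lines show time spent on storage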

    【Comments】:

    • Great tip about using EXPLAIN (ANALYZE, BUFFERS) ..., which reports shared-buffer hits in its output. It helped me realize that increasing shared_buffers can improve performance when that really is the bottleneck, as it was in my case.