【Question】: How can I improve the performance of my PostgreSQL query?
【Posted】: 2021-07-30 11:57:04
【Description】:

I have a query that returns the sums of buys, sells, and transfers in hourly intervals, grouped by account. The problem is that it is slow. Right now I only look at transactions from the last 24 hours, but I want to be able to run it over all transactions (800,000 across 2 years). How can I optimize it?

select
    i.interval, ca.contract_address,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 0), 0) as amount_ampl_bought,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 1), 0) as amount_ampl_sold,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 2), 0) as amount_ampl_transferred,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 0), 0) as percent_ampl_bought,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 1), 0) as percent_ampl_sold,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 2), 0) as percent_ampl_transferred
from
    (
        select contract_address
        from addresses a
        where not exists (select 1 from address_tags at where at.address = a.contract_address and at.tag_id = 3)
    ) ca
cross join
    (
        SELECT date_trunc('hour', dd) as interval
        FROM generate_series
        (
            (now() at time zone 'utc') - interval '1 day',
            (now() at time zone 'utc'),
            '1 hour'::interval
        ) dd
    ) i
left join transfers t on (t.from = ca.contract_address or t.to = ca.contract_address) and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval
group by i.interval, ca.contract_address;

Sample output:

      interval       |              contract_address              | amount_ampl_bought | amount_ampl_sold | amount_ampl_transferred |     percent_ampl_bought     |     percent_ampl_sold      |  percent_ampl_transferred  
---------------------+--------------------------------------------+--------------------+------------------+-------------------------+-----------------------------+----------------------------+----------------------------
 2021-05-08 11:00:00 | 0x0000000000000000000000000000000000000000 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000000000000000000000000000dead |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000006f6502b7f2bbac8c30a3f67e9a |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000084e91743124a982076c59f10084 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000000eb4ec62758aae93400b3e5f7f18 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x00000000000017c75025d397b91d284bbe8fc7f2 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000005117dd3a72e64a705198753fdd54 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000740a22fa209cf6806d38f7605385 |                  0 |                0 |                       0 |                           0 |                          0 |                          0

Link to the visualized query plan:

https://explain.depesz.com/s/SrLf

Indexes I have created on transfers:

 CREATE INDEX transfers_from_to_index ON public.transfers USING btree ("from", "to")
 CREATE INDEX transfers_timestamp_index ON public.transfers USING btree ("timestamp")
 CREATE INDEX transfers_action_index ON public.transfers USING btree (action)
 CREATE UNIQUE INDEX transfers_pkey ON public.transfers USING btree (transaction_hash, log_index)
 CREATE INDEX transfers_supply_percentage_index ON public.transfers USING btree (supply_percentage)
 CREATE INDEX transfers_amount_index ON public.transfers USING btree (amount)
 CREATE INDEX transfers_supply_percentage_timestamp_log_index_index ON public.transfers USING btree (supply_percentage, "timestamp", log_index)
 CREATE INDEX transfers_date_trunc_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")))
 CREATE INDEX transfers_to_index ON public.transfers USING btree ("to")

Indexes I have created on addresses:

 CREATE UNIQUE INDEX addresses_pkey ON public.addresses USING btree (contract_address)
 CREATE INDEX addresses_supply_percentage_index ON public.addresses USING btree (supply_percentage)

Any help optimizing this is much appreciated!

【Comments】:

  • It would help if you qualified all column references, so that it is clear which table each column comes from.
  • Thanks for the feedback, I'll do that now!
  • How long does it take if you remove the LEFT from the join? It might be faster to fetch only the data that exists and fill in the missing values through a different mechanism.

Tags: sql postgresql performance optimization query-optimization


【Solution 1】:

I'm fairly sure the problem is the or in the JOIN condition on transfers. Under reasonable assumptions, you should be able to split this into two separate left joins:

select i.interval, a.contract_address,
       coalesce(SUM(coalesce(tt.amount, tf.amount)) FILTER (WHERE coalesce(tt.action, tf.action) = 0), 0) as amount_ampl_bought,
       coalesce(SUM(coalesce(tt.amount, tf.amount)) FILTER (WHERE coalesce(tt.action, tf.action) = 1), 0) as amount_ampl_sold,
       coalesce(SUM(coalesce(tt.amount, tf.amount)) FILTER (WHERE coalesce(tt.action, tf.action) = 2), 0) as amount_ampl_transferred,
       coalesce(SUM(coalesce(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE coalesce(tt.action, tf.action) = 0), 0) as percent_ampl_bought,
       coalesce(SUM(coalesce(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE coalesce(tt.action, tf.action) = 1), 0) as percent_ampl_sold,
       coalesce(SUM(coalesce(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE coalesce(tt.action, tf.action) = 2), 0) as percent_ampl_transferred
from addresses a cross join
     generate_series(date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                     date_trunc('hour', now() at time zone 'utc'),
                     '1 hour'::interval
                    ) i(interval) left join
     transfers tf
     on tf.from = a.contract_address and
        date_trunc('hour', tf.timestamp at time zone 'utc') = i.interval left join
     transfers tt
     on tt.to = a.contract_address and
        date_trunc('hour', tt.timestamp at time zone 'utc') = i.interval
where not exists (select 1
                  from address_tags at
                  where at.address = a.contract_address and at.tag_id = 3
                 )
group by i.interval, a.contract_address;

For this query, you then want the following indexes:

  • address_tags(address, tag_id)
  • transfers(to, timestamp)
  • transfers(from, timestamp)

(Note that to and from are really bad column names, because they are SQL keywords.)
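If renaming is an option, something along these lines removes the need for quoting everywhere (the new names are purely illustrative):

 ALTER TABLE public.transfers RENAME COLUMN "from" TO from_address;
 ALTER TABLE public.transfers RENAME COLUMN "to" TO to_address;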

The conversion of timestamp to UTC could also pose a problem. I suggest you fix the data so that the timestamps are all in a common time zone -- and I would recommend UTC for that (to avoid daylight saving time issues).
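Assuming the column is currently a plain timestamp holding UTC wall-clock values, one way to normalize it might be (a sketch only; it rewrites the table, so test on a copy first):

 -- Reinterpret the stored naive timestamps as UTC and store them as timestamptz.
 ALTER TABLE public.transfers
     ALTER COLUMN "timestamp" TYPE timestamptz
     USING "timestamp" AT TIME ZONE 'utc';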

【Discussion】:

  • All my timestamps are already in UTC, but when creating an index on date_trunc, Postgres complained that the expression needs to be immutable, so I specified the UTC time zone there, and then in the query as well to be safe. The data is all stored as UTC, though. I realized after the initial creation that to and from are keywords, apologies for that. I'll test this and report back, thanks so much for the help!
【Solution 2】:

It looks like it is doing most of the work for all time periods, and only filtering out what you didn't ask for after most of that work is done. So if you want a specific time period, just ask for it directly. If that is still too slow, then post the plan for that query. That way at least we will be optimizing the right query.
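For illustration, one way to "ask for it directly" is to push a sargable range predicate on the raw timestamp column into the join, so only the requested window of transfers is ever scanned (a sketch against the original query; the 1-day window is an assumption):

 -- The bare range predicate on t.timestamp can use transfers_timestamp_index,
 -- unlike the date_trunc(...) comparison on its own.
 left join transfers t
        on (t.from = ca.contract_address or t.to = ca.contract_address)
       and t.timestamp >= (now() at time zone 'utc') - interval '1 day'
       and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval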

【Discussion】:

  • All I change in the other query is the size of the window I generate, e.g. for all time I would subtract years in the generate_series instead of one day. Is there a better way to do that?
  • I won't know without seeing EXPLAIN (ANALYZE, BUFFERS), or at least EXPLAIN.
  • Here it is for one year (all time is about 3 years): explain.depesz.com/s/M3Ur
【Solution 3】:

Could you give the query below a try? AFAIK there is no reason to cram everything into one query, so I split parts of it out. I also split the or into two parts, which should make better use of the indexes. Then I noticed that this is exactly what Gordon did above (and so far I think finding a workaround that is likely faster than a UNION ALL is pretty clever =)

I also added a WHERE on action; I'm not sure whether there are values other than 0, 1, 2. If not, you can remove it again.

PS: untested and written blind here, just curious (and hopeful =)

DROP TABLE IF EXISTS _combined;

WITH intervals
  AS (
       SELECT i AS interval
         FROM generate_series(
                               date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                               date_trunc('hour', (now() at time zone 'utc')),
                               '1 hour'::interval
                             ) AS i
     ),
     adrs
  AS (
       SELECT a.contract_address
         FROM addresses a
       EXCEPT
       SELECT at.address
         FROM address_tags at
        WHERE at.tag_id = 3
     )
SELECT a.contract_address, i.interval
  INTO TEMPORARY TABLE _combined
  FROM intervals i
 CROSS JOIN adrs a;

CREATE UNIQUE INDEX uq_combined ON _combined (interval, contract_address);

SELECT c.interval,
       c.contract_address,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as amount_ampl_bought,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as amount_ampl_sold,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as amount_ampl_transferred,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as percent_ampl_bought,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as percent_ampl_sold,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as percent_ampl_transferred
  FROM _combined c

  LEFT OUTER JOIN transfers tf
               ON tf.from = c.contract_address
              AND date_trunc('hour', tf.timestamp at time zone 'utc') = c.interval
              AND tf.action IN (0, 1, 2)

  LEFT OUTER JOIN transfers tt
               ON tt.to = c.contract_address
              AND date_trunc('hour', tt.timestamp at time zone 'utc') = c.interval
              AND tt.action IN (0, 1, 2)

 GROUP BY c.interval, c.contract_address;

The ideal indexes for this query would be:

CREATE INDEX transfers_date_trunc_to_idx ON public.transfers USING btree ((date_trunc('hour'::text, timezone('utc'::text, "timestamp"))), "to") INCLUDE (action, amount, supply_percentage);
CREATE INDEX transfers_date_trunc_from_idx ON public.transfers USING btree ((date_trunc('hour'::text, timezone('utc'::text, "timestamp"))), "from") INCLUDE (action, amount, supply_percentage);

【Discussion】:

  • I'll give it a try tonight, thanks for getting back to me!