【Question】: How can I improve the performance of my PostgreSQL query?
【Posted】: 2021-07-30 11:57:04
【Description】:

I have a query that returns the sums of buys, sells, and transfers in hourly intervals, grouped by account. The problem is that it is slow. Right now I only look at transactions from the last 24 hours, but I want to be able to run it over all transactions (800,000 across 2 years). How can I optimize it?

select
    i.interval, ca.contract_address,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 0), 0) as amount_ampl_bought,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 1), 0) as amount_ampl_sold,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 2), 0) as amount_ampl_transferred,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 0), 0) as percent_ampl_bought,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 1), 0) as percent_ampl_sold,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 2), 0) as percent_ampl_transferred
from
    (
        select contract_address
        from addresses a
        where not exists (select 1 from address_tags at where at.address = a.contract_address and at.tag_id = 3)
    ) ca
cross join
    (
        SELECT date_trunc('hour', dd) as interval
        FROM generate_series
        (
            (now() at time zone 'utc') - interval '1 day',
            (now() at time zone 'utc'),
            '1 hour'::interval
        ) dd
    ) i
left join transfers t on (t.from = ca.contract_address or t.to = ca.contract_address) and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval
group by i.interval, ca.contract_address;

Sample output:

      interval       |              contract_address              | amount_ampl_bought | amount_ampl_sold | amount_ampl_transferred |     percent_ampl_bought     |     percent_ampl_sold      |  percent_ampl_transferred  
---------------------+--------------------------------------------+--------------------+------------------+-------------------------+-----------------------------+----------------------------+----------------------------
 2021-05-08 11:00:00 | 0x0000000000000000000000000000000000000000 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000000000000000000000000000dead |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000006f6502b7f2bbac8c30a3f67e9a |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000084e91743124a982076c59f10084 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000000eb4ec62758aae93400b3e5f7f18 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x00000000000017c75025d397b91d284bbe8fc7f2 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000005117dd3a72e64a705198753fdd54 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000740a22fa209cf6806d38f7605385 |                  0 |                0 |                       0 |                           0 |                          0 |                          0

Link to the visualized query plan:

https://explain.depesz.com/s/SrLf

Indexes I have created on transfers:

 CREATE INDEX transfers_from_to_index ON public.transfers USING btree ("from", "to")
 CREATE INDEX transfers_timestamp_index ON public.transfers USING btree ("timestamp")
 CREATE INDEX transfers_action_index ON public.transfers USING btree (action)
 CREATE UNIQUE INDEX transfers_pkey ON public.transfers USING btree (transaction_hash, log_index)
 CREATE INDEX transfers_supply_percentage_index ON public.transfers USING btree (supply_percentage)
 CREATE INDEX transfers_amount_index ON public.transfers USING btree (amount)
 CREATE INDEX transfers_supply_percentage_timestamp_log_index_index ON public.transfers USING btree (supply_percentage, "timestamp", log_index)
 CREATE INDEX transfers_date_trunc_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")))
 CREATE INDEX transfers_to_index ON public.transfers USING btree ("to")

Indexes I have created on addresses:

 CREATE UNIQUE INDEX addresses_pkey ON public.addresses USING btree (contract_address)
 CREATE INDEX addresses_supply_percentage_index ON public.addresses USING btree (supply_percentage)

Any help optimizing this is much appreciated!

【Comments】:

  • It would help if you qualified all column references, so that it is clear which table each column comes from.
  • Thanks for the feedback, I'll do that now!
  • How long does it take if you remove the LEFT from the join? It might be faster to fetch only the data that exists and fill in the missing values through a different mechanism.

Tags: sql postgresql performance optimization query-optimization


【Solution 1】:

I'm fairly sure the problem is the or in the JOIN condition on transfers. Under reasonable assumptions, you should be able to split this into two separate left joins:

select i.interval, a.contract_address,
       coalesce(SUM(coalesce(tt.amount, tf.amount)) FILTER (WHERE coalesce(tt.action, tf.action) = 0), 0) as amount_ampl_bought,
       coalesce(SUM(coalesce(tt.amount, tf.amount)) FILTER (WHERE coalesce(tt.action, tf.action) = 1), 0) as amount_ampl_sold,
       coalesce(SUM(coalesce(tt.amount, tf.amount)) FILTER (WHERE coalesce(tt.action, tf.action) = 2), 0) as amount_ampl_transferred,
       coalesce(SUM(coalesce(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE coalesce(tt.action, tf.action) = 0), 0) as percent_ampl_bought,
       coalesce(SUM(coalesce(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE coalesce(tt.action, tf.action) = 1), 0) as percent_ampl_sold,
       coalesce(SUM(coalesce(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE coalesce(tt.action, tf.action) = 2), 0) as percent_ampl_transferred
from addresses a cross join
     generate_series(date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                     date_trunc('hour', now() at time zone 'utc'),
                     '1 hour'::interval
                    ) i(interval) left join
     transfers tf
     on tf.from = a.contract_address and
        date_trunc('hour', tf.timestamp at time zone 'utc') = i.interval left join
     transfers tt
     on tt.to = a.contract_address and
        date_trunc('hour', tt.timestamp at time zone 'utc') = i.interval
where not exists (select 1
                  from address_tags at
                  where at.address = a.contract_address and at.tag_id = 3
                 )
group by i.interval, a.contract_address;

For this query, you then want the following indexes:

  • address_tags(address, tag_id)
  • transfers(to, timestamp)
  • transfers(from, timestamp)

(Note that to and from are really bad column names, because they are SQL keywords.)
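If renaming is an option, something along these lines removes the need for quoting everywhere (the new names are purely illustrative):

 ALTER TABLE public.transfers RENAME COLUMN "from" TO from_address;
 ALTER TABLE public.transfers RENAME COLUMN "to" TO to_address;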

The conversion of timestamp to UTC could also pose a problem. I suggest you fix the data so that the timestamps are all in a common time zone -- and I would recommend UTC for that (to avoid daylight saving time issues).
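Assuming the column is currently a plain timestamp holding UTC wall-clock values, one way to normalize it might be (a sketch only; it rewrites the table, so test on a copy first):

 -- Reinterpret the stored naive timestamps as UTC and store them as timestamptz.
 ALTER TABLE public.transfers
     ALTER COLUMN "timestamp" TYPE timestamptz
     USING "timestamp" AT TIME ZONE 'utc';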

【Discussion】:

  • All my timestamps are already in UTC, but when creating an index on date_trunc, Postgres complained that the expression needs to be immutable, so I specified the UTC time zone there, and then in the query as well to be safe. The data is all stored as UTC, though. I realized after the initial creation that to and from are keywords, apologies for that. I'll test this and report back, thanks so much for the help!
【Solution 2】:

It looks like it is doing most of the work for all time periods, and only filtering out what you didn't ask for after most of that work is done. So if you want a specific time period, just ask for it directly. If that is still too slow, then post the plan for that query. That way at least we will be optimizing the right query.
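For illustration, one way to "ask for it directly" is to push a sargable range predicate on the raw timestamp column into the join, so only the requested window of transfers is ever scanned (a sketch against the original query; the 1-day window is an assumption):

 -- The bare range predicate on t.timestamp can use transfers_timestamp_index,
 -- unlike the date_trunc(...) comparison on its own.
 left join transfers t
        on (t.from = ca.contract_address or t.to = ca.contract_address)
       and t.timestamp >= (now() at time zone 'utc') - interval '1 day'
       and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval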

【Discussion】:

  • All I change in the other query is the size of the window I generate, e.g. for all time I would subtract years in the generate_series instead of one day. Is there a better way to do that?
  • I won't know without seeing EXPLAIN (ANALYZE, BUFFERS), or at least EXPLAIN.
  • Here it is for one year (all time is about 3 years): explain.depesz.com/s/M3Ur
【Solution 3】:

Could you give the query below a try? AFAIK there is no reason to cram everything into one query, so I split parts of it out. I also split the or into two parts, which should make better use of the indexes. Then I noticed that this is exactly what Gordon did above (and so far I think finding a workaround that is likely faster than a UNION ALL is pretty clever =)

I also added a WHERE on action; I'm not sure whether there are values other than 0, 1, 2. If not, you can remove it again.

PS: untested and written blind here, just curious (and hopeful =)

DROP TABLE IF EXISTS _combined;

WITH intervals
  AS (
       SELECT i AS interval
         FROM generate_series(
                               date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                               date_trunc('hour', (now() at time zone 'utc')),
                               '1 hour'::interval
                             ) AS i
     ),
     adrs
  AS (
       SELECT a.contract_address
         FROM addresses a
       EXCEPT
       SELECT at.address
         FROM address_tags at
        WHERE at.tag_id = 3
     )
SELECT a.contract_address, i.interval
  INTO TEMPORARY TABLE _combined
  FROM intervals i
 CROSS JOIN adrs a;

CREATE UNIQUE INDEX uq_combined ON _combined (interval, contract_address);

SELECT c.interval,
       c.contract_address,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as amount_ampl_bought,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as amount_ampl_sold,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as amount_ampl_transferred,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as percent_ampl_bought,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as percent_ampl_sold,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as percent_ampl_transferred
  FROM _combined c

  LEFT OUTER JOIN transfers tf
               ON tf.from = c.contract_address
              AND date_trunc('hour', tf.timestamp at time zone 'utc') = c.interval
              AND tf.action IN (0, 1, 2)

  LEFT OUTER JOIN transfers tt
               ON tt.to = c.contract_address
              AND date_trunc('hour', tt.timestamp at time zone 'utc') = c.interval
              AND tt.action IN (0, 1, 2)

 GROUP BY c.interval, c.contract_address;

The ideal indexes for this query would be:

CREATE INDEX transfers_date_trunc_to_idx ON public.transfers USING btree ((date_trunc('hour'::text, timezone('utc'::text, "timestamp"))), "to") INCLUDE (action, amount, supply_percentage);
CREATE INDEX transfers_date_trunc_from_idx ON public.transfers USING btree ((date_trunc('hour'::text, timezone('utc'::text, "timestamp"))), "from") INCLUDE (action, amount, supply_percentage);

【Discussion】:

  • I'll give it a try tonight, thanks for getting back to me!