【Question title】:How could I speed up this SQL query?
【Posted】:2021-07-31 21:48:09
【Question】:

I have this query:

select 
    "time_interval", 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
group by grouping sets ( (time_interval), () ) 
order by time_interval desc nulls last;

With this explain plan:

https://explain.depesz.com/s/gJXC

I have these indexes:

CREATE UNIQUE INDEX mv_30day_daily_aggregate_buys_sells_transfers_primary
  ON public.mv_30day_daily_aggregate_buys_sells_transfers USING btree
  (time_interval, contract_address);

CREATE INDEX mv_30day_daily_aggregate_buys_sells_transfers_time_interval
  ON public.mv_30day_daily_aggregate_buys_sells_transfers USING btree
  (time_interval);

CREATE INDEX v_30day_daily_aggregate_buys_sells_transfers_contract_address
  ON public.mv_30day_daily_aggregate_buys_sells_transfers USING btree
  (contract_address);

Can this be optimized any further? The table only has 30 time intervals, so I feel like I should be able to get this result faster.

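【Comments】:

  • Could you add the table name (or table alias) to the columns in this line: where address = addresses.contract_address and tag_id = 3
  • Are you joining the addresses table to check that the address exists, or only to make the NOT EXISTS() on address_tags convenient? If it's the latter, I don't think you need that join at all; can't the NOT EXISTS() just reference mv.contract_address?
  • Also, please prefix every column with its table alias. You may know exactly where each column comes from, but we don't.
  • I would start by adding an index: create index ix1 on address_tags (tag_id, address);
  • I've added aliases and simplified the query: I removed the join and updated the explain plan. (The join will be needed later, when I restrict the returned addresses to the top holders, but it doesn't have much effect on performance, so I've taken it out for now.) address_tags and addresses already have the proper indexes.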

Tags: sql postgresql performance query-optimization


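【Solution 1】:

A major limitation here (at least if you have spare CPUs) is that GROUPING SETS does not support parallel execution. As far as I know there is no fundamental reason for this; nobody has gotten around to wiring it up yet. With only 32 groups, it should parallelize very well if it were allowed to. So one thing you can do is rewrite it as two queries combined with UNION ALL, which can be parallelized: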

select 
    "time_interval", 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
group by time_interval 
union all 
select 
    NULL, 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
order by time_interval desc nulls last;
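To check whether the UNION ALL rewrite actually gets a parallel plan, you can look at the EXPLAIN ANALYZE output. This is a sketch: the node names below are what PostgreSQL prints for parallel aggregation, but whether the planner picks a parallel plan at all depends on table size and settings such as min_parallel_table_scan_size, so a small table may still get a serial plan. Running the original GROUPING SETS query the same way gives you a baseline to compare against.

```sql
-- EXPLAIN (ANALYZE, BUFFERS) actually executes the query and reports the real plan.
-- In a parallel plan, look for "Gather" (or "Gather Merge") nodes reporting
-- "Workers Launched: N", above "Partial ... Aggregate" and "Parallel Seq Scan" nodes.
-- If none of these appear, the query is running in a single process.
explain (analyze, buffers)
select time_interval,
       sum(percent_ampl_bought), sum(percent_ampl_sold), sum(percent_ampl_transferred),
       sum(amount_ampl_bought), sum(amount_ampl_sold), sum(amount_ampl_transferred)
from mv_30day_daily_aggregate_buys_sells_transfers
group by time_interval
union all
select null,
       sum(percent_ampl_bought), sum(percent_ampl_sold), sum(percent_ampl_transferred),
       sum(amount_ampl_bought), sum(amount_ampl_sold), sum(amount_ampl_transferred)
from mv_30day_daily_aggregate_buys_sells_transfers
order by time_interval desc nulls last;
```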

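But you might as well use the fact that a sum can be assembled from arbitrarily partitioned pieces: compute the per-interval sums once in a CTE, then add those partial sums together to get the grand total, instead of scanning the whole table a second time: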

with t as (select 
    "time_interval", 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
    from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
    group by time_interval 
) 
select * from t 
union all 
select NULL, sum(percent_ampl_bought), sum(percent_ampl_sold), sum(percent_ampl_transferred),
       sum(amount_ampl_bought), sum(amount_ampl_sold), sum(amount_ampl_transferred) from t 
order by time_interval desc nulls last;
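Before switching over, it may be worth confirming that the rewrite returns the same rows as the original GROUPING SETS query. A minimal sketch using EXCEPT ALL: it returns rows produced by the original query that are missing from the rewrite, so an empty result means every original row is reproduced (swap the two operands to check the other direction). Set operations treat NULLs as equal, so the grand-total row is compared too. With numeric columns the sums are exact and should match exactly; the short column aliases inside the CTE (pct_bought etc.) are just illustrative names.

```sql
-- Rows from the original GROUPING SETS query that the CTE rewrite does not produce.
-- ORDER BY is omitted because it has no effect on a set comparison.
(
    select time_interval,
           sum(percent_ampl_bought), sum(percent_ampl_sold), sum(percent_ampl_transferred),
           sum(amount_ampl_bought), sum(amount_ampl_sold), sum(amount_ampl_transferred)
    from mv_30day_daily_aggregate_buys_sells_transfers
    group by grouping sets ((time_interval), ())
)
except all
(
    with t as (
        select time_interval,
               sum(percent_ampl_bought)      as pct_bought,
               sum(percent_ampl_sold)        as pct_sold,
               sum(percent_ampl_transferred) as pct_transferred,
               sum(amount_ampl_bought)       as amt_bought,
               sum(amount_ampl_sold)         as amt_sold,
               sum(amount_ampl_transferred)  as amt_transferred
        from mv_30day_daily_aggregate_buys_sells_transfers
        group by time_interval
    )
    select * from t
    union all
    -- grand total rebuilt from the per-interval partial sums
    select null, sum(pct_bought), sum(pct_sold), sum(pct_transferred),
           sum(amt_bought), sum(amt_sold), sum(amt_transferred)
    from t
);
```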

With the CTE version I got roughly a 3x speedup, and it would be faster still if my test box had more than 2 CPUs.
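On a machine with more cores, the number of workers the planner may use per Gather node is capped by max_parallel_workers_per_gather (default 2), which is in turn limited by max_parallel_workers and max_worker_processes. A sketch of raising the per-Gather limit for the current session only, to test whether extra workers help; the value 4 is an arbitrary example, not a recommendation.

```sql
-- Allow up to 4 parallel workers per Gather node, for this session only.
set max_parallel_workers_per_gather = 4;

-- Check the other limits that cap the total number of workers.
-- max_worker_processes can only be changed with a server restart.
show max_parallel_workers_per_gather;
show max_parallel_workers;
show max_worker_processes;
```

After changing the setting, rerun the query under EXPLAIN ANALYZE and compare "Workers Planned" with "Workers Launched" on the Gather nodes; if fewer workers launch than planned, one of the server-wide limits is the bottleneck.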

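Also, if you change the column types to double precision instead of numeric, it will be faster still, at the cost of exact decimal arithmetic.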
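One way to measure the numeric vs double precision difference without touching the existing materialized view is to build a scratch copy with the six measure columns cast to double precision and run the same aggregate against it. This is a sketch assuming the columns are currently numeric; the table name mv_float8_test is hypothetical. It uses a regular table rather than a TEMP table because scans of temporary tables are parallel restricted, which would skew the comparison.

```sql
-- Hypothetical scratch table for benchmarking only; dropped at the end.
create table mv_float8_test as
select time_interval,
       contract_address,
       percent_ampl_bought::double precision      as percent_ampl_bought,
       percent_ampl_sold::double precision        as percent_ampl_sold,
       percent_ampl_transferred::double precision as percent_ampl_transferred,
       amount_ampl_bought::double precision       as amount_ampl_bought,
       amount_ampl_sold::double precision         as amount_ampl_sold,
       amount_ampl_transferred::double precision  as amount_ampl_transferred
from mv_30day_daily_aggregate_buys_sells_transfers;

-- Collect planner statistics so the plan is comparable to the original.
analyze mv_float8_test;

-- Same aggregate as the original query, now summing float8 instead of numeric.
-- Float sums may differ from the numeric results in the last digits.
explain (analyze, buffers)
select time_interval,
       sum(percent_ampl_bought), sum(percent_ampl_sold), sum(percent_ampl_transferred),
       sum(amount_ampl_bought), sum(amount_ampl_sold), sum(amount_ampl_transferred)
from mv_float8_test
group by grouping sets ((time_interval), ());

drop table mv_float8_test;
```

If the difference is worth it, the durable change would be to put the same casts into the materialized view's own definition, which isn't shown in the question.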
