【问题标题】:(TimescaleDB) - How can I optimise a JOIN query on a dataset of ~2 billion rows. (too slow right now)(TimescaleDB) - 如何优化约 20 亿行数据集的 JOIN 查询。 (现在太慢了)
【发布时间】:2019-08-23 12:57:24
【问题描述】:

我有一个数据库,其中包含大量传感器数据(如物联网) - 我将这些数据存储在一个名为 events 的表中。 (结构如下)

现在用户还可以订阅特定信号的变化(通过 id),他们可以请求从特定点开始的所有信号事件(也只有他们订阅的信号)。

问题是这个查询真的很慢(1 分钟+),我不明白为什么这么慢。我是否缺少某种索引?

我要执行的查询:

SELECT events.*
FROM Events
   JOIN subscription_signal
      ON subscription_signal.subscription_id = $1 AND
         events.signal_id = subscription_signal.signal_id
WHERE source_time > $2
ORDER BY source_time ASC
LIMIT $3
create table if not exists events
(
    source_time timestamp with time zone not null,
    added_time timestamp with time zone default now(),
    signal_id integer not null,
    data_type text,
    analog_value numeric(18,4) default 0,
    digital_value boolean default false
);

create index if not exists events_source_time_idx
    on new_events (source_time desc);

create index if not exists vents_signal_id_source_time_index
    on new_events (signal_id, source_time);

create index if not exists events_source_time_signal_id_index
    on new_events (source_time, signal_id);

create index if not exists events_signal_id_index
    on new_events (signal_id);

create table if not exists subscription_signal
(
    subscription_id integer not null,
    signal_id integer not null,
    constraint subscription_signal_subscription_id_signal_id_uindex
        unique (subscription_id, signal_id)
);

我已经尝试添加多个索引,但似乎没有任何改善查询时间。

下面是一个 EXPLAIN ANALYZE 使用 11 的 subscription_id,在 subscription_signal 表中有 114 个 signal_ids。

EXPLAIN ANALYZE:

Limit  (cost=0.71..0.71 rows=500 width=33) (actual time=24018.215..372251.399 rows=274 loops=1)
  ->  Nested Loop  (cost=0.71..0.30 rows=72839547 width=33) (actual time=24018.214..372251.287 rows=274 loops=1)
        ->  Append  (cost=0.43..0.00 rows=0 width=33) (actual time=0.038..141484.028 rows=133615131 loops=1)
              ->  Index Scan Backward using _hyper_1_2_chunk_events_source_timestamp_idx on _hyper_1_2_chunk  (cost=0.43..184291.13 rows=5852043 width=32) (actual time=0.038..9359.166 rows=5852114 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_3_chunk_events_source_timestamp_idx on _hyper_1_3_chunk  (cost=0.43..119619.56 rows=3733367 width=33) (actual time=0.096..4841.893 rows=3733367 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_4_chunk_events_source_timestamp_idx on _hyper_1_4_chunk  (cost=0.43..101122.76 rows=3188104 width=33) (actual time=0.065..2088.177 rows=3188104 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_5_chunk_events_source_timestamp_idx on _hyper_1_5_chunk  (cost=0.43..103410.32 rows=3223702 width=33) (actual time=0.055..3959.425 rows=3223702 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_6_chunk_events_source_timestamp_idx on _hyper_1_6_chunk  (cost=0.43..130778.92 rows=4116765 width=33) (actual time=0.039..2917.270 rows=4116765 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_7_chunk_events_source_timestamp_idx on _hyper_1_7_chunk  (cost=0.43..160960.05 rows=5047161 width=33) (actual time=0.030..3801.518 rows=5047161 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_8_chunk_events_source_timestamp_idx on _hyper_1_8_chunk  (cost=0.43..161138.72 rows=5067119 width=33) (actual time=0.046..3106.090 rows=5067119 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_9_chunk_events_source_timestamp_idx on _hyper_1_9_chunk  (cost=0.43..148498.73 rows=4523834 width=33) (actual time=0.050..4114.502 rows=4535789 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_10_chunk_events_source_timestamp_idx on _hyper_1_10_chunk  (cost=0.43..149195.92 rows=4690233 width=33) (actual time=0.029..4904.397 rows=4690344 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_11_chunk_events_source_timestamp_idx on _hyper_1_11_chunk  (cost=0.43..148274.33 rows=4627748 width=33) (actual time=0.034..4086.343 rows=4645264 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_12_chunk_events_source_timestamp_idx on _hyper_1_12_chunk  (cost=0.43..147687.64 rows=4608886 width=33) (actual time=0.054..4305.309 rows=4612240 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_13_chunk_events_source_timestamp_idx on _hyper_1_13_chunk  (cost=0.43..137341.97 rows=4280236 width=33) (actual time=0.049..6224.372 rows=4301055 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_14_chunk_events_source_timestamp_idx on _hyper_1_14_chunk  (cost=0.43..141762.95 rows=4460189 width=33) (actual time=0.054..2724.318 rows=4460189 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_15_chunk_events_source_timestamp_idx on _hyper_1_15_chunk  (cost=0.43..143731.09 rows=4522774 width=33) (actual time=0.029..3981.750 rows=4522875 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_16_chunk_events_source_timestamp_idx on _hyper_1_16_chunk  (cost=0.43..150538.02 rows=4721239 width=33) (actual time=0.062..6109.135 rows=4722110 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_17_chunk_events_source_timestamp_idx on _hyper_1_17_chunk  (cost=0.43..142216.56 rows=4475544 width=33) (actual time=0.060..3955.418 rows=4475511 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_18_chunk_events_source_timestamp_idx on _hyper_1_18_chunk  (cost=0.43..137131.47 rows=4293396 width=33) (actual time=0.072..5002.332 rows=4294763 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_19_chunk_events_source_timestamp_idx on _hyper_1_19_chunk  (cost=0.43..136908.18 rows=4284368 width=33) (actual time=0.100..5866.330 rows=4279079 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_20_chunk_events_source_timestamp_idx on _hyper_1_20_chunk  (cost=0.43..142385.12 rows=4475205 width=33) (actual time=0.081..3407.288 rows=4475211 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_21_chunk_events_source_timestamp_idx on _hyper_1_21_chunk  (cost=0.43..154334.53 rows=4854714 width=33) (actual time=0.082..3049.434 rows=4855427 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_22_chunk_events_source_timestamp_idx on _hyper_1_22_chunk  (cost=0.43..149148.27 rows=4690356 width=33) (actual time=0.051..5394.662 rows=4690489 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_23_chunk_events_source_timestamp_idx on _hyper_1_23_chunk  (cost=0.43..147018.61 rows=4624741 width=33) (actual time=0.042..2781.965 rows=4624838 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_24_chunk_events_source_timestamp_idx on _hyper_1_24_chunk  (cost=0.43..151841.97 rows=4774145 width=33) (actual time=0.053..5188.953 rows=4776442 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_25_chunk_events_source_timestamp_idx on _hyper_1_25_chunk  (cost=0.43..141470.65 rows=4449075 width=32) (actual time=0.066..4113.208 rows=4451125 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_26_chunk_events_source_timestamp_idx on _hyper_1_26_chunk  (cost=0.43..133228.19 rows=4192152 width=32) (actual time=0.073..3794.901 rows=4192089 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_27_chunk_events_source_timestamp_idx on _hyper_1_27_chunk  (cost=0.43..93412.39 rows=2946643 width=33) (actual time=0.034..2600.225 rows=2946741 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_28_chunk_events_source_timestamp_idx on _hyper_1_28_chunk  (cost=0.43..107666.24 rows=3380023 width=33) (actual time=0.061..4010.668 rows=3380023 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_29_chunk_events_source_timestamp_idx on _hyper_1_29_chunk  (cost=0.43..139070.65 rows=4376229 width=33) (actual time=0.058..3597.805 rows=4376329 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_30_chunk_events_source_timestamp_idx on _hyper_1_30_chunk  (cost=0.43..133595.88 rows=4204134 width=33) (actual time=0.047..3998.751 rows=4204442 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_31_chunk_events_source_timestamp_idx on _hyper_1_31_chunk  (cost=0.43..139554.54 rows=4391212 width=33) (actual time=0.054..4415.468 rows=4391126 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_32_chunk_events_source_timestamp_idx on _hyper_1_32_chunk  (cost=0.43..78562.22 rows=2463045 width=33) (actual time=0.054..1672.220 rows=2482753 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_1_chunk_events_source_timestamp_idx on _hyper_1_1_chunk  (cost=0.14..5.59 rows=62 width=32) (actual time=12.567..12.603 rows=62 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_65_chunk_events_source_timestamp_idx on _hyper_1_65_chunk  (cost=0.15..16.92 rows=267 width=73) (actual time=8.766..8.768 rows=1 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_88_chunk_events_source_timestamp_idx on _hyper_1_88_chunk  (cost=0.15..12.90 rows=364 width=32) (actual time=7.869..8.008 rows=374 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
              ->  Index Scan Backward using _hyper_1_89_chunk_events_source_timestamp_idx on _hyper_1_89_chunk  (cost=0.14..6.16 rows=94 width=32) (actual time=0.017..0.047 rows=108 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
        ->  Index Only Scan using subscription_signal_subscription_id_signal_id_uindex on subscription_signal  (cost=0.28..0.30 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=133615131)
              Index Cond: ((subscription_id = 11) AND (signal_id = _hyper_1_2_chunk.signal_id))
              Heap Fetches: 274
Planning time: 8.924 ms
Execution time: 372252.454 ms

EXPLAIN (ANALYZE, BUFFERS):

Limit  (cost=0.71..0.71 rows=500 width=33) (actual time=19504.097..315933.258 rows=274 loops=1)
  Buffers: shared hit=269362667 read=1352546
  ->  Nested Loop  (cost=0.71..0.30 rows=72839557 width=33) (actual time=19504.096..315933.158 rows=274 loops=1)
        Buffers: shared hit=269362667 read=1352546
        ->  Append  (cost=0.43..0.00 rows=0 width=33) (actual time=14.950..96402.213 rows=133615135 loops=1)
              Buffers: shared hit=2132123 read=1352546
              ->  Index Scan Backward using _hyper_1_2_chunk_events_source_timestamp_idx on _hyper_1_2_chunk  (cost=0.43..184291.13 rows=5852043 width=32) (actual time=0.019..4707.774 rows=5852114 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=265328 read=20465
              ->  Index Scan Backward using _hyper_1_3_chunk_events_source_timestamp_idx on _hyper_1_3_chunk  (cost=0.43..119619.56 rows=3733367 width=33) (actual time=0.057..2192.118 rows=3733367 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=72014 read=25960
              ->  Index Scan Backward using _hyper_1_4_chunk_events_source_timestamp_idx on _hyper_1_4_chunk  (cost=0.43..101122.76 rows=3188104 width=33) (actual time=0.064..1964.159 rows=3188104 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=60389 read=23035
              ->  Index Scan Backward using _hyper_1_5_chunk_events_source_timestamp_idx on _hyper_1_5_chunk  (cost=0.43..103410.32 rows=3223702 width=33) (actual time=0.033..1769.853 rows=3223702 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=61428 read=23456
              ->  Index Scan Backward using _hyper_1_6_chunk_events_source_timestamp_idx on _hyper_1_6_chunk  (cost=0.43..130778.92 rows=4116765 width=33) (actual time=0.029..2138.480 rows=4116765 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=72968 read=30233
              ->  Index Scan Backward using _hyper_1_7_chunk_events_source_timestamp_idx on _hyper_1_7_chunk  (cost=0.43..160960.05 rows=5047161 width=33) (actual time=0.029..2628.668 rows=5047161 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=93741 read=35257
              ->  Index Scan Backward using _hyper_1_8_chunk_events_source_timestamp_idx on _hyper_1_8_chunk  (cost=0.43..161138.72 rows=5067119 width=33) (actual time=0.036..2523.536 rows=5067119 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=93406 read=35384
              ->  Index Scan Backward using _hyper_1_9_chunk_events_source_timestamp_idx on _hyper_1_9_chunk  (cost=0.43..148498.73 rows=4523834 width=33) (actual time=0.024..2890.820 rows=4535789 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=86274 read=32252
              ->  Index Scan Backward using _hyper_1_10_chunk_events_source_timestamp_idx on _hyper_1_10_chunk  (cost=0.43..149195.92 rows=4690233 width=33) (actual time=0.025..2765.170 rows=4690344 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=84883 read=37839
              ->  Index Scan Backward using _hyper_1_11_chunk_events_source_timestamp_idx on _hyper_1_11_chunk  (cost=0.43..148274.33 rows=4627748 width=33) (actual time=0.027..2902.346 rows=4645264 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=64046 read=59908
              ->  Index Scan Backward using _hyper_1_12_chunk_events_source_timestamp_idx on _hyper_1_12_chunk  (cost=0.43..147687.64 rows=4608886 width=33) (actual time=0.049..3556.359 rows=4612240 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=63025 read=59693
              ->  Index Scan Backward using _hyper_1_13_chunk_events_source_timestamp_idx on _hyper_1_13_chunk  (cost=0.43..137341.97 rows=4280236 width=33) (actual time=0.027..3361.873 rows=4301055 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=54986 read=55571
              ->  Index Scan Backward using _hyper_1_14_chunk_events_source_timestamp_idx on _hyper_1_14_chunk  (cost=0.43..141762.95 rows=4460189 width=33) (actual time=0.028..2859.309 rows=4460189 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=50767 read=57846
              ->  Index Scan Backward using _hyper_1_15_chunk_events_source_timestamp_idx on _hyper_1_15_chunk  (cost=0.43..143731.09 rows=4522774 width=33) (actual time=0.028..3143.475 rows=4522875 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=54841 read=58390
              ->  Index Scan Backward using _hyper_1_16_chunk_events_source_timestamp_idx on _hyper_1_16_chunk  (cost=0.43..150538.02 rows=4721239 width=33) (actual time=0.028..2854.384 rows=4722110 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=56488 read=61480
              ->  Index Scan Backward using _hyper_1_17_chunk_events_source_timestamp_idx on _hyper_1_17_chunk  (cost=0.43..142216.56 rows=4475544 width=33) (actual time=0.060..2522.463 rows=4475511 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=54403 read=57798
              ->  Index Scan Backward using _hyper_1_18_chunk_events_source_timestamp_idx on _hyper_1_18_chunk  (cost=0.43..137131.47 rows=4293396 width=33) (actual time=0.051..2854.544 rows=4294763 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=49427 read=56050
              ->  Index Scan Backward using _hyper_1_19_chunk_events_source_timestamp_idx on _hyper_1_19_chunk  (cost=0.43..136908.18 rows=4284368 width=33) (actual time=0.033..2673.792 rows=4279079 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=49055 read=55966
              ->  Index Scan Backward using _hyper_1_20_chunk_events_source_timestamp_idx on _hyper_1_20_chunk  (cost=0.43..142385.12 rows=4475205 width=33) (actual time=0.029..2793.156 rows=4475211 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=60428 read=57064
              ->  Index Scan Backward using _hyper_1_21_chunk_events_source_timestamp_idx on _hyper_1_21_chunk  (cost=0.43..154334.53 rows=4854714 width=33) (actual time=0.022..2701.886 rows=4855427 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=63812 read=62451
              ->  Index Scan Backward using _hyper_1_22_chunk_events_source_timestamp_idx on _hyper_1_22_chunk  (cost=0.43..149148.27 rows=4690356 width=33) (actual time=0.028..2763.383 rows=4690489 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=62684 read=60247
              ->  Index Scan Backward using _hyper_1_23_chunk_events_source_timestamp_idx on _hyper_1_23_chunk  (cost=0.43..147018.61 rows=4624741 width=33) (actual time=0.025..2823.549 rows=4624838 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=52489 read=59932
              ->  Index Scan Backward using _hyper_1_24_chunk_events_source_timestamp_idx on _hyper_1_24_chunk  (cost=0.43..151841.97 rows=4774145 width=33) (actual time=0.060..2819.668 rows=4776442 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=48917 read=62622
              ->  Index Scan Backward using _hyper_1_25_chunk_events_source_timestamp_idx on _hyper_1_25_chunk  (cost=0.43..141470.65 rows=4449075 width=32) (actual time=0.057..2554.301 rows=4451125 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=40129 read=58563
              ->  Index Scan Backward using _hyper_1_26_chunk_events_source_timestamp_idx on _hyper_1_26_chunk  (cost=0.43..133228.19 rows=4192152 width=32) (actual time=0.049..2467.922 rows=4192089 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=55849 read=37756
              ->  Index Scan Backward using _hyper_1_27_chunk_events_source_timestamp_idx on _hyper_1_27_chunk  (cost=0.43..93412.39 rows=2946643 width=33) (actual time=0.048..1754.935 rows=2946741 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=47717 read=23505
              ->  Index Scan Backward using _hyper_1_28_chunk_events_source_timestamp_idx on _hyper_1_28_chunk  (cost=0.43..107666.24 rows=3380023 width=33) (actual time=0.027..1896.157 rows=3380023 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=66850 read=23284
              ->  Index Scan Backward using _hyper_1_29_chunk_events_source_timestamp_idx on _hyper_1_29_chunk  (cost=0.43..139070.65 rows=4376229 width=33) (actual time=0.028..2469.809 rows=4376329 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=77118 read=31953
              ->  Index Scan Backward using _hyper_1_30_chunk_events_source_timestamp_idx on _hyper_1_30_chunk  (cost=0.43..133595.88 rows=4204134 width=33) (actual time=0.032..2862.879 rows=4204442 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=64898 read=33521
              ->  Index Scan Backward using _hyper_1_31_chunk_events_source_timestamp_idx on _hyper_1_31_chunk  (cost=0.43..139554.54 rows=4391212 width=33) (actual time=0.038..2431.625 rows=4391126 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=68406 read=34357
              ->  Index Scan Backward using _hyper_1_32_chunk_events_source_timestamp_idx on _hyper_1_32_chunk  (cost=0.43..78562.22 rows=2463045 width=33) (actual time=0.060..1396.815 rows=2482753 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=35270 read=20706
              ->  Index Scan Backward using _hyper_1_1_chunk_events_source_timestamp_idx on _hyper_1_1_chunk  (cost=0.14..5.59 rows=62 width=32) (actual time=0.030..0.095 rows=62 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=4 read=1
              ->  Index Scan Backward using _hyper_1_65_chunk_events_source_timestamp_idx on _hyper_1_65_chunk  (cost=0.15..16.92 rows=267 width=73) (actual time=0.042..0.043 rows=1 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=2 read=1
              ->  Index Scan Backward using _hyper_1_88_chunk_events_source_timestamp_idx on _hyper_1_88_chunk  (cost=0.15..12.90 rows=364 width=32) (actual time=0.022..0.140 rows=374 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=72
              ->  Index Scan Backward using _hyper_1_89_chunk_events_source_timestamp_idx on _hyper_1_89_chunk  (cost=0.14..6.48 rows=112 width=32) (actual time=0.019..0.053 rows=112 loops=1)
                    Index Cond: (source_timestamp > '2018-07-05 09:00:00.88+00'::timestamp with time zone)
                    Buffers: shared hit=9
        ->  Index Only Scan using subscription_signal_subscription_id_signal_id_uindex on subscription_signal  (cost=0.28..0.30 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=133615135)
              Index Cond: ((subscription_id = 11) AND (signal_id = _hyper_1_2_chunk.signal_id))
              Heap Fetches: 274
              Buffers: shared hit=267230544
Planning time: 144.220 ms
Execution time: 315933.896 ms

【问题讨论】:

  • 请显示EXPLAIN (ANALYZE, BUFFERS)查询的输出。
  • 劳伦兹您好,感谢您的回答。我更新了我的问题,包括分析,也将使用缓冲区运行
  • source_time desc 为什么是DESC
  • 据我回忆,那是 timescaledb 创建的默认索引。我可能是错的。您认为将其更改为 ASC 会产生巨大的影响吗?
  • 如果ASC“向后扫描”是向前扫描,我想...

标签: postgresql join indexing query-performance timescaledb


【解决方案1】:

events 没有被source_timestamp 分区,对吧?

PostgreSQL别无选择,只能找到source_timestamp > '2018-07-05 09:00:00.88+00'的所有行,然后用events连接结果。这永远不会很快。

如果省略 LIMIT 会使查询更快,您可能会通过使用嵌套循环连接将 PostgreSQL 推向 not 来获得一些好处,但考虑到所涉及的行数,这似乎不太可能.

如果表按source_timestamp 分区,查询可能会快得多。

【讨论】:

    【解决方案2】:

    尝试删除除(signal_id, source_time) 上的索引之外的所有索引并使用以下查询:

    SELECT *
    FROM events
    WHERE signal_id IN (
        SELECT signal_id
        FROM subscription_signal
        WHERE subscription_id = $1)
    AND source_time > $2
    ORDER BY source_time
    LIMIT $3
    

    那么 postgresql 应该使用剩余的索引来快速查找给定 signal_id 列表的行,然后使用索引的第二部分来查找带有 source_time > $2 的行。

    查询可能在以下情况下运行缓慢:

    • 如果匹配的行数过多(例如,超过表中总行数的 ~10%)。在这种情况下,postgresql 可能会选择全表扫描而不是索引查找。
    • 如果匹配行数适中 (iops 和读取带宽的限制。

    【讨论】:

      【解决方案3】:

      我不知道你是否已经解决了你的问题,但是当我遇到这个问题时,你可能想自己查看解释输出:https://explain.depesz.com/s/4GYT

      我不知道确切的原因(目前),但是查询规划器应该生成一个MergeAppend 而不是Append 执行节点,以便将LIMIT 下推到每个分区(索引)扫描中,如下所示我的测试设置:https://explain.depesz.com/s/Fi9C

      为了对比,同样的查询不限制显示我这里使用的测试数据量:https://explain.depesz.com/s/AJ6(是的,只有37M的中间行,但它显示了效果)

      对于不耐烦的数字:

                Without LIMIT: 49,174.946 milliseconds
      LIMIT 274 OFFSET 100000: 35,022.269 milliseconds
                    LIMIT 274:      2.710 milliseconds
      

      LIMIT 的不同值以及查询参数而不是常量不会导致任何不同的计划)

      MergeAppend 的优化自 9.1 版以来一直在 PostgreSQL 中(请参阅提交6fbc323),无论是否使用分区的时间刻度,但对于简单的Append 不存在,这告诉我这是这里的关键瓶颈。这反过来可能会受到非最新统计数据的影响(您在两个表上都做了analyze,或者?)。

      即将发布的 PostgreSQL 12 版本应该还通过提交 959d00eLIMIT ... OFFSET ... 下推到 @​​987654335@ 执行节点。

      OTOH,即使在具有 2 多亿行且每个分区有 10 多万行的 Timescale 优化表中,时间也可以大量用于 I/O。要对此进行测试,请执行以下操作并重新创建 EXPLAIN (ANALYZE, BUFFERS, COSTS OFF) 计划:

      SET track_io_timing TO ON;
      

      其中包括执行计划中每个相关节点的 I/O 操作的时间(总和),例如:

       Limit (actual time=1546.517..1546.572 rows=274 loops=1)
         Buffers: shared hit=73438 read=23452
         I/O Timings: read=1316.373
         ->  Sort (actual time=1546.512..1546.536 rows=274 loops=1)
               Sort Key: e.source_time
               Sort Method: top-N heapsort  Memory: 46kB
               Buffers: shared hit=73438 read=23452
               I/O Timings: read=1316.373
               ->  Nested Loop (actual time=7.147..1524.251 rows=69293 loops=1)
                     Buffers: shared hit=73438 read=23452
                     I/O Timings: read=1316.373
                     ->  Index Only Scan using subscription_signal_subscription_id_signal_id_uindex on subscription_signal (actual time=0.038..0.236 rows=114 loops=1)
                           Index Cond: (subscription_id = $1)
                           Heap Fetches: 114
                           Buffers: shared hit=4
                     ->  Append (actual time=0.334..13.184 rows=608 loops=114)
                           Buffers: shared hit=73434 read=23452
                           I/O Timings: read=1316.373
                           ->  Seq Scan on events e (actual time=0.000..0.000 rows=0 loops=114)
                                 Filter: ((source_time > $2) AND (subscription_signal.signal_id = signal_id))
                           ->  Index Scan using _hyper_5_13094_chunk_events_signal_id_source_time_index on _hyper_5_13094_chunk e_1 (actual time=0.014..0.014 rows=0 loops=114)
                                 Index Cond: ((signal_id = subscription_signal.signal_id) AND (source_time > $2))
                                 Buffers: shared hit=340 read=3
                                 I/O Timings: read=0.851
                           ->  Index Scan using _hyper_5_13095_chunk_events_signal_id_source_time_index on _hyper_5_13095_chunk e_2 (actual time=0.009..0.009 rows=0 loops=114)
                                 Index Cond: ((signal_id = subscription_signal.signal_id) AND (source_time > $2))
                                 Buffers: shared hit=335 read=7
                                 I/O Timings: read=0.398
      ...
      

      这对于检测查询实际上是 CPU 密集型还是 I/O 密集型非常有帮助(哪一个是更大的瓶颈,因此是改善情况的最佳投资)。

      减少中间结果行

      另外,最重要的部分是大幅减少中间行数。正如我们所见,您实际返回的行数约为符合条件的行数的 0.000205%,我强烈建议您重新考虑您的条件。

      在时间序列查询中,您几乎永远不会只使用下限进行过滤,而是提出一个合理的间隔,在 95% 的所有情况下,LIMIT 会返回足够的行,对于其他 5% 行数不够的情况,只需按照后续时间范围再次查询即可。

      所以,只需尝试以下查询:

      SELECT events.*
      FROM Events
         JOIN subscription_signal
            ON subscription_signal.subscription_id = $1 AND
               events.signal_id = subscription_signal.signal_id
      WHERE source_time > $2 AND source_time <= ($2 + '1 week'::interval)
      ORDER BY source_time ASC
      LIMIT $3
      

      仅此一项就可以大大减少查询时间。

      【讨论】:

        【解决方案4】:

        试试这样的:

        SELECT *
        FROM events
        WHERE signal_id IN ( SELECT signal_id
                             FROM subscription_signal
                            WHERE subscription_id = $1
                              AND source_time > $2 )
        ORDER BY source_time ASC
        LIMIT $3
        

        【讨论】:

        • 感谢您的回复 - 此查询现在似乎也运行了 5 分钟以上。如果我在 IN 数组中放入一个信号 ID,它会非常快(~100 毫秒) - 但是当我添加大约 100 时,它会变得非常慢。
        • 是的,我看不出这个查询应该如何使它更快。
        猜你喜欢
        • 1970-01-01
        • 2021-06-23
        • 2012-10-13
        • 2013-08-17
        • 2011-03-16
        • 2011-08-19
        • 1970-01-01
        • 2018-10-06
        • 1970-01-01
        相关资源
        最近更新 更多