【Question title】: Incorrect populating of materialized view
【Posted】: 2020-11-10 08:13:45
【Question description】:

The "test_sessions" table

CREATE TABLE IF NOT EXISTS test_sessions (
    id UInt64,
    name String,
    created_at DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY name;

Data in the "test_sessions" table

INSERT INTO test_sessions(id, name, created_at) VALUES
(1, 'start', now()),
(1, 'stop',  now() + INTERVAL 1 day),
(2, 'start',  now() + INTERVAL 1 HOUR );

+----+-------+---------------------+
| id | name  | created_at          |
+----+-------+---------------------+
| 1  | start | 2020-11-10 07:58:19 |
+----+-------+---------------------+
| 2  | start | 2020-11-10 08:58:19 |
+----+-------+---------------------+
| 1  | stop  | 2020-11-11 07:58:19 |
+----+-------+---------------------+

The "finished_sessions" materialized view

CREATE MATERIALIZED VIEW finished_sessions (
    id UInt64,
    start_at DateTime,
    end_at DateTime
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(start_at)
ORDER BY (id)
POPULATE AS
SELECT
    id,
    minIf(created_at, name = 'start') AS start_at,
    maxIf(created_at, name = 'stop')  AS end_at
FROM test_sessions
GROUP BY id
HAVING end_at <> '1970-01-01 00:00:00';

Data in the "finished_sessions" materialized view

SELECT * FROM finished_sessions;

+----+---------------------+---------------------+
| id | start_at            | end_at              |
+----+---------------------+---------------------+
| 1  | 2020-11-10 07:58:19 | 2020-11-11 07:58:19 |
+----+---------------------+---------------------+

So far everything works as expected: there is only 1 closed session.

After the second session ends

INSERT INTO test_sessions(id, name, created_at) VALUES
(2, 'stop', now());

the view is populated incorrectly

SELECT * from finished_sessions ORDER BY id;

+----+-------------------------------+---------------------+
| id | start_at                      | end_at              |
+----+-------------------------------+---------------------+
| 1  | 2020-11-10 07:58:19           | 2020-11-11 07:58:19 |
+----+-------------------------------+---------------------+
| 2  | ---> 1970-01-01 00:00:00 <--- | 2020-11-10 08:06:24 |
+----+-------------------------------+---------------------+

How can this be fixed?

【Discussion】:

    Tags: clickhouse


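    【Solution 1】:
    1. You should use AggregateFunction or, better, SimpleAggregateFunction.

    2. You cannot partition the table by an AggregateFunction column, because AggregateFunction values are computed during merges, and merges are performed within a partition.

    3. An MV is an insert trigger: its SELECT runs only over the block of rows being inserted, not over the whole source table.
       https://youtu.be/ckChUkC3Pns?list=PLO3lfQbpDVI-hyw4MyqxEk3rDHw95SzxJ
       https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf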

    -- Variant 1: AggregateFunction states (minStateIf / maxStateIf)
    CREATE TABLE IF NOT EXISTS test_sessions (
        id UInt64,
        name String,
        created_at DateTime
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(created_at)
    ORDER BY name;

    INSERT INTO test_sessions(id, name, created_at) VALUES
    (1, 'start', now()),
    (1, 'stop',  now() + INTERVAL 1 day),
    (2, 'start', now() + INTERVAL 1 HOUR);

    CREATE MATERIALIZED VIEW finished_sessions
    ENGINE = AggregatingMergeTree
    ORDER BY (id)
    POPULATE AS
    SELECT
        id,
        minStateIf(created_at, name = 'start') AS start_at,
        maxStateIf(created_at, name = 'stop')  AS end_at
    FROM test_sessions
    GROUP BY id;

    INSERT INTO test_sessions(id, name, created_at) VALUES
    (2, 'stop', now());

    SELECT
        id,
        minMerge(start_at),
        maxMerge(end_at)
    FROM finished_sessions
    GROUP BY id

    Query id: d797eee4-6088-40b8-aa12-b10da62b60c5

    ┌─id─┬──minMerge(start_at)─┬────maxMerge(end_at)─┐
    │  2 │ 2020-11-10 15:18:19 │ 2020-11-10 14:21:54 │
    │  1 │ 2020-11-10 14:18:19 │ 2020-11-11 14:18:19 │
    └────┴─────────────────────┴─────────────────────┘

    -- Variant 2: SimpleAggregateFunction columns (note the wrong start_at for id 2)
    CREATE TABLE IF NOT EXISTS test_sessions (
        id UInt64,
        name String,
        created_at DateTime
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(created_at)
    ORDER BY name;

    INSERT INTO test_sessions(id, name, created_at) VALUES
    (1, 'start', now()),
    (1, 'stop',  now() + INTERVAL 1 day),
    (2, 'start', now() + INTERVAL 1 HOUR);

    CREATE MATERIALIZED VIEW finished_sessions (
        id UInt64,
        start_at SimpleAggregateFunction(min, DateTime),
        end_at SimpleAggregateFunction(max, DateTime)
    )
    ENGINE = AggregatingMergeTree
    ORDER BY (id)
    POPULATE AS
    SELECT
        id,
        minIf(created_at, name = 'start') AS start_at,
        maxIf(created_at, name = 'stop')  AS end_at
    FROM test_sessions
    GROUP BY id;

    INSERT INTO test_sessions(id, name, created_at) VALUES
    (2, 'stop', now());

    OPTIMIZE TABLE finished_sessions FINAL;

    SELECT
        id,
        min(start_at),
        max(end_at)
    FROM finished_sessions
    GROUP BY id

    ┌─id─┬───────min(start_at)─┬─────────max(end_at)─┐
    │  2 │ 1970-01-01 00:00:00 │ 2020-11-10 14:29:30 │
    │  1 │ 2020-11-10 14:29:15 │ 2020-11-11 14:29:15 │
    └────┴─────────────────────┴─────────────────────┘
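A likely explanation for the 1970-01-01 value in the SimpleAggregateFunction result follows from the insert-trigger behaviour: the view's SELECT sees only the newly inserted block, and for the second insert that block contains a single 'stop' row. The sketch below reproduces the two steps in isolation. It is an illustration, not part of the original answer: the inline subqueries stand in for the inserted block and for the stored parts, and the literal timestamps are copied from the question's sample data.

```sql
-- Step 1: what the MV SELECT computes for the second insert alone.
-- The block holds only the 'stop' row, so minIf has no matching rows
-- and falls back to the default DateTime value (the Unix epoch).
SELECT
    id,
    minIf(created_at, name = 'start') AS start_at,
    maxIf(created_at, name = 'stop')  AS end_at
FROM
(
    SELECT
        toUInt64(2) AS id,
        'stop' AS name,
        toDateTime('2020-11-10 08:06:24') AS created_at
)
GROUP BY id;

-- Step 2: what merging a SimpleAggregateFunction(min, DateTime) column
-- amounts to: a plain min over the stored values, so the epoch written
-- in step 1 wins over the real start time written by POPULATE.
SELECT min(start_at) AS merged_start_at
FROM
(
    SELECT toDateTime('2020-11-10 08:58:19') AS start_at
    UNION ALL
    SELECT toDateTime(0) AS start_at
);
```

With minStateIf (or minIfState) the block without a 'start' row should instead store an empty aggregation state rather than a concrete value, which would explain why the AggregateFunction variant merges to the correct start time.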

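    【Comments】:

    • I arrived at the same solution: AggregateFunction(minIf, DateTime, UInt8) in the MV body and minIfState(created_at, name = 'start') in the MV SELECT. In the example provided, the "better" SimpleAggregateFunction does not work correctly. Do you know why?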