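【Posted】: 2026-02-12 13:45:02
【Question】:
I want to use Elasticsearch to calculate user retention:
- 1. Event logs are written to a default index.
- 2. Transform them into an intermediate index: entity-centric data, grouped by acc.
- 3. Use a filters aggregation (or adjacency_matrix) to compute the intersection for each day.
The problem is step 2: how do I write a clean transform for it?
Input event log: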
POST _bulk
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"create", "timestamp":"2020-08-01 09:00"}
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"login", "timestamp":"2020-08-01 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"login", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"login", "timestamp":"2020-08-03 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1002, "event":"create", "timestamp":"2020-08-01 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1002, "event":"login", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1002, "event":"login", "timestamp":"2020-08-02 11:00"}
{"index": {"_index": "test.u1"}}
{"acc":1003, "event":"create", "timestamp":"2020-08-01 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1004, "event":"create", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1004, "event":"login", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1004, "event":"login", "timestamp":"2020-08-03 10:00"}
Expected intermediate index:
{"acc":1001, "create":"08-01", "login":["08-01", "08-02", "08-03"]}
{"acc":1002, "create":"08-01", "login":["08-02"]}
{"acc":1003, "create":"08-01", "login":[]}
{"acc":1004, "create":"08-02", "login":["08-02", "08-03"]}
How can I generate the "login" array? Any better design is also welcome.
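To pin down what step 2 has to produce, here is a minimal Python sketch of the pivot itself, outside Elasticsearch. It is not a transform definition; the function name `build_entity_docs` is hypothetical, and it assumes every timestamp uses the `YYYY-MM-DD HH:MM` format from the sample data. Several logins on the same day collapse into one entry, and accounts with no login get an empty array.

```python
from collections import defaultdict
from datetime import datetime


def build_entity_docs(events):
    """Pivot raw event rows into one entity-centric document per acc.

    Hypothetical helper for illustration only; assumes timestamps look
    like "2020-08-01 09:00", as in the sample bulk request.
    """
    created = {}               # acc -> earliest "create" date
    logins = defaultdict(set)  # acc -> set of distinct login dates

    for e in events:
        day = datetime.strptime(e["timestamp"], "%Y-%m-%d %H:%M").date()
        acc = e["acc"]
        logins[acc]  # touch the key so accounts without logins still appear
        if e["event"] == "create":
            created[acc] = min(day, created.get(acc, day))
        elif e["event"] == "login":
            logins[acc].add(day)

    def fmt(d):
        # Render a date as "MM-DD", matching the expected output above.
        return d.strftime("%m-%d")

    return [
        {
            "acc": acc,
            "create": fmt(created[acc]) if acc in created else None,
            "login": [fmt(d) for d in sorted(days)],
        }
        for acc, days in sorted(logins.items())
    ]
```

Inside a transform, the same three pieces would be the group-by key (`acc`), a minimum over the "create" events, and a distinct set of days over the "login" events; how to express the last one in an aggregation is the open part of the question.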
【Comments】:
-
Are you using the X-Pack transform module? elastic.co/guide/en/elasticsearch/reference/current/…
-
@SahilGupta Yes. The "create" date is easy: aggs.filter("event=login").min()
-
The intermediate data is simple. I don't quite understand your third step; maybe you could use this? elastic.co/guide/en/elasticsearch/reference/current/…
Tags: arrays elasticsearch transform aggregation
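Since the third step caused some confusion, here is a rough Python sketch of what it is meant to compute, under the assumption that the intermediate documents have the shape given in the question (`acc`, a `create` day, and a `login` array of days). The function name `retention_counts` is hypothetical. For each creation-day cohort it intersects the cohort's accounts with the accounts that logged in on each day, which is the same intersection a filters or adjacency_matrix aggregation would count.

```python
from collections import defaultdict


def retention_counts(docs):
    """Count, per creation-day cohort, how many members logged in each day.

    Hypothetical helper for illustration; expects entity-centric docs
    shaped like {"acc": ..., "create": "08-01", "login": ["08-01", ...]}.
    """
    cohort = defaultdict(set)  # create day -> accounts created that day
    active = defaultdict(set)  # login day  -> accounts that logged in

    for d in docs:
        cohort[d["create"]].add(d["acc"])
        for day in d["login"]:
            active[day].add(d["acc"])

    return {
        created: {
            "size": len(members),
            # Intersection of "created on this day" and "logged in on day".
            "active": {day: len(members & active[day]) for day in sorted(active)},
        }
        for created, members in sorted(cohort.items())
    }
```

Dividing each `active` count by the cohort `size` gives the retention rate for that day.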