Very interesting question! :)
Step-by-step demo: db<>fiddle
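SELECT
    the_date,
    SUM(balance)                              -- step 4: add up the latest balances per day
FROM (
    -- step 3: keep only the most recent element per (day, user)
    SELECT DISTINCT ON (the_date, elems -> 'the_user')
        the_date,
        elems ->> 'the_user' AS the_user,
        (elems ->> 'balance')::int AS balance
    FROM (
        -- step 1: cumulative array of all rows (as JSON objects) up to the current row
        SELECT
            the_date::date AS the_date,
            jsonb_agg(
                row_to_json(mytable)::jsonb
            ) OVER (ORDER BY the_date) as agg
        FROM
            mytable
    ) s,
    jsonb_array_elements(agg) as elems        -- step 2: expand each array into its elements
    ORDER BY the_date, elems -> 'the_user', elems -> 'the_date' DESC
) s
GROUP BY the_date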
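Sketch of the idea:
1. Aggregate all records cumulatively. (The records are stored as JSON objects inside the query so that every column is still accessible later.)
This yields: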
date    data       cum_data
Day1    (A:100)    [(A:100)]
Day1    (B:50)     [(A:100),(B:50)]
Day1    (C:100)    [(A:100),(B:50),(C:100)]
Day2    (A:150)    [(A:100),(B:50),(C:100),(A:150)]
Day2    (B:20)     [(A:100),(B:50),(C:100),(A:150),(B:20)]
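As you can see, the last record of each day contains all the relevant data. For each user, the relevant data is that user's last element in the array.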
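A minimal Python sketch of step 1, assuming hypothetical sample rows `(the_date, the_user, balance)` that mirror the table above (the concrete timestamps are made up). Tuples stand in for the JSON objects, and `cumulative` is an illustrative helper, not part of the SQL solution. Like the window's default RANGE frame, rows that share a timestamp all see the same list.

```python
from datetime import datetime

# Hypothetical sample rows mirroring the tables: (the_date, the_user, balance).
ROWS = [
    (datetime(2018, 1, 1, 9, 0), "A", 100),
    (datetime(2018, 1, 1, 10, 0), "B", 50),
    (datetime(2018, 1, 1, 11, 0), "C", 100),
    (datetime(2018, 1, 2, 9, 0), "A", 150),
    (datetime(2018, 1, 2, 10, 0), "B", 20),
]


def cumulative(rows):
    """Step 1: pair each row with all rows up to and including it.

    Like OVER (ORDER BY the_date) with the default RANGE frame, rows that
    share a timestamp (peers) all get the same, complete list.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    return [(row, [r for r in ordered if r[0] <= row[0]]) for row in ordered]


for row, cum in cumulative(ROWS):
    # Day, the current row's (user, balance), and the accumulated (user, balance) pairs.
    print(row[0].date(), row[1:], [r[1:] for r in cum])
```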
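2. From here on you are interested in (1.) the last record of each day and, within it, (2.) the last record of each user. So you first have to expand the arrays into their elements: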
date    cum_data                                  expansion
Day1    [(A:100)]                                 (A:100)
Day1    [(A:100),(B:50)]                          (A:100)
                                                  (B:50)
Day1    [(A:100),(B:50),(C:100)]                  (A:100)   <- last A day1
                                                  (B:50)    <- last B day1
                                                  (C:100)   <- last C day1
Day2    [(A:100),(B:50),(C:100),(A:150)]          (A:100)
                                                  (B:50)
                                                  (C:100)
                                                  (A:150)
Day2    [(A:100),(B:50),(C:100),(A:150),(B:20)]   (A:100)
                                                  (B:50)
                                                  (C:100)   <- last C day2 (unchanged)
                                                  (A:150)   <- last A day2 (changed)
                                                  (B:20)    <- last B day2 (changed)
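The expansion is also where the cost comes from. Here is a Python sketch of steps 1 and 2 on the same kind of hypothetical sample rows (`expand` is an illustrative name). With distinct timestamps the i-th row contributes i elements, so n rows expand to n(n+1)/2 elements.

```python
from datetime import datetime

# Hypothetical sample rows mirroring the tables: (the_date, the_user, balance).
ROWS = [
    (datetime(2018, 1, 1, 9, 0), "A", 100),
    (datetime(2018, 1, 1, 10, 0), "B", 50),
    (datetime(2018, 1, 1, 11, 0), "C", 100),
    (datetime(2018, 1, 2, 9, 0), "A", 150),
    (datetime(2018, 1, 2, 10, 0), "B", 20),
]


def expand(rows):
    """Steps 1 and 2: build every row's cumulative list, then emit one
    (day, element) pair per list element, like jsonb_array_elements(agg)."""
    ordered = sorted(rows, key=lambda r: r[0])
    expanded = []
    for row in ordered:
        cum = [r for r in ordered if r[0] <= row[0]]
        expanded.extend((row[0].date(), elem) for elem in cum)
    return expanded


n = len(ROWS)
expanded = expand(ROWS)
# With distinct timestamps the expansion has n * (n + 1) / 2 elements.
print(len(expanded), n * (n + 1) // 2)
```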
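3. The next step is therefore to get the last occurrence of each user per day. This can be done with DISTINCT ON, which takes the first record of each ordered group. In your case the group is (date, user), and the order is the user's date, descending. The user's date is of course stored in the JSON, so (A:100) is really (A:100, day1) and (A:150) is (A:150, day2). The order is determined by this second element, and to get the most recent one first, it has to be descending.
This yields: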
date    cum_data                                  expansion
Day1    [(A:100),(B:50),(C:100)]                  (A:100)   <- last A day1
                                                  (B:50)    <- last B day1
                                                  (C:100)   <- last C day1
Day2    [(A:100),(B:50),(C:100),(A:150),(B:20)]   (C:100)   <- last C day2 (unchanged)
                                                  (A:150)   <- last A day2 (changed)
                                                  (B:20)    <- last B day2 (changed)
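Step 3 can be emulated in Python by keeping, per `(day, user)`, only the element with the latest timestamp, which is what `DISTINCT ON` combined with the descending order on the element's date achieves. This is a sketch on hypothetical sample rows; `expand` and `latest_per_day_and_user` are illustrative helpers. On ties the sketch keeps the first element seen, whereas `DISTINCT ON` would pick an arbitrary one.

```python
from datetime import datetime

# Hypothetical sample rows mirroring the tables: (the_date, the_user, balance).
ROWS = [
    (datetime(2018, 1, 1, 9, 0), "A", 100),
    (datetime(2018, 1, 1, 10, 0), "B", 50),
    (datetime(2018, 1, 1, 11, 0), "C", 100),
    (datetime(2018, 1, 2, 9, 0), "A", 150),
    (datetime(2018, 1, 2, 10, 0), "B", 20),
]


def expand(rows):
    """Steps 1 and 2: one (day, element) pair per element of each row's cumulative list."""
    ordered = sorted(rows, key=lambda r: r[0])
    return [(row[0].date(), elem)
            for row in ordered
            for elem in ordered if elem[0] <= row[0]]


def latest_per_day_and_user(expanded):
    """Step 3, emulating
    DISTINCT ON (day, user) ... ORDER BY day, user, element_date DESC:
    per (day, user), keep only the element with the latest timestamp."""
    latest = {}
    for day, (ts, user, balance) in expanded:
        key = (day, user)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, balance)
    # Drop the element timestamps: (day, user) -> balance
    return {key: balance for key, (_ts, balance) in latest.items()}


for (day, user), balance in sorted(latest_per_day_and_user(expand(ROWS)).items()):
    print(day, user, balance)
```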
4. Finally, the result can simply be grouped by the date column:
date    sum
Day1    (A:100) + (B:50) + (C:100) = 250
Day2    (C:100) + (A:150) + (B:20) = 270
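The last step is plain arithmetic. A small Python sketch of the `GROUP BY ... SUM` over the step-3 result (copied from the table above, with `Day1`/`Day2` kept as labels; `sum_per_day` is an illustrative name) reproduces the two totals:

```python
from collections import defaultdict

# The step-3 result from the table above, as (day, user, balance) triples.
LATEST = [
    ("Day1", "A", 100),
    ("Day1", "B", 50),
    ("Day1", "C", 100),
    ("Day2", "C", 100),
    ("Day2", "A", 150),
    ("Day2", "B", 20),
]


def sum_per_day(latest):
    """Step 4: GROUP BY day with SUM(balance)."""
    totals = defaultdict(int)
    for day, _user, balance in latest:
        totals[day] += balance
    return dict(sorted(totals.items()))


print(sum_per_day(LATEST))  # {'Day1': 250, 'Day2': 270}
```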
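Of course, on large data sets the cumulative aggregation performs very poorly: every row carries an array of all preceding rows, so the expanded intermediate result grows quadratically with the number of rows. In that case I would suggest writing a simple function that iterates over all records once, something like this: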
date list := empty list of (date, balance)
user list := empty list of (user, balance)
previous date := none
for all records, ordered by date:
    get current date
    if previous date is set and current date <> previous date then
        add element (previous date, sum(all balances in user list)) to date list
    get current user
    if current user already exists in user list then
        replace its balance
    else
        add current user to user list
    previous date := current date
add element (previous date, sum(all balances in user list)) to date list
return date list
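A Python sketch of the pseudocode above, assuming the same hypothetical `(the_date, the_user, balance)` rows; `daily_totals` is an illustrative name. A dict plays the role of the user list, and the rows are sorted by timestamp first because the algorithm relies on date order.

```python
from datetime import datetime

# Hypothetical sample rows mirroring the tables: (the_date, the_user, balance).
ROWS = [
    (datetime(2018, 1, 1, 9, 0), "A", 100),
    (datetime(2018, 1, 1, 10, 0), "B", 50),
    (datetime(2018, 1, 1, 11, 0), "C", 100),
    (datetime(2018, 1, 2, 9, 0), "A", 150),
    (datetime(2018, 1, 2, 10, 0), "B", 20),
]


def daily_totals(rows):
    """Single pass in timestamp order: emit the previous day's total whenever
    the date changes, and once more after the loop for the last day."""
    date_list = []  # (date, total of latest balances) per day
    user_list = {}  # user -> latest balance seen so far
    prev_date = None
    for ts, user, balance in sorted(rows, key=lambda r: r[0]):
        cur_date = ts.date()
        if prev_date is not None and cur_date != prev_date:
            date_list.append((prev_date, sum(user_list.values())))
        user_list[user] = balance  # insert a new user or replace its balance
        prev_date = cur_date
    if prev_date is not None:  # the last day is never followed by a date change
        date_list.append((prev_date, sum(user_list.values())))
    return date_list


for day, total in daily_totals(ROWS):
    print(day, total)
```

The dict gives constant-time lookup per user; summing its values costs O(users) per day, which a running total (add the new balance, subtract the replaced one) would avoid.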
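Edit: Here is a possible function (much faster than the query). It follows the pseudocode above exactly. This is just a first draft and I'm sure the code can be optimized, so please treat it as a sketch as well:
Demo: db<>fiddle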
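-- Returns one (date, balance) row per day: the sum of every user's latest balance up to that day.
-- Because it returns SETOF record, call it with a column definition list:
--   SELECT * FROM foobar() AS (the_date date, balance int);
CREATE OR REPLACE FUNCTION foobar() RETURNS SETOF record
AS $$
DECLARE
    _record       record;
    _date_rec     record;
    _prev_date    date;
    _user_balance int;
    _date_balance int;
BEGIN
    -- Scratch table with the current balance of every user seen so far.
    -- IF NOT EXISTS + TRUNCATE allows calling the function more than once per session.
    CREATE TEMP TABLE IF NOT EXISTS user_recs (the_user text, balance int);
    TRUNCATE user_recs;

    FOR _record IN
        SELECT * FROM mytable ORDER BY the_date
    LOOP
        -- A new day starts: emit the total of the previous day first.
        IF (_prev_date IS NOT NULL AND (_record.the_date::date > _prev_date)) THEN
            SELECT
                SUM(ur.balance)
            FROM
                user_recs ur
            INTO _date_balance;

            _date_rec = (_prev_date, _date_balance);
            RETURN NEXT _date_rec;
        END IF;

        -- Insert a new user, or overwrite a known user's balance with the newer value.
        SELECT balance FROM user_recs ur WHERE ur.the_user = _record.the_user
        INTO _user_balance;

        IF NOT FOUND THEN
            INSERT INTO user_recs VALUES (_record.the_user, _record.balance);
        ELSE
            UPDATE user_recs ur SET balance = _record.balance WHERE ur.the_user = _record.the_user;
        END IF;

        _prev_date = _record.the_date;
    END LOOP;

    -- Emit the total of the last day (no row at all if mytable is empty).
    IF _prev_date IS NOT NULL THEN
        RETURN QUERY
        SELECT
            _prev_date,
            SUM(ur.balance)::int
        FROM
            user_recs ur;
    END IF;
END;
$$ LANGUAGE plpgsql;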