Getting the unique observations for each date is easy:
>>> daily_observations = df.groupby(level=0).ID.apply(lambda s: s.unique())
>>> daily_observations
2015-01-01 [1, 2, 3]
2015-01-02 [1, 2, 4]
2015-01-03 [1, 3, 5]
Name: ID, dtype: object
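The original df is not shown, so here is a minimal sketch to reproduce the step above. The sample data is an assumption: a hypothetical frame with a date index and an ID column, with values chosen to match the output shown.

```python
import pandas as pd

# Hypothetical sample data (an assumption; the original df is not shown).
# Four observations per day, some of them repeated within the day.
dates = pd.to_datetime(
    ["2015-01-01"] * 4 + ["2015-01-02"] * 4 + ["2015-01-03"] * 4
)
df = pd.DataFrame({"ID": [1, 2, 3, 1, 1, 2, 4, 2, 1, 3, 5, 5]}, index=dates)

# Group on the date index (level 0) and keep each day's distinct IDs.
daily_observations = df.groupby(level=0).ID.apply(lambda s: s.unique())

# Convert the per-day arrays to plain lists for easier inspection.
daily_lists = [v.tolist() for v in daily_observations]
print(daily_lists)
```

Series.unique() keeps values in order of first appearance, so the order within each day follows the data rather than being sorted.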
Accumulating them is harder:
>>> pd.Series({idx: list(set().union(*[v.tolist()
...     for v in daily_observations.loc[daily_observations.index <= idx].values]))
...     for idx in daily_observations.index})
2015-01-01 [1, 2, 3]
2015-01-02 [1, 2, 3, 4]
2015-01-03 [1, 2, 3, 4, 5]
dtype: object
This part of the code collects, for each index value, the arrays from every date up to and including that one:
>>> [daily_observations.loc[daily_observations.index <= idx].values for idx in daily_observations.index]
[array([array([1, 2, 3])], dtype=object),
array([array([1, 2, 3]), array([1, 2, 4])], dtype=object),
array([array([1, 2, 3]), array([1, 2, 4]), array([1, 3, 5])], dtype=object)]
These arrays are then converted into a list of lists for each index value:
>>> [[v.tolist() for v in daily_observations.loc[daily_observations.index <= idx].values] for idx in daily_observations.index]
[[[1, 2, 3]], [[1, 2, 3], [1, 2, 4]], [[1, 2, 3], [1, 2, 4], [1, 3, 5]]]
Each list of lists is then merged into a single set via union and converted back to a list:
>>> [list(set().union(*[v.tolist()
...     for v in daily_observations.loc[daily_observations.index <= idx].values]))
...     for idx in daily_observations.index]
[[1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
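The set().union(*...) idiom is the core of this step, so it may help to see it in isolation. This is a small sketch in plain Python; lists_so_far is a hypothetical name standing in for one element of the list of lists above.

```python
# Hypothetical nested lists, same shape as one element of the output above.
lists_so_far = [[1, 2, 3], [1, 2, 4], [1, 3, 5]]

# The * unpacks the outer list so each inner list becomes one argument to
# union(); starting from an empty set means an empty outer list also works.
merged = set().union(*lists_so_far)

# Sets are unordered, so sort explicitly when a stable order matters.
result = sorted(merged)
print(result)
```

Note that list(set(...)) does not guarantee any particular order; sorted() is used here to make the ordering explicit.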
All of the above is then wrapped in a dictionary comprehension, which is used to construct the Series.
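The approach above re-slices the Series for every date, so the work grows quadratically with the number of dates. As a possible alternative (not the method used above), a single running set can be carried forward with itertools.accumulate. This is only a sketch; the daily_observations Series below is rebuilt by hand as an assumption so the example stands alone.

```python
from itertools import accumulate

import pandas as pd

# Hypothetical daily observations, rebuilt by hand to match the output above.
daily_observations = pd.Series(
    [[1, 2, 3], [1, 2, 4], [1, 3, 5]],
    index=pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    name="ID",
)

# Carry one running set forward: each step unions the previous result
# with the current day's IDs and yields a new set.
running_sets = accumulate((set(v) for v in daily_observations), set.union)

# Sort each set so the resulting lists have a deterministic order.
cumulative = pd.Series(
    [sorted(s) for s in running_sets],
    index=daily_observations.index,
)
print(cumulative.tolist())
```

This assumes the index is already in chronological order; if it is not, calling sort_index() on the Series first would be needed.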