Pandas streak counter: answers

【Question title】: Pandas streak counter for a die
【Posted】: 2017-09-05 23:48:16
【Question description】:

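I am trying to do something very similar to this post, except that I have the outcomes of a die (values 1 to 6) and I need to count the streaks for every possible value of the die.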

import numpy as np
import pandas as pd

data = [5,4,3,6,6,3,5,1,6,6]
df = pd.DataFrame(data, columns = ["Outcome"])
df.head(n=10)

def f(x):

    x['c'] = (x['Outcome'] == 6).cumsum()
    x['a'] = (x['c'] == 1).astype(int)
    x['b'] = x.groupby( 'c' ).cumcount()

    x['streak'] = x.groupby( 'c' ).cumcount() + x['a']

    return x

df = df.groupby('Outcome', sort=False).apply(f)

print(df.head(n=10))

   Outcome  c  a  b  streak
0        5  0  0  0       0
1        4  0  0  0       0
2        3  0  0  0       0
3        6  1  1  0       1
4        6  2  0  0       0
5        3  0  0  1       1
6        5  0  0  1       1
7        1  0  0  0       0
8        6  3  0  0       0
9        6  4  0  0       0

My problem is that 'c' does not behave correctly. It should "reset" its counter every time a streak ends; otherwise a and b will be wrong.

Ideally, I would like something as elegant as

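def f(x):
    # Parentheses let the expression continue across lines;
    # a bare trailing "+" at the end of a line is a syntax error.
    x['streak'] = (x.groupby((x['stat'] != 0).cumsum()).cumcount()
                   + ((x['stat'] != 0).cumsum() == 0).astype(int))
    return x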

as suggested in the linked post.

【Comments】:

Could you add the desired output?

Tags: python pandas

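【Solution 1】:

Here is a solution using cumsum and cumcount, as mentioned in the question, although it is not as "elegant" as hoped (i.e. not a one-liner).

I start by labelling runs of consecutive equal values, giving each run a "block" number: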

In [326]: df['block'] = (df['Outcome'] != df['Outcome'].shift(1)).astype(int).cumsum()

In [327]: df
Out[327]: 
   Outcome  block
0        5      1
1        4      2
2        3      3
3        6      4
4        6      4
5        3      5
6        5      6
7        1      7
8        6      8
9        6      8
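To make the labelling step easier to follow, here is a minimal sketch that splits the expression above into its two parts. The variable names `is_new_run` and `block` are only illustrative, and the data is the sample from the question.

```python
import pandas as pd

outcome = pd.Series([5, 4, 3, 6, 6, 3, 5, 1, 6, 6], name="Outcome")

# True wherever a value differs from the one before it, i.e. a new run starts.
# The first row is compared with NaN, so it always counts as a new run.
is_new_run = outcome != outcome.shift(1)

# Summing the booleans cumulatively gives every run its own increasing label.
block = is_new_run.cumsum()

print(pd.DataFrame({"Outcome": outcome, "is_new_run": is_new_run, "block": block}))
```

Rows 3 and 4 (both 6) share block 4 because the second 6 does not start a new run, so the cumulative sum does not increase there.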

Now that I know where the repeated values occur, I just need to count them incrementally within each block:

In [328]: df['streak'] = df.groupby('block').cumcount()

In [329]: df
Out[329]: 
   Outcome  block  streak
0        5      1       0
1        4      2       0
2        3      3       0
3        6      4       0
4        6      4       1
5        3      5       0
6        5      6       0
7        1      7       0
8        6      8       0
9        6      8       1

If you want the count to start from 1, feel free to add + 1 to the last line.
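The two steps can also be wrapped in one small function. This is only a sketch: the helper name `add_streak` and its `column` and `start` parameters are my own assumptions, not part of pandas. The per-face summary at the end is one possible reading of "count the streaks for every value of the die" and may not be exactly what the question is after.

```python
import pandas as pd


def add_streak(df, column="Outcome", start=1):
    """Hypothetical helper: return a copy of df with a 'streak' column.

    The streak counts how many times in a row the current value has
    appeared, beginning at `start` for the first row of each run.
    """
    values = df[column]
    # Label each run of consecutive equal values with its own block number.
    block = (values != values.shift()).cumsum()
    out = df.copy()
    # Number the rows inside each block, then shift by the starting value.
    out["streak"] = out.groupby(block).cumcount() + start
    return out


df = pd.DataFrame({"Outcome": [5, 4, 3, 6, 6, 3, 5, 1, 6, 6]})
result = add_streak(df)
print(result)

# Assumed follow-up: the longest streak observed for each die face.
longest = result.groupby("Outcome")["streak"].max()
print(longest)
```

With the sample data, the streak column is 1 everywhere except rows 4 and 9, where the second consecutive 6 gives 2. Passing `start=0` reproduces the zero-based column shown above.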

【Comments】: