Conditions based on days per group (answers)

【Question title】: Conditions based on days per group
【Posted】: 2018-04-25 22:05:21
【Question】:

            A                   B       C   D   E
0  2002-01-12 2018-04-25 10:00:00    John  19  19
1  2002-01-12 2018-04-25 11:00:00    John   6  25
2  2002-01-13 2018-04-25 09:00:00    John   5  30
3  2002-01-13 2018-04-25 11:00:00    John -25   5
4  2002-01-14 2018-04-25 11:00:00    John   1   6
5  2002-01-14 2018-04-25 12:00:00    John  44  50
6  2002-01-25 2018-04-25 11:00:00  George  18  18
7  2002-01-25 2018-04-25 12:00:00  George  12  30
8  2002-01-26 2018-04-25 11:00:00  George  -8  22
9  2002-01-26 2018-04-25 12:00:00  George -10  12
10 2002-01-27 2018-04-25 10:00:00  George  13  25
11 2002-01-27 2018-04-25 11:00:00  George   1  26

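import pandas as pd

# Parse both columns as datetimes, then build E as the running total of D within each C group.
df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])
df["E"] = df.groupby("C")["D"].cumsum()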
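
For anyone who wants to reproduce the question, here is a minimal sketch that rebuilds the sample frame from the table above. The values are copied from the table; column E is recomputed with the same cumsum rather than typed in, so it doubles as a check of the setup.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "A": ["2002-01-12"] * 2 + ["2002-01-13"] * 2 + ["2002-01-14"] * 2
       + ["2002-01-25"] * 2 + ["2002-01-26"] * 2 + ["2002-01-27"] * 2,
    "B": ["2018-04-25 10:00:00", "2018-04-25 11:00:00",
          "2018-04-25 09:00:00", "2018-04-25 11:00:00",
          "2018-04-25 11:00:00", "2018-04-25 12:00:00",
          "2018-04-25 11:00:00", "2018-04-25 12:00:00",
          "2018-04-25 11:00:00", "2018-04-25 12:00:00",
          "2018-04-25 10:00:00", "2018-04-25 11:00:00"],
    "C": ["John"] * 6 + ["George"] * 6,
    "D": [19, 6, 5, -25, 1, 44, 18, 12, -8, -10, 13, 1],
})

# Same preparation steps as in the question.
df["A"] = pd.to_datetime(df["A"])
df["B"] = pd.to_datetime(df["B"])
df["E"] = df.groupby("C")["D"].cumsum()  # running total of D per C group

print(df)
```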

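I want to select one row for each C group, using the following rule:

Take the first row where E>=20 and B==11:00:00, considering only rows from the second A day of each C group onward.
If no row satisfies that condition, take the first row of that C group.

The output should be: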

            A                   B       C   D   E
0  2002-01-12 2018-04-25 10:00:00    John  19  19
8  2002-01-26 2018-04-25 11:00:00  George  -8  22

I tried:

def eleven(g):
    cond = g[g.B==time(11)].E.ge(20)
    if cond.any():
        return g[cond].iloc[0]
    else:
        return g.iloc[1]

r = df.groupby('C', as_index=False).apply(eleven)

【Question comments】:

Tags: python pandas conditional

【Solution 1】:

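I think you need to chain the conditions: compare the hour of B with 11, compare E with 20, and use factorize on A to number the days within each group, keeping only the second day onward with > 0: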

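def eleven(g):
    # A row qualifies if it is at 11:00, has E >= 20, and is on the group's second day or later.
    # factorize numbers the distinct A days 0, 1, 2, ... in order of appearance.
    # Each comparison gets its own parentheses because & binds tighter than >.
    cond = (g.B.dt.hour == 11) & g.E.ge(20) & (pd.factorize(g.A)[0] > 0)
    if cond.any():
        return g[cond].iloc[0]
    else:
        # No qualifying row: fall back to the first row of the group.
        return g.iloc[0]

r = df.groupby('C', as_index=False, sort=False).apply(eleven)
print(r)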
           A                   B       C   D   E
0 2002-01-12 2018-04-25 10:00:00    John  19  19
1 2002-01-26 2018-04-25 11:00:00  George  -8  22
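
To make the factorize step concrete, here is a small sketch on the A dates of one group (the John rows). It shows the integer codes factorize assigns, how `> 0` selects the second day onward, and why the comparison needs its own parentheses. The all-True array `other` is a stand-in for the other two conditions and is purely illustrative.

```python
import numpy as np
import pandas as pd

# The A dates of one group, as in the John rows of the question.
a = pd.Series(pd.to_datetime(
    ["2002-01-12", "2002-01-12", "2002-01-13",
     "2002-01-13", "2002-01-14", "2002-01-14"]))

# factorize labels each distinct value by order of first appearance.
codes, uniques = pd.factorize(a)
print(codes)            # [0 0 1 1 2 2]

# Code 0 is the group's first day, so "> 0" keeps the second day onward.
from_second_day = codes > 0
print(from_second_day)  # [False False  True  True  True  True]

# Pitfall: "&" binds tighter than ">", so "other & codes > 0" is parsed as
# "(other & codes) > 0", a bitwise AND with the integer codes.
other = np.array([True] * 6)   # illustrative stand-in for the other conditions
wrong = other & codes > 0      # 1 & 2 == 0, so the third day is dropped
right = other & (codes > 0)
print(wrong)            # [False False  True  True False False]
print(right)            # [False False  True  True  True  True]
```

With the sample data the difference does not show up, because George's first match is on his second day, but a group whose first match falls on its third day would be affected.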

【Comments】:

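But this output does not satisfy the first condition: "the first row where E>=20 and B==11:00:00, applied from the second day of each C group onward." The row 0 2002-01-12 2018-04-25 11:00:00 John 6 25 belongs to the first A day of its C group.
@Tie_24 - Just realized.
Works perfectly! I didn't know the factorize function. Thank you very much, jezrael!!
@Tie_24 - Glad I could help!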
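
The same selection can also be sketched without groupby.apply, using boolean masks over the whole frame. This is an alternative technique, not the method from the answer. It assumes that each group's first day is its earliest date in A, so "second day onward" becomes "A later than the group's minimum". The names `first_day`, `hits` and `fallback` are illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question.
days = pd.to_datetime(["2002-01-12", "2002-01-13", "2002-01-14",
                       "2002-01-25", "2002-01-26", "2002-01-27"]).repeat(2)
hours = [10, 11, 9, 11, 11, 12, 11, 12, 11, 12, 10, 11]
df = pd.DataFrame({
    "A": days,
    "B": pd.to_datetime([f"2018-04-25 {h:02d}:00:00" for h in hours]),
    "C": ["John"] * 6 + ["George"] * 6,
    "D": [19, 6, 5, -25, 1, 44, 18, 12, -8, -10, 13, 1],
})
df["E"] = df.groupby("C")["D"].cumsum()

# Assumption: the first day of a group is its earliest date in A.
first_day = df.groupby("C")["A"].transform("min")
mask = (df["B"].dt.hour == 11) & df["E"].ge(20) & (df["A"] > first_day)

# First qualifying row per group (drop_duplicates keeps the first occurrence).
hits = df[mask].drop_duplicates("C")

# First row of every group that has no qualifying row.
fallback = df.drop_duplicates("C")
fallback = fallback[~fallback["C"].isin(hits["C"])]

result = pd.concat([hits, fallback]).sort_index()
print(result)
```

Unlike the apply version, this keeps the original row labels, so the selected rows are the ones labelled 0 and 8, as in the expected output of the question.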