Looping through pandas' dataframe rows and compare values on columns: answers

【Question title】: Looping through pandas' dataframe rows and compare values on columns
【Posted】: 2018-10-16 14:18:22
【Question】:

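Thanks very much in advance. I am new to Python, and this is harder than I expected.

I have an array [m, n], where m is the player's name (0-9) and n is the year (A-E). Each entry is marked '1' if the player was taken into the team for that year's matches, and '0' if not. Given this, I want to create some groups/classes.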

  A B C D E
0 1 0 0 1 0
1 1 0 1 0 0
2 0 0 1 1 1
3 1 1 1 1 1
4 0 1 1 0 0
5 0 1 1 1 0
6 1 1 0 1 1
7 0 0 0 0 1
8 1 0 1 1 0
9 1 1 0 1 1

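Some clarification on the classes:
1. In the team for the first time (a)
2. Still in the team, and consecutively over the past few years (b)
3. Still in the team, but on and off (c)
4. Not in the team, but was there before (d)
5. Never been in the team (e)

The idea is to write a function for each class and then combine them into a single function.

For example, here is the sample code for class a: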

class_a=[]
for (i, row) in test.iterrows():
    if (test.iloc[i, -1]==1):
        if (test.iloc[i, 0:-2].sum(axis=0))==0:
            class_a.append('Yes')

However, the sample code for class b is a bit harder:

test1=[]
count=0

for (i, row) in test.iterrows():
    row = test.iloc[i, 0:-1]
    for j in range(0, len(row)-1):
        if row[j]>=row[j+1]:
            print(i, row[j], row[j+1], 'Yes')
            count+=1
print(count)

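When I print the results for i, row[j] and row[j+1], I get the following incorrect values. I deduced that the iteration over the row values is wrong because I am missing the index across the row (the j value). The count itself seems to work (even though it is counting the wrong thing):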

0 1 0 Yes
0 0 0 Yes
1 1 0 Yes
1 1 0 Yes
2 0 0 Yes
2 1 1 Yes
3 1 1 Yes
3 1 1 Yes
3 1 1 Yes
4 1 1 Yes
4 1 0 Yes
5 1 1 Yes
5 1 1 Yes
6 1 1 Yes
6 1 0 Yes
7 0 0 Yes
7 0 0 Yes
7 0 0 Yes
8 1 0 Yes
8 1 1 Yes
9 1 1 Yes
9 1 0 Yes
22

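Any help and pointers are much appreciated. I did think about using arrays, but that gets complicated because the values run along each row rather than down the columns. Am I thinking about this the wrong way? Thank you very much!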
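
The printed pairs in the question can be reproduced from the slices alone. The following is a minimal sketch, not part of the original post: it rebuilds row 0 of the table and adds one hypothetical row (0 0 0 1 1, which does not appear in the data) to show what each slice keeps. It suggests that `0:-1` leaves column E out of the class b comparison, that `>=` also accepts the pair (0, 0), and that `0:-2` in the class a check ignores column D as well.

```python
import pandas as pd

# Row 0 comes from the question's table; the second row is hypothetical.
test = pd.DataFrame([[1, 0, 0, 1, 0], [0, 0, 0, 1, 1]], columns=list("ABCDE"))

# Class b loop: 0:-1 keeps A-D only, so column E is never compared.
row = test.iloc[0, 0:-1]
pairs = [(int(row.iloc[j]), int(row.iloc[j + 1])) for j in range(len(row) - 1)]
printed = [p for p in pairs if p[0] >= p[1]]  # >= also accepts (0, 0)

# Class a check: 0:-2 keeps A-C only, so a 1 in column D goes unnoticed.
hypothetical = test.iloc[1]
sum_a_to_c = int(hypothetical.iloc[0:-2].sum())  # what the posted code sums
sum_a_to_d = int(hypothetical.iloc[0:-1].sum())  # all years before the last

print(pairs, printed, sum_a_to_c, sum_a_to_d)
```

Here `printed` corresponds to the two lines shown for player 0 in the output above, and the two sums differ for the hypothetical row, which the posted class a check would therefore label as a first-time player.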

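【Comments】:

Tags: python arrays pandas numpy

【Solution 1】:

One approach is to use pandas DataFrame.apply. For each group, you first write a function that tells you, based on a player's history, whether the player belongs to that group, and then you apply that function to each row. For your first example, you could define: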

def first_time_in_team(series):
    return( (series.iloc[:-1].max()==0) and (series.iloc[-1]==1))

This function returns True if the player was not in the team before this year and is in the team this year, and False otherwise. Then you would do:

group_first_time = df.apply(first_time_in_team,axis = 1)

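where df is the dataframe containing your array. This gives you a Series indexed by player, whose value is True if the player is in the group and False otherwise. You can then adapt the first function for each of the other groups.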
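
As an illustration of adapting the function to every group, here is a minimal sketch that assigns one of the five classes per row with a single `apply` call. The function name `classify` and the reading of the classes are assumptions, not the answer author's code. In particular, "consecutive" (b) is taken to mean that every year from the player's first appearance through the last column is 1, and "on and off" (c) means the player is in the team in the last year but has at least one gap after the first appearance.

```python
import pandas as pd

# The table from the question: rows are players 0-9, columns are years A-E.
rows = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
]
df = pd.DataFrame(rows, columns=list("ABCDE"))


def classify(series):
    """Return 'a'-'e' for one player's row (assumed class definitions)."""
    values = series.to_numpy()
    current, prior = values[-1], values[:-1]
    if current == 1:
        if prior.sum() == 0:
            return "a"  # first time in the team
        first = values.argmax()  # position of the first 1
        # b: no gap since the first appearance; c: at least one gap
        return "b" if values[first:].all() else "c"
    # Not in the team in the last year.
    return "d" if prior.sum() > 0 else "e"


classes = df.apply(classify, axis=1)
print(classes.tolist())
# ['d', 'd', 'b', 'b', 'd', 'd', 'c', 'a', 'd', 'c']
```

Under these assumptions no player in the sample data falls into class e, since every row contains at least one 1.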

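【Comments】:

Thanks, this is a nice way to do it without looping (which was my first choice). However, every row comes out False for the last column [-1].
Here is the code:
series = Team_df.iloc[:,1:-1]
def first_time_in_team(series):
    return((series.iloc[:-1].max()==0) and (series.iloc[-1]==1))
group_first_time = Team_df.apply(first_time_in_team, axis=1)
group_first_time
I wonder whether the indexing is missing a comma somewhere in the code. Shouldn't it be series.iloc[:, :-1].max()==0? I'm a bit confused. Thanks!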
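
On the indexing question in the comments: with `axis=1`, `apply` passes each row to the function as a one-dimensional Series, so `series.iloc[:-1]` takes no comma; the two-part form `iloc[:, :-1]` is for indexing a DataFrame. Note also that the sliced `series` variable in the posted snippet is never used, because `apply` is called on `Team_df` itself and the function parameter has the same name. The sketch below assumes that the frame has a leading name column (the comment's `iloc[:, 1:-1]` hints at this but does not say so); `team_df` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical frame: a name column followed by three year columns.
team_df = pd.DataFrame(
    {"name": ["p0", "p1", "p2"], "A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 1]}
)


def first_time_in_team(series):
    # `series` is one row here, so plain 1-D slicing is enough.
    return bool(series.iloc[:-1].max() == 0 and series.iloc[-1] == 1)


# Apply to the year columns only, keeping the last year in the slice.
years = team_df.iloc[:, 1:]
group_first_time = years.apply(first_time_in_team, axis=1)

print(years.iloc[0].ndim)          # each row is 1-dimensional
print(group_first_time.tolist())   # [False, True, False]
```

Selecting the year columns before calling `apply` keeps non-numeric columns out of the `max()` comparison, and the slice `1:` (rather than `1:-1`) keeps the final year that the function checks.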