Python Pandas: iterate over columns and update columns based on apply conditions (answers)

【Question title】: Python Pandas Iterate over columns and also update columns based on apply conditions
【Posted】: 2020-06-05 23:23:36
【Question】:

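I am trying to update DataFrame columns based on the values of consecutive columns.
If two adjacent columns, say col1 and col2, hold a value > 0 and a value < 0 respectively, then col2 should become col1 + col2, col1 should be set to 0, and a fix counter should be incremented (this is the rule the code below tries to implement).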

The DataFrame looks like this:

id  col0  col1  col2  col3  col4  col5  col6  col7  col8  col9  col10
1   0     5     -5    5     -5    0     0     1     4     3     -3
2   0     0     0     0     0     0     0     4     -4    0     0
3   0     0     1     2     3     0     0     0     5     6     0

Desired DataFrame after applying the logic:

id  col0    col1    col2    col3    col4    col6    col7    col8    col9    col10   fix
1   0   0   0   0   0   0   0 1 4   0   0 0 3
2   0   0   0   0   0   0   0 0 0   0   0 0 1
3   0   0   1   2   3   0   0 0 5   6   0 9 0

I tried:

def fix_count(row):
    row['fix_cnt'] = 0

    for i in range(0, 10):
        if ((row['col' + str(i)] > 0) & 
            (row['col' + str(i + 1)] < 0)):

            row['col' + str(i + 1)] = row['col' + str(i)] + row['col' + str(i + 1)]
            row['col' + str(i)] = 0

            row['fix_cnt'] += 1

            return (row['col' + str(i)],
                    row['col' + str(i + 1)],
                    row['fix_cnt'])

df.apply(fix_count, axis=1)

It fails with IndexError: index 11 is out of bounds for axis 0 with size 11.

I also looked at df.iteritems, but could not find a way to make it work.
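For reference, DataFrame.iteritems was removed in pandas 2.0 in favor of DataFrame.items, which yields (column name, Series) pairs in column order. Below is a minimal sketch of walking adjacent column pairs that way. The two-row frame and the name `pairs` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical two-row frame that follows the question's column naming.
df = pd.DataFrame({'id': [1, 2], 'col0': [5, 0], 'col1': [-5, 3], 'col2': [2, -1]})

# items() yields (name, Series) pairs in column order; skip the id column.
columns = [(name, values) for name, values in df.items() if name != 'id']

# Pair each column with its right-hand neighbour.
pairs = []
for (left_name, left), (right_name, right) in zip(columns, columns[1:]):
    # Rows where the left value is positive and the right value is negative.
    hits = (left > 0) & (right < 0)
    pairs.append((left_name, right_name, int(hits.sum())))

print(pairs)
```

Each tuple in `pairs` names two neighbouring columns and the number of rows in which the rule from the question would fire for that pair.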

Code to generate the DataFrame:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'col0': [0, 0, 0],
                   'col1': [5, 0, 0],
                   'col2': [-5, 0, 1],
                   'col3': [5, 0, 2],
                   'col4': [-5, 0, 3],
                   'col5' : [0, 0, 0],
                   'col6': [0, 0, 0],
                   'col7': [1, 4, 0],
                   'col8': [4, -4, 5],
                   'col9': [3, 0, 6],
                   'col10': [-3, 0, 0]})

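Thanks!

【Comments on the question】:

Your data has no 'col5', but you are iterating with range(0,10), so the loop also asks for column 5.
Thanks for pointing that out. Sorry, I forgot to add col5; I have updated the question to include it.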

Tags: python pandas dataframe

【Solution 1】:

Here is one way to do it without a loop:

c = df.gt(0) & df.shift(-1,axis=1).lt(0)

out = (df.mask(c.shift(axis=1).fillna(False),df+df.shift(axis=1))
      .mask(c,0).assign(Fix=c.sum(1)))

print(out)

   id  col0  col1  col2  col3  col4  col6  col7  col8  col9  col10  Fix
0   1     0     0     0     0     0     0     1     4     0      0    3
1   2     0     0     0     0     0     0     0     0     0      0    1
2   3     0     0     1     2     3     0     0     5     6      0    0

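Details:

c checks whether the current column is > 0 and the next column is < 0.
Where c is True, the current column is added to the next column; this is done by shifting c one column to the right and replacing the marked cells with df + df.shift(axis=1).
Where c is True, the current column is then set to 0.
The row-wise sum of c gives the number of changes made (the Fix column).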
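The four steps above can be traced on a single row. This is a sketch under a few assumptions: the one-row frame is made up, the id column is dropped before comparing so it cannot take part in a pair, and `fill_value=False` is used in place of `.fillna(False)` so the shifted flags stay boolean. The names `vals` and `c_next` are illustrative only.

```python
import pandas as pd

# Hypothetical one-row frame: (5, -5) and (3, -3) are the pairs to fix.
df = pd.DataFrame({'id': [1], 'col0': [5], 'col1': [-5],
                   'col2': [4], 'col3': [3], 'col4': [-3]})
vals = df.drop(columns='id')  # keep id out of the comparisons

# Step 1: True where a value is > 0 and its right-hand neighbour is < 0.
c = vals.gt(0) & vals.shift(-1, axis=1).lt(0)

# Step 2: the same flags moved one column right, marking the negative partner.
# fill_value=False keeps the result boolean instead of introducing NaN.
c_next = c.shift(1, axis=1, fill_value=False)

# Step 3: the partner receives left + right, then the flagged column becomes 0.
out = vals.mask(c_next, vals + vals.shift(1, axis=1)).mask(c, 0)

# Step 4: count the flagged pairs per row.
out['Fix'] = c.sum(axis=1)
print(out)
```

For this row, col0 and col3 are flagged in step 1, so the values end up as 0, 0, 4, 0, 0 with Fix equal to 2. Depending on the pandas version, some columns may come back as floats because of the NaN introduced by the unfilled shifts.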

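【Comments】:

Thanks anky, it works like a charm, and without a loop. Just curious: how would it be done with a loop?
@Anku 1: I never think of loops when working with pandas (use a vectorized approach whenever possible). 2: Your logic should work, as LazyCoder pointed out, but I would not be too keen on apply, since it is just a wrapper around a for loop. If you must use a loop, first check whether you can use numpy + numba.
Sure, anky! I will look into numpy + numba in detail.
@Anku I am still not sure why you would loop in the first place, but whatever works.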
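As a rough illustration of the loop-based route discussed in these comments, here is a sketch that loops over column pairs only and handles all rows at once with NumPy. It is one possible reading of the suggestion, not code from the thread; the function name `fix_pairs` and the sample frame are hypothetical, and numba is left out so the example has no extra dependency.

```python
import numpy as np
import pandas as pd

def fix_pairs(values):
    """Sequentially merge each positive value into a negative right-hand neighbour.

    `values` is a 2-D integer array (rows x columns). Returns the fixed copy
    and the number of merges per row.
    """
    fixed = values.copy()
    fix_cnt = np.zeros(len(fixed), dtype=np.int64)
    # The loop runs over column pairs only; each step handles all rows at once.
    for i in range(fixed.shape[1] - 1):
        hit = (fixed[:, i] > 0) & (fixed[:, i + 1] < 0)
        fixed[hit, i + 1] += fixed[hit, i]
        fixed[hit, i] = 0
        fix_cnt += hit
    return fixed, fix_cnt

# Hypothetical frame; the second row cascades (5, -3, -1 becomes 0, 0, 1).
df = pd.DataFrame({'id': [1, 2], 'col0': [5, 5], 'col1': [-5, -3], 'col2': [0, -1]})
cols = [c for c in df.columns if c.startswith('col')]
fixed, fix_cnt = fix_pairs(df[cols].to_numpy())

out = df.copy()
out[cols] = fixed
out['fix_cnt'] = fix_cnt
print(out)
```

Because each step sees the values already updated by the previous step, this follows the sequential behaviour of the apply-based loop in the question. The loop-free Solution 1 evaluates all pairs against the original values, so on a cascading row such as 5, -3, -1 it would give 0, 2, -1 with one fix, whereas the loop gives 0, 0, 1 with two. On the question's sample data the two approaches agree.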

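【Solution 2】:

The logic of your code is fine. It only needs a small correction to work as expected: return the row from the function, once, after the loop.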

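def fix_count(row):
    # Counter for the number of (positive, negative) pairs merged in this row.
    row['fix_cnt'] = 0

    # Visit the adjacent pairs (col0, col1) ... (col9, col10).
    for i in range(0, 10):
        if ((row['col' + str(i)] > 0) &
            (row['col' + str(i + 1)] < 0)):

            # Fold the positive value into its negative neighbour, then zero it.
            row['col' + str(i + 1)] = row['col' + str(i)] + row['col' + str(i + 1)]
            row['col' + str(i)] = 0

            row['fix_cnt'] += 1

    # Return the whole row once, after the loop, so apply can rebuild the frame.
    return row

df.apply(fix_count, axis=1)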

Try this and let me know whether it works!
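A possible variant, sketched here as an assumption rather than as part of the answer: instead of hard-coding range(0, 10), the column names can be taken from the frame itself, so a missing column (such as the col5 mentioned in the comments on the question) does not lead to a lookup for a name that is not there. The sample frame, the `cols` argument, and the choice to treat physically adjacent columns as neighbours even when a number is skipped are all illustrative.

```python
import pandas as pd

def fix_count(row, cols):
    """Apply the question's pairwise rule along `cols`, which must be in order."""
    row = row.copy()  # avoid mutating the object handed over by apply
    row['fix_cnt'] = 0
    # zip pairs each column with its right-hand neighbour and stops at the
    # last existing column, so no out-of-range name is ever built.
    for left, right in zip(cols, cols[1:]):
        if row[left] > 0 and row[right] < 0:
            row[right] = row[left] + row[right]
            row[left] = 0
            row['fix_cnt'] += 1
    return row

# Hypothetical frame with col3 missing on purpose.
df = pd.DataFrame({'id': [1, 2], 'col0': [0, 0], 'col1': [5, 0],
                   'col2': [-5, 4], 'col4': [1, -4]})
cols = [c for c in df.columns if c.startswith('col')]

# apply returns a new DataFrame; df itself is left unchanged.
result = df.apply(fix_count, axis=1, args=(cols,))
print(result)
```

Here each row has exactly one pair that matches the rule, so fix_cnt is 1 for both rows. Note that col2 and col4 are treated as neighbours in the second row because col3 is absent.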

【Comments】:

Yes! So I was only missing the last part (returning the row).