How to loop through a pandas DataFrame and modify values under a condition? Answers

【Question title】: How to loop through pandas dataframe and modify value under condition?
【Posted】: 2019-04-27 00:29:56
【Question description】:

I have this pandas DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
    "col1": [1,1,2,3,3,3,4,5,5,5,5]
    }
)
df

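I want to add another column that shows "last" whenever the value in col1 is not equal to the value of col1 in the next row. It should look like this:

So far, I can create a column that contains True if the value in col1 is not equal to the value of col1 in the next row, and False otherwise: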

df["last_row"] = df["col1"].shift(-1)
df['last'] = df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

Now something like this

df["last_row"] = df["col1"].shift(-1)
df['last'] = "last" if df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

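would be nice, but this is obviously wrong syntax. How can I get this to work?

Ultimately, I also want to add numbers that indicate how many times a value has appeared before the current row, while the last occurrence is always marked "last". It should look like this:

I'm not sure whether this is just another step on top of what I already have or whether it needs a new approach. I have read that I should use apply() if I want to loop through an array while modifying values. However, I don't know how to include a condition in it. Can you help me?

Thanks a lot!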

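【Question comments】:

For what it's worth, mixing types (strings and ints in this case) in a pandas DataFrame column is generally not recommended. You lose a lot of performance that way.
For the first part you are already close, since you have built a boolean Series. Construct an empty column, and then you can do: df['last'][df['col1'] != df['last_row']] = 'last'.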
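
The one-liner in the comment above uses chained indexing (df['last'][mask] = ...), which pandas may apply to a temporary copy rather than to df itself. A minimal sketch of the same idea with a single .loc call, assuming the question's sample data and an empty-string placeholder for the unflagged rows (the placeholder is an assumption, not something the comment specifies):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# True where the next row holds a different value.
# The final row is compared against NaN, so it is flagged as well.
is_last = df["col1"] != df["col1"].shift(-1)

# Start from an "empty" column (assumed placeholder: empty string),
# then fill the flagged rows with one .loc assignment.
df["last"] = ""
df.loc[is_last, "last"] = "last"
```

Because the mask is computed directly, the temporary last_row helper column from the question is not needed.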

Tags: python pandas

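【Solution 1】:

Here is one way. Define a custom grouper that increments whenever the value in col1 differs from the previous row (a cumsum over the change flags), then take DataFrameGroupBy.cumcount to get the running count within each group. Finally, use a similar comparison with shift(-1) to mark the last row of each group as last: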

g = df.col1.ne(df.col1.shift(1)).cumsum()
df['update'] = df.groupby(g).cumcount()
ix = df[df.col1.ne(df.col1.shift(-1))].index
# Int64Index([1, 2, 5, 6, 10], dtype='int64')
df.loc[ix,'update'] = 'last'

    col1 update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last
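
The same steps can be wrapped in a small helper. This is only a sketch: the function name mark_runs and its label parameter are hypothetical, and the cast to object dtype is an added assumption so that the string can sit next to the integer counts (recent pandas versions may warn when a string is written into an integer column).

```python
import pandas as pd


def mark_runs(col: pd.Series, label: str = "last") -> pd.Series:
    """Count rows within each run of equal values and label the run's final row."""
    # A new run starts wherever the value differs from the previous row.
    run_id = col.ne(col.shift(1)).cumsum()
    # Running count inside each run; object dtype so a string can be stored too.
    counts = col.groupby(run_id).cumcount().astype(object)
    # The final row of a run is the one whose next value differs.
    counts.loc[col.ne(col.shift(-1))] = label
    return counts


df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})
df["update"] = mark_runs(df["col1"])
```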

【Discussion】:

Works great, thanks! As g = df.col1, line 1 can be deleted and line 2 replaced with df['update'] = df.groupby(df.col1).cumcount()
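
Grouping by df.col1 directly gives the same counts as the run-based grouper only when equal values are contiguous, as they are in the question's data. A small sketch with hypothetical data in which the value 1 reappears later shows where the two diverge:

```python
import pandas as pd

# Hypothetical data: the value 1 shows up again after a different value.
df = pd.DataFrame({"col1": [1, 1, 2, 1]})

# Run-based grouper: a new group id every time the value changes.
run_id = df["col1"].ne(df["col1"].shift(1)).cumsum()
by_run = df.groupby(run_id).cumcount()

# Value-based grouper: all rows with the same value share one group.
by_value = df.groupby("col1").cumcount()

print(by_run.tolist())    # the second run of 1 starts counting from 0 again
print(by_value.tolist())  # the later 1 continues the count of the earlier ones
```

Which behavior is wanted depends on whether "last" should mean the end of a consecutive run or the final occurrence of a value anywhere in the column.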

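【Solution 2】:

Assuming the index is increasing, (1) take the cumcount within each group, then (2) take the max index within each group and set the string there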

group = df.groupby('col1')

df['last'] = group.cumcount()
df.loc[group['last'].idxmax(), 'last'] = 'last'
#or df.loc[group.apply(lambda x: x.index.max()), 'last'] = 'last'


    col1  last
0      1     0
1      1  last
2      2  last
3      3     0
4      3     1
5      3  last
6      4  last
7      5     0
8      5     1
9      5     2
10     5  last
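
A variant of this approach that computes the counts first and then looks up the row with the highest count per value, so it does not depend on a groupby object created before the new column existed. This is a sketch under the same assumption as the answer (each value forms one contiguous block); the object cast is an added assumption to let the column hold both integers and the string.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# (1) Running count of each row within its col1 value.
counts = df.groupby("col1").cumcount()

# (2) Index label of the highest count per value, i.e. the last row of each group.
last_idx = counts.groupby(df["col1"]).idxmax().tolist()

# Store the counts as objects, then overwrite the last row of each group.
df["last"] = counts.astype(object)
df.loc[last_idx, "last"] = "last"
```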

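【Discussion】:

I'm marking this as the accepted answer because it is the most straightforward one for me. Thanks!
type(df.last) tells me this column is a method. How can I convert it to pandas.core.series.Series (like col1)?
@Julian It is a Series, try type(df['last'])

【Solution 3】:

Use .shift to find where the value changes. Then you can use .where to mask appropriately, followed by .fillna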

s = df.col1 != df.col1.shift(-1)
df['Update'] = df.groupby(s.cumsum().where(~s)).cumcount().where(~s).fillna('last')

Output:

    col1 Update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last

By the way, update is a method of DataFrames, so you should avoid naming a column 'update'
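
A short sketch of why such a column name is awkward: attribute access resolves to the DataFrame method, while bracket access still returns the column. The small frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a column that shares its name with DataFrame.update.
df = pd.DataFrame({"col1": [1, 1, 2], "update": [0, "last", "last"]})

attr = df.update        # attribute access finds the DataFrame.update method
column = df["update"]   # bracket access always returns the column

print(callable(attr))         # the attribute is a bound method, not data
print(type(column).__name__)  # the bracket lookup is a Series
```

The same applies to any column name that collides with an existing DataFrame attribute, so bracket access is the safer habit.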

【Discussion】:

Works fine! What exactly does where(~s) do?
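
On the question about where(~s): Series.where keeps the values at positions where the condition is True and replaces the rest with NaN. Since ~s inverts the boolean mask, where(~s) blanks out exactly the rows that s flags. A minimal sketch with made-up values:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40])
s = pd.Series([False, True, True, False])  # rows to blank out

# ~s is True where s is False, so those rows keep their value;
# the rows flagged by s become NaN.
masked = values.where(~s)
```

In the answer, those NaN positions are what the later .fillna('last') call fills in.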

【Solution 4】:

Another possible solution.

import numpy as np

df['update'] = np.where(df['col1'].ne(df['col1'].shift(-1)), 'last', 0)
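
Note that this only flags the last rows and puts 0 everywhere else, and np.where returns an array with a single dtype, so it cannot hold real integers alongside the string. One possible extension that also carries the running counts, sketched here under the assumption that both branches are passed as object arrays (this is not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# Rows whose next value differs are the last of their run.
is_last = df["col1"].ne(df["col1"].shift(-1)).to_numpy()

# Running count within each run of equal consecutive values.
run_id = df["col1"].ne(df["col1"].shift(1)).cumsum()
counts = df.groupby(run_id).cumcount().to_numpy(dtype=object)

# Both branches are object arrays, so the result keeps ints and strings as they are.
labels = np.full(len(df), "last", dtype=object)
df["update"] = np.where(is_last, labels, counts)
```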

【Discussion】: