If-else logic to set value of dataframe column: answers

【Question Title】: If-else logic to set value of dataframe column
【Posted】: 2021-12-03 16:04:20
【Question Description】:

I have data in a dataframe (df) with a structure similar to the one below:

ID	Sessions
1234	400
5678	200
9101112	199
13141516	0

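I want to create a new column (new_col) in the dataframe that ranks each row by its Sessions value, but I want to make sure that rows with 0 sessions are left out of the ranking.

I tried applying the lambda below, but it is not correct: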

df['new_col'] = df['Sessions'].apply(lambda x: 0 if x == 0 else df['Sessions'].rank(ascending=True, pct=True))

Sample expected output:

ID	Sessions	new_col
1234	400	1.000000
5678	200	0.999987
9101112	199	0.999974
13141516	0	0

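【Question Discussion】:

Could you add a few more rows of sample data? What do you mean by "each example by session value"? Is that per ID? The expected output would be useful.
@EmiOB I just added it to my original post. The rank function works for me (df['Sessions'].rank(ascending=True, pct=True)); I just want to make sure that new_col is 0 wherever there are 0 sessions, which is not happening right now.

Tags: python pandas dataframe lambda

【Solution 1】:

Something like this?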

df['new_col'] = df.loc[df.Sessions > 0, 'Sessions'].rank(ascending=True, pct=True)

Or:

df['new_col'] = df['Sessions'].replace(0, np.nan).rank(pct=True).fillna(0)
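To see how the two options differ, here is a minimal sketch run on the four sample rows from the question. The column names opt1 and opt2 are made up for the comparison. The point to notice is that the first option leaves NaN in the rows it filtered out (the assignment aligns on the index), while the second one turns them into 0.

```python
import numpy as np
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

# Option 1: rank only the rows with Sessions > 0.
# Rows that were filtered out have no matching index label, so they get NaN.
df["opt1"] = df.loc[df.Sessions > 0, "Sessions"].rank(ascending=True, pct=True)

# Option 2: hide the zeros as NaN, rank the rest, then put 0 back.
df["opt2"] = df["Sessions"].replace(0, np.nan).rank(pct=True).fillna(0)

print(df)
```

If you need 0 rather than NaN with the first option, appending .fillna(0) to the right-hand side should get you there.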

【Discussion】:

I used the second option and had to modify it because my 0 was actually a str :\. testx_df['Sessions'].replace('0', np.nan).rank(ascending=True, pct=True).fillna(0) Thanks!
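If the whole column holds strings, as in the comment above, another way to handle it is to convert it to numbers first so the ranking is computed on numeric values. This is only a sketch: the string data below is invented to mimic that situation, and it assumes every value is meant to be a number.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Sessions stored as strings.
testx_df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": ["400", "200", "199", "0"],
})

# Convert to a numeric dtype; errors="coerce" turns unparseable values into NaN.
sessions = pd.to_numeric(testx_df["Sessions"], errors="coerce")

# Same pipeline as the second option, now on numeric data.
testx_df["new_col"] = (
    sessions.replace(0, np.nan).rank(ascending=True, pct=True).fillna(0)
)

print(testx_df)
```

Note that with errors="coerce" any value that cannot be parsed also ends up as 0 in new_col, which may or may not be what you want.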

【Solution 2】:

If you want to stay safe when working with slices, assign is your friend. Try this:

df.assign(new_col=lambda d: (
    d["Sessions"]          # grab the series
    .replace(0, np.nan)    # replace the 0s with NaNs
    .rank(pct=True)        # rank as percentages
    .fillna(0)             # fill the zeros back in
    )
)

Also, this way you can neatly wrap the pipeline in a function.
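As a rough illustration of wrapping the pipeline in a function, something like the following might work. The function name add_nonzero_pct_rank and its parameters are made up for this sketch; they are not part of pandas.

```python
import numpy as np
import pandas as pd

def add_nonzero_pct_rank(frame, source="Sessions", target="new_col"):
    """Return a copy of frame with a percentile rank of `source`; zeros get 0."""
    ranked = frame[source].replace(0, np.nan).rank(pct=True).fillna(0)
    # assign returns a new dataframe, so the input frame is left untouched.
    return frame.assign(**{target: ranked})

df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

# pipe passes df as the first argument of the function.
result = df.pipe(add_nonzero_pct_rank)
print(result)
```

Because the function takes a dataframe and returns a dataframe, it can be chained with other steps through pipe.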

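【Discussion】:

This looks cool, but when I test it I keep getting an error. I think it is the way I am trying to apply it in my code: testx_df['new_col'] = testx_df.assign(new_col=lambda d: (d["Sessions"].replace(0, np.NaN).rank(pct=True, ).fillna(0)))
@hansolo. Yes. df.assign(..) adds the new column and returns the whole dataframe. So you do not need the df['new_col'] = part; just copy and paste the snippet above.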
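To make the point about assign concrete, here is a small sketch on the sample data from the question. The variable name result is only for illustration. assign hands back a new dataframe and leaves the original alone, so the result has to be stored somewhere, typically by writing df = df.assign(...).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

# assign returns a whole new dataframe that includes new_col.
result = df.assign(
    new_col=lambda d: d["Sessions"].replace(0, np.nan).rank(pct=True).fillna(0)
)

print("new_col" in df.columns)      # the original dataframe is unchanged
print("new_col" in result.columns)  # the returned dataframe has the new column
```

This is also why putting the whole assign call on the right-hand side of testx_df['new_col'] = ... fails: it tries to store an entire dataframe in a single column.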