Optimise processing of for loop? Answers

【Question title】: Optimise processing of for loop?
【Posted】: 2021-05-04 19:49:32
【Question】:

I have this basic dataframe:

     dur    type    src    dst
0     0     new     543     1
1     0     new     21      1
2     1     old     4828    2
3     0     new     321     1
...
(total 450000 rows)

My goal is to replace each value in src with 1, 2 or 3, depending on which range it falls into. I wrote the for loop with if/else below:

for i in df['src']:
    if i <= 1000:
        df['src'].replace(to_replace = [i], value = [1], inplace = True)
    elif i <= 2500:
        df['src'].replace(to_replace = [i], value = [2], inplace = True)
    elif i <= 5000:
        df['src'].replace(to_replace = [i], value = [3], inplace = True)
    else:
        print('End!')

This works as expected, but it is very slow on the full dataframe of 450000 rows (it takes more than 30 minutes!).

Is there a more Pythonic way to speed up this algorithm?

【Question comments】:

Use a nested where, or cut.
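The comment gives no code, so here is a minimal sketch of what "nested where" could look like, assuming it refers to numpy.where. The four-row dataframe is a stand-in for the one in the question, and leaving values above 5000 unchanged is an assumption chosen to mirror the loop's else branch, which replaces nothing.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 450000-row dataframe from the question.
df = pd.DataFrame({
    "dur": [0, 0, 1, 0],
    "type": ["new", "new", "old", "new"],
    "src": [543, 21, 4828, 321],
    "dst": [1, 1, 2, 1],
})

src = df["src"]

# Each np.where handles one threshold; the innermost fallback keeps
# values above 5000 as they are (assumption, mirroring the else branch).
inner = np.where(src <= 5000, 3, src)
middle = np.where(src <= 2500, 2, inner)
df["src"] = np.where(src <= 1000, 1, middle)

print(df["src"].tolist())  # [1, 1, 3, 1]
```

The whole column is processed in three vectorised passes, instead of one full-column replace call per row as in the loop.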

Tags: python python-3.x pandas for-loop if-statement

【Solution 1】:

Try numpy.select, which is designed for multiple conditions:

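import numpy as np

# Each condition is a boolean Series; np.select uses the first one that matches.
cond1 = df.src.le(1000)
cond2 = df.src.le(2500)
cond3 = df.src.le(5000)

condlist = [cond1, cond2, cond3]
choicelist = [1, 2, 3]
# Rows matching no condition (src > 5000) get np.select's default of 0.
# assign returns a new dataframe; df itself is not modified.
df.assign(src=np.select(condlist, choicelist))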

    dur     type    src     dst
0   0   new     1   1
1   0   new     1   1
2   1   old     3   2
3   0   new     1   1
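As a hedged variation on the answer above: np.select fills rows that match no condition with 0 unless a default is passed. The sketch below assumes you would rather keep values above 5000 unchanged, as the original loop does, and that you want the result written back to the column. The sample values, including the boundary cases 1000 and 2500, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values only; 7000 matches none of the conditions.
df = pd.DataFrame({"src": [543, 21, 4828, 321, 1000, 2500, 7000]})

condlist = [df["src"].le(1000), df["src"].le(2500), df["src"].le(5000)]
choicelist = [1, 2, 3]

# default supplies the value for rows where every condition is False;
# passing the original column keeps those rows untouched.
df["src"] = np.select(condlist, choicelist, default=df["src"].to_numpy())

print(df["src"].tolist())  # [1, 1, 3, 1, 1, 2, 7000]
```

The order of condlist matters: 543 satisfies all three conditions, and it maps to 1 because the first match wins.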

【Comments】:

【Solution 2】:

I haven't tested this, but I think it should work:

# Bins are right-closed by default: (0, 1000], (1000, 2500], (2500, 5000].
pd.cut(df.src, [0, 1000, 2500, 5000], labels=[1, 2, 3])
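Since the answer is marked as untested, here is a small runnable sketch of the same idea. Two details are assumptions layered on top of it: include_lowest=True, so that a src of exactly 0 lands in the first bin, and the cast back to int, because pd.cut returns a categorical Series. The sample values are made up. Unlike the loop in the question, negative values and values above 5000 fall outside every bin and come back as NaN.

```python
import pandas as pd

# Illustrative values only, including the edge cases 0, 1000 and 1001.
df = pd.DataFrame({"src": [543, 21, 4828, 321, 0, 1000, 1001]})

binned = pd.cut(
    df["src"],
    bins=[0, 1000, 2500, 5000],
    labels=[1, 2, 3],
    include_lowest=True,  # first bin becomes closed on the left, so 0 is included
)

# The cast works here because every value fell inside a bin;
# it would raise if any entry were NaN (outside all bins).
df["src"] = binned.astype(int)

print(df["src"].tolist())  # [1, 1, 3, 1, 1, 1, 2]
```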

【Comments】: