How to replace a substring in a loop array [pandas]

【Question title】: how to replace a substring in a loop array [pandas]
【Posted】: 2019-03-16 00:47:21
【Question description】:

I have the following dataset:

test_column

AB124
3847937BB
HP111
PG999-HP222
1222HP
HP3333-22HP
111HP3939DN

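I want to apply the following logic:

Find all the letters in the test column.
If the resulting letter string is longer than 2 characters and contains an instance of "HP", remove "HP" once and keep the rest of the string.
If the letter string is longer than 2 characters and contains no instance of "HP", keep the whole string.
If the letter string has 2 characters or fewer, keep the whole string.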

So my desired output should look like this:

desired_column

AB
BB
HP
PG
HP
HP
DN

I'm trying to do this with a loop, but I haven't managed to produce the desired result.

for index,row in df.iterrows():
    target_value = row['test_column']     #array
    predefined_code = ['HP']      #array
    for code in re.findall("[a-zA-Z]+", target_value):  #find all alphabets in the target_column
        if (len(code)>2) and not (code in predefined_code):
            possible_code = code
        if (len(code)>2) and (code in predefined_code):
            possible_code = possible_code.Select(code.replace(predefined_code,'',1))
        if (len(code)<=2):
            possible_code = code

【Question discussion】:

Tags: python arrays pandas loops for-loop

【Solution 1】:

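Since the cases are mutually exclusive and exhaustive, the logic can be simplified to:

"For a letter substring that is longer than 2 characters and contains 'HP', remove the first 'HP'; otherwise keep the substring as is."

First use a regular expression to strip the non-letter part of each string, then implement the logic with a simple if-else statement.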

import pandas as pd
import re

df = pd.DataFrame({'test_column': ['AB124','3847937BB','HP111','PG999-HP222','1222HP','HP3333-22HP','111HP3939DN']})

# Compile once, outside the loop; matches every character that is not a letter
regex = re.compile("[^A-Za-z]")

for index, row in df.iterrows():
    target_value = row['test_column']     # the raw string in this row
    code = regex.sub('', target_value)    # keep only the letters

    if len(code) > 2 and 'HP' in code:
        possible_code = code.replace('HP', '', 1)   # drop the first 'HP' only
    else:
        possible_code = code
    print(possible_code)

This gives the desired output:

AB
BB
HP
PG
HP
HP
DN
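
If the result needs to be stored in the DataFrame rather than printed, the same rule can probably be expressed without an explicit loop. The following is only a sketch of one possible alternative, not part of the original answer: it uses the pandas string methods `Series.str.replace` and `Series.str.len` together with `Series.where`, and it assumes the output column should be named `desired_column`, as in the question.

```python
import pandas as pd

df = pd.DataFrame({'test_column': ['AB124', '3847937BB', 'HP111', 'PG999-HP222',
                                   '1222HP', 'HP3333-22HP', '111HP3939DN']})

# Keep only the letters of each value (regex=True enables pattern matching).
letters = df['test_column'].str.replace(r'[^A-Za-z]', '', regex=True)

# Remove the first literal 'HP' only; n=1 limits the replacement to one occurrence.
# Strings without 'HP' pass through unchanged, so no separate "contains" check is needed.
without_first_hp = letters.str.replace('HP', '', n=1, regex=False)

# Use the shortened string only where the letter string is longer than 2 characters;
# everywhere else fall back to the unmodified letter string.
df['desired_column'] = without_first_hp.where(letters.str.len() > 2, letters)

print(df['desired_column'].tolist())
```

For a handful of rows the loop and this version behave the same; the column-wise form mainly saves the `iterrows()` overhead on larger frames and leaves the result in the DataFrame.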

【Discussion】: